Cohort analysis with R – “layer-cake graph”

Cohort analysis is one of the most powerful and in-demand techniques available to marketers for assessing long-term trends in customer retention and calculating lifetime value.

If you have studied at Custora University, you may be interested in the “layer-cake graph” they propose for cohort analysis.

Custora says: “The distinctive “layer-cake graph” produced by looking at cohorts in calendar time can provide powerful insights into the health of your business. At a given point in time, what percentage of your revenue or profit came from new vs. repeat customers? Tracking how that ratio has changed over time can give you insight into whether you’re fueling top-line growth solely through new customer acquisition – or whether you’re continuing to nurture those relationships with your existing customers over time.”

Usually, we focus on calculating lifetime value or comparing cohorts, but I was really impressed by this analytical approach and tried to reproduce the chart in R. Here is what I got.

After processing the raw data, we should have it in the following structure. The rows Cohort01, Cohort02, etc. are cohorts named by customer signup date or first purchase date, and the columns M1, M2, etc. are the months of the observation period in calendar time (first month, second month, etc.):


For example, Cohort01 signed up in January (M1) and brought us $270,000 during that first month (M1). Cohort05 signed up in May (M5) and brought us $31,000 in September (M9).

OK. Suppose you have processed your data and ended up with a cohort.sum data frame that looks like the table above. You can reproduce this data frame with the following code:

cohort.sum <- data.frame(cohort=c('Cohort01', 'Cohort02', 'Cohort03', 'Cohort04', 'Cohort05', 'Cohort06', 'Cohort07', 'Cohort08', 'Cohort09', 'Cohort10', 'Cohort11', 'Cohort12'),
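The listing above shows only the cohort column. As a stand-in, here is a minimal sketch that builds a data frame of the same shape from made-up numbers. The name cohort.demo, the 250,000 starting revenue and the decay rates are my assumptions for illustration only; they are not the figures behind the table.

```r
# Synthetic stand-in with the same shape as cohort.sum.
# All numbers are invented for illustration, NOT the data from this post.
n <- 12
months  <- paste0("M", 1:n)             # calendar months M1..M12
cohorts <- sprintf("Cohort%02d", 1:n)   # Cohort01..Cohort12

# revenue[i, j]: revenue of cohort i in calendar month j (0 before signup)
revenue <- matrix(0, nrow = n, ncol = n, dimnames = list(NULL, months))
for (i in 1:n) {
  age <- 0:(n - i)                      # months since cohort i signed up
  # assumed pattern: sharp drop after the first month, then slow decay
  revenue[i, i:n] <- round(250000 * 0.3 ^ pmin(age, 1) * 0.9 ^ pmax(age - 1, 0))
}

cohort.demo <- data.frame(cohort = cohorts, revenue, stringsAsFactors = FALSE)
head(cohort.demo[, 1:4])
```

Assigning cohort.demo to cohort.sum should let you run the plotting code that follows, although the resulting chart will of course reflect the invented numbers.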

Let’s create the “layer-cake” chart with the following R code:

#load the necessary libraries
library(ggplot2)
library(reshape2)

#we need to melt the data into long format (one row per cohort and month)
cohort.chart <- melt(cohort.sum, id.vars = "cohort")
colnames(cohort.chart) <- c('cohort', 'month', 'revenue')

#define palette
blues <- colorRampPalette(c('lightblue', 'darkblue'))

#plot data
p <- ggplot(cohort.chart, aes(x=month, y=revenue, group=cohort))
p + geom_area(aes(fill = cohort)) +
 scale_fill_manual(values = blues(nrow(cohort.sum))) +
 ggtitle('Total revenue by Cohort')
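If the melting step looks like magic, here is a rough sketch of the same wide-to-long reshaping done by hand in base R on a tiny toy table. The data frame wide and its numbers are hypothetical. This is only meant to show the shape of the result, not how melt works internally.

```r
# Toy wide table (hypothetical numbers): one row per cohort, one column per month
wide <- data.frame(cohort = c("Cohort01", "Cohort02"),
                   M1 = c(100, 0),
                   M2 = c(40, 80),
                   stringsAsFactors = FALSE)

# Long format: one row per (cohort, month) pair
month.cols <- setdiff(names(wide), "cohort")
long <- data.frame(
  cohort  = rep(wide$cohort, times = length(month.cols)),
  # a factor with explicit levels keeps the months in column order on the x axis
  month   = factor(rep(month.cols, each = nrow(wide)), levels = month.cols),
  revenue = unlist(wide[month.cols], use.names = FALSE)
)
print(long)
```

The month column is a factor, so ggplot treats it as discrete. That is why the plotting code sets group=cohort: it tells geom_area which points belong to the same layer.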

And we get the following chart:


You can see that monthly revenue depends heavily on new customers making their first purchases. Over time, however, the company accumulates several layers of revenue from existing (loyal) customers, and that dependence decreases. It also looks like there was some activity (e.g. a promo) in the eighth month (M8) to which a few cohorts responded. A really helpful chart.
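The new vs. repeat question from the Custora quote can also be put into numbers. Below is a minimal sketch on a toy table with hypothetical figures. It assumes the same layout as cohort.sum, where cohort i makes its first purchases in month i, so revenue from new customers sits on the diagonal. The names toy, rev.mat and share are mine.

```r
# Toy data (hypothetical numbers): cohort i signs up in calendar month i
toy <- data.frame(cohort = c("Cohort01", "Cohort02", "Cohort03"),
                  M1 = c(100, 0, 0),
                  M2 = c(40, 80, 0),
                  M3 = c(30, 30, 90))

rev.mat <- as.matrix(toy[, -1])   # cohorts in rows, calendar months in columns
total   <- colSums(rev.mat)       # total revenue per month
new.rev <- diag(rev.mat)          # revenue from the cohort acquired that month

share <- data.frame(month           = colnames(rev.mat),
                    new.share       = unname(new.rev / total),
                    returning.share = unname(1 - new.rev / total))
print(share)
```

The diagonal shortcut only works when there is exactly one cohort per month and the rows and columns are in the same order. With a different cohort definition you would need to match each cohort to its signup month explicitly.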