Customer segmentation – LifeCycle Grids with R

In this post I want to share a very powerful approach to customer segmentation. It is based on the customer’s lifecycle, specifically on the frequency and recency of purchases. The idea of using these metrics comes from RFM analysis. Recency and frequency are very important behavioral metrics. We are interested in frequent and recent purchases because frequency affects a client’s lifetime value and recency affects retention. These metrics can therefore help us understand the current phase of each client’s lifecycle. Once we know each client’s phase, we can split the customer base into groups (segments) in order to:

  • understand the state of affairs,
  • use the marketing budget effectively through accurate targeting,
  • use different offers for every group,
  • use email marketing effectively,
  • and, finally, increase customers’ lifetime and value.

For this, we will use a matrix called LifeCycle Grids. We will study how to turn the initial transaction data into the matrix, how to visualize it, and how to do some in-depth analysis. We will do all of these steps in the R programming language.

Let’s create a data sample with the following code:

# loading libraries
library(dplyr)
library(reshape2)
library(ggplot2)

# creating data sample
set.seed(10)
data <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
product=sample(c('NULL','a','b','c'), 5000, replace=TRUE,
prob=c(0.15, 0.65, 0.3, 0.15)))
order <- data.frame(orderId=c(1:1000),
clientId=sample(c(1:300), 1000, replace=TRUE))
gender <- data.frame(clientId=c(1:300),
gender=sample(c('male', 'female'), 300, replace=TRUE, prob=c(0.40, 0.60)))
date <- data.frame(orderId=c(1:1000),
orderdate=sample((1:100), 1000, replace=TRUE))
orders <- merge(data, order, by='orderId')
orders <- merge(orders, gender, by='clientId')
orders <- merge(orders, date, by='orderId')
orders <- orders[orders$product!='NULL', ]
orders$orderdate <- as.Date(orders$orderdate, origin="2012-01-01")
rm(data, date, order, gender)

The head of our data sample looks like:

  orderId clientId product gender orderdate
1   1       254       a    female 2012-04-03
2   1       254       b    female 2012-04-03
3   1       254       c    female 2012-04-03
4   1       254       b    female 2012-04-03
5   2       151       a    female 2012-01-31
6   2       151       b    female 2012-01-31

You can see that the table includes the customer’s gender. We will use it as an example for some in-depth analysis later. I recommend using any additional features you have when looking for insights: the client’s source, channel, campaign, geo data and so on.

A few words about LifeCycle Grids. It is a matrix with 2 dimensions:

  • frequency, which is expressed in number of purchased items or placed orders,
  • recency, which is expressed in days or months since the last purchase.

The first step is to think about suitable grids for your business. It is impossible to work with an unlimited number of segments, so we need to define boundaries for frequency and recency that split customers into homogeneous groups (segments). Analyzing the distribution of frequency and recency in our data set, combined with knowledge of the business, can help us find suitable boundaries.

Therefore, we need to calculate two values:

  • number of orders that were placed by each client (or in some cases, it can be the number of items),
  • time lapse from the last purchase to the reporting date.
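To make the two values concrete, here is a minimal sketch on a toy table. The data frame `tx`, its contents, and the name `report.date` are hypothetical and only illustrate the calculation; the post's own processing code works on the generated sample.

```r
library(dplyr)

# Toy transactions: one row per order (hypothetical data)
tx <- data.frame(
  orderId   = 1:6,
  clientId  = c(1, 1, 1, 2, 2, 3),
  orderdate = as.Date(c('2012-01-05', '2012-02-10', '2012-04-01',
                        '2012-01-20', '2012-03-15', '2012-04-09'))
)

# Reporting date against which recency is measured (assumed value)
report.date <- as.Date('2012-04-11')

client.summary <- tx %>%
  group_by(clientId) %>%
  summarise(frequency = n(),                                     # orders per client
            recency = as.numeric(report.date - max(orderdate)))  # days since last order

client.summary
```

Client 1 has three orders with the last one 10 days before the reporting date, so its frequency is 3 and its recency is 10.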

Then, plot the distribution with the following code:

# reporting date
today <- as.Date('2012-04-11', format='%Y-%m-%d')

# processing data
orders <- dcast(orders, orderId + clientId + gender + orderdate ~ product, value.var='product', fun.aggregate=length)

orders <- orders %>%
 group_by(clientId) %>%
 mutate(frequency=n(),
 recency=as.numeric(today-orderdate)) %>%
 filter(orderdate==max(orderdate)) %>%
 filter(orderId==max(orderId)) %>%
 ungroup()

# exploratory analysis
ggplot(orders, aes(x=frequency)) +
 theme_bw() +
 scale_x_continuous(breaks=c(1:10)) +
 # geom_bar() counts rows per distinct value; it does not take a binwidth argument
 geom_bar(alpha=0.6) +
 ggtitle("Distribution by frequency")

ggplot(orders, aes(x=recency)) +
 theme_bw() +
 geom_bar(alpha=0.6) +
 ggtitle("Distribution by recency")

[Figures: distribution by frequency; distribution by recency]

Early behavior is most important, so finer detail is useful there. Usually there is a significant difference between customers who bought once and those who bought 3 times, but is there any difference between customers who bought 50 times and others who bought 53 times? That is why it makes sense to use narrow boundaries for lower values and wider gaps for higher ones. We will use the following boundaries:

  • for frequency: 1, 2, 3, 4, 5, >5,
  • for recency (in days): 0-6, 7-13, 14-19, 20-45, 46-80, >80.
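One possible way to sanity-check boundaries like these is to look at the share of clients at each frequency value and at the quantiles of recency. The sketch below uses a hypothetical `clients` data frame with simulated values, not the post's sample, so the numbers it prints say nothing about the data above.

```r
# Hypothetical per-client summary with simulated values
set.seed(1)
clients <- data.frame(
  clientId  = 1:200,
  frequency = rpois(200, lambda = 2) + 1,          # number of orders, at least 1
  recency   = sample(0:100, 200, replace = TRUE)   # days since the last order
)

# Share of clients at each frequency value: shows where most clients sit
freq.share <- round(prop.table(table(clients$frequency)), 2)
freq.share

# Quantiles of recency as candidate cut points
rec.quantiles <- quantile(clients$recency, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
rec.quantiles
```

Quantiles are only a starting point; business knowledge (for example, a typical repurchase cycle) should still decide the final boundaries.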

Next, we need to assign each client to a segment based on these boundaries. We will also create a new variable, ‘cart’, which holds the products from the last cart, for in-depth analysis.

orders.segm <- orders %>%
 mutate(segm.freq=ifelse(between(frequency, 1, 1), '1',
 ifelse(between(frequency, 2, 2), '2',
 ifelse(between(frequency, 3, 3), '3',
 ifelse(between(frequency, 4, 4), '4',
 ifelse(between(frequency, 5, 5), '5', '>5')))))) %>%
 mutate(segm.rec=ifelse(between(recency, 0, 6), '0-6 days',
 ifelse(between(recency, 7, 13), '7-13 days',
 ifelse(between(recency, 14, 19), '14-19 days',
 ifelse(between(recency, 20, 45), '20-45 days',
 ifelse(between(recency, 46, 80), '46-80 days', '>80 days')))))) %>%
 # creating last cart feature
 mutate(cart=paste(ifelse(a!=0, 'a', ''),
 ifelse(b!=0, 'b', ''),
 ifelse(c!=0, 'c', ''), sep='')) %>%
 arrange(clientId)

# defining order of boundaries
orders.segm$segm.freq <- factor(orders.segm$segm.freq, levels=c('>5', '5', '4', '3', '2', '1'))
orders.segm$segm.rec <- factor(orders.segm$segm.rec, levels=c('>80 days', '46-80 days', '20-45 days', '14-19 days', '7-13 days', '0-6 days'))
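As an alternative to the nested ifelse() calls, the same binning can be sketched with base R's cut(). This is not the approach used in the post, only an equivalent shorthand, and the `frequency` and `recency` vectors below are hypothetical stand-ins for the corresponding columns.

```r
# Hypothetical values standing in for the frequency and recency columns
frequency <- c(1, 2, 5, 6, 12)
recency   <- c(0, 6, 7, 45, 81)

# With the default right = TRUE, each interval is (lower, upper]
segm.freq <- cut(frequency,
                 breaks = c(0, 1, 2, 3, 4, 5, Inf),
                 labels = c('1', '2', '3', '4', '5', '>5'))

segm.rec <- cut(recency,
                breaks = c(-Inf, 6, 13, 19, 45, 80, Inf),
                labels = c('0-6 days', '7-13 days', '14-19 days',
                           '20-45 days', '46-80 days', '>80 days'))

data.frame(frequency, segm.freq, recency, segm.rec)
```

cut() returns a factor with levels in ascending order. To get the reversed order used for the grid, the levels can be flipped with, for example, factor(segm.rec, levels = rev(levels(segm.rec))).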

We now have everything we need to create LifeCycle Grids. We combine clients into segments with the following code:

lcg <- orders.segm %>%
 group_by(segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 mutate(client='client') %>%
 ungroup()

The classic matrix can be created with the following code:

lcg.matrix <- dcast(lcg, segm.freq ~ segm.rec, value.var='quantity', fun.aggregate=sum)

[Figure: classic LifeCycle Grids matrix]

However, I suppose a good visualization is obtained through the following code:

ggplot(lcg, aes(x=client, y=quantity, fill=quantity)) +
 theme_bw() +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', alpha=0.6) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids")

[Figure: LifeCycle Grids plot with colored borders marking four quadrants]

I’ve added colored borders for a better understanding of how to work with this matrix. We have four quadrants:

  • yellow – here are our best customers, who have placed quite a few orders and made their last purchase recently. They have higher value and higher potential to buy again. We have to take care of them.
  • green – here are our new clients, who placed several orders (1-3) recently. Although they have lower value, they have potential to move into the yellow zone. Therefore, we have to help them move into the right quadrant (yellow).
  • red – here are our former best customers. We need to understand why they are former and, maybe, try to reactivate them.
  • blue – here are our one-time buyers.

Hint: it is possible to highlight customer groups with different colors, as in the following examples:


lcg.adv <- lcg %>%
 mutate(rec.type = ifelse(segm.rec %in% c(">80 days", "46-80 days", "20-45 days"), "not recent", "recent"),
 freq.type = ifelse(segm.freq %in% c(">5", "5", "4"), "frequent", "infrequent"),
 customer.type = interaction(rec.type, freq.type))

ggplot(lcg.adv, aes(x=client, y=quantity, fill=customer.type)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 facet_grid(segm.freq ~ segm.rec) +
 geom_bar(stat='identity', alpha=0.6) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 ggtitle("LifeCycle Grids")

# with background
ggplot(lcg.adv, aes(x=client, y=quantity, fill=customer.type)) +
 theme_bw() +
 theme(panel.grid = element_blank()) +
 geom_rect(aes(fill = customer.type), xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0.1) +
 facet_grid(segm.freq ~ segm.rec) +
 geom_bar(stat='identity', alpha=0.7) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 ggtitle("LifeCycle Grids")

Does it make sense to make the same offer to all of these customers? Certainly not! It makes sense to create different approaches not only for each quadrant, but for border cells as well.
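To act on a particular cell, you need the list of clients in it. A minimal sketch, assuming a table shaped like orders.segm; the `segm` data frame below and its values are hypothetical:

```r
library(dplyr)

# Hypothetical segmented table with the same column names as orders.segm
segm <- data.frame(
  clientId  = 1:6,
  segm.freq = c('>5', '1', '>5', '3', '1', '>5'),
  segm.rec  = c('0-6 days', '>80 days', '0-6 days',
                '7-13 days', '>80 days', '46-80 days')
)

# Clients in one cell: more than 5 orders, last purchase within the past week
best.recent <- segm %>%
  filter(segm.freq == '>5', segm.rec == '0-6 days') %>%
  pull(clientId)

best.recent
```

The resulting vector of client IDs can then be joined to a contact table or exported for a campaign tool.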

What I really like about this segmentation model is that it is stable and alive at the same time. It is alive in terms of customer flow: every day, with or without purchases, customers move from one cell to another. And it is stable in terms of working with segments: it allows us to work with customers who are in the same lifecycle phase. That means you can create suitable campaigns / offers / emails for each cell or for several adjacent cells and use them constantly.
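One hedged way to see this flow is to assign recency segments at two reporting dates and cross-tabulate them. The sketch below is an illustration under simplifying assumptions: the `last.order` data and the helper `rec.segment` are hypothetical, and no new purchases happen between the two dates.

```r
# Hypothetical last-order dates per client
last.order <- data.frame(
  clientId  = 1:5,
  orderdate = as.Date(c('2012-04-10', '2012-04-03', '2012-03-20',
                        '2012-02-01', '2012-01-05'))
)

# Hypothetical helper: recency segment for a given reporting date
rec.segment <- function(orderdate, report.date) {
  cut(as.numeric(report.date - orderdate),
      breaks = c(-Inf, 6, 13, 19, 45, 80, Inf),
      labels = c('0-6 days', '7-13 days', '14-19 days',
                 '20-45 days', '46-80 days', '>80 days'))
}

before <- rec.segment(last.order$orderdate, as.Date('2012-04-11'))
after  <- rec.segment(last.order$orderdate, as.Date('2012-04-18'))

# Rows: segment on the first date; columns: segment one week later
flow <- table(before, after)
flow
```

Off-diagonal counts in the table are clients who moved to another recency band during the week. With real data, frequency would change as well, so the same comparison would be done on the full cell (frequency and recency together).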

OK, it’s time to see how we can do some in-depth analysis. R allows us to create subsegments and visualize them effectively. It can be helpful to break down each cell by some feature. For instance, we can split customers by gender. As another example, when our products have different lifecycles, it can be helpful to analyze which product(s) were in the last cart, or we can combine these features. Let’s do this with the following code:

lcg.sub <- orders.segm %>%
 group_by(gender, cart, segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 mutate(client='client') %>%
 ungroup()

ggplot(lcg.sub, aes(x=client, y=quantity, fill=gender)) +
 theme_bw() +
 scale_fill_brewer(palette='Set1') +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', position='fill' , alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender (proportion)")

[Figure: LifeCycle Grids by gender (proportion)]

or even:

ggplot(lcg.sub, aes(x=gender, y=quantity, fill=cart)) +
 theme_bw() +
 scale_fill_brewer(palette='Set1') +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', position='fill' , alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender and last cart (proportion)")

[Figure: LifeCycle Grids by gender and last cart (proportion)]

As you can see, there is a lot of room for creativity. If you want to know much more about LifeCycle Grids and strategies for working with quadrants, I highly recommend that you read Jim Novo’s works, e.g. this blogpost.

Thank you for reading this!

  • max

    Hi, and thank you for your post. I used the same example as you, but I get the graph without colored borders.
    Why?
    Thanks

    • AnalyzeCore

      I added the colored borders manually in order to show how to read the grids. Thank you!

    • max

      OK, thanks, but how can I know, for example, if a client changes from one group to another?

    • AnalyzeCore

      There are several ways. You can find some ideas in this post https://analyzecore.com/2015/04/01/cohort-analysis-and-lifecycle-grids-mixed-segmentation-with-r/
      Further, I’m going to publish a separate post about it.

    • max

      Thank you, I will try to understand.

      Great post! My only suggestions would be:

      to work with real data in order to show the importance of your method

      Thanks

    • AnalyzeCore

      Thank you for the advice! But real data are confidential in most of the cases.

    • max

      Hi,

      for your next post,
      How can I add the variables CAC and grossmarg?

      1. CAC: customer acquisition cost; for instance, if you send email to your customers, the cost is always zero

      2. grossmarg: how much I earn per product

      I would be very grateful if you could explain how I can get these variables, given that I work with a real dataset

      Thanks!

  • Scott Horvath

    Thanks for the post. How do you know whether the ‘best customers’ are not just a ‘new customer’ who has bought five items?

    • AnalyzeCore

      Thank you for the question, Scott! There are several perspectives that come to mind:

      1) Using orders/purchases instead of items. You can use orders instead of items, as in my example. In this case, it doesn’t matter how many items were bought; what matters is the fact of the purchase. The more purchases, the better the client. Furthermore, in online business customers can place several orders within a dozen minutes. Therefore, if you think this was a single contact with your company, you can count more than 1 order per day (e.g. 3 orders) as one purchase.

      2) Using items with suitable boundaries. If customers frequently purchase several items, you need to define boundaries in a way that suits your business. They could be 1-100 items, 101-1000 and so on. Actually, the LifeCycle Grids approach starts with analyzing customers’ behavior and identifying recency and frequency boundaries.

      3) Combining with cohort analysis. I think this is the most accurate way. You can combine LCG with cohort analysis (by first purchase date). It means you would see both the best and the new customers who made 5 purchases in the same cell, but in different cohorts. Read more about it in this post: https://analyzecore.com/2015/04/01/cohort-analysis-and-lifecycle-grids-mixed-segmentation-with-r/

      Hope this helps.

    • Scott Horvath

      Many thanks for your response.

      I just want to confirm that my interpretation of the chart [0] is correct.

      I read that 4 unique customers (best customers) made more than five ‘orders’ with their last order occurring 14-19 days ago.

      I also read that 11 unique customers (new customers) made 3 orders with their last order occurring 14-19 days ago. If we consider one of these 11 customers that have been flagged as ‘new’, how do you know that the first (or second) of their three orders wasn’t made 46-80 days ago? If they had, then I believe they would be incorrectly flagged as ‘new’.

      Very eager to hear your thoughts!

      Scott

      [0] i0.wp.com/analyzecore.com/wp-content/uploads/2015/02/lcg_1_1.png

    • AnalyzeCore

      You are right. The brand new customers are only in the cell 1 purchase with their last order occurring 0-6 days ago. The colored blocks are indicative and were used for explanation of the approach and you can use another definition for cells or for blocks. For instance, if you read the post I mentioned you would find another example:
      – new customer (1-2 purchases and 0-60 days recency),
      – under risk new customer (1-2 purchases and 61-180 days recency),
      – 1x buyer (1-2 purchases and >180 days recency),
      – engaged customer (3-4 purchases and 0-60 days recency),
      – under risk engaged customer (3-4 purchases and 61-180 days recency),
      – former engaged customer (3-4 purchases and >180 days recency),
      – best customer (>4 purchases and 0-60 days recency),
      – under risk best customer (>4 purchases and 61-180 days recency),
      – former best customer (>4 purchases and >180 days recency).
      Therefore, it is up to you how to combine cells and name them.

      But I want to draw your attention to the fact that the LCG approach is about the lifecycle phase, which we try to identify through purchases and their recency. That is why a customer who made 3 orders, with the last order occurring 14-19 days ago, can be ‘new’ for you even if the first order was 80 days ago: the relationship includes only 3 orders, and this wasn’t enough to say that the customer is loyal and we can’t offer something special for them. This model is simple and flexible, but it needs some work to adapt it to a specific business model.

    • Scott Horvath

      Really appreciate the response! I think I will go and try and make something like this. I will reference this post, too!

    • AnalyzeCore

      Thank you, Scott!

  • Jason Aizkalns

    Great stuff. Some food for thought, if you add two additional factors/labels for recency/frequency, you can get away with not adding the manual color boundaries. For example:


    lcg$segm.rec.label <- ifelse(lcg$segm.rec %in% c(">80 days", "46-80 days", "20-45 days"), "not recent", "recent")
    lcg$segm.freq.label <- ifelse(lcg$segm.freq %in% c(">5", "5", "4"), "frequent", "infrequent")

    # Take a look at this...
    # interaction(lcg$segm.rec.label, lcg$segm.freq.label)

    ggplot(lcg, aes(x=client, y=quantity, fill=interaction(segm.rec.label, segm.freq.label))) +
    theme_bw() +
    theme(panel.grid = element_blank())+
    geom_bar(stat='identity', alpha=0.6) +
    geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
    facet_grid(segm.freq ~ segm.rec) +
    ggtitle("LifeCycle Grids")

    This generates the following graph:

    • AnalyzeCore

      Jason, thanks for the comment! It is really useful!

    • Jason Aizkalns

      No problem – great stuff — keep it up. Another alternative would be to use `geom_rect` and fill the background (see this post: http://stackoverflow.com/q/9847559/2572423). You can probably use that approach and still color by other factors (such as Gender and Product in your examples).

    • AnalyzeCore

      Sure! I will think about how to use these ideas! Thanks!

  • Mohammad Abdullah

    Good morning all, how can we get the clients’ names in each segment so that we can start targeting them in a campaign, e.g. promotions, discounts, etc.?

  • Mohammad Abdullah

    I have a question.
    What should we change in the code if we have at least
    100 items?
    Is it in the lines of code that create orders.segm?
