Shopping cart analysis with R – Multi-layer pie chart

This post was updated on 12/05/2015.

In this post, we will review a very interesting type of visualization, the Multi-layer Pie Chart, and use it for a common marketing analytics task: shopping cart analysis. We will go from the initial data processing to the final visualization. I will share R code that draws all layers of the chart in a loop, so you don’t have to write code for every layer. You can also find an example of how to create a Multi-layer Pie Chart here.

Ok, let’s suppose we have a list of first orders/carts that were bought by our clients. Each order consists of one or several products (or product categories). Our task is to visualize the relationships between products and to see the share of orders that include each product or combination of products. The Multi-layer Pie Chart lets us draw each product and its intersections with the others.

After loading the necessary libraries with the following code:

# loading libraries
library(dplyr)
library(reshape2)
library(plotrix)

we will simulate an example data set. Suppose we sell 4 products (or product categories), a, b, c and d, and each product is sold with a different probability. A client can also purchase any combination of products, e.g. “a” or “a,b,a,d” and so on. Let’s do this with the following code:

# creating an example of orders
set.seed(15)
df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
product=sample(c('NULL','a','b','c','d'), 5000, replace=TRUE,
prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
df <- df[df$product!='NULL', ]
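The prob values in the simulation are relative weights rather than probabilities (they sum to 1.35), and sample() rescales them. A short base-R check of the effective per-row probabilities, and of how close a large random draw comes to them, might look like this (the variable names are only illustrative):

```r
# relative weights used in the simulation above (they sum to 1.35, not 1)
w <- setNames(c(0.15, 0.65, 0.3, 0.15, 0.1), c("NULL", "a", "b", "c", "d"))

# sample() normalises prob weights, so the effective probabilities are w / sum(w)
p <- w / sum(w)
round(p, 3)
# NULL: 0.111, a: 0.481, b: 0.222, c: 0.111, d: 0.074

# a large draw should reproduce these shares closely
set.seed(15)
draw <- sample(names(w), 1e5, replace = TRUE, prob = w)
emp <- prop.table(table(draw))[names(w)]
round(as.numeric(emp) - p, 3)
```

So roughly one row in nine is a 'NULL' slot that gets dropped, and product a appears in almost half of the remaining rows.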

Next, we will process the data to create a data frame for analysis. Specifically, we will:

  • remove duplicates: if an order contains the same product more than once (e.g. “a,b,a,d”), we want to exclude the effect of quantity,
  • combine the products into a new feature ‘cart’ that includes all unique products in the order, and count them (‘prod.num’ column),
  • calculate the number of orders for each cart (‘num’ column).
# processing initial data
# wrap each product name in '#' so that one name can never match inside
# another (e.g. 'a' inside 'ab') when we grep() the carts later
df$product <- paste0("#", df$product, "#")

prod.matrix <- df %>%
 # removing duplicated products from each order
 distinct(orderId, product) %>%
 # sorting, so the same set of products always gives the same cart string
 arrange(orderId, product) %>%
 # combining products to cart and calculating number of products
 group_by(orderId) %>%
 summarise(cart=paste(product, collapse=";"),
 prod.num=n()) %>%
 # calculating number of carts
 group_by(cart, prod.num) %>%
 summarise(num=n()) %>%
 ungroup()

Let’s take a look at the resulting data frame with head(prod.matrix):

  cart            prod.num num
1  #a#              1     123
2  #a#;#b#          2     241
3  #a#;#b#;#c#      3     168
4  #a#;#b#;#c#;#d#  4      71
5  #a#;#b#;#d#      3     125
6  #a#;#c#          2     105
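The exact counts depend on the random draw and may differ on your machine (R 3.6.0 changed how sample() works). To see the processing steps on something small enough to check by hand, here is a base-R sketch (unique(), order() and tapply() instead of the dplyr pipeline above) applied to a hypothetical toy set of four orders; the first one repeats product a, as in the “a,b,a,d” example:

```r
# hypothetical toy orders; order 1 repeats product a, as in "a,b,a,d"
toy <- data.frame(orderId = c(1, 1, 1, 1, 2, 2, 3, 4, 4),
                  product = c("a", "b", "a", "d", "b", "a", "a", "a", "b"),
                  stringsAsFactors = FALSE)
toy$product <- paste0("#", toy$product, "#")

# 1) drop repeated products within an order, so quantity is ignored
toy <- unique(toy)
# 2) sort, so the same set of products always yields the same cart string
toy <- toy[order(toy$orderId, toy$product), ]
# 3) one cart string and one product count per order
cart <- tapply(toy$product, toy$orderId, paste, collapse = ";")
prod.num <- tapply(toy$product, toy$orderId, length)
# 4) number of orders per distinct cart
num <- table(cart)
num
```

Orders 2 (“b,a”) and 4 (“a,b”) end up with the same cart string “#a#;#b#”, which is exactly why the sorting step matters.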

From this point, we start working on our Multi-layer Pie Chart. My idea is to place orders that include only one product in the core of the chart. This is why we calculated the number of products in each combination (‘prod.num’ value); now we will split the data frame into two: the first one (one.prod) with carts that contain one product and the second one (sev.prod) with carts that contain more than one product.

# calculating total number of orders/carts
tot <- sum(prod.matrix$num)

# splitting orders into sets with 1 product and more than 1 product
one.prod <- prod.matrix %>% filter(prod.num == 1)

sev.prod <- prod.matrix %>%
 filter(prod.num > 1) %>%
 arrange(desc(prod.num))
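Because every cart has either one product or more than one, the two subsets partition all orders. That is what makes the plot line up: the white filler sector in the product rings (tot - sum(sev.prod$num)) equals the single-product orders, and the white filler in the core (tot - sum(one.prod$num)) equals the multi-product orders. A base-R check, using only the six rows printed by head(prod.matrix) above as a stand-in for the full table:

```r
# stand-in data: only the six rows shown by head(prod.matrix) above
pm <- data.frame(
  cart = c("#a#", "#a#;#b#", "#a#;#b#;#c#", "#a#;#b#;#c#;#d#",
           "#a#;#b#;#d#", "#a#;#c#"),
  prod.num = c(1, 2, 3, 4, 3, 2),
  num = c(123, 241, 168, 71, 125, 105),
  stringsAsFactors = FALSE)

tot <- sum(pm$num)
one <- pm[pm$prod.num == 1, ]
sev <- pm[pm$prod.num > 1, ]
sev <- sev[order(sev$prod.num, decreasing = TRUE), ]

# white filler in the product rings = single-product orders
ring.filler <- tot - sum(sev$num)
# white filler in the core = multi-product orders
core.filler <- tot - sum(one$num)
c(tot = tot, ring.filler = ring.filler, core.filler = core.filler)
```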

Now the data is ready for plotting. We will define the parameters for the chart with the following code:

# defining parameters for pie chart
iniR <- 0.2 # initial radius
cols <- c("#ffffff", "#fec44f", "#fc9272", "#a1d99b", "#fee0d2",
 "#2ca25f", "#8856a7", "#43a2ca", "#fdbb84", "#e34a33",
 "#a6bddb", "#dd1c77", "#ffeda0", "#756bb1")
# sorted unique product names; 'NO' is the key for white (empty) sectors
prod <- df %>%
 distinct(product) %>%
 arrange(product)
prod <- c('NO', prod$product)
colors <- as.list(setNames(cols[seq_along(prod)], prod))
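The colors list is a lookup table from keys to colors: each ring gets its fill colors by indexing it with one key per sector, 'NO' meaning white. A base-R sketch of that lookup for a single ring (product c), using the first five palette colors and three hypothetical multi-product carts, with a guard that stops early when there are more keys than colors:

```r
# first five colors of the palette and the matching keys
cols <- c("#ffffff", "#fec44f", "#fc9272", "#a1d99b", "#fee0d2")
prod <- c("NO", "#a#", "#b#", "#c#", "#d#")
# fail early if there are more keys than colors
stopifnot(length(prod) <= length(cols))
colors <- as.list(setNames(cols[seq_along(prod)], prod))

# hypothetical multi-product carts, one sector each
carts <- c("#a#;#b#;#c#;#d#", "#a#;#b#;#c#", "#a#;#b#")
# ring for "#c#": color the carts that contain it, white otherwise
p <- grep("#c#", carts, fixed = TRUE)
col <- rep("NO", times = length(carts))
col[p] <- "#c#"
# the extra "NO" is the filler sector for single-product orders
ring.cols <- as.character(colors[c(col, "NO")])
ring.cols
# "#a1d99b" "#a1d99b" "#ffffff" "#ffffff"
```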

Note: we’ve defined the color palette with fourteen colors, including white for the empty sectors. This means that if you have more than thirteen unique products in the data set, you need to add extra colors. Finally, we will plot the Multi-layer Pie Chart with the following code:

# 0 circle: blank
pie(1, radius=iniR, init.angle=90, col=c('white'), border = NA, labels='')

# drawing circles from last to 2nd:
# ring i colors the multi-product carts that contain prod[i];
# the last sector is the white filler for single-product orders
for (i in length(prod):2) {
 p <- grep(prod[i], sev.prod$cart, fixed=TRUE)
 col <- rep('NO', times=nrow(sev.prod))
 col[p] <- prod[i]
 floating.pie(0, 0, c(sev.prod$num, tot-sum(sev.prod$num)), radius=(1+i)*iniR,
  startpos=pi/2, col=as.character(colors[c(col, 'NO')]), border="#44aaff")
}

# 1 circle: orders with 1 product (the first, white sector covers multi-product orders)
floating.pie(0, 0, c(tot-sum(one.prod$num), one.prod$num), radius=2*iniR,
 startpos=pi/2, col=as.character(colors[c('NO', one.prod$cart)]), border="#44aaff")

# legend (product names without the '#' markers)
legend(1.5, 2*iniR, gsub("_", " ", gsub("#", "", names(colors)[-1])),
 col=as.character(colors[-1]), pch=19, bty='n', ncol=1)

[Figure: multi-layer pie chart of the shopping carts (pie-chart1)]
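Why does each product show up as a ring rather than a full disc? The loop draws product i as a full disc of radius (1+i)*iniR, from the outermost product inwards, and every later, smaller disc covers its middle. A base-R sketch of the band that stays visible for each product, assuming the four products and iniR = 0.2 from the example:

```r
iniR <- 0.2
prod <- c("NO", "#a#", "#b#", "#c#", "#d#")

# the loop goes from the last product down to the 2nd key
i <- length(prod):2
# product i stays visible between the next smaller disc and its own radius
bands <- data.frame(product = prod[i],
                    inner = i * iniR,
                    outer = (1 + i) * iniR,
                    stringsAsFactors = FALSE)
# the core with single-product orders is drawn last, with radius 2 * iniR
bands
```

With four products the outermost disc has radius 1.2, which is larger than the -1 to 1 window that pie() sets up, so depending on the device size part of the outer ring may be cut off.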

If you want to add some statistics to the plot, e.g. the number of orders for each combination or its share of all orders, just create a table and add it to the plot with the following code:


# creating a table with the stats
stat.tab <- prod.matrix %>%
 select(-prod.num) %>%
 mutate(share=num/tot) %>%
 arrange(desc(num))

library(scales)
stat.tab$share <- percent(stat.tab$share) # converting values to percents
# plain data frame (not a tibble) for addtable2plot()
stat.tab <- as.data.frame(stat.tab)

# adding a table with the stats
addtable2plot(-2.5, -1.5, stat.tab, bty="n", display.rownames=FALSE,
hlines=FALSE, vlines=FALSE, title="The stats")

[Figure: multi-layer pie chart with the stats table (pie-chart2)]
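The same table can be sketched in base R for a quick check. This version uses only the six rows printed by head(prod.matrix) earlier as a stand-in for the full table, so its shares are relative to those six rows only, and it formats percentages with sprintf() instead of scales::percent():

```r
# stand-in data: only the six rows shown by head(prod.matrix) above
pm <- data.frame(
  cart = c("#a#", "#a#;#b#", "#a#;#b#;#c#", "#a#;#b#;#c#;#d#",
           "#a#;#b#;#d#", "#a#;#c#"),
  num = c(123, 241, 168, 71, 125, 105),
  stringsAsFactors = FALSE)
tot <- sum(pm$num)

# most frequent carts first, with their share of all orders
stat.tab <- pm[order(pm$num, decreasing = TRUE), c("cart", "num")]
stat.tab$share <- stat.tab$num / tot
# sprintf() instead of scales::percent(), to stay dependency-free
stat.tab$share.lbl <- sprintf("%.1f%%", 100 * stat.tab$share)
rownames(stat.tab) <- NULL
stat.tab
```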

We’ve seen how the Multi-layer Pie Chart can help us draw each product and its intersections with the others.

  • Ryan Schork

    Great post! My only suggestions would be:

    1) to order the different combinations in terms of frequency so it is very easy to determine most frequent combination, second most frequent, etc.

    2) Provide a complementary table with the percentages so people can reference it if they want to know the actual percentages and relative difference between sections without estimating.

    Other than that I think it is a great way to display the information. Gives the world’s most maligned visualization (pie charts) some dignity 🙂

    • AnalyzeCore

      Thank you Ryan!
      The only reason I didn’t calculate extra stats is that I think the association rules algorithm is very good for this purpose. But I totally agree that it can be useful for some people, so I’ve added the extra stats to the plot 🙂

  • Mila

    Thank you for the post, very informative!

    There is a little slip of the pen in “Note: I will do this by using the simple nchart() function”. The function should clearly be nchar().

    • AnalyzeCore

      Thank you, Mila!

  • Pingback: Sequence of shopping carts analysis with R – Sankey diagram | Analyze Core

  • cristine

    Hi,

    Thanks for this great post! Where is the data? Can you help me? I have this problem:

    and I’d like to transform this table (table1) into table2.

    Can you please give me the script for creating the frequency matrix?

    Thanks in advance

    • AnalyzeCore

      Hi Cristine and thank you for reading! You can transform the table via several approaches. I prefer the following:
      table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

      Let me know if this doesn't work for you.
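For comparison, base R can build the same kind of customer-by-item frequency matrix without reshape2, using xtabs(). A minimal sketch with a small hypothetical table1 (the column names follow this thread; the data are made up):

```r
# hypothetical long table: one row per (customer, item) purchase
table1 <- data.frame(id_customer = c("c1", "c1", "c2", "c2", "c2"),
                     id_MUSIC    = c("m1", "m2", "m2", "m2", "m3"),
                     stringsAsFactors = FALSE)

# customer-by-item count matrix, similar to dcast(..., fun.aggregate = length)
table2 <- xtabs(~ id_customer + id_MUSIC, data = table1)
table2

# as a plain data frame with customers as row names
wide <- as.data.frame.matrix(table2)
```

Unlike dcast(), which returns id_customer as a regular column, this keeps the customer ids as row names.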

    • cristine

      Hi, thank you, but I get this error 🙁 Error: could not find function “dcast”

      This is my code:
      table1 <- read.table("C:/Users/Cristine/Desktop/data.csv", sep=";", quote="\"")
      head(table1)
      table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

      I installed the package with install.packages("reshape"), but the error still persists.

      Waiting for your reply, thank you very much!

    • AnalyzeCore

      You need to install and load the reshape2 package before the rest of your code:

      install.packages("reshape2")
      library(reshape2)

      table1 <- read.table("C:/Users/Cristine/Desktop/data.csv", sep=";", quote="\"")
      head(table1)
      table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

    • cristine

      I’m sorry, I am a new R user. I used the same code:

      install.packages("reshape2")
      library(reshape2)

      table1 <- read.table("C:/Users/Cristine/Desktop/data.csv", sep=";", quote="\"")
      head(table1)
      table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

      but I get another error:

      Using V2 as value column: use value.var to override.

      Error in eval(expr, envir, enclos) : object 'ID_customer' not found

      Thanks in advance

    • AnalyzeCore

      You’ve got table1 with V1 and V2 columns.
      1) You need to add the header=TRUE parameter to the read.table() function,
      2) You should use the correct column names (ID_customer, id_MUSIC).

      Try the following code:

      library(reshape2)
      table1 <- read.table("C:/Users/Cristine/Desktop/data.csv", sep=";", quote="\"", header=TRUE)
      head(table1)
      table2 <- dcast(table1, ID_customer ~ id_MUSIC, fun.aggregate=length)

    • cristine

      It works, but I get this message on the screen:

      > table2 <- dcast(table1, ID_customer ~ id_MUSIC, fun.aggregate=length)
      Using id_MUSIC as value column: use value.var to override.

      I think it is not a problem.

      Thanks for the clear explanation, thank you very much! I am a student and I need this code for my master’s project, so really, thank you very much for your help. I’ll read your other posts too; this is a very nice post.

      Best regards

    • AnalyzeCore

      It’s OK. You haven’t set the value.var parameter, so dcast() picked id_MUSIC as the value column, which is fine in your case because fun.aggregate=length only counts the rows. Please read the help (?dcast) for a better understanding of the dcast function.
      And you are welcome!

    • cristine

      Thank you for the good explanation

    • cristine

      Hi, how are you?

      Could you please help me? As I mentioned, I’m a new R user and I need R for my master’s project. When I try to use dcast on a data set (8000 rows), I get this error. I work on Windows 7 and have 16 GB of RAM.

      Attached please find my error.

      Waiting for your response. Thank you very much, I am very grateful.

    • AnalyzeCore

      Hi Cristine, please try updating R, RStudio and the libraries you installed to their latest versions. If this doesn’t work for you, it would be helpful if you could share your data set and R code with me. Please email me at bryl.serg(at)gmail.com

    • cristine

      Hello, thank you for the quick reply. I have tried the latest version of R, but it still doesn’t work 🙁
      I will send you my data set and my R code.
      Should I install Ubuntu, or how can I allocate more memory to R on Windows?

      Waiting for your reply, thanks

    • cristine

      Have you received my email?
      thanks

    • AnalyzeCore

      Try to include the header=TRUE parameter in the read.table() function. Please see my earlier reply about it.

  • max

    Thank you for the post, very informative!

    Can you help me with how to use the mutate function on a real data set? I have tried this code, but it does not run:

    prod.matrix = prod.matrix %>% mutate(cart = paste(df$product, sep=''))

    Thank you, it was really helpful.

    • AnalyzeCore

      Thank you for the feedback!
      You need to use column names in the paste() function. Try replacing df$product with the column names of prod.matrix (which are the product names, actually), e.g. a, b, c and d in my example.

    • max

      First I tried to run your example to get a clear idea, but:

      I have tried this code on your example and it does not work:

      prod.matrix <- dcast(df, id_order ~ product, fun.aggregate = NULL)

      prod.matrix <- prod.matrix %>% mutate(cart = paste((df[,2]), sep=';'))

      prod.matrix$cart <- gsub("NA", "", prod.matrix$cart)

      I have also tried this:

      prod.matrix <- prod.matrix %>% mutate(cart = paste(colnames(df[2]), sep=';'))

      It also does not work. I think maybe some code is missing from your example.

      I will wait for your feedback

    • AnalyzeCore

      That’s not surprising. The issues are here:
      paste(colnames(df[2]), sep=';') and paste((df[,2]), sep=';')

      With paste(colnames(df[2]), sep=';') you are filling the new column ‘cart’ with the name of the second column of the df data frame (you’ve also omitted the comma), in other words with the word ‘product’.

      Here is my logic: prod.matrix <- prod.matrix %>% mutate(cart = paste(a, b, c, d, sep=''))
      In words: take the prod.matrix data frame, create (mutate) a new column ‘cart’ and fill it with the values from columns a, b, c and d.

      If you are interested in writing some kind of universal code, please follow this example:
      http://stackoverflow.com/questions/14568662/paste-multiple-columns-together-in-r

    • max

      Thank you, but I have two questions:

      1) What if we have more than 4 products? How do I do it?

      2) If I replace a, b, c, d with 4, 10, 12, 20, this code prod.matrix <- prod.matrix %>% mutate(cart = paste(4, 10, 12, 20, sep='')) gives:

      cart 4,10,12,20

      cart 4,10,12,20

      for all the data, and not:

      cart 4,10

      cart 4,12

      cart 4

      ………………
      So is there a difference between

      set.seed(15)
      df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
      product=sample(c('NULL','a','b','c','d'), 5000, replace=TRUE,
      prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
      df <- df[df$product!='NULL', ]

      and

      set.seed(15)
      df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
      product=sample(c('NULL','4','10','12','20'), 5000, replace=TRUE,
      prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
      df <- df[df$product!='NULL', ]

      If the variable name is the problem, how do I solve it?

      Thank you

    • AnalyzeCore

      1) Add the names of these products to the code manually, or use the approach from the link I shared. It would be:
      # extract columns to paste together
      cols <- colnames(prod.matrix)[-1]
      # create a new column `cart` with columns collapsed together
      prod.matrix$cart <- apply(prod.matrix[ , cols], 1, paste, collapse='')

      2) The issue is that you changed the character values to numeric ones and got 0 (zeros) instead of NAs in prod.matrix. You can use one of these fixes:
      a) df$product <- as.character(df$product) # convert numeric to character

      b) add the word 'id' before the number with the paste() function
      c) substitute zeros with NAs
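A variant of the apply() approach above avoids the gsub("NA", ...) clean-up used in this thread (which would also strip "NA" from a product name such as "NAPKIN") by pasting only the names of the non-missing product columns. A sketch assuming missing products are NA, on a small hypothetical table with numeric-looking product names like the ones in the question:

```r
# hypothetical wide table, as dcast() might return it: one column per product
# (numeric-looking names such as 4 and 10), NA where the order lacks it
prod.matrix <- data.frame(orderId = 1:3,
                          `4` = c("4", NA, "4"),
                          `10` = c("10", "10", NA),
                          check.names = FALSE, stringsAsFactors = FALSE)

# every column except orderId is a product
cols <- colnames(prod.matrix)[-1]
# row by row, keep the names of the non-missing product columns
prod.matrix$cart <- apply(prod.matrix[, cols, drop = FALSE], 1,
                          function(r) paste(cols[!is.na(r)], collapse = ";"))
prod.matrix$cart
# "4;10", "10", "4"
```

Because it uses the column names rather than the cell values, it works the same whether there are 4 or 40 products, and drop = FALSE keeps it working when there is only one product column.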

    • max

      Thanks, I will let you know.

    • max

      Nice work! It is very helpful, a nice visualization using pie charts.

    • max

      hello,

      This My code:

      df <- read.table("dataset.csv", sep=";", quote="\"", header=TRUE)

      df$product <- as.character(df$product)

      df=mutate(product=paste("id",sep="",df$product),df)

      prod.matrix <- dcast(df, orderId ~ product, fun.aggregate = NULL)

      cols <- colnames(prod.matrix)[-1]

      prod.matrix$cart <- apply(prod.matrix[ , cols], 1, paste, collapse='')

      prod.matrix$cart <- gsub("NA",'', prod.matrix$cart)

      head(prod.matrix)

      How can I send you the output for my data to show you the error?

      thanks a lot

    • AnalyzeCore

      bryl.serg(at)gmail.com

    • max

      Thanks, I have sent you an email.

      My email: maxwellalex222@yahoo.com


  • Andrew

    Hi Sergey,

    Loving all your posts, please keep it up!!

    Just having some trouble getting this plot to come up correctly.

    Works fine up to the point of adding the legend (even though the legend shows #a#, etc.).

    But after this point, adding the percentages table makes the plot go completely blank.

    Any thoughts?

    Cheers,
    Andrew

  • Duy Thọ Nguyễn

    It seems to me that the chart does not work properly in an R Markdown document (it cannot be fully displayed). I tried to find another visualization method and found that https://github.com/timelyportfolio/sunburstR/tree/master/inst/examples
    could be a candidate. Unfortunately, I could not recreate your visualization with it. Would you please take a look at it?
    Kind regards