# Shopping cart analysis with R – Multi-layer pie chart

This post was updated on 12/05/2015.

In this post, we will review a very interesting type of visualization, the Multi-layer Pie Chart, and use it for one of the marketing analytics tasks: shopping cart analysis. We will go from the initial data processing to the visualization itself. I will share R code that draws the layers in a loop, so you don’t have to write code for every layer of the chart by hand. You can also find an example about how to create a Multi-layer Pie Chart here.

Ok, let’s suppose we have a list of first orders/carts that were bought by our clients. Each order consists of one or several products (or product categories). Our task is to visualize the relationships between products and see the share of orders that include each product or combination of products. The Multi-layer Pie Chart can help us draw each product and its intersections with the others.

After loading the necessary libraries with the following code:

```
# loading libraries
library(dplyr)
library(reshape2)
library(plotrix)
```

we will simulate an example data set. Suppose we sell 4 products (or product categories), a, b, c and d, and each product is sold with a different probability. Also, a client can purchase any combination of products, e.g. “a” or “a,b,a,d” and so on. Let’s do this with the following code:

```
# creating an example of orders
set.seed(15)
df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
                 product=sample(c('NULL','a','b','c','d'), 5000, replace=TRUE,
                                prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
df <- df[df$product!='NULL', ]
```
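
Before processing, it can help to check that the simulated orders really contain repeated products. The following is a minimal base R sketch, not part of the original workflow: it rebuilds the same data under the hypothetical name `df.check` (so `df` is left untouched) and assumes `stringsAsFactors = FALSE`.

```r
# re-creating the simulated orders under a separate, hypothetical name
set.seed(15)
df.check <- data.frame(orderId = sample(c(1:1000), 5000, replace = TRUE),
                       product = sample(c('NULL', 'a', 'b', 'c', 'd'), 5000, replace = TRUE,
                                        prob = c(0.15, 0.65, 0.3, 0.15, 0.1)),
                       stringsAsFactors = FALSE)
df.check <- df.check[df.check$product != 'NULL', ]

# observed share of each product among the remaining rows
prod.share <- prop.table(table(df.check$product))
print(round(prod.share, 2))

# number of rows that repeat an (orderId, product) pair already seen
dup.rows <- sum(duplicated(df.check[, c("orderId", "product")]))
print(dup.rows)
```

A non-zero `dup.rows` means that some orders list the same product more than once, which is exactly what the next step removes.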

After this, we will process the data to create a data frame for the analysis. Specifically, we will:

• remove the duplicates. For example, if the order contains the same product more than once (“a,b,a,d”), we want to exclude the effect of quantity,
• combine the products into a new feature, ‘cart’, that includes all unique products in the cart,
• calculate the number of carts (the ‘num’ column).

```
# processing initial data
# wrapping each product name in '#' so that names can be matched unambiguously later
df$product <- paste0("#", df$product, "#")

prod.matrix <- df %>%
  # removing duplicated products from each order
  distinct(orderId, product) %>%
  # sorting so that products appear in the same order in every cart
  arrange(orderId, product) %>%
  # combining products to cart and calculating number of products
  group_by(orderId) %>%
  summarise(cart=paste(product, collapse=";"),
            prod.num=n()) %>%
  # calculating number of carts
  group_by(cart, prod.num) %>%
  summarise(num=n()) %>%
  ungroup()
```
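
To make the three steps concrete, here is a small base R sketch on a hypothetical toy data set (the names `toy`, `carts` and `toy.matrix` are made up for illustration). It is not the dplyr pipeline above, only the same idea spelled out step by step.

```r
# toy orders: order 1 contains product "a" twice
toy <- data.frame(orderId = c(1, 1, 1, 1, 2, 3, 3, 4),
                  product = c("a", "b", "a", "d", "a", "b", "a", "a"),
                  stringsAsFactors = FALSE)

# wrapping names in '#', as in the post
toy$product <- paste0("#", toy$product, "#")

# step 1: drop repeated (orderId, product) pairs and sort products within orders
toy <- unique(toy)
toy <- toy[order(toy$orderId, toy$product), ]

# step 2: one cart string per order
cart.by.order <- sapply(split(toy$product, toy$orderId), paste, collapse = ";")
carts <- data.frame(orderId = names(cart.by.order),
                    cart = unname(cart.by.order),
                    stringsAsFactors = FALSE)

# step 3: number of orders per cart, plus the number of products in each cart
toy.matrix <- as.data.frame(table(cart = carts$cart),
                            responseName = "num", stringsAsFactors = FALSE)
toy.matrix$prod.num <- lengths(strsplit(toy.matrix$cart, ";", fixed = TRUE))
print(toy.matrix)
```

Orders 2 and 4 both collapse to the cart `#a#`, so that cart gets `num` equal to 2, while order 1 becomes `#a#;#b#;#d#` despite listing "a" twice.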

Let’s take a look at the resulting data frame with the head(prod.matrix) function:

```
  cart            prod.num num
1  #a#              1     123
2  #a#;#b#          2     241
3  #a#;#b#;#c#      3     168
4  #a#;#b#;#c#;#d#  4      71
5  #a#;#b#;#d#      3     125
6  #a#;#c#          2     105
```

From this point, we start working on our Multi-layer Pie Chart. My idea is to place the orders that include one product in the core of the chart. That is why we’ve calculated the total number of products in each combination (the ‘prod.num’ value). We will split the data frame into two: the first one (one.prod) will include carts with one product, and the second one (sev.prod) carts with more than one product.

```
# calculating total number of orders/carts
tot <- sum(prod.matrix$num)

# splitting orders into sets with 1 product and more than 1 product
one.prod <- prod.matrix %>% filter(prod.num == 1)

sev.prod <- prod.matrix %>%
  filter(prod.num > 1) %>%
  arrange(desc(prod.num))
```
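
The two parts partition the total, and the plotting code further down relies on this: each outer ring gets one slice per multi-product cart plus one remainder slice of size `tot - sum(sev.prod$num)`. A minimal base R sketch of that arithmetic, using only the six rows printed by head() above (so the totals are illustrative, not those of the full data set; names ending in `.ex` are hypothetical):

```r
# the six rows shown by head(prod.matrix) in the post
ex <- data.frame(cart = c("#a#", "#a#;#b#", "#a#;#b#;#c#",
                          "#a#;#b#;#c#;#d#", "#a#;#b#;#d#", "#a#;#c#"),
                 prod.num = c(1, 2, 3, 4, 3, 2),
                 num = c(123, 241, 168, 71, 125, 105),
                 stringsAsFactors = FALSE)

tot.ex <- sum(ex$num)

# same split as above, written in base R
one.ex <- ex[ex$prod.num == 1, ]
sev.ex <- ex[ex$prod.num > 1, ]
sev.ex <- sev.ex[order(-sev.ex$prod.num), ]

# slice sizes of an outer ring: multi-product carts plus the remainder
outer.slices <- c(sev.ex$num, tot.ex - sum(sev.ex$num))
print(outer.slices)
```

The last slice equals the number of one-product carts, which is why the inner circle and the outer rings line up angularly.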

Now the data is ready for plotting. We will define the parameters for the chart with the following code:

```
# defining parameters for pie chart
iniR <- 0.2 # initial radius
cols <- c("#ffffff", "#fec44f", "#fc9272", "#a1d99b", "#fee0d2",
          "#2ca25f", "#8856a7", "#43a2ca", "#fdbb84", "#e34a33",
          "#a6bddb", "#dd1c77", "#ffeda0", "#756bb1")
prod <- df %>%
  select(product) %>%
  arrange(product) %>%
  unique()
prod <- c('NO', c(prod$product))
colors <- as.list(setNames(cols[c(1:(length(prod)))], prod))
```
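
If the hand-picked palette turns out to be too short for your data, the colors can be generated instead. This is a minimal sketch, assuming that `rainbow()` colors from base R are acceptable for your chart; the product labels and the names ending in `.ex` are hypothetical.

```r
# hypothetical list of labels: the 'NO' spacer plus 20 products
prod.ex <- c("NO", paste0("#p", 1:20, "#"))

# white for the 'NO' spacer, then one generated color per product
cols.ex <- c("#ffffff", grDevices::rainbow(length(prod.ex) - 1))

# same named-list structure as `colors` above
colors.ex <- as.list(setNames(cols.ex, prod.ex))

# looking up colors by label, as the plotting code does
print(as.character(colors.ex[c("#p1#", "NO")]))
```

Keeping white as the first entry matters, because the 'NO' label is used for every slice that should stay empty.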

Note: we’ve defined a color palette of fourteen colors, including white for the empty spaces. This means that if you have more than thirteen unique products in the data set, you need to add extra colors. Finally, we will plot the Multi-layer Pie Chart with the following code:

```
# 0 circle: blank
pie(1, radius=iniR, init.angle=90, col=c('white'), border = NA, labels='')

# drawing circles from last to 2nd
for (i in length(prod):2) {
  p <- grep(prod[i], sev.prod$cart)
  col <- rep('NO', times=nrow(sev.prod))
  col[p] <- prod[i]
  floating.pie(0, 0, c(sev.prod$num, tot-sum(sev.prod$num)), radius=(1+i)*iniR,
               startpos=pi/2, col=as.character(colors[c(col, 'NO')]), border="#44aaff")
}

# 1 circle: orders with 1 product
floating.pie(0, 0, c(tot-sum(one.prod$num), one.prod$num), radius=2*iniR,
             startpos=pi/2, col=as.character(colors[c('NO', one.prod$cart)]), border="#44aaff")

# legend
legend(1.5, 2*iniR, gsub("_"," ",names(colors)[-1]), col=as.character(colors[-1]), pch=19, bty='n', ncol=1)
```
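
The loop draws one full pie per product, starting with the largest radius, so each smaller pie covers the center of the previous one and only a ring stays visible. Within a ring, a slice is colored only if its cart contains that product. The sketch below reproduces just this labeling logic without plotting, on hypothetical carts (names ending in `.ex` are made up, and `fixed = TRUE` is my addition to match names literally):

```r
# hypothetical multi-product carts and product list
carts.ex <- c("#a#;#b#;#c#", "#a#;#b#", "#b#;#c#")
prod.ex <- c("NO", "#a#", "#b#", "#c#")
iniR.ex <- 0.2

# for each product: its label where the cart contains it, 'NO' elsewhere,
# plus a trailing 'NO' for the remainder slice (one-product carts)
ring.labels <- lapply(prod.ex[-1], function(p) {
  lab <- rep("NO", length(carts.ex))
  lab[grep(p, carts.ex, fixed = TRUE)] <- p
  c(lab, "NO")
})
names(ring.labels) <- prod.ex[-1]
print(ring.labels)

# radius of the pie drawn for the product at position i in prod.ex
ring.radius <- (1 + seq_along(prod.ex)[-1]) * iniR.ex
print(ring.radius)
```

Reading across the rings at a fixed angle then shows which products a given cart combines.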

In case you want to add some statistics to the plot, e.g. the total number of each combination or its share of the total, you just need to create a table and add it to the plot with the following code:

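```
# creating a table with the stats
stat.tab <- prod.matrix %>%
  select(-prod.num) %>%
  mutate(share=num/tot) %>%
  arrange(desc(num))

library(scales)
stat.tab$share <- percent(stat.tab$share) # converting values to percents

# adding a table with the stats
# NOTE: the first line of this call is missing from the post as published here;
# addtable2plot() from plotrix and the (-2.5, -1.5) position are an assumed
# reconstruction, so adjust the coordinates to your plot
addtable2plot(-2.5, -1.5, stat.tab, bty="n", display.rownames=FALSE,
              hlines=FALSE, vlines=FALSE, title="The stats")
```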
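
If you prefer not to load scales just for the percent formatting, the same table can be sketched in base R. This is an alternative with `sprintf()`, not the code used for the chart above, and the carts and counts are hypothetical.

```r
# hypothetical cart counts
stat.ex <- data.frame(cart = c("#a#;#b#", "#a#", "#b#"),
                      num = c(30, 50, 20),
                      stringsAsFactors = FALSE)

# share of each cart in the total, most frequent cart first
stat.ex$share <- stat.ex$num / sum(stat.ex$num)
stat.ex <- stat.ex[order(-stat.ex$num), ]

# formatting shares as percent strings with one decimal place
stat.ex$share <- sprintf("%.1f%%", 100 * stat.ex$share)
print(stat.ex, row.names = FALSE)
```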

To sum up, we’ve seen how the Multi-layer Pie Chart can help us draw each product and its intersections with the others.

• Ryan Schork

Great post! My only suggestions would be:

1) to order the different combinations in terms of frequency so it is very easy to determine most frequent combination, second most frequent, etc.

2) Provide a complementary table with the percentages so people can reference it if they want to know the actual percentages and relative difference between sections without estimating.

Other than that I think it is a great way to display the information. Gives the world’s most maligned visualization (pie charts) some dignity 🙂

• AnalyzeCore

Thank you Ryan!
The only reason I didn’t calculate extra stats is that I think the association rules algorithm is very good for this purpose. But I totally agree that it can be useful for some people, so I’ve added the extra stats to the plot 🙂

• Mila

Thank you for the post, very informative!

There is a little slip of the pen in “Note: I will do this by using the simple nchart() function”. The function clearly should be nchar().

• AnalyzeCore

Thank you, Mila!

• cristine

Hi,

Thanks for this post, great post! Where is the data? Can you help me? I have this problem:

I would like to transform this table, (table1), into table (2).

Can you please give the script for creating the frequency matrix?

• AnalyzeCore

Hi Cristine and thank you for reading! You can transform the table via several approaches. I prefer the following:
table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

Let me know if this doesn't work for you.
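
For readers following this thread without reshape2 installed, the same kind of customer-by-item frequency matrix can be sketched in base R with `table()`. This is a swapped-in alternative to `dcast()`, shown on hypothetical data; only the column names `id_customer` and `id_MUSIC` come from the comment above.

```r
# hypothetical long table: one row per (customer, item) event
table1 <- data.frame(id_customer = c(1, 1, 2, 3, 3, 3),
                     id_MUSIC = c("m1", "m2", "m1", "m2", "m2", "m3"),
                     stringsAsFactors = FALSE)

# counting how often each customer appears with each item
freq <- table(table1$id_customer, table1$id_MUSIC)

# converting the contingency table to a wide data frame
# (customers in rows, items in columns)
table2 <- as.data.frame.matrix(freq)
print(table2)
```

Unlike the `dcast()` result, the customer ids end up as row names here rather than in a column of their own.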

• cristine

Hi, thank you, but I have this error 🙁 Error: could not find function “dcast”

This is my code:
table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

I installed the package "reshape", but the error still persists

• AnalyzeCore

You need to install and load the reshape2 package before running the code. Your code:

install.packages("reshape2")
library(reshape2)

table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

• cristine

I’m sorry, I am a new user of R. I used the same code:

install.packages("reshape2")
library(reshape2)

table2 <- dcast(table1, id_customer ~ id_MUSIC, fun.aggregate=length)

but I got another error:

Using V2 as value column: use value.var to override.

• AnalyzeCore

1) You’ve got table1 with V1 and V2 columns.
2) You should use the correct column names (ID_customer, id_MUSIC)

Try the following code:

library(reshape2)
table2 <- dcast(table1, ID_customer ~ id_MUSIC, fun.aggregate=length)

• cristine

It works, but I get this line on the screen: > table2 <- dcast(table1, ID_customer ~ id_MUSIC, fun.aggregate=length)

Using id_MUSIC as value column: use value.var to override.

I think it is not a problem.

Thanks for the clear explanation, thank you very much. I am a student and I need this code for my master’s project. Really, thank you very much for your help. I'll read your other posts; this is a very nice post.

Best regards

• AnalyzeCore

It’s ok, this is just a message, not an error. You haven’t set the value.var parameter, so dcast guessed the value column (id_MUSIC in your case), which is fine here. Please read the help with the ?dcast command for a better understanding of the dcast function.
And you are welcome!

• cristine

Thank you for the good explanation

• cristine

Hi, how are you?

Can you please help me? As I have mentioned, I’m a new R user and I need R for my master’s project. When I try to use dcast on a data set (8000 rows), I get this error. I work on Windows 7 and my RAM is 16 GB.

Waiting for your response. Thank you very much, I am very grateful.

• AnalyzeCore

Hi Cristine, please try to update R, RStudio and libraries you installed to the latest version. If this doesn’t work for you,
it would be helpful if you can share your data set and R code with me. Please, email me to bryl.serg(at)gmail.com

• cristine

Hello, thank you for the quick reply. I have tried with the latest version of R, but it doesn’t work 🙁
I will send you my data set and my R code.
Should I install Ubuntu, or how can I allocate more memory to R in Windows?

• cristine

thanks

• max

Thank you for the post, very informative!

Can you help me with how to write the mutate function for a real data set? I have tried this code, but it does not run:

prod.matrix = prod.matrix %>% mutate(cart = paste(df$product, sep=''))

Thank you It was really helpful.

• AnalyzeCore

Thank you for feedback!
You need to use column names in the paste() function. Try to substitute df$product with column names (that are product names, actually) of prod.matrix (e.g. a, b, c and d in my example).

• max

First, I tried to run your example in order to get a clear idea, but this code does not work:

prod.matrix <- dcast(df, id_order ~ product, fun.aggregate = NULL)

prod.matrix <- prod.matrix %>% mutate(cart = paste((df[,2]), sep=';'))

prod.matrix$cart <- gsub("NA", "", prod.matrix$cart)

I have also tried this:

prod.matrix <- prod.matrix %>% mutate(cart = paste(colnames(df[2]), sep=';'))

It also does not work. I think there may be some code missing from your post.

I will wait for your feedback

• AnalyzeCore

It is not surprising. The issues are here:
paste(colnames(df[2]), sep=';') and paste((df[,2]), sep=';')

In the case of paste(colnames(df[2]), sep=';'), you are trying to add a new column 'cart' filled with the name of the second column of the df data frame (you’ve also omitted the comma), in other words with the word 'product'.

Here is my logic: prod.matrix <- prod.matrix %>% mutate(cart = paste(a, b, c, d, sep=''))
In words: I want to work with the prod.matrix data frame, and I want to create (mutate) a new column 'cart' and fill it with values from columns a, b, c and d.

If you are interested in writing some kind of universal code, please follow this example:
http://stackoverflow.com/questions/14568662/paste-multiple-columns-together-in-r

• max

Thank you, but I have two questions:

1) What if we have more than 4 products? How do we do it?

2) Also, if I change a, b, c, d to 4, 10, 12, 20, then prod.matrix <- prod.matrix %>% mutate(cart = paste(4, 10, 12, 20, sep='')) gives:

cart 4,10,12,20

cart 4,10,12,20

for all rows, and not:

cart 4,10

cart 4,12,
cart 4

………………
So is there a difference between

set.seed(15)
df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
product=sample(c('NULL','a','b','c','d'), 5000, replace=TRUE,
prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
df <- df[df$product!='NULL', ]

and

set.seed(15)
df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
product=sample(c('NULL','4','10','12','20'), 5000, replace=TRUE,
prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
df <- df[df$product!='NULL', ]

If the variable names are the problem, how can I solve it?

Thank you

• AnalyzeCore

1) Add the names of these products manually to the code, or use the approach from the link I shared. It would be:
# extract columns to paste together
cols <- colnames(prod.matrix)[-1]
# create a new column `cart` with columns collapsed together
prod.matrix$cart <- apply(prod.matrix[ , cols], 1, paste, collapse='')

2) The issue is that you changed the character values to numeric ones and got 0 (zeros) instead of NAs in prod.matrix. You can solve it in one of these ways:
a) df$product <- as.character(df$product) # convert numeric to character

b) add the word 'id' before the number with the paste() function
c) substitute zeros with NAs
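
As an illustration of point 1), here is a minimal base R sketch on a hypothetical wide table (one column per product, NA where the order lacks it; the name `wide` and its layout are assumptions, not data from the post). It drops the NAs before pasting instead of removing the text "NA" afterwards with gsub(), which would also damage any product name that happens to contain "NA".

```r
# hypothetical wide table: one row per order, one column per product
wide <- data.frame(orderId = 1:3,
                   a = c("a", NA, "a"),
                   b = c("b", "b", NA),
                   c = c(NA, "c", NA),
                   stringsAsFactors = FALSE)

# every column except the order id is a product column
cols <- colnames(wide)[-1]

# pasting the non-missing products of each row together
wide$cart <- apply(wide[, cols], 1, function(r) paste(r[!is.na(r)], collapse = ""))
print(wide)
```

Because `cols` is taken from the column names, the same code works for any number of products.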

• max

thanks , i’will let you know

• max

nice work! it is very helpful, nice visualization using pie charts

• max

hello,

This My code:

df$product <- as.character(df$product)

df=mutate(product=paste("id",sep="",df$product),df)

prod.matrix <- dcast(df, orderId ~ product, fun.aggregate = NULL)

cols <- colnames(prod.matrix)[-1]

prod.matrix$cart <- apply(prod.matrix[ , cols], 1, paste, collapse='')

prod.matrix$cart <- gsub("NA",'', prod.matrix$cart)

how can i send to you the output of my data in order to show you the error ?

thanks a lot

• AnalyzeCore

bryl.serg(at)gmail.com

• max

Thanks, I have sent an email to you

My email:maxwellalex222@yahoo.com

• Andrew

Hi Sergey,

Just having some trouble getting this plot to come up correctly.

Works fine up to the point of adding the legend (even though the legend shows #a#, etc.).

But after this point, adding the percentages table makes the plot go completely blank.

Any thoughts?

Cheers,
Andrew

• Duy Thọ Nguyễn

It seems to me that the chart does not work properly in an R Markdown document (it could not be fully displayed). I tried to find another visualization method and found that https://github.com/timelyportfolio/sunburstR/tree/master/inst/examples
could be a candidate. Unfortunately, I could not reproduce your data visualization with it. Would you please take a look at it?
Kind regards