AnalyzeCore by Sergey Bryl' — data is beautiful, data is a story

Shopping cart analysis with R – Multi-layer pie chart

Sergey Bryl'

Analytics & Data Science Team Lead at MacPaw Inc.

Oct 17, 2014

*This post was updated on 12/05/2015.*

In this post, we will review an interesting type of visualization, the **Multi-layer Pie Chart**, and use it for a common marketing analytics task: shopping cart analysis.

We will go from the initial data processing to the visualization of the shopping cart analysis. I will share R code that spares you from writing separate code for every layer of the chart. You can also find an example of how to create a Multi-layer Pie Chart here.

Ok, let’s suppose we have a list of first orders/carts that were bought by our clients. Each order consists of one or several products (or product categories). Our task is to visualize the relationships between products and see the share of orders that include each product or combination of products. The Multi-layer Pie Chart can help us to draw each product and its intersections with others.

After loading the necessary libraries with the following code:

    # loading libraries
    library(dplyr)
    library(reshape2)
    library(plotrix)

we will simulate an example data set. Suppose we sell 4 products (or product categories), **a**, **b**, **c** and **d**, and each product is sold with a different probability. A client can also purchase any combination of products, e.g. “a” or “a,b,a,d” and so on. Let’s do this with the following code:

    # creating an example of orders
    set.seed(15)
    df <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
                     product=sample(c('NULL','a','b','c','d'), 5000, replace=TRUE,
                                    prob=c(0.15, 0.65, 0.3, 0.15, 0.1)))
    df <- df[df$product!='NULL', ]
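Note that the weights passed to `prob` above add up to 1.35 rather than 1. This is fine because `sample()` treats them as relative weights. A small sketch of the effective probabilities (the names `w`, `p` and `keep.share` are only for illustration):

```r
# weights used in the simulation; they sum to 1.35, not 1
w <- c('NULL' = 0.15, a = 0.65, b = 0.3, c = 0.15, d = 0.1)

# sample() uses the weights relative to their total
p <- w / sum(w)
print(round(p, 3))

# expected share of simulated rows that survive the 'NULL' filter
keep.share <- 1 - p[['NULL']]
print(round(keep.share, 3))
```

So product “a” is drawn a bit less than half of the time, and roughly one row in nine is dropped as 'NULL'.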

After this, we will process the data to create a data frame for analysis. Specifically, we will:

- remove the duplicates. For example, if the order contains the same product more than once (“**a**,b,**a**,d”), we want to exclude the effect of quantity,
- combine products into the new feature ‘cart’ that will include all unique products in the cart,
- calculate the number of carts (‘num’ column).

    # processing initial data
    # we need to be sure that product names are unique
    # (wrapped in '#', no name can be a substring of another one)
    df$product <- paste0("#", df$product, "#")

    prod.matrix <- df %>%
      # removing duplicated products from each order
      group_by(orderId, product) %>%
      arrange(product) %>%
      unique() %>%
      # combining products to cart and calculating number of products
      group_by(orderId) %>%
      summarise(cart=paste(product,collapse=";"), prod.num=n()) %>%
      # calculating number of carts
      group_by(cart, prod.num) %>%
      summarise(num=n()) %>%
      ungroup()
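A note on the '#' wrapping in the code above: carts are later searched by product name, so a name that is part of another name would produce false matches. Here is a minimal sketch with hypothetical product names (“tv” and “tv stand” are not in our data set). I use `fixed = TRUE` here to match literal text, which is my choice for the sketch.

```r
# hypothetical carts where one product name is a substring of another
carts.raw <- c("tv stand;cable", "tv;cable")
raw.hits <- grepl("tv", carts.raw, fixed = TRUE)
print(raw.hits)      # the first cart matches although it has no "tv" product

# the same carts with every product wrapped in '#'
carts.wrapped <- c("#tv stand#;#cable#", "#tv#;#cable#")
wrapped.hits <- grepl("#tv#", carts.wrapped, fixed = TRUE)
print(wrapped.hits)  # only the cart that really contains "tv" matches
```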

Let’s take a look at the resulting data frame with the head(prod.matrix) function:

                  cart prod.num num
    1              #a#        1 123
    2          #a#;#b#        2 241
    3      #a#;#b#;#c#        3 168
    4  #a#;#b#;#c#;#d#        4  71
    5      #a#;#b#;#d#        3 125
    6          #a#;#c#        2 105
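To see what the three processing steps do, here is a minimal sketch on a hypothetical toy set of two orders (not the simulated data). It is written in base R instead of dplyr so that it runs on its own, and it is an equivalent of the pipeline rather than the same code.

```r
# hypothetical toy orders: order 1 contains product "a" twice
toy <- data.frame(orderId = c(1, 1, 1, 1, 2),
                  product = c("#a#", "#b#", "#a#", "#d#", "#a#"),
                  stringsAsFactors = FALSE)

# step 1: drop repeated products within an order, then sort
toy.unique <- unique(toy)
toy.unique <- toy.unique[order(toy.unique$orderId, toy.unique$product), ]

# step 2: collapse each order into a 'cart' string and count its products
cart <- tapply(toy.unique$product, toy.unique$orderId, paste, collapse = ";")
prod.num <- tapply(toy.unique$product, toy.unique$orderId, length)

# step 3: count how many orders share the same cart
num <- table(as.vector(cart))

print(cart)
print(prod.num)
print(num)
```

Order 1 (“a,b,a,d”) becomes the cart `#a#;#b#;#d#` with three products, so the repeated “a” no longer affects the result.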

From this point, we start working on our Multi-layer Pie Chart. My idea is to place orders that include one product into the core of the chart. That is why we’ve calculated the total number of products in each combination (‘prod.num’ value). We will now split the data frame into two: the first one (one.prod) will include carts with one product, and the second one (sev.prod) carts with more than one product.

    # calculating total number of orders/carts
    tot <- sum(prod.matrix$num)

    # splitting orders into sets with 1 product and more than 1 product
    one.prod <- prod.matrix %>%
      filter(prod.num == 1)

    sev.prod <- prod.matrix %>%
      filter(prod.num > 1) %>%
      arrange(desc(prod.num))

The data is now ready for plotting. We will define the parameters for the chart with the following code:

    # defining parameters for pie chart
    iniR <- 0.2 # initial radius
    cols <- c("#ffffff", "#fec44f", "#fc9272", "#a1d99b", "#fee0d2", "#2ca25f", "#8856a7",
              "#43a2ca", "#fdbb84", "#e34a33", "#a6bddb", "#dd1c77", "#ffeda0", "#756bb1")
    prod <- df %>%
      select(product) %>%
      arrange(product) %>%
      unique()
    prod <- c('NO', c(prod$product))
    colors <- as.list(setNames(cols[ c(1:(length(prod)))], prod))
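The `colors` object is a named list, so a vector of product keys can be turned into a vector of colors by name. A minimal sketch with a shortened, hypothetical palette of three entries (the `keys` vector is made up for illustration):

```r
# shortened palette: 'NO' stands for "no product" and maps to white
cols <- c("#ffffff", "#fec44f", "#fc9272")
prod <- c("NO", "#a#", "#b#")
colors <- as.list(setNames(cols, prod))

# one key per pie slice; repeated keys are allowed
keys <- c("#a#", "NO", "#a#", "NO")
slice.cols <- as.character(colors[keys])
print(slice.cols)
```

Slices keyed with 'NO' come out white, which is what makes the empty parts of a layer look like gaps.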

Note: we’ve defined a color palette with fourteen colors, including white for the empty spaces. This means that if you have more than thirteen unique products in the data set, you need to add extra colors.

Finally, we will plot the Multi-layer Pie Chart with the following code:

    # 0 circle: blank
    pie(1, radius=iniR, init.angle=90, col=c('white'), border = NA, labels='')

    # drawing circles from last to 2nd
    for (i in length(prod):2) {
      p <- grep(prod[i], sev.prod$cart)
      col <- rep('NO', times=nrow(sev.prod))
      col[p] <- prod[i]
      floating.pie(0,0,c(sev.prod$num, tot-sum(sev.prod$num)), radius=(1+i)*iniR,
                   startpos=pi/2, col=as.character(colors [ c(col, 'NO')]), border="#44aaff")
    }

    # 1 circle: orders with 1 product
    floating.pie(0,0,c(tot-sum(one.prod$num),one.prod$num), radius=2*iniR,
                 startpos=pi/2, col=as.character(colors [ c('NO',one.prod$cart)]), border="#44aaff")

    # legend
    legend(1.5, 2*iniR, gsub("_"," ",names(colors)[-1]), col=as.character(colors [-1]),
           pch=19, bty='n', ncol=1)
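The loop above draws one ring per product: every ring has the same slices (one per multi-product cart, plus a final slice for all other orders), and only the colors differ. The sketch below reproduces just the coloring logic without plotting. It uses the five multi-product rows from the head() output as a stand-in for sev.prod, and the names `sev` and `ring.keys` are mine.

```r
# stand-in for sev.prod: multi-product carts sorted by number of products
sev <- data.frame(cart = c("#a#;#b#;#c#;#d#", "#a#;#b#;#c#", "#a#;#b#;#d#",
                           "#a#;#b#", "#a#;#c#"),
                  num = c(71, 168, 125, 241, 105),
                  stringsAsFactors = FALSE)
prod <- c("NO", "#a#", "#b#", "#c#", "#d#")

# for each product: its own key where the cart contains it, 'NO' elsewhere
ring.keys <- sapply(prod[-1], function(p) {
  key <- rep("NO", nrow(sev))
  key[grep(p, sev$cart, fixed = TRUE)] <- p
  key
})
rownames(ring.keys) <- sev$cart
print(ring.keys)
```

Each column is one ring and each row is one slice. Reading along a row shows which rings are colored for that cart, and this is how the chart displays the intersections.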

If you want to add some statistics to the plot, e.g. the total number of each combination or the share of each combination in the total, you just need to create this table and add it to the plot with the following code:

    # creating a table with the stats
    stat.tab <- prod.matrix %>%
      select(-prod.num) %>%
      mutate(share=num/tot) %>%
      arrange(desc(num))

    library(scales)
    stat.tab$share <- percent(stat.tab$share) # converting values to percents

    # adding a table with the stats
    addtable2plot(-2.5, -1.5, stat.tab, bty="n", display.rownames=FALSE,
                  hlines=FALSE, vlines=FALSE, title="The stats")
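If you prefer not to load an extra package just for the percent labels, base R can format the shares as well. This is a sketch of an alternative, not the approach used above. The counts are the six rows from the head() output, so the shares are relative to that subset only.

```r
# counts of the six carts shown earlier, sorted in descending order
num <- c(241, 168, 125, 123, 105, 71)
tot <- sum(num)

# share of each cart as a text label with one decimal place
share <- sprintf("%.1f%%", 100 * num / tot)
print(data.frame(num = num, share = share))
```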

We have now seen how the Multi-layer Pie Chart can help us to draw each product and its intersections with others.
