This is the second post about the Marketing Multi-channel Attribution Model with Markov chains (here is the first one). Even though the concept of the first-order Markov chains is pretty simple, you can face other issues and challenges when implementing the approach in practice. In this article, we will review some of them. I tried to organize this article in a way that you can use it as a framework or can help you to create your own.

The main steps that we will review are the following:

- splitting paths depending on purchases counts
- replacing some channels/touch points
- a unique channel/touchpoint case
- consequent duplicated channels in the path and higher order Markov chains
- paths that haven’t led to a conversion
- customer journey duration
- attributing revenue and costs comparisons

As usually, we start by simulating the data sample for experiments that includes customer ids, date stamp of contact with a marketing channel, marketing channel and conversion mark (0/1).

**click to expand R code**

library(tidyverse) library(reshape2) library(ChannelAttribution) library(markovchain) library(visNetwork) library(expm) library(stringr) ##### simulating the "real" data ##### set.seed(454) df_raw <- data.frame(customer_id = paste0('id', sample(c(1:20000), replace = TRUE)), date = as.Date(rbeta(80000, 0.7, 10) * 100, origin = "2016-01-01"), channel = paste0('channel_', sample(c(0:7), 80000, replace = TRUE, prob = c(0.2, 0.12, 0.03, 0.07, 0.15, 0.25, 0.1, 0.08))) ) %>% group_by(customer_id) %>% mutate(conversion = sample(c(0, 1), n(), prob = c(0.975, 0.025), replace = TRUE)) %>% ungroup() %>% dmap_at(c(1, 3), as.character) %>% arrange(customer_id, date) df_raw <- df_raw %>% mutate(channel = ifelse(channel == 'channel_2', NA, channel))

In addition, I’ve replaced channel_2 with NA values. The initial data sample looks like:

### 1. Splitting paths depending on purchases counts

*It makes sense to attribute paths of the first purchase and of the n-th purchase separately*.

**click to expand R code**

##### splitting paths ##### df_paths <- df_raw %>% group_by(customer_id) %>% mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>% ungroup()

**click to expand R code**

df_paths_1 <- df_paths %>% filter(path_no == 1) %>% select(-path_no)

### 2. Replacing/removing touchpoints

*It makes sense to replace or remove some channels*when:

**Direct**-> C3 -> conversion” we transform to “C1 -> C2 ->

**С2**-> C3 -> conversion”

**Direct**-> C3 -> C1 -> C2 -> conversion” we transform to “C3 -> C1 -> C2 -> conversion”

**NA**-> C3 -> conversion” we transform to “C1 -> C2 -> C3 -> conversion”

**NA**->

**Direct**-> C2 -> conversion” we transform to “C3 ->

**C3**-> C2 -> conversion”.

**click to expand R code**

##### replace some channels ##### df_path_1_clean <- df_paths_1 %>% # removing NAs filter(!is.na(channel)) %>% # adding order of channels in the path group_by(customer_id) %>% mutate(ord = c(1:n()), is_non_direct = ifelse(channel == 'channel_6', 0, 1), is_non_direct_cum = cumsum(is_non_direct)) %>% # removing Direct (channel_6) when it is the first in the path filter(is_non_direct_cum != 0) %>% # replacing Direct (channel_6) with the previous touch point mutate(channel = ifelse(channel == 'channel_6', channel[which(channel != 'channel_6')][is_non_direct_cum], channel)) %>% ungroup() %>% select(-ord, -is_non_direct, -is_non_direct_cum)

### 3. One- and multi-channel paths issue

*It makes sense to split a unique channel and multi-channel paths*.

- Split data for paths with one or more unique channels
- Calculate total conversions for one-channel paths and compute the Markov model for multi-channel paths
- Summarize results for each channel.

**click to expand R code**

##### one- and multi-channel paths ##### df_path_1_clean <- df_path_1_clean %>% group_by(customer_id) %>% mutate(uniq_channel_tag = ifelse(length(unique(channel)) == 1, TRUE, FALSE)) %>% ungroup() df_path_1_clean_uniq <- df_path_1_clean %>% filter(uniq_channel_tag == TRUE) %>% select(-uniq_channel_tag) df_path_1_clean_multi <- df_path_1_clean %>% filter(uniq_channel_tag == FALSE) %>% select(-uniq_channel_tag) ### experiment ### # attribution model for all paths df_all_paths <- df_path_1_clean %>% group_by(customer_id) %>% summarise(path = paste(channel, collapse = ' > '), conversion = sum(conversion)) %>% ungroup() %>% filter(conversion == 1) mod_attrib <- markov_model(df_all_paths, var_path = 'path', var_conv = 'conversion', out_more = TRUE) mod_attrib$removal_effects mod_attrib$result d_all <- data.frame(mod_attrib$result) # attribution model for splitted multi and unique channel paths df_multi_paths <- df_path_1_clean_multi %>% group_by(customer_id) %>% summarise(path = paste(channel, collapse = ' > '), conversion = sum(conversion)) %>% ungroup() %>% filter(conversion == 1) mod_attrib_alt <- markov_model(df_multi_paths, var_path = 'path', var_conv = 'conversion', out_more = TRUE) mod_attrib_alt$removal_effects mod_attrib_alt$result # adding unique paths df_uniq_paths <- df_path_1_clean_uniq %>% filter(conversion == 1) %>% group_by(channel) %>% summarise(conversions = sum(conversion)) %>% ungroup() d_multi <- data.frame(mod_attrib_alt$result) d_split <- full_join(d_multi, df_uniq_paths, by = c('channel_name' = 'channel')) %>% mutate(result = total_conversions + conversions) sum(d_all$total_conversions) sum(d_split$result)

### 4. Higher order of Markov chains and consequent duplicated channels in the path

*It doesn’t matter to skip or not duplicates for the first-order Markov chains*.

**order**” parameter. This means we can compute transition probabilities based on the previous two, three or more channels.

**click to expand R code**

##### Higher order of Markov chains and consequent duplicated channels in the path ##### # computing transition matrix - 'manual' way df_multi_paths_m <- df_multi_paths %>% mutate(path = paste0('(start) > ', path, ' > (conversion)')) m <- max(str_count(df_multi_paths_m$path, '>')) + 1 # maximum path length df_multi_paths_cols <- colsplit(string = df_multi_paths_m$path, pattern = ' > ', names = c(1:m)) colnames(df_multi_paths_cols) <- paste0('ord_', c(1:m)) df_multi_paths_cols[df_multi_paths_cols == ''] <- NA df_res <- vector('list', ncol(df_multi_paths_cols) - 1) for (i in c(1:(ncol(df_multi_paths_cols) - 1))) { df_cache <- df_multi_paths_cols %>% select(num_range("ord_", c(i, i+1))) %>% na.omit() %>% group_by_(.dots = c(paste0("ord_", c(i, i+1)))) %>% summarise(n = n()) %>% ungroup() colnames(df_cache)[c(1, 2)] <- c('channel_from', 'channel_to') df_res[[i]] <- df_cache } df_res <- do.call('rbind', df_res) df_res_tot <- df_res %>% group_by(channel_from, channel_to) %>% summarise(n = sum(n)) %>% ungroup() %>% group_by(channel_from) %>% mutate(tot_n = sum(n), perc = n / tot_n) %>% ungroup() df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'), channel_to = c('(start)', '(conversion)', '(null)'), n = c(0, 0, 0), tot_n = c(0, 0, 0), perc = c(0, 1, 1)) df_res_tot <- rbind(df_res_tot, df_dummy) # comparing transition matrices trans_matrix_prob_m <- dcast(df_res_tot, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum) trans_matrix_prob <- data.frame(mod_attrib_alt$transition_matrix) trans_matrix_prob <- dcast(trans_matrix_prob, channel_from ~ channel_to, value.var = 'transition_probability') # computing attribution - 'manual' way channels_list <- df_path_1_clean_multi %>% filter(conversion == 1) %>% distinct(channel) channels_list <- c(channels_list$channel) df_res_ini <- df_res_tot %>% select(channel_from, channel_to) df_attrib <- vector('list', length(channels_list)) for (i in c(1:length(channels_list))) { channel <- channels_list[i] df_res1 <- df_res %>% mutate(channel_from = ifelse(channel_from == channel, NA, channel_from), channel_to = ifelse(channel_to == channel, '(null)', channel_to)) %>% na.omit() df_res_tot1 <- df_res1 %>% group_by(channel_from, channel_to) %>% summarise(n = sum(n)) %>% ungroup() %>% group_by(channel_from) %>% mutate(tot_n = sum(n), perc = n / tot_n) %>% ungroup() df_res_tot1 <- rbind(df_res_tot1, df_dummy) # adding (start), (conversion) and (null) states df_res_tot1 <- left_join(df_res_ini, df_res_tot1, by = c('channel_from', 'channel_to')) df_res_tot1[is.na(df_res_tot1)] <- 0 df_trans1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum) trans_matrix_1 <- df_trans1 rownames(trans_matrix_1) <- trans_matrix_1$channel_from trans_matrix_1 <- as.matrix(trans_matrix_1[, -1]) inist_n1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'n', fun.aggregate = sum) rownames(inist_n1) <- inist_n1$channel_from inist_n1 <- as.matrix(inist_n1[, -1]) inist_n1[is.na(inist_n1)] <- 0 inist_n1 <- inist_n1['(start)', ] res_num1 <- inist_n1 %*% (trans_matrix_1 %^% 100000) df_cache <- data.frame(channel_name = channel, conversions = as.numeric(res_num1[1, 1])) df_attrib[[i]] <- df_cache } df_attrib <- do.call('rbind', df_attrib) # computing removal effect and results tot_conv <- sum(df_multi_paths_m$conversion) df_attrib <- df_attrib %>% mutate(tot_conversions = sum(df_multi_paths_m$conversion), impact = (tot_conversions - conversions) / tot_conversions, tot_impact = sum(impact), weighted_impact = impact / tot_impact, attrib_model_conversions = round(tot_conversions * weighted_impact) ) %>% select(channel_name, attrib_model_conversions)

**trans_matrix_prob_m**vs.

**trans_matrix_prob**), the removal effects and attribution results are the same (

**df_attrib**vs.

**mod_attrib_alt$result**) for the package (that skipped duplicated subsequent channels) as with “manual” calculations (with duplicates).

### 5. Deal with paths that haven’t led to conversion

- At the least, you can model:

- customers’ flow through marketing channels,
- a projection of how many customers will touch the exact marketing channel after N touches or how many conversions you can obtain if you attract traffic (N visits) from this or that channel,
- what channels have a higher probability of user churn and thus know whether or not we should change the acquisition through them.

- an advanced approach that allows to use transition probabilities and other features like visits recency, frequency, durations, demographics, etc. in order to predict conversions with machine learning algorithms.

**click to expand R code**

##### Generic Probabilistic Model ##### df_all_paths_compl <- df_path_1_clean %>% group_by(customer_id) %>% summarise(path = paste(channel, collapse = ' > '), conversion = sum(conversion)) %>% ungroup() %>% mutate(null_conversion = ifelse(conversion == 1, 0, 1)) mod_attrib_complete <- markov_model( df_all_paths_compl, var_path = 'path', var_conv = 'conversion', var_null = 'null_conversion', out_more = TRUE ) trans_matrix_prob <- mod_attrib_complete$transition_matrix %>% dmap_at(c(1, 2), as.character) ##### viz ##### edges <- data.frame( from = trans_matrix_prob$channel_from, to = trans_matrix_prob$channel_to, label = round(trans_matrix_prob$transition_probability, 2), font.size = trans_matrix_prob$transition_probability * 100, width = trans_matrix_prob$transition_probability * 15, shadow = TRUE, arrows = "to", color = list(color = "#95cbee", highlight = "red") ) nodes <- data_frame(id = c( c(trans_matrix_prob$channel_from), c(trans_matrix_prob$channel_to) )) %>% distinct(id) %>% arrange(id) %>% mutate( label = id, color = ifelse( label %in% c('(start)', '(conversion)'), '#4ab04a', ifelse(label == '(null)', '#ce472e', '#ffd73e') ), shadow = TRUE, shape = "box" ) visNetwork(nodes, edges, height = "2000px", width = "100%", main = "Generic Probabilistic model's Transition Matrix") %>% visIgraphLayout(randomSeed = 123) %>% visNodes(size = 5) %>% visOptions(highlightNearest = TRUE)

**click to expand R code**

##### modeling states and conversions ##### # transition matrix preprocessing trans_matrix_complete <- mod_attrib_complete$transition_matrix trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy %>% mutate(transition_probability = perc) %>% select(channel_from, channel_to, transition_probability)) trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to, levels = c(levels(trans_matrix_complete$channel_from))) trans_matrix_complete <- dcast(trans_matrix_complete, channel_from ~ channel_to, value.var = 'transition_probability') trans_matrix_complete[is.na(trans_matrix_complete)] <- 0 rownames(trans_matrix_complete) <- trans_matrix_complete$channel_from trans_matrix_complete <- as.matrix(trans_matrix_complete[, -1]) # creating empty matrix for modeling model_mtrx <- matrix(data = 0, nrow = nrow(trans_matrix_complete), ncol = 1, dimnames = list(c(rownames(trans_matrix_complete)), '(start)')) # adding modeling number of visits model_mtrx['channel_5', ] <- 1000 c(model_mtrx) %*% (trans_matrix_complete %^% 5) # after 5 steps c(model_mtrx) %*% (trans_matrix_complete %^% 100000) # after 100000 steps

### 6. Customer journey duration

*Usually, we compute the model regularly for the company’s standard reporting period*.

**click to expand R code**

##### Customer journey duration ##### # computing time lapses from the first contact to conversion/last contact df_multi_paths_tl <- df_path_1_clean_multi %>% group_by(customer_id) %>% summarise(path = paste(channel, collapse = ' > '), first_touch_date = min(date), last_touch_date = max(date), tot_time_lapse = round(as.numeric(last_touch_date - first_touch_date)), conversion = sum(conversion)) %>% ungroup() # distribution plot ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) + theme_minimal() + geom_histogram(fill = '#4e79a7', binwidth = 1) # cumulative distribution plot ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) + theme_minimal() + stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) + geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) + geom_vline(xintercept = 23, color = '#e15759', size = 1.5, linetype = 2)

**click to expand R code**

### for generic probabilistic model ### df_multi_paths_tl_1 <- melt(df_multi_paths_tl[c(1:50), ] %>% select(customer_id, first_touch_date, last_touch_date, conversion), id.vars = c('customer_id', 'conversion'), value.name = 'touch_date') %>% arrange(customer_id) rep_date <- as.Date('2016-01-10', format = '%Y-%m-%d') ggplot(df_multi_paths_tl_1, aes(x = as.factor(customer_id), y = touch_date, color = factor(conversion), group = customer_id)) + theme_minimal() + coord_flip() + geom_point(size = 2) + geom_line(size = 0.5, color = 'darkgrey') + geom_hline(yintercept = as.numeric(rep_date), color = '#e15759', size = 2) + geom_rect(xmin = -Inf, xmax = Inf, ymin = as.numeric(rep_date), ymax = Inf, alpha = 0.01, color = 'white', fill = 'white') + theme(legend.position = 'bottom', panel.border = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.ticks.x = element_blank(), axis.ticks.y = element_blank()) + guides(colour = guide_legend(override.aes = list(size = 5)))

**id10072**finished the path in a conversion so we can add its retrospective touchpoints into the model. On the other hand, i

**d10010**‘s,

**id1001**‘s and

**id10005**‘s paths are fruitless as of reporting date but customer id10010 will purchase on January 19, 2016, customer id1001 will contact with a marketing channel on January 15, 2016, but won’t purchase and customer id10005 won’t have any new contacts with marketing channels, it is fruitless.

**click to expand R code**

df_multi_paths_tl_2 <- df_path_1_clean_multi %>% group_by(customer_id) %>% mutate(prev_touch_date = lag(date)) %>% ungroup() %>% filter(conversion == 1) %>% mutate(prev_time_lapse = round(as.numeric(date - prev_touch_date))) # distribution ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) + theme_minimal() + geom_histogram(fill = '#4e79a7', binwidth = 1) # cumulative distribution ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) + theme_minimal() + stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) + geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) + geom_vline(xintercept = 12, color = '#e15759', size = 1.5, linetype = 2)

**click to expand R code**

# extracting data for generic model df_multi_paths_tl_3 <- df_path_1_clean_multi %>% group_by(customer_id) %>% mutate(prev_time_lapse = round(as.numeric(date - lag(date)))) %>% summarise(path = paste(channel, collapse = ' > '), tot_time_lapse = round(as.numeric(max(date) - min(date))), prev_touch_tl = prev_time_lapse[which(max(date) == date)], conversion = sum(conversion)) %>% ungroup() %>% mutate(is_fruitless = ifelse(conversion == 0 & tot_time_lapse > 20 & prev_touch_tl > 10, TRUE, FALSE)) %>% filter(conversion == 1 | is_fruitless == TRUE)

### 7. Attributing revenue and costs comparison

**var_value**” with a column of revenues into the markov_model() function. Therefore, it is possible to compare channels’ gross margin.

### Conclusions

- the Markov model can be effectively used for LTV prediction
- we can include additional types of interactions into the model (for example, banner impressions, tv ads, calls to callcenter and so on) and evaluate their influence
- develop “likely to buy” predictive model using ML algorithms

Pingback: Marketing Multi-Channel Attribution model with R (part 2: practical issues) — AnalyzeCore – data is beautiful, data is a story – Marketing Analytics()

Pingback: Multi-Channel Attribution With R – Curated SQL()