Marketing Multi-Channel Attribution model with R (part 1: Markov chains concept)

As we know, a customer usually goes through a path/sequence of different channels/touchpoints before a purchase in e-commerce or a conversion in other areas. In Google Analytics we can find that some touchpoints are more likely to assist a conversion, while others are more likely to be the last-click touchpoint. As most channels are paid for (in money or time spent), it is vital to have an algorithm that distributes conversions and value between those channels so they can be compared with their costs, instead of crediting, e.g., only the last non-direct channel. This is the Multi-Channel Attribution Model problem.

A definition by Google Analytics helps: an Attribution Model is a rule, or set of rules, that determines how credit for sales and conversions is assigned to touchpoints in conversion paths.

Nowadays, Google Analytics provides seven (!) predefined attribution models and even a custom model that you can adapt to your case. However, there are some aspects of the Google Analytics approach that I don't like, which is why I started researching this area. I'm sure this is a very interesting field for analysts and marketers. I'm going to publish a sequence of posts about alternative (relative to Google Analytics) attribution model concepts, some ideas for solving issues you will face in practice when implementing them, and R code for computing them (as always).

What I don’t like about the GA approach:

  • You have to make a choice or managerial decision regarding which model to use and why. You can see different results with different models, but which one is more correct? In other words, GA provides heuristic models, each with its pros and cons,
  • The data are aggregated and anonymized, so you can't mine deeper if you want to,
  • You can't take paths without conversions into account, although this would be interesting.

Pros of GA:

  • You don’t need to organize storage and infrastructure for collecting data,
  • You are provided with a range of heuristic models,
  • It is pretty easy and free to use.

Therefore, if you are a relatively small company, it would be logical to use the GA approach. But if you see that attribution results would have a significant impact on marketing budgets, product prices, or your understanding of customer journeys, or if you have already collected the necessary data, you can explore the ideas I'm going to share.

This article focuses mainly on the Markov chain concept for attribution. In the second post of the series, we will study the practical aspects of its implementation.

Attribution Model based on Markov chains concept

Using Markov chains allows us to switch from heuristic models to probabilistic ones. We can represent every customer journey (sequence of channels/touchpoints) as a chain in a directed Markov graph, where each vertex is a possible state (channel/touchpoint) and the edges represent the probabilities of transition between states (including conversion). By computing the model and estimating the transition probabilities, we can attribute every channel/touchpoint.

Let’s start with a simple example of a first-order or “memory-free” Markov graph to better understand the concept. It is called “memory-free” because the probability of reaching a state depends only on the previously visited state.

For instance, suppose customer journeys contain three unique channels: C1, C2, and C3. In addition, we manually add three special states to each graph: (start), (conversion) and (null). These additional states represent the starting point, a purchase or conversion, and an unsuccessful conversion, respectively. Transitions from identical channels are possible (e.g. C1 -> C1) but can be omitted for different reasons.

Let’s assume we have three customer journeys:

C1 -> C2 -> C3 -> purchase

C1 -> unsuccessful conversion

C2 -> C3 -> unsuccessful conversion

Following this approach, we add the extra states (see column 2 of the following table) and split each journey into pairs (see column 3):

| 1 - Customer journey | 2 - Transformation | 3 - Splitting into pairs |
| --- | --- | --- |
| C1 -> C2 -> C3 -> purchase | (start) -> C1 -> C2 -> C3 -> (conversion) | (start) -> C1, C1 -> C2, C2 -> C3, C3 -> (conversion) |
| C1 | (start) -> C1 -> (null) | (start) -> C1, C1 -> (null) |
| C2 -> C3 | (start) -> C2 -> C3 -> (null) | (start) -> C2, C2 -> C3, C3 -> (null) |
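The transformation and splitting steps above can be sketched in base R (the helper `split_to_pairs` is an illustrative name, not part of any package):

```r
# the three customer journeys from the example above
paths     <- c("C1 > C2 > C3", "C1", "C2 > C3")
converted <- c(TRUE, FALSE, FALSE)

# wrap a path with the special states and split it into transition pairs
split_to_pairs <- function(path, conv) {
  steps <- c("(start)",
             strsplit(path, " > ", fixed = TRUE)[[1]],
             if (conv) "(conversion)" else "(null)")
  data.frame(from = head(steps, -1), to = tail(steps, -1),
             stringsAsFactors = FALSE)
}

pairs <- do.call(rbind, Map(split_to_pairs, paths, converted))
pairs  # 9 pairs in total (4 + 2 + 3), matching column 3 of the table
```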

After this, we need to calculate the probabilities of transitioning from state to state:

| from | to | probability | total probability |
| --- | --- | --- | --- |
| (start) | C1 | 1/3 | 66.7% |
| (start) | C1 | 1/3 | |
| (start) | C2 | 1/3 | 33.3% |
| total from (start) | | 3/3 | |
| C1 | C2 | 1/2 | 50% |
| C1 | (null) | 1/2 | 50% |
| total from C1 | | 2/2 | |
| C2 | C3 | 1/2 | 100% |
| C2 | C3 | 1/2 | |
| total from C2 | | 2/2 | |
| C3 | (conversion) | 1/2 | 50% |
| C3 | (null) | 1/2 | 50% |
| total from C3 | | 2/2 | |
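These probabilities are just occurrence counts divided by the total number of transitions leaving each state. A base-R sketch of the calculation (the transition pairs are hard-coded from the table above):

```r
# transition pairs from the three example journeys
pairs <- data.frame(
  from = c("(start)", "C1", "C2", "C3", "(start)", "C1", "(start)", "C2", "C3"),
  to   = c("C1", "C2", "C3", "(conversion)", "C1", "(null)", "C2", "C3", "(null)"),
  stringsAsFactors = FALSE
)

# count each distinct pair, then the total number of transitions per 'from' state
counts <- aggregate(list(n = rep(1, nrow(pairs))), by = pairs[c("from", "to")], FUN = sum)
totals <- aggregate(list(total = counts$n), by = counts["from"], FUN = sum)

# transition probability = pair count / total transitions leaving the state
probs   <- merge(counts, totals, by = "from")
probs$p <- probs$n / probs$total
probs[order(probs$from, probs$to), c("from", "to", "p")]
# e.g. (start) -> C1 gives 2/3 and C3 -> (conversion) gives 1/2
```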

Finally, we can plot the model:

(figure: Markov graph for the simple example)

The last step is to estimate every channel/touchpoint. It is pretty easy to do this using the principle of the Removal Effect. The core of the Removal Effect is to remove each channel from the graph in turn and measure how many conversions (or how much value) could be obtained without it. The logic is the following: if we obtain N conversions without a certain channel/touchpoint, compared to the total conversions T of the complete model, then the difference (T - N) reflects that channel's contribution to total conversions (or value). After all channels/touchpoints have been estimated, we have to weight the results, because the sum of (T - Ni) over all channels is normally bigger than T.

Another effective way to express the Removal Effect is in percentages, e.g. "the channel affected conversion probability by X%."

Let’s see how this works in our simplified example. Removing a channel/touchpoint from the graph means replacing it in the channel pairs. When the channel appears in the «from» state, we replace the pair with NA (and then omit it); when it appears in the «to» state, we replace the channel with (null). In other words, no paths leave the removed channel, and transitions into it lead to the (null) state instead.
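This replacement rule can be sketched directly on the transition pairs (`remove_channel` is an illustrative helper, not a package function):

```r
# transition pairs of the complete model (from the earlier table)
pairs <- data.frame(
  from = c("(start)", "C1", "C2", "C3", "(start)", "C1", "(start)", "C2", "C3"),
  to   = c("C1", "C2", "C3", "(conversion)", "C1", "(null)", "C2", "C3", "(null)"),
  stringsAsFactors = FALSE
)

remove_channel <- function(pairs, channel) {
  pairs <- pairs[pairs$from != channel, ]    # drop pairs leaving the removed channel
  pairs$to[pairs$to == channel] <- "(null)"  # transitions into it lead to (null) instead
  pairs
}

remove_channel(pairs, "C1")
# the two (start) -> C1 pairs become (start) -> (null); C1 -> C2 and C1 -> (null) are gone
```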

Removal Effect for C1 channel looks like the following:

(figure: Removal Effect for channel C1)

Therefore, the probability of conversion for the complete model is 33.3% (0.667 * 0.5 * 1 * 0.5 + 0.333 * 1 * 0.5). The probability of conversion after removing the C1 channel is 16.7% (0.333 * 1 * 0.5). Therefore, the removal effect of channel C1 is 0.5 (1 - 0.167 / 0.333). In other words, if we didn’t have channel C1 in the customer journeys, we would lose 50% of conversions.

The removal effect of both C2 and C3 is 1, because we would lose 100% of conversions (1 - 0 / 0.333).

In addition, we need to weight these removal effects and multiply them by the total number of conversions (1 in our case):

  • C1: 0.5 / (0.5 + 1 + 1) = 0.2 * 1 conversion = 0.2
  • C2: 1 / (0.5 + 1 + 1) = 0.4 * 1 conversion = 0.4
  • C3: 1 / (0.5 + 1 + 1) = 0.4 * 1 conversion = 0.4

Therefore, we have distributed the 1 conversion across all channels.
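The whole removal-effect arithmetic of the worked example fits in a few lines of base R (the transition probabilities are taken from the tables above):

```r
# conversion probability of the complete model: sum over both converting paths
p_full <- 0.667 * 0.5 * 1 * 0.5 +  # (start) -> C1 -> C2 -> C3 -> (conversion)
          0.333 * 1 * 0.5          # (start) -> C2 -> C3 -> (conversion)

# conversion probability after removing each channel in turn
p_without <- c(C1 = 0.333 * 1 * 0.5,  # only the second path survives
               C2 = 0,                # both converting paths pass through C2
               C3 = 0)                # both converting paths pass through C3

removal_effect <- 1 - p_without / p_full

# weight the effects and distribute the total number of conversions (1 here)
attributed <- removal_effect / sum(removal_effect) * 1
round(attributed, 2)
# C1 = 0.2, C2 = 0.4, C3 = 0.4
```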

I think the method is clear now. Let’s implement it in R. We will recreate the simplified example above and, in addition, simulate a dataset that looks like real customer journey data.

Hint: there is a great R package (ChannelAttribution) available on CRAN. It is really nice because of its simplicity and speed, but I found some issues in the early stages of my investigation. That is why I did all the work manually at first. Luckily, the author of the package (Davide Altomare) replied to my comments and fixed the bugs, so since version 1.8 I obtain results equal to those of my manual approach. Thus, please make sure you have the latest version of the package installed.

The following is the R code for the simple example we reviewed above:

library(dplyr)
library(reshape2)
library(ggplot2)
library(ChannelAttribution)
library(markovchain)

##### simple example #####
# creating a data sample
df1 <- data.frame(path = c('c1 > c2 > c3', 'c1', 'c2 > c3'), conv = c(1, 0, 0), conv_null = c(0, 1, 1))

# calculating the models
mod1 <- markov_model(df1,
                     var_path = 'path',
                     var_conv = 'conv',
                     var_null = 'conv_null',
                     out_more = TRUE)

# extracting the results of attribution
df_res1 <- mod1$result

# extracting a transition matrix
df_trans1 <- mod1$transition_matrix
df_trans1 <- dcast(df_trans1, channel_from ~ channel_to, value.var = 'transition_probability')

### plotting the Markov graph ###
df_trans <- mod1$transition_matrix

# adding dummies in order to plot the graph
df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
                       channel_to = c('(start)', '(conversion)', '(null)'),
                       transition_probability = c(0, 1, 1))
df_trans <- rbind(df_trans, df_dummy)

# ordering channels
df_trans$channel_from <- factor(df_trans$channel_from,
                                levels = c('(start)', '(conversion)', '(null)', 'c1', 'c2', 'c3'))
df_trans$channel_to <- factor(df_trans$channel_to,
                              levels = c('(start)', '(conversion)', '(null)', 'c1', 'c2', 'c3'))
df_trans <- dcast(df_trans, channel_from ~ channel_to, value.var = 'transition_probability')

# creating the markovchain object
trans_matrix <- matrix(data = as.matrix(df_trans[, -1]),
                       nrow = nrow(df_trans[, -1]), ncol = ncol(df_trans[, -1]),
                       dimnames = list(c(as.character(df_trans[, 1])), c(colnames(df_trans[, -1]))))
trans_matrix[is.na(trans_matrix)] <- 0
trans_matrix1 <- new("markovchain", transitionMatrix = trans_matrix)

# plotting the graph
plot(trans_matrix1, edge.arrow.size = 0.35)

We have obtained the visualization of the Markov graph, the transition matrix (df_trans1 data frame), and the attribution results, which match our manual calculations (df_res1 data frame):

(figure: transition matrix of the simple example)

(figure: attribution results of the simple example)

Now let’s simulate a dataset that looks like real customer journey data. We assume that all paths finished with a purchase/conversion.

# simulating the "real" data
set.seed(354)
df2 <- data.frame(client_id = sample(c(1:1000), 5000, replace = TRUE),
                  date = sample(c(1:32), 5000, replace = TRUE),
                  channel = sample(c(0:9), 5000, replace = TRUE,
                                   prob = c(0.1, 0.15, 0.05, 0.07, 0.11, 0.07, 0.13, 0.1, 0.06, 0.16)))
df2$date <- as.Date(df2$date, origin = "2015-01-01")
df2$channel <- paste0('channel_', df2$channel)

# aggregating channels to the paths for each customer
df2 <- df2 %>%
  group_by(client_id) %>%
  summarise(path = paste(channel, collapse = ' > '),
            # assume that all paths were finished with conversion
            conv = 1,
            conv_null = 0) %>%
  ungroup()

# calculating the models (Markov and heuristics)
mod2 <- markov_model(df2,
                     var_path = 'path',
                     var_conv = 'conv',
                     var_null = 'conv_null',
                     out_more = TRUE)

# the heuristic_models() function didn't work for me, therefore I used manual calculations
# instead of:
# h_mod2 <- heuristic_models(df2, var_path = 'path', var_conv = 'conv')

df_hm <- df2 %>%
  mutate(channel_name_ft = sub('>.*', '', path),
         channel_name_ft = sub(' ', '', channel_name_ft),
         channel_name_lt = sub('.*>', '', path),
         channel_name_lt = sub(' ', '', channel_name_lt))
# first-touch conversions
df_ft <- df_hm %>%
  group_by(channel_name_ft) %>%
  summarise(first_touch_conversions = sum(conv)) %>%
  ungroup()
# last-touch conversions
df_lt <- df_hm %>%
  group_by(channel_name_lt) %>%
  summarise(last_touch_conversions = sum(conv)) %>%
  ungroup()

h_mod2 <- merge(df_ft, df_lt, by.x = 'channel_name_ft', by.y = 'channel_name_lt')

# merging all models
all_models <- merge(h_mod2, mod2$result, by.x = 'channel_name_ft', by.y = 'channel_name')
colnames(all_models)[c(1, 4)] <- c('channel_name', 'attrib_model_conversions')

The results are the following (all_models data frame):

(figure: all_models data frame with attribution results)

In addition, let’s create some visualizations of the transition matrix and the differences between the heuristic models and the attribution model with the following code:

############## visualizations ##############
# these packages are used below but were not loaded earlier
library(RColorBrewer) # brewer.pal()
library(ggthemes)     # theme_solarized()
library(ggrepel)      # geom_label_repel()

# transition matrix heatmap for "real" data
df_plot_trans <- mod2$transition_matrix

cols <- c("#e7f0fa", "#c9e2f6", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e", "#eec73a",
          "#e29421", "#e29421", "#f05336", "#ce472e")
t <- max(df_plot_trans$transition_probability)

ggplot(df_plot_trans, aes(y = channel_from, x = channel_to, fill = transition_probability)) +
 theme_minimal() +
 geom_tile(colour = "white", width = .9, height = .9) +
 scale_fill_gradientn(colours = cols, limits = c(0, t),
 breaks = seq(0, t, by = t/4),
 labels = c("0", round(t/4*1, 2), round(t/4*2, 2), round(t/4*3, 2), round(t/4*4, 2)),
 guide = guide_colourbar(ticks = T, nbin = 50, barheight = .5, label = T, barwidth = 10)) +
 geom_text(aes(label = round(transition_probability, 2)), fontface = "bold", size = 4) +
 theme(legend.position = 'bottom',
 legend.direction = "horizontal",
 panel.grid.major = element_blank(),
 panel.grid.minor = element_blank(),
 plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8),
 axis.title.x = element_text(size = 24, face = "bold"),
 axis.title.y = element_text(size = 24, face = "bold"),
 axis.text.y = element_text(size = 8, face = "bold", color = 'black'),
 axis.text.x = element_text(size = 8, angle = 90, hjust = 0.5, vjust = 0.5, face = "plain")) +
 ggtitle("Transition matrix heatmap")

# models comparison
all_mod_plot <- melt(all_models, id.vars = 'channel_name', variable.name = 'conv_type')
all_mod_plot$value <- round(all_mod_plot$value)
# slope chart
pal <- colorRampPalette(brewer.pal(9, "Set1")) # Set1 provides at most 9 colors; interpolate to 10
ggplot(all_mod_plot, aes(x = conv_type, y = value, group = channel_name)) +
 theme_solarized(base_size = 18, base_family = "", light = TRUE) +
 scale_color_manual(values = pal(10)) +
 scale_fill_manual(values = pal(10)) +
 geom_line(aes(color = channel_name), size = 2.5, alpha = 0.8) +
 geom_point(aes(color = channel_name), size = 5) +
 geom_label_repel(aes(label = paste0(channel_name, ': ', value), fill = factor(channel_name)),
 alpha = 0.7,
 fontface = 'bold', color = 'white', size = 5,
 box.padding = unit(0.25, 'lines'), point.padding = unit(0.5, 'lines'),
 max.iter = 100) +
 theme(legend.position = 'none',
 legend.title = element_text(size = 16, color = 'black'),
 legend.text = element_text(size = 16, vjust = 2, color = 'black'),
 plot.title = element_text(size = 20, face = "bold", vjust = 2, color = 'black', lineheight = 0.8),
 axis.title.x = element_text(size = 24, face = "bold"),
 axis.title.y = element_text(size = 16, face = "bold"),
 axis.text.x = element_text(size = 16, face = "bold", color = 'black'),
 axis.text.y = element_blank(),
 axis.ticks.x = element_blank(),
 axis.ticks.y = element_blank(),
 panel.border = element_blank(),
 panel.grid.major = element_line(colour = "grey", linetype = "dotted"),
 panel.grid.minor = element_blank(),
 strip.text = element_text(size = 16, hjust = 0.5, vjust = 0.5, face = "bold", color = 'black'),
 strip.background = element_rect(fill = "#f0b35f")) +
 labs(x = 'Model', y = 'Conversions') +
 ggtitle('Models comparison') +
 guides(colour = guide_legend(override.aes = list(size = 4)))

We obtained the heatmap of the transition matrix:

(figure: transition matrix heatmap)

And the models comparison indicates substantial differences between the Markov attribution model and existing heuristics such as «first-click» and «last-click»:

(figure: slope chart comparing the models)

In the next post, we will study how to apply attribution based on first-order Markov chains in practice. We will see that, while the method is pretty simple, there are a lot of questions we need to answer and reflect in the R script. Specifically, we will study how to:

  • define the retrospective period for analysis,
  • manage several purchases per reporting period,
  • deal with paths without purchase,
  • replace «direct» (or any other unknown) channel with a non-direct or known one,
  • calculate both conversions and revenue,
  • etc.

Therefore, the next post will be practice-oriented, don’t miss it 🙂


P.S.: I just found that Lunametrics has also published a post on this topic.

  • Great Post Sergey!

    • AnalyzeCore

      Thank you, Joao! I know you probably expected this post 🙂

  • ankur verma

    Fantastic post! I’m looking forward to your second part! Excellent read…


  • Amit Ugle

    This is fantastic Sergey. I’m eagerly waiting for the second post. Can you please make a series with real datasets..Thanks a ton….attribution modeling is a core to every marketer and as a marketer, I would love to see a 4-5 part detailed series on blogs with some real datasets from SEM, Google Analytics, CRM, AdWords, PPC, facebook etc

  • Amit Ugle

    Can you please explain this part: “Therefore, the probability of conversion of the complete model is 33.3% (0.667 * 0.5 * 1 * 0.5 + 0.333 * 1 * 0.5). The probability of conversion after removing the C1 channel is 16.7% (0.333 * 1 * 0.5).” In fact, it would be awesome if you wrote a separate blog post on the removal effect; it’s quite important to understand that effect in detail. I’m not quite following how you computed the above calculations, please answer in detail. Really appreciate your help, amazing post again!!!!

    • AnalyzeCore

      There are 2 paths in the model that lead to conversion, (start) -> C1 -> C2 -> C3 -> (conversion) and (start) -> C2 -> C3 -> (conversion), with their transition probabilities. Therefore, the probability of conversion for the first path is 0.667 * 0.5 * 1 * 0.5 and for the second one is 0.333 * 1 * 0.5.

  • Great article! How would you include exponential (time) decay using the probabilistic approach?

    • AnalyzeCore

      I always start with a business goal and then investigate a method for solving. Therefore, it is tough to give a helpful answer without a context but I would try to include time tags to the states (touchpoints) via converting e.g. C1 -> C2 -> conversion to C1-1day -> C1-2day -> C2-1day -> conversion. But again, there would be better ideas when knowing a context.

    • Sure thing. The basic context for time decay is to give more credit to the interactions that happened closer to the conversion. For instance, if you clicked on an email 4 weeks ago and had other channels before and after it, the channels that happened less than 4 weeks ago would hold more initial credit for the conversion than the email click and the other touchpoints before it.

    • AnalyzeCore

      This heuristic model is available in the GA.

    • They do include it, but it has more benefit when used as a component of other models.

  • Brian

    Nice post. I looked at this package a while ago. I tend to think a classification predictive model (e.g. GBDT) or a survival model (discrete time) would be much better because you can control for external as well as internal circumstances, use adstock, etc. Have you considered these?

    • AnalyzeCore

      Sure, specifically classification.

  • Great post – good to see people applying data science to marketing / CRO!

    It is possible to pull Google Analytics raw data out using their API or an ETL service like Fivetran (not sure how Segment and RJMetrics fare with GA)

    • AnalyzeCore

      If I’m not mistaken, you can’t extract multi-channel touches by user from GA. Maybe from BigQuery, but I need to check this. If GA can return such data, it is possible to extract raw data directly from R. Thank you!

    • You’re right Sergey, only with BigQuery, but you need to be a Google Analytics Premium client OR have access to clickstream data. @ryan_farley90:disqus there are several open source solutions out there that allow you to own this data; my favorite is Snowplow Analytics.

    • Jan Hornych

      You can do it if you store each session into a custom dimension. The other problem is that GA rewrites direct visits; to avoid that, you have to reduce the campaign attribution window (we usually use 4 hours). It is nice data to start playing with; on the other hand, there are more issues to be resolved besides this: missing impressions, multiple devices per user, missing data in GA, etc.

    • With a bit of extra setup, adding the cookie id, session id and hit timestamp to custom dimensions, you can replicate the BigQuery exports of GA360 to get multi-touch data. There is an example in the “Transforming the data into a form suitable for the model” stage here: http://code.markedmondson.me/predictClickOpenCPU/ – in that case custom dimension3 held the cookieID and hit timestamp.

  • Similar to Ryan Farley, but my company uses the enterprise version of HubSpot. Any idea if it’s possible to pull the data out from that? I’d like to test it against HubSpot’s attribution models.

    P.S. I am a noob at this, just got R a few weeks ago.

    • I don’t *believe* that the HubSpot API allows you to get page-level data. I checked with them and am currently on the Enterprise tier, but I may have been on a lower tier when I checked. Sorry if that’s not super helpful, but I had to reread the documentation and talk to them because I couldn’t believe it.


  • Amedeo Bellodi

    Very useful post.
    I was interested in your hypothesis “Transitions from identical channels are possible (e.g. C1 -> C1) but can be omitted for different reasons.”: are you sure that the effect of omitting those transitions is negligible? What reasons support your hypothesis?
    Thank you in advance.

    • AnalyzeCore

      Thank you! I’m going to write about this in the second part of the article.

  • salmi

    Hello and thanks for your post!
    1. Can you explain how you are doing this:
    C1: 0.5 / (0.5 + 1 + 1) = 0.2 * 1 conversion = 0.2
    C2: 1 / (0.5 + 1 + 1) = 0.4 * 1 conversion = 0.4
    C3: 1 / (0.5 + 1 + 1) = 0.4 * 1 conversion = 0.4

    2. How can I get the data from GA (id_client, date and channel)?

    3. You mentioned this: “assume that all paths were finished with conversion”, but how do you handle the real case?

    Awaiting your reply, thank you very much.

    • AnalyzeCore

      1. We should distribute conversions proportionally to the Removal effect of each channel. In my example, Removal effects are 0.5, 1 and 1 for C1, C2 and C3 channels and 1 conversion. Therefore, 20% of the 1 conversion goes to C1, 40% to C2 and 40% to C3.
      2. I’m sure it is not a trivial issue. Please read other comments, maybe you find helpful thoughts.
      3. First of all, we want to attribute channels that brought conversions. Therefore, we will do calculations based on paths that were finished with conversion in the real case. We can’t distribute value between channels if value is 0. But we would include paths without conversions to the model for other purposes that I’m going to discuss in the next post.

    • salmi

      Thanks Sergey for your response. For the second question, I have read the other comments, but as you mentioned it is not a trivial issue. I hope that in your next post you explain how we can get the data from GA (id_client, date and channel). I’m eagerly waiting for the second post. Nice post! Thanks.

  • Javier Cuevas

    Great post! 🙂 I use the method and I have 15 states to analyze, but the states in the first plot are very close together. How can I configure the plot to make it more legible? Thanks!

  • Alessio Chris Venturini

    Great post! Really amazing! I got a bit lost at the end.
    This line:
    prob = c(0.1, 0.15, 0.05, 0.07, 0.11, 0.07, 0.13, 0.1, 0.06, 0.16)), gives the probability of transition to channel 0, channel 1, etc…?
    How do you compute in R the credit for each channel during the user’s journey using the real data?

    Looking forward to reading the next post! (when will it be available?)

    Thanks a million!

    • AnalyzeCore

      Thank you!
      No, that line is used for simulating the data set; those are the probabilities of occurrence of the marketing channels in this data set.
      The approach for computing on real data is pretty much the same, with some specifics that depend on the business or your goals. I am going to share some thoughts about it in the next article once I have enough free time.

  • Aaron Dooley

    Thank you so much for this post. It was very helpful in getting me started. I ran into an issue when applying this to my data and I cannot understand why R is creating this Markov chain plot. In the Markov matrix, from (start) to (conversion) is 0, while (start) to other variables is not 0. However, in the plot (start) to (conversion) is 1, while there are no lines from (start) to any other variable. Do you know what may cause this? https://uploads.disquscdn.com/images/de9bcb2d661d5b37bc83613a02055e9ca7c2815c814806a13e5bbc44d3d0c0d6.png https://uploads.disquscdn.com/images/a477712f45853807d0d33629afca76283685cc3994404e02764b755e5addf453.png

    • AnalyzeCore

      Try to check the (conversion) -> (start) transition probability (maybe you mixed up “from” and “to” states) and check the dummies part of the script carefully.
      In addition, there can be an issue with the levels in the channel-ordering part of the script.
      It is hard to solve this without the dataset; try to go through the script step by step and check the results, or try to find which transitions were mixed up and figure out why.
      If you could share dataset and script with me I would try to debug it when I have free time (bryl.serg@gmail.com)

    • Dean Spasov

      Had the same issue. Apparently, for some reason you need to explicitly set the levels of the factors in this step:

      # ordering channels
      df_trans$channel_from <- factor(df_trans$channel_from,
                                      levels = c('(start)', '(conversion)', '(null)', 'c1', 'c2', 'c3'))
      df_trans$channel_to <- factor(df_trans$channel_to,
                                    levels = c('(start)', '(conversion)', '(null)', 'c1', 'c2', 'c3'))

  • Gourmand

    This is an awesome post, I was waiting for some post like that for a long time, thanks a lot!

    Can I ask you two questions though? When you say:
    “Therefore, the probability of conversion of the complete model is 33.3% (0.667 * 0.5 * 1 * 0.5 + 0.333 * 1 * 0.5.) The probability of conversion after removing the C1 channel is 16.7% (0.333 * 1 * 0.5.) Therefore, the channel C1 removal effect is 0.5 (0.167 / 0.333.) In other words, if we didn’t have the channel C1 in customer journeys we would lose 50% of conversions.
    The removal effect of both C2 and C3 is 1 because we would lose all 100% conversion.”

    Do you mean that ‘removal_effect(channel) = 1 – conversion_rate_without_channel/total_conversion_rate’ ?
    Because if it’s just ‘removal_effect(channel) = conversion_rate_without_channel/total_conversion_rate’, removal effect of both C2 and C3 would be 0 (0 / 0.333) and not 1, right?

    Also, I’m trying to implement this in python (and there is no handy library so it’s homemade), and when I compute the removal effects, for each channel, I use a DFS (http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/) on the graph (without the channel) to find all paths to conversion and then sum the probabilities of all the paths to get the conversion rate of the graph without the channel. The thing is, it becomes quickly impossible to compute the model with a very dense graph. Do you have an idea of a more efficient algorithm?

    Thanks a lot again, this is great!

    • AnalyzeCore

      Thanks for reading the blog! Regarding your questions:
      1) Yes, exactly. I’ve added this to the article. Thank you for noticing!
      2) At the beginning, I did all the work manually because that package had bugs. My approach was the following: split paths into channel pairs (“from” -> “to” states) and count the identical pairs. Then check each channel: if it is in the “from” state, remove that pair; if it is in the “to” state, replace it with the “null” state (not converted). Use these calculations to compute a transition matrix for each channel. And yes, unfortunately, it is not a fast approach because it needs a number of transition matrix multiplications for each channel.

  • Abhishek Sinha

    Very good read Sergey. Waiting for your next post

  • Case E

    Thanks for this inspiring post!
    One question remains: while same-state transitions do appear (e.g. channel 0 -> channel 0 -> channel 0), the transition matrix doesn’t show any probabilities for these same-state transitions. Is this a bug in the ChannelAttribution package, or is there a workaround?

    • AnalyzeCore

      Actually, you shouldn’t apply an attribution model to this sequence (or any sequence with one unique channel), because you don’t have a reason to attribute any other channel: you know exactly that “channel 0” brought the conversion.
      I’m planning to write more about this in the next article.

    • Case E

      Thanks for the quick response. Maybe I used a somewhat unfortunate example. Suppose I have a number of paths, comprising channels 0 to 9 as in the post above. One of these paths in particular looks like [channel 0 -> channel 0 -> channel 1].

      In that case an attribution model would be appropriate. Also, I would expect that my transition matrix contains a probability > 0, going from channel 0 to channel 0. These (same-state) probabilities seem to be missing in the transition matrix heatmap in the post above. I’m trying to figure out why this is the case.

    • AnalyzeCore

      As far as I know, the ChannelAttribution package omits same-state transitions because, mathematically, for first-order Markov chains it doesn’t matter whether you include or omit them; the final attribution result should be the same. This matters for higher-order models.

    • Case E

      Makes sense! Looking forward to your next article.

    • Ano

      I was also wondering about this. As Anderl et al. propose in the article “Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling”, they do indeed allow channel 0 -> channel 0 -> channel 1 to be part of the transition probabilities.
      Let’s say this journey ends in a conversion, i.e. (c0 -> c0 -> c1 -> Conversion (Con)).
      The matrix for just this customer journey would therefore be a 3x3 matrix, with
      P(0,0) = 0.5, P(0,1) = 0.5, P(0,Con) = 0
      P(1,0) = 0, P(1,1) = 0, P(1,Con) = 1
      P(Con,0) = 0, P(Con,1) = 0, P(Con,Con) = 1

      Sorry for the ugly matrix, but isn’t Case E onto something here?

    • AnalyzeCore

      I will include an example into the next article.

    • Ano

      Cool, looking forward to it! 🙂

  • Marc Laurent-Atthalin

    Very interesting. Looking forward to the next one on this topic!

  • Peter Wouters

    Really great article! Any idea when we can expect the next post?

    • AnalyzeCore

      This month I hope )))

    • Mark

      I’m interested in what the computation of the removal effect looks like.
      In the theoretical example it would be 0.2, 0.4, 0.4, but in the output of the code it is 0.1984570, 0.4007715, 0.4007715, which is close but not exact. I found this to be calculated in some other way (but I don’t know how) when doing a slightly more complex example.

    • AnalyzeCore

      Unfortunately, the package is not ideal and has some bugs, but it provides a simple and fast way to compute the Markov model; that is why I mentioned it.
      The way of calculating is exactly the same: https://www.slideshare.net/adavide1982/markov-model-for-the-multichannel-attribution-problem
      Personally, I use my own script; it works more accurately but is slower. The idea is the following: replace every channel consecutively with NA when it is in the ‘from’ state (and then do na.omit()) and with ‘(null)’ when it is in the ‘to’ state, then compute the Removal Effect.

  • Peter Niessen

    Great post, very well organized.

    Q: shouldn’t the removal effects all sum to 1?
    When I run the code for example #2 (‘real data’) and look at mod2$removal_effects I get:

    $removal_effects
    channel removal_effects
    channel_3 0.258353
    channel_4 0.395936
    channel_9 0.486638
    channel_6 0.445304
    channel_2 0.205818
    channel_1 0.490531
    channel_0 0.344931
    channel_7 0.348381
    channel_8 0.232753
    channel_5 0.290391

    What’s going on with this?

    • AnalyzeCore

      I can say that the sum of the *weighted* removal effects should be 1. mod2$removal_effects contains the removal effects before weighting.