Introduction to Markov Chain Attribution Modeling in Digital Marketing Part#1

Mina Mehdinia
May 31, 2024

Understanding the customer journey has always been tough for marketers. In recent years, as customers consult more blogs, reviews, and social media, their paths to a purchase have grown longer. This makes it hard for marketers to figure out which channel really leads to sales. In this blog, we will walk through one data-driven approach, the Markov chain model, to estimate the importance of each channel from customer journeys.

Overview

Before purchasing a product (converting), a customer goes through multiple channels (touchpoints) such as Google Ads, Facebook, organic search, or direct visits. Understanding this journey is essential for building a data-driven model of how the different touchpoints contribute to conversion.

Example of a Customer Journey

For the past few decades, attribution has relied heavily on heuristic models such as first click, last click, time decay, and position-based. These rules assign credit by fixed convention rather than from data, so they lose their merit as paths grow longer. Marketers want to segment customers by their likelihood of conversion so they can reach them at the right time with advertising campaigns and promotions and maximize return on investment.

What is Markov Chain Attribution Modeling?

A Markov Chain is a mathematical system that undergoes transitions from one state (or event) to another on a state space. It is characterized by the fact that the next state depends only on the current state and not on the sequence of events that preceded it. In the context of digital marketing, Markov Chain attribution modeling views the customer journey as a series of transitions between channels (or touchpoints) that lead toward a conversion.
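To make the memoryless property concrete, the Python sketch below stores a chain as a dictionary of transition probabilities and simulates one customer journey. The states reuse this post's channel names, but every probability is hypothetical and chosen only for illustration.

```python
import random

# Hypothetical transition probabilities: state -> {next_state: probability}.
# "Conversion" and "Null" are absorbing: a journey stops when it reaches one.
TRANSITIONS = {
    "Start": {"Instagram Ad": 0.6, "Google Search": 0.4},
    "Instagram Ad": {"Google Search": 0.7, "Null": 0.3},
    "Google Search": {"Email": 0.5, "Conversion": 0.2, "Null": 0.3},
    "Email": {"Conversion": 0.6, "Null": 0.4},
}
ABSORBING = {"Conversion", "Null"}


def simulate_journey(transitions, rng, start="Start"):
    """Random walk from `start` until an absorbing state is reached.

    Each step looks only at the current state's row of probabilities,
    never at earlier states: this is the Markov (memoryless) property.
    """
    state = start
    journey = [state]
    while state not in ABSORBING:
        row = transitions[state]
        state = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        journey.append(state)
    return journey


rng = random.Random(42)  # fixed seed so runs are reproducible
print(" -> ".join(simulate_journey(TRANSITIONS, rng)))
```

Because `simulate_journey` reads only `transitions[state]` at each step, two customers who reach Google Search by different routes face exactly the same odds from that point on.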

Benefits of Markov Chain Attribution

Markov Chain models offer several advantages over simpler attribution models like last-click or first-click:

  • Holistic View: It considers the entire conversion path, not just the first or last touch.
  • Flexibility: It can easily incorporate different types of customer interactions.
  • Quantitative Analysis: It provides probabilistic estimates of how strongly each channel influences conversions.

Understanding Markov Chain Attribution in Digital Marketing

Let’s explore Markov Chain Attribution through a digital marketing campaign, specifically detailing how various online interactions contribute to a customer’s journey from awareness to conversion. This method allows marketers to identify the value of each touchpoint in influencing the customer’s decision to make a purchase.

Digital Marketing Campaign Scenario

Imagine you’re running a digital marketing campaign for an online apparel store. The customer journey might involve several touchpoints, each representing a state in the Markov Chain model:

  1. State 1 (Instagram Ad): A potential customer sees an Instagram ad for your store and clicks on it. This is their first exposure to your brand and the starting point of their journey.
  2. State 2 (Google Search): After clicking the ad, the visitor uses Google Search to find and visit your website. They browse different items but don’t make a purchase yet.
  3. State 3 (Email Subscription): Before leaving the site, the visitor subscribes to your newsletter, indicating an interest in receiving more information.
  4. State 4 (Purchase): After receiving the email, the visitor returns to your website using the discount code provided and completes a purchase.

Next, record the transitions between these states from customer data: for example, how many customers move from the Instagram ad (State 1) to a Google search (State 2). Let us look at the customer journey in more detail. In the example below, different customers take different paths within a given timeframe.
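Before transitions can be counted, each journey is typically rewritten as a sequence that starts with an artificial Start state and ends in an absorbing Conversion or Null state. This minimal Python sketch assumes raw journeys arrive as (touchpoints, converted) pairs; that input format and the three sample journeys are hypothetical, not the data shown in this post's figures.

```python
# Hypothetical raw journeys: (ordered touchpoints, did the customer convert?).
RAW_JOURNEYS = [
    (["Instagram Ad", "Google Search", "Email"], True),
    (["Instagram Ad", "Google Search"], False),
    (["Google Search"], False),
]


def to_state_sequence(touchpoints, converted):
    """Prefix the journey with Start and end it in Conversion or Null."""
    end_state = "Conversion" if converted else "Null"
    return ["Start", *touchpoints, end_state]


paths = [to_state_sequence(t, c) for t, c in RAW_JOURNEYS]
for path in paths:
    print(" -> ".join(path))
# Start -> Instagram Ad -> Google Search -> Email -> Conversion
# Start -> Instagram Ad -> Google Search -> Null
# Start -> Google Search -> Null
```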

Markov Model

A Markov model starts by building a network from the paths followed by customers, which in practice can number in the millions. In this blog, we look at only three paths: one is converting (the customer journey ends in a purchase) and two are null (the customer journey ends without a purchase), so the network is simple.

These paths are shown below:

First we need to count the occurrences of each transition between states. Let’s count:
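Counting transitions amounts to tallying every pair of consecutive states across all paths. A minimal Python sketch using `collections.Counter`, run on hypothetical paths (not the ones in the figure):

```python
from collections import Counter

# Hypothetical paths, already wrapped with Start and an absorbing end state.
PATHS = [
    ["Start", "Instagram Ad", "Google Search", "Email", "Conversion"],
    ["Start", "Instagram Ad", "Google Search", "Null"],
    ["Start", "Google Search", "Null"],
]


def count_transitions(paths):
    """Count how often each (from_state, to_state) pair occurs across all paths."""
    counts = Counter()
    for path in paths:
        # zip(path, path[1:]) yields each pair of consecutive states
        counts.update(zip(path, path[1:]))
    return counts


for (src, dst), n in sorted(count_transitions(PATHS).items()):
    print(f"{src} -> {dst}: {n}")
```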

Now, let’s calculate the probability of each transition: the number of times that transition occurs divided by the total number of transitions out of its starting state.
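Turning counts into probabilities means dividing each count by the total number of transitions leaving the same state. The sketch below uses `fractions.Fraction` so that values such as 1/3 stay exact; the counts themselves are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical transition counts, e.g. from tallying consecutive state pairs.
COUNTS = Counter({
    ("Start", "Instagram Ad"): 2,
    ("Start", "Google Search"): 1,
    ("Instagram Ad", "Google Search"): 2,
    ("Google Search", "Email"): 1,
    ("Google Search", "Null"): 2,
    ("Email", "Conversion"): 1,
})


def transition_probabilities(counts):
    """P(src -> dst) = count(src -> dst) / total transitions leaving src."""
    out_totals = defaultdict(int)
    for (src, _dst), n in counts.items():
        out_totals[src] += n
    probs = defaultdict(dict)
    for (src, dst), n in counts.items():
        # Fraction keeps the arithmetic exact (1/3 rather than 0.333...)
        probs[src][dst] = Fraction(n, out_totals[src])
    return dict(probs)


for src, row in transition_probabilities(COUNTS).items():
    print(src, {dst: str(p) for dst, p in row.items()})
```

Each row of the result sums to 1, which is a quick sanity check on any transition table.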

This gives the following network of transition probabilities:

To find the overall probability of conversion in this Markov Chain model, we follow each possible path that leads to a conversion and multiply the transition probabilities calculated earlier along it. There are three paths that lead to conversion.

The paths to conversion:

The probability of conversion along Path 1 is 1/3 (about 33.3%).

The probability of conversion along Path 3 is 1/4 (25%).

The probability of conversion along Path 4 is 1/6 (about 16.7%).

Now, sum these probabilities to get the overall probability of reaching a conversion starting from “Start”:

P(Conversion) = 1/3 + 1/4 + 1/6 = 4/12 + 3/12 + 2/12 = 9/12 = 75%
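Following every route and summing can also be automated. This Python sketch walks a hypothetical transition table (not the network in the figures), multiplies the probabilities along each route from Start to Conversion, and adds the results. It assumes the network has no loops; with loops there are infinitely many routes and a different method is needed.

```python
from fractions import Fraction

# Hypothetical transition probabilities (each row sums to 1); no loops.
PROBS = {
    "Start": {"Instagram Ad": Fraction(2, 3), "Google Search": Fraction(1, 3)},
    "Instagram Ad": {"Google Search": Fraction(1)},
    "Google Search": {"Email": Fraction(1, 3), "Null": Fraction(2, 3)},
    "Email": {"Conversion": Fraction(1)},
}


def conversion_paths(probs, state="Start", path=(), p=Fraction(1)):
    """Yield (route, probability) for every route from Start to Conversion.

    The probability of a route is the product of its transition probabilities.
    Null and other states without an outgoing row simply end the search.
    """
    path = path + (state,)
    if state == "Conversion":
        yield path, p
        return
    for nxt, q in probs.get(state, {}).items():
        yield from conversion_paths(probs, nxt, path, p * q)


total = Fraction(0)
for route, p in conversion_paths(PROBS):
    print(" -> ".join(route), "=", p)
    total += p
print("P(Conversion) =", total)  # P(Conversion) = 1/3
```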

In the context of Markov Chain Attribution Modeling, the total conversion probability of 75% is a quantifiable measure of how likely a customer journey that starts at the initial touchpoint (e.g., seeing an ad or receiving an email) is to end in a conversion (such as making a purchase).

Interpretation of Probability of Conversion

  1. High Conversion Likelihood: A 75% probability of conversion means that, under the model, three out of every four journeys that begin at “Start” are expected to end in a purchase. This suggests a sequence of touchpoints that guides customers well from initial awareness to the final action of buying.
  2. Effectiveness of Marketing Mix: This high probability points to a strong alignment between the customer’s needs and the marketing efforts. It suggests that the touchpoints are well-placed and that the message resonates well with the target audience, successfully moving them through the funnel.
  3. Benchmark for Optimization: While 75% is a strong conversion rate, the remaining 25% of journeys that are expected not to convert deserve attention. They provide a benchmark for further analysis and optimization. Marketers can look into why these journeys don’t convert and what can be improved, whether that means tweaking the message, repositioning touchpoints, or addressing gaps in the customer experience.

Understanding the Removal Effect in Markov Chain Attribution Modeling

One of the key computations in this method is the “Removal Effect.” This metric helps marketers understand the importance of individual touchpoints by measuring how the removal of a particular node (touchpoint) affects the overall likelihood of conversion. Let’s see what happens.

Removing Instagram Ad:

With Instagram Ad removed, Paths 1, 2, and 3 are no longer valid because they include the Instagram Ad. Only Path 4 remains, so P(Conversion without Instagram Ad) = 1/6.

Removing Email:

With Email removed, Paths 1 and 4 are no longer valid because they pass through Email. Only Path 3 remains, so P(Conversion without Email) = 1/4.

Removing Google Search:

Removing Google Search eliminates all paths that lead to conversion because each conversion path requires passing through Google Search.

P(Conversion without Google Search) = 0
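In larger networks, removal is usually done on the transition table rather than by listing routes: every transition into the removed channel is redirected to the Null state, and the conversion probability is recomputed. The Python sketch below takes that approach on a hypothetical transition table; the redirect-to-Null convention and the iterative solver are illustrative assumptions, not necessarily how the numbers in this post were produced.

```python
# Hypothetical transition probabilities (not the network in the figures above).
PROBS = {
    "Start": {"Instagram Ad": 2 / 3, "Google Search": 1 / 3},
    "Instagram Ad": {"Google Search": 1.0},
    "Google Search": {"Email": 1 / 3, "Null": 2 / 3},
    "Email": {"Conversion": 1.0},
}


def conversion_probability(probs, tol=1e-12, max_iter=10_000):
    """P(reaching Conversion from Start).

    Iterates h(s) = sum over t of P(s -> t) * h(t), with h(Conversion) = 1
    and h(Null) = 0, until the values stop changing. Unlike listing routes
    by hand, this also works when journeys contain loops.
    """
    h = {"Conversion": 1.0, "Null": 0.0}
    h.update({state: 0.0 for state in probs})
    for _ in range(max_iter):
        delta = 0.0
        for state, row in probs.items():
            new = sum(p * h.get(nxt, 0.0) for nxt, p in row.items())
            delta = max(delta, abs(new - h[state]))
            h[state] = new
        if delta < tol:
            break
    return h["Start"]


def remove_channel(probs, channel):
    """Return a copy of `probs` without `channel`.

    The channel's own row is dropped and every transition into it is
    redirected to Null, so journeys through it can no longer convert.
    """
    reduced = {}
    for state, row in probs.items():
        if state == channel:
            continue
        new_row = {}
        for nxt, p in row.items():
            target = "Null" if nxt == channel else nxt
            new_row[target] = new_row.get(target, 0.0) + p
        reduced[state] = new_row
    return reduced


print(f"P(Conversion) = {conversion_probability(PROBS):.4f}")  # 0.3333
for channel in ("Instagram Ad", "Google Search", "Email"):
    p = conversion_probability(remove_channel(PROBS, channel))
    print(f"P(Conversion without {channel}) = {p:.4f}")  # 0.1111, 0.0000, 0.0000
```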

Calculate Removal Effects:

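It is calculated according to the formula:

Removal Effect(channel) = 1 - P(Conversion without channel) / P(Conversion)

where P(Conversion) = 75% is the overall conversion probability computed above.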
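Applying Removal Effect = 1 - P(Conversion without channel) / P(Conversion) to the probabilities above gives the percentages discussed below. This short Python check uses only numbers stated in this post (75% overall, 1/6 without Instagram Ad, 1/4 without Email, 0 without Google Search) and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Probabilities taken from the walkthrough above.
p_total = Fraction(3, 4)
p_without = {
    "Instagram Ad": Fraction(1, 6),  # only Path 4 survives
    "Email": Fraction(1, 4),         # only Path 3 survives
    "Google Search": Fraction(0),    # no conversion path survives
}

# Removal effect: relative drop in conversion probability without the channel.
removal_effect = {ch: 1 - p / p_total for ch, p in p_without.items()}

for ch, effect in removal_effect.items():
    print(f"{ch}: {effect} ≈ {float(effect):.2%}")
# Instagram Ad: 7/9 ≈ 77.78%
# Email: 2/3 ≈ 66.67%
# Google Search: 1 ≈ 100.00%
```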

Interpretation:

  • Instagram Ad: Removing the Instagram Ad reduces the conversion probability by about 77.78%, signifying its crucial role in the conversion pathway.
  • Email: Removing Email reduces the conversion probability by 66.67%, indicating its substantial influence in nurturing leads.
  • Google Search: Removing Google Search reduces the conversion probability by 100%, showing that every conversion depends on it and highlighting its essential role in securing conversions.

In the above table, we compute the conversion probability after removing each channel from the conversion paths. Because each removal effect is calculated separately, the removal effects of all channels do not sum to 100% (here they sum to about 244%). Therefore, the removal effects must be normalized.

How to Normalize the Removal Effect:

Normalizing the Removal Effect is useful when you want to compare the relative importance of different channels across scenarios or attribution models. It rescales the Removal Effects to a common scale between 0 and 1, so that each value can be read as a channel’s share of the credit for conversions.

Normalization can be done by dividing each Removal Effect by the sum of all Removal Effects. This approach scales all values so that their sum equals 1, allowing each value to represent a part of a whole.

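The normalized Removal Effect is calculated by dividing the Removal Effect of each channel by the total sum of all Removal Effects:

Normalized Removal Effect(channel) = Removal Effect(channel) / (sum of the Removal Effects of all channels)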
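A short Python check of the normalization, starting from the exact values behind the removal effects listed above (7/9, 2/3, and 1, i.e. about 77.78%, 66.67%, and 100%):

```python
from fractions import Fraction

# Exact removal effects behind the percentages listed above.
removal_effect = {
    "Instagram Ad": Fraction(7, 9),
    "Email": Fraction(2, 3),
    "Google Search": Fraction(1),
}

total = sum(removal_effect.values())  # 22/9, about 244%
normalized = {ch: effect / total for ch, effect in removal_effect.items()}

for ch, share in normalized.items():
    print(f"{ch}: {share} ≈ {float(share):.2%}")
# Instagram Ad: 7/22 ≈ 31.82%
# Email: 3/11 ≈ 27.27%
# Google Search: 9/22 ≈ 40.91%
```

Because the shares sum to exactly 1, they can be multiplied by the total number of conversions to split credit across channels.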

After normalization:

  • Instagram Ad has a normalized Removal Effect of approximately 31.82%, showing it has a significant impact.
  • Email has a normalized Removal Effect of approximately 27.27%, indicating it also plays a major role, though slightly smaller than Instagram Ad’s.
  • Google Search remains the most critical channel, with a normalized Removal Effect of approximately 40.91%.

Challenges and Future Directions

Fitting a Markov chain attribution model is not straightforward in the real world, since journey data is often large, complex, and noisy. One effective way to tackle this complexity is the ChannelAttribution library. In the next blog, we’ll explore how to use the ChannelAttribution library and how to interpret its results, providing a practical guide to implementing this attribution model.

Conclusion

Markov Chain Attribution offers a sophisticated approach to understanding digital marketing dynamics and optimizing channel performance. By capturing the subtle interactions and transitions between various customer touchpoints, it allows marketers to fine-tune their strategies based on quantitative insights, ultimately leading to more effective marketing investments and strategies. As we move forward, leveraging advanced tools like the ChannelAttribution library will be crucial in effectively implementing and deriving maximum value from Markov Chain models in complex real-world scenarios.
