Introduction
Imagine you’re a marketer working for a company that sells products to customers. As a marketer, you will likely run multiple marketing channels like Instagram, Facebook, YouTube, etc., or at least multiple campaigns. The more marketing you run, the more interactions a customer has with your business. We call the full set of these interactions for an individual customer a journey.
For example, let’s say customer A first finds out about your business through a Facebook ad. After weeks of exposure to different ads, organic content, and offline interactions, customer A converts (buys the product). Below is an example of their customer journey.
This may seem simple enough, but in reality a journey is rarely so linear or clear. So much can happen in a customer journey, both online and offline, that we can’t easily understand what goes on between the moment a customer first discovers our brand and the moment they convert. This is the messy middle.
Problem Description
As marketers, we will have multiple marketing channels, or at least multiple campaigns, so we need to allocate conversion credit to each touchpoint in a user’s journey and thereby attribute conversions to specific marketing channels. In simpler words, we need to find the contribution of each marketing channel to user conversions, so that we can invest more in the channels that contribute the most and increase our sales.
What is contribution?
Contribution is the science of understanding how your multi-channel marketing strategy contributes to your end result of conversions. If you increase spend on Facebook ads, what happens to conversions or to search volume in Google Ads? Because everything is intrinsically linked, we need to better understand the relationships between all of our channels.
Solution
One simple solution to this problem is an attribution model. There are multiple attribution models for understanding which marketing channel contributes most to conversions:
- Last-touch attribution model: This is often the default model on most advertising platforms and, like most models, it is position-based. It gives all the credit to the last interaction before a user converts.
- First-touch attribution model: This model gives all the credit to the first ad shown to the customer. It is used when a completely new product is launched, especially in a scenario where the first impression is the one that matters.
- Linear attribution model: First and last touch are common models, but they each consider only one of the many touchpoints. A step up from this is linear attribution, which gives equal credit to every touchpoint.
- Time-decay attribution model: If you want something closer to last touch, the time-decay model is for you. It gives a significant portion of the credit to the last touch and very little credit to the first touch.
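To make the differences concrete, here is a minimal sketch that splits credit for one hypothetical three-touch journey under each of the four rule-based models. The journey and the helper functions are illustrative only, not part of the dataset analysed below.

```python
# A toy comparison of the four rule-based attribution models on one
# hypothetical journey; channels and weights are illustrative only.
journey = ["Facebook", "Online Display", "Paid Search"]  # touchpoints, in order

def last_touch(path):
    # all credit to the final touchpoint
    return {ch: (1.0 if i == len(path) - 1 else 0.0) for i, ch in enumerate(path)}

def first_touch(path):
    # all credit to the first touchpoint
    return {ch: (1.0 if i == 0 else 0.0) for i, ch in enumerate(path)}

def linear(path):
    # equal credit to every touchpoint
    return {ch: 1.0 / len(path) for ch in path}

def time_decay(path, decay=0.5):
    # each earlier touchpoint gets `decay` times the credit of the next one,
    # then the weights are normalised to sum to 1
    weights = [decay ** (len(path) - 1 - i) for i in range(len(path))]
    total = sum(weights)
    return {ch: w / total for ch, w in zip(path, weights)}

print(last_touch(journey))   # all credit to Paid Search
print(linear(journey))       # one third each
print(time_decay(journey))   # most credit to Paid Search, least to Facebook
```

Note that these helpers assume each channel appears at most once in the journey; with repeated channels the credits would need to be summed per channel.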
Markov Chains
Even though we have a number of attribution models, we are going to solve this attribution problem using Markov chains.
Consider that we are company A and we are selling Bluetooth headphones to customers. We advertise our product through various marketing channels; let’s assume we use 5 channels:
- Instagram
- Facebook
- Online Display
- Paid Search
- Online Video
We want to find the individual contribution of each channel to the conversion of users. Let’s solve this using a simple probabilistic model called a Markov chain.
Our ads reach a certain population of people, and each person encounters different types of ads in a different order.
For example, a person X can go through the following path
an Online Display ad, followed by an Instagram ad, followed by a Paid Search ad, and finally he buys the Bluetooth headphones from us.
Similarly, each person can go through different Ad channels in a different order
Constructing a Markov chain
Let’s construct a directed graph where each node is a marketing channel or event, and the edge between two nodes represents the probability of a customer going from one node (channel) to another.
After constructing the Markov chain, we compute the value of each edge using probability.
For Example:
The value of the edge from the Instagram node to the Online Display node is the probability of a customer encountering an Online Display ad immediately after an Instagram ad.
We can compute this probability using our data
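As a sketch of what computing such a probability from data looks like (with made-up paths, not the real dataset): the weight of the edge A -> B is the number of observed A -> B transitions divided by the total number of transitions leaving A.

```python
from collections import defaultdict

# Made-up paths; each inner list is one user's ordered sequence of channels
paths = [
    ["Instagram", "Online Display", "Paid Search"],
    ["Instagram", "Paid Search"],
    ["Online Display", "Instagram", "Online Display"],
]

counts = defaultdict(int)    # (from, to) -> number of observed transitions
outgoing = defaultdict(int)  # from -> total transitions leaving that channel
for path in paths:
    for a, b in zip(path, path[1:]):
        counts[(a, b)] += 1
        outgoing[a] += 1

# P(Online Display | Instagram): the share of Instagram's outgoing
# transitions that go to Online Display
p = counts[("Instagram", "Online Display")] / outgoing["Instagram"]
print(p)  # 2 of Instagram's 3 outgoing transitions -> 0.666...
```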
Data overview
Our dataset has 6 columns, and each row contains details about a specific ad shown to a customer:
- Cookie: A unique id that represents a unique user
- Time: timestamp of the event
- Interaction: The type of interaction between the Ad and the user
- Conversion: Binary variable that indicates whether a conversion took place or not (whether the user purchased or not)
- Conversion value: Value of the potential conversion event
- Channel: Channel through which Ad was shown to the customers
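For illustration, a tiny hand-made frame with the same six columns might look like this. The values, and the exact column name used for the conversion value, are assumptions for this sketch, not rows from the real dataset.

```python
import pandas as pd

# illustrative rows only; column names follow the schema described above
sample = pd.DataFrame({
    "cookie": ["abc123", "abc123"],
    "time": ["2018-07-03T13:02:11Z", "2018-07-04T09:15:42Z"],
    "interaction": ["impression", "conversion"],
    "conversion": [0, 1],
    "conversion_value": [0.0, 6.0],  # assumed column name
    "channel": ["Instagram", "Paid Search"],
})
print(sample)
```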
Importing the necessary libraries
# importing the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import os
import matplotlib.pyplot as plt
Exploring Data
# loading the dataset and printing the first 5 rows
df = pd.read_csv("drive/MyDrive/attribution data.csv")
df.head(5)
# checking the shape of the dataset
df.shape
print(f"Our dataset has {df.shape[0]} rows and {df.shape[1]} columns")
# printing the features
for i in df.columns:
    print(i)
Univariate Analysis
1. Cookie
# checking how many unique cookies do we have
df['cookie'].nunique()
2. Interaction
# checking the distribution of interaction
df['interaction'].value_counts(normalize=True).plot(kind='bar',color=sns.color_palette('pastel'))
# checking the total no of conversions
df['conversion'].sum()
3. Time
earliest_time = df['time'].min()
latest_time = df['time'].max()
print("We have the data\n")
print("From ", pd.to_datetime(earliest_time, format="%Y-%m-%dT%H:%M:%SZ"))
print("To:", pd.to_datetime(latest_time, format="%Y-%m-%dT%H:%M:%SZ"))
4. Channel
# checking the unique channels
for i in df.channel.unique():
    print(i)
# checking the distribution of channels
df['channel'].value_counts().plot(kind='bar',color=sns.color_palette('pastel'),rot=0)
plt.ylabel("No of touchpoints")
plt.show()
Pre-Processing
Sorting the data using the cookie and time
# sorting the df using cookie and time
df=df.sort_values(['cookie','time'],ascending=[False,True])
Creating a feature “Visit Order” which captures the chronological order of the event
# creating a feature visit_order
df['visit_order']=df.groupby('cookie').cumcount()
df.head(5)
Creating a new data frame with 2 columns: the cookie and the ordered sequence of channels that the respective cookie went through
def combine(x):
    # keep each channel once, in order of first appearance
    return list(x.unique())
# creating a new feature which captures the order of Ads for each cookie
# (reset_index turns the cookie index back into a column so it can be merged on)
paths=df.groupby('cookie')['channel'].aggregate(combine).reset_index()
# dropping the duplicates and getting the cookie and conversion column
dropped=df.drop_duplicates('cookie',keep='last')[['cookie','conversion']]
# merging the both dataframes
dff=pd.merge(dropped,paths,how='left',on='cookie')
Defining a function to add starting and ending events for each path
- Start represents the starting event
- Conversion represents an event in which the customer got converted that is, the customer made a purchase
- Null represents an event in which the customer didn't get converted that is, the customer didn’t make any purchase
# custom function to add 'Start','Conversion','Null' events
def check(d):
    if d['conversion']==0:
        return ['Start']+d['channel']+['Null']
    else:
        return ['Start']+d['channel']+['Conversion']
# creating a new column
dff['path']=dff.apply(check,axis=1)
# selecting only the cookie and path column
df=dff[['cookie','path']]
# printing the top 20 rows
df.head(20)
# getting the paths
paths_list=df['path']
# counting how many Conversions
total_conversions=0
for i in paths_list:
    if 'Conversion' in i:
        total_conversions+=1
# calculating the conversion rate
conversion_rate=total_conversions/len(paths_list)
conversion_rate
The conversion rate is 0.07
Unique Channels
# getting the set of unique channels
unique_channels=set( j for i in paths_list for j in i)
unique_channels
Possible Transitions
Generating all possible transitions between all of the events(nodes)
# getting all the possible transitions between the nodes
dic={}
# generating the unique pairs of channels
for x in unique_channels:
    # Conversion and Null cannot be the source of a transition
    if x!='Conversion' and x!='Null':
        for y in unique_channels:
            # Start cannot be the target of a transition
            if y!='Start':
                # a channel cannot transition to itself
                if x!=y:
                    dic[x+'->'+y]=0
possible_transitions=dic
print(possible_transitions)
# checking whether each transition is present in each user's path; if so, incrementing its count
for transition in possible_transitions.keys():
    for path in paths_list:
        path="->".join(path)
        if transition in path:
            possible_transitions[transition]+=1
possible_transitions
In the above dictionary:
- Key: a transition, i.e. a pair of consecutive touchpoints in a customer’s path
- Value: the number of customers whose path contains that transition
For example, the key ‘Online Video->Online Display’ has the value 775, which indicates that 775 customers went from Online Video to Online Display.
Transition Probabilities
Calculating the transition probabilities
# defining a list which has all of the states (nodes)
lst=list(unique_channels)
lst
Let's create a transition probability matrix that stores the transition probabilities
# creating the transition probability matrix, initialised with zeros
prob_matrix=np.matrix(np.zeros((len(lst),len(lst))))
print(prob_matrix)
# filling the matrix with transition counts
for i,j in possible_transitions.items():
    i=i.split('->')
    # finding the row
    row=lst.index(i[0])
    # finding the column
    col=lst.index(i[1])
    # setting the count in the matrix
    prob_matrix[row,col]=j
prob_matrix
# calculating probabilities (dividing each element by the row sum)
prob_matrix=prob_matrix/prob_matrix.sum(axis=1)
# replacing the Nan with Zeros
np.nan_to_num(prob_matrix, copy=False, nan=0.0, posinf=None, neginf=None)
prob_matrix
fig, ax = plt.subplots(figsize=(15,7))
sns.heatmap(prob_matrix,annot=True,xticklabels=lst,yticklabels=lst,cmap="Greens",ax=ax)
The above heatmap represents the transition probability matrix
- An element Aij represents the probability of going from the ith state to the jth state
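A quick sanity check for any transition probability matrix: every row with outgoing edges must sum to 1, while absorbing states such as Conversion and Null end up with all-zero rows under our construction. A minimal sketch on a hand-built four-state matrix (values are illustrative, not the article’s matrix):

```python
import numpy as np

states = ["Start", "A", "Conversion", "Null"]
P = np.array([
    [0.0, 1.0, 0.0, 0.0],  # Start always moves to channel A
    [0.0, 0.0, 0.7, 0.3],  # from A: 70% convert, 30% drop off
    [0.0, 0.0, 0.0, 0.0],  # Conversion (absorbing)
    [0.0, 0.0, 0.0, 0.0],  # Null (absorbing)
])
# rows with outgoing edges sum to 1
assert np.allclose(P.sum(axis=1)[:2], 1.0)
# P[i, j] is the probability of moving from state i to state j
print(P[1, 2])  # probability of A -> Conversion, i.e. 0.7
```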
Removal Effect and Contribution of each channel
To find the contribution of a channel i:
- PCi : calculate the total conversion probability of the full Markov chain
- P’Ci : remove the node corresponding to channel i, then recalculate the total conversion probability
- Removal effect (contribution) of channel i : 1-(P’Ci/PCi)
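The removal_effect function below approximates this directly on the raw data by dropping the channel’s rows. For comparison, here is a minimal sketch of the matrix-based version: compute the chain’s probability of being absorbed into Conversion, then recompute it with channel i’s node redirected into Null. The five-state chain and its probabilities are hand-made for illustration, not derived from the dataset.

```python
import numpy as np

# Illustrative 5-state chain: Start, two channels A and B, Conversion, Null
states = ["Start", "A", "B", "Conversion", "Null"]
CONV, NULL = 3, 4
P = np.array([
    [0.0, 0.6, 0.4, 0.0, 0.0],  # Start
    [0.0, 0.0, 0.5, 0.3, 0.2],  # channel A
    [0.0, 0.1, 0.0, 0.4, 0.5],  # channel B
    [0.0, 0.0, 0.0, 0.0, 0.0],  # Conversion (absorbing)
    [0.0, 0.0, 0.0, 0.0, 0.0],  # Null (absorbing)
])

def conversion_prob(P, start=0, conv=CONV):
    # absorption probability into Conversion: solve (I - Q) x = r over the
    # transient states (rows that still have outgoing probability)
    transient = [i for i in range(len(P)) if P[i].sum() > 0]
    Q = P[np.ix_(transient, transient)]
    r = P[transient, conv]
    x = np.linalg.solve(np.eye(len(Q)) - Q, r)
    return x[transient.index(start)]

def markov_removal_effect(P, channel):
    # redirect all probability flowing into the removed channel to Null,
    # then drop the channel's outgoing edges
    P2 = P.copy()
    P2[:, NULL] += P2[:, channel]
    P2[:, channel] = 0.0
    P2[channel, :] = 0.0
    return 1 - conversion_prob(P2) / conversion_prob(P)

print(markov_removal_effect(P, states.index("A")))
print(markov_removal_effect(P, states.index("B")))
```

On this toy chain, removing A reduces conversions more than removing B, so A would receive the larger share of credit.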
# defining a function which computes the removal effect of a channel
def removal_effect(channel,data):
    # conversion probability before removing the channel
    before_removing=data['conversion'].sum()/len(data)
    # removing all touchpoints belonging to the channel
    removed=data[data['channel']!=channel]
    # conversion probability after removing the channel
    # (still measured over the original population size)
    after_removing=removed['conversion'].sum()/len(data)
    # calculating the contribution
    contribution=1-(after_removing/before_removing)
    return contribution
Importing the data
# in between, we have modified the original df, so let's import it fresh
df=pd.read_csv("drive/MyDrive/attribution data.csv")
Let’s go through each marketing channel and find its contribution
# iterating over the states and finding the contribution of each channel
# ('Start', 'Conversion' and 'Null' never appear in df['channel'], so their
# removal effect is 0; they are deleted from the attributions below)
contributions={}
for channel in lst:
    contributions[channel]=removal_effect(channel,df)
contributions
re_sum=np.sum([i for i in contributions.values()])
attributions={k: (v / re_sum) *total_conversions for k, v in contributions.items()}
del attributions['Start'],attributions['Conversion'],attributions['Null']
# plotting the individual contributions of each channels
attributions=pd.Series(attributions)
attributions.plot(kind='bar',rot=0,figsize=(7,5),color=sns.color_palette('colorblind'))
plt.ylabel("conversions")
plt.show()
Conclusion
The plot generated through the Markov chain clearly shows that Paid Search and Facebook contributed the most to conversions, followed by Online Video and Online Display.