Introduction
Imagine you’re a marketer working for a company that sells products to customers. As a marketer, you will likely run multiple marketing channels like Instagram, Facebook, YouTube, etc., or at least multiple campaigns. The more marketing you run, the more interactions a customer has with your business. We call the full set of these interactions for an individual customer a journey.
For example, let’s say customer A first finds out about your business through a Facebook ad. After weeks of exposure to different ads, organic content, and offline interactions, customer A converts (buys the product). Below is an example of their customer journey.
This may seem simple enough, but in reality a journey is rarely so linear or clear. So much can happen in a customer journey, both online and offline, that we can’t easily understand what goes on between the moment a customer first discovers our brand and the moment they convert. This is the messy middle.
Problem Description
As marketers, we will have multiple marketing channels, or at least multiple campaigns, so we need to allocate conversion credit to each touchpoint in a user’s journey and thereby attribute conversions to specific marketing channels. In simpler words, we need to find the contribution of each marketing channel to user conversions, so that we can invest more in the channels that contribute the most and increase our sales.
What is contribution?
Contribution is the science of understanding how your multi-channel marketing strategy contributes to your end result of conversions. If you increase spend on Facebook ads, what happens to conversions or to search volume in Google Ads? Because everything is intrinsically linked, we need to better understand the relationships between all of our channels.
Solution
One simple solution to this problem is an attribution model. There are multiple attribution models for understanding which marketing channel contributes most to conversions:
- Last-touch attribution model: This is often the default model on most advertising platforms and, like most models, it is position-based. It gives all the credit to the last interaction before a user converts.
- First-touch attribution model: This model gives all the credit to the first ad shown to the customer. It is used when a completely new product is launched, especially in a scenario where the first impression is the one that matters.
- Linear attribution model: First and last touch are common models, but they each consider only one of the many touchpoints. A step up from this is linear attribution, which gives equal credit to every touchpoint.
- Time-decay attribution model: If you want something closer to last touch, the time-decay model is for you. It gives a significant portion of the credit to the last touch and very little credit to the first touch.
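To make the differences concrete, here is a minimal sketch that splits credit for one hypothetical three-touch journey under each of the four rule-based models. The journey and the helper functions are illustrative only, not part of the dataset analysed below.

```python
# A toy comparison of the four rule-based attribution models on one
# hypothetical journey; channels and weights are illustrative only.
journey = ["Facebook", "Online Display", "Paid Search"]  # touchpoints, in order

def last_touch(path):
    # all credit to the final touchpoint
    return {ch: (1.0 if i == len(path) - 1 else 0.0) for i, ch in enumerate(path)}

def first_touch(path):
    # all credit to the first touchpoint
    return {ch: (1.0 if i == 0 else 0.0) for i, ch in enumerate(path)}

def linear(path):
    # equal credit to every touchpoint
    return {ch: 1.0 / len(path) for ch in path}

def time_decay(path, decay=0.5):
    # each earlier touchpoint gets `decay` times the credit of the next one,
    # then the weights are normalised to sum to 1
    weights = [decay ** (len(path) - 1 - i) for i in range(len(path))]
    total = sum(weights)
    return {ch: w / total for ch, w in zip(path, weights)}

print(last_touch(journey))   # all credit to Paid Search
print(linear(journey))       # one third each
print(time_decay(journey))   # most credit to Paid Search, least to Facebook
```

Note that these helpers assume each channel appears at most once in the journey; with repeated channels the credits would need to be summed per channel.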
Markov Chains
Even though we have a number of attribution models, we are going to solve this attribution problem using Markov chains.
Consider that we are company A and we are selling Bluetooth headphones to customers. We advertise our product through various marketing channels; let’s assume we use 5 channels:
- Instagram
- Facebook
- Online Display
- Paid Search
- Online Video
We want to find the individual contribution of each channel to the conversion of users. Let’s solve this using a simple probabilistic model called a Markov chain.
Our ads reach a certain population of people, and each person encounters different types of ads in a different order.
For example, a person X can go through the following path
an Online Display ad, followed by an Instagram ad, followed by a Paid Search ad, and finally he buys the Bluetooth headphones from us.
Similarly, each person can go through different Ad channels in a different order
Constructing a Markov chain
Let’s construct a directed graph where each node is a marketing channel or event, and the edge between two nodes represents the probability of a customer going from one node (channel) to another.
After constructing the Markov chain, we compute the value of each edge using probability.
For Example:
The value of the edge from the Instagram node to the Online Display node is the probability of a customer encountering an Online Display ad immediately after an Instagram ad.
We can compute this probability using our data
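As a sketch of what computing such a probability from data looks like (with made-up paths, not the real dataset): the weight of the edge A -> B is the number of observed A -> B transitions divided by the total number of transitions leaving A.

```python
from collections import defaultdict

# Made-up paths; each inner list is one user's ordered sequence of channels
paths = [
    ["Instagram", "Online Display", "Paid Search"],
    ["Instagram", "Paid Search"],
    ["Online Display", "Instagram", "Online Display"],
]

counts = defaultdict(int)    # (from, to) -> number of observed transitions
outgoing = defaultdict(int)  # from -> total transitions leaving that channel
for path in paths:
    for a, b in zip(path, path[1:]):
        counts[(a, b)] += 1
        outgoing[a] += 1

# P(Online Display | Instagram): the share of Instagram's outgoing
# transitions that go to Online Display
p = counts[("Instagram", "Online Display")] / outgoing["Instagram"]
print(p)  # 2 of Instagram's 3 outgoing transitions -> 0.666...
```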
Data overview
Our dataset has 6 columns, and each row contains details about a specific ad shown to a customer:
- Cookie: A unique id that represents a unique user
- Time: timestamp of the event
- Interaction: The type of interaction between the Ad and the user
- Conversion: Binary variable that indicates whether a conversion took place or not (whether the user purchased or not)
- Conversion value: Value of the potential conversion event
- Channel: Channel through which Ad was shown to the customers
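For illustration, a tiny hand-made frame with the same six columns might look like this. The values, and the exact column name used for the conversion value, are assumptions for this sketch, not rows from the real dataset.

```python
import pandas as pd

# illustrative rows only; column names follow the schema described above
sample = pd.DataFrame({
    "cookie": ["abc123", "abc123"],
    "time": ["2018-07-03T13:02:11Z", "2018-07-04T09:15:42Z"],
    "interaction": ["impression", "conversion"],
    "conversion": [0, 1],
    "conversion_value": [0.0, 6.0],  # assumed column name
    "channel": ["Instagram", "Paid Search"],
})
print(sample)
```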
Importing the necessary libraries
# importing the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import os
import matplotlib.pyplot as plt
Exploring Data
# loading the dataset and printing the first 5 rows
df = pd.read_csv("drive/MyDrive/attribution data.csv")
df.head(5)
# checking the shape of the dataset
df.shape
print(f"Our dataset has {df.shape[0]} rows and {df.shape[1]} columns")
# printing the features
for i in df.columns:
    print(i)
Univariate Analysis
1. Cookie
# checking how many unique cookies do we have
df['cookie'].nunique()
2. Interaction
# checking the distribution of interaction
df['interaction'].value_counts(normalize=True).plot(kind='bar',color=sns.color_palette('pastel'))
# checking the total no of conversions
df['conversion'].sum()
3. Time
earliest_time = df['time'].min()
latest_time = df['time'].max()
print("We have the data\n")
print("From ", pd.to_datetime(earliest_time, format="%Y-%m-%dT%H:%M:%SZ"))
print("To:", pd.to_datetime(latest_time, format="%Y-%m-%dT%H:%M:%SZ"))
4. Channel
# checking the unique channels
for i in df.channel.unique():
    print(i)
# checking the distribution of channels
df['channel'].value_counts().plot(kind='bar',color=sns.color_palette('pastel'),rot=0)
plt.ylabel("No of touchpoints")
plt.show()
Pre-Processing
Sorting the data using the cookie and time
# sorting the df using cookie and time
df=df.sort_values(['cookie','time'],ascending=[False,True])
Creating a feature “Visit Order” which captures the chronological order of the event
# creating a feature visit_order
df['visit_order']=df.groupby('cookie').cumcount()
df.head(5)
Creating a new data frame with 2 columns: the cookie and the ordered sequence of channels that the respective cookie went through
def combine(x):
    # keep each channel once, in order of first appearance
    return list(x.unique())
# creating a new feature which captures the order of Ads for each cookie
# (reset_index turns the cookie index back into a column so it can be merged on)
paths=df.groupby('cookie')['channel'].aggregate(combine).reset_index()
# dropping the duplicates and getting the cookie and conversion column
dropped=df.drop_duplicates('cookie',keep='last')[['cookie','conversion']]
# merging the both dataframes
dff=pd.merge(dropped,paths,how='left',on='cookie')
Defining a function to add starting and ending events for each path
- Start represents the starting event
- Conversion represents an event in which the customer got converted that is, the customer made a purchase
- Null represents an event in which the customer didn't get converted that is, the customer didn’t make any purchase
# custom function to add 'Start','Conversion','Null' events
def check(d):
    if d['conversion']==0:
        return ['Start']+d['channel']+['Null']
    else:
        return ['Start']+d['channel']+['Conversion']
# creating a new column
dff['path']=dff.apply(check,axis=1)
# selecting only the cookie and path column
df=dff[['cookie','path']]
# printing the top 20 rows
df.head(20)
# getting the paths
paths_list=df['path']
# counting how many Conversions
total_conversions=0
for i in paths_list:
    if 'Conversion' in i:
        total_conversions+=1
# calculating the conversion rate
conversion_rate=total_conversions/len(paths_list)
conversion_rate
The conversion rate is 0.07
Unique Channels
# getting the set of unique channels
unique_channels=set( j for i in paths_list for j in i)
unique_channels
Possible Transitions
Generating all possible transitions between all of the events(nodes)
# getting all the possible transitions between the nodes
dic={}
# generating the unique pairs of channels
for x in unique_channels:
    # Conversion and Null cannot be the source of a transition
    if x!='Conversion' and x!='Null':
        for y in unique_channels:
            # Start cannot be the target of a transition
            if y!='Start':
                # a channel cannot transition to itself
                if x!=y:
                    dic[x+'->'+y]=0
possible_transitions=dic
print(possible_transitions)
# checking whether each transition is present in each user's path; if so, incrementing its count
for transition in possible_transitions.keys():
    for path in paths_list:
        path="->".join(path)
        if transition in path:
            possible_transitions[transition]+=1
possible_transitions
In the above dictionary:
- Key: a transition, i.e. a pair of consecutive touchpoints in a customer’s path
- Value: the number of customers whose path contains that transition
For example, the key ‘Online Video->Online Display’ has the value 775, which indicates that 775 customers went from Online Video to Online Display.
Transition Probabilities
Calculating the transition probabilities
# defining a list which has all of the states (nodes)
lst=list(unique_channels)
lst
Let's create a transition probability matrix that stores the transition probabilities
# creating the transition probability matrix, initialised with zeros
prob_matrix=np.matrix(np.zeros((len(lst),len(lst))))
print(prob_matrix)
# filling the matrix with transition counts
for i,j in possible_transitions.items():
    i=i.split('->')
    # finding the row
    row=lst.index(i[0])
    # finding the column
    col=lst.index(i[1])
    # setting the count in the matrix
    prob_matrix[row,col]=j
prob_matrix
# calculating probabilities (dividing each element by the row sum)
prob_matrix=prob_matrix/prob_matrix.sum(axis=1)
# replacing the Nan with Zeros
np.nan_to_num(prob_matrix, copy=False, nan=0.0, posinf=None, neginf=None)
prob_matrix
fig, ax = plt.subplots(figsize=(15,7))
sns.heatmap(prob_matrix,annot=True,xticklabels=lst,yticklabels=lst,cmap="Greens",ax=ax)
The above heatmap represents the transition probability matrix
- An element Aij represents the probability of going from the ith state to the jth state
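A quick sanity check for any transition probability matrix: every row with outgoing edges must sum to 1, while absorbing states such as Conversion and Null end up with all-zero rows under our construction. A minimal sketch on a hand-built four-state matrix (values are illustrative, not the article’s matrix):

```python
import numpy as np

states = ["Start", "A", "Conversion", "Null"]
P = np.array([
    [0.0, 1.0, 0.0, 0.0],  # Start always moves to channel A
    [0.0, 0.0, 0.7, 0.3],  # from A: 70% convert, 30% drop off
    [0.0, 0.0, 0.0, 0.0],  # Conversion (absorbing)
    [0.0, 0.0, 0.0, 0.0],  # Null (absorbing)
])
# rows with outgoing edges sum to 1
assert np.allclose(P.sum(axis=1)[:2], 1.0)
# P[i, j] is the probability of moving from state i to state j
print(P[1, 2])  # probability of A -> Conversion, i.e. 0.7
```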
Removal Effect and Contribution of each channel
To find the contribution of a channel i:
- PCi : calculate the total conversion probability of the full Markov chain
- P’Ci : remove the node corresponding to channel i, then recalculate the total conversion probability
- Removal effect (contribution) of channel i : 1-(P’Ci/PCi)
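The removal_effect function below approximates this directly on the raw data by dropping the channel’s rows. For comparison, here is a minimal sketch of the matrix-based version: compute the chain’s probability of being absorbed into Conversion, then recompute it with channel i’s node redirected into Null. The five-state chain and its probabilities are hand-made for illustration, not derived from the dataset.

```python
import numpy as np

# Illustrative 5-state chain: Start, two channels A and B, Conversion, Null
states = ["Start", "A", "B", "Conversion", "Null"]
CONV, NULL = 3, 4
P = np.array([
    [0.0, 0.6, 0.4, 0.0, 0.0],  # Start
    [0.0, 0.0, 0.5, 0.3, 0.2],  # channel A
    [0.0, 0.1, 0.0, 0.4, 0.5],  # channel B
    [0.0, 0.0, 0.0, 0.0, 0.0],  # Conversion (absorbing)
    [0.0, 0.0, 0.0, 0.0, 0.0],  # Null (absorbing)
])

def conversion_prob(P, start=0, conv=CONV):
    # absorption probability into Conversion: solve (I - Q) x = r over the
    # transient states (rows that still have outgoing probability)
    transient = [i for i in range(len(P)) if P[i].sum() > 0]
    Q = P[np.ix_(transient, transient)]
    r = P[transient, conv]
    x = np.linalg.solve(np.eye(len(Q)) - Q, r)
    return x[transient.index(start)]

def markov_removal_effect(P, channel):
    # redirect all probability flowing into the removed channel to Null,
    # then drop the channel's outgoing edges
    P2 = P.copy()
    P2[:, NULL] += P2[:, channel]
    P2[:, channel] = 0.0
    P2[channel, :] = 0.0
    return 1 - conversion_prob(P2) / conversion_prob(P)

print(markov_removal_effect(P, states.index("A")))
print(markov_removal_effect(P, states.index("B")))
```

On this toy chain, removing A reduces conversions more than removing B, so A would receive the larger share of credit.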
# defining a function which computes the removal effect of a channel
def removal_effect(channel,data):
    # conversion probability before removing the channel
    before_removing=data['conversion'].sum()/len(data)
    # removing all touchpoints belonging to the channel
    removed=data[data['channel']!=channel]
    # conversion probability after removing the channel
    # (still measured over the original population size)
    after_removing=removed['conversion'].sum()/len(data)
    # calculating the contribution
    contribution=1-(after_removing/before_removing)
    return contribution
Importing the data
# in between, we have modified the original df, so let's import it fresh
df=pd.read_csv("drive/MyDrive/attribution data.csv")
Let’s go through each marketing channel and find its contribution
# iterating over the states and finding the contribution of each channel
# ('Start', 'Conversion' and 'Null' never appear in df['channel'], so their
# removal effect is 0; they are deleted from the attributions below)
contributions={}
for channel in lst:
    contributions[channel]=removal_effect(channel,df)
contributions
re_sum=np.sum([i for i in contributions.values()])
attributions={k: (v / re_sum) *total_conversions for k, v in contributions.items()}
del attributions['Start'],attributions['Conversion'],attributions['Null']
# plotting the individual contributions of each channels
attributions=pd.Series(attributions)
attributions.plot(kind='bar',rot=0,figsize=(7,5),color=sns.color_palette('colorblind'))
plt.ylabel("conversions")
plt.show()
Conclusion
The plot generated through the Markov chain clearly shows that Paid Search and Facebook contributed the most to conversions, followed by Online Video and Online Display.