MACHINE LEARNING PROJECT (DIGITAL MARKETING DOMAIN)

Nishesh Gogia
Apr 22, 2022


(Attribute_Marketing_Models)

Machine Learning is changing the world. Name an industry, and somebody has built a Machine Learning model to solve a problem in it.

Whether it is the Medical domain, the HR domain or the Sales domain, there are Machine Learning models solving problems in each of them.

A very interesting domain is Digital Marketing. Nowadays there is hardly a company that does not rely on Digital Marketing to get business, so in this blog we are going to talk about,

“HOW MACHINE LEARNING IS CHANGING THE DIGITAL MARKETING INDUSTRY?”

Let’s start with a very simple example. Say a user wants to buy a mobile phone, and let’s assume he wants an iPhone specifically. He goes directly to Amazon to see the price and the different models, spends some time on the app, and then closes it.

Now Amazon, as a company, understands that this user could be a potential buyer, so it does not want to let such a customer go.

There are channels on which Amazon spends money to acquire customers. When I say channels, I mean Digital Marketing channels, for example:

  1. GOOGLE ADS
  2. FACEBOOK FEED
  3. INSTAGRAM FEED
  4. YOUTUBE ADS
  5. TWITTER FEED

And many more…

Now let’s say that after closing the Amazon app, the user:

  1. Went to Facebook, where he saw a FACEBOOK AD IMPRESSION for the iPhone; he ignored the ad and moved on.
  2. Then went to Google and searched for the iPhone, where he saw a GOOGLE AD IMPRESSION; he ignored the ad and moved on.
  3. Then was on Instagram, where an INSTAGRAM AD IMPRESSION was shown to him; he moved on.
  4. Finally, on YouTube, he saw the ad again.

After a while, he went to the Amazon website directly and made the purchase.

The company got the customer, and let’s say the user spent around 1 lakh rupees on the iPhone.

Now the important question is,

“HOW MUCH DID EACH CHANNEL CONTRIBUTE TO THE PURCHASE OF 1 LAKH RUPEES?”

OR

“HOW MUCH DID EACH AD IMPRESSION/CLICK CONTRIBUTE TO FINAL PURCHASE”?

Now this question sounds simple, but finding the answer can be very challenging, because we need to understand what finally pushed the customer to buy the product, what exactly worked in the marketing campaign and, more importantly, what did not work, so that the company can plan its marketing funds accordingly.

For example, let’s say 80 percent of the sales from this marketing campaign come from YouTube ads, 10 percent from Google Search, 5 percent from Instagram and 5 percent from Facebook.

Now, if a company can get this detailed breakdown of revenue by channel, it will give more of its marketing budget to YouTube than to the other channels.

This can literally help companies to understand the market in depth and will increase their revenue over time.

POSSIBLE APPROACHES

Approach-1-(Rule-Based/Intuitive Systems)

  1. These are simple rule-based systems which are driven more by gut feeling than by data.
  2. Here we can say clicks are more important than impressions; an impression is simply when you see an ad but do not click it.
  3. Search ads are more influential than display ads. The intuition is simple: if a person actively searches for something on the internet, there is a higher chance that he/she will buy it, whereas when a person just sees an ad on social media, the chance of conversion is lower.
  4. One simple rule is the LAST INTERACTION ATTRIBUTION MODEL. For example, let’s say I went to Facebook and saw an ad for headphones, then went to YouTube and saw the same ad, then went to Google, searched for those headphones, clicked on the Google ad and finally purchased them. According to the LAST INTERACTION ATTRIBUTION MODEL, we give 90% of the credit to the last channel (Google Ads) and split the small remainder among the rest of the channels.
  5. Another simple rule-based system is the FIRST INTERACTION ATTRIBUTION MODEL, where we assume that the first ad the user saw was responsible for the conversion: 90% of the credit goes to the first channel and the rest to the others. This typically fits luxury items or any new product, say A NEW MACBOOK, where the first channel could be the YouTube presentation by APPLE.
  6. Another is the TIME DECAY MODEL, where we assume that the credit given to channels should increase the closer they are to the purchase. For example, let’s say we showed the smartphone ad to the user on Google Search, then on the Instagram feed, then on the Facebook feed and finally on YouTube, and the user then bought the smartphone. According to the time decay model, we give the maximum credit to YouTube, say 60%, then Facebook, say 25%, then Instagram, say 10%, and finally Google Ads, say 5%.
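The three rule-based schemes above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the journey, channel names and credit shares are made-up assumptions, and channels are assumed to appear at most once per journey.

```python
# Rule-based attribution sketches: last-interaction, first-interaction, time-decay.
# Credit shares (0.9 for the favoured channel, decay factor 0.5) are illustrative.

def last_interaction(journey, last_share=0.9):
    """Give last_share of the credit to the final channel, split the rest equally."""
    rest = (1 - last_share) / (len(journey) - 1)
    credits = {channel: rest for channel in journey[:-1]}
    credits[journey[-1]] = last_share
    return credits

def first_interaction(journey, first_share=0.9):
    """Give first_share of the credit to the first channel, split the rest equally."""
    rest = (1 - first_share) / (len(journey) - 1)
    credits = {channel: rest for channel in journey[1:]}
    credits[journey[0]] = first_share
    return credits

def time_decay(journey, decay=0.5):
    """Touchpoints closer to the purchase get exponentially more credit."""
    weights = [decay ** (len(journey) - 1 - i) for i in range(len(journey))]
    total = sum(weights)
    return {channel: w / total for channel, w in zip(journey, weights)}

journey = ["GoogleAds", "Instagram", "Facebook", "Youtube"]
print(last_interaction(journey))
print(time_decay(journey))
```

In each scheme the credits sum to 1, so they can be read directly as the share of the 1-lakh purchase attributed to each channel.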

Approach-2-(Regression-Based Machine Learning Models)

Why not map this problem as a Machine Learning problem?

So let’s say we have some historical data that tells us the revenue by different channels and the total sales for that campaign. We can build a simple regression model where the features are the different channels and sales is the target variable.

Regression will tell us how the independent features (the different channels) are mathematically related to the dependent variable (sales).

y= x1(Youtube) + x2(Facebook) + x3(Instagram) + x4(GoogleAds)

If I find the values of x1, x2, x3 and x4, they will give me a good approximation of how much Youtube, Facebook, Instagram and GoogleAds are each contributing.

That’s what we want, right…

Now, before going to the next approach, which is Markov Chains, let’s see some code for the regression approach to attribution modelling.

Data for Regression Based Attribute Model

I have downloaded the data from Kaggle. It seems to be dummy data, but it is not bad for understanding the regression approach.

Link for the data:- https://www.kaggle.com/datasets/sazid28/advertising.csv

The data has 3 features, or channels: TV, radio and newspaper. In this data we have the revenue amount by each channel and then a total sales column.

For example, in the first row, total sales is 22.1 lakhs: 2 lakh 30 thousand comes from TV, 37 thousand from radio and 69 thousand from newspaper; the remaining amount can be assumed to come from direct sales.

Performance Metric

We need to check the performance of the model with:

1. RMSE (Root Mean Square Error)

2. R² (Coefficient of Determination)

3. Adjusted R²
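Before the modelling code, here is a quick sketch of how these three metrics are computed. The y_true/y_pred numbers are made up for illustration; n is the number of samples and p the number of features (3 channels here).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# illustrative actual vs. predicted sales values (not from the dataset)
y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9, 7.2])
y_pred = np.array([21.5, 11.0, 8.9, 19.2, 12.1, 8.0])

# RMSE: square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# R²: fraction of the variance in y_true explained by the model
r2 = r2_score(y_true, y_pred)

# Adjusted R² penalises R² for the number of features used
n, p = len(y_true), 3  # 3 channels: TV, radio, newspaper
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(rmse, r2, adj_r2)
```

Note that Adjusted R² is always at most R², since it discounts the score for every extra feature.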

Let’s see some code for the regression model…

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

import warnings
warnings.filterwarnings("ignore")

# load the data and drop the leftover index column
data = pd.read_csv("Advertising.csv")
data2 = data.drop(['Unnamed: 0'], axis=1)

# pair plot of every feature against every other
sns.pairplot(data2)
plt.show()

Observation

  1. Ideally we want no correlation among the independent features, and here we can see there is no correlation between TV and radio, between TV and newspaper, or between newspaper and radio, which is a good sign.
  2. We can also see positive correlation between TV and sales, radio and sales, and newspaper and sales, which simply means all three are contributing to sales in some way.
  3. The histogram of radio is almost uniformly distributed, meaning the revenue from radio occurs with similar frequency across the available range.
  4. The histogram of newspaper is power-law distributed, meaning most of the revenue values from newspaper are low compared to TV and radio.
  5. The histogram of TV is approximately left-skewed, meaning most of its revenue values are towards the high side when compared to radio and newspaper.
# let's find the correlation between features with a heatmap
plt.figure(figsize=(8, 6))
heatmap = sns.heatmap(data2.corr(), vmin=-1, vmax=1, annot=True)
heatmap.set_title('Correlation Heatmap', fontdict={'fontsize': 12}, pad=12)
plt.show()

Observation

  1. There is a high correlation between TV and sales, 0.78, which means that as we market our product on TV, our sales increase significantly.
  2. There is also correlation between radio and sales, 0.58, which means radio also contributes significantly to sales, but not as much as TV.
  3. There is a positive correlation between newspaper and sales too, but smaller than for TV and radio; it simply means advertising in newspapers adds revenue, but not as much as TV and radio do.
  4. The other correlations are very small, so we can ignore them.
#Just Dividing data into input(features) and output(target variable)
x2=data2.iloc[:,0:3]
y2=data2['sales']

Modelling

We are going to train Linear Regression, Random Forest Regressor and GBDT Regressor models.

For the full code, please refer to https://github.com/Nishesh2115/Attribute_Marketing-Marketing_Mix_Models/blob/main/Attribution%20Modelling%20In%20Marketing.ipynb
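The modelling step can be condensed as below. This is a sketch, not the notebook itself: it rebuilds stand-in data with the same column names so the snippet runs alone (in practice you would use x2 and y2 from the Advertising.csv cells above), and the coefficients generating y are assumptions for the demo.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

# stand-in data with the same channel columns (replace with x2/y2 from above)
rng = np.random.default_rng(0)
X = pd.DataFrame({"TV": rng.uniform(0, 300, 200),
                  "radio": rng.uniform(0, 50, 200),
                  "newspaper": rng.uniform(0, 100, 200)})
y = 0.05 * X["TV"] + 0.18 * X["radio"] + 0.01 * X["newspaper"] + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {"Linear Regression": LinearRegression(),
          "Random Forest": RandomForestRegressor(random_state=42),
          "GBDT": GradientBoostingRegressor(random_state=42)}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "RMSE:", np.sqrt(mean_squared_error(y_test, pred)),
          "R2:", r2_score(y_test, pred))

# per-channel weights: tree models expose feature_importances_,
# linear regression exposes raw coefficients
print(models["Random Forest"].feature_importances_)
print(models["Linear Regression"].coef_)
```

The tree models’ `feature_importances_` sum to 1, which is why they can be read directly as contribution shares; linear regression coefficients are on the scale of the features, so they are not directly comparable across channels.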

Feature Importance Score for Random Forest

This kind of equation really helps us understand the contribution of each channel to revenue:

sales = (0.6828260179480447 * TV) + (0.30882519413197784 * Radio) + (0.008348787919977457 * Newspaper)

Feature Importance of Linear Regression

sales = (0.0475577623724944 * TV) + (0.1801944159143916 * Radio) + (0.006464961489965146 * Newspaper)

Linear regression is giving more importance to Radio than to TV and Newspaper.

Feature Importance of GBDT

sales = (0.7091736793518066 * TV) + (0.28797149658203125 * Radio) + (0.002854815451428294 * Newspaper)

GBDT is giving the highest importance to TV.

CONCLUSION

from prettytable import PrettyTable

columns = ["ALGORITHM", "MEAN_SQUARE_ERROR", "R2 VALUE", "ADJUSTED R2 VALUE"]
myTable = PrettyTable()
myTable.add_column(columns[0], ["Random Forest", "Linear Regression",
                                "Linear Regression with L2 Regularizer", "GBDT"])
myTable.add_column(columns[1], ["0.333", "2.62", "2.63", "0.486"])
myTable.add_column(columns[2], ["0.982", "0.859", "0.858", "0.9738"])
myTable.add_column(columns[3], ["0.981", "0.856", "0.856", "0.973"])

print(myTable)
  1. Out of all four algorithms, Random Forest has the least mean square error, 0.333; we want the mean square error to be as small as possible.
  2. Out of all four algorithms, Random Forest has the maximum R² value, 0.982.
  3. Out of all four algorithms, Random Forest has the maximum Adjusted R² value, 0.981.
  4. Random Forest also gives us very good feature importance scores:

sales = (0.6828260179480447 * TV) + (0.30882519413197784 * Radio) + (0.008348787919977457 * Newspaper)

This can simply be interpreted as: 68.2% of the sales come from TV marketing, 30.8% from radio marketing and only 0.8% from newspaper marketing.

So the best algorithm in this case study will be RANDOM FOREST.

Approach-3 is the Markov Model; let me write another blog for that concept.

For more Data Science Case Studies, you can follow me on medium.

Thanks for Reading…

Nishesh Gogia
