Data-Driven Marketing Optimization

Harnessing the Power of Machine Learning Models, Visualizations, and Hypothesis Testing for Marketing Success

Aug 9, 2023

Context:

As a Data Analyst, I have been tasked by the Chief Marketing Officer (CMO) with analysing the marketing campaign data and providing data-driven insights and suggestions. My goal is to use the data to identify patterns and offer informed recommendations that can optimise our marketing strategies and decision-making process, so that more customers purchase the bank’s policy (a term deposit).

Dataset:

Bank Marketing Dataset: Predicting Term Deposit Subscriptions (www.kaggle.com)

Analysis Process:

Understand, clean, and transform the data, then build machine learning models that predict whether a customer purchases the policy.
Use SHAP to understand feature importance.
Gain insights based on feature importance using data visualisation.
Perform hypothesis testing to evaluate the statistical significance of the insights.
Recommend a marketing strategy for better results.

Building the Machine Learning Model:

Loading the data

import pandas as pd
import numpy as np
df=pd.read_csv('BankMarketing.csv')

Checking the data information shows that there are no NULL values:

df.info()

df.head()

Pre-processing:

There are many categorical columns in our dataset; we need to convert them to numeric values.
To compare the monthly trend in the campaign’s success rate, we need to extract the month from the ‘date’ column.
The ‘duration’ column records the length of the call with the customer. This attribute strongly affects the output target (e.g., if duration=0 then y=’no’), yet the duration is not known before a call is performed, and once the call ends, y is obviously known. Keeping it would leak the outcome into the model, so we remove this column before building the ML model.
Shuffle the data to enhance the model’s generalization and improve its ability to handle unseen data.
Normalise the data to prevent any feature from dominating others due to differences in their scales.

Functions to convert categorical values to numeric:

def categorical_to_numerical(df,col):
    df=pd.get_dummies(data=df,columns=col)
    return df

def categorical_to_numerical_drop_first(df,col):
    df=pd.get_dummies(data=df,columns=col,drop_first=True)
    return df

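For the binary columns (‘housing’, ‘loan’, ‘Purchased?’, ‘default’) we use the second function, ‘categorical_to_numerical_drop_first’, which drops the first encoded category to prevent multicollinearity: the remaining dummy column already determines the dropped one.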

df=categorical_to_numerical(df,['marital','job','education','contact','poutcome'])
df=categorical_to_numerical_drop_first(df,['housing','loan','Purchased?','default'])
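For a binary column, the two dummy columns always sum to 1, so one of them is redundant. Below is a minimal pure-Python sketch of the idea; the ‘one_hot’ helper is hypothetical and only mimics the naming used by pd.get_dummies, which sorts the categories and, with drop_first=True, drops the first one.

```python
def one_hot(values, prefix, drop_first=False):
    """Return {"<prefix>_<category>": [0/1, ...]} for a list of category labels.

    Hypothetical helper: categories are sorted and, if drop_first is True,
    the first one is dropped (mirroring the idea behind pd.get_dummies).
    """
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]
    return {f"{prefix}_{c}": [1 if v == c else 0 for v in values] for c in categories}


housing = ["yes", "no", "no", "yes"]
full = one_hot(housing, "housing")
reduced = one_hot(housing, "housing", drop_first=True)

# Every row of the full encoding sums to 1, i.e. housing_no == 1 - housing_yes
assert all(no + yes == 1 for no, yes in zip(full["housing_no"], full["housing_yes"]))
print(reduced)  # {'housing_yes': [1, 0, 0, 1]}
```

Multicollinearity mainly affects linear models; a random forest is not harmed by keeping every dummy, which is why the multi-valued columns above can keep all of theirs.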

Create a ‘month’ column by extracting the month from the ‘date’ column:

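# Extract the month (the first '/'-separated part of 'date') for every row at once;
# assigning through df['month'].iloc[index] in a loop is chained assignment and may not update df
df['month'] = df['date'].str.split('/').str[0].astype(int)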

Drop the ‘duration’ and ‘date’ columns

df.drop(['date','duration'],axis=1,inplace=True)

Shuffle the data

df = df.sample(frac=1, random_state=3)

Normalizing data

from sklearn.preprocessing import MinMaxScaler

minmax=MinMaxScaler()

min_max_data=minmax.fit_transform(df.drop('Purchased?_yes',axis=1))
x=pd.DataFrame(min_max_data,columns=df.columns.drop('Purchased?_yes'))
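MinMaxScaler rescales each column independently using x' = (x − min) / (max − min), so every feature lands in the range [0, 1]. A minimal pure-Python sketch of that formula for a single column (the balance values are invented for illustration):

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


balances = [-500, 0, 1500, 9500]  # illustrative values, not taken from the dataset
scaled = min_max_scale(balances)
print(scaled)  # [0.0, 0.05, 0.2, 1.0]
```

Tree-based models such as the random forest used below split on thresholds, so this monotonic rescaling does not change which splits they can make; the step matters more for distance- or gradient-based models.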

Splitting the data into training and test sets

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,df['Purchased?_yes'],test_size=0.25)

Building the Random Forest model

from sklearn.ensemble import RandomForestClassifier

classifier=RandomForestClassifier()
classifier.fit(x_train,y_train)

Testing the model and measuring its accuracy

y_cap=classifier.predict(x_test)
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_cap)

print(f"Accuracy: {accuracy*100:.2f}%")
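Accuracy on its own can be misleading when most customers do not purchase: a model that always predicts ‘no’ already scores the share of non-buyers. A minimal pure-Python sketch (toy labels, not from the dataset) comparing accuracy with precision and recall for the ‘yes’ class:

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for the positive class (1 = purchased)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # buyers correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-buyers flagged as buyers
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # buyers the model missed
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels: 1 buyer out of 10 customers
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
always_no = [0] * 10

print(evaluate(y_true, always_no))  # (0.9, 0.0, 0.0)
```

With scikit-learn, precision_score, recall_score and classification_report from sklearn.metrics report the same metrics for y_test and y_cap.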

Understanding feature importance using SHAP

import shap
shap.initjs()

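explainer = shap.Explainer(classifier)
shap_values = explainer.shap_values(x_test)
# Older SHAP releases return a list with one array per class; newer ones return a single
# array shaped (n_samples, n_features, n_classes). Select the "purchased" class (1) either way.
shap_values_yes = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]
shap.summary_plot(shap_values_yes, x_test)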
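As far as we know, the summary plot orders features by the overall magnitude of their SHAP values across the test rows; ranking by the mean absolute SHAP value gives the same order. A minimal pure-Python sketch of that ranking on a made-up SHAP matrix (rows are customers, columns are features; the names and values are illustrative only):

```python
def rank_features(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (largest impact first)."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values for three customers and three features
features = ["age", "balance", "campaign"]
shap_matrix = [
    [0.10, -0.02, -0.05],
    [-0.30, 0.04, -0.01],
    [0.20, 0.00, -0.03],
]
for name, score in rank_features(shap_matrix, features):
    print(f"{name}: {score:.3f}")
```

The sign of each individual SHAP value (colour and position in the plot) tells whether that feature pushed a particular customer towards or away from a purchase; the ranking only uses the magnitude.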

Findings and factors to investigate further:

Housing loan or personal loan holders may show reduced interest in purchasing a policy.
The impact of months on policy purchase seems to be mixed and requires further investigation.
Elderly customers may be more inclined to purchase a policy.
Customers with higher bank balances may have a higher likelihood of purchasing a policy.
A higher number of campaigns might lead to a lower likelihood of customers purchasing a policy.
Past customers who have made previous purchases may exhibit a higher tendency to purchase the current policy.
Further exploration is needed to analyze the relationship between customers’ job roles and their success rate in purchasing policies.

Success rate of policy purchases with respect to loans

Customers with no housing loan tend to purchase policies more than the ones with housing loans.

The same pattern holds for personal loans: customers without a personal loan purchase policies more often than those with one.
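The ‘success rate’ in these comparisons is the number of purchases divided by the number of customers contacted in each group. A minimal pure-Python sketch of that calculation for a loan flag (the records are invented; the field names only echo the dataset’s ‘loan’ column and purchase outcome):

```python
from collections import defaultdict


def success_rate_by(records, key):
    """Return {group: purchases / customers} for the given grouping key."""
    totals = defaultdict(int)
    purchases = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        purchases[group] += record["purchased"]
    return {group: purchases[group] / totals[group] for group in totals}


customers = [
    {"loan": "no", "purchased": 1},
    {"loan": "no", "purchased": 0},
    {"loan": "no", "purchased": 1},
    {"loan": "no", "purchased": 0},
    {"loan": "yes", "purchased": 0},
    {"loan": "yes", "purchased": 1},
    {"loan": "yes", "purchased": 0},
    {"loan": "yes", "purchased": 0},
]
print(success_rate_by(customers, "loan"))  # {'no': 0.5, 'yes': 0.25}
```

On the encoded DataFrame, the equivalent pandas expression would be df.groupby('loan_yes')['Purchased?_yes'].mean(), since the mean of a 0/1 column is its success rate.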

Impact of months on the policy purchase

We observe that our campaign frequency is higher during the spring and summer seasons. However, the effectiveness of these campaigns is relatively lower. On the other hand, we conduct fewer campaigns during the winter and fall seasons, but their success rate is significantly better.

Impact of customer age on the policy purchase

Let us first have a look at which age group people buy most of the policies:

The majority of our policies are purchased by customers aged between 25 and 50. This is likely because our campaigns primarily target individuals in this age range, so it is logical that a significant portion of our policies are purchased by customers in this demographic.

Let’s analyse the success rate with respect to age:

We ran many campaigns for customers aged 25–50, but their success rate is low. By contrast, although we ran very few campaigns for elderly customers, their success rate is significantly higher.

Surprisingly, young people under 25 also tend to accept the policy more often than customers aged 25–75.
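These comparisons depend on how ages are bucketed. A minimal pure-Python sketch that assigns the three bands discussed here (under 25, 25–75, over 75) with bisect and computes each band’s success rate; the boundaries follow the text, integer ages are assumed, and the sample customers are invented:

```python
from bisect import bisect_right

# Band edges matching the text: <25, 25-75, >75 (integer ages assumed)
EDGES = [25, 76]
LABELS = ["<25", "25-75", ">75"]


def age_band(age):
    """Map an integer age to its band label."""
    return LABELS[bisect_right(EDGES, age)]


def success_rate_by_band(ages, purchased):
    """Return {band: purchases / customers}, or None for an empty band."""
    counts = {label: [0, 0] for label in LABELS}  # label -> [purchases, customers]
    for age, bought in zip(ages, purchased):
        band = counts[age_band(age)]
        band[0] += bought
        band[1] += 1
    return {label: (p / n if n else None) for label, (p, n) in counts.items()}


ages = [22, 24, 30, 45, 60, 75, 80, 85]
purchased = [1, 0, 0, 0, 1, 0, 1, 1]
print(success_rate_by_band(ages, purchased))  # {'<25': 0.5, '25-75': 0.25, '>75': 1.0}
```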

Impact of bank balance on the policy purchase

Customers with bank balances of more than 80K tend to purchase more.

Impact of campaigns on the policy purchase

Customers tend to make purchases when there are fewer campaigns. Increasing the number of campaigns dramatically reduces the success rate.

Customers who accepted previous offers are more likely to purchase the current policy.

Impact of the job role of customers on the policy purchase

The focus of our campaigns primarily centres around customers in blue-collar, management, technician, and service roles; however, the success rate remains relatively low. On the other hand, students and retired customers appear to display a higher inclination towards purchasing policies.

Hypothesis Testing:

One interesting finding from the analysis above is that both older and younger customers show a higher tendency to purchase policies. However, before making any decisions, such as halting campaigns for other age groups, we need to determine whether this difference is statistically significant.

Alternative Hypothesis: Customers aged under 25 or over 75 have a higher success rate than other customers.

Null Hypothesis: Customers aged under 25 or over 75 do not have a higher success rate than other age groups.

Create a “test” group containing customers younger than 25 or older than 75, and put the rest into a “control” group:

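# Label every row in one vectorized step. A loop using df['group'].iloc[index] breaks after
# shuffling: index labels no longer match row positions, so groups land on the wrong rows.
# The test group is defined by age only, as stated in the hypothesis above.
df['group'] = np.where((df['age'] < 25) | (df['age'] > 75), 'test', 'control')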

Performing a two-proportion Z-test:

from statsmodels.stats.proportion import proportions_ztest, proportion_confint
control_results = df[df['group'] == 'control']['Purchased?_yes']
treatment_results = df[df['group'] == 'test']['Purchased?_yes']
num_control = control_results.count()

num_treatment = treatment_results.count()
successes = [control_results.sum(), treatment_results.sum()]
nobs = [num_control, num_treatment]


z_stat, pval = proportions_ztest(successes, nobs = nobs)
(lower_con, lower_treat), (upper_con, upper_treat) = proportion_confint(successes, nobs=nobs, alpha=0.05)

print(f'P-Value  {pval:.3f}')

In our run, the p-value was not less than 0.05, so there is not enough evidence to reject the null hypothesis; the higher success rate of the older and younger groups may be due to chance.
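The proportions_ztest call above runs a pooled two-proportion z-test. To show what it computes, here is a minimal pure-Python sketch of the same statistic (the counts are invented). Note that the alternative hypothesis is one-sided (the test group has a higher rate), while proportions_ztest is two-sided by default. To our understanding, statsmodels computes the statistic as the first proportion minus the second, so with successes = [control, test] the matching one-sided call would be proportions_ztest(successes, nobs=nobs, alternative='smaller').

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test of H0: p_a == p_b.

    Returns (z, p_two_sided, p_b_greater), where p_b_greater is the
    one-sided p-value for the alternative p_b > p_a.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se  # first group minus second group
    p_two_sided = 2 * (1 - normal_cdf(abs(z)))
    p_b_greater = normal_cdf(z)  # small when group b's rate is clearly higher
    return z, p_two_sided, p_b_greater


control = (100, 1000)  # (purchases, customers), invented numbers
test = (30, 200)
z, p_two_sided, p_test_greater = two_proportion_ztest(*control, *test)
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}, one-sided p = {p_test_greater:.3f}")
```

When the observed difference points in the hypothesised direction, the one-sided p-value is half the two-sided one, so the choice of alternative can matter for results close to the 0.05 threshold.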

Suggestions to optimize marketing:

Prioritize customers who do not have housing loans or personal loans: Campaigns should focus on this group, since loans can create financial constraints that may limit customers’ ability to afford a policy.
Emphasize marketing efforts during the Winter and Fall seasons: Success rates are higher in these seasons, even though we currently run relatively few campaigns then. Allocating additional resources to more impactful campaigns during Winter and Fall could increase policy sales.
More marketing for elderly and young customers: We did little advertising to older and younger customers, yet they seem to respond well. However, this doesn’t mean we should stop promoting to other age groups: our hypothesis test did not show this difference to be statistically significant. So we should be careful about spending heavily on older and younger customers and instead start with a modest increase in investment to see whether it pays off.
Boost marketing for high-balance customers: A dedicated campaign should be launched that concentrates on customers with balances of more than 80K, as they tend to purchase the policy more often.
Limit campaigns to a maximum of four per customer: The data indicates that as the number of campaigns increases, customer interest in purchasing policies decreases. Too many campaigns might signal desperation, leading customers to perceive the policy as less valuable.
Extra attention to customers who have purchased the policy previously: Customers who bought the policy before may have found it valuable and worthwhile, possibly because of positive experiences with our customer support.
Focus efforts on students and retired customers: This aligns with the earlier suggestion to market more to young and elderly customers.

By following these tactics, the marketing campaign is expected to outperform the current one.

Thanks for staying engaged until the end!

If you’d like to download the complete Tableau Worksheets for this project, please take a look at: https://bit.ly/marketing_optimization

I am currently a Master’s student at The State University of New York at Buffalo, with aspirations to secure Data Analyst roles. You can find my LinkedIn profile here: https://www.linkedin.com/in/abdul-matheen-shaik-5707b9200