Bank Term Deposit Marketing Strategy

Elvis David
Published in Analytics Vidhya
9 min read · Aug 29, 2020

A data science approach to predicting which clients are the best targets for a marketing campaign.

Increasing bank revenue.

Introduction to the project

Financial institutions such as banks generate revenue through lending and borrowing. Lending earns interest from customers but carries some risk, which is why machine learning algorithms come in handy for predicting which clients are eligible for loans. The other source of revenue is borrowing, that is, attracting the public's savings into the bank, which is somewhat less risky than lending. Borrowing works like this: the bank invests clients' long-term deposits in other sectors that bring better returns, and part of the return is paid back to the customers. When a client makes a fixed-term deposit, the bank earns better returns than on a savings account, because the client gives up the right to access the money before maturity unless they are willing to compensate the bank.

For this reason, there is stiff competition between banks to convince clients to open term deposits, and these marketing campaigns cost banks a huge amount of money, since they reach out to prospective and non-prospective subscribers alike, the bank having no way to tell them apart in advance. With advances in data science and machine learning, and the availability of data, banks are adopting data-driven decisions, which helps reduce the cost of marketing and thus increase revenue.

In this project, we apply machine learning algorithms to build a predictive model of the dataset in order to provide suggestions to the marketing campaign team. The goal is to predict whether a client will subscribe to a term deposit or not.

Dataset Description

The dataset for this project was downloaded from the UCI Machine Learning Repository. The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to determine whether the product (a bank term deposit) would be subscribed ('yes') or not ('no'). More details on the dataset and attribute information can be found here.

Objective

Business goal: reduce marketing resources by identifying customers who are likely to subscribe to a term deposit, and direct marketing efforts to them.

Analysis goal: fit a set of models that predict which customers the marketing campaign is best directed at. We analyze the performance of five different machine learning algorithms on training and test datasets. This would help the marketing campaign team of the Portuguese bank develop their strategy for telemarketing their term deposit scheme.

The objectives of this phase of the project are:

1. Data preprocessing

2. Model building and predictions

3. Performance comparison and choosing the best model

4. Limitations of the Algorithms

5. Summary and conclusion

1. Data Pre-processing

Learning algorithms have an affinity for certain data types on which they perform incredibly well. They are also known to give unreliable predictions with unscaled or unstandardized features.

In simple terms, pre-processing refers to the transformations applied to your data before feeding it to the algorithm. In Python, the scikit-learn library provides pre-built functionality under sklearn.preprocessing that we will use to transform our data before modeling.

First things first: loading the libraries we will use.

The following code shows the pre-processing applied to our data; a short description of each step is written in the comments.

#importing data
data=pd.read_csv('bank-additional-full.csv',sep=";")
# Viewing the shape of our dataset
print("The data has {} rows with {} features/columns".format(data.shape[0], data.shape[1]))

# ENCODING CATEGORICAL VARIABLES using OneHotEncoder
# create an object of the OneHotEncoder
OHE = ce.OneHotEncoder(cols=['job', 'marital', 'education', 'default', 'housing',
                             'loan', 'contact', 'month', 'day_of_week', 'poutcome'],
                       use_cat_names=True)
# encode the categorical variables
pred1_data = OHE.fit_transform(data)

# SCALING NUMERICAL DATA using RobustScaler
# retrieve just the numeric input columns
num_cols = ['emp.var.rate', 'pdays', 'age', 'cons.price.idx',
            'cons.conf.idx', 'euribor3m', 'nr.employed']
# perform a robust scaler transform of the dataset
trans = RobustScaler()
pred1_data[num_cols] = trans.fit_transform(pred1_data[num_cols])

# DIMENSIONALITY REDUCTION using PCA
# (X is the feature matrix; see the note after this block)
pca = PCA(n_components=5)  # we will choose five components
pca_result = pca.fit_transform(X)
plt.plot(range(5), pca.explained_variance_ratio_)
plt.plot(range(5), np.cumsum(pca.explained_variance_ratio_))
plt.title("Component-wise and Cumulative Explained Variance")

# CLASS BALANCING by oversampling the minority class
ran = RandomOverSampler()
X_ran, y_ran = ran.fit_resample(train_X, train_Y)
print('The new data contains {} rows'.format(X_ran.shape[0]))
#plot_2d_space(X_ran,y_ran,X,y,'over-sampled')

2. Model Building and Prediction

Before we feed our data into the models, we need to split our dataset into training and test sets. We will use scikit-learn's train_test_split.

from sklearn.model_selection import train_test_split, cross_val_score

# Splitting the data
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,random_state=1)
# returning the shape of our split data
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

2.1 Logistic Regression

Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable (here, whether a client will subscribe to a term deposit: 'yes' or 'no').

Below is the code that trains the model and evaluates it using different metrics.

# create an object of the LogisticRegression model
model_LR = LogisticRegression()
# fit the model with the training data
model_LR.fit(X_train, y_train)
# making the predictions
predict_test = model_LR.predict(X_test)
# Getting the confusion matrix (avoid shadowing the confusion_matrix function)
cm = confusion_matrix(y_test, predict_test)
# getting the classification report
report = classification_report(y_test, predict_test)
# ROC curve for the model
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = model_LR.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('Logistic: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='Logistic')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

2.2 XGBoost Classifier

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance that has come to dominate competitive machine learning.

Below is the code that implements and evaluates the XGBoost model.

# create an object of the XGBoost Model
model = XGBClassifier()
# fit model with training data
model.fit(X_train, y_train)
# make predictions for test data
y_pred = model.predict(X_test)
# evaluate predictions
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
#getting the XGBoost classification report
xgb_report = classification_report(y_test, y_pred)
print(xgb_report)
# ROC curve for the XGBoost model
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = model.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('XGBoost: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='XGBoost')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

2.3 Multilayer Perceptron

A multilayer perceptron is a logistic regressor where, instead of feeding the input directly to the logistic regression, you insert an intermediate layer, called the hidden layer, that has a nonlinear activation function (usually tanh or sigmoid).

Below is the code that implements and evaluates the model.

from sklearn.neural_network import MLPClassifier

# create an object of the Multilayer Perceptron classifier model
mlp = MLPClassifier(hidden_layer_sizes=(8,8,8), activation='relu', solver='adam', max_iter=500)
# fit neural network model with training data
mlp.fit(X_train,y_train)
# Predicting
predict_test = mlp.predict(X_test)
# Evaluating the neural network model on the test set
print(confusion_matrix(y_test, predict_test))
print(classification_report(y_test, predict_test))
# ROC Curve of the Neural Network model
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = mlp.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('MLP: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='MLP')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

2.4 Random Forest

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.

Below is the code for implementation of the model.

# Create a Random Forest classifier
clf=RandomForestClassifier(n_estimators=100)
#Train the model using the training sets
clf.fit(X_train,y_train)
RF_pred=clf.predict(X_test)
# Model accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, RF_pred))

# ROC curve for the Random Forest model
ns_probs = [0 for _ in range(len(y_test))]
# predict probabilities
lr_probs = clf.predict_proba(X_test)
# keep probabilities for the positive outcome only
lr_probs = lr_probs[:, 1]
# calculate scores
ns_auc = roc_auc_score(y_test, ns_probs)
lr_auc = roc_auc_score(y_test, lr_probs)
# summarize scores
print('No Skill: ROC AUC=%.3f' % (ns_auc))
print('Random Forest: ROC AUC=%.3f' % (lr_auc))
# calculate roc curves
ns_fpr, ns_tpr, _ = roc_curve(y_test, ns_probs)
lr_fpr, lr_tpr, _ = roc_curve(y_test, lr_probs)
# plot the roc curve for the model
pyplot.plot(ns_fpr, ns_tpr, linestyle='--', label='No Skill')
pyplot.plot(lr_fpr, lr_tpr, marker='.', label='Random Forest')
# axis labels
pyplot.xlabel('False Positive Rate')
pyplot.ylabel('True Positive Rate')
# show the legend
pyplot.legend()
# show the plot
pyplot.show()

2.5 Decision Tree

The decision tree classifier creates the classification model by building a decision tree. Each node in the tree specifies a test on an attribute, and each branch descending from that node corresponds to one of the possible values of that attribute.

Below is the implementation code.

from scipy.stats import randint

max_depth_value = [3, None]
max_features_value = randint(1, 4)
min_samples_leaf_value = randint(1, 4)
criterion_value = ["gini", "entropy"]
param_grid = dict(max_depth=max_depth_value,
                  max_features=max_features_value,
                  min_samples_leaf=min_samples_leaf_value,
                  criterion=criterion_value)
# Create a decision tree classifier and tune it with a randomized search
# (scikit-learn's RandomizedSearchCV stands in for the post's custom RandomSearch helper)
from sklearn.model_selection import RandomizedSearchCV

model_CART = DecisionTreeClassifier()
CART_RandSearch = RandomizedSearchCV(model_CART, param_grid, n_iter=20, cv=5, random_state=1)
CART_RandSearch.fit(X_train, y_train)
Prediction_CART = CART_RandSearch.best_estimator_.predict(X_test)
print("Accuracy:", metrics.accuracy_score(y_test, Prediction_CART))

# Getting a classification report for the model
DC_report = classification_report(y_test, Prediction_CART)
print(DC_report)

3. Performance comparison and choosing the best model

As we have seen, each of the five models has its own accuracy in predicting whether a client will say "yes" or "no" to the bank's term deposit. As expected, there is some variation in accuracy and F1 score among the five classification algorithms; the scores are listed below, with a small scoring loop after the list.

Accuracy by algorithm:

  1. Logistic Regression: 71%
  2. XGBoost Classifier: 73.79%
  3. Multilayer Perceptron: 71.45%
  4. Random Forest: 96.79%
  5. Decision Tree Classifier: 94.3%

Based on accuracy, the most reliable model for this dataset appears to be the Random Forest, at 96.79%.

Below is the ROC curve for our winning model, the Random Forest classifier.

ROC curve for the Random Forest model

4. Advantages and Limitations of the Algorithms

The advantage of logistic regression is that it is easy to interpret, it directly models the logistic probability, and it provides a confidence interval for the result. However, the main drawback of the logistic algorithm is that it suffers from multicollinearity, and therefore the explanatory variables must be linearly independent.

Some limitations of the logistic regression approach in the context of the above model are:

  1. The model has some classes of unknown predictors that turn out significant in the model. These variables fundamentally carry no useful information, yet their significance might affect the predictive power of the model.

5. Summary and conclusion

From the study conducted, the results are impressive and convincing in terms of using machine learning algorithms to decide on the bank's marketing campaign. Almost all of the attributes contribute significantly to building a predictive model. Among the five classification approaches used to model the data, the Random Forest model yielded the best accuracy, at 96.79%. The model is simple and easy to implement.

The bank's marketing manager can identify potential clients using the model, given client information such as education, housing loan, personal loan, duration of the call, number of contacts performed during the campaign, previous outcomes, and so on. This will help minimize costs to the bank by avoiding calls to customers who are unlikely to subscribe to the term deposit, enabling a more successful telemarketing campaign.

Find the link to the whole project and learn more.

6. References

UCI Machine Learning Repository, Bank Marketing Data Set, viewed online at:

https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

Multilayer Perceptron

XGBoost
