Churn modelling and prediction

Published in Analytics Vidhya, Nov 22, 2020

What is Churn Prediction?

According to Appier’s blog, churn prediction helps you foresee whether a person is about to leave, so that you have enough time to prepare strategies to engage them. I have spent the past few weeks researching this topic. In HR, there is always a group of employees who tender their resignations even though they are performing well and the company wants to retain them. By then it is usually too late, as they have already accepted a better job offer from another company. So why not engage staff before they even start looking for other opportunities? This is where Employee Churn Prediction comes into play.

The post on Telecom Churn Prediction written by Shivali is a good guide for getting started with churn prediction. It uses several techniques, including EDA (Exploratory Data Analysis), cluster analysis and a churn prediction model. Its main objective is to analyse customers’ behaviour and develop a retention plan to lower the churn rate.

How to start?

Below is the general workflow for kick-starting a churn prediction project. The main challenge will be the data collection process. If possible, we should collect as many data points as we can, at different timestamps.

Data collection
Data pre-processing
Exploratory Data Analysis (EDA)
Churn cluster analysis
Churn prediction model
Retention plan

1. Data collection

Data collection may sound easy, but what if your data comes from multiple sources? Employee data can be difficult to collect, especially employees’ grades, their relationships with supervisors and colleagues, and their life events (e.g. marriage, parenthood, etc.). Being clear about the project objective makes data collection simpler. Understanding the data is also critical, especially for the next step: data pre-processing.

2. Data pre-processing

For instance, suppose you are predicting which employees will leave within 6 months, using quarterly extracts from the past 3 years. Your processed dataset should then contain columns such as the following.

extract_mth is the month of extraction
resign_dt is the month of resignation within 6 months from extract_mth
pct_chg_mcrate is the percentage change in MC rate in the current quarter versus the previous quarter
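The column definitions above could be derived roughly as follows. This is only a sketch under stated assumptions: the columns emp_id, resigned_on and mc_rate, the sample values, and the choice to count the extraction month itself as inside the 6-month window are all illustrative and not taken from the original dataset.

```python
import pandas as pd

# Hypothetical raw extracts: one row per employee per quarterly extraction.
# emp_id, resigned_on and mc_rate are assumed names used for illustration.
snapshots = pd.DataFrame({
    "emp_id": [1, 1, 2, 2],
    "extract_mth": pd.to_datetime(
        ["2020-01-01", "2020-04-01", "2020-01-01", "2020-04-01"]),
    "resigned_on": pd.to_datetime([None, None, "2020-08-01", "2020-08-01"]),
    "mc_rate": [0.02, 0.03, 0.04, 0.04],
})
snapshots = snapshots.sort_values(["emp_id", "extract_mth"]).reset_index(drop=True)

# Whole months between the extraction month and the resignation month
# (NaN for employees who have not resigned).
months_ahead = (
    (snapshots["resigned_on"].dt.year - snapshots["extract_mth"].dt.year) * 12
    + (snapshots["resigned_on"].dt.month - snapshots["extract_mth"].dt.month)
)

# Assumption: "within 6 months" means 0 to 6 months after extract_mth.
within_window = months_ahead.between(0, 6)

# resign_dt is only filled when the resignation falls inside the window.
snapshots["resign_dt"] = snapshots["resigned_on"].where(within_window)
snapshots["churn"] = within_window.astype(int)

# Percentage change in MC rate versus the same employee's previous quarter.
previous = snapshots.groupby("emp_id")["mc_rate"].shift(1)
snapshots["pct_chg_mcrate"] = (snapshots["mc_rate"] - previous) / previous * 100

print(snapshots[["emp_id", "extract_mth", "resign_dt", "churn", "pct_chg_mcrate"]])
```

Each employee's first extract has no previous quarter, so pct_chg_mcrate is missing there and has to be handled (dropped or imputed) before modelling.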

3. Exploratory Data Analysis (EDA)

Before you start on EDA, first group your data columns into 3 categories (Numeric, Nominal and Binary) and set your target column to Churn.

cat_cols = ["Gender","Partner","Dependents","SeniorCitizen","PhoneService","MultipleLines","InternetServiceType","OnlineSecurity","OnlineBackup","DeviceProtection","TechSupport","StreamingTV","StreamingMovies","IsContracted","ContractType","PaperlessBilling","PaymentMethod"]

num_cols = ["Tenure","MonthlyCharges","TotalCharges"]

target_col = 'Churn'

# splitting categorical columns into Nominal and Binary columns

nominal_cols = ['Gender','InternetServiceType','PaymentMethod','ContractType']

binary_cols = ['SeniorCitizen','Partner','Dependents','PhoneService','MultipleLines','OnlineSecurity','OnlineBackup','DeviceProtection','TechSupport','StreamingTV','StreamingMovies','PaperlessBilling','InternetService','IsContracted']

A correlation heatmap is also useful to check if there is any relationship between the input features.
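As a rough illustration of the correlation check, the sketch below computes the pairwise Pearson correlations for the numeric columns plus the target. The sample values are made up for demonstration and are not from the telecom dataset.

```python
import pandas as pd

# Hypothetical sample using the numeric column names from the text.
df = pd.DataFrame({
    "Tenure": [1, 12, 24, 36, 48, 60],
    "MonthlyCharges": [70.0, 65.5, 80.0, 55.0, 90.0, 60.0],
    "TotalCharges": [70.0, 786.0, 1920.0, 1980.0, 4320.0, 3600.0],
    "Churn": [1, 1, 0, 0, 0, 0],
})
num_cols = ["Tenure", "MonthlyCharges", "TotalCharges"]

# Pairwise Pearson correlation between the numeric features and the target.
corr = df[num_cols + ["Churn"]].corr()
print(corr.round(2))
```

Passing `corr` to seaborn's `heatmap` function (for example `sns.heatmap(corr, annot=True)`) draws the matrix as a heatmap. Strongly correlated pairs, such as Tenure and TotalCharges in this made-up sample, stand out and may indicate redundant features.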

4. Churn cluster analysis

We can also check for cluster relationships between input features (e.g. Tenure and MonthlyCharges).


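import matplotlib.pyplot as plt
import seaborn as sns

# scatter plot of Tenure vs MonthlyCharges, coloured by Churn
# (df_cal and plotColor are assumed to be defined earlier)
sns.lmplot(
    x='Tenure',
    y='MonthlyCharges',
    data=df_cal,
    hue='Churn',
    fit_reg=False,
    markers=["o", "x"],
    palette=plotColor)
plt.show()

# checking number of clusters among churned customers
# (Create_elbow_curve is a helper function that is not shown here)
Create_elbow_curve(df_cal[df_cal.Churn==1][['Tenure_norm','MonthlyCharges_norm']])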

From the elbow curve, 3 clusters seems to be the most efficient choice.
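The elbow helper itself is not shown in this post. One common way to build such a curve, sketched here with scikit-learn's KMeans on synthetic points, is to fit the model for a range of cluster counts and record the inertia (within-cluster sum of squares) each time. The function name elbow_inertias and the generated data are assumptions for illustration, not the original helper.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(points, max_k=6, seed=0):
    """Return {k: inertia} for k = 1..max_k (hypothetical helper)."""
    inertias = {}
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(points)
        inertias[k] = model.inertia_
    return inertias


# Synthetic stand-in for normalised (Tenure_norm, MonthlyCharges_norm) pairs,
# generated around three made-up centres.
rng = np.random.default_rng(0)
centres = np.array([[0.2, 0.2], [0.5, 0.8], [0.8, 0.3]])
points = np.vstack([c + rng.normal(scale=0.03, size=(50, 2)) for c in centres])

inertias = elbow_inertias(points)
for k, value in inertias.items():
    print(k, round(value, 3))
```

Plotting inertia against k gives the elbow curve. The "elbow" is the point after which adding more clusters barely reduces the inertia. In this synthetic data that point is k = 3 by construction.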

5. Churn prediction model

Several models should be trained and compared to see which works best for the dataset. Popular choices include Logistic Regression, Random Forest and Gradient Boosting. Hyperparameter tuning is also used to find the best parameters for each model.
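A minimal sketch of such a comparison with scikit-learn is given below. It assumes a synthetic dataset from make_classification in place of real churn data, ROC AUC as the metric, and a deliberately tiny parameter grid. None of these choices come from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for a prepared churn dataset (roughly 25% churners).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Compare models by mean 5-fold cross-validated ROC AUC.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# Hyperparameter tuning for one model over a small illustrative grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

Because churners are usually the minority class, a ranking metric such as ROC AUC tends to be more informative than plain accuracy, which is why it is used in this sketch.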

6. Retention plan

The last step is to apply the trained model to current data to estimate each person’s probability of churn. The identified churn group can then be split into risk levels (low, medium and high), and a retention plan can be offered to the high-risk group.
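The risk grouping could be sketched as a simple thresholding of the predicted probabilities. The cut-offs of 0.3 and 0.6, the function name risk_tier and the employee IDs are hypothetical. In practice the thresholds should be chosen to fit the business and the capacity of the retention programme.

```python
def risk_tier(churn_probability, medium_from=0.3, high_from=0.6):
    """Map a churn probability to a risk tier (thresholds are assumptions)."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability >= high_from:
        return "high"
    if churn_probability >= medium_from:
        return "medium"
    return "low"


# Hypothetical model output: probability of churn per current employee.
probabilities = {"E001": 0.12, "E002": 0.45, "E003": 0.81}

tiers = {emp: risk_tier(p) for emp, p in probabilities.items()}

# The retention plan targets the high-risk group.
retention_targets = [emp for emp, tier in tiers.items() if tier == "high"]

print(tiers)
print(retention_targets)
```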