Churn Prediction Using Machine Learning

Analyze all relevant customer data and develop a robust and accurate Churn Prediction model to retain customers and to form strategies for reducing customer attrition rates.

Bharat Choudhary
The Startup
9 min read · Oct 21, 2020


Churn refers to customers or users who leave a service or migrate to a competitor in the industry. Any organization needs both to keep its existing customers and to attract new ones; failing at either is bad for business. The goal here is to explore how machine learning can be used for churn prediction to retain a competitive edge in the industry.

Photo by Tony Stoddard on Unsplash

One of the most famous and useful case studies of churn prediction is in the telecom industry. It is important for telecom companies to analyze all relevant customer data and develop a robust and accurate Churn Prediction model to retain customers and to form strategies for reducing customer attrition rates.

This project uses the Telco Customer Churn dataset, which is available on Kaggle.

Attributes Information
Prediction column:
Churn: Whether the customer churned or not (Yes or No)

Three numerical columns:
1. tenure: Number of months the customer has stayed with the company
2. MonthlyCharges: The amount charged to the customer monthly
3. TotalCharges: The total amount charged to the customer

One identifier column:
1. customerID: Customer ID unique for each customer

Sixteen categorical columns:
1. gender: Whether the customer is a male or a female
2. SeniorCitizen: Whether the customer is a senior citizen or not (1, 0)
3. Partner: Whether the customer has a partner or not (Yes, No)
4. Dependents: Whether the customer has dependents or not (Yes, No)
5. PhoneService: Whether the customer has a phone service or not (Yes, No)
6. MultipleLines: Whether the customer has multiple lines or not (Yes, No, No phone service)
7. InternetService: Customer’s internet service provider (DSL, Fiber optic, No)
8. OnlineSecurity: Whether the customer has online security or not (Yes, No, No internet service)
9. OnlineBackup: Whether the customer has an online backup or not (Yes, No, No internet service)
10. DeviceProtection: Whether the customer has device protection or not (Yes, No, No internet service)
11. TechSupport: Whether the customer has tech support or not (Yes, No, No internet service)
12. StreamingTV: Whether the customer has streaming TV or not (Yes, No, No internet service)
13. StreamingMovies: Whether the customer has streaming movies or not (Yes, No, No internet service)
14. Contract: The contract term of the customer (Month-to-month, One year, Two years)
15. PaperlessBilling: Whether the customer has paperless billing or not (Yes, No)
16. PaymentMethod: The customer’s payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))

The project is structured as follows:

  1. Data cleaning
  2. Exploratory Data Analysis
  3. Data Preprocessing
  4. Encoding
  5. Feature Selection
  6. Oversampling Technique
  7. Model Creation and Evaluation
  8. Improving the Model

1. Data Cleaning

Start by importing the required libraries.

Before moving forward, convert the columns to the required data types. The “TotalCharges” column is read in as an object (string) column even though it holds numerical values.
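
The conversion could look roughly like the sketch below. It assumes the column arrives as strings and that unparseable entries (such as blanks) should become NaN; the helper name `coerce_total_charges` and the sample values are made up for illustration and are not taken from the original notebook.

```python
import pandas as pd


def coerce_total_charges(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which TotalCharges is numeric.

    Entries that cannot be parsed (for example blank strings) become NaN,
    so they show up later as missing values.
    """
    out = df.copy()
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    return out


# Tiny illustrative frame (made-up values, not rows from the dataset).
sample = pd.DataFrame({"TotalCharges": ["29.85", "1889.5", " "]})
converted = coerce_total_charges(sample)
print(converted["TotalCharges"])
```

Using `errors="coerce"` keeps the conversion from failing on a blank entry and turns it into a missing value that the next step can count and impute.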

Next, check whether any values are missing and, if so, what percentage, so that the imputation method can be chosen accordingly.

Missing values are present in the dataset, but only in a very small percentage, so they can either be dropped or filled using simple mean imputation. There are 11 missing values in TotalCharges, only about 0.15% of the rows, so we fill them with the column mean.
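
A minimal sketch of both steps, the per-column missing percentage and the mean imputation, is shown here. The helper names and the four-row sample are hypothetical and only serve to make the arithmetic easy to follow.

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values in each column."""
    return df.isna().mean() * 100


def impute_mean(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with NaNs in `column` replaced by the column mean."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].mean())
    return out


# Made-up sample: one of four values is missing.
sample = pd.DataFrame({"TotalCharges": [100.0, np.nan, 300.0, 200.0]})
pct = missing_percentage(sample)
filled = impute_mean(sample, "TotalCharges")
print(pct)
print(filled)
```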

2. Exploratory Data Analysis

Check for an imbalanced class distribution

Plot of Churn Class Distribution

Target variable

We are trying to predict whether the user left the company in the previous month. This is therefore a binary classification problem with a slightly imbalanced target:

  • Churn: No — 72.4%
  • Churn: Yes — 27.6%
Target Variable [Image By Author]
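
The class percentages can be computed with a one-liner like the sketch below. The function name `class_distribution` and the four-value sample are illustrative assumptions, not the article's data.

```python
import pandas as pd


def class_distribution(target: pd.Series) -> pd.Series:
    """Share of each class in percent, rounded to one decimal place."""
    return (target.value_counts(normalize=True) * 100).round(1)


# Made-up target column: three "No" and one "Yes".
churn = pd.Series(["No", "No", "No", "Yes"], name="Churn")
dist = class_distribution(churn)
print(dist)
```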

Numerical features

There are only three numerical columns: tenure, monthly charges, and total charges.

[Image By Author]
[Image By Author]
[Image By Author]

From the plots above we can conclude that:

  • Recent users are more likely to churn
  • Users with higher MonthlyCharges are also more likely to churn
  • TotalCharges has a similar distribution for both classes

A new feature can be generated from the difference between MonthlyCharges and TotalCharges divided by tenure:

[Image By Author]
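
One possible reading of this feature is sketched below: the current monthly charge minus the average monthly charge to date (TotalCharges / tenure). That interpretation, the column name `ChargeDifference`, and the choice to set the feature to 0 when tenure is 0 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_charge_difference(df: pd.DataFrame) -> pd.DataFrame:
    """Add MonthlyCharges - TotalCharges / tenure as a new column.

    Rows with tenure == 0 would divide by zero, so the feature is set
    to 0 for them (an assumption, not the author's documented choice).
    """
    out = df.copy()
    tenure = out["tenure"].replace(0, np.nan)
    average_monthly = out["TotalCharges"] / tenure
    out["ChargeDifference"] = (out["MonthlyCharges"] - average_monthly).fillna(0.0)
    return out


# Made-up rows, including one customer with zero tenure.
sample = pd.DataFrame(
    {
        "tenure": [10, 0, 4],
        "MonthlyCharges": [50.0, 20.0, 80.0],
        "TotalCharges": [480.0, 0.0, 340.0],
    }
)
enriched = add_charge_difference(sample)
print(enriched)
```

A positive value suggests the customer is currently paying more per month than their historical average, a negative value the opposite.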

Categorical features

This dataset has 16 categorical features:

  • Six binary features (Yes/No)
  • Nine features with three unique values each (categories)
  • One feature with four unique values
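
A breakdown like this can be checked by counting unique values per column. The sketch below uses a hypothetical helper, `cardinality_summary`, on a made-up three-column frame; on the real data the list of categorical columns would be passed instead.

```python
import pandas as pd


def cardinality_summary(df: pd.DataFrame, columns: list) -> pd.Series:
    """Map 'number of unique values' to 'number of columns with that count'."""
    return df[columns].nunique().value_counts().sort_index()


# Made-up sample with one binary, one three-valued and one four-valued column.
sample = pd.DataFrame(
    {
        "Partner": ["Yes", "No", "Yes", "No"],
        "InternetService": ["DSL", "Fiber optic", "No", "DSL"],
        "PaymentMethod": [
            "Electronic check",
            "Mailed check",
            "Bank transfer (automatic)",
            "Credit card (automatic)",
        ],
    }
)
summary = cardinality_summary(sample, list(sample.columns))
print(summary)
```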

Binary Features (Yes/No)

[Image By Author]
  1. Gender Distribution: About half of the customers in our data set are male and the other half are female.
  2. % Senior Citizens: Only 16% of the customers are senior citizens, so most customers in the data are younger people.
  3. Partner: About 50% of the customers have a partner.
  4. Dependent status: Only 30% of the customers have dependents.
  5. Phone Service: About 90.3% of the customers have phone service.
  6. Paperless Billing: About 59.2% of the customers use paperless billing.

Partner and Dependent:

[Image By Author]
[Image By Author]
  1. Customers who have a partner are more likely to have dependents
  2. Customers who don’t have a partner are more likely to churn
  3. Customers without dependents are also more likely to churn
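
Observations of this kind come down to comparing the churn rate across the categories of a feature. A minimal sketch, assuming a "Churn" column with Yes/No values and using a hypothetical helper `churn_rate_by` on made-up rows:

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of churned customers within each category of `column`."""
    churned = df["Churn"].eq("Yes")
    return churned.groupby(df[column]).mean() * 100


# Made-up rows: one of the two customers without a partner churned.
sample = pd.DataFrame(
    {
        "Partner": ["Yes", "Yes", "No", "No"],
        "Churn": ["No", "No", "Yes", "No"],
    }
)
rates = churn_rate_by(sample, "Partner")
print(rates)
```

The same helper can be applied to any of the other categorical columns, such as Dependents, InternetService, or PaymentMethod.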

Senior Citizens and Dependent:

[Image By Author]
  1. Senior citizens are less likely to have dependents

Phone and Internet services

[Image By Author]
[Image By Author]
  1. Only a few customers don’t have phone service
  2. Customers with multiple lines have a slightly higher churn rate
[Image By Author]
[Image By Author]
  1. Customers without internet have a very low churn rate
  2. Customers with fiber optic service are more likely to churn than those with a DSL connection

Internet Services

There are six additional services available to customers with internet service:

OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies

[Image By Author]
  1. Customers with any of the first four add-ons (OnlineSecurity through TechSupport) are less likely to churn
  2. Streaming services are not predictive of churn

Payment Method

[Image By Author]
[Image By Author]
  1. Electronic check is the most common payment method
  2. Electronic check also has the highest churn among the payment methods

Correlation Between Features

[Image By Author]
[Image By Author]
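
Correlation plots like these are commonly produced by one-hot encoding the categorical columns and correlating every encoded feature with the target. The sketch below shows that approach under the assumption that it resembles what was plotted; the helper `churn_correlations` and the six sample rows are hypothetical.

```python
import pandas as pd


def churn_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every (one-hot encoded) feature with churn."""
    encoded = pd.get_dummies(df.drop(columns=["Churn"]), dtype=float)
    encoded["Churn"] = df["Churn"].eq("Yes").astype(float)
    return encoded.corr()["Churn"].drop("Churn").sort_values(ascending=False)


# Made-up rows: short-tenure, month-to-month customers churn.
sample = pd.DataFrame(
    {
        "tenure": [1, 2, 30, 40, 50, 60],
        "Contract": [
            "Month-to-month",
            "Month-to-month",
            "Month-to-month",
            "Two year",
            "Two year",
            "Two year",
        ],
        "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
    }
)
correlations = churn_correlations(sample)
print(correlations)
```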

Feature Importance

[Image By Author]

Oversampling Technique

Synthetic Minority Oversampling Technique (SMOTE) is an oversampling technique that is widely used to handle imbalanced datasets. It synthesizes new data points for the minority class and thereby oversamples that class.
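
To make the idea concrete, here is a simplified NumPy sketch of the core SMOTE step: pick a minority sample, pick one of its nearest minority neighbours, and create a new point somewhere on the line between them. The function `smote_sketch` and its parameters are written for illustration only and are not the implementation used in the original notebook.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic minority samples by interpolation.

    Simplified illustration: needs at least two minority samples and
    uses brute-force Euclidean distances.
    """
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)

    # Pairwise distances between minority samples; a point is never
    # allowed to be its own neighbour.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Random base sample, random neighbour, random position between them.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


# Made-up minority class: the four corners of the unit square.
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(minority, n_new=6, k=2)
print(synthetic)
```

In practice the `SMOTE` class from the imbalanced-learn package (`fit_resample`) is the usual choice. Oversampling is generally applied to the training data only, so that synthetic points do not end up in the test set.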

Train Test Split

Divide the data into train and test subsets.
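
A minimal sketch of the split with scikit-learn's `train_test_split` follows. The 80/20 ratio, the stratification on the target, and the random seed are assumptions, since the article does not state which settings were used; the arrays are made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 2 features, 5 positives (25%).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# stratify=y keeps the class ratio the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```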

Model

To start, a GradientBoostingClassifier model is implemented to show the results of a basic model and its predictions.

Train Predict

Model prediction on the training dataset

Test Predict

Model prediction on the testing dataset
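
The three steps above (fit the model, predict on the training set, predict on the test set) can be sketched as follows. The data here is synthetic and the default hyperparameters are an assumption; only the use of `GradientBoostingClassifier` comes from the article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and churn labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline model with default hyperparameters.
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

train_pred = model.predict(X_train)  # predictions on data the model has seen
test_pred = model.predict(X_test)    # predictions on held-out data

print("train accuracy:", (train_pred == y_train).mean())
print("test accuracy:", (test_pred == y_test).mean())
```

Comparing the two accuracies gives a quick sense of overfitting: a large gap between train and test performance suggests the model has memorized the training data.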

Evaluation

We achieved an overall accuracy of almost 85% with a direct implementation of the model, without extensive feature engineering, feature selection, or hyperparameter tuning. Applying these techniques could improve the model further, potentially pushing accuracy above 90%. Implementing and comparing different models can also yield better results.
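
Because the target is imbalanced, it can help to look beyond accuracy. The sketch below computes accuracy, the confusion matrix, and recall for the churn class with scikit-learn; the label arrays are made up and do not reproduce the article's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels: 1 = churned, 0 = stayed.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

accuracy = accuracy_score(y_true, y_pred)
matrix = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
churn_recall = recall_score(y_true, y_pred)  # share of churners that were caught

print("accuracy:", accuracy)
print("confusion matrix:\n", matrix)
print("recall for churn:", churn_recall)
```

In this toy example the accuracy is 70% even though only half of the churners are identified, which is why recall on the churn class is worth tracking alongside accuracy.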

This tutorial focuses on Exploratory Data Analysis because it is one of the most important parts of the machine learning project cycle. Model building and improvement can be done relatively easily, but understanding the data and building an intuition for it is essential to solving a machine learning problem.

I hope this tutorial helps you to gain intuition and understanding of Churn prediction and its applications. I left the feature creation and selection part to you to experiment with and implement your understanding of the problem.

The complete notebook for the project can be downloaded from the repository.

Thank you for reading. Please let me know if you have any feedback.

I welcome feedback and constructive criticism and can be reached on LinkedIn.
