# Customer Churn Prediction Model Using Logistic Regression

May 27, 2018 · 5 min read

In an online business with many competitors in the same market, it is really important to re-engage existing customers and keep them from churning. This blog is my attempt to build a sample model for a telecom company that predicts customer behaviour, so the company can keep customers from abandoning its product.

# What is Customer Churning and why is it important to track for any business?

Over any given period of time, a business's customers fall into three major categories:
a) Newly Acquired Customers
b) Existing Customers
c) Churned Customers

Churned customers are those who have decided to end their relationship with their existing company. This can happen for a variety of reasons, such as a lack of ongoing customer success.

Each churned customer means a direct loss of the marketing acquisition cost and of the possible revenue that could have been earned after the sale. Hence, predicting in advance which customers are likely to churn can help us avoid this loss.

# Sample Problem Statement

Context
“Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.” [IBM Sample Data Sets]

Content
Each row represents a customer; each column contains a customer attribute described in the column metadata.

The data set includes information about:

• Customers who left within the last month — the column is called Churn
• Services that each customer has signed up for — phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
• Customer account information — how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
• Demographic info about customers — gender, age range, and if they have partners and dependents

Problem Statement
To build a model that predicts customer churn from variables such as the customer's profile, account information, and the services they have signed up for.

# Data Visualisation

To get a better sense of the data, we break down the relationship between churn cases and the different variables.

# Building a Logistic Regression Model

We start with a logistic regression model to understand the relationship between the different variables and churn. Before this, we cleaned the dataset and converted all the non-numerical variables into factors.
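As a rough illustration of what such a model computes, the sketch below dummy-codes one factor (dropping its first level as the reference) and passes a linear combination of predictors through the logistic function. The helper names, the factor levels, and every coefficient value are hypothetical and chosen for illustration; they are not the fitted values from this dataset.

```python
import math

def dummy_code(value, levels):
    """Encode a factor as 0/1 indicators, dropping the first (reference) level."""
    return {level: int(value == level) for level in levels[1:]}

def churn_probability(intercept, coefficients, features):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * x for name, x in features.items())
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical factor levels and coefficients, for illustration only.
contract_levels = ["Month-to-month", "One year", "Two year"]
features = dummy_code("Two year", contract_levels)
features["MonthlyCharges"] = 70.0

coefficients = {"One year": -0.7, "Two year": -1.4, "MonthlyCharges": 0.01}
probability = churn_probability(-0.5, coefficients, features)
print(f"Predicted churn probability: {probability:.3f}")
```

Dropping one level per factor keeps the indicators from being perfectly collinear with the intercept, which is why each factor coefficient is read relative to its reference level.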

Clearly, not all of the variables have a significant impact on churn.

# Improving Quality of Model by Reducing AIC

The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

The original model has an AIC of 5899.9.

Final Model:
Churn ~ SeniorCitizen + Dependents + GrpTenure + MultipleLines +
InternetService + OnlineSecurity + TechSupport + StreamingTV +
StreamingMovies + Contract + PaperlessBilling + PaymentMethod +
MonthlyCharges

The AIC of the reduced model is 5895.8.
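For reference, AIC is defined as 2k − 2·ln(L), where k is the number of estimated parameters and L is the maximized likelihood, and a lower value is preferred. The sketch below writes that definition as a small function and compares the two AIC values reported above. The log-likelihood and parameter count passed to `aic` in the example call are hypothetical, since the post does not report them.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical inputs, only to show the arithmetic.
example_aic = aic(log_likelihood=-2900.0, n_parameters=20)

# AIC values reported in this post.
aic_original = 5899.9
aic_reduced = 5895.8

improvement = aic_original - aic_reduced
preferred = "reduced" if aic_reduced < aic_original else "original"
print(f"AIC drops by {improvement:.1f}; preferred model: {preferred}")
```

Because AIC penalizes every extra parameter, dropping weak predictors can lower it even when the fit itself barely changes.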

Analysis of Variance & Odds Ratios:

Odds ratios help determine how each variable affects our dependent variable. For example:
• Being a senior citizen (SeniorCitizen going from 0 to 1) multiplies the odds of churn by a factor of approximately 1.24.
• Fibre-optic internet service multiplies the odds of churn by a factor of 4.0462 relative to the reference level of InternetService.
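An odds ratio is the exponential of the corresponding logistic regression coefficient, and it acts on the odds p / (1 − p) rather than on the probability itself. The sketch below shows what the two odds ratios quoted above would do to a customer's churn probability. The 20% baseline probability is an assumption made for illustration and does not come from the dataset.

```python
import math

def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability obtained after multiplying the odds by odds_ratio."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Odds ratios quoted in the post; the coefficient is the log of the odds ratio.
fibre_optic_or = 4.0462
senior_citizen_or = 1.24
fibre_optic_coefficient = math.log(fibre_optic_or)

baseline = 0.20  # hypothetical baseline churn probability
with_fibre = apply_odds_ratio(baseline, fibre_optic_or)
with_senior = apply_odds_ratio(baseline, senior_citizen_or)
print(f"Fibre optic: {baseline:.2f} -> {with_fibre:.3f}")
print(f"Senior citizen: {baseline:.2f} -> {with_senior:.3f}")
```

Note that quadrupling the odds does not quadruple the probability: the size of the change depends on where the baseline sits.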

Analysis of Variance:

# Accuracy of Model using ROC Curve and Area Under it.

A Receiver Operating Characteristic (ROC) curve is a standard technique for summarizing classifier performance over a range of trade-offs between true positive (TP) and false positive (FP) rates (Swets, 1988). The ROC curve is a plot of sensitivity (the ability of the model to predict an event correctly) versus 1 − specificity for the possible cut-off classification probability values π0.
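To make the construction concrete, the sketch below sweeps the cut-off over every distinct predicted probability and records one (1 − specificity, sensitivity) point per cut-off. The labels and scores are toy values invented for illustration, not predictions from the churn model.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [score >= cutoff for score in scores]
        true_pos = sum(1 for y, hit in zip(labels, predicted) if hit and y == 1)
        false_pos = sum(1 for y, hit in zip(labels, predicted) if hit and y == 0)
        points.append((false_pos / negatives, true_pos / positives))
    return points

# Toy data: 1 = churned, 0 = stayed; scores are predicted churn probabilities.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]

for fpr, tpr in roc_points(labels, scores):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```

Lowering the cut-off can only add predicted positives, so both rates rise monotonically from (0, 0) to (1, 1).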

The Area Under the Curve (AUC), also referred to as the index of accuracy (A) or the concordance index (c), is an accepted traditional performance metric for a ROC curve. The higher the area under the curve, the better the predictive power of the model. A value of c = 0.8 means that a randomly selected individual from the positive group has a test value larger than that of a randomly chosen individual from the negative group 80 percent of the time.
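That pairwise interpretation can be computed directly: compare every positive case with every negative case and count how often the positive one has the higher score, with ties counted as half. The sketch below does this on toy values invented for illustration. It is a brute-force way to see the idea, not necessarily how a statistics package computes AUC.

```python
def auc_by_pairs(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positive_scores = [s for y, s in zip(labels, scores) if y == 1]
    negative_scores = [s for y, s in zip(labels, scores) if y == 0]
    concordant = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                concordant += 1.0
            elif pos == neg:
                concordant += 0.5  # ties count as half
    return concordant / (len(positive_scores) * len(negative_scores))

# Toy data: 1 = churned, 0 = stayed; scores are predicted churn probabilities.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]

print(f"AUC = {auc_by_pairs(labels, scores):.4f}")
```

Here 6 of the 9 positive-negative pairs are ordered correctly, so the AUC is about 0.67; a model that ranks at random would sit near 0.5.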

The area under the curve for our optimized model is 0.8461, which suggests that the model discriminates well between customers who churn and those who do not.

Resources & References Used :-
a) For Dataset & Building a suitable structure-
https://www.kaggle.com/blastchar/retain-customers-exploratory-analysis/notebook

b) For Understanding Churn Analysis Structure in Detail
https://mkmanu.wordpress.com/2014/09/11/what-is-causing-customers-to-churn-attrition-analysis-using-r/
