Customer Churn Prediction in a Telco

Alonso Silva Allende
Published in Analytics Vidhya
Oct 17, 2019 · 6 min read

All code related to this post can be found here.

Photo by Alexander Andrews on Unsplash

Customer churn is the loss of clients or customers. To avoid losing customers, a company needs to examine why its customers have left in the past and which features are most important for determining who will churn in the future. Our task is therefore to predict which customers are about to churn and to identify the most important features for getting that prediction right. As in most prediction problems, we will use machine learning.

How can we use machine learning to do that? First, we need to establish an evaluation metric. Given a dataset, how can we evaluate whether our churn predictions were right or wrong? Everybody churns eventually: some people will switch to a different company, some will stop using the proposed services altogether, and in the long run every customer churns, if only because companies, like people, eventually die.

So the question remains, how do you evaluate if your churn predictions were right or wrong?

It has been argued elsewhere (see for example this post or this one) that the right approach to evaluating churn prediction models is the concordance index. The concordance index focuses on the order in which customers churn. It is defined as the number of concordant pairs of predictions divided by the total number of comparable pairs. This is a better approach than treating the problem as binary classification, for the reasons explained above.

I think the right way to frame the problem is this: given that our model predicts the customers will churn in a certain order, was that order right? The customers that were predicted to churn first should indeed have churned first. Therefore, we will evaluate our machine learning models with the concordance index.
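
As a quick illustration of the metric, here is a minimal sketch using lifelines' concordance_index on made-up numbers (the values below are hypothetical, not from the dataset):

```python
from lifelines.utils import concordance_index

# Hypothetical toy example: three customers observed to churn after 2, 5 and 9 months.
observed_months = [2, 5, 9]

# lifelines expects higher predicted scores to mean a later event, so a prediction
# that preserves the true ordering scores 1.0 and a fully reversed one scores 0.0.
print(concordance_index(observed_months, [1, 4, 10]))  # 1.0 (correct ordering)
print(concordance_index(observed_months, [10, 4, 1]))  # 0.0 (reversed ordering)
```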

Treselle Systems, a data consulting service, analyzed customer churn data using logistic regression. We will use that dataset to do our analysis.

For a great data analysis of the dataset, see this post by Zach Angell. However, our focus in this post is different, since we're interested in using machine learning to make predictions.

We first install the lifelines and pysurvival libraries and do the usual imports.
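
A minimal version of that setup might look like the following; the exact imports in the accompanying notebook may differ slightly.

```python
# Install the survival-analysis libraries (run once, e.g. in a notebook cell).
# !pip install lifelines pysurvival

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from lifelines.utils import concordance_index
from pysurvival.models.survival_forest import RandomSurvivalForestModel
```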

We download the dataset and show the first five customers.
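
Something along these lines, assuming the Telco churn CSV has already been downloaded next to the notebook (the filename below is the one used in the IBM/Kaggle release of the dataset):

```python
# Load the Telco customer churn dataset from a local CSV file.
df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head()
```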

The dataset includes information about:

+ Customers who churned — the column is called Churn

+ Services that each customer has signed up for — phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies

+ Customer account information — how long they’ve been a customer, the type of contract (month-to-month, one-year, two-year), payment method, paperless billing, monthly charges, and total charges

+ Demographic info about customers — gender, age range, and if they have partners and dependents

In the dataset, there are 7043 customers, of which 1869 customers (27%) have churned and 5174 customers (73%) have not churned yet.

We drop the ID column and the TotalCharges column (otherwise, since TotalCharges is roughly MonthlyCharges times the number of months subscribed, the model could easily deduce how long someone has been a customer). Finally, we one-hot encode the categorical variables. For details on the preprocessing of the data, please check the associated code.
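
A sketch of that preprocessing, using the column names from the Telco dataset (the associated code may handle additional details, such as blank TotalCharges entries):

```python
# Drop the identifier and the leaky TotalCharges column.
df = df.drop(columns=["customerID", "TotalCharges"])

# Encode churn as 0/1; tenure (in months) stays numeric and will serve as the time column.
df["Churn"] = (df["Churn"] == "Yes").astype(int)

# One-hot encode all remaining categorical columns.
categorical_cols = df.select_dtypes(include="object").columns
df = pd.get_dummies(df, columns=categorical_cols)
```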

Data Analysis

To evaluate our models, we split the dataset into training (70%) and testing (30%) datasets.
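
For instance, splitting features, event indicator, and observed time in one go (a sketch; the notebook may use a different random seed):

```python
# Features exclude the two targets: the churn indicator and the observed time (tenure).
X = df.drop(columns=["Churn", "tenure"])
E = df["Churn"]    # event indicator: 1 = churned, 0 = still a customer (censored)
T = df["tenure"]   # observed time in months

X_train, X_test, E_train, E_test, T_train, T_test = train_test_split(
    X, E, T, test_size=0.3, random_state=0
)
```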

In this post, we want to compare the performance of three different random forest models with default values to make the churn predictions:

+ Random forest classifier

+ Random forest regressor

+ Random survival forest

We want to determine, from the set of features/covariates alone, who will churn first.

A random forest classifier usually outputs either a zero or a one; however, we can use the predict_proba method to get the probability of belonging to the churn class and use that probability to produce the predicted ordering of churning.
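
A minimal sketch of this approach with default hyperparameters (variable names as in the snippets above):

```python
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, E_train)

# Higher churn probability should mean earlier churn, so we negate it to get a
# "time-like" score before computing the concordance index with lifelines.
churn_proba = clf.predict_proba(X_test)[:, 1]
print(concordance_index(T_test, -churn_proba, E_test))

# Feature importances, largest first.
print(pd.Series(clf.feature_importances_, index=X.columns)
        .sort_values(ascending=False).head(5))
```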

Random forest classifier obtains a concordance index of 0.765. The most important feature is the monthly charges; far behind are the type of contract and the payment method.

Let’s first see how a random forest regressor performs when taking into account only the customers who have churned.
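
A sketch of this variant: train the regressor on churned customers only, with tenure as the target.

```python
# Restrict the training data to customers who have already churned.
churned = E_train == 1
reg_churned = RandomForestRegressor(random_state=0)
reg_churned.fit(X_train[churned], T_train[churned])

# Predicted tenure is already "time-like": larger values mean churning later.
print(concordance_index(T_test, reg_churned.predict(X_test), E_test))
```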

Random forest regressor obtains a concordance index of 0.817. The most important features are the monthly charges and the type of contract.

Let’s see if we do better by considering all the customers.
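
The same sketch, now fitting on every customer, censored or not:

```python
# Fit on all customers; for non-churned customers, tenure is the last known time.
reg_all = RandomForestRegressor(random_state=0)
reg_all.fit(X_train, T_train)
print(concordance_index(T_test, reg_all.predict(X_test), E_test))
```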

Random forest regressor obtains a concordance index of 0.82, which is almost the same as when considering only the customers who have churned. The most important features are the type of contract and the monthly charges. All the other features fall far behind.

Let’s now see how random survival forest performs.
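
With pysurvival, a minimal sketch looks roughly like this (num_trees is an assumed value; the notebook's settings may differ):

```python
rsf = RandomSurvivalForestModel(num_trees=100)
rsf.fit(X_train, T_train, E_train)

# pysurvival's risk scores are higher for customers expected to churn sooner,
# so we negate them before passing them to lifelines' concordance_index.
print(concordance_index(T_test, -rsf.predict_risk(X_test), E_test))

# Variable importances computed by the survival forest.
print(rsf.variable_importance_table.head(5))
```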

Random survival forest obtains a concordance index of 0.844. The most important features are the type of contract, the payment method, and whether the customer has online services associated with the account. The monthly charges are much less important than in the random forest classifier and random forest regressor models.

It may be the case that the way we split our dataset gave an advantage to the random survival forest model. Let’s try 20 randomly chosen seeds.
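
A sketch of that robustness check; for brevity the loop below only compares the classifier with the survival forest, but the regressors follow the same pattern.

```python
for seed in range(20):
    X_tr, X_te, E_tr, E_te, T_tr, T_te = train_test_split(
        X, E, T, test_size=0.3, random_state=seed
    )

    clf = RandomForestClassifier(random_state=seed).fit(X_tr, E_tr)
    ci_clf = concordance_index(T_te, -clf.predict_proba(X_te)[:, 1], E_te)

    rsf = RandomSurvivalForestModel(num_trees=100)
    rsf.fit(X_tr, T_tr, E_tr)
    ci_rsf = concordance_index(T_te, -rsf.predict_risk(X_te), E_te)

    print(f"seed {seed:2d}: classifier {ci_clf:.3f}  survival forest {ci_rsf:.3f}")
```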

The random survival forest model outperforms the random forest classifier model and both random forest regressor models in all of the considered cases.

Conclusions

We have considered the problem of churn prediction in a telco using three different random forest models (random forest classifier, random forest regressor, and random survival forest), with the concordance index as the evaluation metric. Random survival forest outperforms both the random forest classifier and the random forest regressor in all of the considered cases.

Appendix

In our analysis, we have not been entirely fair to the random forest regressor and the random forest classifier, since there is no easy way to include both the event of interest (churn) and the time people have been subscribed (or the last known time) in these models.

Random forest classifier is only using the available churn information. Random forest regressor is only using the available time information. However, random survival forest is simultaneously using both the churn information and the time information.

Let’s consider two extra cases:

First, the random forest classifier will use the churn information and we will use that churn information to run a random forest regressor (we call that approach random forest classifier-regressor).

Second, the random forest regressor will use the time information and we will use that time information to run a random forest classifier (we call that approach random forest regressor-classifier).

Both of these cases perform worse than random survival forest (random forest classifier-regressor gives a concordance index of 0.689 and random forest regressor-classifier gives a concordance index of 0.622). In fact, they perform worse than using just one of the two sources of information.
