Customer churn

Investigating customer churn using survival analysis techniques

Published in rond blog, Mar 9, 2021

https://pixabay.com/photos/farewell-say-goodbye-bye-road-3258939/

When you are running a shop or selling a service there is nothing more important than the customer. Without customers your business is doomed. It is often difficult to convince people to become your customers in the first place. This may require expensive marketing campaigns or elaborate shop windows. Once you have customers, it may be a better strategy to try to retain them.

To understand how to best retain customers, it is instructive to look at leaving customers and investigate their reasons for leaving. This is the domain of customer churn analysis.

Types of customer churn

Within the domain of customer churn one can observe different types of churn. A customer may engage in your business in the form of subscriptions or long-term contracts. This type is therefore called contractual churn, and is characterized by regular periodical payments. Examples are:

roadside assistance provided to members of an association, such as the ANWB (Netherlands), ADAC (Germany) and the Automobile Association (UK);
telephone companies that provide their customers with phone and/or internet services;
media companies, where customers can subscribe to a daily newspaper;
and, more broadly, employment, where an employee resigning from a job is also a form of churn.

Within these subscription settings, churn is typically observed explicitly: it happens when a customer voluntarily or involuntarily ends the contract.

There is also non-contractual churn. This type is observed when there is no contractual relationship between the seller and buyer. Every purchase from the customer could — in principle — be the last. Examples are:

supermarkets selling groceries to passing or regular customers;
parcel companies who provide mail service and parcel delivery to customers whenever needed;
web shops selling stuffed animals online.

In this setting churn is not explicitly observed. When a customer who used to return weekly hasn’t visited the supermarket for over a year, you can safely assume that this customer has churned. Here you would look at the durations of the previous periods between purchases and calculate the probability of a next purchase within a given period of time.
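As a rough illustration of that idea, the sketch below estimates the probability of a next purchase within a given horizon as the share of past gaps between purchases that were at most that long. The purchase dates and the helper names are made up for this example, and a real analysis would use a proper model rather than this simple empirical fraction.

```python
from datetime import date

def gaps_in_days(purchase_dates):
    """Number of days between consecutive purchases of one customer."""
    ordered = sorted(purchase_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def prob_next_purchase_within(gaps, horizon_days):
    """Share of historical gaps that were at most horizon_days long."""
    if not gaps:
        raise ValueError("need at least two purchases to compute a gap")
    return sum(gap <= horizon_days for gap in gaps) / len(gaps)

# Hypothetical purchase history of a single customer
purchases = [date(2021, 1, 4), date(2021, 1, 11), date(2021, 1, 19),
             date(2021, 1, 25), date(2021, 2, 8)]

gaps = gaps_in_days(purchases)
p_week = prob_next_purchase_within(gaps, 7)
p_two_weeks = prob_next_purchase_within(gaps, 14)
print(f"gaps: {gaps}, P(within 7 days)={p_week}, P(within 14 days)={p_two_weeks}")
```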

In this blog I will focus on contractual churn and leave the discussion of non-contractual churn for a later blog.

Example dataset on contractual churn

To take you through contractual churn analysis I propose to consider a database that has been used many times before, for example in this blog. It consists of service details, account information and demographic data of about 7000 customers of a telecom service. The dataset can be found here on Kaggle, where the dataset is also further explained.

To add a bit more reality to the data, I’ve added a fictitious start and end date of each customer membership to the dataset. An overview of the first 20 rows of the dataset is visualized in the figure below.

The vertical dashed line represents today. The customers represented by the reddish lines ended their contracts somewhere in the past: they have left. We know the exact duration of their membership. The customers shown in blue are still under contract today. We do not know how long their subscription will last after today. And that is exactly the question under consideration: when will they leave?

When will they leave?

We could consider the customers who have already left, and find a relationship between customer and account characteristics and the duration of their subscription. Subsequently we could use that model to predict the expected duration for the customers who are still subscribed. However, this leaves out a substantial amount of data: everything we know about the customers who are still under contract.

It is therefore better to use a technique that is particularly suited to data for which the end-point is not yet known. Such a technique is called survival analysis. Observations for which the end-point has not yet been observed are called right-censored. In the remainder of this blog I will discuss several techniques and methods within survival analysis.
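To make right-censoring concrete, here is a minimal sketch of how a start date and an optional end date could be turned into the two quantities survival analysis works with: a duration and a flag that says whether churn was actually observed. The function name, the dates and the whole-month rounding are assumptions for illustration, not the way the dataset in this blog was built.

```python
from datetime import date

def to_survival_record(start, end, today):
    """Return (duration in whole months, churned) for one customer.

    A customer without an end date, or with an end date after today,
    is right-censored: we only know the subscription lasted at least
    until today.
    """
    churned = end is not None and end <= today
    stop = end if churned else today
    months = (stop.year - start.year) * 12 + (stop.month - start.month)
    return months, churned

today = date(2021, 3, 9)
left = to_survival_record(date(2019, 1, 15), date(2020, 3, 15), today)
active = to_survival_record(date(2020, 6, 1), None, today)
print(left, active)
```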

Survival analysis

Kaplan-Meier curves form the backbone of survival analysis. These curves estimate the survival function, which shows the proportion of customers that have not yet churned at time t. The figure below shows such a plot for the case under consideration. The y-axis shows the probability that a customer has an active subscription and the x-axis shows the duration of the subscription. At the start, at 0 months, all customers are subscribed; after 20 months about 80% of the customers still have a subscription.
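The estimator itself is simple enough to sketch by hand. At every time where at least one customer churns, the survival probability is multiplied by the fraction of customers at risk who did not churn at that time. The durations and churn flags below are a made-up toy sample, not the telecom data, and in practice you would use a library implementation such as the one in lifelines.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each observed churn time.

    durations: observed subscription length per customer
    events:    1 if the customer churned, 0 if right-censored
    """
    churn_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in churn_times:
        # Customers still subscribed just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Customers who churned exactly at time t
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - churned / at_risk
        curve.append((t, survival))
    return curve

# Toy sample: six customers, three of them censored
durations = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: {s:.3f}")
```

Note that the censored customers still count as at risk up to the moment they drop out of observation, which is how the estimator uses their information without knowing their end-point.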

This plot already provides quite a bit of insight into customer behavior. A small number of customers cancel shortly after starting their subscription. Then, over time, the curve slowly declines: about 60% of customers reach a subscription time of 70 months.

This curve does not provide insight into the influence of customer or account characteristics. Such insight would be useful to understand the driving forces better, and possibly to make advantageous offers to the customers. What can be done, for example, is plotting the Kaplan-Meier curves for different values of a particular customer characteristic. The figure below shows the curves for the three different contract durations.

This figure shows that longer contract durations are associated with lower customer churn. This is to be expected: ending the subscription before the end of the contract likely has negative financial consequences for the customer.

There are quite a few customer, account and demographic details that could be analyzed in the same way, but doing so one characteristic at a time is a rather cumbersome procedure. Additionally, while Kaplan-Meier curves work excellently in the exploratory phase of a project, they cannot be used to make predictions for individual customers. How long will the subscription of John Doe last?

Therefore we need to find a model that relates the customer and account characteristics to the subscription duration. That can be done with a wide variety of survival regression models, which form another important topic within survival analysis.

Survival regression model

Survival regression models do not model the survival duration directly. Instead they model the hazard: the risk per unit time that the customer will stop the subscription. In this blog I will start by applying the Cox Proportional Hazard method. Before you can practically apply this method, you need to make sure that your categorical variables are one-hot encoded and that the hierarchy in the data is properly dealt with as explained in this notebook. Additionally, the feature TotalCharges is not known at the start of the contract. It therefore does not seem to be a valid covariate and is removed from the dataset. The Cox proportional hazard model can then be fit with the lifelines package:

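import matplotlib.pyplot as plt
from lifelines import CoxPHFitter

# Fit a Cox model, stratified on contract type so that each contract
# type gets its own baseline hazard
cph = CoxPHFitter()
cph.fit(df_train, 'tenure', 'Churn_Yes',
        strata=["Contract_Two year", "Contract_One year"])

# Plot the baseline survival curve of each stratum; the column keys are
# (Contract_Two year, Contract_One year) value pairs
fig, ax = plt.subplots(1, 1)
cph.baseline_survival_.rename(
    columns={
        (0, 0): 'Monthly',
        (1, 0): 'Two year',
        (0, 1): 'One year'
    }).plot(ax=ax)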

Note that I have stratified the data on contract duration; in this way the model better satisfies the proportional hazard assumption.
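For reference, the stratified Cox model can be written as follows. This is the standard textbook form rather than anything specific to this dataset: s denotes the contract stratum, h_{0,s}(t) the baseline hazard of that stratum, x the remaining covariates and β their coefficients.

```latex
h(t \mid x, s) = h_{0,s}(t)\,\exp\left(\beta^{\top} x\right)
```

Each stratum has its own baseline hazard while the coefficients β are shared across strata, so the proportional hazards assumption only needs to hold within a contract type.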

The figure below shows the baseline survival curves of the Cox PH model fitted to the training data. At face value the curves are pretty similar to the Kaplan-Meier curves shown above, suggesting a decent fit to the data.

Alternatively we can fit a Weibull Accelerated Failure Time model. This model assumes that the covariates speed up or slow down the time to failure, which in our case is the time to churn. Applying this model is easy within the lifelines package:

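import matplotlib.pyplot as plt
from lifelines import WeibullAFTFitter

# Fit a Weibull accelerated failure time model on the same training data
waft = WeibullAFTFitter()
waft.fit(df_train, 'tenure', 'Churn_Yes')

# Plot the predicted survival curves for the three contract types
fig, ax = plt.subplots(1, 1)
waft.plot_partial_effects_on_outcome(
    ["Contract_One year", "Contract_Two year"],
    values=[[0, 0], [1, 0], [0, 1]],
    plot_baseline=False,
    ax=ax
)
ax.legend(labels=["Monthly", "One year", "Two year"])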

The resulting figure for the various contract durations is shown below. The plot shows smooth lines that decrease faster for shorter contract durations.
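To see where such smooth curves come from, here is a minimal sketch of the survival function behind a Weibull AFT model: S(t) = exp(-(t / λ)^ρ), with λ = exp(intercept + coefficients · covariates), which is the parameterization described in the lifelines documentation. The intercept, coefficients and shape value below are made up for illustration and are not the fitted values from this blog.

```python
import math

def weibull_aft_survival(t, covariates, coefficients, intercept, rho):
    """Survival probability at time t under a Weibull AFT model."""
    # Covariates act on the scale parameter lambda: a positive coefficient
    # stretches the time axis, which means a longer expected time to churn.
    linear = intercept + sum(coefficients[name] * value
                             for name, value in covariates.items())
    lam = math.exp(linear)
    return math.exp(-((t / lam) ** rho))

# Hypothetical parameters, for illustration only
coefficients = {"Contract_One year": 1.0, "Contract_Two year": 2.0}
intercept, rho = 4.0, 1.2

monthly = {"Contract_One year": 0, "Contract_Two year": 0}
two_year = {"Contract_One year": 0, "Contract_Two year": 1}

s_monthly = weibull_aft_survival(24, monthly, coefficients, intercept, rho)
s_two_year = weibull_aft_survival(24, two_year, coefficients, intercept, rho)
print(f"S(24 months): monthly={s_monthly:.2f}, two year={s_two_year:.2f}")
```

Because the covariates only rescale time, all curves share the same shape parameter ρ and differ only in how quickly they are traversed.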

Finally, I have also applied a random survival forest model from the scikit-survival package to the data. This package does not readily provide partial effect plots, so you will need to program the partial effects yourself. That leads to the plot shown below. The figure displays roughly the same pattern as the plots above.

Assessing model performance

When we would like to compare the performance of the models, there are several methods at our disposal. The most obvious may be the concordance index, which is a generalization of the well-known Area Under the Curve (AUC). This index ranges from 0 to 1, where a value of 1 indicates perfect prediction and a value of 0.5 is associated with random predictions.
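A minimal sketch of the idea behind the concordance index: take every pair of customers for which we know who churned first, and count how often the model predicts the shorter duration for that customer. The toy numbers are invented, and this simplified version ignores ties in the observed durations; library implementations handle those cases more carefully.

```python
def concordance_index(durations, predicted, events):
    """Fraction of comparable pairs that the predictions order correctly.

    durations: observed subscription length per customer
    predicted: predicted subscription length per customer
    events:    1 if the customer churned, 0 if right-censored
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when customer i is known to have
            # churned before customer j's observed duration.
            if events[i] and durations[i] < durations[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable

# Toy example: four customers, the third one is censored
durations = [2, 4, 6, 8]
events = [1, 1, 0, 1]
predicted = [3, 5, 4, 9]
c_index = concordance_index(durations, predicted, events)
print(f"concordance index: {c_index}")
```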

The concordance indices of our models on the test-set are as follows:

Cox PH model: 0.83
Weibull AFT model: 0.87
Random Forest model: 0.84

Another method of assessing model performance makes use of the Brier score, which measures the accuracy of probabilistic predictions. A lower Brier score means a better prediction. Such assessments can be made at different moments in time. We therefore use the test set to predict the survival function and compare those predictions with the actuals at different points in time using the Brier score. This calculation results in the plot below.
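The sketch below shows the core of such a calculation at a single point in time: the mean squared difference between the predicted survival probability and what actually happened. It is a deliberate simplification, since customers who were censored before that time are simply dropped, whereas library implementations typically reweight the remaining observations to correct for censoring. All numbers are made up.

```python
def brier_score_at(t, durations, events, predicted_survival):
    """Simplified Brier score at time t.

    predicted_survival: predicted probability per customer of still
    being subscribed at time t.
    """
    squared_errors = []
    for duration, churned, p in zip(durations, events, predicted_survival):
        if duration > t:
            actual = 1.0   # still subscribed at time t
        elif churned:
            actual = 0.0   # churned at or before time t
        else:
            continue       # censored before t: status at t is unknown
        squared_errors.append((p - actual) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Toy example, evaluated at 12 months
durations = [6, 20, 10, 30]
events = [1, 0, 0, 1]
predicted_survival = [0.3, 0.8, 0.6, 0.9]
score = brier_score_at(12, durations, events, predicted_survival)
print(f"Brier score at 12 months: {score:.4f}")
```

Repeating this for a range of time points gives a curve of Brier scores over time.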

The Brier scores show low values for both the Weibull AFT and the random forest model. The time range for which the predictions work best varies between the models. The Weibull AFT model reaches its minimum just before 20 months, while the random forest makes its best predictions around 40 months.

Composing the perfect model is not the ultimate goal in this blog. I would like to give you a flavor of different models, their characteristics and performance on this dataset.

Predictions

Let’s take the Weibull AFT model, which has the highest concordance index and seems to perform reasonably well over a decent time range, and use this model to make predictions on our dataset. Note that such predictions only make sense for customers who are still subscribed. These predictions can be calculated using the following code:

# time each customer has already been subscribed;
# set to 0 for customers who have already churned
last_obs = df_surv['tenure'].where(df_surv['Churn_Yes'] == 0, 0)

# predict the median remaining subscription time,
# conditional on the time already observed
remaining_life = waft.predict_median(df_surv, conditional_after=last_obs)
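To make explicit what a conditional median means, here is a minimal sketch for a Weibull survival function S(t) = exp(-(t / λ)^ρ). Given that a customer has already been subscribed for s months, the median remaining time r solves S(s + r) / S(s) = 0.5. The values of λ and ρ below are hypothetical; this illustrates the idea and is not a reimplementation of lifelines.

```python
import math

def weibull_survival(t, lam, rho):
    """Weibull survival function S(t) = exp(-(t / lam)^rho)."""
    return math.exp(-((t / lam) ** rho))

def median_remaining_time(lam, rho, already_observed):
    """Median remaining time r such that S(s + r) / S(s) = 0.5."""
    s = already_observed
    # Solve ((s + r) / lam)^rho = (s / lam)^rho + ln(2) for r
    total = lam * ((s / lam) ** rho + math.log(2)) ** (1 / rho)
    return total - s

lam, rho = 60.0, 0.8  # hypothetical scale (in months) and shape
for months_subscribed in (0, 12, 36):
    r = median_remaining_time(lam, rho, months_subscribed)
    print(f"subscribed {months_subscribed:>2} months -> "
          f"median remaining {r:.1f} months")
```

With ρ below 1 the hazard decreases over time, so customers who have already stayed longer get a longer predicted remaining time.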

The forecasted membership duration is visualized below for the first twenty customer IDs.

The figure shows that the predicted duration of the membership can vary quite a bit between customers. This variation depends on demographic, customer and account characteristics. We are particularly interested in customers with a short predicted membership, and in how we can make an appropriate offer to those customers to convince them to stay longer.

As a first step we take a look at the coefficients of the model and see which covariates are most important. The figure below shows that the contract duration in particular has a strong positive effect, which in an accelerated failure time model corresponds to a longer expected time to churn. This is no surprise: the Kaplan-Meier survival curves per contract duration shown earlier in this blog already point in that direction.

This actually provides a handle towards improvement: what if we would upgrade the contracts of customers at risk of churn from monthly contracts to one- or two-year contracts? Or how much higher would the customer retention be if we offered them online security options, or online backup possibilities? We now enter the area of customer lifetime value.

Customer lifetime value

The customer lifetime value is the value of the remaining length of service of a customer. Let’s consider the customers with a predicted remaining membership duration of 3 years. The value of such a membership can be calculated by multiplying the remaining duration by the monthly charges. The blue shaded curve in the figure below shows the distribution of the customer lifetime value if we do not change the contract. Obviously this figure includes only customers who are still subscribed at this moment, that is, who haven’t churned yet.

As we have shown above, customers with one- or two-year contracts tend to stay with the company much longer: they are less at risk of churning. Therefore it probably makes sense to offer all existing customers with monthly or one-year contracts a two-year contract. In case all those customers accept that offer, the remaining subscription duration shifts considerably to the right, as can be seen from the red shaded curve above.

This prolongation of the predicted subscription duration can be translated into a new estimated customer lifetime value by multiplying it by the monthly charges. In real-life cases the monthly charge of a two-year contract will likely be lower than that of a one-year or monthly contract. In this blog, for simplicity, I will leave that out of the equation.
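The arithmetic behind this comparison is sketched below. The monthly charges and the predicted remaining durations are made-up numbers; in the analysis itself they would come from the dataset and from the survival model, once with the current contract and once with the contract set to two years. As stated above, the monthly charge is kept the same in both scenarios.

```python
def lifetime_value(remaining_months, monthly_charge):
    """Remaining customer value: predicted remaining months times monthly charge."""
    return remaining_months * monthly_charge

# Hypothetical customers: (monthly charge, predicted remaining months on the
# current contract, predicted remaining months after a two-year upgrade)
customers = [
    (70.0, 8.0, 30.0),
    (45.5, 11.0, 26.0),
    (90.0, 5.0, 20.0),
]

current = [lifetime_value(now, charge) for charge, now, _ in customers]
upgraded = [lifetime_value(new, charge) for charge, _, new in customers]
average_uplift = (sum(upgraded) - sum(current)) / len(customers)
print(f"average uplift per customer: {average_uplift:.2f}")
```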

Likewise we can calculate the changes in average customer lifetime value for other updates to the contract. The bar plot below shows the estimated customer value for the different contract upgrades. I have selected solely the customers with an expected remaining membership duration of one year or less. These customers are the most interesting to target with specific campaigns. The figure clearly shows that offering a two-year contract would make the most commercial sense in this case.

Closing words

In this blog I have explained how survival analysis can be used to analyse and model customer churn. Such a model can subsequently be used to make predictions and provide the foundation for data-driven decisions on customer retention campaigns.

I would be very interested to learn about your customer retention challenges and the models and analysis techniques you have used. Please share your questions, comments and suggestions below!