An everlasting customer: a story of retention

Mathijs van Bree
Sogeti Data | Netherlands
10 min read · Dec 20, 2022
A customer hesitating to walk out the door (image generated by Stable Diffusion)

Every company that sells a product or service will at some point see customers cease to be customers. Often a customer leaves and soon moves on to a competitor offering similar products or services. The phenomenon of a client leaving your business (or no longer using it) is a harsh reality for every company, and it is known as customer churn. However, in the current era of big data, we can predict whether a customer is going to churn and when. This blog introduces customer churn together with a two-stage model approach to predict it. We discuss the insights and additional benefits of survival analysis as part of the two-stage model. Finally, we cover how to act upon the churn predictions to retain customers.

Definition of churn

A couple of years ago I had a preferred web shop. Let’s call it Gifts4All for now. Every time I needed to buy a gift or wanted a new gadget, I would browse Gifts4All, select the product and buy it without a second thought. I did this for years because of their great service. Just before Christmas, I bought a nice RC car at Gifts4All as a gift for my nephew. However, due to the busy holiday season, the gift only arrived after Christmas. I contacted the helpdesk a couple of times, but without success. This negative experience damaged the relationship that I felt with this web shop, even though I had been a satisfied and loyal customer for several years. Afterward, for every purchase, I compared the price of the product across multiple web shops. I did this until I found my new preferred web shop. However, what if Gifts4All had contacted me just after my negative experience and offered me a discount voucher, for instance? Maybe I would have stayed with Gifts4All. I guess we will never know. In technical jargon, the moment I stopped being a customer of Gifts4All, I churned. In other words, churn is the moment a customer ceases to be your customer. Sometimes, customer churn is also referred to as customer attrition. The example I described is not just hypothetical: according to research, 32% of customers indicated that they would stop doing business with a company after only one bad experience.

What if I told you that in most cases it is quite predictable whether a customer is going to churn, because the data contains many hidden breadcrumbs that hint at the possibility of churn? These breadcrumbs can be found with the right tools and insights at our disposal. Nowadays, most companies gather a lot of data about their customers and the interactions they have with them. Technical advancements combined with the explosion of available data, both private and open-source data, bring about many new possibilities to improve customer experience, among them the ability to retain customers proactively.

Machine learning to the rescue

We don’t need a magical crystal ball to predict whether a customer is going to churn, but we do need machine learning. Machine learning models learn to recognize patterns in data, and these patterns can then be used to make predictions on unseen data. The most common approach to churn prediction is to build a classification model that predicts whether a given customer will churn within a certain period. Depending on the domain and the available data, this period can vary from a month to a year. A supermarket can gather data very frequently, with customers visiting twice a week, whereas an energy supplier commonly interacts with a customer only once or twice a year.
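As a minimal sketch of how the target for such a classification model could be constructed: the label is 1 when a customer churns within the prediction window that follows a chosen snapshot date, and 0 otherwise. The customer ids, dates and the 365-day window below are illustrative assumptions, not taken from a real dataset.

```python
from datetime import date, timedelta

def churn_label(churn_date, snapshot, horizon_days=365):
    """Return 1 if the customer churns within the window after the snapshot, else 0.

    Returns None for customers who had already churned at the snapshot date,
    because they should not be part of the training set for that snapshot.
    """
    if churn_date is not None and churn_date <= snapshot:
        return None
    window_end = snapshot + timedelta(days=horizon_days)
    return int(churn_date is not None and churn_date <= window_end)

# Hypothetical customers: id -> churn date (None means still active).
customers = {
    "A": date(2022, 3, 1),    # churns inside the window
    "B": None,                # still active
    "C": date(2023, 6, 1),    # churns, but only after the window
    "D": date(2021, 11, 15),  # already gone before the snapshot
}
snapshot = date(2022, 1, 1)
labels = {cid: churn_label(d, snapshot) for cid, d in customers.items()}
```

A shorter window, such as 30 days, would suit a domain with frequent interactions like the supermarket, while the energy supplier would probably need the longer one.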

Although the features that can be used for customer churn prediction depend strongly on the domain in which the model will be applied, several kinds of features can be found in most domains. They can typically be grouped into three categories: general features, engagement features, and external features. The first group consists of general information about customers, such as geographic location, age, and gender. Most companies already possess this data, so it is quite straightforward to incorporate it into a churn prediction model.

The second group consists of engagement features. These features tell us how a customer interacts with you. Some examples are social media behavior, such as who comments on your posts and who follows your Facebook page. Another example is the frequency of phone calls, chat messages, and emails in the last X months. Feedback forms that customers fill in after an interaction with your company can often be used as features as well. Lastly, the sentiment of phone calls, chat messages, or emails can be derived with sentiment analysis, which can prove to be an important feature of the model.
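To illustrate one such engagement feature, the sketch below counts the contact moments of a single customer in the period leading up to a reference date. The contact log, the reference date and the 90-day window are made-up assumptions; in practice the window would be the "last X months" that fits the domain.

```python
from datetime import date, timedelta

def contacts_in_last_days(contact_dates, reference, days=90):
    """Count contact moments in the `days` days up to and including `reference`."""
    start = reference - timedelta(days=days)
    return sum(1 for d in contact_dates if start < d <= reference)

# Hypothetical contact log for one customer (calls, chats and emails combined).
contact_log = [date(2022, 1, 5), date(2022, 9, 20), date(2022, 11, 2), date(2022, 12, 1)]

recent_contacts = contacts_in_last_days(contact_log, reference=date(2022, 12, 15), days=90)
```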

The last category is external features. These are features that tell something about the world itself rather than about your company. Typical features within this category are inflation rates, World Health Organization data about COVID-19, and the general sentiment about your company on Twitter. This last category is often the hardest to incorporate into a model, but it could help to account for the effect of unforeseen circumstances like the next pandemic or recession. However, adding some external features to our model doesn’t mean that we can build a model that will remain accurate for years. It is crucial to monitor the relevance and performance of the model and its features throughout the lifespan of the model. To learn more about monitoring ML models, read the blog “MLOps: Monitoring phase”.

Most models are trained with mostly general features and some engagement features to predict whether someone is going to churn in, for example, the next year. One important question remains: how can we predict when a customer will leave?

Survival of the fittest

In general, there are two approaches to predicting when a customer is going to churn. The first approach is to build a regression model instead of a classification model, one that predicts not if but when a customer is going to churn. This approach gives, on average, a lower performance because we don’t yet know how long it takes before every customer churns: many customers haven’t churned yet, so it is hard to include them in the training dataset. Possible workarounds are to exclude the active customers from the training dataset or to assign them a high number of years until churn occurs. The latter in particular rests on a lot of assumptions, but both workarounds require undesirable assumptions to be made. The second approach, which works around these limitations, is a two-stage solution. In the first stage, we apply a classification model to determine whether someone is going to churn within a certain timespan, as discussed previously. In the second stage, we apply a regression model to determine the time until a customer churns.

Survival analysis, also known as ‘time to event’ analysis, is used to study a population and, in particular, to estimate the lifetime of its members. Survival analysis has been around since the 17th century, and in the beginning it was primarily focused on predicting and analyzing the survival duration of individuals that received different types of interventions. Since then it has been extended and has spread outside of the medical world to other applications, among which the prediction of churn. In principle, any type of regression model can be used for this second stage of predicting when a customer is going to churn. However, in comparison to other regression models, survival analysis has the advantage that it can deal with censorship in the data (usually called censoring in the statistics literature). Censorship refers to the incompleteness of the data of customers that haven’t churned yet: we know how long they have been a customer so far, but not when they will leave. In the end, all customers will churn at some point in time, and therefore they should be included when estimating the lifespan of a customer. As previously mentioned, customers that have not yet churned should not be excluded from the data, as they could hold vital information. Additionally, imputation of the lifespan comes with many assumptions that we cannot always safely make.

Several types of survival analysis models can be used to predict when a customer is going to churn. Which model is suitable in which case depends on the characteristics of the dataset. With a Kaplan-Meier (KM) survival curve, which is displayed below, the survival rate of the average customer can be calculated. The only information required to fit the KM survival curve is the start date, the end date, and whether the customer has churned. In this case, after 20 months the chance of a customer still being a customer is 80%. In this example, 60% of all customers are still retained at the end of the curve.
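To make the estimator itself concrete, here is a minimal pure-Python sketch of the Kaplan-Meier calculation. It is not the code behind the figure, and the ten customers in it are invented for illustration. At every month in which at least one churn occurs, the running survival probability is multiplied by the share of at-risk customers that did not churn. Active (censored) customers count as "at risk" up to their current tenure, which is how their information is used without guessing an end date.

```python
def kaplan_meier(durations, churned):
    """Return a list of (time, survival probability) pairs.

    durations: months between start date and end date (or today, for active customers)
    churned:   True if the customer churned at that time, False if still active (censored)
    """
    event_times = sorted({t for t, e in zip(durations, churned) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Everyone whose observed duration reaches t was still at risk just before t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, churned) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data for ten customers.
durations = [2, 5, 5, 8, 12, 12, 15, 20, 24, 24]
churned = [True, True, False, True, False, True, False, False, True, False]

curve = kaplan_meier(durations, churned)
for months, probability in curve:
    print(f"after {months:2d} months: {probability:.2f}")
```

In practice a library would normally be used instead of hand-written code; the `lifelines` package, for example, offers a `KaplanMeierFitter` class for this purpose.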

The KM survival curve can be very insightful. However, it gives quite an abstract overview of the development of survival rates over time, without taking other features into account. The survival analysis model that we need in our case is the Cox Proportional Hazards (CPH) model. The hazard in this model is the risk that a customer churns at a given moment, provided they have not churned before; features such as the number of orders on a website or the number of complaints from a customer raise or lower that risk. The CPH model calculates a coefficient for every feature (or covariate, as features are called in survival analysis), with which we can see how much a feature contributes to a customer churning or staying. An example of such coefficients is displayed below.

Positive coefficients contribute to customers churning at a higher rate, whereas negative coefficients decrease the churn rate. In other words, the larger a positive coefficient, the more strongly the feature pushes customers towards churning, and the more negative a coefficient, the more strongly the feature helps customers stay. Therefore, these coefficients give a lot of insight into why customers churn in the first place.
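A common way to read such coefficients is to exponentiate them. In a Cox model the hazard has the form h(t | x) = h0(t) * exp(b1*x1 + ... + bn*xn), so exp(b) is the factor by which the churn hazard is multiplied when a feature increases by one unit. The sketch below shows this with hypothetical feature names and coefficient values that were invented for illustration and are not the values from the figure.

```python
import math

# Hypothetical CPH coefficients, invented for illustration only.
coefficients = {
    "complaints_last_year": 0.40,
    "orders_last_year": -0.15,
    "two_year_contract": -1.20,
}

# exp(coefficient) is the hazard ratio: the factor by which the churn hazard is
# multiplied when the feature increases by one unit, all else being equal.
hazard_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in sorted(hazard_ratios.items(), key=lambda item: item[1], reverse=True):
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: hazard ratio {ratio:.2f} ({direction} churn risk)")
```

A hazard ratio above 1 corresponds to a positive coefficient and a ratio below 1 to a negative one.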

Now that we know which features should be examined further, we can fit a separate KM survival curve for each category of a feature. The figure displayed below shows survival curves for every type of contract. Customers who have a two-year contract have a very high survival rate, customers with a one-year contract have a slightly lower survival rate, and customers with a month-to-month contract are very much at risk of churning. After ten months, 20 percent of the customers with a month-to-month contract have churned, and sixty months later, more than 80 percent of the customers in this group have stopped using our service.

If we want to know for an individual customer what the chance is that they will churn, we can use the previously calculated coefficients together with the customer’s feature values and our temporal feature, customer tenure. With these, we can calculate for every customer at any moment in time the chance of survival and, as its complement, the chance of churn.
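Under the proportional hazards assumption, this calculation can be sketched as S(t | x) = S0(t) ^ exp(b . x), where S0 is the baseline survival curve and b . x is the sum of each coefficient times the customer's feature value. The coefficients, the baseline values and the two customers below are hypothetical and only serve to show the mechanics.

```python
import math

# Hypothetical coefficients and baseline survival, invented for illustration.
coefficients = {"complaints_last_year": 0.40, "two_year_contract": -1.20}
# S0(t): probability that a customer whose features are all zero
# is still a customer after t months.
baseline_survival = {6: 0.95, 12: 0.90, 24: 0.80}

def survival_probability(features, months):
    """S(t | x) = S0(t) ** exp(beta . x) under the proportional hazards assumption."""
    risk_score = sum(coefficients[name] * value for name, value in features.items())
    return baseline_survival[months] ** math.exp(risk_score)

unhappy = {"complaints_last_year": 3, "two_year_contract": 0}
loyal = {"complaints_last_year": 0, "two_year_contract": 1}

p_unhappy = survival_probability(unhappy, 12)  # roughly 0.70
p_loyal = survival_probability(loyal, 12)      # roughly 0.97
churn_chance_unhappy = 1 - p_unhappy           # chance of having churned within 12 months
```

The customer with three complaints ends up below the baseline of 0.90 at twelve months, and the customer with a two-year contract ends up above it.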

We already looked at censorship within churn prediction. Censorship refers to the distinction that we want to make between customers that have already churned and active customers. The churned customers have an end date, whereas an active customer can cease to be our customer at any moment in time. When we filter our data so that only active customers remain, we can use the survival curves to calculate the survival rate for every individual at any moment in time. With this approach, we can, for example, select all customers whose survival rate three months from now is less than 60%.
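For an active customer the relevant number is a conditional one: given that the customer has already stayed for their current tenure, how likely are they to still be a customer three months from now? One way to sketch this is to divide the survival probability at tenure plus three months by the survival probability at the current tenure. The per-customer curves below are hypothetical values; in a real setup they would come from the fitted survival model.

```python
def conditional_survival(survival_curve, tenure, horizon):
    """P(still a customer at tenure + horizon | still a customer at tenure)."""
    return survival_curve[tenure + horizon] / survival_curve[tenure]

# Hypothetical per-customer survival curves: months since start -> survival probability.
active_customers = {
    "A": {"tenure": 6, "curve": {6: 0.90, 9: 0.45}},
    "B": {"tenure": 12, "curve": {12: 0.80, 15: 0.76}},
    "C": {"tenure": 3, "curve": {3: 0.70, 6: 0.35}},
}

# Select everyone whose chance of still being a customer in three months is below 60%.
at_risk = sorted(
    cid
    for cid, c in active_customers.items()
    if conditional_survival(c["curve"], c["tenure"], horizon=3) < 0.60
)
```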

Now that we know that we can predict whether a customer is going to churn and what the survival probability of a customer is at an arbitrary moment in time, let us see how it all fits together. Why would we build two models if the survival analysis alone can tell us the survival rate of every customer at any moment in time? Well, retaining customers is an investment, and if we are going to invest in something, we want to invest as little as possible while still getting the best possible return. Both models are statistical models that make predictions based on historical data and assumptions. It is not guaranteed that the output of the models will become reality. By combining the output of both models, we can be more confident about the group of customers that we want to proactively approach for retention. The method of interaction depends on the available budget and of course the company, but it can range from sending a discount voucher to sending a free PlayStation 5.
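One simple way to combine the two outputs, shown here purely as a sketch, is to approach only the customers that both models flag. The model outputs and both thresholds below are hypothetical; the right cut-offs depend on the retention budget and on how costly a missed churner is.

```python
# Hypothetical outputs of the two models for five active customers.
churn_probability = {"A": 0.85, "B": 0.20, "C": 0.75, "D": 0.90, "E": 0.40}     # stage 1: classifier
survival_in_3_months = {"A": 0.50, "B": 0.95, "C": 0.80, "D": 0.40, "E": 0.55}  # stage 2: survival model

def select_for_retention(churn_probability, survival, min_churn=0.70, max_survival=0.60):
    """Keep only the customers that both models flag as at risk."""
    return sorted(
        cid
        for cid in churn_probability
        if churn_probability[cid] >= min_churn and survival[cid] < max_survival
    )

targets = select_for_retention(churn_probability, survival_in_3_months)
```

Customers C and E are each flagged by only one of the two models, so they are left out of the campaign in this sketch.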

It is of paramount importance to feed any retention attempts back into both models. If these attempts are not registered properly and eventually added to the models, our predictions may become unreliable, because the interactions that we have had with customers on the basis of the models’ predictions change the behavior the models try to predict. Preferably, every retention campaign is accounted for in the model. The group of customers that was approached for retention purposes can then be tracked within the survival analysis. We can compare this group to a group of customers that were also at risk of churning but were not selected for retention, to measure the effectiveness of a retention campaign.
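The comparison between the two groups can be sketched very simply as the difference in retention rate after a fixed follow-up period. The outcomes below are made up, and the comparison is only meaningful under the assumption that both groups were equally at risk to begin with, for example because the control group was held out at random.

```python
def retention_rate(still_customer):
    """Share of a group that is still a customer at the end of the follow-up period."""
    return sum(still_customer) / len(still_customer)

# Hypothetical follow-up after six months; True means the customer is still active.
campaign_group = [True, True, True, False, True, True, False, True]    # received a voucher
control_group = [True, False, False, True, False, True, False, True]   # at risk, not contacted

# Difference in retention between the approached group and the control group.
uplift = retention_rate(campaign_group) - retention_rate(control_group)
```

With groups this small the difference says little; a survival curve per group, as described above, would also show when the campaign effect fades.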

Conclusion

In this blog, we’ve seen that it is viable and beneficial for every company to analyze and predict customer churn. We discussed the two-stage model approach: first predicting which customers are at risk of churning in a certain period, and second calculating the survival rate of a customer at any arbitrary moment in time. Additionally, we looked at how we can use the predictions of both models to retain customers and why it is important to register retention campaigns. Any retention attempt should eventually be added to the model to ensure the quality of future predictions. I hope this blog has given more insight into the methods behind customer churn prediction.
