Non-Contractual Customer Churn

Published in rond blog, Apr 20, 2021 (7 min read)

Using data science to predict customer churn in cases where there is no contract between the buyer and the seller.

In a great many business domains, the relationship between sellers and customers is non-contractual. Think of the corner shop or an e-commerce platform. New customers arrive; others stop returning without telling the seller.

The silent nature of customer churn raises a problem: how do we know whether a customer is “dead” or is “alive”? And can we figure out which customers are about to “die”?

It is well known that it is considerably cheaper to sell to existing customers than to new ones. Data science can help in the effort to retain customers by means of modelling customer behaviour. We call this activity “customer churn analysis”.

In a previous blog post, Hans Weda wrote about contractual churn, that is, the type in which there is a contractual relationship between the seller and the buyer. Hans reviewed and demonstrated various statistical techniques to model contractual churn and customer lifetime value. In contractual churn analysis we explicitly observe a “churn event”: a customer discontinues a contract, and this is visible in the database records.

In a wide range of business cases, such an explicit event does not occur. We don’t know whether a customer has left or simply hasn’t purchased anything for a while. What these cases have in common is that no contract underwrites the exchange. Consequently, if a customer churns, this is not recorded in the database.

Not observing the churn event directly complicates the analysis of non-contractual churn. How do we proceed?

One possibility is to actually construct a churn event. We can say: “if a customer has not made a repeat purchase within X days, then this customer has churned”. Here, X can be 7 days, 30 days, or any other period.
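The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the method used later in this post; the customer ids, dates, and the 30-day threshold are all made up for the example.

```python
from datetime import date, timedelta

def has_churned(last_purchase: date, today: date, threshold_days: int = 30) -> bool:
    """Rule-based churn flag: True if the last purchase is more than `threshold_days` ago."""
    return (today - last_purchase) > timedelta(days=threshold_days)

# Hypothetical customers and the dates of their most recent purchases.
last_purchases = {
    "customer_a": date(2021, 1, 5),
    "customer_b": date(2021, 4, 1),
}
today = date(2021, 4, 20)

churn_flags = {
    customer: has_churned(last, today, threshold_days=30)
    for customer, last in last_purchases.items()
}
print(churn_flags)
```

Changing threshold_days changes who counts as churned, which is exactly the arbitrariness of a constructed churn event.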

While having a churn event simplifies analysis, its definition can be arbitrary in the non-contractual setting. Another possibility is to approach churn probabilistically. We model two processes:

The probability that a customer churns in a given time span
The rate of purchasing, that is, how many purchases a customer makes in, say, a week

Jointly, these processes predict future customer behaviour given customers’ past behaviour. To do this, we need three pieces of information:

Age: how long ago did a customer make his first purchase?
Recency: how much time passed between a customer’s first and last purchase?
Frequency: how many repeat purchases did this customer make?

Note that this data can be obtained from transactional databases rather easily. Also, “recency” is confusing from a terminology standpoint. One would think that it speaks to when the customer last made a purchase relative to today. Rather, it is the time (e.g., number of days) between a customer’s first purchase and his most recent one.
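To make the three definitions concrete, here is a small sketch that derives them from a raw transaction log using only the standard library. The customer ids, dates, and the end of the observation period are hypothetical. Counting at most one purchase per day is an assumption of this sketch.

```python
from datetime import date

def summarise(purchase_dates, observation_end):
    """Return (frequency, recency, age) in days for one customer."""
    days = sorted(set(purchase_dates))        # assumption: at most one purchase counted per day
    first, last = days[0], days[-1]
    frequency = len(days) - 1                 # repeat purchases only, the first one is excluded
    recency = (last - first).days             # first purchase to most recent purchase
    age = (observation_end - first).days      # first purchase to end of observation
    return frequency, recency, age

# Hypothetical transaction log: customer id -> purchase dates.
transactions = {
    "c1": [date(2021, 1, 1), date(2021, 1, 15), date(2021, 2, 1)],
    "c2": [date(2021, 3, 1)],
}
observation_end = date(2021, 4, 20)

summary = {cid: summarise(dates, observation_end) for cid, dates in transactions.items()}
print(summary)
```

Note that a customer with a single purchase gets frequency 0 and recency 0. The lifetimes package ships a helper for this conversion, summary_data_from_transaction_data in lifetimes.utils, which is likely the more convenient route on real data.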

We put this data into a statistical model known as the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model, which belongs to a family known colloquially as “Buy Till You Die” models. This mixture distribution (plus some assumptions) allows us to model the relationship between age, recency, and frequency so that we can make predictions of customer behaviour.
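One way to get a feel for the assumptions behind BG/NBD is to simulate them. In the standard formulation of the model, each customer has their own purchase rate (drawn from a gamma distribution) and their own dropout probability (drawn from a beta distribution); purchases arrive at random at that rate, and after each repeat purchase the customer may “die” with that probability. The sketch below follows this description. The parameter values are hypothetical and were chosen only for illustration.

```python
import random

def simulate_customer(r, alpha, a, b, age, rng):
    """Simulate one customer under the BG/NBD assumptions.

    Returns (frequency, recency): the number of repeat purchases and the
    time of the last one, both measured from the first purchase.
    """
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)  # lambda ~ Gamma(shape=r, rate=alpha)
    dropout_prob = rng.betavariate(a, b)              # p ~ Beta(a, b)
    if purchase_rate <= 0.0:
        return 0, 0.0  # guard against numerical underflow of the gamma draw
    t, frequency, recency = 0.0, 0, 0.0
    while True:
        t += rng.expovariate(purchase_rate)  # exponential waiting time to the next purchase
        if t > age:
            break                            # next purchase falls outside the observation window
        frequency, recency = frequency + 1, t
        if rng.random() < dropout_prob:
            break                            # customer "dies" right after this purchase
    return frequency, recency

# Hypothetical parameter values, for illustration only (time unit: weeks).
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)
rng = random.Random(42)
customers = [simulate_customer(age=39, rng=rng, **params) for _ in range(1000)]
share_without_repeat = sum(f == 0 for f, _ in customers) / len(customers)
print(f"share with zero repeat purchases: {share_without_repeat:.2f}")
```

Fitting the model runs this logic in reverse: given the observed frequency, recency, and age of many customers, it estimates the four parameters r, alpha, a, and b.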

Let’s look at some data.

Suzanne and Leonard

Data about transactions is commonly stored in online transaction processing databases. In this blog post, we use the CDNOW dataset. CDNOW was a compact disc e-commerce platform with its heyday before the dot-com bubble. Its transactional data has been made publicly available. By means of SQL queries or the lifetimes package, one can convert transaction data to the frequency-recency-age format. This is what the data could look like (Suzanne and Leonard do not exist):

Suzanne has made 10 repeat purchases (she likes music!). Her first purchase was about 15 weeks ago and her last purchase was about 10 weeks after the first. Leonard, however, has made 1 repeat purchase, about 2 weeks after his first purchase.

Let’s look at the data beyond Suzanne and Leonard. Below we see a histogram, which shows the distribution of frequency (number of repeat purchases per client). The vast majority of clients have made only one purchase, that is zero repeat purchases.

If we then plot recency against frequency we see the following. Customers with high recency (a long time between their first and last purchase) have, as one might expect, more repeat purchases: they have had more time to purchase.

Beta-Geometric/Negative Binomial Distribution

Using the lifetimes package we can estimate our model for predicting customer churn. Following the conventions of Python machine learning libraries, we instantiate an object bgf that contains all we need to fit the model. We then pass in the data.

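from lifetimes import BetaGeoFitter
# Fit the BG/NBD model on the frequency, recency, and age columns
bgf = BetaGeoFitter(penalizer_coef=0.2)
bgf.fit(data['frequency'], data['recency'], data['age'])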

This means we now have a model that we can use to make predictions. Let’s look at what the model yields.

Below you see a plot that shows the predicted probability of a customer being “alive” conditional on a customer’s frequency and recency. This plot tells us about the complicated relationship between frequency and recency.

For customers who only purchase infrequently (say frequency < 5), we see that the probability of being alive is relatively independent of the degree of recency. Thus, for customers who purchase a few CDs, recency doesn’t give much information about the probability of being alive, at least relative to those who have high frequency.

At the same time, there are customers for whom there is a larger amount of time between their first and last purchase (say, recency > 30). For this group, frequency doesn’t say much about whether a customer has churned. They are likely to be alive.

Finally, customers who made many repeat purchases within a short span (high frequency, low recency) and then went quiet sit above the diagonal running to the upper right, where the probability of being alive is low. We have few such customers, which means that loyal customers haven’t churned much.

Using these models we can get a probability that a customer is active or not. Let’s go back to Suzanne and Leonard. Leonard purchased two CDs a while ago. According to our model, he’s got about a 22% probability of being “alive” currently. Suzanne, on the other hand, has a 40% probability of being alive.
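For readers who want to see where such numbers come from: under BG/NBD the alive probability has a closed form that depends only on frequency, recency, age, and the four fitted parameters. Below is a sketch of that formula in plain Python. The parameter values are an assumption: this post does not report its fitted values, so the ones used here are close to those commonly reported for the CDNOW data, with time measured in weeks. On a fitted lifetimes model, the conditional_probability_alive method should give the equivalent result.

```python
def probability_alive(frequency, recency, age, r, alpha, a, b):
    """Closed-form BG/NBD probability that a customer is still alive."""
    if frequency == 0:
        return 1.0  # the model cannot tell a customer without repeat purchases has died
    ratio = (alpha + age) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))

# Assumed parameter values (not taken from this post), time unit: weeks.
params = dict(r=0.243, alpha=4.414, a=0.793, b=2.426)

# Frequency, recency, and age as described for the two fictional customers.
leonard = probability_alive(frequency=1, recency=2, age=39, **params)
suzanne = probability_alive(frequency=10, recency=10, age=15, **params)
print(f"Leonard: {leonard:.2f}, Suzanne: {suzanne:.2f}")
```

With these assumed parameters the sketch gives roughly 0.22 for Leonard and 0.41 for Suzanne. The formula also shows why a long silence hurts more when frequency is high: the ratio of age to recency is raised to a power that grows with the number of repeat purchases.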

Saving Leonard

Leonard has probably churned by now. Ideally, we would have retained Leonard as a customer. In the end, that’s what customer churn analytics is about.

We have predicted the probability of Leonard being alive today. This is a static prediction. As the customer ages, the prediction of our model changes. Had we started making predictions earlier on (for example when Leonard’s age as a customer was 20 weeks) we would have obtained a different prediction. Let’s see what that looks like.

In the plot above, we see how the alive probability changes weekly as the age of Leonard as a customer increases. The probability of being alive after 20 weeks is approximately 0.37 and at 39 weeks it’s 0.22.

From this plot we can see that this probability has been decreasing over time. These changes in probabilities can inform targeted marketing campaigns. We might offer Leonard a discount voucher if his alive probability decreases for 5 consecutive weeks. The voucher might prompt a purchase, which changes the trajectory of alive probability.
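The voucher rule can be sketched as follows: recompute the alive probability for every week of Leonard’s age and flag the first week in which it has dropped five times in a row. The parameter values are again assumed rather than taken from this post, and the helper names alive_path and first_trigger_week are made up for this example.

```python
def probability_alive(frequency, recency, age, r, alpha, a, b):
    """Closed-form BG/NBD probability that a customer is still alive."""
    if frequency == 0:
        return 1.0
    ratio = (alpha + age) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))

def alive_path(frequency, recency, max_age, **params):
    """Alive probability for each whole week of age from `recency` up to `max_age`."""
    return {
        age: probability_alive(frequency, recency, age, **params)
        for age in range(recency, max_age + 1)
    }

def first_trigger_week(path, run_length=5):
    """First age at which the probability has dropped `run_length` weeks in a row, else None."""
    run = 0
    ages = sorted(path)
    for prev, cur in zip(ages, ages[1:]):
        run = run + 1 if path[cur] < path[prev] else 0
        if run >= run_length:
            return cur
    return None

# Assumed parameter values (not taken from this post), time unit: weeks.
params = dict(r=0.243, alpha=4.414, a=0.793, b=2.426)

# Leonard: 1 repeat purchase, 2 weeks after his first; no purchases since.
path = alive_path(frequency=1, recency=2, max_age=39, **params)
trigger = first_trigger_week(path, run_length=5)
print(f"week 20: {path[20]:.2f}, week 39: {path[39]:.2f}, voucher in week {trigger}")
```

Because this path assumes no further purchases, the probability falls every week and the rule fires five weeks after the last purchase. In practice, a new purchase updates frequency and recency, and the path would be recomputed from there.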

Conclusion

Using statistical models, we obtain predictions as to whether a customer is “alive” or not. These predictions can be used to identify potential churners.

Next steps include forecasting the number of purchases in the near future. When we then attach monetary value to these predicted purchases, we obtain estimates of customer lifetime value.

The model we have been using in this blog post takes in three variables: frequency, recency, and age. This makes the model relatively easy to work with, but also limits its explanatory power. If we want to figure out which type of customers churn in a non-contractual setting and why, we need to look elsewhere.

We’d love to learn from you how you use data to retain your non-contractual and contractual customers. Don’t hesitate to get in touch with us!

Written by Maarten van Meeuwen