Modeling customers’ churn? Start here

Netta Shachar
May 19 · 6 min read

One of the most important KPIs for any business is the customers’ retention rate, i.e. what percent of customers will purchase again. The higher the retention rate — the happier the business owner.

We often think of the churn rate as the complement of the retention rate: churn rate = 1 - retention rate

This makes sense when the business sells a subscription: if a customer didn’t renew her subscription, she churned. However, when products are sold one purchase at a time, this perspective is inaccurate: a customer who hasn’t made an additional purchase yet may still purchase tomorrow! This means that churn is not directly observable, which is crucial to keep in mind when building a churn prediction model.

At Yotpo, we recently introduced a churn prediction capability for our “Loyalty & Referral” clients. This post describes 3 modeling frameworks we considered (classification, BTYD and survival models), and shares insights from our journey towards a production model.

But, as with any data science problem, first — the data.

Preparing the learning data

Since our clients are e-commerce stores, we assume the data comes from such a store. Nevertheless, the ideas and principles presented throughout this post can be easily extended to other domains, such as gaming.
Purchase history data, from which you infer customers’ timelines, is the starting point of any churn model:

Figure: Illustration of purchase history data

Figure: Customer A’s purchases on a timeline

When working with temporal data, you need to ask yourself (and answer) two questions:

Figure: Train/validation/test splits should be conducted according to the timeline
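
As a rough illustration of a timeline-based split, the sketch below partitions a purchase log by two cutoff dates, so that the validation and test sets always lie in the “future” relative to the training set. The record layout, the sample data and the `temporal_split` helper are assumptions made for this example, not a description of Yotpo’s pipeline.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date)
purchases = [
    ("A", date(2020, 1, 5)),
    ("A", date(2020, 3, 10)),
    ("B", date(2020, 2, 1)),
    ("A", date(2020, 7, 2)),
    ("B", date(2020, 8, 15)),
    ("C", date(2020, 9, 1)),
]


def temporal_split(rows, validation_start, test_start):
    """Split purchases by date, never by random sampling.

    train:      purchases before validation_start
    validation: purchases in [validation_start, test_start)
    test:       purchases on or after test_start
    """
    train = [r for r in rows if r[1] < validation_start]
    validation = [r for r in rows if validation_start <= r[1] < test_start]
    test = [r for r in rows if r[1] >= test_start]
    return train, validation, test


train, validation, test = temporal_split(
    purchases,
    validation_start=date(2020, 4, 1),  # illustrative cutoffs
    test_start=date(2020, 8, 1),
)
```

Splitting by date rather than at random keeps information from later periods out of the training set.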


Given purchase history data, the most commonly used features for churn predictions are:
1. Frequency — number of purchases in the specified time period
2. Recency — time since last purchase
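
A minimal sketch of how these two features could be computed from a purchase log, assuming each record is a `(customer_id, purchase_date)` pair and that recency is measured in days up to a chosen snapshot date. The `frequency_recency` helper and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def frequency_recency(rows, period_start, snapshot):
    """Return {customer_id: (frequency, recency_days)}.

    frequency:    number of purchases in [period_start, snapshot)
    recency_days: days between the last such purchase and the snapshot
    """
    dates = defaultdict(list)
    for customer_id, purchase_date in rows:
        if period_start <= purchase_date < snapshot:
            dates[customer_id].append(purchase_date)
    return {
        customer_id: (len(ds), (snapshot - max(ds)).days)
        for customer_id, ds in dates.items()
    }


# Hypothetical purchase log
purchases = [
    ("A", date(2020, 1, 5)),
    ("A", date(2020, 3, 10)),
    ("B", date(2020, 2, 1)),
]

features = frequency_recency(
    purchases, period_start=date(2020, 1, 1), snapshot=date(2020, 4, 1)
)
```

Customers with no purchase in the period do not appear in the result, so how to treat them is a separate modeling decision.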

Other features/data sources can — and should — be considered, if available. Among the features we explored at Yotpo are: average time between purchases, average purchase amount, newsletter subscription indicator, and others.
You can also get some “feature inspiration” from this paper by Asos, discussing a closely related problem called Lifetime Value (LTV).

Ok, so now we have learning data and some features. Time to talk about…

Modeling Frameworks

1. Classification models

Churn is binary — either one buys again or not. So, binary classification seems like a natural framework.
Given labeled data of churned (label=1) and active (label=0) customers with predictive features, we can choose an algorithm, split the data and start training!
Sounds like a done deal, but building labeled data from purchase history data requires you to make one key decision:
When should we label a customer as “churned”?
To answer this question, consider two stores: Store A sells fruits & vegetables and Store B sells ski gear. A customer of Store A who hasn’t purchased in 3 months has likely churned, whereas in Store B, 3 months is not enough to assume churn.
So, what’s the “correct” number for your store?
Since it depends on your business — find the answer in your data!
Look at the distribution of time between purchases, and set a “churn” threshold at the tail of the distribution. The exact threshold depends on your desired “false churn”/“false active” ratio. This blog post discusses the classification approach in more detail.
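
The thresholding idea can be sketched as follows: collect the gaps between consecutive purchases of each customer, take a high quantile of that distribution as the churn threshold, and label customers whose inactivity exceeds it. The quantile value, the helper names and the sample data are assumptions for illustration; the right quantile depends on the trade-off described above.

```python
import math
from collections import defaultdict
from datetime import date


def inter_purchase_gaps(rows):
    """Days between consecutive purchases of the same customer."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date in rows:
        by_customer[customer_id].append(purchase_date)
    gaps = []
    for ds in by_customer.values():
        ds.sort()
        gaps.extend((later - earlier).days for earlier, later in zip(ds, ds[1:]))
    return gaps


def churn_threshold(gaps, quantile=0.95):
    """Smallest observed gap that covers at least `quantile` of all gaps."""
    ordered = sorted(gaps)
    index = max(0, math.ceil(quantile * len(ordered)) - 1)
    return ordered[index]


def label_churned(last_purchase, snapshot, threshold_days):
    """1 = churned (inactive longer than the threshold), 0 = active."""
    return 1 if (snapshot - last_purchase).days > threshold_days else 0


# Hypothetical purchase log
purchases = [
    ("A", date(2020, 1, 5)),
    ("A", date(2020, 3, 10)),
    ("A", date(2020, 7, 2)),
    ("B", date(2020, 2, 1)),
    ("B", date(2020, 8, 15)),
]

gaps = inter_purchase_gaps(purchases)
threshold = churn_threshold(gaps, quantile=0.5)  # low quantile only because the sample is tiny
```

A higher quantile produces fewer “false churn” labels at the cost of more “false active” ones.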

2. “Buy Till You Die” (BTYD) models

This group of unsupervised models quantifies churn probability by estimating the expected number of future transactions and the probability of being “alive”, using only the frequency and recency features.
Pareto/NBD, BG/NBD and MBG/NBD are the 3 main models of this framework.
All these models assume that the number of purchases for active customers follows a Negative Binomial Distribution (NBD) counting process, meaning:

Figure: Gamma distribution is used as a prior on the transaction rate in all BTYD models
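
For reference, the textbook form of the NBD purchase process can be written as below. This is the standard Poisson-gamma mixture rather than a formula taken from this post, and the symbols (shape r and rate α for the gamma prior, λ for a customer’s transaction rate, t for the length of the observation period) are notation assumed here.

```latex
X(t) \mid \lambda \sim \mathrm{Poisson}(\lambda t), \qquad
\lambda \sim \mathrm{Gamma}(r, \alpha)
\;\Longrightarrow\;
P\bigl(X(t) = x\bigr)
  = \frac{\Gamma(r + x)}{\Gamma(r)\, x!}
    \left(\frac{\alpha}{\alpha + t}\right)^{r}
    \left(\frac{t}{\alpha + t}\right)^{x}
```

In words: each active customer purchases at her own Poisson rate, and those rates vary across customers according to a gamma distribution.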

The dropout process varies between models:

Pareto/NBD model assumes:

BG/NBD model assumes:

MBG/NBD model assumes:
Similar to BG/NBD, except that it allows customers to drop out right after the first transaction

In addition, all models assume independence between purchase rate and dropout rate.

Figure: Beta distribution is used as a prior on the dropout probability in BG/NBD and MBG/NBD models

In practice, Pareto/NBD is rarely used due to computational challenges, while BG/NBD is the most commonly used (this blog gives a classic example).
Having said that, in our e-commerce setting, we found that MBG/NBD outperformed BG/NBD thanks to its better modeling of one-time buyers (a significant portion of customers in many online stores), and both models were outperformed by Pareto/NBD.
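
To make the one-time-buyer issue concrete, here is a small sketch of the published BG/NBD expression for the probability of being “alive” (Fader, Hardie & Lee). The parameter values are made up for illustration; in practice they would be fitted to the data. Note that in BTYD notation, frequency `x` counts repeat purchases, and `t_x` is the time of the last purchase measured from the first one.

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(customer is alive at time T) under the BG/NBD model.

    x:        number of repeat purchases
    t_x:      time of the last purchase, measured from the first purchase
    T:        time since the first purchase ("age" of the customer)
    r, alpha: gamma prior on the transaction rate
    a, b:     beta prior on the dropout probability
    """
    if x == 0:
        # BG/NBD lets customers drop out only after a repeat purchase,
        # so a one-time buyer is always considered alive.
        return 1.0
    ratio = (alpha + T) / (alpha + t_x)
    return 1.0 / (1.0 + (a / (b + x - 1.0)) * ratio ** (r + x))


# Hypothetical, hand-picked parameters (not fitted to any data)
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

one_time_buyer = bgnbd_p_alive(x=0, t_x=0, T=52, **params)
recent_buyer = bgnbd_p_alive(x=3, t_x=50, T=52, **params)
lapsed_buyer = bgnbd_p_alive(x=3, t_x=10, T=52, **params)
```

Under BG/NBD a one-time buyer gets a probability of exactly 1 no matter how long ago the purchase was, which is the limitation MBG/NBD relaxes.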

3. Survival models

This last framework is less familiar among data scientists, and for no good reason.
Designed to estimate the time until an event occurs, survival models are often used for evaluating the effectiveness of medical treatments (“event” = death) or the lifetime of machines (“event” = failure).
Survival models have characteristics that make them perfect for churn prediction:

For a detailed introduction to survival models, we recommend this blog.
The most widely used survival models are statistical models, such as Cox-PH and Weibull regression. But some ML models exist as well: this blog suggests using a neural network, this R library uses a gradient boosting machine, and this one uses a random forest.

Figure: Survival curves allow us to estimate the probability of surviving without an “event” for various periods
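
A survival curve of this kind can be estimated with the Kaplan-Meier estimator, a standard nonparametric method that this post does not cover by name. The sketch below is a plain-Python version under two assumptions: a churn “event” has already been defined (for example, with an inactivity threshold), and customers who are still active at the end of the observation period are treated as censored. The sample data is invented.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time until churn, or until the end of observation
    observed:  True if churn was observed, False if the customer is censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Multiply by the share of at-risk customers who survive time t
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Churned and censored customers both leave the risk set after t
        at_risk -= leaving
    return curve


# Hypothetical data: months until churn (or until the observation ended)
durations = [1, 2, 2, 3, 4]
observed = [True, True, False, True, False]

curve = kaplan_meier(durations, observed)
```

Censored customers still count toward the at-risk population until they leave it, so their information is used even though they never produce an event.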

Frameworks’ comparison

So, before we wrap up, let's summarize it all into a single table:

Figure: Comparison table of the three modeling frameworks

Final thoughts

As usual in data science, there is no one algorithm to rule them all. And, as usual in data science, your data has the answers. This review has not covered all possible approaches, nor given a complete how-to for each approach. Nonetheless, we hope it serves as a good starting point for additional research, and that it gave you some directions to think about or, even better, pointed you to the direction that suits your problem best.

