Understanding the Customer Lifetime Value with Data Science

Elizaveta Lebedeva, Data Scientist at Bolt

Published in Bolt Labs · Nov 15, 2018 · 6 min read

Customer relationships are important for every business, playing a crucial role in the company’s growth. One of the important metrics to understand is customer lifetime value (LTV) — the net profit attributed to the entire future relationship between the company and the client. The more users consume the product and the longer they continue to use it, contributing to profit, the higher the LTV.

There are lots of marketing articles on this topic that present the importance of LTV and customer segmentation. As data scientists, we are interested in the math formulas and in understanding how the model works. How can we predict LTV based on just 3 features? In this article, I will show you some of the models that we use for marketing segmentation at Taxify and explain what stands behind them. There will be lots of formulas, but don’t be taken aback: everything is already implemented in Python libraries. The goal is to show you how the math does all the magic.

Beta-geometric/negative binomial model for customer alive probability

Let’s consider an example: user X signed up 1 month ago and made 4 rides, with the last ride taken 20 days ago. Based on this data alone, the model can predict the probability that the user is still active over a given period of time (as shown on the graph) and the expected number of transactions in a given future period (which is the basis for estimating the value of a particular customer over their lifetime).

*The model gives an immediate business insight: take some action towards a user when his or her probability of being active reaches a certain threshold to prevent churn.*

This model was proposed by Fader, Hardie and Lee and is called the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model.

The BG/NBD model has the following properties:

1. When a user is active, the number of his or her transactions in a time period of length t follows a Poisson distribution with transaction rate λ.

The Poisson distribution gives the probability of a given number of events in a period, based on how often the event occurred in the past. For example, if a user made 2 purchases per week on average (λ=2 on the graph below), the probability of making exactly 3 orders next week is 0.18.

2. Heterogeneity in the transaction rate across users (that is, how customers differ in purchasing behavior) follows a Gamma distribution with parameters r (shape) and α (scale).

The Gamma distribution arises naturally as the waiting time between Poisson-distributed events (as in our case, with transaction rate λ). Let’s take a user who makes 2 purchases per week on average. The waiting time until this user’s 3rd purchase exceeds 4 weeks exactly when he or she makes at most 2 purchases in those 4 weeks, so the probability is e^(-8) · (1 + 8 + 8²/2) ≈ 0.014. On the graph, this is the area to the right of the dotted line.

3. Users may become inactive after any transaction with probability p, and their dropout point (the transaction after which they become inactive) follows a Geometric distribution.

The Geometric distribution describes a sequence of Bernoulli trials and models the number of trials up to and including the first success. If p=0.2 for a certain user, the probability that he or she becomes inactive right after the 3rd transaction is (1 − 0.2)² · 0.2 = 0.128, or about 0.13 (blue line on the graph).

4. Heterogeneity (variation across users) in the dropout probability follows a Beta distribution with two shape parameters a and b.

The Beta distribution is well suited to representing a probability distribution over probabilities: the case where we don’t know a probability in advance but have some reasonable prior beliefs, described by a and b (the mean of a Beta distribution is a / (a + b)).

For our example of a customer with a prior dropout probability of 0.2, the orange line with a = 2 and b = 8 on the plot shows the probability density function of the user’s dropout probability.

5. The transaction rate and the dropout probability vary independently across users.
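The five assumptions above describe a generative process, which can be sketched as a small simulation. This is a minimal illustration, not the authors' code: the parameter values are hypothetical, the Beta shape parameters are named `a` and `b` (as in the parameter list r, α, a, b), and α is treated as the rate (inverse scale) of the Gamma distribution, which is an assumption about the parameterisation.

```python
import random


def simulate_customer(T, r, alpha, a, b, rng):
    """Simulate one customer over (0, T] under the BG/NBD assumptions.

    Returns (x, t_x): the number of transactions and the time of the last one
    (t_x is 0.0 when no transaction happened).
    """
    # Assumption 2: the transaction rate differs across users (Gamma).
    # random.gammavariate takes (shape, scale); alpha is assumed to be a rate.
    lam = rng.gammavariate(r, 1.0 / alpha)
    # Assumption 4: the dropout probability differs across users (Beta).
    p = rng.betavariate(a, b)

    x, t_x, t = 0, 0.0, 0.0
    while True:
        # Assumption 1: Poisson purchasing means exponential waiting times.
        t += rng.expovariate(lam)
        if t > T:
            break  # the observation window ended before the next purchase
        x, t_x = x + 1, t
        # Assumption 3: after each transaction the user drops out with prob. p.
        if rng.random() < p:
            break
    return x, t_x


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical parameters, with time measured in days.
    for _ in range(5):
        print(simulate_customer(T=30.0, r=0.5, alpha=10.0, a=2.0, b=8.0, rng=rng))
```

Lambda and p are drawn separately, which reflects assumption 5 (the two vary independently across users).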

Math notation for the features of a user X:
X = (x, t_x, T), where x is the number of transactions observed in the time period (0, T], and t_x (≤ T) is the time of the last purchase.
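As a rough sketch, these three features can be computed from a list of ride dates. The function name and the dates are hypothetical, chosen to reproduce user X from the example (4 rides, last ride 20 days ago, signed up 30 days ago). Time zero is assumed to be the sign-up date; some tools instead start the clock at the first purchase and count only repeat purchases.

```python
from datetime import date


def summarize_history(signup, ride_dates, today):
    """Return (x, t_x, T) in days, with time zero at the sign-up date."""
    # Keep only rides inside the observation window (0, T].
    rides = sorted(d for d in ride_dates if signup < d <= today)
    x = len(rides)                                   # number of transactions
    t_x = (rides[-1] - signup).days if rides else 0  # time of the last one
    T = (today - signup).days                        # length of the window
    return x, t_x, T


# Hypothetical dates for user X.
signup = date(2018, 10, 1)
today = date(2018, 10, 31)
rides = [date(2018, 10, 3), date(2018, 10, 5), date(2018, 10, 8), date(2018, 10, 11)]

print(summarize_history(signup, rides, today))  # (4, 10, 30)
```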

Based only on these features, the model predicts future purchasing patterns of customers:

P(X(t) = x): the probability of observing x transactions in a future period of length t
E(Y(t) | X = x, t_x, T): the expected number of transactions in a period of length t for a customer with the observed behavior.

Now we can derive these 2 main characteristics. Without going into too much detail, I will present just the final formulas (the full derivations can be found in the papers listed in the references).

Probability of being active
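For reference, the standard BG/NBD expression for this probability, in terms of the customer history (x, t_x, T) and the model parameters r, α, a, b, is given below. Here δ_{x>0} equals 1 when x > 0 and 0 otherwise.

```latex
P(\text{active} \mid x, t_x, T) =
\frac{1}{1 + \delta_{x>0}\,\dfrac{a}{b + x - 1}
\left(\dfrac{\alpha + T}{\alpha + t_x}\right)^{r + x}}
```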

Expected number of transactions
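For reference, the expression derived by Fader, Hardie and Lee (2005) for a customer with history (x, t_x, T) and a future period of length t is given below. Here δ_{x>0} equals 1 when x > 0 and 0 otherwise.

```latex
E\big(Y(t) \mid X = x, t_x, T\big) =
\frac{\dfrac{a + b + x - 1}{a - 1}
\left[1 - \left(\dfrac{\alpha + T}{\alpha + T + t}\right)^{r + x}
{}_2F_1\!\left(r + x,\; b + x;\; a + b + x - 1;\; \dfrac{t}{\alpha + T + t}\right)\right]}
{1 + \delta_{x>0}\,\dfrac{a}{b + x - 1}
\left(\dfrac{\alpha + T}{\alpha + t_x}\right)^{r + x}}
```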

where 2F1 is the Gaussian hypergeometric function.

To sum up, once we have estimates of the model parameters r, α, a, b (for example, from maximum likelihood estimation), we can forecast the expected number of transactions for each user.
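Given parameter estimates, both quantities can be evaluated in a few lines. The sketch below uses the standard BG/NBD closed forms and a plain power-series evaluation of 2F1, so it needs only the standard library. The parameter values are hypothetical (not fitted estimates), time is measured in days, and the function names are made up for illustration.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100_000):
    """Gaussian hypergeometric function 2F1 via its power series (|z| < 1)."""
    if not abs(z) < 1:
        raise ValueError("the power series only converges for |z| < 1")
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms.
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            return total
    raise RuntimeError("2F1 series did not converge")


def probability_active(x, t_x, T, r, alpha, a, b):
    """P(active | x, t_x, T) under the BG/NBD model."""
    if x == 0:
        return 1.0  # the delta term vanishes when there are no transactions
    ratio = ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + a / (b + x - 1) * ratio)


def expected_transactions(t, x, t_x, T, r, alpha, a, b):
    """E(Y(t) | x, t_x, T): expected transactions in the next t time units.

    Requires a != 1 because of the (a - 1) denominator in the closed form.
    """
    z = t / (alpha + T + t)
    bracket = 1.0 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(
        r + x, b + x, a + b + x - 1, z
    )
    # Dividing by the denominator equals multiplying by P(active).
    return (a + b + x - 1) / (a - 1) * bracket * probability_active(
        x, t_x, T, r, alpha, a, b
    )


# Hypothetical parameters; user X has x=4, t_x=10, T=30 (days).
params = dict(r=0.5, alpha=10.0, a=0.8, b=2.5)
print(probability_active(4, 10, 30, **params))
print(expected_transactions(30, 4, 10, 30, **params))
```

In practice a library routine for 2F1 (for example, SciPy's `scipy.special.hyp2f1`) would be a more robust choice than the hand-written series.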

Gamma-Gamma model for Customer lifetime value

Up until now, we have only used recency and frequency of customer purchases. But we can also use the data about the monetary value of user’s transactions. Let’s add this new information into the example: user X made these 4 rides with prices 10, 12, 8, 15. The Gamma-Gamma Model can predict the most likely value per transaction in the future.

Altogether, we now have all the elements needed to determine the lifetime value of a customer:

LTV = expected number of transactions * revenue per transaction * margin

where the first element comes from the BG/NBD model, the second from the Gamma-Gamma model, and the margin is defined by the business.
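As a quick worked example of this product, the sketch below plugs in the four ride prices of user X. The expected number of transactions (6) and the margin (25%) are hypothetical, and the plain average of the prices is used only as a stand-in for the Gamma-Gamma prediction.

```python
from statistics import mean


def lifetime_value(expected_transactions, value_per_transaction, margin):
    """LTV = expected number of transactions * value per transaction * margin."""
    return expected_transactions * value_per_transaction * margin


ride_prices = [10, 12, 8, 15]
average_price = mean(ride_prices)  # 11.25, stand-in for the model's prediction

ltv = lifetime_value(
    expected_transactions=6.0,  # hypothetical output of the BG/NBD model
    value_per_transaction=average_price,
    margin=0.25,                # hypothetical business margin
)
print(ltv)  # 16.875
```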

Math notation for the Gamma-Gamma model:

The customer has x transactions with values z_1, z_2, …, z_x, and m_x = (z_1 + z_2 + … + z_x) / x is the observed mean transaction value.
E(M) is the unobserved mean transaction value, and we are interested in E(M | m_x, x): the expected monetary value of a customer given his or her purchasing behavior.

The properties of the Gamma-Gamma model are:

The monetary value of a user’s transactions varies randomly around their mean transaction value.
The mean transaction value varies across users but doesn’t vary for an individual user over time.
Mean transaction values are Gamma distributed across customers.

Working through the gamma distributions (the details are in the paper), we have
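For reference, the closed form given by Fader and Hardie (2013) can be written as a weighted average of the population mean and the customer's observed mean:

```latex
E(M \mid p, q, \gamma;\, m_x, x)
= \frac{(\gamma + m_x x)\,p}{p x + q - 1}
= \left(\frac{q - 1}{p x + q - 1}\right)\frac{\gamma p}{q - 1}
+ \left(\frac{p x}{p x + q - 1}\right) m_x
```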

where p is the shape and ν the scale parameter of the gamma distribution of the transaction values z_i, and q is the shape and γ the scale parameter of the gamma distribution of ν (p is constant by assumption: the individual-level coefficient of variation is the same for all customers). As before, we can use the maximum likelihood method to estimate the model parameters.
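To make the result concrete, here is a small sketch that evaluates the Gamma-Gamma conditional expectation, E(M | m_x, x) = (γ + m_x·x)·p / (p·x + q − 1), for user X. The parameter values p, q and γ are hypothetical rather than fitted, and the function names are made up for illustration.

```python
def expected_mean_value(m_x, x, p, q, gamma):
    """E(M | m_x, x) = (gamma + m_x * x) * p / (p * x + q - 1)."""
    return (gamma + m_x * x) * p / (p * x + q - 1)


def population_mean_value(p, q, gamma):
    """Mean transaction value across all customers (requires q > 1)."""
    return gamma * p / (q - 1)


# Hypothetical parameter values; ride prices of user X from the example.
p, q, gamma = 6.0, 4.0, 15.0
prices = [10, 12, 8, 15]
x = len(prices)
m_x = sum(prices) / x  # observed mean, 11.25

estimate = expected_mean_value(m_x, x, p, q, gamma)
print(m_x, estimate, population_mean_value(p, q, gamma))
```

The estimate lies between the customer's observed mean and the population mean, and it moves towards the observed mean as the number of transactions x grows.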

We are done with math and have the customer LTV ready! But what about the model performance?

Evaluating model performance

The traditional approach is to divide the dataset into training and validation parts. In the original papers, the authors show that this approach performs well. I’ve applied these methods to real datasets and also obtained promising results.

The graph shows the distribution of real and predicted transactions in the validation period: the error here is 2.8%.
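One way to set up such a check is sketched below: split each customer's transactions at a cut-off time, fit on the earlier part, and compare predicted and actual counts in the later part. The data, the function names and the error metric (relative error of the total count) are assumptions for illustration; the metric behind the 2.8% figure is not specified here.

```python
def split_counts(transactions, calibration_end):
    """Count each customer's transactions before and after the cut-off.

    `transactions` maps a customer id to a list of transaction times.
    """
    calibration, holdout = {}, {}
    for customer, times in transactions.items():
        calibration[customer] = sum(1 for t in times if t <= calibration_end)
        holdout[customer] = sum(1 for t in times if t > calibration_end)
    return calibration, holdout


def total_percentage_error(predicted, actual):
    """Relative error (in %) between predicted and actual total counts."""
    actual_total = sum(actual.values())
    predicted_total = sum(predicted[customer] for customer in actual)
    return abs(predicted_total - actual_total) / actual_total * 100.0


# Toy data: transaction times in days since the start of observation.
transactions = {"u1": [1, 5, 12, 20], "u2": [3, 15], "u3": [2]}
calibration, holdout = split_counts(transactions, calibration_end=10)

# Hypothetical model predictions for the validation period.
predicted = {"u1": 1.5, "u2": 1.2, "u3": 0.6}
print(calibration, holdout)
print(total_percentage_error(predicted, holdout))
```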

How to apply

As I mentioned at the beginning of the article, everything is already implemented for you. For example, the Python library “lifetimes” provides all the functions and metrics we need to estimate LTV. The well-written documentation contains lots of examples and explanations. It also has sample SQL queries to extract data in a suitable form. So it takes just a few minutes to get started.
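A minimal sketch of what this can look like with “lifetimes” is shown below. It assumes a pandas DataFrame with one row per customer and the columns `frequency`, `recency`, `T` and `monetary_value` (the library's summary format). The function name, the 12-month horizon and the discount rate are illustrative choices, not a recommended configuration.

```python
def estimate_clv(summary, months=12, monthly_discount_rate=0.01, margin=1.0):
    """Fit BG/NBD and Gamma-Gamma models and return a per-customer value estimate.

    `summary` is assumed to be a pandas DataFrame with the columns
    frequency, recency, T (both in days) and monetary_value.
    """
    # Imported here so the function can be defined without the optional
    # dependency; calling it requires `pip install lifetimes`.
    from lifetimes import BetaGeoFitter, GammaGammaFitter

    # BG/NBD: models how often customers buy and when they drop out.
    bgf = BetaGeoFitter(penalizer_coef=0.0)
    bgf.fit(summary["frequency"], summary["recency"], summary["T"])

    # Gamma-Gamma is fitted on customers with at least one repeat purchase.
    repeat = summary[summary["frequency"] > 0]
    ggf = GammaGammaFitter(penalizer_coef=0.0)
    ggf.fit(repeat["frequency"], repeat["monetary_value"])

    # Combine both models into a discounted value over the chosen horizon.
    clv = ggf.customer_lifetime_value(
        bgf,
        repeat["frequency"],
        repeat["recency"],
        repeat["T"],
        repeat["monetary_value"],
        time=months,
        discount_rate=monthly_discount_rate,
        freq="D",
    )
    return clv * margin
```

The library ships sample data that should work with this sketch, for instance `load_cdnow_summary_data_with_monetary_value` from `lifetimes.datasets`. Its `lifetimes.utils.summary_data_from_transaction_data` helper builds the summary table from a raw transaction log.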

Summary

In this blog post, I explained in detail how customer LTV can be estimated using just a few features.

I would like to underline that we can step away from the popular, frequently used gradient boosted trees and try a different approach with a comparable level of performance. Statistical learning can still be applied in practice and can help a business by giving insights about its customers.

About the author

Elizaveta Lebedeva works as a Data Scientist at Taxify. Her main focus is supporting lifecycle marketing campaigns, ensuring the company’s growth and delivering the best experience for riders and drivers.

Being passionate about math and having degrees in finance and economics, she transitioned to Data Science from Business Analytics, marking her path with numerous math competitions and hackathons.

References

Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee (2005), “‘Counting Your Customers’ the Easy Way: An Alternative to the Pareto/NBD Model,” Marketing Science.

Fader, Peter S. and Bruce G. S. Hardie (2013), “The Gamma-Gamma Model of Monetary Value.”

Fader, Peter S., Bruce G. S. Hardie, and Ka Lok Lee (2005), “RFM and CLV: Using Iso-value Curves for Customer Base Analysis,” Journal of Marketing Research.