Understanding the Customer Lifetime Value with Data Science
Customer relationships are important for every business and play a crucial role in a company’s growth. One of the key metrics to understand is customer lifetime value (LTV) — the net profit attributed to the entire future relationship between the company and the client. The more a customer uses the product, and the longer they keep using it and contributing to profit, the higher their LTV.
There are lots of marketing articles on this topic that present the importance of LTV and customer segmentation. As data scientists, though, we are interested in the math and in understanding how the models work. How can we predict LTV from just three features? In this article, I will show you some of the models we use for marketing segmentation at Taxify and explain what stands behind them. There will be lots of formulas, but don’t be taken aback: everything is already implemented in Python libraries. The goal is to show you how the math does all the magic.
Beta-geometric/negative binomial model for customer alive probability
Let’s consider an example: user X signed up 1 month ago and made 4 rides, the last of them 20 days ago. Based on this data alone, the model can predict the probability that the user is still active during a given period of time (as shown on the graph) and the expected number of transactions in a future period — the basis for estimating the whole value of a particular customer over their lifetime.
This model, proposed by Fader, Hardie and Lee, is called the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model.
The BG/NBD model has the following properties:
1. While a user is active, the number of his or her transactions in a time period of length t follows a Poisson distribution with transaction rate λ.
2. Heterogeneity in the transaction rate across users (how customers differ in purchasing frequency) follows a Gamma distribution with shape parameter r and scale parameter α.
3. A user may become inactive after any transaction with probability p, so the dropout point (the transaction after which the user becomes inactive) follows a Geometric distribution across purchases.
4. Heterogeneity in the dropout probability across users (variation from user to user) follows a Beta distribution with the two shape parameters a and b.
For our example of a customer with a prior dropout probability of 0.2, the orange line with a = 2 and b = 8 on the plot shows the probability density function of the user’s dropout probability.
5. The transaction rate and the dropout probability vary independently across users.
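To make these five properties concrete, here is a small simulation sketch of the BG/NBD generative story (this code is not from the original article, and all parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_bg_nbd_user(r, alpha, a, b, T, rng):
    """Simulate one user's purchase times over (0, T] under the BG/NBD assumptions."""
    # Property 2: each user's transaction rate lambda ~ Gamma(r, alpha);
    # numpy's scale parameter is 1/alpha for the paper's parameterization.
    lam = rng.gamma(shape=r, scale=1.0 / alpha)
    # Property 4: each user's dropout probability p ~ Beta(a, b).
    p = rng.beta(a, b)
    t, purchases = 0.0, []
    while True:
        # Property 1: Poisson purchasing means exponential gaps between purchases.
        t += rng.exponential(1.0 / lam)
        if t > T:
            break
        purchases.append(t)
        # Property 3: after any transaction the user may become inactive.
        if rng.random() < p:
            break
    return purchases

# Properties 2 and 4 play out across a population of simulated users.
histories = [simulate_bg_nbd_user(r=0.25, alpha=4.0, a=0.8, b=2.5, T=26.0, rng=rng)
             for _ in range(5000)]
mean_txns = np.mean([len(h) for h in histories])
print(f"average transactions per user over T = 26: {mean_txns:.2f}")
```

Property 5 appears implicitly: λ and p are drawn independently for each user.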
Math notation to represent the features of a user X:
X = (x, t_x, T), where x is the number of transactions in the period (0, T] and t_x (<= T) is the time of the last purchase.
Based only on these features, the model predicts future purchasing patterns of customers:
- P(X(t) = x) — the probability of observing x transactions in a future period of length t
- E(Y(t) | X = x, t_x, T) — the expected number of transactions in a future period of length t for a customer with observed behavior (x, t_x, T).
Now we can derive these two main quantities. Without going into too much detail, I will just present the final formulas (the derivations can be found in the papers).
- Probability of being active (for a customer with x > 0 transactions):
P(alive | x, t_x, T) = 1 / (1 + (a / (b + x - 1)) * ((α + T) / (α + t_x))^(r + x))
- Expected number of transactions in a future period of length t:
E(Y(t) | x, t_x, T) = ((a + b + x - 1) / (a - 1)) * (1 - ((α + T) / (α + T + t))^(r + x) * 2F1(r + x, b + x; a + b + x - 1; t / (α + T + t))) / (1 + (a / (b + x - 1)) * ((α + T) / (α + t_x))^(r + x))
where 2F1 is the Gaussian hypergeometric function (for x = 0, the second term in the last denominator disappears).
To sum up: once we obtain estimates of the model parameters r, α, a and b (for example, via maximum likelihood estimation), we can forecast the expected number of transactions for each user.
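These two expressions can be implemented directly with scipy. The sketch below is for illustration only: the parameter values are made up, not fitted to any data.

```python
from scipy.special import hyp2f1

def p_alive(x, t_x, T, r, alpha, a, b):
    """P(customer is still active | x, t_x, T) under the BG/NBD model, for x > 0."""
    return 1.0 / (1.0 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x))

def expected_transactions(t, x, t_x, T, r, alpha, a, b):
    """E(Y(t) | x, t_x, T): expected number of transactions in the next period of length t."""
    z = t / (alpha + T + t)
    numerator = ((a + b + x - 1) / (a - 1)) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x)
        * hyp2f1(r + x, b + x, a + b + x - 1, z)
    )
    # The (x > 0) factor mirrors the delta function in the paper's formula.
    denominator = 1 + (x > 0) * (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numerator / denominator

# User X from the example: x = 4 rides, last one at t_x = 10, observed for T = 30 days.
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)  # made-up parameter estimates
print(p_alive(x=4, t_x=10, T=30, **params))
print(expected_transactions(t=30, x=4, t_x=10, T=30, **params))
```

Note that a customer with x = 0 gives no evidence of churn, which is why the dropout term is switched off in the denominator for that case.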
Gamma-Gamma model for customer lifetime value
Until now, we have only used the recency and frequency of customer purchases. But we can also use data about the monetary value of the user’s transactions. Let’s add this information to the example: user X made those 4 rides at prices of 10, 12, 8 and 15. The Gamma-Gamma model can predict the most likely value per transaction in the future.
Altogether, we now have all the elements ready to determine the lifetime value of a customer:
LTV = expected number of transactions * revenue per transaction * margin
where the first factor comes from the BG/NBD model, the second from the Gamma-Gamma model, and the margin is defined by the business.
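Combining the pieces is then simple arithmetic; the numbers below are made up purely for illustration:

```python
# Hypothetical outputs of the two models for user X (illustrative values only).
expected_transactions_next_year = 14.2   # from the BG/NBD model
expected_revenue_per_transaction = 11.3  # from the Gamma-Gamma model
margin = 0.15                            # defined by the business

ltv = expected_transactions_next_year * expected_revenue_per_transaction * margin
print(f"estimated LTV over the next year: {ltv:.2f}")
```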
Math notation for the Gamma-Gamma model:
- The customer has x transactions with values z1, z2, …, zx; m_x = (z1 + … + zx) / x is the observed mean transaction value.
- E(M) is the (unobserved) true mean transaction value; we are interested in E(M | m_x, x) — the expected monetary value of a customer given his or her purchasing behavior.
The properties of the Gamma-Gamma model are:
- The monetary value of a user’s transactions varies randomly around his or her mean transaction value.
- The mean transaction value varies across users but does not vary for an individual user over time.
- Mean transaction values are Gamma distributed across customers.
Working through a fair amount of gamma-distribution algebra (details are in the paper), we arrive at
E(M | m_x, x) = (γ + m_x * x) * p / (p * x + q - 1)
where p is the shape and ν the scale parameter of the gamma distribution of the transaction values zi, and q is the shape and γ the scale parameter of the gamma distribution of ν (p is constant by assumption — the individual-level coefficient of variation is the same for all customers). As before, we can use the maximum likelihood method to estimate the model parameters.
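A sketch of the conditional expectation E(M | m_x, x) = (γ + m_x * x) * p / (p * x + q - 1) in Python, using the 4 rides from the example; the parameter values p, q, γ below are made up, not fitted:

```python
def expected_transaction_value(m_x, x, p, q, gamma):
    """E(M | m_x, x): expected mean transaction value for a customer
    with x transactions of observed mean value m_x."""
    return (gamma + m_x * x) * p / (p * x + q - 1)

# Example: user X's 4 rides cost 10, 12, 8 and 15 (made-up model parameters).
prices = [10, 12, 8, 15]
m_x, x = sum(prices) / len(prices), len(prices)
p, q, gamma = 6.0, 4.0, 15.0
print(expected_transaction_value(m_x, x, p, q, gamma))
```

The result is a weighted average of the population mean transaction value, γp/(q - 1), and the customer’s own observed mean m_x — the more transactions we observe, the more weight shifts to the individual’s data.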
We are done with the math and have the customer LTV ready! But what about model performance?
Evaluating model performance
The traditional approach suggests dividing the dataset into training and validation parts; for these models the split is usually made in time, fitting on a calibration period and comparing predicted against actual transactions in a holdout period. In the original papers, the authors show that this approach performs well. I have applied these methods to real datasets and also obtained promising results.
How to apply
As I mentioned at the beginning of the article, everything is already implemented for you. For example, the Python library “lifetimes” provides all the functions and metrics we need to estimate LTV. Its well-written documentation contains lots of examples and explanations, and it even includes sample SQL queries for extracting data in a suitable form. So it takes just a few minutes to get started.
Summary
In this blog post, I explained in detail how customer LTV can be estimated using just a few features.
I would like to underline that we can step away from the popular gradient-boosted trees and try a different approach with a comparable level of performance. Statistical learning can still be applied in practice and can help businesses by giving them insights about their customers.
About the author
Elizaveta Lebedeva works as a Data Scientist at Taxify. Her main focus is supporting lifecycle marketing campaigns, helping the company grow and delivering the best experience for riders and drivers.
Passionate about math and holding degrees in finance and economics, she transitioned to Data Science from Business Analytics, marking her path with numerous math competitions and hackathons.