Customer lifetime value (LTV) is a common term that gets tossed around within startups. Depending on the context, both the meaning of lifetime value and the calculation needed to get the number can vary surprisingly widely. In general, LTV represents the net profit attributed to the entire relationship with a customer. It is an influential metric in shaping business decisions because it promotes longer-term customer relationship management over a focus on immediate profitability. A more accurate understanding of LTV can also allow a company to confidently lower prices and offer more incentives. However, the metric is usually difficult to quantify, because projecting into the future requires various assumptions and goals, which leads to methods ranging from crude heuristics to sophisticated machine learning models. In this article I will review several variations of LTV models along with their assumptions, calculations and the scenarios they fit.

**Heuristic Measure**

It is hard to put a number on future purchases, but the churn rate is easier to quantify and can be used to approximate a heuristic LTV. Thus, one way to calculate LTV combines two quantities: the churn rate and the average contribution per order.

Churn rate is the percentage of customers who end their relationship with a company in a given period, and the assumption is that the churn rate is constant across the lifetime of the customer. Average contribution per order is calculated within the same period and is assumed to be invariant over time as well. These simplifying assumptions bring the deciding parameters down to only two, but the approach approximates customer behavior at the group level and therefore loses the resolution of each individual.
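A minimal sketch of this heuristic, assuming the commonly used form in which LTV equals the contribution per period divided by the churn rate (a constant churn rate c implies an expected lifetime of 1/c periods). The function name, the `orders_per_period` argument and the sample figures are hypothetical and only serve the illustration.

```python
def heuristic_ltv(avg_contribution_per_order, churn_rate, orders_per_period=1.0):
    """Group-level LTV under constant churn and constant contribution.

    With a constant per-period churn rate c, the expected customer
    lifetime is 1 / c periods (the mean of a geometric distribution).
    """
    if not 0 < churn_rate <= 1:
        raise ValueError("churn_rate must be in (0, 1]")
    expected_lifetime_periods = 1.0 / churn_rate
    return avg_contribution_per_order * orders_per_period * expected_lifetime_periods


# Hypothetical figures: 30 units of contribution per order, one order per
# month, and a 5% monthly churn rate.
print(heuristic_ltv(30.0, 0.05))
```

Because both inputs are cohort averages, every customer in the group receives the same LTV, which is exactly the loss of individual resolution mentioned above.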

**BTYD (Buy ’Til You Die) Statistical Model**

Use Case: You have some amount of purchase history

As the need for targeted or even personalized marketing grows, a statistical model can be used to estimate the probability of future purchasing events. Specifically, each customer’s purchase decision is modeled as two successive events: a “coin” flip to determine whether the customer has churned in the period and a “die” roll to determine how many times the customer will order in the same period. The model assumes each customer makes the decision based on their own situation, meaning that each customer has a churn rate of their own. To generalize, the modeling target is not each individual’s churn rate but the distribution of churn rates across the cohort. For example, in the cohort under study 35% of the customers are quite loyal to the brand and have a churn rate below 0.2, 20% are only trying out the product and have a higher churn rate of around 0.7, and the churn rates of the remaining 45% are spread between 0.2 and 0.7. Similarly, the cohort could have 40% of its members purchasing more than 10 times per year, 20% purchasing fewer than 5 times per year and 40% purchasing between 5 and 10 times. Mathematically, in the BG/NBD variant of the model, each customer’s churn follows a geometric process and their purchases follow a Poisson process, while the cohort-level spread of churn probabilities is modeled with a beta distribution and the spread of purchase rates with a gamma distribution. Each cohort-level distribution (beta for churn, gamma for purchase rate) is defined by 2 parameters, making for 4 total.
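The coin-and-die picture can be made concrete with a small simulation. This is only a sketch that follows the per-period description above, assuming (as in the BG/NBD formulation) that churn probabilities are beta-distributed and purchase rates gamma-distributed across the cohort. The function names and the four parameter values are hypothetical.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw a Poisson-distributed count (Knuth's method, fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_customer(a, b, r, alpha, n_periods, rng):
    """Return the list of per-period order counts until churn or the horizon."""
    churn_prob = rng.betavariate(a, b)                # this customer's "coin"
    purchase_rate = rng.gammavariate(r, 1.0 / alpha)  # this customer's "die"
    orders = []
    for _ in range(n_periods):
        orders.append(poisson_draw(purchase_rate, rng))
        if rng.random() < churn_prob:  # coin flip: churned after this period
            break
    return orders


rng = random.Random(42)
# Hypothetical cohort parameters: (a, b) shape the churn distribution,
# (r, alpha) shape the purchase-rate distribution.
for _ in range(3):
    print(simulate_customer(a=1.0, b=3.0, r=2.0, alpha=1.0, n_periods=12, rng=rng))
```

Fitting the model runs this logic in reverse: given many observed purchase histories, it searches for the four cohort parameters that make those histories most likely.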

To solve for the two cohort-level distributions (4 parameters total), the model ingests the customers’ purchasing patterns as a cohort. With the fitted distributions, the model can predict an individual’s LTV by estimating where that individual’s churn and Poisson parameters lie on the respective cohort distributions. To estimate them, the following information is needed about the individual:

- Frequency: the number of repeat purchases the customer has made
- Age: the time elapsed since the customer’s first purchase
- Recency: the age of the customer when they made their most recent purchase
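These three quantities can be derived from a plain list of purchase dates. The sketch below assumes time is measured in days and that several purchases on the same day count as one event; both choices, as well as the function name and sample dates, are assumptions for illustration.

```python
from datetime import date


def summarize_customer(purchase_dates, observation_end):
    """Return (frequency, recency, age) in days for one customer."""
    ordered = sorted(set(purchase_dates))  # same-day purchases count once
    first, last = ordered[0], ordered[-1]
    frequency = len(ordered) - 1           # repeat purchases only
    recency = (last - first).days          # customer's age at the last purchase
    age = (observation_end - first).days   # time since the first purchase
    return frequency, recency, age


purchases = [date(2023, 1, 10), date(2023, 2, 14), date(2023, 5, 1)]
print(summarize_customer(purchases, observation_end=date(2023, 12, 31)))
# → (2, 111, 355)
```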

Consider two customers, A and B, who have the same Age and Recency, but A has a much higher historical frequency than B. It is intuitive to say that by now A has a higher propensity to have churned compared to B, because A has been off his/her regular schedule for a long while, and this is exactly how the model makes predictions as well. As a result, even though A might have a higher value than B at the current moment, the model expects customer B to have a higher LTV in the long run.
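The A-versus-B intuition can be checked numerically. The sketch below uses the closed-form probability-alive expression published for the BG/NBD model; the four cohort parameters and the customer histories are hypothetical values rather than fitted ones, and time is measured in weeks by assumption.

```python
def bgnbd_prob_alive(frequency, recency, age, r, alpha, a, b):
    """P(customer is still active) under BG/NBD, given cohort parameters."""
    if frequency == 0:
        # With no repeat purchase, BG/NBD gives the customer no chance to drop out.
        return 1.0
    ratio = (alpha + age) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)  # hypothetical cohort parameters

# Same age (40 weeks) and recency (20 weeks), different frequency.
p_alive_a = bgnbd_prob_alive(frequency=10, recency=20, age=40, **params)
p_alive_b = bgnbd_prob_alive(frequency=2, recency=20, age=40, **params)
print(f"A: {p_alive_a:.3f}  B: {p_alive_b:.3f}")
```

With these inputs the frequent buyer A comes out with a much lower probability of still being active than B, because a 20-week silence is far more unusual for someone who used to order often.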

This statistical model provides personalized predictions and has several merits, including fast optimization and easy deployment. In addition, it estimates the churn probability as an intermediate step, which provides specific targets for customer engagement. However, a reasonably long purchasing history is needed for the model to predict reliably. In the example above, if A and B each had only two orders in their history, it would be hard to discern a pattern. There may also be other factors your team hypothesizes are relevant and wants to incorporate into the LTV prediction, such as a customer’s marketing engagement or demographics, and the statistical model’s setup is generally not flexible enough to accommodate features outside purchasing behavior.

This model has been implemented in multiple programming languages; here are the links:

- Python: https://lifetimes.readthedocs.io/en/latest/index.html
- R: https://cran.r-project.org/web/packages/BTYD/

**Machine Learning Model**

Use Case: No or little purchasing history at the time of prediction

Chances are that your LTV requirements don’t fit the existing models and their assumptions. For example, your business stakeholder may want to use LTV to assess the quality of marketing conversions, which requires the LTV to be available immediately after the first conversion (e.g. the first purchase). With no historical purchasing behavior to draw on, a customized machine learning model can be used to solve the problem. Here I will go over the thought process of crafting a machine learning model to predict LTV without any purchasing history. Detailed implementation and practical considerations will be covered in another article.

Here we use supervised learning, which learns from what has already happened (the “training data”), to capture the underlying pattern of LTV. In other words, to train an LTV machine learning model we need a list of customers with known LTV. Instead of calculating the value across the whole lifetime, we can approximate it with a time constraint, such as LTV within the first year. The choice of time constraint depends entirely on the business need and the total data history available for the study. Given a 1-year constraint, every customer who started at least 1 year ago is included in the study, and each customer’s value at the 1-year mark is calculated individually and used as the target of the machine learning model.
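A minimal sketch of how such training targets could be assembled, assuming a 365-day horizon and simple in-memory records. The function name, record layout and sample data are hypothetical.

```python
from datetime import date, timedelta


def build_targets(first_purchase, orders, as_of, horizon_days=365):
    """Return {customer_id: value within the horizon} for mature customers.

    first_purchase: {customer_id: date of first conversion}
    orders: iterable of (customer_id, order_date, contribution)
    """
    horizon = timedelta(days=horizon_days)
    # Keep only customers whose full horizon has already been observed.
    targets = {
        cid: 0.0
        for cid, start in first_purchase.items()
        if start + horizon <= as_of
    }
    for cid, order_date, contribution in orders:
        # Orders after the horizon are ignored so all targets are comparable.
        if cid in targets and order_date < first_purchase[cid] + horizon:
            targets[cid] += contribution
    return targets


first_purchase = {"a": date(2022, 1, 1), "b": date(2023, 9, 1)}
orders = [
    ("a", date(2022, 1, 1), 20.0),
    ("a", date(2022, 6, 1), 30.0),
    ("a", date(2023, 3, 1), 50.0),  # outside a's first year, not counted
    ("b", date(2023, 9, 1), 40.0),  # b is too recent to have a full year
]
print(build_targets(first_purchase, orders, as_of=date(2024, 1, 1)))
# → {'a': 50.0}
```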

The next step is to determine which features should be included in the model. Since the model is meant to give predictions shortly after the conversion, information generated later on, such as customer engagement, needs to be excluded from the training features. For example, a customer’s value within the first 3 months would be available in the training data, but for newly converted customers it will not exist yet at prediction time, so a model that relies on it would fail or give inaccurate predictions.
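One way to enforce this rule is to record, for every candidate feature, how many days after conversion it becomes known, and to filter on that. The catalogue below is hypothetical, as are the feature names and the function name.

```python
def usable_features(feature_availability_days, prediction_delay_days=0):
    """Keep features that are already known when the prediction is made."""
    return sorted(
        name
        for name, available_after in feature_availability_days.items()
        if available_after <= prediction_delay_days
    )


# Hypothetical catalogue: days after conversion at which each feature is known.
catalogue = {
    "acquisition_channel": 0,
    "first_order_value": 0,
    "signup_device": 0,
    "emails_opened_first_30d": 30,
    "value_first_3_months": 90,
}
print(usable_features(catalogue))
# → ['acquisition_channel', 'first_order_value', 'signup_device']
```

Raising `prediction_delay_days` shows the trade-off: waiting longer before scoring a customer unlocks more features but delays the prediction.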

After scrutinizing a list of features, a machine learning model can be trained using whichever algorithm makes the most sense in this application. Before wrapping up the work, it is important to check on the performance of the model on a randomly selected hold-out set. This evaluation should serve as a measure of how confident you feel about the prediction and a guide for future iterations and improvements.
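A sketch of the hold-out check, assuming mean absolute error as the metric and a predict-the-training-mean baseline as the yardstick. Both are choices made for this illustration, and the helper names and the toy targets are hypothetical.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Randomly split rows into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy 1-year values standing in for real training targets.
targets = [12.0, 40.0, 55.0, 8.0, 90.0, 33.0, 21.0, 60.0, 15.0, 47.0]
train, holdout = holdout_split(targets)

# Baseline: predict the training mean for every hold-out customer.
baseline = sum(train) / len(train)
baseline_mae = mean_absolute_error(holdout, [baseline] * len(holdout))
print(f"baseline MAE on hold-out: {baseline_mae:.2f}")
```

A trained model is worth iterating on only if its hold-out error is clearly lower than that of such a baseline.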

**Summary**

There’s no single way to build an LTV model; the models introduced above fit different scenarios and each has its own merits and limitations. I hope that the next time you consider building a predictive LTV model, you will know what options you have and the key points to work out with your stakeholders: what the model is used for, when in the customer’s lifetime journey the prediction should take place, and what data is available to use.