Estimating Customer Lifetime Value Via Cohort Retention

CLV, or LTV as it is often called

Published in The Startup · 7 min read · Aug 3, 2020

This is Part I of a two-part series on estimating customer lifetime value. In this post, I describe how to estimate LTV on a conceptual level, in order to explain what we’re going to be doing in Part II with the Python code.

First of all, why LTV? There are two reasons: creating a benchmark for customer acquisition costs (CAC) and comparing customers, e.g. if we’re targeting those who spend more or less than an average customer.

Many sources talk about using churn or retention to estimate customer lifetime value (LTV), and while the core idea remains the same, approaches to its calculation differ dramatically. So, while any analyst will benefit from reading this article, its primary objective is to explain how historical retention data can be used to estimate LTV for customers. We are not going to use statistical techniques to estimate churn and build our predictions. Instead, we will make use of historical retention, which is an easier place to start.

Why retention? The issue with customer lifetime value is the customer lifetime. If we’re talking about a subscription-based service, a reasonable estimate of the value a customer brings in per period is recurring revenue (RR), or the amount a customer pays for a subscription. If your customers can skip a period, however, do not forget to adjust for that (estimate the average % of skipped periods).
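To make the skip adjustment concrete, here is a minimal sketch. The function name, the price, and the 10% skip rate are all made up for illustration; the only assumption is that a skipped period brings in no revenue.

```python
def expected_recurring_revenue(price_per_period, skip_rate):
    """Expected revenue per period from one active customer.

    skip_rate is the average share of periods a customer skips (0.0 to 1.0).
    Assumption: a skipped period generates no revenue at all.
    """
    if not 0.0 <= skip_rate <= 1.0:
        raise ValueError("skip_rate must be between 0 and 1")
    return price_per_period * (1.0 - skip_rate)


# Hypothetical numbers: a 30.0 subscription, skipped 10% of the time.
adjusted_rr = expected_recurring_revenue(30.0, 0.10)
```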

What we do not know is how long a new customer will stay with the business, so we try to make an educated guess based on customers acquired earlier. It is often suggested that we calculate lifetime as an overall metric for the whole customer base, which gives a confusing average: it mixes customers who could, at least potentially, have spent years with the business with customers who joined last week or yesterday. At the same time, while older cohorts are good for analysis, we’d like our metrics to be actionable, and hence we want estimates for younger cohorts. A retention matrix, or a retention curve, visually represents how many of the acquired customers stayed with the business and continued to generate revenue. It is based on actual data, so you can start identifying patterns and use them to approximate the behaviour of newer customers. So, how?

Cohorts and retention matrix
Because customers join the business at different times, there should be a way to “normalize” their retention. A simple example: 10% of the customers who joined a year ago are still with the business; however, 90% of last month’s customers are still with us. By no means does this imply that customers who joined last month are better (or worse) than last year’s customers. They have simply had less time to show how “sticky” or valuable your business is for them.

For this reason, we can (and should) split customers into cohorts (groups), based on the time they joined. Normally, cohorts and their retention are analyzed by looking at a retention matrix or, similarly, a retention curve. In the matrix below, each square represents the proportion of originally acquired users that moved (re-ordered, re-subscribed) into the next month. For simplicity, I colour-coded them, as also shown below.

Figure 1: Sample scale for all the colours in this post — from yellow (~ 0–15%) to darker green (~90–100%)
Figure 2: Retention Matrix; y-axis, acquisition period (month), x-axis, tenure, or the time passed after the acquisition; for colour, see Figure 1.

There are two axes: time joined, or the acquisition month (y-axis), and subsequent periods, or tenure (x-axis: weeks, months, years, whatever makes sense for the business). The first thing you will notice in most retention matrices is that retention tends to fade as tenure increases, although the rate might differ between cohorts. The matrix alone can be used to compute the average actualized lifetime for a cohort, that is, the average amount of time a customer in a cohort has used your product. This calculation is simply the sum of retention by row (example here).
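As a rough illustration of the matrix and the row sum, here is a minimal sketch in plain Python. The counts are invented, and I am assuming that column 0 is the acquisition period itself (so retention there is always 100%) and that `None` marks periods a cohort has not reached yet. The function names are hypothetical, not taken from Part II.

```python
# Hypothetical active-customer counts: one row per acquisition month,
# one column per tenure period (column 0 = the acquisition month itself).
# None marks periods that have not happened yet for that cohort.
active_counts = [
    [100, 60, 45, 40],
    [120, 66, 54, None],
    [80, 48, None, None],
    [150, None, None, None],
]


def retention_matrix(counts):
    """Divide every observed count by the cohort's size at tenure 0."""
    matrix = []
    for row in counts:
        cohort_size = row[0]
        matrix.append([None if c is None else c / cohort_size for c in row])
    return matrix


def actualized_lifetime(retention_row):
    """Sum the observed retention values of one cohort (result is in periods)."""
    return sum(r for r in retention_row if r is not None)


retention = retention_matrix(active_counts)
lifetimes = [actualized_lifetime(row) for row in retention]
```

With these numbers, the oldest cohort has an actualized lifetime of 1.0 + 0.6 + 0.45 + 0.4 = 2.45 periods, while the youngest has only had time to accumulate 1.0.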

Another thing that you will quickly notice here is that the matrix will always be half-empty, and our first aim is to figure out the question marks below:

Figure 3: What do we need to estimate?

Obviously, that’s because younger cohorts have had a shorter actualized lifetime. Ideally, we’d love to know the full lifetime (and value) for them. And still, how?

Extrapolation
One of the easy approaches would be to fill the values for newer customers based on averaging of previous cohorts’ performance. Because averaging retention directly might be too rough and disconnected from actuals for newer cohorts, we can make use of marginal retention. That’s different from cohort retention as it’s retention period-over-period (e.g., month-over-month or week-over-week). The period will depend on your business cycle.

Figure 4: Marginal retention describes how many of the customers who paid for the service in the previous period “migrated” into the next one; it tends to increase as tenure increases and ideally approaches 100%.
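A minimal sketch of turning cohort retention into marginal retention could look like this. The retention values are made up, `None` marks periods not yet observed, and the function name is hypothetical. I am assuming that tenure 0 has no marginal value, since there is no previous period to compare it with.

```python
# Hypothetical cohort retention; None marks periods not yet observed.
retention = [
    [1.0, 0.60, 0.45, 0.40],
    [1.0, 0.55, 0.45, None],
    [1.0, 0.60, None, None],
    [1.0, None, None, None],
]


def marginal_retention(retention):
    """Period-over-period retention: each cell divided by the previous cell in its row."""
    result = []
    for row in retention:
        marginal = [None]  # tenure 0 has no previous period
        for prev, curr in zip(row, row[1:]):
            if prev is None or curr is None or prev == 0:
                marginal.append(None)
            else:
                marginal.append(curr / prev)
        result.append(marginal)
    return result


marginal = marginal_retention(retention)
```

For the oldest cohort this gives 0.60, 0.75 and roughly 0.89: cohort retention keeps falling while marginal retention rises, which is the pattern Figure 4 describes.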

Once we know the marginal retention for cohorts whose values are actualized, we “drag them down” to extrapolate for younger cohorts, element by element. One option is a simple average of the last N cohorts, working down from the earliest cohort (the first rows in the matrix below). This way, each estimate is a moving average of the N rows above it when at least N rows are available, and an average of all the rows above it otherwise. You can use this as a blueprint and make it more elaborate by including seasonality, your assumptions about future changes, etc.

Figure 5: Extrapolate marginal retention top-bottom, element by element

As a result, your matrix will be filled with actual values of marginal retention above the diagonal and estimates below the diagonal. In our case, the latter are moving averages with a window of minimum 1 and maximum N, depending on the row number.

Figure 6: Marginal retention extrapolated to fill the full matrix
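The “drag down” step can be sketched as follows. This is one possible reading of the moving-average idea, not necessarily the exact code from Part II: each missing cell becomes the average of up to `window` cells directly above it in the same column, and those cells may themselves be estimates. The data, the function name and the `window` parameter are assumptions for illustration.

```python
# Hypothetical marginal retention; None marks cells to be estimated.
marginal = [
    [None, 0.60, 0.75, 0.90],
    [None, 0.50, 0.80, None],
    [None, 0.70, None, None],
    [None, None, None, None],
]


def extrapolate_marginal(marginal, window):
    """Fill missing cells top to bottom with a moving average of the rows above."""
    filled = [row[:] for row in marginal]  # copy, so the input stays untouched
    n_cols = len(filled[0])
    for j in range(1, n_cols):  # column 0 (tenure 0) has no marginal retention
        for i, row in enumerate(filled):
            if row[j] is not None:
                continue
            # Up to `window` rows above; fewer near the top of the matrix.
            previous = [
                filled[k][j]
                for k in range(max(0, i - window), i)
                if filled[k][j] is not None
            ]
            if previous:
                row[j] = sum(previous) / len(previous)
    return filled


marginal_filled = extrapolate_marginal(marginal, window=2)
```

With a window of 2, the youngest cohort’s first marginal value becomes (0.50 + 0.70) / 2 = 0.60. The last column has only one actual value, so every estimate in it stays at 0.90.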

Once the whole square matrix is populated, we can extrapolate retention. This is done element-wise by “dragging it right”: 1) multiply the last known value in a row of the retention matrix by the next-column value in the marginal retention matrix, which gives the next retention estimate for that row; 2) repeat the procedure along the entire row, each time taking the latest retention estimate and multiplying it by the next value in the marginal retention matrix.

Figure 7: Combine extrapolated marginal retention and cohort retention to fill the cohort retention matrix

This way, we have a fully populated retention matrix with actual values above the diagonal and estimates below the diagonal. The average lifetime estimate for a cohort is just the sum of the retention values for that cohort. (If you would like me to describe why this is true, please comment below or highlight and comment; I would like to keep the current post to the point.)
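Here is a minimal sketch of the “drag right” step and the resulting lifetime estimate. Both input matrices are hypothetical, and the marginal matrix is assumed to be fully populated already (apart from tenure 0). Note that the lifetime here only covers the tenure window present in the matrix, so it is a lower bound if customers stay longer than that.

```python
# Hypothetical cohort retention with gaps (None) below the diagonal.
retention = [
    [1.0, 0.60, 0.45, 0.405],
    [1.0, 0.50, 0.40, None],
    [1.0, 0.70, None, None],
    [1.0, None, None, None],
]

# Hypothetical marginal retention, already extrapolated to a full matrix.
marginal_filled = [
    [None, 0.60, 0.75, 0.90],
    [None, 0.50, 0.80, 0.90],
    [None, 0.70, 0.775, 0.90],
    [None, 0.60, 0.7875, 0.90],
]


def fill_retention(retention, marginal_filled):
    """Fill each row left to right: previous retention times next marginal value."""
    filled = [row[:] for row in retention]
    for i, row in enumerate(filled):
        for j in range(1, len(row)):
            if row[j] is None:
                row[j] = row[j - 1] * marginal_filled[i][j]
    return filled


retention_filled = fill_retention(retention, marginal_filled)

# Estimated lifetime per cohort, in periods, over the matrix's tenure window.
lifetimes = [sum(row) for row in retention_filled]
```

For the youngest cohort the estimates come out as 0.60, then 0.60 × 0.7875 = 0.4725, then 0.4725 × 0.90 = 0.42525.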

Value
The value we’re talking about should ideally reflect the recurring amount generated by a customer, net of the operational costs of delivering a service or a product to that customer. For example, if you are in a delivery business, you’d want to exclude delivery costs. If you need to maintain infrastructure for a customer, you’d exclude that too. Any discounts that typically apply to a payment should be taken into account, so we can arrive at gross margin. Be careful, though: if you have first-time customer offers, make sure not to extrapolate those discounts into the future, as that will significantly lower your lifetime value.
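One simple way to keep first-time offers out of the recurring value is to estimate it from repeat orders only. The sketch below does just that. The order records, field names and function name are invented for illustration, and a real margin calculation will likely have more cost components.

```python
# Hypothetical order records for one cohort.
orders = [
    {"revenue": 30.0, "discount": 15.0, "delivery_cost": 5.0, "first_order": True},
    {"revenue": 30.0, "discount": 0.0, "delivery_cost": 5.0, "first_order": False},
    {"revenue": 30.0, "discount": 3.0, "delivery_cost": 5.0, "first_order": False},
]


def average_margin(orders):
    """Average of revenue minus discounts and delivery costs, per order."""
    margins = [o["revenue"] - o["discount"] - o["delivery_cost"] for o in orders]
    return sum(margins) / len(margins)


# Naive value: the first-time discount drags the average down.
naive_value = average_margin(orders)

# Recurring value: only repeat orders, so the one-off offer is not extrapolated.
repeat_orders = [o for o in orders if not o["first_order"]]
recurring_value = average_margin(repeat_orders)
```

Here the naive average is 19.0 per order, while the repeat-only average is 23.5, which is closer to what a retained customer generates in later periods.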

If the above seems complicated, consider starting with gross revenue instead of margin and work from there. I see value in starting with a top line and working your way to a gross margin, so further on you can make educated assumptions about revenue dynamics in your business, given its different components. It can also be a good starting point for LTV modelling if you want to assume the impact of your marketing or product improvement efforts.

Lifetime Value
After we’ve nailed the above, the lifetime value estimate is just the product of lifetime and value, on a cohort basis. To get a single number across cohorts, you can take a simple average, or a weighted average that gives more weight to newer cohorts or to cohorts with more customers.
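As a small worked example of this last step, here is a sketch with invented numbers. It weights cohorts by customer count, which is only one of the weighting choices mentioned above. All variable names are hypothetical.

```python
# Hypothetical per-cohort inputs.
lifetimes = [2.5, 2.0, 3.0]            # estimated lifetime, in periods
value_per_period = [20.0, 20.0, 30.0]  # margin per active customer per period
cohort_sizes = [100, 200, 100]         # customers acquired in each cohort

# LTV per cohort: lifetime times value.
cohort_ltv = [life * value for life, value in zip(lifetimes, value_per_period)]

# Simple average treats every cohort equally.
simple_avg_ltv = sum(cohort_ltv) / len(cohort_ltv)

# Weighted average gives larger cohorts more influence.
weighted_avg_ltv = (
    sum(ltv * size for ltv, size in zip(cohort_ltv, cohort_sizes))
    / sum(cohort_sizes)
)
```

The cohort LTVs are 50, 40 and 90, so the simple average is 60 while the size-weighted average is 55, because the largest cohort is also the least valuable one.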

And there we have it!

In essence, we’ve made use of cohort retention and marginal retention to extrapolate the former onto newer cohorts. An interesting fact that is not always apparent is that lifetime will be a sum of retention for a cohort, across the tenure axis.

Sources: there’s a very straightforward and comprehensive explanation of the topic in this blog post

Thanks for reading this post and getting this far! Hope it was helpful and if you have any comments, please leave them below.

You can also contact me on LinkedIn: https://www.linkedin.com/in/areusova/ (mention that you’re coming from this Medium post).

Or Twitter https://twitter.com/khunreus.

P.S. I have written a small follow-up here: Weighted Cohort Lifetime, in case you’d like to see how retention adds up to the average lifetime of a cohort.

And a Python implementation here.
