Calculating customer lifetime value: A Python solution

Published in Data Science at Microsoft · 12 min read · Sep 22, 2020

By Lisa Cohen, Zhining Deng, Shijing Fang, and Ron Sielinski

Customer lifetime value (LTV) is a powerful concept. It provides a comprehensive perspective on the end-to-end customer experience by incorporating customer growth and retention trends into a single metric. In Azure, our goal is to ensure that customers are successful on the cloud, at all steps in their transformation journey. Therefore, it’s useful to have this summary metric to track our progress.

Introduction

In this article, we’ll go into our specific implementation of LTV. But first, we’ll share some foundational context about our customer flows and the general LTV calculation, to keep us all on the same page. If you’d like to skip ahead, see the section on our cohort-based modeling approach.

Below is an overview of our customer onboarding funnel. At any given point in time, we have a number of different product and program changes in flight to enhance the customer experience. However, when evaluating the impact of these changes, we might find that they help one part of the funnel but hurt another. For example, if we launch a new service, the announcement can drive a lot of sign-ups and perhaps even initial usage. If the service fails to deliver on expectations, though, we will also see an increase in churn. If we only look at the upper-funnel impact, we might conclude that the launch was successful. Given this, we always need to take an end-to-end perspective to ensure that the change we’re evaluating is truly delivering on customer needs. Here is an overview of the Azure trial customer funnel, to give a sense of the stages:

Performance indicators that combine multiple metrics into one are called composite scores. Some composite scores have the downside that they are not meaningful in and of themselves. However, customer lifetime value has the benefit of a specific meaning. It represents the total net profit from a single customer. By projecting the expected lifespan of the customer (using retention rates), the net spend (using historical purchase data), and the gross margin (by incorporating costs), we can calculate the expected customer value over time. Given that the metric also takes costs into account, it provides a way to monitor efficiencies and ensure that we’re taking advantage of any opportunities to improve there as well.

Methodology

In this section, we’ll demonstrate how we produce the LTV metric for Azure customers, and use formulas that you can apply to your dataset as well.

Formula

Here is the overall formula for customer lifetime value (LTV):
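Written out in terms of the three parameters defined below, the formula is a simple product. This is a reconstruction from the surrounding definitions, with tenure measured in months to match the monthly revenue:

```latex
\mathrm{LTV} = \text{Margin} \times \text{Monthly revenue} \times \text{Tenure}
```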

As you can see, the LTV calculation has three major parameters. Below are the definitions for each of these terms.

1) Margin

The “Margin” here refers to the gross margin for your product expressed as a percentage. This is a good metric to monitor your efficiencies. To calculate it, first subtract the cost of goods sold (COGS) from the revenue that you earn to determine the gross profit. Then divide by the revenue to calculate the gross margin percentage. Here is the formula:
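In symbols, with revenue and COGS measured over the same period, the calculation described above can be written as:

```latex
\text{Gross margin} = \frac{\text{Revenue} - \text{COGS}}{\text{Revenue}}
```

For example, revenue of $200 against COGS of $80 gives (200 − 80) / 200 = 0.6, or a gross margin of 60 percent.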

In the case of Azure, cost of goods sold includes a number of factors, such as the computing power and building costs to run the services in our data centers, promotional costs (advertising, credits, events), particular headcount costs, the cost of fraud and non-payments that we incur, and more.

In your company, you may find that gross margin is a number which your financial department tracks as part of their accounting. That way, they can design a consistent framework across products for the components to include in this calculation.

If you don’t know the gross margin, you can still use the rest of this formula (and article) to calculate the lifetime revenue of your customers. However, multiplying by gross margin (to determine LTV) allows you to account for costs and reduce the lifetime revenue amount in order to understand the net profit per customer.

Note also that the product must be in a profitable state to have a positive gross margin and for this LTV formula to make sense. Before that point (if you are in the initial launch stages of your product), you might focus more on awareness and adoption, customer development, or efficiencies to drive your gross margin to profitability.

2) Monthly revenue

In our case, monthly revenue is the average consumption per customer (ACPC). (In some businesses, this is also called “ARPU”, which stands for average revenue per user.) We calculate this on a monthly basis, which aligns well with our business since Azure customers receive monthly bills. (Whatever timeframe you use, you’ll want to ensure that the units match the next variable in the formula, on expected tenure.)

Monthly revenue is a monetary value and should be expressed with a dollar sign, “$”. When calculating monthly revenue for Azure, we look at the past few years of data. This gives enough history to produce a large dataset, yet is recent enough that it’s still representative of the current business. When you calculate the average monthly revenue, you should include all customers, not just those that pay a non-zero amount. That way you get a sense of LTV for the entire population. Later, we’ll also discuss how you can take a more focused view to calculate LTV for different cohorts.

How you calculate average revenue per user varies depending on the type of business you’re in. However, the value that we’re trying to obtain holds true in any of these forms. For example:

In a software-as-a-service scenario, your users may pay a consistent monthly amount with their subscription. This would be true for Office 365 or Netflix, for example.
In a consumption-based subscription, as in Azure, you’ll want to review monthly charges by month, since they vary based on what the customer has used at each point in time. A telecom business is the same in this respect because customers’ usage patterns may cause changes in their monthly bills.
In a retail business without a subscription (such as a restaurant or clothing store), users make purchases of different amounts. In this model, you’ll want to sum the total amount of each user’s purchases (on a monthly, quarterly, or annual basis) to derive the average revenue per user.
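Whichever business model applies, the calculation reduces to total revenue divided by customer-months, with zero-spend customers kept in the denominator. The following is a minimal sketch of that idea. The function name, the record layout, and the sample data are hypothetical, and it assumes every customer belongs to the population for the whole observation window.

```python
from collections import defaultdict


def average_monthly_revenue(purchases, all_customers, months):
    """Average revenue per customer per month, counting zero-spend customers.

    purchases:     iterable of (customer_id, month, amount) records
    all_customers: every customer in the population, paying or not
    months:        the months that make up the observation window
    """
    window = set(months)
    totals = defaultdict(float)
    for customer_id, month, amount in purchases:
        if month in window:  # ignore purchases outside the window
            totals[customer_id] += amount
    # Customers with no purchases contribute 0 revenue but still count
    total_revenue = sum(totals[c] for c in all_customers)
    return total_revenue / (len(all_customers) * len(window))


# Hypothetical data: customer "c" never buys anything
purchases = [
    ("a", "2020-01", 30.0),
    ("a", "2020-02", 50.0),
    ("b", "2020-01", 40.0),
]
arpu = average_monthly_revenue(purchases, ["a", "b", "c"], ["2020-01", "2020-02"])
```

Dropping customer "c" from the population would raise the average, which is why the choice of denominator should be made deliberately.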

3) Tenure

Lastly, “Tenure” represents the expected tenure for the population, in months (to match the monthly revenue units discussed above). To determine the expected tenure, we introduce two additional variables for this calculation:

Retention rate (r) refers to the percentage of customers who remain active (in other words, are “retained”). It is calculated as 1 minus the “compound churn rate” (c), that is, r = 1 − c.

The compound churn rate is calculated as follows:

In this case, ‘n’ refers to the number of months of tenure. By applying this “compound” formula, we take into account the cumulative effect of churn.
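A common way to express a compound monthly churn rate, and the one this sketch assumes, is to take the fraction of a cohort still active after n months and convert it into a constant per-month rate, much like a compound growth rate. The function names and the sample counts are illustrative, and the exact definition used for Azure may differ.

```python
def compound_churn_rate(customers_start, customers_end, n_months):
    """Constant monthly churn rate c that takes a cohort from
    customers_start to customers_end over n_months (assumed definition)."""
    if customers_start <= 0 or n_months <= 0:
        raise ValueError("customers_start and n_months must be positive")
    surviving_fraction = customers_end / customers_start
    return 1 - surviving_fraction ** (1 / n_months)


def retention_rate(compound_churn):
    """Retention rate r = 1 - c, as defined in the text."""
    return 1 - compound_churn


# Hypothetical cohort: 1,000 customers shrink to 729 over 3 months
c = compound_churn_rate(customers_start=1000, customers_end=729, n_months=3)
r = retention_rate(c)
```

Here the cohort keeps 72.9 percent of its customers over three months, which corresponds to losing about 10 percent per month, compounded.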

(For more information on techniques to analyze and drive retention, please see our earlier article: Retain more customers by understanding churn.)

“Time span” (t) refers to the time horizon for which we measure the customer’s lifetime value. Of course, the name “lifetime” value implies that we should be considering the full lifetime! However, in our calculations, if we let the simulation run forever, the value would be infinite, and the metric would no longer be useful. Therefore, we choose a finite duration for the metric. With this modification, the calculation essentially becomes a “net present value” for the designated time period. You might choose a time horizon of five or ten years for your net present value. We recommend thinking about how long into the future you expect the product and landscape to look similar enough to the current situation, so that the model predictions continue to hold. You might also consider aligning with the time horizons used in your long-range planning and other budget cycles. (If you can’t decide on just one time horizon, you can produce two or three versions of LTV with different time horizons, to give your stakeholders a sense for how LTV varies with time.)

By incorporating the retention rate (r) and time span (t), we get the following calculation for expected tenure:
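Assuming a constant monthly retention rate r (with 0 ≤ r < 1) and counting the first month as fully active, the expected number of active months over a horizon of t months is a finite geometric series. Under those assumptions:

```latex
\text{Tenure} = \sum_{k=0}^{t-1} r^{k} = \frac{1 - r^{t}}{1 - r}
```

For example, with r = 0.9 and t = 2 months, the expected tenure is 1 + 0.9 = 1.9 months.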

Using all of the parameters above, we now have the following formula for LTV:
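Substituting a finite geometric series for tenure, the combined formula can be written as shown here. The assumptions are a constant monthly retention rate r < 1 over a horizon of t months, with the first month counted in full:

```latex
\mathrm{LTV} = \text{Margin} \times \text{Monthly revenue} \times \frac{1 - r^{t}}{1 - r}
```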

Discount rate:

Finally, one last factor to consider in the LTV equation is the discount rate. The discount rate provides a way of discounting future cash flows back to their present value. The idea here is that receiving money now is more valuable than receiving the same amount in the future, because it can be invested. The discount rate is expressed in terms of a percentage. Discount rate is an industry concept; however, your finance department should be able to share the rate that your company consistently uses. To incorporate this consideration, you can apply the discount rate to the final LTV. Here is the corresponding formula:
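One common way to apply a discount rate, sketched here as an assumption rather than as the authors’ exact formula, is to discount each month’s expected margin back to the present before summing. The function name, the per-month discount rate d, and the choice to leave the first month undiscounted are all illustrative.

```python
def discounted_ltv(margin, monthly_revenue, r, t, d):
    """Sum of expected monthly margin over t months, discounted at rate d.

    margin:          gross margin as a fraction (for example, 0.6)
    monthly_revenue: average monthly revenue per customer
    r:               monthly retention rate
    t:               time horizon in months
    d:               discount rate per month, as a fraction (assumed monthly)
    """
    return sum(
        margin * monthly_revenue * (r ** k) / ((1 + d) ** k)
        for k in range(t)
    )


# Hypothetical inputs: the same customer with and without discounting
undiscounted = discounted_ltv(0.5, 100.0, 0.9, 24, 0.0)
discounted = discounted_ltv(0.5, 100.0, 0.9, 24, 0.01)
```

If your finance department quotes an annual rate, it can be converted to a monthly rate with (1 + annual rate)^(1/12) − 1 before being passed in as d.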

Cohort-based modeling approach

Given that the variables above (monthly revenue, churn, and gross margin) tend to change over time, it’s important to incorporate multiple customer cohorts from the history of your product when you develop the model. Then, you’ll also want to refresh it over time to ensure your calculations are up to date with the latest trends. In our case, we’ve programmed this model (in both R and Python, at various versions), which makes it easy to re-run as an automated solution that is directly connected to the data source. Below are some insights into what this looks like.

Overall, we use a cohort-based approach and group customers by the month they received their first bill. We saw some clear patterns emerge from this approach as we observed similar trends across customer cohorts. In the charts below, the red lines represent customers with the oldest start date, and therefore with the longest tenure to date. The colors then continue through “rainbow” order, to orange, yellow, green, blue, and ultimately purple, which is the most recent cohort and therefore has the shortest total tenure to date. Each cohort is zero-based at the bottom-left “origin” point and then continues along the x-axis, as tenure increases on Azure. (Note that the sample size for each cohort also decreases over time due to churn.)
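The grouping step described above can be sketched with the standard library alone. This is a minimal illustration, assuming each customer has a known first-bill date. The function names and sample customers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def cohort_key(first_bill_date):
    """Cohort label: the year and month of the customer's first bill."""
    return (first_bill_date.year, first_bill_date.month)


def tenure_month(first_bill_date, bill_date):
    """Whole calendar months since the first bill (0 for the first month)."""
    return ((bill_date.year - first_bill_date.year) * 12
            + (bill_date.month - first_bill_date.month))


def group_by_cohort(first_bills):
    """Map each cohort to the customer ids whose first bill fell in that month."""
    cohorts = defaultdict(list)
    for customer_id, first_bill_date in first_bills.items():
        cohorts[cohort_key(first_bill_date)].append(customer_id)
    return dict(cohorts)


# Hypothetical customers and first-bill dates
first_bills = {
    "a": date(2019, 1, 15),
    "b": date(2019, 1, 28),
    "c": date(2019, 3, 2),
}
cohorts = group_by_cohort(first_bills)
```

Indexing every bill by tenure month rather than calendar month is what lets all cohorts start from the same origin on the x-axis.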

Monthly revenue: We see that the monthly revenue of different customer cohorts follows similar trends over time (with the same general shape of the curve below across colored cohorts). We also see monthly revenue steadily increasing with newer cohorts as the product evolves with new enhancements to best serve our customers.

Monthly compound churn: When we analyze our churn curves, we see a similar shape across customer cohorts. Essentially, churn rates are highest at the beginning of the customer tenure, while customers are evaluating product fit, and then drop steeply as customers start using and building habits around the product.

Lifetime Value: Finally, putting these parts together, we calculate the LTV for each customer cohort and plot these LTV trends over time. We observe consistent trends across customer cohorts in this historical data, as summarized by the blue line in the center.

Python code: To program this yourself, first install Jupyter Notebook (from the Jupyter Project) and Python.

Below is sample Python code you can use to create a function that calculates LTV based on margin, monthly revenue, retention rate (r), and time horizon (t):
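A minimal sketch of such a function follows. It assumes a constant monthly retention rate and the closed-form geometric-series tenure (1 − r^t) / (1 − r), with the first month counted in full. The parameter names and sample inputs are illustrative.

```python
def ltv(margin, monthly_revenue, r, t):
    """Customer lifetime value over a horizon of t months.

    margin:          gross margin as a fraction (for example, 0.6)
    monthly_revenue: average monthly revenue per customer
    r:               monthly retention rate, between 0 and 1
    t:               time horizon in months
    """
    if not 0 <= r <= 1:
        raise ValueError("retention rate r must be between 0 and 1")
    if r == 1:
        tenure = t  # no churn: the customer stays for the whole horizon
    else:
        tenure = (1 - r ** t) / (1 - r)  # expected active months
    return margin * monthly_revenue * tenure


# Hypothetical inputs: 60% margin, $100 per month, 95% retention, 5 years
example_ltv = ltv(margin=0.6, monthly_revenue=100.0, r=0.95, t=60)
```

The explicit branch for r = 1 avoids a division by zero in the closed form.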

(Note: If you’re new to Python programming, here are some tutorials that give a nice introduction: Python, Pandas.)

Predictions: In order to provide an LTV for the chosen time span (t), we need to predict future values as well. Given the consistent trends in our historical data, we’re able to leverage a LOESS (“locally estimated scatterplot smoothing”) regression method to accomplish this. LOESS is a powerful technique for fitting a smooth curve to data points without assuming a particular functional form for the data (i.e., it is a non-parametric approach). In Python, we use the LOESS implementation in statsmodels (statsmodels.nonparametric.smoothers_lowess.lowess).
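To illustrate the idea behind LOESS without any dependencies, here is a simplified, non-robust local linear fit in pure Python. It is a sketch of the technique, not the statsmodels implementation: it omits the robustness iterations, and the function name, the frac parameter (the share of points used in each local fit), and the sample data are assumptions made for illustration.

```python
def local_linear_fit(xs, ys, x0, frac=0.5):
    """Estimate y at x0 from a tricube-weighted straight line fitted to the
    nearest `frac` of the observations (a simplified LOESS step)."""
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # size of the local neighbourhood
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    bandwidth = max(abs(xs[i] - x0) for i in nearest) or 1.0
    # Tricube weights: close points count most, the farthest counts zero
    weights = [(1 - (abs(xs[i] - x0) / bandwidth) ** 3) ** 3 for i in nearest]
    total = sum(weights)
    if total == 0:  # degenerate case: fall back to a plain local average
        return sum(ys[i] for i in nearest) / k
    mean_x = sum(w * xs[i] for w, i in zip(weights, nearest)) / total
    mean_y = sum(w * ys[i] for w, i in zip(weights, nearest)) / total
    sxx = sum(w * (xs[i] - mean_x) ** 2 for w, i in zip(weights, nearest))
    sxy = sum(w * (xs[i] - mean_x) * (ys[i] - mean_y)
              for w, i in zip(weights, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)


# Hypothetical monthly revenue by tenure month for one cohort
months = list(range(10))
revenue = [50, 58, 61, 70, 68, 77, 83, 81, 90, 95]
smoothed = [local_linear_fit(months, revenue, m) for m in months]
```

Evaluating the function just beyond the last observed month extends the final local line, so any such projection should be treated with caution and checked against older cohorts that have already reached that tenure.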

Conversion rates: Up to this point, we’ve been considering LTV for billable accounts (in other words, those that have converted and are in a state that they can receive bills). However, in order to determine the LTV for all trial sign-ups, we need to include both trials that convert as well as those that don’t. Therefore, we multiply the LTV for billable trials by the conversion rate to get the corresponding LTV per trial sign-up. We use geography as a key dimension that is useful for detecting different behaviors among our user base. In a later section, we’ll discuss additional characteristics that you can consider to develop a meaningful segmentation for the audience of your product.
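The adjustment described above is a single multiplication per segment. The sketch below applies it by geography. The geography labels, LTV values, and conversion rates are made up for illustration.

```python
def ltv_per_trial(ltv_billable, conversion_rate):
    """LTV per trial sign-up: LTV of billable accounts times the
    share of trials that convert to billable."""
    if not 0 <= conversion_rate <= 1:
        raise ValueError("conversion_rate must be between 0 and 1")
    return ltv_billable * conversion_rate


# Hypothetical inputs by geography
by_geo = {
    "Geo A": {"ltv_billable": 1000.0, "conversion_rate": 0.25},
    "Geo B": {"ltv_billable": 800.0, "conversion_rate": 0.5},
}
ltv_trial_by_geo = {
    geo: ltv_per_trial(v["ltv_billable"], v["conversion_rate"])
    for geo, v in by_geo.items()
}
```

In this made-up example, Geo B has the lower LTV per billable account but the higher LTV per trial sign-up, because more of its trials convert.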

Applications

Now that we know how to calculate LTV, what can we do with it? In this section, we’ll discuss ways that we use these insights to learn more about what’s driving LTV and leverage this data in relevant initiatives.

Segmentation analysis

While we saw above that the overall shape of our LTV curves is consistent, we do observe differences in the values among customer cohorts. These differences can provide gateways for learning, and also allow us to take more targeted approaches. For example, one of the primary methods that we use for developing LTV customer cohorts (as mentioned above) is to segment the population by geography. Below you can see a breakdown of LTVs by geography:

Of course, by definition, some geographies fall above the worldwide average, while others fall below. Using this information, we can look more deeply to understand what is causing certain geographies to have higher LTV and learn from those best practices. Given the variation that we see here, we also leverage this insight to ensure that models which depend on LTV consider geo-specific trends.
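As a rough sketch of this comparison, the snippet below computes a customer-weighted worldwide average and each segment’s gap from it. Weighting by customer count is an assumption about how the worldwide figure is defined, and the segment names and values are hypothetical.

```python
def compare_to_worldwide(segments):
    """segments maps a segment name to (ltv, customer_count).

    Returns the customer-weighted worldwide LTV and, for each segment,
    its difference from that average (positive means above average)."""
    total_customers = sum(count for _, count in segments.values())
    worldwide = sum(ltv * count for ltv, count in segments.values()) / total_customers
    gaps = {name: ltv - worldwide for name, (ltv, _) in segments.items()}
    return worldwide, gaps


# Hypothetical LTV and customer counts by geography
segments = {"Geo A": (1200.0, 100), "Geo B": (800.0, 300)}
worldwide_ltv, gaps = compare_to_worldwide(segments)
```

The same function works for any of the segmentations, since it only needs an LTV and a customer count per segment.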

In addition to geography, here are a few additional segmentations that we’ve found valuable for comparing LTV:

Segmenting by user audience (for example, student, startup, partner, developer, enterprise, and others).
Segmenting by traffic source (for example, customers who come to Azure organically versus those who click on an advertisement).
Segmenting by offer source (for example, customers who start with a free trial to evaluate first versus those who subscribe directly).

We strongly encourage you to consider the segmentation that makes sense for your business. When you find a good one, it will expose interesting variations among your customers. The examples above might trigger some ideas for your own product. For enterprise products, you can also consider customer size, technology stack, role, and language. For consumer products, a few examples include age, gender, culture, and income level.

Customer-level predictions

Beyond viewing the average LTV for a customer cohort, sometimes we want to predict the LTV for a specific customer. In this case, the LTV is really a customer-level forecast. As part of developing the forecast, we work to minimize the historical data required so that the forecast is available as early as possible. As illustrated in blue below, we include a prediction interval (at 10 to 90 percent confidence levels) in order to help our business stakeholders understand the expected variability.

Fig. Customer-level forecast. (Credit: Steve Broschart for the model and Nancy Organ for the visualization.)

Understanding customer opportunities

Without LTV, we typically have only customer history for context when we engage with customers. However, LTV gives us insight into the future. Using this perspective, in addition to being a model output, LTV can also be used as a model input.

For example, knowing the LTV potential of specific customers can help prioritize our nurture efforts. Another scenario is fraud risk detection. By understanding the LTV potential of legitimate users (versus the cost of fraudulent users), we can set an optimal threshold for acceptable fraud transmission in our risk detection models.
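To make the fraud example concrete, here is a deliberately simplified expected-value sketch. It is an illustration, not the model used for Azure. It assumes that accepting a legitimate user earns their LTV, that accepting a fraudulent user costs a fixed amount, and that rejecting a user earns nothing. Under those assumptions, accepting is worthwhile while (1 − p) × LTV exceeds p × cost, where p is the estimated probability of fraud.

```python
def break_even_fraud_probability(ltv, fraud_cost):
    """Fraud probability at which accepting a sign-up has zero expected value.

    Derived from (1 - p) * ltv - p * fraud_cost = 0, which gives
    p = ltv / (ltv + fraud_cost). Both inputs are hypothetical estimates."""
    if ltv <= 0 or fraud_cost <= 0:
        raise ValueError("ltv and fraud_cost must be positive")
    return ltv / (ltv + fraud_cost)


def should_accept(fraud_probability, ltv, fraud_cost):
    """Accept when the estimated fraud probability is below break-even."""
    return fraud_probability < break_even_fraud_probability(ltv, fraud_cost)


# Hypothetical values: a $300 LTV against a $100 cost per fraudulent account
threshold = break_even_fraud_probability(ltv=300.0, fraud_cost=100.0)
```

A higher LTV raises the break-even point, which is why segments with different LTVs can justify different risk thresholds.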

Conclusion

In this article, we’ve provided context on why customer lifetime value is a key metric to track for your business. We also walked through the formula and how to apply it to a dataset. Finally, we discussed additional applications for LTV, including segmentation and customer-level predictions. Please leave a comment and let us know how you plan to put this to use!