Customer Lifetime Value

On a Real-World Business Problem

Deniz Cansu Turan
7 min readMar 6, 2023

Customer Lifetime Value (CLTV) is a measure of the total revenue a customer brings to the company as long as they remain a customer.

Why is CLTV an important parameter?

Cltv is also used to estimate the future profit a customer is likely to bring to the company. Therefore, it helps to arrange the relationship with particular customers and make long-term marketing strategies customer-based. Companies can decide the budget for these efforts since they have an idea how much they are going to make in a certain time.

How to calculate CLTV?

There are multiple ways to calculate CLTV. One thing to consider, estimating and calculating CLTV are two different concepts.

Let’s take a simple approach:

  • Repeat Rate (Retention Rate):

Number of customers who made a purchase more than once / Total Number of Customers

  • Churn Rate = 1 - Repeat Rate

It is a constant that is determined and used to adjust the customer value.

  • Average Order Value = Total Price / Total Transaction
  • Customer Value = Average Order Value * Purchase Frequency
  • Purchase Frequency = Total Transaction / Total Number of Customers
  • Profit Margin = Total Price * 0.10
  • CLTV = (Customer Value / Churn Rate) * Profit Margin

Example:

Let’s say in a company;

  • Total number of customers: 100
  • Churn rate: 0.8
  • Profit: 0.10

Total(toplam)of 3 transactions of a customer(müşteri):

Average Order Value = 1200 / 3
Purchase Frequency = 3 / 100
Profit Margin = Total 1200 * 0.10 = 120
Customer Value = 400 * 0.03 = 12
CLTV = 12 / 0.8 * 120 = 1800

CLTV is calculated for each customer and then grouped to form segments. Promising customer for future can be segmented too.

Now, perform these steps on a real-world business problem:

CUSTOMER LIFETIME VALUE

Dataset is the same dataset that I discussed in Customer Segmentation with RFM article. Therefore, there may be some similarities in code such as columns. However, this time customer lifetime value (CLTV) will be calculated rather than RFM. CLTV does not include recency parameter, so we can say that RFM is performed for analyzing current relationship with the customer; on the other hand, CLTV is performed for analyzing life-time relationship with the customer.

Dataset includes 8 variables namely Invoice, StockCode, Description, Quantity, InvoiceDate, Price, Customer ID, and Country. Invoices that start with “C” are for cancelled transactions. CLTV will be calculated for years 2010–2011, however it can be used for years 2009–2010 too since the process will be generalized at the end.

This is a public dataset, you access from here.

1.Data Preparation:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

pd.set_option('display.max_columns', None) # to see all the variables
pd.set_option('display.float_format', lambda x: '%.5f' % x) # deciaml preference for float numbers

df_ = pd.read_excel("dataset location", sheet_name="Year 2010-2011")
df = df_.copy()
# if something goes wrong we do not need to read the data again
# working on the copy right now.

df.shape
df.head()

df.isnull().sum()
# null values are checked and missing customer ids are dropped
# since without knowing the customer, we do not know whose CLTV that we are calculating is


df = df[~df["Invoice"].str.contains("C", na=False)]
# in dataset story it is mentioned that invoices starting with C are cancelled ones
# no need for them to stay
# ~ means everything except this:

df.describe().T
# to check if there is something out of the ordinary
# after adjustments, it must be run to control again

df = df[(df['Quantity'] > 0)]
# negative values for quantity found in describe, they should be eliminated
# since there can not be negative quantity of a product

df.dropna(inplace=True)
# to eliminate missing customer ids info and missing description info

df["TotalPrice"] = df["Quantity"] * df["Price"]
# same products is ordered multiple times in the same invoice
# TotalPrice is needed for a product type


cltv_c = df.groupby('Customer ID').agg({'Invoice': lambda x: x.nunique(),
'Quantity': lambda x: x.sum(),
'TotalPrice': lambda x: x.sum()})

# analysis is customer based so dataset is groupby the id,
# necessary columns (metrics for cltv )are calculated
# and saved as a new dataframe cltv_c

cltv_c.columns = ['total_transaction', 'total_unit', 'total_price']
# columns are renamed
cltv_c.head()

Before, there were multiple data for a customer. After the adjustments, data is grouped for each customer, needed parameters are extracted. Now, we have 4339 unique customers to focus on with summary of their behaviors.

2.Average Order Value (average_order_value = total_price / total_transaction)

cltv_c["average_order_value"] = cltv_c["total_price"] / cltv_c["total_transaction"]
# necessary columns exist, easily calculated and added as a new column

3.Purchase Frequency (total_transaction / total_number_of_customers)

cltv_c.shape[0]
# shape function returns a tuple with row and column number of a dataframe
# by indexing we obtained the raw number of the cltv_c dataframe
# which is the exactly the total number of customers

cltv_c["purchase_frequency"] = cltv_c["total_transaction"] / cltv_c.shape[0]
# purchase frequency calculated and added

4.Churn Rate & Repeat Rate (Number of customers who made a purchase more than once / Total Number of Customers)

repeat_rate = cltv_c[cltv_c["total_transaction"] > 1].shape[0] / cltv_c.shape[0]
# same idea, just a condition added in cltv_c and again row number is taken
# as number of customers
# total_transaction is the number of unique invoices
# retention rate, customers we consider ourselves.

churn_rate = 1 - repeat_rate
# by definiton, if a customer did not order more than once
# it is accepted that it is a customer churn

5.Profit Margin (total_price x 0.10)

cltv_c['profit_margin'] = cltv_c['total_price'] * 0.10

# 0.10 is a constant that is needed to be calculated by the company
# profit_margin is added, it is profit after all the expenses

6.Customer Value (customer_value = average_order_value x purchase_frequency)

cltv_c['customer_value'] = cltv_c['average_order_value'] * cltv_c["purchase_frequency"]
# these customer values are needed to be corrected by churn rate and profit margin

7.Customer Lifetime Value (CLTV = (customer_value / churn_rate) x profit_margin)

cltv_c["cltv"] = (cltv_c["customer_value"] / churn_rate) * cltv_c["profit_margin"]
# while churn_rate is a constant, profit margin is customer-based

cltv_c.sort_values(by="cltv", ascending=False).head()
# customers are sorted according to their final lifetime values

Notice that, Customer ID corresponds to index rather than a column. For instance; Customer 14646 made 74 orders, purchased 197491 unit quantity product, totally paid 280206 unit money in 2010–2011.

8.Creating segments

Until now, CLTV is calculated for each customer. Main purpose is to segment these customers with the help of CLTVs. Unlike RFM, there is not predetermined lines to divide customers’ scores.

There are 4339 customers, trying to manage an identical relationship with each customer is unrealistic. Should we even focus on everyone? Therefore, there is a need for prioritizing customers. CLTV is a parameter including above-mentioned metrics to lead this prioritization.

cltv_c.sort_values(by="cltv", ascending=False).tail()
# to observe the worst cases


cltv_c["segment"] = pd.qcut(cltv_c["cltv"], 4, labels=["D", "C", "B", "A"])
# it is decided to create 4 groups namely A, B, C, D
# new column segment is added
# qcut function from pandas groups a column given the number of groups and labels
# however, by default, qcut sorts the values ascending
# since the bigger the CLTV, the better; labels have to be written backwards

cltv_c.sort_values(by="cltv", ascending=False).head()
# expected to have A segment on the head

cltv_c.sort_values(by="cltv", ascending=False).tail()
# expected to have D segment on the tail

cltv_c.groupby("segment").agg({"count", "mean", "sum"})
# to analyze the segments on the basis of all variables
# for example, A segment customers made 10.4 transaction on average

cltv_c.to_csv("cltc_c.csv")
# csv file is created

By this analyse, what can be concluded as a company strategy?

  • A segment may be focused on mainly.
  • Countries of customers with different segments can be examined.
  • C and D segment can be merged and be called as C segment.

9.Functionalization

Whole process is defined in a function so that we can perform all the above on the year 2009–2010 too within seconds. Profit is defined by the company, so an option is added to control the profit value. Full version of the code on GitHub!

def create_cltv_c(dataframe, profit=0.10):

***

return cltv_c

P.S. This article and example are a great start for customer lifetime value subject. Now that we have an understanding of it, following article will be on prediction of CLTV with a real-world problem. We are making progress on CRM (Customer Relationship Management) Analytics:)

Connect with me on LinkedIn

Connect with me on Medium

Connect with me on GitHub

Reference:

Miuul Lecture Notes

--

--

Deniz Cansu Turan

Data Science & Machine Learning Enthusiast || Engineer <Sharing as I learn>