Customer Lifetime Value
On a Real-World Business Problem
Customer Lifetime Value (CLTV) is a measure of the total revenue a customer brings to the company as long as they remain a customer.
Why is CLTV an important parameter?
Cltv is also used to estimate the future profit a customer is likely to bring to the company. Therefore, it helps to arrange the relationship with particular customers and make long-term marketing strategies customer-based. Companies can decide the budget for these efforts since they have an idea how much they are going to make in a certain time.
How to calculate CLTV?
There are multiple ways to calculate CLTV. One thing to consider, estimating and calculating CLTV are two different concepts.
Let’s take a simple approach:
- Repeat Rate (Retention Rate):
Number of customers who made a purchase more than once / Total Number of Customers
- Churn Rate = 1 - Repeat Rate
It is a constant that is determined and used to adjust the customer value.
- Average Order Value = Total Price / Total Transaction
- Customer Value = Average Order Value * Purchase Frequency
- Purchase Frequency = Total Transaction / Total Number of Customers
- Profit Margin = Total Price * 0.10
- CLTV = (Customer Value / Churn Rate) * Profit Margin
Example:
Let’s say in a company;
Total number of customers: 100
Churn rate: 0.8
Profit: 0.10
Total(toplam)of 3 transactions of a customer(müşteri):
Average Order Value = 1200 / 3
Purchase Frequency = 3 / 100
Profit Margin = Total 1200 * 0.10 = 120
Customer Value = 400 * 0.03 = 12
CLTV = 12 / 0.8 * 120 = 1800
CLTV is calculated for each customer and then grouped to form segments. Promising customer for future can be segmented too.
Now, perform these steps on a real-world business problem:
CUSTOMER LIFETIME VALUE
Dataset is the same dataset that I discussed in Customer Segmentation with RFM article. Therefore, there may be some similarities in code such as columns. However, this time customer lifetime value (CLTV) will be calculated rather than RFM. CLTV does not include recency parameter, so we can say that RFM is performed for analyzing current relationship with the customer; on the other hand, CLTV is performed for analyzing life-time relationship with the customer.
Dataset includes 8 variables namely Invoice, StockCode, Description, Quantity, InvoiceDate, Price, Customer ID, and Country. Invoices that start with “C” are for cancelled transactions. CLTV will be calculated for years 2010–2011, however it can be used for years 2009–2010 too since the process will be generalized at the end.
This is a public dataset, you access from here.
1.Data Preparation:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
pd.set_option('display.max_columns', None) # to see all the variables
pd.set_option('display.float_format', lambda x: '%.5f' % x) # deciaml preference for float numbers
df_ = pd.read_excel("dataset location", sheet_name="Year 2010-2011")
df = df_.copy()
# if something goes wrong we do not need to read the data again
# working on the copy right now.
df.shape
df.head()
df.isnull().sum()
# null values are checked and missing customer ids are dropped
# since without knowing the customer, we do not know whose CLTV that we are calculating is
df = df[~df["Invoice"].str.contains("C", na=False)]
# in dataset story it is mentioned that invoices starting with C are cancelled ones
# no need for them to stay
# ~ means everything except this:
df.describe().T
# to check if there is something out of the ordinary
# after adjustments, it must be run to control again
df = df[(df['Quantity'] > 0)]
# negative values for quantity found in describe, they should be eliminated
# since there can not be negative quantity of a product
df.dropna(inplace=True)
# to eliminate missing customer ids info and missing description info
df["TotalPrice"] = df["Quantity"] * df["Price"]
# same products is ordered multiple times in the same invoice
# TotalPrice is needed for a product type
cltv_c = df.groupby('Customer ID').agg({'Invoice': lambda x: x.nunique(),
'Quantity': lambda x: x.sum(),
'TotalPrice': lambda x: x.sum()})
# analysis is customer based so dataset is groupby the id,
# necessary columns (metrics for cltv )are calculated
# and saved as a new dataframe cltv_c
cltv_c.columns = ['total_transaction', 'total_unit', 'total_price']
# columns are renamed
cltv_c.head()
Before, there were multiple data for a customer. After the adjustments, data is grouped for each customer, needed parameters are extracted. Now, we have 4339 unique customers to focus on with summary of their behaviors.
2.Average Order Value (average_order_value = total_price / total_transaction)
cltv_c["average_order_value"] = cltv_c["total_price"] / cltv_c["total_transaction"]
# necessary columns exist, easily calculated and added as a new column
3.Purchase Frequency (total_transaction / total_number_of_customers)
cltv_c.shape[0]
# shape function returns a tuple with row and column number of a dataframe
# by indexing we obtained the raw number of the cltv_c dataframe
# which is the exactly the total number of customers
cltv_c["purchase_frequency"] = cltv_c["total_transaction"] / cltv_c.shape[0]
# purchase frequency calculated and added
4.Churn Rate & Repeat Rate (Number of customers who made a purchase more than once / Total Number of Customers)
repeat_rate = cltv_c[cltv_c["total_transaction"] > 1].shape[0] / cltv_c.shape[0]
# same idea, just a condition added in cltv_c and again row number is taken
# as number of customers
# total_transaction is the number of unique invoices
# retention rate, customers we consider ourselves.
churn_rate = 1 - repeat_rate
# by definiton, if a customer did not order more than once
# it is accepted that it is a customer churn
5.Profit Margin (total_price x 0.10)
cltv_c['profit_margin'] = cltv_c['total_price'] * 0.10
# 0.10 is a constant that is needed to be calculated by the company
# profit_margin is added, it is profit after all the expenses
6.Customer Value (customer_value = average_order_value x purchase_frequency)
cltv_c['customer_value'] = cltv_c['average_order_value'] * cltv_c["purchase_frequency"]
# these customer values are needed to be corrected by churn rate and profit margin
7.Customer Lifetime Value (CLTV = (customer_value / churn_rate) x profit_margin)
cltv_c["cltv"] = (cltv_c["customer_value"] / churn_rate) * cltv_c["profit_margin"]
# while churn_rate is a constant, profit margin is customer-based
cltv_c.sort_values(by="cltv", ascending=False).head()
# customers are sorted according to their final lifetime values
Notice that, Customer ID corresponds to index rather than a column. For instance; Customer 14646 made 74 orders, purchased 197491 unit quantity product, totally paid 280206 unit money in 2010–2011.
8.Creating segments
Until now, CLTV is calculated for each customer. Main purpose is to segment these customers with the help of CLTVs. Unlike RFM, there is not predetermined lines to divide customers’ scores.
There are 4339 customers, trying to manage an identical relationship with each customer is unrealistic. Should we even focus on everyone? Therefore, there is a need for prioritizing customers. CLTV is a parameter including above-mentioned metrics to lead this prioritization.
cltv_c.sort_values(by="cltv", ascending=False).tail()
# to observe the worst cases
cltv_c["segment"] = pd.qcut(cltv_c["cltv"], 4, labels=["D", "C", "B", "A"])
# it is decided to create 4 groups namely A, B, C, D
# new column segment is added
# qcut function from pandas groups a column given the number of groups and labels
# however, by default, qcut sorts the values ascending
# since the bigger the CLTV, the better; labels have to be written backwards
cltv_c.sort_values(by="cltv", ascending=False).head()
# expected to have A segment on the head
cltv_c.sort_values(by="cltv", ascending=False).tail()
# expected to have D segment on the tail
cltv_c.groupby("segment").agg({"count", "mean", "sum"})
# to analyze the segments on the basis of all variables
# for example, A segment customers made 10.4 transaction on average
cltv_c.to_csv("cltc_c.csv")
# csv file is created
By this analyse, what can be concluded as a company strategy?
- A segment may be focused on mainly.
- Countries of customers with different segments can be examined.
- C and D segment can be merged and be called as C segment.
9.Functionalization
Whole process is defined in a function so that we can perform all the above on the year 2009–2010 too within seconds. Profit is defined by the company, so an option is added to control the profit value. Full version of the code on GitHub!
def create_cltv_c(dataframe, profit=0.10):
***
return cltv_c
P.S. This article and example are a great start for customer lifetime value subject. Now that we have an understanding of it, following article will be on prediction of CLTV with a real-world problem. We are making progress on CRM (Customer Relationship Management) Analytics:)
Reference:
Miuul Lecture Notes