Customer Lifetime Value with Python’s Lifetimes Library

Mine Gazioğlu
4 min read · Apr 11, 2023


Have you ever wondered how much a customer is worth to your business? Or how much revenue you can expect from a particular customer in the future? These questions can be answered by estimating customer lifetime value (CLV).

In this article, we’ll explore how to use Python’s Lifetimes library to predict the future revenue and number of transactions for a customer. We’ll walk through an example using the Online Retail dataset from the UCI Machine Learning Repository. We’ll start by exploring the dataset and cleaning the data, then we’ll use the Lifetimes package to fit statistical models that estimate customer lifetime value.

Let’s get started!

Download the dataset using the Kaggle API

The Online Retail dataset from the UCI Machine Learning Repository is a well-known benchmark dataset for customer lifetime value analysis. It contains transactional data from a UK-based online retailer, including the customer ID, product description, quantity, unit price, and date of purchase.

To access the dataset, we’ll use the Kaggle API, which provides a convenient way to download datasets and competition data directly from the Kaggle website. We’ll set our Kaggle credentials as environment variables using the os library. If you want to learn how to download a dataset using the Kaggle API, you can visit my other article.

import os
import pandas as pd

# set the credentials before importing kaggle: the package authenticates on import
os.environ['KAGGLE_USERNAME'] = 'user-name' # Enter user name
os.environ['KAGGLE_KEY'] = 'user-key' # Enter user key

import kaggle as kg

kg.api.authenticate() # authenticate with Kaggle API using kaggle library
# download the dataset and unzip it into a local folder
kg.api.dataset_download_files(dataset="vijayuv/onlineretail", path='online_retail', unzip=True)

df = pd.read_csv('online_retail/OnlineRetail.csv', encoding='ISO-8859-1')

df.head() # display first 5 rows of the dataframe
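
The kaggle package looks for credentials either in the KAGGLE_USERNAME and KAGGLE_KEY environment variables or in a kaggle.json file in its configuration folder (by default ~/.kaggle). If authentication fails, a quick check like the one below can help narrow down the cause. This is only a minimal sketch: kaggle_credentials_present is a hypothetical helper written for this article, not part of the kaggle library.

```python
import os
from pathlib import Path


def kaggle_credentials_present(env=None, config_dir=None):
    """Return True if Kaggle credentials appear to be configured.

    Hypothetical helper: checks the two environment variables used in this
    article, then falls back to a kaggle.json file in the config folder.
    """
    env = os.environ if env is None else env
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # assumption: the default config folder is ~/.kaggle
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    return (config_dir / "kaggle.json").is_file()


# check the current environment
print(kaggle_credentials_present())
```

The helper only confirms that credentials exist somewhere the library would look; it does not verify that they are valid.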

Let’s preprocess the data to ensure that only valid data points are included in the analysis. First, remove any rows with missing customer IDs. Next, remove any rows with non-positive quantities; in this dataset, negative quantities mostly correspond to cancelled orders and returns rather than purchases. Then, calculate the amount spent on each row by multiplying the quantity by the unit price, and convert the invoice date to a datetime object using the pd.to_datetime() method. For lifetime analysis we only need the TotalAmount, InvoiceDate and CustomerID columns, so we filter the dataframe to contain only these columns.

# preprocess data
df = df[df['CustomerID'].notna()] # remove rows with missing customer IDs
df = df[df['Quantity'] > 0].copy() # keep positive quantities only; copy to avoid SettingWithCopyWarning
df['TotalAmount'] = df['Quantity'] * df['UnitPrice'] # calculate the amount spent on each row
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate']) # convert invoice date to datetime
df = df[['CustomerID', 'InvoiceDate', 'TotalAmount']] # keep only the columns needed for the analysis

df.head()

Calculating Customer Lifetime Value with the BG/NBD Model

Lifetimes is a Python package for analyzing customer lifetime value (CLV) using a variety of models. It offers a simple interface for modeling customer transactions in terms of frequency, recency, and monetary value. Lifetimes includes popular models such as the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model and the Gamma-Gamma submodel, as well as alternatives such as the Pareto/NBD and modified BG/NBD models. In this article, we will use the BG/NBD model to predict future transactions and the Gamma-Gamma submodel to estimate their monetary value for customers in the Online Retail dataset.
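
To build some intuition for what the BG/NBD model assumes, here is a small simulation sketch using only the standard library. While a customer is active, purchases arrive at a random rate; after every purchase (including the first), the customer may become inactive with some probability. Purchase rates vary across customers following a Gamma distribution and dropout probabilities following a Beta distribution. The function name and the parameter values below are illustrative assumptions, not values fitted to the Online Retail data.

```python
import random


def simulate_customer(lam, p, T, rng):
    """Simulate one customer under BG/NBD-style assumptions.

    lam: purchase rate per day while the customer is active
    p:   probability of becoming inactive right after any purchase
    T:   observation window in days, starting at the first purchase
    Returns (frequency, recency): the number of repeat purchases and the
    day of the last repeat purchase (0 if there were none).
    """
    t = 0.0        # time of the first purchase
    frequency = 0  # repeat purchases observed so far
    recency = 0.0  # time of the most recent repeat purchase
    # after each purchase the customer stays active with probability 1 - p
    while rng.random() >= p:
        t += rng.expovariate(lam)  # exponential waiting time to the next purchase
        if t > T:
            break  # the next purchase falls outside the observation window
        frequency += 1
        recency = t
    return frequency, recency


rng = random.Random(42)
r, alpha, a, b = 2.0, 60.0, 0.8, 2.5  # illustrative values only

customers = []
for _ in range(1000):
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate ~ Gamma(shape r, rate alpha)
    p = rng.betavariate(a, b)               # dropout probability ~ Beta(a, b)
    customers.append(simulate_customer(lam, p, T=365, rng=rng))

one_time_buyers = sum(1 for frequency, _ in customers if frequency == 0)
print(f"{one_time_buyers} of {len(customers)} simulated customers never bought again")
```

Fitting the model works in the opposite direction: given the observed frequency and recency of each customer, it estimates the four population parameters (r, alpha, a and b).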

To get started with Lifetimes, simply install it via pip:

!pip install lifetimes

Alternatively via conda:

!conda install -c conda-forge lifetimes

In this final part, we will use the lifetimes package to predict customer lifetime value for the next month with the BG/NBD and Gamma-Gamma models. First, we transform the transaction data into per-customer summary data (frequency, recency, T and monetary value) using the summary_data_from_transaction_data function. We then fit the BG/NBD model to the purchase history and the Gamma-Gamma model to the monetary values. Next, we use these models to predict the number of purchases in the next 30 days and the customer lifetime value over one month. Finally, we estimate each customer's expected average transaction value using the conditional expected average profit.
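
Before calling the library, it may help to see what the summary columns mean. The sketch below rebuilds them by hand for a tiny, made-up set of transactions, following our understanding of the library's default conventions: purchases on the same day count as one, frequency counts repeat purchase days only, and monetary_value averages the repeat purchase days while leaving out the first. The summarize function and the toy data are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date

# toy transactions: (customer_id, purchase_date, amount); purely illustrative
transactions = [
    ("A", date(2011, 1, 1), 20.0),
    ("A", date(2011, 1, 1), 5.0),    # same day as the row above: one purchase day
    ("A", date(2011, 1, 11), 30.0),
    ("A", date(2011, 1, 21), 50.0),
    ("B", date(2011, 1, 5), 40.0),   # a single purchase, so no repeat transactions
]
observation_end = date(2011, 1, 31)


def summarize(transactions, observation_end):
    """Compute frequency, recency, T and monetary_value per customer (in days)."""
    # total spend per customer per day
    daily = defaultdict(lambda: defaultdict(float))
    for customer, day, amount in transactions:
        daily[customer][day] += amount

    summary = {}
    for customer, spend in daily.items():
        days = sorted(spend)
        first, last = days[0], days[-1]
        repeat_days = days[1:]  # every purchase day after the first
        if repeat_days:
            monetary_value = sum(spend[d] for d in repeat_days) / len(repeat_days)
        else:
            monetary_value = 0.0
        summary[customer] = {
            "frequency": len(repeat_days),         # number of repeat purchase days
            "recency": (last - first).days,        # age at the last purchase
            "T": (observation_end - first).days,   # age at the end of the observation period
            "monetary_value": monetary_value,      # average spend on repeat purchase days
        }
    return summary


for customer, row in summarize(transactions, observation_end).items():
    print(customer, row)
```

Customer B ends up with a frequency and monetary_value of 0, which is why filtering on monetary_value > 0 keeps only customers who came back at least once.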

from lifetimes import GammaGammaFitter
from lifetimes import BetaGeoFitter
from lifetimes.plotting import plot_frequency_recency_matrix
from lifetimes.utils import summary_data_from_transaction_data

# create summary data from transaction data
summary = summary_data_from_transaction_data(
    df,
    customer_id_col='CustomerID',
    datetime_col='InvoiceDate',
    monetary_value_col='TotalAmount',
    observation_period_end=df['InvoiceDate'].max(),
)

# drop customers without repeat purchases (their monetary_value is 0);
# the Gamma-Gamma model requires positive monetary values
summary = summary[summary["monetary_value"] > 0]

# fit the BG/NBD model
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(summary['frequency'], summary['recency'], summary['T'])

# fit the Gamma-Gamma submodel
ggf = GammaGammaFitter(penalizer_coef=0.0)
ggf.fit(summary['frequency'], summary['monetary_value'])


# predict the expected number of purchases in the next 30 days
summary['predicted_purchases'] = bgf.predict(30, summary['frequency'], summary['recency'], summary['T'])

# predict customer lifetime value over the next month
summary['predicted_clv'] = ggf.customer_lifetime_value(
    bgf,
    summary['frequency'],
    summary['recency'],
    summary['T'],
    summary['monetary_value'],
    time=1,  # prediction horizon in months
    freq='D',  # time unit of recency and T (days)
    discount_rate=0.01,  # monthly discount rate
)

# expected average value per transaction for each customer
summary["estimated_monetary_value"] = ggf.conditional_expected_average_profit(
    summary['frequency'],
    summary['monetary_value'],
)

summary.head()
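
To see how the three new columns relate to each other, consider how a discounted CLV can be assembled: for each future month, multiply the expected number of purchases in that month by the expected average value per transaction, then discount the result back to today. The sketch below is a simplified illustration of that idea as we understand it, not the library's actual code; discounted_clv and the numbers fed into it are hypothetical.

```python
def discounted_clv(purchases_per_month, avg_profit, monthly_discount_rate):
    """Sum the expected profit of each month, discounting month i by (1 + rate) ** i."""
    return sum(
        n * avg_profit / (1 + monthly_discount_rate) ** month
        for month, n in enumerate(purchases_per_month, start=1)
    )


# hypothetical customer: 0.8 expected purchases next month,
# with an expected average transaction value of 35.0
one_month = discounted_clv([0.8], avg_profit=35.0, monthly_discount_rate=0.01)

# the same customer over three months, with purchase expectations slowly decaying
three_months = discounted_clv([0.8, 0.7, 0.6], avg_profit=35.0, monthly_discount_rate=0.01)

print(round(one_month, 2), round(three_months, 2))
```

With a one-month horizon, as in this article, predicted_clv should therefore come out close to predicted_purchases multiplied by estimated_monetary_value and divided by 1.01.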

Customer Lifetime Value (CLV) is an important metric for businesses of all sizes, as it helps to estimate the total value a customer will bring to the company over their entire lifetime. In this article, we have explored how to calculate CLV using the BG/NBD model and the Gamma-Gamma submodel from the lifetimes package. We also learned how to create summary data from transaction data using the summary_data_from_transaction_data function.

By using these models and techniques, businesses can gain valuable insights into their customer base, such as identifying high-value customers and predicting future revenue streams. We hope that this article has provided you with a good starting point to begin exploring CLV analysis in your own business.
