Analyzing Customer Lifetime Value (CLV) and Churn with Python.

Lucas Nogueira
10 min read · Feb 1, 2023


Photo by Pixabay: https://www.pexels.com/pt-br/foto/laptop-computer-gray-and-black-265087/

Hello! In this article, we'll talk about CLV (Customer Lifetime Value) and Churn, and use the Python lifetimes library to make some predictions related to these measures. CLV is an extremely important sales metric, because with it we can predict the revenue a company will be able to obtain from a customer over a given period of time. For that, however, the buying relationship between the customer and the company must remain active. In contractual relationships, when a customer subscribes to a service from a company, for example, it is easy to measure Churn (abandonment, in this context), since we only need to look at the customers who canceled the service. In non-contractual relationships, however, customers may spend a period buying products from an establishment and then go a long time without returning, since there is no contract obliging them to keep buying. The company does not know whether the customer will buy again, which makes it hard to tell whether the customer has abandoned the company or whether this is just a seasonal characteristic of their purchases. Purchases can follow a continuous pattern, occurring at any time, or a discrete one, as in the case of monthly purchases.

OK! But how can we calculate a customer's CLV when there is no signed purchase agreement and we don't even know whether they will shop again? By making predictions with predictive algorithms.

There are a few ways we can forecast CLV and Churn, but here we’ll use the lifetimes package.

Import of Packages

First, we need to import the packages we will use in this analysis. Below are the ones we will need:

import pandas as pd
import numpy as np
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data
from lifetimes.plotting import plot_period_transactions
from lifetimes.datasets import load_dataset
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.metrics import mean_squared_error, confusion_matrix, mean_absolute_error, f1_score, r2_score
from math import sqrt
from datetime import date, timedelta
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

Data Set Loading and Data Preparation

The chosen dataset is the CDNow sample from CDNow, Inc., a company that sold CDs and music-related products over the web; it ships with the lifetimes package. It has 6,919 records from 2,357 consumers, dated from January 1, 1997 to June 30, 1998, totaling 78 weeks.

df = load_dataset(filename='CDNOW_sample.txt',
                  #header=None,
                  delim_whitespace=True,
                  names=['client_id', 'client_index', 'date', 'quantity', 'price'],
                  converters={'date': lambda x: pd.to_datetime(x, format="%Y%m%d")}  # Conversion of the date field to datetime
                  )
# Creating a total value variable
df["value"] = df["price"] * df["quantity"]
# We select only the columns we will use.
df = df[['client_id', 'date', 'value']]
# Descriptive statistics
df.describe()
# Number of null values per field in the dataset
df.isnull().sum()

BG/NBD for Predicting Number of Transactions and Churn

BG/NBD is a probabilistic model that evaluates the relationship between the values of a given variable under study and their probability of future occurrence. With it, we will predict the number of purchases a given consumer will make, based on the frequency and recency of past purchases, as well as the probability that the customer will become (or already is) a Churn. (The Churn prediction could also be done with Pareto/NBD, but due to computational performance issues and a bug that sometimes appears in that algorithm (NotImplementedError: Cannot apply ufunc <ufunc ‘hyp2f1’> to mixed DataFrame and Series inputs.), the Churn prediction here was made with BG/NBD itself.) For this, we will need to build the RFM matrix, which gives us the metrics the algorithm needs.
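
For reference, the Pareto/NBD alternative mentioned above is fitted in much the same way. Below is a minimal sketch (to be run after the RFM matrix is built in the next section, and assuming the numerical issue does not appear with your data):

from lifetimes import ParetoNBDFitter

# Pareto/NBD uses the same RFM columns as BG/NBD.
pnbd = ParetoNBDFitter(penalizer_coef=0.0)
pnbd.fit(rfm['frequency'], rfm['recency'], rfm['T'])

# Probability that each customer is still "alive" (1 minus the probability of Churn).
prob_alive = pnbd.conditional_probability_alive(rfm['frequency'], rfm['recency'], rfm['T'])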

Construction of the RFM Matrix

rfm = summary_data_from_transaction_data(df, 'client_id', 'date', 'value')
rfm = rfm.dropna()  # If there were any null values, they would be removed.

In short, RFM stands for recency, frequency and monetary value; the matrix also includes the tenure (T) when the summary_data_from_transaction_data method of the lifetimes package is used to build it. Recency is the time between the first and last transaction, showing us how long the customer was (or has been) active. (Some define recency as the time between the last purchase and the current day, but since the BG/NBD algorithm also takes the tenure (T) as a parameter, and the difference between tenure and recency is exactly the number of days from a given customer's last purchase to the last purchase in the dataset, I assume that is the reason for the different definition; as a bonus, we gain one more variable in the RFM matrix to enrich our predictive analyses.) Tenure (T) is the difference between a given customer's first transaction and the last transaction in the data set (normally the current day, if purchases happen every day regardless of who makes them), and it measures how long a consumer has been buying from the company. Frequency is the number of transaction periods per consumer: if the frequency is daily, a day with transactions counts as 1, regardless of how many purchases were made on that day, so 10 transactions made by a consumer across two days count as 2 in the frequency field. Finally, the monetary value is the average purchase value per consumer per period.

Note: In the summary_data_from_transaction_data method, the first transaction is, by default, not included when calculating the frequency and the monetary value. To include it, just set the parameter include_first_transaction to True. To use the data with the fitters from the lifetimes package, however, this parameter must remain False, which is exactly what we need. Another point: in our example it would be impracticable to consider recency as the difference between a given customer's last purchase and today's date, as we would not be able to assess the probability that the customer was a Churn while CDNow was still active; on the contrary, we would have to consider everyone a Churn because of the time elapsed between those transactions and today.
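
To make these definitions concrete, here is a small, self-contained example (with made-up transactions, not taken from the CDNow data) showing what summary_data_from_transaction_data returns for a single customer:

import pandas as pd
from lifetimes.utils import summary_data_from_transaction_data

# Toy data: one customer with purchases on three different days.
toy = pd.DataFrame({
    'client_id': [1, 1, 1],
    'date': pd.to_datetime(['2023-01-01', '2023-01-10', '2023-01-30']),
    'value': [10.0, 20.0, 30.0],
})
toy_rfm = summary_data_from_transaction_data(toy, 'client_id', 'date', 'value',
                                             observation_period_end='2023-02-28')
print(toy_rfm)
# frequency      = 2    (repeat purchase days; the first purchase is not counted)
# recency        = 29   (days between the first and the last purchase)
# T              = 58   (days between the first purchase and the end of the observation period)
# monetary_value = 25.0 (average value of the repeat purchases: (20 + 30) / 2)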

Creating the Model

With the RFM matrix created, we can continue the code to predict the number of purchases consumers will make and the probability that they will become Churn.

For the creation of our model, it makes sense to remove frequencies and monetary values equal to zero: a zero frequency indicates that the customer made only one transaction (there was no repetition), and a monetary value equal to zero may indicate insertion errors in the database.

# Removal of zero frequencies and zero monetary values
rfm = rfm[(rfm['frequency'] > 0) & (rfm['monetary_value'] > 0)]
# Creation and training of the BG/NBD model.
bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(rfm['frequency'], rfm['recency'], rfm['T'])
# Forecast of the number of purchases for the next 90 days.
t = 90  # Forecast horizon in days
rfm['predicao_' + str(t) + '_dias'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, rfm['frequency'], rfm['recency'], rfm['T'])
# Creating the Retention and Churn columns
rfm['Retention'] = bgf.conditional_probability_alive(rfm['frequency'], rfm['recency'], rfm['T'])
rfm['Churn'] = 1 - bgf.conditional_probability_alive(rfm['frequency'], rfm['recency'], rfm['T'])

An important observation: the Churn value is given on a scale from 0 to 1 and can be converted to a percentage. The closer it is to 1, the more likely the customer is to Churn.
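
As an illustration, assuming a purely hypothetical cut-off of 0.8, we could list the customers most at risk like this:

# Customers whose estimated probability of Churn exceeds a hypothetical 0.8 threshold,
# sorted so that the highest-risk customers come first.
high_risk = rfm[rfm['Churn'] > 0.8].sort_values('Churn', ascending=False)
print(high_risk[['frequency', 'recency', 'T', 'Churn']].head())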

Finally, using the Gamma-Gamma model, we can predict the customer's CLV. Remember that the Gamma-Gamma model requires that there be no (or very low) correlation between frequency and monetary value.

# Analysis of the correlation between the frequency and monetary value fields, which needs to be low
rfm[['frequency', 'monetary_value']].corr()
# Creation and training of the Gamma-Gamma model.
ggf = GammaGammaFitter(penalizer_coef=0.0)
ggf.fit(rfm['frequency'], rfm['monetary_value'])
preds_total = ggf.customer_lifetime_value(
    bgf,
    rfm['frequency'],
    rfm['recency'],
    rfm['T'],
    rfm['monetary_value'],
    time=3,  # CLV forecast for the next 3 months.
    freq='D',
    discount_rate=0.01  # Discount rate of the company's cash flow. This information can be obtained from the company you provide services to.
)
rfm['clv'] = preds_total
# Customer segmentation into 5 groups with an equal number of customers
rfm['segmentacao_clv'] = pd.qcut(rfm['clv'], q=5, labels=['Very Low', 'Low', 'Medium', 'High', 'Top'])

Model Validation

Because the data volume is small, it will not be necessary to partition the dataset to compare calibration and holdout data. This could be done using the calibration_and_holdout_data method, but that's a topic for another post.
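
For completeness, a minimal sketch of that approach would look something like the snippet below (the calibration cut-off date is an arbitrary choice for illustration):

from lifetimes.utils import calibration_and_holdout_data

# Split the transactions into a calibration period and a holdout period.
rfm_cal_holdout = calibration_and_holdout_data(df, 'client_id', 'date',
                                               calibration_period_end='1997-12-31',
                                               observation_period_end='1998-06-30',
                                               monetary_value_col='value')
# The model would then be fitted on the *_cal columns and evaluated against frequency_holdout.
print(rfm_cal_holdout.head())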

Coming back to model validation, we can compare the actual data with the predicted data. But first, we need to create the field containing the prediction over the entire observation period, as follows:

# First, we define the model's predictions.
# The actual number of transactions in the observed period needs to be increased by 1.
actual_values = rfm["frequency"]  # + 1
# Observation period of the dataset, in days
time = (df.date.max() - df.date.min()).days
# Predict the number of transactions over the observed time of the dataset.
predicted_values = bgf.predict(t=time, frequency=rfm['frequency'], recency=rfm['recency'], T=rfm['T'])
actual_values = actual_values.fillna(0)
predicted_values = predicted_values.fillna(0)
RMSE = mean_squared_error(y_true=actual_values, y_pred=predicted_values, squared=False)
print(RMSE)

Below is a function, which I have also seen in other posts, that serves to evaluate the quality of the model.

def validate_clv(current, predicted, bins):
    print(f"Mean absolute error: {mean_absolute_error(current, predicted)}")
    # Evaluate the numeric predictions
    plt.figure(figsize=(10, 7))
    plt.scatter(predicted, current)
    plt.xlabel('Predicted')
    plt.ylabel('Current')
    plt.title('Predicted vs Current')
    plt.show()

    # Evaluate the predictions grouped into bins
    est = KBinsDiscretizer(n_bins=bins, encode='ordinal', strategy='kmeans')
    est.fit(np.array(current).reshape(-1, 1))
    bin_actual = est.transform(np.array(current).reshape(-1, 1)).ravel()
    bin_predicted = est.transform(np.array(predicted).reshape(-1, 1)).ravel()

    cm = confusion_matrix(bin_actual, bin_predicted, normalize='true')
    df_cm = pd.DataFrame(cm, index=range(1, bins + 1),
                         columns=range(1, bins + 1))
    plt.figure(figsize=(20, 10))
    sns.heatmap(df_cm, annot=True)
    # View
    b, t = plt.ylim()  # find out the values for the bottom and top
    b += 0.5           # Add 0.5 to the bottom
    t -= 0.5           # Subtract 0.5 from the top
    plt.ylim(b, t)     # update the ylim(bottom, top) values
    plt.show()
    print(f'F1 score: {f1_score(bin_actual, bin_predicted, average="macro")}')
    print('Sample in each bin: \n')
    print(pd.Series(bin_actual).sort_values().value_counts())

validate_clv(actual_values, predicted_values, bins=7)
Mean absolute error: 1.5997010769132896
F1 score: 0.3258443374477299
Sample in each bin:

0.0    856
1.0    195
2.0     61
3.0     16
4.0      4
6.0      4
5.0      3
dtype: int64

Here is another way to compare the actual and predicted frequencies and check how well the predictions match the observed data. Note that we can manipulate plot parameters such as the title, the X and Y labels and also the legend, which is not obvious at first sight.

plot_period_transactions(bgf, max_frequency=10, title='Repeating Transaction Frequency',
                         xlabel='Number of transactions in calibration period',
                         ylabel='Customers', label=["Current", "Predicted"])

We can also predict the model's average transaction value and compare it with the actual average value in the data set, since multiplying this value by the predicted number of purchases is what forms the CLV. This gives us another way to measure the model's performance.

predicao_valor_monetario = ggf.conditional_expected_average_profit(
    rfm['frequency'],
    rfm['monetary_value']
)

The closer the two distributions are in the graph below, the better the model predicts.

fig, ax = plt.subplots(figsize=(10,5))
sns.distplot(predicao_valor_monetario,ax=ax)
sns.distplot(rfm['monetary_value'],ax=ax)

Ideas for Applying the Model

After the model is created, tested and validated, we can use the information it generates in decision making. Marketing campaigns, for example, can be designed according to the CLV value or segment: where the CLV is higher, or the CLV segment is at the top ('Top'), the marketing messages can carry stronger purchase incentives; and customers who are not Churn yet, but are highly likely to become one, can be encouraged to keep buying through promotional campaigns, as illustrated below. There are several strategies that can be applied. And how do we test their efficiency in practice after applying these model-driven ideas? Well, that is for a future post.
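
As a simple illustration of these ideas (the 0.7 Churn threshold below is an arbitrary choice for the example):

# Hypothetical campaign lists built from the model's outputs.
top_clv_customers = rfm[rfm['segmentacao_clv'] == 'Top']  # strongest purchase incentives
at_risk_customers = rfm[rfm['Churn'] > 0.7]               # retention / promotional campaigns
print(f"Top CLV customers: {len(top_clv_customers)}")
print(f"At-risk customers: {len(at_risk_customers)}")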

Final Considerations

There's more we could include in this analysis, but for now, that's it. I hope you enjoyed it. Don't forget to comment and share so that more people can apply knowledge like this in their analyses. The complete code is on my GitHub, at this link: clv python code. Hug!


Lucas Nogueira

I'm Lucas, a data scientist with a keen interest in marketing, finance, and personal development. My goal is to assist others in making data-driven decisions.