Predicting Customer Lifetime Value in E-commerce

Ferdi Ghozali
Blibli.com Tech Blog
5 min read · Oct 6, 2020

The Pareto principle states that, for many events, roughly 80% of the effects come from 20% of the causes. For e-commerce, 80% of our sales come from 20% of our customers. What if we could identify which of our customers make up that 20%, not just historically, but in the future as well? Predicting customer lifetime value (CLV) is a way to identify those customers.

The goals of this article are:

  • Explain the concepts of CLV modeling.
  • List out what features and approaches can be used for CLV modeling.
  • Compare three approaches to do CLV modeling.

What is customer lifetime value?

The customer lifetime value (CLV) is an estimate of all the future profits from a relationship with a given customer. If we can predict CLV for each customer, then we know which customers we should prioritize.

When we’re predicting future lifetime value, there are two distinct problems that require different data and modeling strategies:

  • Predict the future value for existing customers who have a known transaction history.
  • Predict the future value for new customers who just made their first purchase.

This series is focused on the first problem. There are three different approaches to this problem: naive method, probabilistic models, and machine learning (ML) models.

By predicting customer lifetime value, we can prioritize our next actions accordingly.

(source: https://medium.com/swlh/5-simple-ways-to-calculate-customer-lifetime-value-5f49b1a12723)

Data preparation

The dataset features that we used are listed in the following table.

No matter which approach we use, we must perform a set of data cleaning and pre-processing steps that are common to all models. The following operations are required to get a set of workable fields and records:

  1. Group the orders by day instead of using Order Id, because the minimum time unit used by the probabilistic models in this solution is a day.
  2. Keep only the fields that are useful for probabilistic models.
  3. Keep only records that count as purchases or returns.
  4. Keep only records with a customer ID.
  5. Keep only customers who bought something in the past 1 year.
  6. Keep only customers who bought at least twice in the period that’s being used to create features.
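The cleaning steps above can be sketched with pandas. The column names and the sample data here are hypothetical; the real Blibli schema will differ.

```python
import pandas as pd

# Hypothetical raw order data; real column names and values will differ.
orders = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "C", None],
    "order_date": pd.to_datetime(
        ["2018-01-05", "2018-01-05", "2018-06-10",
         "2018-03-01", "2018-02-02", "2018-04-04"]),
    "order_value": [100.0, 50.0, 80.0, 40.0, 60.0, 30.0],
    "order_type": ["purchase", "purchase", "purchase",
                   "purchase", "cancellation", "purchase"],
})

# Steps 3-4: keep only purchases/returns, and only records with a customer ID.
orders = orders[orders["order_type"].isin(["purchase", "return"])]
orders = orders.dropna(subset=["customer_id"])

# Step 1: group orders by customer and day (a day is the minimum time unit).
daily = (orders.groupby(["customer_id", "order_date"], as_index=False)
               .agg(order_value=("order_value", "sum")))

# Step 5: keep only customers who bought something in the past year.
cutoff = pd.Timestamp("2018-12-31") - pd.DateOffset(years=1)
active = daily.loc[daily["order_date"] >= cutoff, "customer_id"].unique()

# Step 6: keep only customers with at least two purchase days.
counts = daily.groupby("customer_id").size()
repeat = counts[counts >= 2].index
daily = daily[daily["customer_id"].isin(repeat) &
              daily["customer_id"].isin(active)]
```

Grouping by day first matters: two orders placed on the same day collapse into one observation, which is what the probabilistic models below expect.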

Training and Testing Schema

We used one year of historical data to predict the next year’s value. For this experiment, we used order data from 2018 to predict customers’ CLV in 2019.

(Source: https://cloud.google.com/solutions/machine-learning/clv-prediction-with-offline-training-intro)

For training purposes, we split our customers into three parts: the training part consists of 70% of customers, the validation part of 10%, and the testing part of 20%.
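Note that the split is done over customers, not over individual orders, so all of one customer's history lands in a single part. A minimal sketch with NumPy (customer IDs here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic customer IDs; in practice these come from the cleaned order data.
customers = np.array([f"cust_{i}" for i in range(1000)])
shuffled = rng.permutation(customers)

# 70% training, 10% validation, 20% testing.
n = len(shuffled)
train = shuffled[: int(0.7 * n)]
val = shuffled[int(0.7 * n): int(0.8 * n)]
test = shuffled[int(0.8 * n):]
```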

Naive Method

The first method naively assumes that the rate of purchases established by a customer during the training interval stays constant through the target interval. So if a customer bought 6 times over 40 days, the assumption is that they would buy 9 times over 60 days (60/40 * 6 = 9). Multiplying the count multiplier, the order count, and the average order value (AOV) for each customer gives a naively predicted target value for that customer. The formula used by this method is:

https://cloud.google.com/solutions/machine-learning/clv-prediction-with-offline-training-train

Result = AOV * Order Count * Count Multiplier​
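The formula above is a one-liner in code. The count multiplier is the ratio of the target interval length to the training interval length; the function name here is our own.

```python
def naive_clv(order_count, avg_order_value, train_days, target_days):
    """Assume the purchase rate from the training window persists
    unchanged through the target window."""
    count_multiplier = target_days / train_days
    return avg_order_value * order_count * count_multiplier

# The article's example: 6 orders over 40 days projects to 9 over 60 days.
projected_orders = 60 / 40 * 6  # -> 9.0
```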

Probabilistic Model

For the probabilistic method, we used a Python library called Lifetimes, which supports several models, including the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model. This model is well established in the CLV research literature.

A BG/NBD model is defined using the following parameters:

https://cloud.google.com/solutions/machine-learning/clv-prediction-with-offline-training-train

Machine Learning Model

For the machine learning method, we used a Python library called LightGBM. This model is well known because it can process large datasets faster than tree-based XGBoost without using a GPU.

A LightGBM model is defined using the following parameters:

Results of comparing models with Blibli data

The following table shows the MAE values for each model, as trained on the sample dataset. All models are trained on Blibli data. MAE values vary slightly between runs due to random parameter initialization. The LightGBM model makes use of additional features such as average basket value and count of returns.
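MAE, the comparison metric here, is simply the average absolute gap between actual and predicted values. A quick sketch with illustrative numbers (not Blibli results):

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted CLV."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

# Illustrative actual vs. predicted CLV values for three customers.
actual = [100.0, 80.0, 200.0]
predicted = [130.0, 60.0, 190.0]
mae = mean_absolute_error(actual, predicted)  # (30 + 20 + 10) / 3 = 20.0
```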

The results show that, on this dataset, the LightGBM model outperforms the probabilistic and naive models when predicting the monetary value. Note that all models were trained using the same original data (customer ID, order date, and order value).

The intent here was to compare the models on the same input features. One advantage of the ML model is that we might improve our results by adding more features than the ones used in this example. With LightGBM, we could take advantage of data from sources such as customer age, clickstream events, and user profiles.
