In travel retailing, some customers are more equal than others (part 2)

--

In this article, we share how we predict Customer Lifetime Value (CLV) at OpenJaw and explain why this metric is so important for travel retailing. This is the second part of a three-part article on this topic.

By John Carney, Yuxiao Wang, Auren Ferguson, Beibei Flynn and Gavin Kiernan.

Introduction

In the first part of this article, we provided an introduction to the underlying concept of CLV and explained the practical challenges that arise when applying CLV to the travel domain. In this part of the article we describe in detail one of the two methods we use at OpenJaw for CLV prediction: the ‘Pareto/Negative Binomial Distribution (NBD)’ model.

Suitable methods for predicting CLV in travel

There are two primary quantitative methods suitable for predicting CLV in a non-contractual setting such as travel.

The first method uses non-linear stochastic models of buyer behaviour to link the popular RFM (Recency, Frequency, Monetary) paradigm with CLV. There are two key variants of this approach — the Beta-Geometric/Negative Binomial Distribution (BG/NBD) model first proposed in (2) and the Pareto/NBD model first proposed in (1). Generally speaking, the Pareto/NBD model is more accurate, but has a higher computational overhead than BG/NBD and is more complex to implement.

The second method uses machine learning methods such as Random Forest to predict CLV. An early example of such an approach applied in a non-contractual setting such as travel is described in (3). This approach normally aims to create models that combine the usual RFM features with exogenous features such as customer demographic information to improve predictive performance.

Neither method can claim to be the ‘best’ for CLV prediction in travel in all circumstances; it depends on how much data you have, its quality, and how much computing power you have available to parametrise or train the models.

Generally, at OpenJaw, our approach is to use Pareto/NBD when the history of RFM data is limited — parametric methods such as Pareto/NBD are less likely to overfit their training set, so tend to generalise better in this scenario. On the other hand, if we have access to a large history of RFM data as well as exogenous information such as demographic profiles for every customer, then we normally use the machine learning method with Random Forest.

Before deploying in a real-world scenario, we always confirm which method is best for a particular customer data-set using cross-validation or a similar model validation technique. We describe the details of this experimental process in part 3 of this article.

The Pareto/NBD model: How does it work?

The Pareto/NBD model is commonly known as a ‘Buy Till You Die’ (BTYD) model. At the heart of this method is a framework that models the flow of customer transactions over time in a non-contractual setting.

BTYD models jointly model two processes: a repeat purchase process that explains how frequently customers make purchases while they are still ‘alive’, and a dropout process that models how likely it is that a customer will churn in any given period.

The variant of Pareto/NBD that we use at OpenJaw is the version described in (2). This requires four pieces of information that are pertinent to customer purchasing behaviour:

  • Recency: The duration of time in days between a customer’s first booking and most recent booking;
  • Frequency: The total number of repeat bookings the customer has made. This is equivalent to the total number of bookings minus one;
  • Monetary: The total amount spent divided by the number of bookings made by a customer. Also called Average Purchase Value;
  • Time: The duration between a customer’s first booking and the end of the period of consideration.
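
These four quantities can be computed directly from a customer’s booking history. Below is a minimal Python sketch; the function name and the sample data are illustrative, not OpenJaw’s production code:

```python
from datetime import date

def rfm_summary(booking_dates, booking_values, period_end):
    """Compute Recency, Frequency, Monetary and Time (T) for one customer.

    booking_dates  : chronologically sorted list of booking dates
    booking_values : spend per booking, in the same order
    period_end     : end of the observation period
    """
    recency = (booking_dates[-1] - booking_dates[0]).days    # first to most recent booking
    frequency = len(booking_dates) - 1                       # repeat bookings only
    monetary = sum(booking_values) / len(booking_values)     # average purchase value
    T = (period_end - booking_dates[0]).days                 # first booking to period end
    return recency, frequency, monetary, T

# Illustrative customer with three bookings during 2019
dates = [date(2019, 1, 10), date(2019, 4, 2), date(2019, 9, 15)]
values = [120.0, 80.0, 100.0]
print(rfm_summary(dates, values, date(2019, 12, 31)))   # → (248, 2, 100.0, 355)
```

Note that Frequency counts only repeat bookings, so a customer with a single booking has a frequency of zero.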

The model works by making assumptions regarding the following:

  • The number of purchases by any individual customer is Poisson distributed, with latent parameter 𝜆;
  • True customer lifetime is not observed but follows an Exponential distribution with latent parameter 𝜇;
  • Purchase value is Gamma distributed with scale parameter 𝜈;
  • The number of purchases and purchase value are mutually independent and customers are independent of one another.

Statistical distributions assumed for customer purchasing behaviour in the Pareto/NBD Gamma-Gamma model for predicting CLV.

Building upon the above assumptions, the model makes the following ‘prior’ assumptions:

  • True 𝜆 is unobserved and Gamma distributed;
  • True 𝜇 is unobserved and follows a Gamma distribution;
  • 𝜆 and 𝜇 vary independently across customers;
  • The scale parameter of purchase value, 𝜈, is itself Gamma distributed.
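
These two layers of assumptions can be made concrete with a small Monte Carlo sketch: draw each customer’s latent 𝜆 and 𝜇 from Gamma priors, then simulate purchases at rate 𝜆 over an Exponentially distributed lifetime. The hyperparameter values below are arbitrary, chosen only for illustration:

```python
import random

random.seed(42)

# Arbitrary illustrative hyperparameters for the Gamma priors (rates per week)
R, ALPHA = 2.0, 10.0   # purchase-rate prior: lam ~ Gamma(shape=R, scale=1/ALPHA)
S, BETA = 1.0, 20.0    # dropout-rate prior:  mu  ~ Gamma(shape=S, scale=1/BETA)

def simulate_customer(horizon_weeks=52):
    lam = random.gammavariate(R, 1.0 / ALPHA)   # individual purchase rate
    mu = random.gammavariate(S, 1.0 / BETA)     # individual dropout rate
    lifetime = random.expovariate(mu)           # unobserved time until 'death'
    active = min(lifetime, horizon_weeks)       # purchases happen only while alive
    # Poisson purchase count via exponential inter-purchase gaps
    purchases, t = 0, random.expovariate(lam)
    while t <= active:
        purchases += 1
        t += random.expovariate(lam)
    return purchases

counts = [simulate_customer() for _ in range(10_000)]
print(sum(counts) / len(counts))   # mean purchase count across the simulated cohort
```

Marginalising the Poisson purchase count over the Gamma-distributed 𝜆 is exactly what produces the Negative Binomial purchase-count distribution in the posterior results below.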

After working through the equations, which are detailed in (4), the following ‘posterior’ distributions apply:

  • Customer lifetime follows a Pareto distribution;
  • The number of purchases by a customer follows a Negative Binomial Distribution (NBD);
  • Purchase value is ‘Gamma-Gamma’ distributed.

This leads to the ‘Pareto/NBD Gamma-Gamma’ model for predicting CLV.

Using the Pareto/NBD Gamma-Gamma model to predict CLV

At OpenJaw, the first step in predicting CLV is to predict what we call Customer Lifetime Spend (CLS). Think of this as CLV, but without margin factored into the calculation.

We do this because, as described in part 1 of this article, in travel retailing profit margin is highly variable at the customer level, even for similar products. To handle this complexity pragmatically, our software enables end-users to apply margin to a CLS calculation after a customer segment and product have been selected.

We define the Predicted CLS (PCLS) as the product of Predicted Average Purchase Value (PAPV) and Predicted Number of Purchases (PNP):

PCLS = PAPV x PNP

Once margin for a specific customer segment and product is available, it is applied to calculate Predicted CLV (PCLV):

PCLV = PAPV x PNP x Margin
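
In code, both predictions are a single multiplication; the figures below are invented purely for illustration:

```python
def predicted_cls(papv, pnp):
    """Predicted Customer Lifetime Spend: average purchase value x expected purchases."""
    return papv * pnp

def predicted_clv(papv, pnp, margin):
    """Predicted CLV: CLS scaled by the margin chosen for a segment and product."""
    return predicted_cls(papv, pnp) * margin

# e.g. EUR 150 average purchase value, 4 predicted future purchases, 12% margin
print(predicted_cls(150.0, 4))        # → 600.0
print(predicted_clv(150.0, 4, 0.12))
```

Keeping the margin as a separate, late-applied factor is what lets end-users re-cost the same PCLS for different segment/product combinations.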

Illustrative example

To illustrate how the Pareto/NBD Gamma-Gamma model works with real customer data, we will now describe how the model handles three typical purchase journeys for a low-cost airline that is a customer of OpenJaw.

Note that our goal here is to give readers an intuitive ‘feel’ for how our model works. We do this by showing how the ‘probability of being alive’ (P_alive) evolves over time for three typical customer purchase journeys: a frequent customer, a repeat customer and a churned customer. Note that P_alive is essentially the probability that a customer remains active, so it is a key component of the PNP variable described above.

The underlying RFM data was extracted from multiple sources of Passenger Name Record (PNR) data. This represents the vast majority of transactional history (over 90%) for that airline across direct and indirect sales channels spanning 3 years.

Extracting the RFM data from the PNR data was non-trivial because the personal data used to identify individual customers was inconsistently recorded in the PNRs, which is normal for this type of data. For example, some PNRs had email addresses, others just had national IDs, most had phone numbers but with inconsistently applied prefixes, and so on.

To tackle this problem, OpenJaw’s identity resolution algorithm was used to create a single customer view which produces a unique identifier for every customer, together with a validated set of transactions. This was then used to accurately calculate Recency, Frequency and Monetary (spend) for every customer. Full details of this process and algorithm are published in another article in the OpenJaw Data Science Blog.
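
The full identity resolution algorithm is described in the separate article mentioned above. As a simplified illustration of the kind of normalisation such an algorithm relies on, the sketch below reduces inconsistent contact details to canonical match keys; the field names and normalisation rules here are hypothetical, not OpenJaw’s actual logic:

```python
import re

def match_keys(pnr):
    """Derive canonical match keys from a PNR-like record (simplified illustration)."""
    keys = set()
    email = (pnr.get("email") or "").strip().lower()
    if email:
        keys.add(("email", email))
    phone = re.sub(r"\D", "", pnr.get("phone") or "")   # keep digits only
    if phone:
        keys.add(("phone", phone[-9:]))                 # ignore inconsistent prefixes
    nid = (pnr.get("national_id") or "").strip().upper()
    if nid:
        keys.add(("nid", nid))
    return keys

# Two records for the same traveller, written differently in two PNRs
a = {"email": "Jane.Doe@example.com", "phone": "+353 87 123 4567"}
b = {"email": "jane.doe@example.com ", "phone": "087-1234567"}
print(match_keys(a) & match_keys(b))   # shared keys link the two PNRs
```

Records that share at least one canonical key can then be clustered into a single customer view, from which the RFM quantities are calculated.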

Scenario 1: Frequent customer

As illustrated in the graph below, a frequent customer maintains a high probability of repeat purchase as long as their frequent pattern of purchasing is maintained. As soon as this pattern is disrupted, P_alive decays rapidly, because the customer is falling outside the purchase frequency pattern captured by the model.

The evolution of ‘probability of alive’ for a frequent airline customer.

Scenario 2: Repeat customer

In contrast to a frequent customer, a repeat customer who purchases infrequently will have a lower decay rate. This is illustrated below: the customer has only two purchases, eight months apart, so the decay rate of P_alive is lower. This is intuitive: you would expect a consistently frequent customer to purchase again within a similar timeframe, while for customers who rarely purchase you would not have the same expectation.

The evolution of ‘probability of alive’ for a repeat airline customer.

Scenario 3: Churned customer

In the scenario illustrated below, the customer purchases frequently but then stops. After about one year, P_alive has dropped below 50%, so the customer has effectively churned. However, after two years the customer purchases again, so the model reactivates and the cycle starts again.

The evolution of ‘probability of alive’ for a churned airline customer.
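
For a single customer whose latent rates 𝜆 and 𝜇 were known exactly, a standard Pareto/NBD result gives the probability of still being alive at time T, given that the most recent purchase was at time t_x, as P_alive = 1 / (1 + (𝜇/(𝜆+𝜇)) · (e^((𝜆+𝜇)(T − t_x)) − 1)). The sketch below uses this individual-level formula, with invented rates, to reproduce the qualitative behaviour of the three scenarios; the full model instead integrates over the Gamma priors on 𝜆 and 𝜇:

```python
import math

def p_alive(lam, mu, t_x, T):
    """P(customer still alive at T), individual-level Pareto/NBD result,
    given purchase rate lam, dropout rate mu and most recent purchase at t_x."""
    return 1.0 / (1.0 + mu / (lam + mu) * (math.exp((lam + mu) * (T - t_x)) - 1.0))

# Rates per month; all values invented for illustration
frequent = p_alive(lam=2.0, mu=0.1, t_x=11.0, T=12.0)  # monthly buyer, 1 month silent
churned  = p_alive(lam=2.0, mu=0.1, t_x=0.0,  T=12.0)  # frequent buyer, silent a year
repeat   = p_alive(lam=0.2, mu=0.1, t_x=4.0,  T=12.0)  # rare buyer, 8 months silent

print(round(frequent, 3), round(churned, 3), round(repeat, 3))
```

As in the graphs above, a short silence barely dents a frequent customer’s P_alive, the same silence stretched to a year drives it towards zero, and an infrequent customer’s P_alive decays far more slowly for the same gap.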

In part 3 of this article…

In the next part of this article we will describe the Random Forest machine learning method we use to predict CLV and provide some empirical analysis that compares its predictive performance relative to the Pareto/NBD Gamma-Gamma model described above.

References

(1) Schmittlein, David C.; Peterson, Robert A. (1994). “Customer Base Analysis: An Industrial Purchase Process Application”, Marketing Science, 13 (Winter), 41–67.

(2) Fader, Peter S.; Hardie, Bruce G.S.; Lee, Ka Lok (2005). “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model”, Marketing Science, 24 (Spring), 275–284.

(3) Vanderveld, Ali; Pandey, Addhyan; Han, Angela; Parekh, Rajesh (2016). “An Engagement-Based Customer Lifetime Value System for E-commerce”, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

(4) Fader, Peter S.; Hardie, Bruce G.S.; Lee, Ka Lok (2005). “RFM and CLV: Using Iso-Value Curves for Customer Base Analysis”, Journal of Marketing Research, 42 (November), 415–430.

--

The OpenJaw Data Science Team
The OpenJaw Data Science Blog

The data science team at OpenJaw share their approach, opinions and methodologies for data science in travel.