In travel retailing, some customers are more equal than others (part 3)

--

In this article, we share how we predict Customer Lifetime Value (CLV) at OpenJaw and explain why this metric is so important for travel retailing. This is the third and final part of a three-part article on this topic.

By John Carney, Yuxiao Wang, Auren Ferguson, Beibei Flynn and Gavin Kiernan.

Introduction

In part 2 of this article, we described in detail how the Pareto/NBD model can be used for CLV prediction in travel retailing. In this, the third and final part of the article, we describe how Random Forest machine learning can also be used successfully for this purpose. We also compare the predictive performance of both methods and provide some guidance on which method to use in different circumstances.

Using Random Forest regression to predict CLV

As described in detail in part 2 of this article, the first step in predicting CLV is to predict what we call Customer Lifetime Spend (CLS). We define CLS as the product of Predicted Average Purchase Value (PAPV) and Predicted Number of Purchases (PNP):

CLS = PAPV x PNP

Once margin for a specific customer segment and product is available, it is applied to calculate CLV:

CLV = CLS x Margin

Our approach in this article is to build two separate Random Forest regression models to estimate PAPV and PNP. An alternative approach is to build a single Random Forest model that predicts CLS directly. However, doing so would make it harder to see where the Random Forest models gain or lose performance relative to the traditional models, so for the experiments in this article we model PAPV and PNP separately.
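
As a rough sketch of this two-model setup, the snippet below trains two scikit-learn (3) Random Forest regressors and combines their predictions. All variable names (X_train, y_pnp, margin and so on) are illustrative placeholders, not taken from our production code.

```python
from sklearn.ensemble import RandomForestRegressor

# Placeholder inputs (illustrative names):
#   X_train        - per-customer feature matrix for the training window
#   y_pnp          - purchases per customer in the target window
#   purchaser_rows - boolean mask of customers who purchased in the target window
#   y_papv         - average purchase value for those purchasers
#   X_score        - features for the customers we want to score
#   margin         - margin for the relevant segment/product

pnp_model = RandomForestRegressor(n_estimators=500, random_state=42)
pnp_model.fit(X_train, y_pnp)

# The PAPV model is trained on purchasers only (see 'Training process' below).
papv_model = RandomForestRegressor(n_estimators=500, random_state=42)
papv_model.fit(X_train[purchaser_rows], y_papv)

# CLS = PAPV x PNP, then CLV = CLS x Margin
cls_pred = papv_model.predict(X_score) * pnp_model.predict(X_score)
clv_pred = cls_pred * margin
```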

Training set

The training set that we use for the experiments described below is constructed from 270,000 flight transactions or Passenger Name Records (PNRs). We process these transactions with our Identity Resolution algorithm, described in a separate article. This creates a single customer view by assigning each transaction a unique customer identifier. Transactions are then rolled up to create features for each customer.
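
As a minimal illustration of this roll-up step, assume the resolved transactions sit in a pandas DataFrame with hypothetical column names (customer_id, pnr_id, amount, booking_date, n_adults):

```python
import pandas as pd

# txns: one row per PNR transaction, with the customer_id assigned
# by Identity Resolution (all column names are illustrative).
txns["booking_date"] = pd.to_datetime(txns["booking_date"])

features = txns.groupby("customer_id").agg(
    frequency=("pnr_id", "count"),           # number of bookings
    avg_spend=("amount", "mean"),            # average spend per booking
    last_purchase=("booking_date", "max"),
    one_adult_bookings=("n_adults", lambda s: (s == 1).sum()),
)

# Recency relative to the end of the observation window.
observation_end = txns["booking_date"].max()
features["recency_days"] = (observation_end - features["last_purchase"]).dt.days
```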

Training process

To train the Random Forest model, we start by picking a prediction window: how far into the future we want the model to predict. For the experiments summarised below, we chose T = 1 year.

As illustrated in Figure 1 below, the full data set spanned 3 years. To construct a training set for model training, we take data from the most recent period (shown in red) and calculate each customer's number of purchases and average spend per booking as target values. We then join these to the corresponding input features (shown in blue). For customers who didn't purchase in the 'target' period, the target values are set to zero.

Note that, for the PAPV model, only customers who actually purchased in the 'target' period can be used to train the model, because PAPV is unobserved for customers without purchases.

Once both models are trained (PAPV and PNP), we can apply them to the most recent 2 years of data (shown in green) to estimate CLS for the future period (shown in yellow).

Figure 1: Model training and prediction timescales.
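
To make the window construction concrete, here is a minimal sketch of the split, continuing the hypothetical txns DataFrame from above; the cutoff date and the build_customer_features helper are placeholders.

```python
import pandas as pd

# Illustrative split: data before the cutoff provides input features,
# the following 1-year window provides the targets.
cutoff = pd.Timestamp("2018-01-01")   # hypothetical date
horizon = pd.DateOffset(years=1)      # T = 1 year prediction window

history = txns[txns["booking_date"] < cutoff]
target_window = txns[(txns["booking_date"] >= cutoff) &
                     (txns["booking_date"] < cutoff + horizon)]

targets = target_window.groupby("customer_id").agg(
    pnp=("pnr_id", "count"),    # number of purchases in the target window
    papv=("amount", "mean"),    # average purchase value in the target window
)

features = build_customer_features(history)   # roll-up as sketched earlier (hypothetical helper)
training = features.join(targets, how="left")
training["pnp"] = training["pnp"].fillna(0)   # no purchases in the target window -> PNP = 0

# PAPV is unobserved for non-purchasers, so the PAPV model trains on purchasers only.
papv_training = training.dropna(subset=["papv"])
```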

Feature selection

Our training set contained a relatively large number of features (62), so we used a feature selection process to identify the most predictive features and separate them from features that add little predictive value or are just noise.

The first step in this process is to drop a feature if more than 20% of its values are null, or if its variance is less than 10% of its mean value. The second step is to apply Random Forest 'feature importance' to all of the remaining variables. This method calculates how much each feature decreases the target variance at the nodes where it is used to split, weighted by the proportion of customers (training vectors) reaching each node, and accumulates this over all trees in the Random Forest. This is known as the Mean Decrease in Impurity (MDI) method (2).

Tables 1 and 2 below show the most important features for the PNP and PAPV models. The importance scores are normalised so that they sum to 1, and we drop features with importance scores less than or equal to 0.001.
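
A compact sketch of this two-step selection, again with scikit-learn (3); the thresholds follow the rules above, while the fillna(0) imputation is just a simplifying assumption for the sketch:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def select_features(X: pd.DataFrame, y: pd.Series):
    # Step 1: drop features with more than 20% nulls,
    # or with variance below 10% of the (absolute) mean.
    null_ok = X.isna().mean() <= 0.20
    var_ok = X.var() >= 0.10 * X.mean().abs()
    X = X.loc[:, null_ok & var_ok]

    # Step 2: Mean Decrease in Impurity (MDI) importances from a Random Forest.
    forest = RandomForestRegressor(n_estimators=500, random_state=42)
    forest.fit(X.fillna(0), y)

    # feature_importances_ is already normalised to sum to 1;
    # keep features scoring above the 0.001 threshold.
    importance = pd.Series(forest.feature_importances_, index=X.columns)
    return importance[importance > 0.001].sort_values(ascending=False)
```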

As expected, the Recency, Frequency and Monetary (RFM) related features are ranked highly: it is highly probable that the Random Forest model captures the same underlying process of customer purchasing behaviour that traditional RFM-based models like Pareto/NBD capture. Other features, such as 'Number of bookings with 1 adult passenger only' or 'Number of bookings with same day return', may look surprising at first glance, but we know at OpenJaw that these features correspond to business travel and can provide additional information about the Frequency of a customer.

Further analysis is required to form definitive conclusions, but it does seem that the Random Forest model is doing something very similar to traditional RFM-based models for predicting CLV, while enriching itself with additional features that carry more information about RFM (rather than improving performance with completely new, exogenous features).

Experiments

To benchmark the prediction performance of our Random Forest method, we compared it to two established RFM-based methods: BG/NBD (4) and the Pareto/NBD model described in part 2 of this article. We also wanted to measure the value of including the additional features listed in tables 1 and 2 above, so we compared the prediction performance of four models in total:

  1. Random Forest regression model with the features listed in tables 1 and 2 above;
  2. Random Forest regression model with just three features: Recency, Frequency and Spend;
  3. BG/NBD model. This works with just Recency, Frequency and Spend;
  4. Pareto/NBD model. This works with just Recency, Frequency and Spend.

The prediction performance of each model on the test set, expressed as normalised Root Mean Squared Error (RMSE), is summarised in Table 3 below. The best performing models are highlighted in green text.
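
The article does not tie this comparison to a particular implementation, but as one way to reproduce it, the open-source lifetimes Python package provides BG/NBD and Pareto/NBD fitters; here we also assume that 'normalised RMSE' means RMSE divided by the mean of the actuals. The summary DataFrame of per-customer frequency, recency and T, and the y_pnp_test vector, are placeholders.

```python
import numpy as np
from lifetimes import BetaGeoFitter, ParetoNBDFitter
from sklearn.metrics import mean_squared_error

def normalised_rmse(y_true, y_pred):
    # RMSE scaled by the mean of the actuals, so PNP and PAPV
    # errors are comparable despite their different units.
    return np.sqrt(mean_squared_error(y_true, y_pred)) / np.mean(y_true)

# summary: per-customer frequency, recency and customer age T (e.g. in days).
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])

pnbd = ParetoNBDFitter(penalizer_coef=0.001)
pnbd.fit(summary["frequency"], summary["recency"], summary["T"])

t = 365  # one-year prediction window
pnp_bgnbd = bgf.conditional_expected_number_of_purchases_up_to_time(
    t, summary["frequency"], summary["recency"], summary["T"])
pnp_pnbd = pnbd.conditional_expected_number_of_purchases_up_to_time(
    t, summary["frequency"], summary["recency"], summary["T"])

print("BG/NBD    ", normalised_rmse(y_pnp_test, pnp_bgnbd))
print("Pareto/NBD", normalised_rmse(y_pnp_test, pnp_pnbd))
```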

The Pareto/NBD model wins overall on CLS prediction, as it significantly outperforms the other models on PNP prediction, while the Random Forest model with additional features appears to be better at PAPV prediction.

Note that the BG/NBD and Pareto/NBD models have the same RMSE for PAPV prediction. This is because both use the same spend prediction process, based on the Gamma-Gamma distribution assumption.
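
For reference, a sketch of that shared spend model using the lifetimes package's Gamma-Gamma fitter (again an assumed implementation, continuing the placeholder summary DataFrame from above):

```python
from lifetimes import GammaGammaFitter

# The Gamma-Gamma model is fitted on repeat purchasers only, since
# average spend is unobserved for customers with no repeat purchases.
repeat = summary[summary["frequency"] > 0]

ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(repeat["frequency"], repeat["monetary_value"])

# The same PAPV estimate is paired with either counting model, which is
# why BG/NBD and Pareto/NBD share a single PAPV RMSE in Table 3.
papv_pred = ggf.conditional_expected_average_profit(
    repeat["frequency"], repeat["monetary_value"])
```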

So which model is best? The CLS prediction performance would suggest that the Pareto/NBD model is the best performer overall, followed by BG/NBD and then the Random Forest models. However, it is worthwhile taking a deeper look into how this performance is distributed for different types of customers. Figure 2 below illustrates how CLS prediction performance varies across the test set for customers with increasing frequency of purchase.

Figure 2. Prediction error (RMSE%) for customers with increasing Frequency in the test set.

As we expected, the prediction performance of the Random Forest models improves notably when Frequency is larger than 2. This is because machine learning methods like Random Forest are generally non-parametric, i.e. they have no distributional (parametric) assumptions to fall back on when the number of observations is small for a particular type of customer. In contrast, the traditional RFM models are parametric, so they can continue to perform reasonably well even when the number of observations is small.
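
A sketch of how such a per-segment comparison can be computed, bucketing the test set by observed Frequency; all inputs (test_frequency, y_cls_test and the two prediction vectors) are placeholders:

```python
import numpy as np
import pandas as pd

# Score each model within frequency buckets, in the spirit of Figure 2
# (all variable and column names are illustrative).
results = pd.DataFrame({
    "frequency": test_frequency,      # purchases observed in the input window
    "actual": y_cls_test,
    "random_forest": cls_pred_rf,
    "pareto_nbd": cls_pred_pnbd,
})
buckets = pd.cut(results["frequency"],
                 bins=[-1, 0, 1, 2, 5, np.inf],
                 labels=["0", "1", "2", "3-5", "6+"])

# Normalise by the overall mean of the actuals, so sparse buckets
# with near-zero means don't blow up the normalised error.
overall_mean = results["actual"].mean()
for model in ["random_forest", "pareto_nbd"]:
    rmse = results.groupby(buckets).apply(
        lambda g: np.sqrt(np.mean((g["actual"] - g[model]) ** 2)) / overall_mean)
    print(model, rmse.round(3).to_dict())
```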

Conclusion

The purpose of this three-part article is to share with the data science and marketing community in the travel industry how we predict CLV at OpenJaw, and to explain why this metric is so important for travel retailing.

The key message of part 1 of the article is that CLV provides a much more precise measure of customer value than the informal methods used in travel today, e.g. points accumulated in a loyalty program or the fare class for a particular trip. Another key message is that the field of CLV is somewhat of a 'minefield', with many variants and a myriad of vague publications online that conflate one approach with another. So caution is advised when embarking on a journey towards CLV adoption.

In part 2 of the article we focused on the most established method used to predict CLV in a non-contractual setting such as travel: the Pareto/NBD method. We explained the mechanics of how it works and provided some examples with real travel data to illustrate how the model captures the process of customer purchasing behaviour.

And finally, in part 3 of the article, we described how the popular machine learning method Random Forest can also be used to predict CLV. To evaluate its performance, we compared it to two traditional methods, Pareto/NBD and BG/NBD. The prediction performance of each model was similar, but for higher frequency customers the Random Forest model with additional features beyond RFM performed consistently better. This result provides some guidance on which model to use in different circumstances and illustrates the potential of machine learning for CLV prediction in travel.

References

(1) Chamberlain, B. P., Cardoso, A., Liu, C. H., Pagliari, R., & Deisenroth, M. P. (2017, August). Customer lifetime value prediction using embeddings. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1753–1762). ACM.

(2) Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York: Springer Series in Statistics.

(3) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Vanderplas, J., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.

(4) Fader, P., Hardie, B., & Lee, K. (2005). Counting your customers the easy way: An alternative to the Pareto/NBD model. Marketing Science, Vol. 24, No. 2, Spring 2005, pp. 275–284.

--

The OpenJaw Data Science Team
The OpenJaw Data Science Blog

The data science team at OpenJaw share their approach, opinions and methodologies for data science in travel.