Case study XAI — hotel booking cancellation use case

Mateusz Krzyziński
Published in ResponsibleML
Jun 30, 2021

Overview of model exploration using XAI (eXplainable Artificial Intelligence) methods.

In this post, we present an example of the use of XAI methods for analyzing and building explanations for a complex model that predicts the risk of hotel booking cancellation.

We won’t go into the technical details of the methods used in this analysis, so we recommend the BASIC XAI series for a more extensive background.

This blog post is based on a project carried out during the Case Study course at the Warsaw University of Technology. The full article is available here; it is joint work with Anna Urbala and Artur Żółkowski.

Why analyze a hotel booking cancellations model?

Machine learning is used in many fields, and it has also found applications in the hotel industry. Decision support systems based on complex models that predict the probability of a reservation being canceled are a key part of hotel revenue management.

Unfortunately, machine learning models are nowadays often treated as black boxes whose decisions are difficult to interpret. The hotel industry is no exception: the behavior of the models in use has not been studied so far, and they have not been examined in terms of the causes of their predictions.

This post aims to give a brief example of applying XAI methods to a hotel booking cancellation model and, more generally, to help fill the gap of still-missing case studies on the use of these methods in machine learning.

The data — what was the analyzed model fed with?

We trained a random forest model on the data published in the paper Hotel booking demand datasets in the Data in Brief journal. The dataset contains information about bookings at two Portuguese hotels over a two-year period. The key thing is that around 37% of the nearly 120,000 bookings were canceled! The created model predicts whether a reservation with the given features will be canceled or not.
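
A minimal sketch of how such a model and the explainer object used throughout this post could be set up in Python, with scikit-learn and the dalex package. The file name, preprocessing, and hyperparameters below are assumptions, not the original project's exact pipeline:

```python
import pandas as pd
import dalex as dx
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hotel booking demand dataset (Antonio et al., Data in Brief);
# the file name and the preprocessing details are assumptions.
data = pd.read_csv("hotel_bookings.csv")

# Separate the target and drop the reservation status columns that leak it.
y = data["is_canceled"]
X = data.drop(columns=["is_canceled", "reservation_status",
                       "reservation_status_date"])

# Label encoding for categorical variables (as in the original analysis);
# missing values are imputed naively here for simplicity.
X = X.apply(lambda col: col.astype("category").cat.codes
            if col.dtype == "object" else col).fillna(-1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# The dalex Explainer object is the entry point for all explanations below.
explainer = dx.Explainer(model, X_train, y_train, label="random forest")
```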

Local explanations

To explain the model output for a particular guest and their booking, you can use instance-level exploration methods, such as Break Down, SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), and Ceteris Paribus profiles. Since there are often too many observations in a dataset to look at each one separately, you can at least focus on the particularly noteworthy ones, which in the case of binary classification are (see the selection sketch after this list):

  • false positives and false negatives (especially the worst-classified ones),
  • the most confidently correct ones,
  • those closest to the decision boundary.
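
Assuming the explainer and the test split from the setup sketch above (again, these are assumptions about the original pipeline), such observations could be picked out as follows:

```python
import numpy as np

# Predicted probability of cancellation for the test set.
proba = explainer.predict(X_test)
not_canceled = (y_test == 0).to_numpy()

# Booking A: the worst false positive - actually not canceled,
# but with the highest predicted probability of cancellation.
booking_A = X_test[not_canceled].iloc[[np.argmax(proba[not_canceled])]]

# Booking B: a confident true negative - not canceled,
# with the predicted probability of cancellation closest to 0.
booking_B = X_test[not_canceled].iloc[[np.argmin(proba[not_canceled])]]

# An observation closest to the decision boundary (probability ~0.5).
near_boundary = X_test.iloc[[np.argmin(np.abs(proba - 0.5))]]
```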

Let’s look at booking A, which was not actually cancelled but which the model was quite sure to classify as cancelled (a false positive). We use Shapley values to check how the feature values contributed to this most misclassified prediction.
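
A Shapley-value explanation like the one in the plot below could be obtained with dalex, reusing the explainer and booking_A from the earlier snippets:

```python
# Shapley values for booking A, averaged over B random orderings of variables.
shap_A = explainer.predict_parts(booking_A, type="shap", B=25)
shap_A.plot()
```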

A plot of Shapley values for A, the misclassified observation (false positive) with the highest probability of cancellation. The green and red bars correspond to the contributions of the variables to the prediction. The green ones increase the probability of cancellation, while the red ones decrease it (increase the probability of no cancellation). On the x-axis there is the model prediction value, while on the y-axis there are the variables and their values for the observation (note that label encoding was used).

The pieces of information with the biggest contribution to the final prediction are the guest’s country of origin (Portugal), the total number of special requests being equal to zero, and the fact that the customer type of the given booking is Transient-Party. This indicates that the model may be slightly biased by the country of origin, which is a property of the customer rather than of the booking itself. Thus, depending on the application, it is worth considering whether such behavior is satisfactory and meets ethical standards. While every other value contributes to the misprediction, the only feature with a contribution in the correct direction is the customer type. Transient-Party guests are mobile guests who seek short (and often urgent) hotel stays. They are the most common type of customer, accounting for as much as 75% of bookers, and they do not usually cancel their bookings.

Now, for a change, let’s analyze reservation B, which the model classified as not cancelled (with a probability equal to 1.0), and this time the label is true.

A plot of Shapley values for B, the observation with a confident negative prediction.

As in the previous example, the country of origin has the largest contribution. In this case, the booker is from France, and it affects the prediction in a completely different way. Again, this is a Transient-Party customer, which also strongly pushes the prediction towards non-cancellation. Recording special booking requests also lowers the predicted probability of cancellation (more than, e.g., no previous cancellations). Only one of the top variables pushed the prediction towards cancellation: no required car parking spaces, but this impact was unnoticeable in the final prediction.

But XAI methods are not only about what influences the prediction and how… For an observation whose classification the model was absolutely certain about, it is worth considering how the model’s predictions would change if the values of some explanatory variables changed. For this purpose, Ceteris Paribus profiles can be used. Let’s analyze the observation already studied (B) and a completely different booking, correctly classified as cancelled with absolute certainty (C).
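
Such Ceteris Paribus profiles could be computed with dalex as sketched below, reusing the earlier snippets; the way booking C is selected and the listed variable names are assumptions:

```python
# Booking C: a correctly classified cancellation the model is certain about.
canceled = (y_test == 1).to_numpy()
booking_C = X_test[canceled].iloc[[np.argmax(proba[canceled])]]

# Ceteris Paribus profiles: vary one variable at a time while the others
# stay fixed, and trace how the prediction changes.
cp = explainer.predict_profile(pd.concat([booking_B, booking_C]))
cp.plot(variables=["lead_time", "arrival_date_week_number",
                   "required_car_parking_spaces",
                   "total_of_special_requests", "country"])
```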

A plot of Ceteris Paribus profiles for the selected continuous explanatory variables and label encoded country variable for observations B and C with the sure prediction. Dots indicate the values of the variables and the values of the predictions for observations. Blue profiles are for sure negative prediction (B, analyzed above), while green profiles are for sure positive prediction (C).

Looking at the Ceteris Paribus profiles, it is easy to see that the prediction for the observation classified as not cancelled is more stable, i.e., less sensitive to changes in the values of the explanatory variables.

In the case of the cancelled reservation (C), changing the arrival date by a few weeks would cause a significant decrease in the certainty of the prediction. This is related to the seasonality of bookings (the decrease occurs at the beginning of July, the holiday period). However, the biggest changes in the prediction for this observation would come from recording additional booking requirements (required car parking spaces and the total number of special requests). Changing these values to non-zero would change the prediction completely.

When considering the prediction for the observation classified as not cancelled (B), we see that the only explanatory variable whose change would have a significant impact on the certainty of the prediction is the number of previously cancelled reservations. A change to any non-zero value would change the prediction, but its certainty would be close to the decision boundary.

Global explanations

You may ask yourself whether XAI methods apply only to single observations. Of course not. Although we can deduce a lot about our model from local explanations, it is the global methods that allow us to look at its behavior more broadly. Examples of such global methods are Permutational Variable Importance, PDP (Partial Dependence Profiles), and ALE (Accumulated Local Effects). They can also be used to compare the behavior of many different models and to become more confident about the factors influencing a given phenomenon.

So let’s look at which variables were the most important for our model from the perspective of the entire training set.
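
With dalex, a permutation-based importance plot like the one below could be produced roughly as follows; the "1-auc" loss string is an assumption about the dalex version in use:

```python
# Permutation-based variable importance, with 1-AUC as the loss function.
vi = explainer.model_parts(loss_function="1-auc")
vi.plot(max_vars=15)
```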

A plot of Permutational Variable Importance for the trained random forest model. The length of each bar represents the difference between the loss function (1-AUC) for the original data and the data with the permuted values of a particular variable.

As we can see above, the most important variable for the model is the information about the guest’s country of origin.

The method indicated that the next most important variables for the predictions are the lead time and the number of special booking requests. Lead time is the number of days that elapsed between the date the booking was entered into the system and the arrival date. It may indicate whether the stay was planned long in advance or is spontaneous. Meanwhile, the number of special requests reflects additional interest in the booking, which may indicate that it is an important stay for the client.

It should also be noted that some of the variables are of marginal importance: the type of hotel (recall that the data relates to two hotels), the number of adults, the number of previously not-canceled bookings (probably related to the small number of observations with a non-zero value of this feature, 3.1% of the entire dataset), and the type of room reserved. These variables could be considered for exclusion in order to simplify the model.

Knowing the key variables, it is worth checking the influence of their values on the predictions. For this, we can use the ALE method (very similar to PDP, but it averages the local changes in prediction from Ceteris Paribus profiles over the intervals of a defined neighborhood grid). Let’s check how changes in lead time affect predictions. The plots below show the results for our main random forest model and three other tree-based models.
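
An ALE profile for lead time could be computed with dalex roughly as follows, reusing the explainer from the setup sketch; the explainers for the other tree-based models are hypothetical and would be built analogously:

```python
# Accumulated Local Effects profile for lead time.
ale_rf = explainer.model_profile(type="accumulated", variables=["lead_time"])
ale_rf.plot()

# Profiles of other models (built with analogous Explainer objects, e.g.
# ale_xgb, ale_lgbm) could be overlaid with: ale_rf.plot([ale_xgb, ale_lgbm])
```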

A plot of ALE for lead time. ALE plots work like Ceteris Paribus profiles: they show how a variable affects the prediction, but for the entire training dataset rather than a single observation.

Simply put, we have confirmation that the longer the lead time, the greater the chance that the customer will cancel. ALE plots are very similar to Ceteris Paribus profiles, but this is a great example that they are not the same. Look at the Ceteris Paribus profile for lead time in the Local explanations section: the probability of cancellation increases up to about 170 days, then decreases and remains at a constant level. Here we can see that this is only local behavior. Globally, the probability only grows and then stabilizes (after about 350 days, roughly a year). Therefore, you may suspect that it is not worth allowing reservations so far in advance.

Moreover, the biggest change in predictions occurs up to about 30 days of lead time. Thus, it is worth taking a closer look at how the ALE profiles behave over a shorter period. We can see that the increase in the average prediction is steeper up to about 15 days; then the chance of cancellation begins to rise more slowly.

A plot of ALE for lead time shorter than 35 days.

When comparing the results for different models, the profiles are comparable, but the XGBoost model is the most sensitive, as can be seen from the shape of the profile curve.

Summary

We discovered that the booker’s country of origin is the most important factor determining the decisions of the model. In business use, this could be considered unfairly biased behavior, so a possible solution would be to use separate models for predicting cancellations for foreign and domestic guests.

Moreover, the identification of the least significant variables makes it possible to eliminate them and simplify the model.

We can also validate the created model with an additional assessment of its behavior that is not tied to any single metric.

In addition, the discovered and described relationships serve not only to explain the operation of the model itself but also constitute newly generated knowledge about the phenomenon of hotel booking cancellation and confirm its causes.

Now we can see how the use of explainable artificial intelligence methods helps us understand the behavior of the created model, and what other benefits their use brings.

If you are interested in other posts about explainable, fair and responsible ML, follow #ResponsibleML on Medium.
