EXPEDIA GROUP TECHNOLOGY — DATA
The Impact of Corporate Travel Policies on Traveller Behaviour
A statistical analysis
An important part of the travel management offering of Egencia™ (part of Expedia Group™) is the configuration and use of travel policies, which give our clients control and flexibility over their corporate travel program. These policies allow clients to regulate employees’ travel choices for reasons of savings, comfort or safety.
Among other things, clients can flag flight options that are too expensive, in business or first class, or booked at the last minute. On our web platform, these out-of-policy options are marked with a red flag on the search results pages. Below is an example of a flight from Paris to Marseille that is out of policy because it is too expensive.
While we’ve always believed such policies add value, we wanted to precisely quantify their influence on traveller behaviour. So we turned to data and robust statistical analysis techniques.
Our big questions
The main questions were:
- Are policy rules linked to changes in client spending?
- Are policy rules impacting traveller comfort?
- Are policy rules affecting traveller convenience?
Using data to understand how these policies affect our clients and travellers lets us make better-informed decisions and explain more clearly the value that policies bring to our clients.
There are eight air policies that may be applied, including:
- Price above Recommended Fare and Fixed Price Air Policy: set reasonable spending limits on a flight fare. The first caps how far a fare may exceed the price of our recommended fare for the search; the second sets an absolute maximum price.
- Advance Purchase: controls how long in advance flights should be booked.
- Highest Cabin Class: regulates the cabin classes that can be selected by travellers (e.g., business, economy).
We analysed the impact of all eight air policies, but the ones above produced the most interesting results. Each policy can be configured with different settings. Policies are also often used together, so when analysing the effect of one policy we had to keep in mind the potential confounding effects of the others.
Are policy rules linked to changes in client spending?
Yes. Activating any of the four main air policies above was associated with a statistically significant decrease in the Average Ticket Price (ATP) of our clients’ bookings, which translates directly into client savings. The data also let us estimate ranges of potential savings by flight distance.
When comparing similar clients, enabling any of the above air policies was associated with potential savings in the range of:
- US$4-$18 for a short haul flight of 500km
- US$8-$35 for a medium haul flight of 1000km
- US$39-$177 for a long haul flight of 5000km
These are aggregated figures across all client booking data included in the samples. The ranges encompass the results from all four policies mentioned above.
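Since the three ranges are quoted at different distances, it helps to put them on a per-kilometre basis. The Python sketch below only divides the published bounds by the flight distance; the dictionary `SAVINGS_RANGES` and the helper `per_km_range` are illustrative names, not part of the original analysis.

```python
# Savings ranges quoted above: flight distance (km) -> (low, high) in US$.
SAVINGS_RANGES = {
    500: (4, 18),
    1000: (8, 35),
    5000: (39, 177),
}


def per_km_range(distance_km, low_usd, high_usd):
    """Express a (low, high) savings range in US$ per km."""
    return low_usd / distance_km, high_usd / distance_km


for distance, (low, high) in SAVINGS_RANGES.items():
    low_km, high_km = per_km_range(distance, low, high)
    print(f"{distance:>5} km: US${low_km:.4f} to US${high_km:.4f} per km")
```

All three distances work out to roughly US$0.008 to US$0.036 per km, so the quoted ranges grow approximately in proportion to flight distance.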
Beyond showing that activating certain policies is correlated with lower spending, we were also able to measure the impact of different settings of the same policy. In general, the analyses confirmed our expectation that the stricter the setting, the lower the traveller spending. Some examples:
- A stricter Advance Purchase setting, which encourages travellers to book further in advance, was linked to lower ticket prices. In particular, comparing bookings made under a ‘less than 1 week in advance’ setting with those made under a ‘between 2 and 4 weeks in advance’ setting, we observed a statistically significant average price difference of ~US$0.01/km (about US$10 for a 1000km flight).
- A lower allowance setting for the Price above Recommended Fare policy was associated with lower average ticket prices than at comparable clients with a higher allowance setting. Using an allowance of US$200/25% over our recommended fare as the threshold separating the two settings, we observed a statistically significant price difference of ~US$0.011/km (about US$11 for a 1000km flight).
Interestingly, behaviour under less restrictive settings can sometimes be similar to behaviour when the policy is disabled altogether, as in the case of the Highest Cabin Class policy.
Are policy rules impacting traveller comfort?
Yes. Activating policies such as Highest Cabin Class, Price above Recommended Fare and Fixed Price Air Policy was associated with a statistically significant reduction in the percentage of business class bookings in favour of economy class bookings. Similarly, more restrictive settings of these policies produced a larger reduction in the percentage of business class bookings than less restrictive settings.
Are policy rules affecting traveller convenience?
No. Taking flight duration as one element of travel convenience, activating air policies was not associated with any statistically significant change in the average duration of the flights chosen. The air policies do not push travellers towards longer flights.
While these were our three biggest questions about policy usage, the same analyses allowed us to answer others, such as the impact of policies on online versus offline bookings. Together, this gave our business a clearer view of the potential impact of travel policies for us and our clients.
How did we get these results?
To answer these questions, we applied a simulated AB-test methodology to measure the statistical impact of policy settings on business metrics and traveller behaviour. We call it ‘simulated’ because the AB test was not run live; instead, we approximated it using the historical booking data we had collected before COVID-19.
The methodology is summarised in the diagram below.
To make the methodology more concrete, let’s walk through an example.
Say we want to test the effect of enabling the Price above Recommended Fare policy on client spending. In business terms, client spending per ticket can be expressed as the ‘Average Ticket Price’ (ATP). Because policies are set at the level of a traveller group (a subset of a client company’s employees), we test their impact at that level.
In our pipeline, we applied the following steps:
1. Choose policy: Select the policy whose impact we want to test, in this case Price above Recommended Fare.
2. Split into AB groups: Based on policy usage, we split the available traveller groups into two categories: A, the groups with the policy disabled (off), and B, the groups with the policy enabled (on). We later used the same logic to compare different settings of the same policy.
3. Select similar samples: An essential step is to sample traveller groups from A and B so that the two samples are similar with respect to a chosen set of features, called control features (see below for details). This ensures that the traveller groups in A and B have similar backgrounds, so that any difference we observe in our comparison metric, ATP, is not simply the result of comparing different segments. For example, if we observe a difference in ATP, we want to be sure it comes from the policy settings and not from, say, a different mix of travellers’ points of sale.
4. Compute statistic of interest for the impact feature: With two similar samples A and B, we compute the statistic of interest for each and compare them to see whether the observed difference is statistically significant. The statistic of interest can be the mean, the median or the weighted mean of the impact feature. The impact feature in our example is ATP, but it could also be flight duration, the percentage of business class bookings, etc., depending on what we want to measure. In this case, we compute the mean ATP of samples A and B.
5. Apply hypothesis testing: To determine whether the difference between the two samples is statistically significant, we run permutation tests. In brief, these are a simulation-based counterpart of t-tests: instead of relying on a theoretical distribution, they repeatedly shuffle the A and B labels to build the distribution of the difference under the null hypothesis that the policy has no effect. Additionally, we can use bootstrapping to build confidence intervals for the features of interest and for the difference between the two groups.
6. Conclusion: Finally, we compare the p-value from the hypothesis test with the significance level alpha we set (e.g., 0.05 for a 95% confidence level) to conclude whether the difference between the two groups is statistically significant or just due to random sampling.
In our example, with the Price above Recommended Fare policy, the difference between the two groups was statistically significant (at a 95% confidence level): traveller groups with the policy enabled spent significantly less than those with it disabled.
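Steps 4 to 6 can be sketched in plain Python as below. This is a minimal illustration, not the pipeline used in the analysis: the function names, the numbers of permutations and resamples, and the ATP values are all hypothetical, and the bootstrap uses the simple percentile method.

```python
import random
from statistics import fmean


def permutation_test(a, b, stat=fmean, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in `stat` between samples a and b.

    Under the null hypothesis the A/B labels are exchangeable, so we shuffle the
    pooled values, re-split them into groups of the original sizes and count how
    often the shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = stat(b) - stat(a)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = stat(pooled[n_a:]) - stat(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed labelling counts as one of the permutations.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def bootstrap_ci(a, b, stat=fmean, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(b) - stat(a)."""
    rng = random.Random(seed)
    diffs = sorted(
        stat(rng.choices(b, k=len(b))) - stat(rng.choices(a, k=len(a)))
        for _ in range(n_resamples)
    )
    lower = diffs[round((1 - level) / 2 * (n_resamples - 1))]
    upper = diffs[round((1 + level) / 2 * (n_resamples - 1))]
    return lower, upper


# Made-up mean ATPs (US$) per traveller group, for illustration only.
atp_policy_off = [410, 385, 450, 398, 420, 405, 440, 395, 415, 430]  # group A
atp_policy_on = [370, 360, 395, 352, 380, 365, 390, 358, 372, 385]   # group B

alpha = 0.05  # significance level for a 95% confidence level
diff, p_value = permutation_test(atp_policy_off, atp_policy_on)
low, high = bootstrap_ci(atp_policy_off, atp_policy_on)
verdict = "significant" if p_value < alpha else "not significant"
print(f"mean ATP difference (B - A): {diff:.1f} US$, p = {p_value:.4f} ({verdict})")
print(f"95% bootstrap CI for the difference: [{low:.1f}, {high:.1f}] US$")
```

Passing `stat=statistics.median` compares medians instead; a weighted mean (for example, weighting traveller groups by their number of bookings) would need a small custom function.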
Control features — ensuring that our A & B samples represent similar traveller groups
As described in step 3 of the previous section, using control features to create similar samples is an essential step of the analysis. It mirrors what is done in live AB tests to make sure that users have similar backgrounds in both variants. By ‘similar samples’ we mean that the two samples we form, A and B, have similar distributions of the control features. We considered four control features:
- Two categorical features: point of sale (e.g., US, France, Norway) and company segment (related to the size of the company)
- Two continuous features: number of travellers and number of bookings (of the company)
This part of the pipeline has two steps for each control feature x:
- Sample the data from the initial groups A and B so that they have a similar distribution of feature x. For categorical features, we sample an equal number of traveller groups from A and B for each possible value (e.g., for point of sale, we ensure that A and B contain the same number of traveller groups from each country). For continuous features, we first bin the values into a fixed number of buckets and then apply the same logic.
- Check that the two distributions of feature x are indeed similar by performing statistical tests, which depend on whether the feature is categorical or continuous: a Chi-Square test for categorical features and a Kolmogorov-Smirnov test for continuous ones.
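The two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in the analysis: traveller groups are plain dicts with hypothetical fields `pos` and `travellers`, strata found in only one sample are dropped, the bucket edges are arbitrary, and the KS check uses the large-sample critical value c(α)·sqrt((n+m)/(nm)) rather than an exact p-value. In practice, `scipy.stats.chi2_contingency` and `scipy.stats.ks_2samp` provide the Chi-Square and Kolmogorov-Smirnov tests directly; they are omitted here only to keep the example dependency-free.

```python
import bisect
import math
import random
from collections import Counter, defaultdict


def stratified_match(a, b, key, seed=0):
    """Down-sample a and b so that every stratum has the same count in both.

    `key` maps a traveller group to its stratum: a category such as point of
    sale, or a bucket index for a binned continuous feature. Strata present in
    only one of the two samples are dropped (an assumption of this sketch).
    """
    rng = random.Random(seed)
    strata_a, strata_b = defaultdict(list), defaultdict(list)
    for rec in a:
        strata_a[key(rec)].append(rec)
    for rec in b:
        strata_b[key(rec)].append(rec)
    out_a, out_b = [], []
    for stratum in sorted(strata_a.keys() & strata_b.keys()):
        n = min(len(strata_a[stratum]), len(strata_b[stratum]))
        out_a += rng.sample(strata_a[stratum], n)
        out_b += rng.sample(strata_b[stratum], n)
    return out_a, out_b


def bucket_key(field, edges):
    """Key function that bins a continuous field using the given bucket edges."""
    return lambda rec: bisect.bisect_right(edges, rec[field])


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    x, y = sorted(x), sorted(y)
    return max(
        abs(bisect.bisect_right(x, v) / len(x) - bisect.bisect_right(y, v) / len(y))
        for v in set(x) | set(y)
    )


def ks_similar(x, y, alpha=0.05):
    """Large-sample KS check: True if 'same distribution' is not rejected at level alpha."""
    n, m = len(x), len(y)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    return ks_statistic(x, y) <= c_alpha * math.sqrt((n + m) / (n * m))


def travellers(groups):
    """Extract the hypothetical continuous control feature."""
    return [rec["travellers"] for rec in groups]


# Hypothetical traveller groups: B skews towards the US and larger companies.
rng = random.Random(42)
group_a = [{"pos": rng.choice(["US", "FR", "NO"]), "travellers": rng.randint(5, 200)}
           for _ in range(300)]
group_b = [{"pos": rng.choice(["US", "US", "FR"]), "travellers": rng.randint(50, 400)}
           for _ in range(300)]

print("KS similar before:", ks_similar(travellers(group_a), travellers(group_b)))
a, b = stratified_match(group_a, group_b, key=lambda rec: rec["pos"])
a, b = stratified_match(a, b, key=bucket_key("travellers", [50, 100, 200, 300]))
print("KS similar after:", ks_similar(travellers(a), travellers(b)))
print("point of sale A:", Counter(r["pos"] for r in a), "B:", Counter(r["pos"] for r in b))
```

Matching on the binned `travellers` feature after matching on `pos` can leave the point-of-sale counts unequal again, which is why the similarity tests have to be rerun after each feature is sampled.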
The images below show a ‘before and after’ of applying the sampling techniques to the initial samples A and B, for both a categorical and a continuous feature. The control feature distributions become similar, which shows up as the grey areas in the charts, where the two groups’ distributions now overlap.
However, because these two steps apply to four control features, we must make sure that sampling for a new control feature does not undo the similarity already achieved for the previous ones. That’s why, after sampling for each feature, we rerun all four statistical tests, and we repeat the sampling in a loop until all control features have similar distributions in the two samples. Because we rely on statistical tests to confirm similarity, we do not require the distributions to be exactly identical, which would be unrealistic across all four features at once.
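The loop described above can be sketched as follows. It is a hedged illustration rather than the original implementation: `balance_until_similar`, `match_on` and `shares_close` are hypothetical helpers, only categorical features are shown, and a simple share-difference tolerance stands in for the Chi-Square and Kolmogorov-Smirnov tests. Because convergence is not guaranteed, the loop gives up after `max_rounds` and raises an error.

```python
import random
from collections import Counter


def balance_until_similar(a, b, matchers, checks, max_rounds=20):
    """Re-match control features until every similarity check passes.

    `matchers` are functions (a, b) -> (a, b) that each balance one control
    feature; `checks` are functions (a, b) -> bool that each test one control
    feature. All checks are rerun after every matching step, because
    balancing one feature can unbalance another.
    """
    for round_no in range(1, max_rounds + 1):
        for match in matchers:
            a, b = match(a, b)
            if all(check(a, b) for check in checks):
                return a, b, round_no
    raise RuntimeError(f"control features still differ after {max_rounds} rounds")


def match_on(field, rng):
    """Matcher: equalise the number of groups per value of a categorical field."""
    def match(a, b):
        out_a, out_b = [], []
        for value in sorted({r[field] for r in a} & {r[field] for r in b}):
            in_a = [r for r in a if r[field] == value]
            in_b = [r for r in b if r[field] == value]
            n = min(len(in_a), len(in_b))
            out_a += rng.sample(in_a, n)
            out_b += rng.sample(in_b, n)
        return out_a, out_b
    return match


def shares_close(field, tol=0.05):
    """Check: category shares differ by at most `tol` (a stand-in for a Chi-Square test)."""
    def check(a, b):
        if not a or not b:
            return False
        ca, cb = Counter(r[field] for r in a), Counter(r[field] for r in b)
        return all(abs(ca[k] / len(a) - cb[k] / len(b)) <= tol for k in ca.keys() | cb.keys())
    return check


# Hypothetical traveller groups with two categorical control features.
rng = random.Random(7)
group_a = [{"pos": rng.choice(["US", "FR", "NO"]), "segment": rng.choice(["small", "large"])}
           for _ in range(400)]
group_b = [{"pos": rng.choice(["US", "US", "FR", "NO"]),
            "segment": rng.choice(["small", "large", "large"])}
           for _ in range(400)]

fields = ["pos", "segment"]
try:
    a, b, rounds = balance_until_similar(
        group_a, group_b,
        matchers=[match_on(f, rng) for f in fields],
        checks=[shares_close(f) for f in fields],
    )
    print(f"balanced after {rounds} round(s): {len(a)} vs {len(b)} traveller groups")
except RuntimeError as err:
    print(err)
```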
To choose these four features, we used a combination of expert knowledge and AA testing:
- Expert knowledge: Thinking about which features would be the most important to control for. This gave us a subset of candidate features out of the many available.
- AA testing: An important step usually done in AB testing to ensure that samples are properly randomised and controlled. It works by creating two A groups (i.e., two control groups) and comparing the statistic of interest (e.g., ATP) between them. Since both groups are ‘A’, we expect the difference between them not to be statistically significant; if it is, there is a problem with the pipeline. We used this logic to try many different subsets of control features and kept those for which the AA tests passed.
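An AA check and the search over control-feature subsets can be sketched as below. It is a minimal illustration under assumptions, not the original pipeline: `build_aa_samples` is a hypothetical hook standing in for however the two A samples were formed (this step is not described in detail here), the permutation test compares means, and `toy_build` returns made-up data whose spurious gap disappears once point of sale is controlled for.

```python
import random
from statistics import fmean


def permutation_p_value(x, y, n_permutations=2_000, seed=0):
    """Two-sided permutation-test p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled, n_x = list(x) + list(y), len(x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:])) >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


def aa_test_passes(a1, a2, alpha=0.05):
    """AA check: two control ('A') samples should show no significant difference."""
    return permutation_p_value(a1, a2) >= alpha


def select_control_subsets(candidates, build_aa_samples, alpha=0.05):
    """Keep the candidate control-feature subsets whose AA test passes.

    `build_aa_samples(subset)` is a hypothetical hook that runs the sampling
    step with the given control features and returns the impact feature
    (e.g. ATP) for the two resulting A samples.
    """
    return [s for s in candidates if aa_test_passes(*build_aa_samples(s), alpha=alpha)]


def toy_build(subset):
    """Toy stand-in: a spurious ATP gap of US$60 that disappears once
    point of sale is among the control features."""
    a1 = [400 + i % 7 for i in range(30)]
    gap = 0 if "point_of_sale" in subset else 60
    a2 = [atp + gap for atp in a1]
    return a1, a2


candidates = [("segment",), ("point_of_sale", "segment")]
print(select_control_subsets(candidates, toy_build))
```

With the toy data, only the subset containing `point_of_sale` is kept, because the US$60 gap makes the AA test fail for the other subset.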
Conclusions
We simulated AB tests from retrospective data to better understand the impact of travel policies on client behaviour. This showed that activating some of our policies was linked to a statistically significant decrease in client spending and in the percentage of business class bookings, with no significant change in the average duration of the flights selected. We also confirmed that stricter policy settings translated into lower client spending than less strict settings. These results will continue to guide our clients in finding and implementing the optimal travel policies.