EXPEDIA GROUP TECHNOLOGY — DATA

Adaptive Products and Contextual Bandits — A New Way of Optimising Websites

How using contextual multi-armed bandits will change the way product teams optimise web pages.

Fedor Parfenov

Published in Expedia Group Technology, Jan 19, 2021

For more than a decade, personalisation of web content has been a hot topic in e-commerce, as websites that adapt to users’ diverse needs have seen tremendous wins in conversion and revenue. Machine learning techniques have driven a lot of progress in recommending the items or services sold on websites. However, there seems to have been much less progress in personalising the layout of webpages or apps.

I believe this is due to two main factors. Firstly, many webpages were not originally designed with layout modularity in mind, which makes personalisation hard to implement; developer teams across the industry are currently addressing this problem. Secondly, layout personalisation requires a shift in how product and data science teams approach web optimisation, which is the subject I want to tackle here.

In this somewhat anticipatory entry, I want to share my view on how using contextual multi-armed bandits on adaptive products might change the way testing programmes are run, and how product teams will need to adjust their thinking once these techniques are fully adopted.

The canonical way of thinking: Independent Experimentation

Most product teams I have worked with have adopted a well-established experimentation-driven approach. First, a team chooses a north star, such as increasing the subscription rate to a loyalty programme. Then it designs a campaign of relevant improvements coupled with A/B testing. Some of these tests will succeed and others will not. As they build knowledge, product managers adapt the roadmap and iterate until they eventually reach the target. This method is both flexible and robust: it has driven many success stories and has become a staple approach in the industry.

When testing is approached independently, each team takes charge of a different aspect of the page and improves it through testing.

However, there are several challenges:

Typically, tests on one aspect of the page will have to run sequentially and often for a lengthy period of time. This bottleneck can significantly slow progress down as fewer ideas are tested.
Tests on different parts of the page can run concurrently, but independently of one another. The team therefore risks missing positive or negative interactions between variants of different tests. Additionally, uncoordinated test launches at different points in time can blur the results and complicate decision making.
Optimising for specific sub-segments of users requires splitting the traffic between several buckets, which reduces the power of the tests and increases the number of false positives and false negatives. Moreover, this quickly becomes difficult to manage if tests are launched independently.

Contextual Bandits for Multivariate Optimisation

Given these bottlenecks, recent developments in contextual multi-armed bandits applied to web optimisation have been received with great interest. Some groundbreaking papers [1–5] have shown that these techniques can alleviate the problems by:

Repurposing traffic to more promising options on the fly, which reduces the risk of poor variants running for too long and makes better use of the traffic in general.
Running optimisations as a model (a statistical or machine learning model), which enables combining tests together and taking interactions into account.
Optimising for specific user sub-segments, which is now not only possible but also maintainable.
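To make "running optimisations as a model" a little more concrete, here is a minimal sketch of how a layout could be encoded as model features with pairwise interaction terms. The aspect names, variant names and the encoding itself are illustrative assumptions, not a description of any production system.

```python
from itertools import combinations

# Hypothetical aspects and variants, named for illustration only.
ASPECTS = {
    "welcome_message": ["short", "medium", "long"],
    "image_size": ["small", "medium", "large"],
    "search_module_size": ["small", "medium", "large"],
}

def feature_names():
    """Names of all model features: bias, main effects, pairwise interactions."""
    names = ["bias"]
    # Main effects: one indicator per (aspect, variant).
    for aspect, variants in ASPECTS.items():
        names += [f"{aspect}={v}" for v in variants]
    # Interactions: one indicator per pair of variants from two different aspects.
    for a, b in combinations(ASPECTS, 2):
        names += [f"{a}={va}&{b}={vb}" for va in ASPECTS[a] for vb in ASPECTS[b]]
    return names

def encode(layout):
    """Turn a layout (dict of aspect -> variant) into a 0/1 feature vector."""
    active = {"bias"}
    active.update(f"{a}={v}" for a, v in layout.items())
    for a, b in combinations(ASPECTS, 2):
        active.add(f"{a}={layout[a]}&{b}={layout[b]}")
    return [1 if name in active else 0 for name in feature_names()]

layout = {"welcome_message": "short", "image_size": "large", "search_module_size": "small"}
x = encode(layout)
print(len(x), sum(x))  # number of features, number of active features
```

A model fitted on such vectors can learn that two variants work well (or badly) together, which independent tests on each aspect cannot see.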

To illustrate, let us take a simple example: say a team wants to maximise the click-through rate of a landing page with a series of design decisions over several aspects of the page like the welcome message, the image size and the search module size:

Example of aspects that can be changed on the page.

In this example, 3 aspects with 3 variants each yield 27 different layouts. To optimise this, the contextual bandit will start by randomising the different variants of the modules. However, as it ingests feedback from user interactions, it will start focusing the traffic on the variants showing more promise, automatically converging to better options and minimising the opportunity cost of showing ‘poor’ layouts. For more details, please read our technical article.
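The behaviour described above can be sketched with a small simulation. This is a simplified illustration rather than the algorithm from our technical article: it assumes Beta-Bernoulli Thompson sampling with one independent posterior per layout, and the variant names and click-through rates are made up.

```python
import random
from itertools import product

# Hypothetical variants for the three aspects of the landing page.
WELCOME = ["short", "medium", "long"]
IMAGE = ["small", "medium", "large"]
SEARCH = ["small", "medium", "large"]

LAYOUTS = list(product(WELCOME, IMAGE, SEARCH))  # 3 * 3 * 3 = 27 layouts

rng = random.Random(0)
# Simulated "true" click-through rates, unknown to the bandit (made-up numbers).
true_ctr = {layout: rng.uniform(0.02, 0.10) for layout in LAYOUTS}
best = ("medium", "large", "small")
true_ctr[best] = 0.30  # one clearly better layout so the effect is visible

# Beta(1, 1) prior per layout, stored as [clicks + 1, non-clicks + 1].
posterior = {layout: [1, 1] for layout in LAYOUTS}
shown = {layout: 0 for layout in LAYOUTS}

for _ in range(5000):
    # Thompson sampling: draw a plausible CTR for each layout, show the highest draw.
    layout = max(LAYOUTS, key=lambda l: rng.betavariate(*posterior[l]))
    clicked = rng.random() < true_ctr[layout]
    posterior[layout][0 if clicked else 1] += 1
    shown[layout] += 1

most_shown = max(shown, key=shown.get)
print(len(LAYOUTS), most_shown, shown[most_shown])
```

Early on, traffic is spread across all layouts; as clicks accumulate, most impressions go to the layout with the highest simulated click-through rate.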

Changing those 3 simple aspects can radically change the page’s focus and look.

This is the main strength of a bandit approach: it naturally balances exploring the different options (or arms) against exploiting the one that currently appears to be the best. This opens an avenue to explore much more complex ideas while using the traffic much more efficiently.

In general, bandit algorithms generate less ‘regret’ from displaying poor options.
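For readers unfamiliar with the term, ‘regret’ is usually defined as below. The notation is the standard textbook one and is an addition for illustration: the symbols are not taken from the references.

```latex
R_T = \sum_{t=1}^{T} \left( \mu^{*} - \mu_{a_t} \right),
\qquad \mu^{*} = \max_{a} \mu_{a}
```

Here \mu_a is the expected reward (for example, the click-through rate) of layout a, a_t is the layout shown to the t-th visitor, and T is the number of visitors. A fixed, even traffic split accumulates regret linearly in T whenever the layouts differ, whereas a well-behaved bandit algorithm keeps the growth sublinear by shifting traffic away from poor layouts.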

In such projects, we also want to cater for the diverse preferences of our users. One common way to de-average the experience is to bucket users by available information, which we will call context. There are several classic contextual dimensions in the industry: the user login status, the country (e.g. DE, FR, UK), and the channel used to reach the website, all of which are usually easily accessible. While this is only a first step towards pure personalisation for each specific user, contextual bandits can use those features to capture strong preferences among sub-groups of users, effectively adapting the experience to their needs.
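As a rough illustration of how context changes the picture, the sketch below keeps one posterior per (context, variant) pair, which is the simplest possible contextual bandit. The variant names and click-through rates are invented, and real systems typically share information across contexts through a model rather than treating each bucket separately.

```python
import random

rng = random.Random(1)
ARMS = ["large_image", "large_search"]   # hypothetical layout variants
CONTEXTS = ["DE", "FR"]                  # country as the only context feature

# Simulated click-through rates per (context, arm); made-up numbers.
true_ctr = {
    ("DE", "large_image"): 0.05, ("DE", "large_search"): 0.25,
    ("FR", "large_image"): 0.25, ("FR", "large_search"): 0.05,
}

# One Beta(1, 1) posterior per (context, arm): [clicks + 1, non-clicks + 1].
posterior = {key: [1, 1] for key in true_ctr}
shown = {key: 0 for key in true_ctr}

for _ in range(4000):
    ctx = rng.choice(CONTEXTS)  # a visitor arrives with some context
    # Thompson sampling restricted to the posteriors of this visitor's context.
    arm = max(ARMS, key=lambda a: rng.betavariate(*posterior[(ctx, a)]))
    clicked = rng.random() < true_ctr[(ctx, arm)]
    posterior[(ctx, arm)][0 if clicked else 1] += 1
    shown[(ctx, arm)] += 1

# The variant each context ended up seeing most often.
preferred = {c: max(ARMS, key=lambda a: shown[(c, a)]) for c in CONTEXTS}
print(preferred)
```

In this simulation the two countries end up with different preferred variants, which a single non-contextual bandit would average away.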

These types of approaches have gained popularity recently, and companies like Amazon, Microsoft, Google and Netflix [6–9] have implemented them for various use cases. At Expedia Group™, we have the ambition to make them a widely used tool to accelerate testing programmes and personalise the content. I expect an increasing number of companies to follow suit given these success stories.

The new way of thinking: Managing Optimised Layout Bundles

Now that it’s possible to efficiently combine different tests, product managers have the opportunity to converge their work streams into a single optimisation campaign over dimensions like country or user identification status.

Product managers combine their ideas to test in a campaign

Nevertheless, optimising for specific segments requires product managers to rethink how they approach testing the website. When several segments are considered, there is a high chance of obtaining more than one optimal layout. With an increasing number of aspects tested and customer segments, the number of optimal layouts can actually grow exponentially, quickly becoming intractable. From this point, one can no longer think of a page as a monolithic entity on which one can manually tweak and test small aspects in isolation.
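A quick back-of-the-envelope calculation shows how fast this grows. The context dimensions and their sizes below are hypothetical; only the 27 layouts come from the landing-page example.

```python
from math import prod

variants_per_aspect = [3, 3, 3]  # the landing-page example: 27 layouts
context_sizes = {"country": 5, "login_status": 2, "channel": 4}  # hypothetical

n_layouts = prod(variants_per_aspect)      # 3 * 3 * 3 = 27
n_segments = prod(context_sizes.values())  # 5 * 2 * 4 = 40
n_bundles = n_layouts ** n_segments        # one layout choice per segment

print(n_layouts, n_segments, len(str(n_bundles)))  # last value: digits in n_bundles
```

With these numbers there are 40 segments that may each have their own optimal layout, and the number of distinct ways to assign a layout to every segment is a 58-digit number, far beyond what anyone could review by hand.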

With contextual multi-armed bandits, product managers have to operate with complex entities with numerous possibilities, which we call optimised layout bundles. This is a much more macro approach, as it is not realistic to know every optimum within the bundle. For instance, if the day of the week is one of the contextual dimensions, the optimal layout can vary from one day to another. From there, only a general understanding of the content featured on a page and how often it is shown remains practical.

When adopting contextual bandits, product managers have to give up control over smaller-scale decision making, which some might consider an issue. However, given the number of decisions to make, these are much better controlled algorithmically: instead of managing one hypothesis in one segment at a time, one manages a set of experiments, all run at the same time and optimised automatically. This new layer of abstraction shifts the focus away from small-scale tactical considerations and toward much more strategic tasks like product design and user experience research, which fosters bolder products and ideas.

The deliverable of the design teams remains the same, but it has to be coordinated to run as one optimisation campaign. This does sacrifice some flexibility when scheduling experimentation; in my opinion, however, the benefits of testing more ideas in a coordinated manner outweigh the losses.

One additional benefit: product managers no longer need to limit themselves to a few preselected variants. The bandits’ balancing of exploration and exploitation allows a wider scope of ideas to be tested over the same amount of traffic, including those that do not look promising at first glance. This also removes the fear of negative interactions between tests, as poorly performing combinations are quickly picked up and deactivated. This can enable product managers to test bolder ideas with a greatly mitigated risk for the business.

Running an optimisation campaign — A/B Testing is here to stay

We are still missing one piece: how do we assess the result of a campaign against the current state of the page? While there is some research in the field of off-policy evaluation, the venerable A/B test remains the most robust and reliable way to ensure that a change on a webpage did improve the performance of a website. In this setting, however, the test will assess one optimised layout bundle against another instead of assessing isolated changes. As shown in our example above, combinations of changes have a much greater impact on the user experience and yield more clear-cut results.
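As an illustration of what such a bundle-versus-bundle comparison could look like, here is a minimal two-proportion z-test. It assumes the KPI is a simple conversion rate and uses made-up counts; the function name is hypothetical, and a real experimentation platform would handle more than this sketch does.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up counts: control bundle (a) versus candidate bundle (b).
z, p = two_proportion_z_test(conv_a=1000, n_a=10000, conv_b=1100, n_b=10000)
print(round(z, 2), round(p, 3))
```

The unit being compared is the whole bundle: every visitor in the candidate group sees whichever layout the bundle assigns to their context.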

Running an optimisation campaign, then testing the new optimised layout bundles against the current one.

The flow will be as follows:

The page at any point in time has a control optimised layout bundle (a single static layout on a webpage can be seen as the simplest possible bundle).
The product managers create a set of variants over several aspects that they want to optimise and define a set of contextual dimensions. Together, these form a new optimisation campaign. For instance, the product manager wants to test different header colours and different image sizes per point of sale.
This new optimisation campaign is run using the contextual bandit algorithm described above, yielding a new optimised layout bundle. For instance, blue and small is optimal for French customers while purple and large is optimal for Spanish customers.
The product manager will then run an A/B test between the control bundle and the candidate bundle, assessing the impact and rolling out the winner given the target KPI.
Rinse, repeat.
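One way to picture an optimised layout bundle is as a mapping from context to layout. The sketch below reuses the colours, sizes and points of sale from the example in the flow; the data structure, the default layout and the function name are illustrative assumptions.

```python
# Layout served when a bundle has no specific entry for a context (assumed default).
DEFAULT_LAYOUT = {"header_colour": "blue", "image_size": "small"}

# A static page is the simplest bundle: every context gets the default layout.
control_bundle = {}

# Candidate bundle produced by an optimisation campaign, keyed by point of sale.
candidate_bundle = {
    "FR": {"header_colour": "blue", "image_size": "small"},
    "ES": {"header_colour": "purple", "image_size": "large"},
}

def layout_for(bundle, point_of_sale):
    """Return the layout a bundle serves for a given point of sale."""
    return bundle.get(point_of_sale, DEFAULT_LAYOUT)

print(layout_for(control_bundle, "ES"))
print(layout_for(candidate_bundle, "ES"))
print(layout_for(candidate_bundle, "DE"))  # not optimised yet, falls back to default
```

The A/B test in the fourth step then compares the two mappings as wholes, and the winner becomes the control bundle for the next campaign.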

It is possible with this method to change both the aspects tested and the dimensions of the context from one campaign to another, which offers new avenues to create relevant and personalised content. Please note that the sequential nature remains, as it helps coordinate the work of the team.

Final Thoughts

We are very excited by the recent results we have obtained with contextual bandits and hope this entry has given some food for thought regarding how this new capability can shift the approach to web optimisation and enable teams to experiment with more new and bold ideas. The current A/B testing approaches are still robust and reliable, and should be kept for parts of the websites not ready for contextual bandit projects. Mature pages and teams, however, can greatly benefit from the accelerated decision making and the personalisation of the content of their page or app. Stay tuned for more on some of our campaigns and infrastructure!

References

[1] Agrawal, S. and Goyal, N., 2013, February. Thompson sampling for contextual bandits with linear payoffs. In International Conference on Machine Learning (pp. 127–135).

[2] Hill, D.N., Nassif, H., Liu, Y., Iyer, A. and Vishwanathan, S.V.N., 2017, August. An efficient bandit algorithm for realtime multivariate optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1813–1821).

[3] Chapelle, O., Manavoglu, E. and Rosales, R., 2014. Simple and scalable response prediction for display advertising. ACM Transactions on Intelligent Systems and Technology (TIST), 5(4), pp.1–34.

[4] Li, L., Chu, W., Langford, J. and Schapire, R.E., 2010, April. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (pp. 661–670).

[5] https://github.com/VowpalWabbit/vowpal_wabbit/wiki

[6] https://cloud.google.com/blog/products/ai-machine-learning/how-to-build-better-contextual-bandits-machine-learning-models

[7] https://aws.amazon.com/blogs/machine-learning/power-contextual-bandits-using-continual-learning-with-amazon-sagemaker-rl/

[8] https://www.microsoft.com/en-us/research/blog/contextual-bandit-breakthrough-enables-deeper-personalization/

[9] https://www.reforge.com/brief/netflix-artwork-personalization-through-multi-armed-bandit-testing#jZipCPY2xrisuc7qg9pGDA

Icons by Adriano Maringolo: https://www.iconfinder.com/adriano.gomes
