Behind the buzzwords: how we build ML products at Booking.com

Published in Booking Product · 8 min read · Apr 11, 2019

In our day-to-day work, we often encounter buzzwords like Machine Learning, Artificial Intelligence, Personalization (sometimes referred to as P13N), MVP, Agile, A/B Testing, HIPPO (highest paid person’s opinion) and many more. It can happen during a business meeting, while reading an article, mingling at conferences, interviewing for our next dream job, and so on.

In this post, I’ll go over some of these buzzwords and try to make them more accessible by showing how they play out, in practice, in our ML product development process.

Machine Learning, Artificial Intelligence, Data Science and Personalization

Machine Learning (ML) is an application of artificial intelligence (AI) in which systems can learn from data, identify patterns and make predictions with minimal human intervention. An ML algorithm can improve its performance on a task over time when exposed to more data. This gives an advantage to big-data environments such as big e-commerce sites, popular gaming applications, etc.

Looking at a typical search results page on Booking.com’s website, we can identify many areas in which ML is involved: recommendation blocks such as “Customers who viewed Hotel Victorie also viewed”, content selection such as the main image of the hotel, availability forecasting, which is surfaced to users in messages like “We’ll likely sell out of rooms at this hotel within the next 13 hours”, and hotel ranking/ordering.

Booking.com search results page (not the latest version which currently exists on the website)

In most of the above features, we can think of segments that will not necessarily be interested in the same content. Let’s look at an example of this.

“3 Reasons to visit”: this feature extracts the main incentives to visit a certain location from user reviews, using NLP (natural language processing) techniques. Now, imagine that your next trip to Amsterdam is for business reasons. It is unlikely that you would be interested in the “red light district” and “coffee shops” as the two main reasons to visit Amsterdam.

In other words, we can use ML to create a better fit between the user’s current needs and the content presented to them.

How can we use ML as an enabler for personalization?

At Booking.com we have two main approaches for personalization: “most cases” personalization and 1:1 personalization. In “most cases” personalization, we develop and optimize the product for a segment/cluster of users (a.k.a. “persona”) with similar characteristics or preferences. In 1:1 personalization (a.k.a. “hyper-personalization”) we use the user’s specific behavior (searches, clicks, bookings, etc.) in order to tailor the experience for them.

1:1 personalization is very relevant in the Booking.com domain, as the user may wear different travel hats each time he walks through our virtual door: one time he may be the husband who’s looking for a romantic vacation for his partner and himself, another time the businessman who is looking to book a hotel for his next business trip, and at other times the friend who is looking to book a ski vacation with his pals. In addition, users may have different preferences while wearing those different travel hats: for example, a higher budget for a romantic vacation, or different facility preferences depending on the trip (e.g. pool, pet-friendly, spa, fitness center).

Our main challenge in this scenario is cracking the “real-time context” of the users in order to provide a relevant personalized experience for them.

A good example of hyper-personalization is the Google match score that is integrated into Google Maps. The match score represents the fit level between a user and a restaurant based on the user’s dining preferences, past visits, past ratings, saved lists and more.

Google Match Score in Google Maps as an example of hyper-personalization

Having said that, there is a thin line between being cool and relevant and being creepy and making users feel uncomfortable (tracking users’ locations, eavesdropping on the user, etc.). One famous example belongs to the large American retailer Target, which identified patterns of pregnant customers and sent coupons with discounts on maternity products to their homes. One of their coupons found its way to a teenager’s mailbox, and her father, who didn’t know about her pregnancy until that point, was less than happy with Target’s super-intelligent personalization.

So how do we create personalization in a more sensitive way?

The answer to this question involves, of course, users.

User Research, Customer Centric, A/B Culture and Data Driven

Across the wide range of personalization studies we’ve conducted, there are two overarching factors that most users raise:

Users express concern about personalization infringing on their sense of autonomy.

“Booking a trip is like playing Tetris; making all the pieces fit together. I tend not to go to sites where you press one button and the entire package lands in your lap.” (Booking.com user research, 2018).

Including users’ input may be a critical ingredient in taking personalization to the next level.

“Most travel sites make recommendations for us based on our browsing history. I’d happily input 10 or so destinations that I would consider traveling to and welcome offers to be sent to me based on it.” (Booking.com user research, 2018).

When thinking of explicit feedback, we may fall into the trap of creating long forms or questionnaires for users to fill out. This should definitely not be the case. A thumbs up or down is a good example of how Netflix collects user feedback on their products (movies, series, etc.) without asking too much from their customers. Another example, from the Booking.com domain, is the “x” button in the image below (top right corner of each card), which is basically the user’s way of saying “This recommendation is not relevant for me”.

Explicit feedback in Booking.com main page (“x” button in each card)

As a side note, we use a variety of research tools to get users’ opinions and feedback. These are used across all phases of the product’s cycle, from exploration and ideation to testing prototypes and optimizing existing products.

Tools for collecting user feedback across the product’s cycle

These tools can be clustered into four groups, as presented below: data analytics, contextual inquiry/investigation, surveys, and interviews.

Tools for collecting user feedback, represented in clusters

One of the most common ways of collecting users’ feedback at Booking.com is A/B testing. This method is an integral part of the Booking.com culture and removes the hierarchical nature of decision making by creating a focus on data-driven decisions. Remember the HIPPOs from the beginning of this post? A/B testing removes the HIPPO (highest paid person’s opinion) from the process, and instead lets decisions be made by our users! Beyond contributing to our culture, A/B testing also contributes to our business: conversion rates 2–3x higher than the industry average (at least according to Evercore Equity Research).

Which one of the variations was able to increase the number of users that added a review after their stay?

Example of an A/B test on Booking.com’s notifications screen

The winner is variant B!
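How might a “winner” like this be called? The post does not describe the statistics behind Booking.com’s experiments, so the following is only a minimal sketch, assuming a standard two-proportion z-test on conversion counts. The function name and all the numbers are hypothetical and exist purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate
    between variant A (base) and variant B, using a pooled standard error."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: users who left a review out of users who saw each variant.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z with a small p-value would suggest that variant B really does convert better than A, rather than the gap being noise. In practice, the sample size and the significance threshold should be fixed before the experiment starts.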

Although A/B testing is great, we all need to remember it is just a tool. In order to learn fast, fail fast and iterate we must have the right processes and methods to support it.

Agile and MVP

Working in an agile way in an environment that involves research and machine learning can sometimes be challenging, since agile is all about quick iterations, small steps, short sprints, breaking a complex task into smaller tasks, etc. This is what we call the Data Science dilemma.

For Backend/Frontend developers or UX designers it is usually straightforward to break a task into smaller tasks, while for a data scientist it is a bit more complicated. When data scientists get a task, they don’t necessarily know how complex it is, how much research has been done in the area, how accurate the model will be in the first iteration, or how many optimization iterations will be needed. How can we tackle this anyway? I’ll explain by using an example.

Different images of the same hotel on the Booking.com website

The above images are in fact three images of the same hotel. This implies that the same hotel can attract different travelers by using different images. The left image, for example, may be relevant for a backpacker who is looking for a decent place to spend the night in. The pool image might be more attractive for couples on a romantic vacation, while the image to the right may be relevant for a family with one or two kids.

Assuming we received a signal from user research that property images are an important factor in the booking decision, we can build our hypothesis based on that and start making a plan to go forward.

Before jumping into complex computer vision/image classification/object detection models, we can start our first iteration by randomizing the main images and seeing whether it “moves the needle” (or the conversion metric, in our case). Assuming it does, we could conduct a second iteration on a specific segment, for example business bookers, who probably care more about having a desk in their room. As such, we would surface images that include a desk as the main image for those travelers. Depending on the success of this, we could continue with other segments as well. This is an example of how we can break a complex ML task into smaller tasks.

This is the MVP (minimum viable product): we can define the minimum, but our users, through user research and A/B testing, will define what is viable for them.
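To make the two iterations concrete, here is a rough sketch of what they could look like in code. It is not Booking.com’s implementation: the function names, the experiment key, the “business” segment label and the “desk” image tag are all assumptions made for illustration. Iteration 1 assigns each user a pseudo-random but stable main image by hashing, and iteration 2 narrows the candidates for one segment.

```python
import hashlib


def random_main_image(user_id, hotel_id, image_ids, experiment="main_image_v1"):
    """Iteration 1: pick a pseudo-random main image that stays the same
    for a given user, hotel and experiment (no model involved)."""
    key = f"{experiment}:{user_id}:{hotel_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(image_ids)
    return image_ids[bucket]


def main_image_for_segment(user_id, hotel_id, images, segment):
    """Iteration 2: for business bookers, prefer images tagged 'desk'
    (hypothetical tag); everyone else keeps the iteration 1 behavior.

    `images` maps image id -> set of tags."""
    if segment == "business":
        desk_images = sorted(i for i, tags in images.items() if "desk" in tags)
        if desk_images:
            return random_main_image(user_id, hotel_id, desk_images)
    return random_main_image(user_id, hotel_id, sorted(images))


# Hypothetical hotel with three candidate main images.
images = {"facade": set(), "pool": {"pool"}, "room_desk": {"desk"}}
print(main_image_for_segment("user-42", "hotel-7", images, "business"))
print(main_image_for_segment("user-42", "hotel-7", images, "leisure"))
```

Hashing instead of calling a random generator on every page view keeps the experience consistent for a returning user, which also keeps the experiment’s measurements clean. Each iteration is small enough to ship and A/B test on its own.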

To sum up, the buzzwords we hear in our day-to-day work have a real effect on the way we build ML products at Booking.com. We use ML as an enabler for personalization but try not to cross the creepiness line. We do that by talking to our users and understanding their behaviors and opinions. Moreover, we test, test and test, which enables us to take a HIPPO-free, data-driven approach. We take small steps, even in a complex research environment.

Did you get through all this and are still wondering why P13N means personalization? Well, it starts with a “P”, ends with an “N”, and contains 13 letters in between :)
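For the curious, the same rule fits in a few lines of code. This is just a playful sketch, and the function name is made up for this example.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) <= 3:
        return word  # too short to be worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("personalization").upper())  # P13N
```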

Written by Hadas Harush