Philosophers & Telescopes
Even without owning any accommodations, Booking.com hardly needs an introduction as one of the world’s largest travel companies. By effectively combining data science with a customer focus, Booking.com has become a market leader. Science, in general, is a key driver of its success and highly valued within the company. One of the key players within this scientific environment is Lukas Vermeer. With an academic background in computing science and machine learning, Lukas is a Senior Data Scientist at Booking.com and responsible for controlled experiments. Jasper Lanters and I interviewed him at the company’s headquarters, idyllically located on a canal in Amsterdam. He enthusiastically told us about experiments at Booking.com and the future of data science: philosophers and telescope builders.
Purpose & Pitfalls of A/B Testing
As a company that highly values science, how do you conduct experiments?
We use experiments in product development to enhance the customer experience in two ways: to prevent the customer experience from unexpectedly worsening, and to learn what customers actually want. Preventing degradation is the easier challenge. There are certain customer segments that are relatively small and vulnerable, and they are easily overlooked while developing new features for our customers. For example, people in countries where text is read from right to left, or people who use screen reader software. It could very well be that a new feature on the website does not work properly for these groups. These are typical ‘edge cases’. Whenever we roll out a change, we use A/B testing to find out its effects. Finding large negative effects is easier and requires smaller sample sizes than detecting minor optimizations to the customer experience, so I spend most of my time on the latter.
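To make the point about sample sizes concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two conversion rates. It is an illustration only, not Booking.com's actual tooling; the function name, the 5% baseline conversion rate and the chosen effect sizes are all assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist


def samples_per_variant(base_rate, relative_change, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect a relative change
    in a conversion rate with a two-sided test (normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_change)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: a 5% baseline, a 20% drop versus a 1% improvement.
    for change in (-0.20, 0.01):
        print(f"{change:+.0%} change: {samples_per_variant(0.05, change)} per variant")
```

Under these assumptions, detecting the small improvement requires several hundred times more visitors than detecting the large drop, which is why catching serious degradation is the easier of the two tasks.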
“When problems are effectively solved, customers will come to you and KPIs will go up. Not the other way around.”
Of course we do not rely only on controlled experiments; we use both quantitative and qualitative methods. Booking has its own lab where qualitative research is performed. There we invite people to have a coffee and ask them to book a place to stay. We observe what they do, listen to them, ask questions. Ideas we get from these qualitative studies (for example, people might say they appreciate a good breakfast) turn into features on our website, which we then validate through A/B testing. This is for us the second goal of A/B testing: validating whether a solution helps users solve a certain problem or not. This validation and solution focus is very important and is often forgotten. When you look at companies that do a lot of A/B testing, the continuous threat is that they end up trying to optimize for the KPI, rather than solve actual problems for users. In my view, the goal of A/B testing is to validate whether a hypothesised problem is effectively solved. And that’s it! When problems are effectively solved, customers will come to you and KPIs will go up. Not the other way around.
How does data science relate to this?
It helps in understanding what people want, and in creating features and finding recommendations that help them find what they need. Many people don’t know exactly what they are looking for. Booking.com has more accommodation options than any other company, so in theory there should be something for everyone. The challenge is to help people find the place that’s just right for them, each and every time.
When it comes to recommendations, sometimes we recommend options that may seem irrelevant to the exact user query, for example alternative dates for a specific accommodation. People who don’t know what they want should be offered a window of exploration so they are given the opportunity to broaden their search, not narrow it down too much.
Furthermore, some people are flexible, others are not. Data science can also help in predicting which people are flexible and which are not. Can we predict which dates are interesting? For which people? This is very important, because the most annoying thing to see when you are looking for a specific week only is a 50% discount for the following week.
What kind of data do you use as model input?
We hardly have any user profile data available, which I think is one of the big misperceptions. When you land on our website, we only get your cookie, your IP address and we know which browser you are using. That’s about it. We do know, however, what you do. That is one of the reasons we have to show you so many irrelevant things, especially in the beginning. When we don’t know anything about a customer, we should assume that they could fit all profiles. We show signals of different types and interpret their reactions. This is very challenging, because ground truth is very hard to obtain. Is this customer really flexible? Or is he just reacting randomly? It all relies heavily on unsupervised learning.
“We hardly have any user profile data available, which I think is one of the big misperceptions.”
Life at Booking.com
And now for something completely different: do you travel a lot yourself?
I used to before I had kids haha! Now I have three little girls and exploring the world is a lot harder. One of the nice things about Booking.com is that we have more than 100 nationalities here in the office, so I still meet people from around the world without having to travel myself! This cultural diversity is actually very important to us. For a travel company serving people from all over the world it is important to try to understand our product from as many perspectives as possible. Diversity gives us strength.
What does your organisation look like?
Booking.com is organised a little differently from most companies and that was initially one of the reasons for me to want to work for Booking.com. In my consultancy years I saw many companies with totally isolated departments structured around internal responsibilities: IT, marketing, sales, etc. Here departments are structured around customer experiences and products instead. We work with multidisciplinary teams that are responsible for a specific part of our product and are made as empowered and self-sufficient as possible. For example, a team of 8 people, including developers, designers, copywriters and a product owner, might be responsible for a tool that helps users manage their reservations. They do their own research, build their own features, and make their own decisions about their product without much interference from leadership.
How is knowledge shared across the company?
Although we have internal social networks, the best way of knowledge sharing is still face to face, to get a cup of coffee and talk about stuff. Because people are almost always close to the people they need, communication can flow organically. We also rotate people and change teams a lot in response to what users need, so people get to know a lot of colleagues and different parts of our organisation very fast.
If you also like coffee and desire a more intimate view of life at Booking.com, have a look!
Philosophers, Telescopes and the Future of Data Science
How will data science change in the future?
I think, and this is probably not what people want to hear, that the prestigious part of analytics (modelling, implementing neural networks, logistic regression or whatever you have) is the easiest to automate and therefore will be automated first. The layers just outside the black box are harder to automate, so things like data collection, feature engineering, output modelling, decision making and causal inference are the skills that will remain crucial. Every model can only be as good as what you put into it and the way it returns something to you. I often refer to people working on these outer layers as “philosophers” and “telescope builders”. Philosophers think about the entire system and the world within which it operates, while telescope builders focus on the more technical aspects of accurately measuring the world around them. In the end, I think these jobs will remain.
“I think that the prestigious part of analytics (modelling, implementing neural networks, or whatever) is the easiest to automate and therefore will be automated first.”
So are you a philosopher or a telescope builder?
I think I am both, but I like philosophizing the most. Thinking about concepts; the system. I actually started studying computing science because I was good at biology. In the high school exam there was this question: a patient has proteins in his urine, what could be wrong? I loved to solve these types of problems and I think this is very similar to debugging code. The software yields an error, what caused it? There is this huge complex system like a human body, in which many parts communicate with each other and one way or another we don’t get the desired output. Solving those puzzles is what I really love.
In the inspiring talk below, Lukas elaborates on these two profiles, sauerkraut and much more.
Studying Data Science, Raspberries and Cats
What do you think data science studies should focus on?
I actually said it already: on raising philosophers and telescope builders. From the outside I see the focus is often on applying certain models or technologies, number crunching, prediction, Hadoop etc. I would like to see more focus on evaluating the reliability of measurements and inferences in the first place. Asking questions like: How can I engineer my features in such a way that they are perfect input for the machine? How can I use the output to make the best decisions? What are the internal and external threats to validity? Don’t only focus on the analysis itself, also involve what comes before and after. Data science is not only about the analysis or the model, but also very much about the context in which they are used.
You can practise this yourself! Just buy a Raspberry Pi and a USB thermometer and go measure the temperature in your house. Write a little script that records a measurement every minute and you’ll soon have a data set. Let’s say you want to predict the temperature in your home based on the temperature outside using a simple predictive model. Can you come up with ways in which this can go wrong? What are the threats to the model and the data that you are using? There can be a power failure, your cat sat on the thermometer, your mother-in-law came by and turned up the heating. Lots of unexpected stuff can happen. A simple challenge like this can help you practise anticipating the possible flaws in your data, and learn how you can still make accurate predictions, or at least know when you can’t.
I think that, when you’re tackling a problem like this, the least interesting part is what specific model you are going to use for the prediction. Whether it is a neural network, linear regression, or whatever. That is the easiest part! The hard part is: my cat has been sitting on the thermometer and I don’t know exactly when! How am I going to clean my data so that it is suitable for accurate predictions? Maybe you’ll even find a pattern and it turns out the cat sits there every morning at 7AM, then you can include your cat in the model, haha! But no jokes, guys, this is what data science is all about. These are eventually problems you will be facing in the real world as well.
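As a rough illustration of what that cleaning step could look like, the sketch below flags two of the problems mentioned: gaps in the log (a power failure) and readings that jump far away from recent trusted values (a cat on the thermometer). The function name, the thresholds and the data layout are assumptions made for this example, and the approach is only one simple option among many.

```python
from datetime import datetime, timedelta
from statistics import median


def flag_suspect_readings(readings, max_gap=timedelta(minutes=5),
                          window=5, max_jump=3.0):
    """Inspect (timestamp, celsius) pairs sorted by time.

    Returns (gaps, outliers):
      gaps     - pairs of consecutive timestamps further apart than max_gap
      outliers - readings more than max_jump degrees away from the median
                 of the last `window` readings that were accepted as normal
    Assumes the very first reading can be trusted.
    """
    gaps, outliers = [], []
    accepted = [readings[0][1]] if readings else []
    for (prev_time, _), (time, temp) in zip(readings, readings[1:]):
        if time - prev_time > max_gap:
            gaps.append((prev_time, time))
        baseline = median(accepted[-window:])
        if abs(temp - baseline) > max_jump:
            outliers.append((time, temp))
        else:
            accepted.append(temp)  # only trusted readings update the baseline
    return gaps, outliers


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 7, 0)
    log = [(start + timedelta(minutes=m), 20.0) for m in range(10)]
    log += [(start + timedelta(minutes=m), 31.0) for m in (10, 11, 12)]  # the cat
    log += [(start + timedelta(minutes=30), 20.5)]                       # after an outage
    print(flag_suspect_readings(log))
```

One limitation of this design is that a genuine, sudden change in temperature would be flagged for as long as it lasts, so every flag still needs a human to ask what actually happened.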
“What you need is a data scientist capable of sanity checking. Someone who sees data from 2024 and thinks: ‘But it is not 2024 yet!’”
The cat is a silly example of course, but there are many similarities with the things we are struggling with in practice. Do you know these mobile games where a player is supposed to wait another day to get more coins? Do you know what people often do? They simply set their phone date a day forward, because then they won’t have to wait for an entire day! And they do it again... and again. And then they open the Booking app. So when we retrieve the data, the app says: congratulations, this person has clicked on your link, tomorrow. Well, how accurate do you think my model is going to be if I use this data? Even the most intelligent models are not going to find this sort of thing! What you need is a data scientist capable of sanity checking. Someone who knows not to build a model without checking the data first. Someone who sees data from 2024 and thinks: “But it is not 2024 yet!” Those people are the people we need.
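A sanity check of this kind can be very small. The sketch below separates events whose device clock claims a time later than the moment the server received them. The event layout, the field name `client_time` and the five-minute tolerance are hypothetical choices for the example, not a description of how Booking.com handles this.

```python
from datetime import datetime, timedelta, timezone


def split_by_plausibility(events, received_at, tolerance=timedelta(minutes=5)):
    """Split events into (plausible, suspect) lists.

    An event is suspect when its device timestamp lies further in the
    future than `tolerance` allows, relative to when the server got it.
    """
    plausible, suspect = [], []
    for event in events:
        if event["client_time"] > received_at + tolerance:
            suspect.append(event)
        else:
            plausible.append(event)
    return plausible, suspect


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
    clicks = [
        {"id": 1, "client_time": now - timedelta(minutes=1)},
        {"id": 2, "client_time": now + timedelta(days=1)},  # phone set a day ahead
    ]
    ok, odd = split_by_plausibility(clicks, received_at=now)
    print([e["id"] for e in ok], [e["id"] for e in odd])
```

The small tolerance allows for ordinary clock drift; what to do with the suspect events (drop them, correct them, or study them) is exactly the kind of decision a model cannot make for you.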
After being invited to lunch and discussing Lukas’ prior ambition to watch the entire IMDb top 250, Jasper and I were left in doubt, though inspired. While somewhat ashamedly resetting my phone to the actual date, we asked ourselves whether we wanted to be philosophers or telescope builders. Although I know my nature, there might just be valuable combinations of the two profiles. One of them, for sure, is Lukas Vermeer.