Improving Deep Learning for Ranking Stays at Airbnb

by Malay Haldar & Moose Abdool
Oct 6 · 6 min read


Search ranking is at the heart of Airbnb. Data from search logs* indicate that more than 90% of guests use search to book a place to stay. In ranking, we want the search results (referred to as listings) to be sorted by guest preference, a task for which we train a deep neural network (DNN).

The DNN infers guest preference by looking at past search results and the outcome associated with each listing that was shown: for example, booked listings are considered preferred over unbooked ones. Changes to the DNN are therefore graded by the resulting change in booking volume.

Previously, we’ve focused on how to effectively apply DNNs to this process of learning guest preference. But this process of learning makes a leap — it assumes future guest preferences can be learned from past observations.

In this article, we go beyond the basic DNN building setup and examine this assumption in closer detail. In the process, we describe ways in which the simple learning to rank framework falls short and how we address some of these challenges. Our solution is what we refer to as the A, B, C, D of search ranking:

  • Architecture: Can we structure the DNN in a way that allows us to better represent guest preference?
  • Bias: Can we eliminate some of the systematic biases that exist in the past data?
  • Cold start: Can we correct for the disadvantage new listings face given they lack historical data?
  • Diversity of search results: Can we prevent the majority preference in past data from overwhelming the results of the future?


Architecture

Our earlier attempts leaned on the intuition that, all else being equal, guests prefer cheaper listings. When that cheaper-is-better intuition failed to hold up, we discarded it, realizing what we really needed was an architecture to predict the ideal listing for the trip. The architecture, shown below in Figure 1, has two towers. One tower, fed by query and user features, predicts the ideal listing for the trip; the second transforms raw listing features into a vector in the same space. During training, the towers are optimized so that booked listings move closer to the ideal listing, while unbooked listings are pushed away from it. When tested online in a controlled A/B experiment, this architecture increased bookings by +0.6%.
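The two-tower idea can be sketched as follows. This is a minimal illustration, not Airbnb's production model: the tower shapes, feature dimensions, and the hinge loss with a margin of 1.0 are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, w1, w2):
    """A tiny two-layer MLP mapping raw features into a shared embedding space."""
    return np.tanh(np.tanh(x @ w1) @ w2)

# Illustrative dimensions: 8 query/user features, 6 listing features, 4-dim embedding.
wq1, wq2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
wl1, wl2 = rng.normal(size=(6, 16)), rng.normal(size=(16, 4))

query = rng.normal(size=(1, 8))       # query + user features
booked = rng.normal(size=(1, 6))      # features of a listing the guest booked
unbooked = rng.normal(size=(1, 6))    # features of a listing shown but not booked

ideal = tower(query, wq1, wq2)        # the "ideal listing" for this trip
e_booked = tower(booked, wl1, wl2)
e_unbooked = tower(unbooked, wl1, wl2)

# Training would minimize a pairwise loss that pulls the booked listing
# toward the ideal point and pushes the unbooked one away, e.g. a hinge loss:
d_pos = np.linalg.norm(ideal - e_booked)
d_neg = np.linalg.norm(ideal - e_unbooked)
loss = max(0.0, d_pos - d_neg + 1.0)  # margin of 1.0, chosen for illustration
```

At serving time, only distances between the query tower's output and each listing's embedding are needed, which is what makes this structure attractive for ranking.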

Figure 1. Query and listing tower architecture


Bias

A listing's chances of being clicked and booked depend not only on its quality but also on where it appears in the search results, a positional bias visible in the click-through rates of Figure 2 below.

Figure 2. Click-through rates by position in search results

To address this bias, we add position as a feature in the DNN. To avoid over-reliance on the position feature, we introduce it with dropout: during training, we randomly set the position feature to 0 for 15% of the examples. This lets the DNN learn the influence of both the position and the quality of the listing on a user's booking decision. When ranking listings for future users, we then set the input position to 0 for every listing, effectively leveling the playing field. Correcting for positional bias led to an increase of +0.7% in bookings in an online A/B test.
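The position-dropout trick above can be sketched in a few lines. The function name and the fixed 15% rate follow the description in the text; everything else is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def position_feature(positions, training, dropout_rate=0.15):
    """Return the position feature fed to the DNN.

    During training, each example's position is zeroed out with
    probability `dropout_rate` (15% in the article). At serving time
    the feature is always 0, so every listing is scored as if it were
    shown at the top of the results.
    """
    positions = np.asarray(positions, dtype=float)
    if not training:
        return np.zeros_like(positions)
    keep = rng.random(positions.shape) >= dropout_rate
    return positions * keep

train_feat = position_feature([1, 2, 3, 4, 5], training=True)
serve_feat = position_feature([1, 2, 3, 4, 5], training=False)
```

Because the model sometimes sees position 0 during training, scoring with position 0 at serving time stays within the distribution the DNN was trained on.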

Cold Start

New listings lack the historical engagement data, such as views and bookings, that the DNN relies on, putting them at a disadvantage. To address this cold start issue, we developed a more accurate way of estimating the engagement of a new listing rather than simply using a global default value for all new listings. The method finds similar listings, as measured by geographic location and guest capacity, and aggregates their engagement data to estimate how the new listing is likely to perform. These more accurate estimates resulted in a +14% increase in bookings for new listings and a +0.4% increase in overall bookings in a controlled, online A/B test.
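A minimal sketch of this estimation, assuming a simple similarity rule (same capacity, within a small latitude/longitude box) and hypothetical engagement records; Airbnb's actual similarity measure and aggregation are not spelled out in the text.

```python
from statistics import mean

# Hypothetical engagement records for existing listings:
# (latitude, longitude, guest capacity, historical bookings-per-view rate).
listings = [
    (37.77, -122.42, 2, 0.030),
    (37.78, -122.41, 2, 0.026),
    (37.76, -122.43, 4, 0.018),
    (40.71,  -74.01, 2, 0.041),
]

def estimate_engagement(lat, lon, capacity, listings, radius=0.05):
    """Estimate a new listing's engagement from geographically close
    listings with the same capacity, falling back to a global default."""
    similar = [
        rate
        for (la, lo, cap, rate) in listings
        if abs(la - lat) <= radius and abs(lo - lon) <= radius and cap == capacity
    ]
    if similar:
        return mean(similar)
    # Global default: the average over all listings.
    return mean(rate for (_, _, _, rate) in listings)

# A new 2-person listing in San Francisco borrows from its two nearby peers.
est = estimate_engagement(37.775, -122.415, 2, listings)
```

The key point is the fallback structure: a local, like-for-like average where enough similar listings exist, and a global default otherwise.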

Diversity of Search Results

Ranking each listing independently lets the majority preference in past data crowd out other options, filling the top results with very similar listings. Our solution was a novel deep learning architecture built on a recurrent neural network (RNN), which consumes the entire result sequence to generate an embedding of the query context. This Query Context Embedding is then used to re-rank the input listings in light of information about the whole result set. For example, the model can now learn local patterns and uprank a listing when it is one of the only listings available in a popular area for that search request. The architecture for generating the Query Context Embedding is shown in Figure 3 below.

Figure 3. RNN for search result context
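The context-then-re-rank flow can be sketched as below. The plain RNN cell, the linear interaction between context and listing, and the mixing weight `alpha` are illustrative stand-ins for the actual architecture in Figure 3.

```python
import numpy as np

rng = np.random.default_rng(3)

def query_context_embedding(result_vectors, wx, wh):
    """Run a simple RNN over the whole result set; the final hidden
    state summarizes the query context."""
    h = np.zeros(wh.shape[0])
    for x in result_vectors:
        h = np.tanh(wx @ x + wh @ h)
    return h

def rerank(result_vectors, base_scores, context, w_ctx, alpha=0.5):
    """Adjust each listing's base score by how it relates to the
    whole-result-set context, then sort by the adjusted score."""
    adjusted = [s + alpha * float(context @ (w_ctx @ x))
                for s, x in zip(base_scores, result_vectors)]
    return np.argsort(adjusted)[::-1]  # best-first ordering

dim, hidden, n = 5, 4, 6
results = [rng.normal(size=dim) for _ in range(n)]   # listing embeddings
base_scores = rng.normal(size=n)                     # independent DNN scores
wx = rng.normal(size=(hidden, dim))
wh = rng.normal(size=(hidden, hidden))
w_ctx = rng.normal(size=(hidden, dim))

ctx = query_context_embedding(results, wx, wh)
order = rerank(results, base_scores, ctx, w_ctx)
```

The important structural idea is the two-pass shape: a first pass scores listings independently, and a second pass re-scores each listing conditioned on a summary of the entire result set.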

Overall, we found this led to an increase in the diversity of our search results, along with a +0.4% global booking gain in an online A/B test.

The techniques described above enabled us to go beyond the basic deep learning setup, and they continue to serve all searches on Airbnb. That being said, this article touches on just a handful of the considerations that go into how our DNN works. Ultimately, we consider over 200 signals in determining search ranking. As we look into further improvements, a deeper understanding of guest preferences remains our guiding light.

Further Reading

We always welcome ideas from our readers. For those interested in contributing to this work, please check out the open positions on the search team.

*Data collected during first two weeks of Aug 2020.
