Prioritizing Home Attributes Based on Guest Interest

Joy Jing
The Airbnb Tech Blog
7 min read · Feb 16, 2023

How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts

By: Joy Jing and Jing Xia

At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they are looking for. What better source for guest preferences than the guests themselves?

We built a system called the Attribute Prioritization System (APS) to listen to our guests’ needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how do these needs differ by the home’s location, property type, and price, as well as by guests’ travel needs?

With this personalized understanding of what home amenities, facilities, and location features (i.e. “home attributes”) matter most to our guests, we advise Hosts on which home attributes to acquire, merchandize, and verify. We can also display to guests the home attributes that are most relevant to their destination and needs.

We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.

What do guests care about?

First, to determine what matters most to our guests in a home, we look at what guests request, comment on, and contact customer support about the most. Are they asking a Host whether they have wifi, free parking, a private hot tub, or access to the beach?

To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that can extract home attributes from unstructured text data like guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:

  1. A named entity recognition (NER) module extracts key phrases from unstructured text data
  2. An entity mapping module then maps these key phrases to home attributes

The NER module uses textCNN (a convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of five categories: Amenity, Activity, Event, Specific POI (e.g. “Lake Tahoe”), or Generic POI (e.g. “post office”).

The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To achieve this, we compute the cosine distance between the candidate phrase and each attribute label in the fine-tuned word embedding space. We consider the closest attribute to be the one referenced, and we can calculate a confidence score for the mapping.
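As a minimal sketch of this nearest-attribute lookup, the snippet below picks the attribute whose embedding is closest to a phrase embedding by cosine similarity (equivalently, smallest cosine distance). The three-dimensional vectors, the attribute names, and the use of the similarity itself as the confidence score are assumptions made for illustration; they are not LATEX's actual embeddings or scoring.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_phrase_to_attribute(phrase_vec, attribute_vecs):
    """Return (attribute, similarity) for the attribute closest to the phrase."""
    best_attr, best_sim = None, -1.0
    for attr, vec in attribute_vecs.items():
        sim = cosine_similarity(phrase_vec, vec)
        if sim > best_sim:
            best_attr, best_sim = attr, sim
    return best_attr, best_sim


# Hypothetical toy embeddings; real embeddings would come from the fine-tuned model.
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "wifi": [0.0, 0.2, 0.9],
    "free parking": [0.1, 0.9, 0.1],
}

phrase_vec = [0.8, 0.2, 0.1]  # hypothetical embedding of the phrase "jacuzzi"
attr, score = map_phrase_to_attribute(phrase_vec, ATTRIBUTE_VECS)
print(attr, round(score, 3))
```

In this toy setup, a low best similarity could be used to flag phrases that do not map well to any supported attribute.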

We then calculate how frequently an entity is referenced in each text source (i.e. messages, reviews, customer service tickets), and aggregate the normalized frequency across text sources. Home attributes with many mentions are considered more important.
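One way to picture this aggregation is sketched below: mention counts are normalized within each text source so that a high-volume source does not dominate, and the normalized frequencies are then summed per attribute. The counts, the source names, and the equal weighting across sources are assumptions for illustration only.

```python
from collections import defaultdict


def normalized_frequencies(mention_counts):
    """Convert raw mention counts for one source into shares that sum to 1."""
    total = sum(mention_counts.values())
    if total == 0:
        return {}
    return {attr: count / total for attr, count in mention_counts.items()}


def aggregate_importance(counts_by_source):
    """Sum per-source normalized frequencies and rank attributes, highest first."""
    scores = defaultdict(float)
    for counts in counts_by_source.values():
        for attr, freq in normalized_frequencies(counts).items():
            scores[attr] += freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical mention counts per text source.
COUNTS_BY_SOURCE = {
    "messages": {"wifi": 60, "free parking": 30, "hot tub": 10},
    "reviews": {"hot tub": 50, "wifi": 25, "free parking": 25},
    "support_tickets": {"wifi": 5, "free parking": 5},
}

ranked = aggregate_importance(COUNTS_BY_SOURCE)
for attr, score in ranked:
    print(f"{attr}: {score:.2f}")
```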

With this system, we are able to gain insight into what guests are interested in, even highlighting new entities that we may not yet support. The scalable engineering system also allows us to improve the model by onboarding additional data sources and languages.

An example of a listing’s description with keywords highlighted and labeled by the LATEX NER model.

What do guests care about for different types of homes?

What guests look for in a mountain cabin is different from what they look for in an urban apartment. Gaining a more complete understanding of guests’ needs in an Airbnb home enables us to provide more personalized guidance to Hosts.

To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, etc.), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a customized importance score that is used to rank all possible attributes of a home.

For example, let us consider a mountain cabin that can host six people with an average daily price of $50. In determining what is most important for potential guests, we learn from what is most talked about for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, the most important attributes for an urban apartment are parking, restaurants, grocery stores, and subway stations.

Image: An example image of a mountain cabin home
An example of home attributes ranked for a mountain cabin vs an urban apartment.
Image: An example of an urban apartment home

We could directly aggregate the frequency of keyword usage among similar homes. But this approach would run into issues at scale: the number of home segments can grow exponentially as we add dimensions, leaving sparse data in highly specific segments. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach remains scalable as we characterize our homes along more, and finer-grained, dimensions. It allows us to help our Hosts highlight their unique and diverse collection of homes.
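To make the sparsity problem concrete, the sketch below counts the segments produced by a handful of dimensions and then estimates a mention rate for a sparse segment by shrinking its raw rate toward the rate of a broader parent segment. This shrinkage estimator is a simple stand-in chosen for illustration, not the inference model described in this post, and the dimension sizes, counts, and `prior_strength` value are all hypothetical.

```python
import math

# Hypothetical number of distinct values per dimension (made-up figures).
DIMENSION_SIZES = {"market": 1000, "property_type": 20, "capacity": 10, "price_tier": 5}


def segment_count(dimension_sizes):
    """Number of distinct segments is the product of the dimension sizes."""
    return math.prod(dimension_sizes.values())


def shrunk_frequency(mentions, texts, parent_rate, prior_strength=50.0):
    """Blend a segment's raw mention rate with its parent segment's rate.

    With few texts the estimate stays near parent_rate; with many texts
    it approaches the raw rate mentions / texts.
    """
    return (mentions + prior_strength * parent_rate) / (texts + prior_strength)


n_segments = segment_count(DIMENSION_SIZES)
sparse = shrunk_frequency(mentions=2, texts=3, parent_rate=0.10)
dense = shrunk_frequency(mentions=400, texts=1000, parent_rate=0.10)

print(n_segments)         # one million segments from only four dimensions
print(round(sparse, 3))   # raw rate is 0.667, but three texts are weak evidence
print(round(dense, 3))    # raw rate is 0.400, backed by a thousand texts
```

The sparse segment's estimate stays close to the parent rate, while the well-observed segment's estimate stays close to its own raw rate, which is the behavior a direct per-segment aggregation cannot provide.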

How can guests’ preferences help Hosts improve?

Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by:

  • Recommending that Hosts acquire an amenity guests often request (e.g. coffee maker)
  • Merchandizing an existing home attribute that guests tend to comment favorably on in reviews (e.g. patio)
  • Clarifying popular facilities that may end up in requests to customer support (e.g. the privacy of, and access to, a pool)

But to make these recommendations relevant, it’s not enough to know what guests want. We also need to be sure about what’s already in the home. This turns out to be trickier than simply asking the Host, because we collect 800+ home attributes. Most Hosts aren’t able to immediately and accurately add all of the attributes their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guest feedback on amenities and facilities they have seen or used. In addition, some home attributes are available from trustworthy third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. We’re able to build a truly complete picture of a home by leveraging data from our Hosts, guests, and trustworthy third parties.

We utilize several different models, including a Bayesian inference model that increases in confidence as more guests confirm that the home has an attribute. We also leverage WiDeText, a supervised neural network model that uses features about the home to predict the likelihood that the next guest will confirm the attribute’s existence.
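The idea of confidence growing with guest confirmations can be illustrated with a Beta-Binomial update, sketched below. The choice of a Beta prior, the uniform prior parameters, and the function name are assumptions for illustration; the post does not specify the form of the Bayesian model used in production.

```python
def posterior_attribute_probability(confirmations, denials, prior_yes=1.0, prior_no=1.0):
    """Posterior mean and variance of P(home has attribute) under a Beta prior.

    Each guest confirmation or denial is treated as one Bernoulli observation.
    """
    alpha = prior_yes + confirmations
    beta = prior_no + denials
    mean = alpha / (alpha + beta)
    variance = (alpha * beta) / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, variance


# Hypothetical feedback: a few confirmations versus many confirmations.
mean_few, var_few = posterior_attribute_probability(confirmations=2, denials=0)
mean_many, var_many = posterior_attribute_probability(confirmations=20, denials=0)

print(round(mean_few, 3), round(var_few, 4))
print(round(mean_many, 3), round(var_many, 4))
```

As confirmations accumulate, the posterior mean rises and the variance shrinks, which is one way to express "increasing confidence" numerically.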

By combining our estimate of how important each home attribute is for a home with the likelihood that the attribute already exists or needs clarification, we are able to give personalized and relevant recommendations to Hosts on what to acquire, merchandize, and clarify when promoting their home on Airbnb.
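A rough sketch of how the two signals might be combined is shown below. The thresholds, the function name, and the mapping from score ranges to the three actions are hypothetical; they illustrate the shape of the decision, not the rules actually used in APS.

```python
def recommend(importance, p_exists, min_importance=0.5, low=0.2, high=0.8):
    """Suggest an action for one attribute of one home, or None if not worth surfacing.

    importance: customized importance score for this home, scaled to [0, 1] (assumed).
    p_exists:   estimated probability that the home already has the attribute.
    """
    if importance < min_importance:
        return None            # guests rarely care about this attribute here
    if p_exists >= high:
        return "merchandize"   # important and very likely present: highlight it
    if p_exists <= low:
        return "acquire"       # important and very likely missing: suggest adding it
    return "clarify"           # important but uncertain: ask the Host to confirm


# Hypothetical (importance, p_exists) pairs for a mountain cabin.
CABIN_ATTRIBUTES = {
    "hot tub": (0.9, 0.95),
    "fire pit": (0.8, 0.05),
    "kayak": (0.7, 0.50),
    "subway station": (0.1, 0.50),
}

actions = {name: recommend(imp, p) for name, (imp, p) in CABIN_ATTRIBUTES.items()}
print(actions)
```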

Cards shown to Hosts to better promote their listings.

What’s next?

This is the first time we’ve known what attributes our guests want down to the home level. What’s important varies greatly based on home location and trip type.

This full-stack prioritization system has allowed us to give more relevant and personalized advice to Hosts, to merchandize what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their perfect vacation home more easily.

We are currently experimenting with highlighting amenities that are most important for each type of home (e.g. kayak for mountain cabin, parking for urban apartment) on the home’s product description page. We believe we can leverage the knowledge gained to improve search and to determine which home attributes are most important for different categories of homes.

On the Host side, we’re expanding this prioritization methodology to encompass additional tips and insights into how Hosts can make their listings even more desirable. This includes actions like freeing up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their perfect Host and home, we hope to foster a world where anyone can belong anywhere.

If this type of work interests you, check out some of our related positions at Careers at Airbnb!

Acknowledgments

It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, and Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!
