Scaling Knowledge Access and Retrieval at Airbnb

Introducing our Knowledge Graph for encoding relationships and surfacing relevant information

Spencer Chang
Sep 4, 2018
Cherry blossoms in full bloom in Tokyo

Imagine you’re finally getting to take that vacation you’ve dreamed of — three countries, seven cities, thousands of miles. It’s everything you could want and more, right? But where do you start? How do you know what to eat, where to visit, how to experience what makes your destination truly unique? All the information you’re looking for is on Airbnb…somewhere…the question is, “How do we surface the relevant parts of that information to you at the exact time you’re looking for it?”

Discovering what you want and need to know about a destination is crucial to the overall trip experience, especially when traveling to a place you’ve never been before. To surface relevant context to people, we need a way to represent relationships between distinct but related entities on Airbnb (think cities, activities, cuisines, etc.) so that we can easily access important and relevant information about them. This information will become increasingly important as we move toward becoming an end-to-end travel platform rather than just a place for staying in homes. The knowledge graph is our solution: it gives us the technical scalability to power all of Airbnb’s verticals and the flexibility to define abstract relationships.

In this post, I will first give a high-level overview of how the knowledge graph works. Then, we’ll dive a little deeper into how this enables scaling to the entire platform and how my summer project of introducing location data plays into our goal of surfacing relevant content.

Types of Information Needed for Traveling

Where do I travel and what area do I stay in?

  • Which destinations are popular/trending among people similar to me, and what destinations have activities that match my interests?
  • Which neighborhoods match my interests? (Am I more interested in a neighborhood close to nature or one known for its nightlife?)

What should I do?

  • food & drink (what is the local cuisine known for?)
  • entertainment / activities (what activities are good in this destination?)
  • landmarks / points of interest (what are popular/trending places here?)

So how do we surface all this information to people in a generalizable and scalable manner?

The Knowledge Graph

A visualization of the Knowledge Graph

The knowledge graph is not a new concept. It has been used successfully at many companies; the most famous example is Google, which uses it to power its search engine and surface relevant context for particular queries.

Why is a graph structure scalable?

Although our underlying data store is still relational, structuring our queries in terms of a graph helps us maintain data semantics. We want the Surfing that an Experience is associated with to be the same Surfing that Hawaii is known for. Structuring the relationships between entities on Airbnb’s platform this way gives us the scalability and flexibility to expand categorization to any number of things. Because a single object represents each thing in our world, we avoid the operational overhead of redefining the world whenever we introduce a new product to our platform.

In this way, we can support our objectives to 1) encode an exponentially growing number of relationships between entities and 2) enable easy traversal along those connections.

Structure of the graph

sample “taxonomy” of our graph

The taxonomy of our knowledge graph is the vocabulary we use to describe our inventory and the world around us. The taxonomy is hierarchical (as shown above), so we can map high-level concepts like “Sport” down to a very specific activity such as “Surfing.” The main constraint we want to maintain is that the knowledge graph is Mutually Exclusive and Collectively Exhaustive: concepts do not overlap and together cover everything we need to describe, which keeps the taxonomy streamlined and avoids duplicate data. Because of the graph structure, it’s easy to scale this taxonomy to tens or hundreds of layers deep and still surface the relevant inventory for high-level concepts.
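
To make the idea of surfacing inventory for a high-level concept concrete, here is a minimal sketch. It assumes inventory is tagged only with its most specific concept and that a query for a broader concept expands to all of its descendants. The concept names beyond “Sport” and “Surfing,” the inventory ids, and the function names are all hypothetical.

```python
# Hypothetical parent -> children edges for a tiny slice of a taxonomy.
CHILDREN = {
    "Activity": ["Sport", "Nature"],
    "Sport": ["Surfing", "Skiing"],
    "Nature": ["Hiking"],
}

# Hypothetical inventory, each item tagged with its most specific concept.
INVENTORY_TAGS = {
    "exp_1": "Surfing",
    "exp_2": "Hiking",
    "exp_3": "Skiing",
}


def descendants(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = {concept}
    stack = [concept]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def inventory_for(concept):
    """Surface inventory tagged with the concept or any of its descendants."""
    wanted = descendants(concept)
    return sorted(item for item, tag in INVENTORY_TAGS.items() if tag in wanted)


sport_items = inventory_for("Sport")  # items tagged Surfing or Skiing
```

Because the expansion walks the hierarchy rather than hard-coding levels, the same function keeps working however deep the taxonomy grows.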

In the graph, we have nodes and edges. Nodes refer to any type of entity on the Airbnb platform (restaurants, neighborhoods, experiences, events, etc.). Edges refer to the types of relationships that exist between any of the entities in the graph. Under this model, there are different types of nodes for different types of entities, and different types of edges for different types of relationships (located in, tagged by, etc.). From there, we have a flexible API to query for neighbors connected by certain types of relationships, and can index our inventory items by the unique identifiers of their corresponding representation in the knowledge graph.
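
As a rough illustration of this model, the sketch below shows typed nodes, typed edges, and a neighbor query filtered by edge type. It is not Airbnb’s actual API: the class, method, and identifier names (including the `tagged_by` and `located_in` spellings and the sample nodes) are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str    # unique identifier, usable as an index key for inventory
    node_type: str  # e.g. "Experience", "Tag", "Neighborhood"


class KnowledgeGraph:
    def __init__(self):
        # (source node id, edge type) -> list of target nodes
        self._out = defaultdict(list)

    def add_edge(self, source, edge_type, target):
        self._out[(source.node_id, edge_type)].append(target)

    def neighbors(self, node, edge_type, node_type=None):
        """Nodes one edge of `edge_type` away, optionally filtered by node type."""
        targets = self._out.get((node.node_id, edge_type), [])
        return [n for n in targets if node_type is None or n.node_type == node_type]


# Hypothetical sample data.
surf_lesson = Node("experience:1", "Experience")
surfing = Node("tag:surfing", "Tag")
waikiki = Node("neighborhood:waikiki", "Neighborhood")

graph = KnowledgeGraph()
graph.add_edge(surf_lesson, "tagged_by", surfing)
graph.add_edge(surf_lesson, "located_in", waikiki)

tags = graph.neighbors(surf_lesson, "tagged_by")
```

Since every entity resolves to one node with a stable identifier, the same `surfing` node can be the target of edges from Experiences, locations, or any future product.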

page from https://www.airbnb.com/s/experiences?refinement_paths[]=/experiences/Concept/Refinement/Nature

For example, you can see that we use refinement paths in the URL of search to populate Experiences of a given type.
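
The refinement path can be pulled out of that URL with Python’s standard library, as sketched below. How the search backend actually interprets the path is not described here, so treating the last segment as the name of a taxonomy node is an assumption, and `refinement_paths` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://www.airbnb.com/s/experiences"
    "?refinement_paths[]=/experiences/Concept/Refinement/Nature"
)


def refinement_paths(url):
    """Return each refinement path in the URL as a list of path segments."""
    query = parse_qs(urlsplit(url).query)
    # "refinement_paths[]" is the array-style parameter name seen in the URL.
    return [path.strip("/").split("/") for path in query.get("refinement_paths[]", [])]


paths = refinement_paths(URL)
concept = paths[0][-1]  # assumed to name the taxonomy node being searched
```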

Once a critical mass of data is reached, we can start thinking about making automatic inferences based on the data already in the graph. For example, if something is tagged with “Nature” and “Walking,” maybe we can infer that it should be tagged with “Hiking” as well. We’re currently working on training a text embedding model on the entities and relations in the graph to predict which edges are likely to exist but are not yet in the graph. In the future, this kind of automatic inference of connections will allow us to quickly categorize inventory as it comes onto our platform without requiring manual work.
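
The “Nature” plus “Walking” implies “Hiking” example can be sketched with hand-written rules. To be clear, this is a much simpler stand-in for the embedding model mentioned above, not a description of it; the rule table and function name are hypothetical.

```python
# Hypothetical rules: if an item carries every tag on the left,
# suggest the tag on the right.
RULES = [
    (frozenset({"Nature", "Walking"}), "Hiking"),
]


def infer_tags(tags, rules=RULES):
    """Return tags implied by the rules that the item does not already have."""
    known = set(tags)
    inferred = set()
    changed = True
    while changed:  # repeat so newly inferred tags can trigger further rules
        changed = False
        for required, implied in rules:
            current = known | inferred
            if required <= current and implied not in current:
                inferred.add(implied)
                changed = True
    return inferred


suggested = infer_tags({"Nature", "Walking"})
```

A learned model would replace the fixed rule table with a score for each candidate edge, but the output has the same shape: a set of edges that probably should exist and could be proposed for review.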

Location Data in the Knowledge Graph

searching for “Hiking,” a tag in our taxonomy

My main project for the summer was introducing the concept of locations to the knowledge graph. Why locations? Two main reasons:

  1. Easily traverse Airbnb inventory by location: This goes back to what we discussed above about the graph structure enabling easy traversal along relationships. It intuitively makes sense for locations to be represented in the graph because geographical entities are hierarchical by nature (think country → city → neighborhood → restaurant, etc.).
  2. Infer attributes about locations for personalization: One of the ways we can improve the discovery experience for travelers coming to Airbnb is by personalizing the content we show by location. You might imagine that different types of food/activities are emphasized when you search for Tokyo and when you search for New York. Maybe a section for the best sushi places in Tokyo appears near the top while New York features a section on Broadway experiences. This type of personalization can be achieved when we link experiences to tags and concepts (like “Theatre” to “New York” and “Sushi” to “Tokyo”).
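
One naive way to infer what a location is known for is to count the tags on inventory located there, as in the sketch below. The data is a made-up toy sample, the function name is hypothetical, and raw counts are only a starting point: a real system would likely normalize against how common each tag is overall.

```python
from collections import Counter, defaultdict

# Hypothetical (location, tag) pairs, one per tagged inventory item.
TAGGED_INVENTORY = [
    ("Tokyo", "Sushi"), ("Tokyo", "Sushi"), ("Tokyo", "Theatre"),
    ("New York", "Theatre"), ("New York", "Theatre"), ("New York", "Sushi"),
]


def top_tags_by_location(pairs, k=1):
    """Return the k most frequent tags for each location."""
    counts = defaultdict(Counter)
    for location, tag in pairs:
        counts[location][tag] += 1
    return {
        location: [tag for tag, _ in tag_counts.most_common(k)]
        for location, tag_counts in counts.items()
    }


known_for = top_tags_by_location(TAGGED_INVENTORY)
```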

Structure of geo-hierarchy nodes/edges

Visualization of the hierarchy for location relationships

As seen in the image above, we can now easily represent the geographical relationships between things on Airbnb. In any given market, there are several neighborhoods, each of which contains tons of inventory items on Airbnb (Experiences, Homes, Places, and Restaurants).

Neighborhoods, Markets, and other location nodes have different types in the knowledge graph because different location types may require different types of information, and explicitly encoding the hierarchy makes it easy to find the appropriate level of desired granularity. In addition, there is a single edge type, contains_location, that represents geographical hierarchies, in order to make it simple to traverse from a high-level geographical node to the leaf inventory entities.
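
Having a single `contains_location` edge type means one traversal routine can walk from any location down to its inventory. The sketch below assumes the edges are available as a simple adjacency map; the node ids and the `leaf_inventory` name are hypothetical.

```python
from collections import deque

# Hypothetical contains_location edges: parent location -> contained nodes.
CONTAINS_LOCATION = {
    "market:tokyo": ["neighborhood:shibuya", "neighborhood:asakusa"],
    "neighborhood:shibuya": ["experience:1", "restaurant:7"],
    "neighborhood:asakusa": ["home:42"],
}


def leaf_inventory(location):
    """Breadth-first walk along contains_location edges to nodes that contain nothing."""
    leaves = []
    queue = deque([location])
    seen = {location}
    while queue:
        node = queue.popleft()
        children = CONTAINS_LOCATION.get(node, [])
        if not children and node != location:
            leaves.append(node)  # nothing below this node, so treat it as inventory
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return leaves


tokyo_inventory = leaf_inventory("market:tokyo")
```

Starting the same walk from a neighborhood instead of a market returns only that neighborhood’s inventory, which is how the hierarchy lets a caller pick the level of granularity.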

As a proof of concept, I launched an experiment that adds a Neighborhood section to the emails we send after a Home booking. The section surfaces a description of, and Experiences for, the neighborhood the guest is staying in, providing more local context. You can imagine in the future grouping search results by what neighborhoods are known for (Neighborhoods great for Nightlife, Food, etc.), and personalizing the kinds of inventory we show across verticals (Experiences for activities that people do a lot in that location and Restaurants for specialty cuisines in that region).

neighborhood section in email

Final Thoughts

Acknowledgements

Feel free to reach out if you have any questions about my experience at Airbnb or anything else!!

Check out more of our cool work over at airbnb.io and follow us on Twitter: @AirbnbEng
