Scaling Knowledge Access and Retrieval at Airbnb
Introducing our Knowledge Graph for encoding relationships and surfacing relevant information
Imagine you’re finally getting to take that vacation you’ve dreamed of — three countries, seven cities, thousands of miles. It’s everything you could want and more, right? But where do you start? How do you know what to eat, where to visit, how to experience what makes your destination truly unique? All the information you’re looking for is on Airbnb…somewhere…the question is, “How do we surface the relevant parts of that information to you at the exact time you’re looking for it?”
Discovering what you want and need to know about a destination is crucial to the overall trip experience, especially when traveling to a place you’ve never been to before. In order to surface relevant context to people, we need to have some way of representing relationships between distinct but related entities (think cities, activities, cuisines, etc.) on Airbnb to easily access important and relevant information about them. These types of information will become increasingly important as we move towards becoming an end-to-end travel platform as opposed to just a place for staying in homes. The knowledge graph is our solution to this need, giving us the technical scalability we need to power all of Airbnb’s verticals and the flexibility to define abstract relationships.
In this post, I will first give a high-level overview of how the knowledge graph works. Then, we’ll dive a little deeper into how this enables scaling to the entire platform and how my summer project of introducing location data plays into our goal of surfacing relevant content.
Types of Information Needed for Traveling
Besides providing a welcoming home to stay in on your trip, what else do you need to know to get closer to, as Brian says, an 11-star experience? Here are a few things you might consider when you’re thinking about going somewhere.
Where do I travel and what area do I stay in?
In this phase, you need to decide on a destination for your trip and what parts of that destination you want to explore. This can include:
- Which destinations are popular/trending among people similar to me, and what destinations have activities that match my interests?
- Which neighborhoods match my interests? (Am I more interested in a neighborhood close to nature or one known for its nightlife?)
What should I do?
In the second phase, you’ve figured out where you want to visit and stay, but you need to figure out what you actually want to do and see. Perhaps you think about:
- food & drink (what is the local cuisine known for?)
- entertainment / activities (what activities are good in this destination?)
- landmarks / points of interest (what are popular/trending places here?)
So how do we surface all this information to people in a generalizable and scalable manner?
The Knowledge Graph
The knowledge graph is not a new concept. It has been used successfully at many companies, the most famous example being Google, which uses it to power its search engine and surface relevant context for particular queries.
Why is a graph structure scalable?
Normally, engineers work with relational structures, where a schema defines what each row of data contains. This is the preferred way of storing data for transactional processes because it makes accessing rows of data very fast. However, there is an operational burden when you have many tables for distinct objects that may contain the same relational information in individual columns (for example, the city that homes or experiences are located in, or the type of activity that an experience is associated with and that a destination is known for). This is where the graph structure comes into play.
Although our underlying data store is still relational, structuring our queries in terms of this graph gives us power in maintaining data semantics. We want the same Surfing that an experience is associated with to be the same Surfing that Hawaii is known for. This type of structure around the relationships between the entities on Airbnb’s platform gives us the scalability and flexibility needed to expand categorization to any number of things. By having the same object represent each of the things in our world, we remove the operational overhead of redefining the world whenever we introduce a new product to our platform.
In this way, we can support our objective to 1) encode an exponentially growing number of relationships between entities and 2) enable easy traversal along those connections.
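To make the "same Surfing" idea concrete, here is a minimal sketch of a graph layered on a relational store. It is an illustration only: the table layout, the `known_for` edge name, and the sample rows are assumptions, not Airbnb's actual schema.

```python
import sqlite3

# Hypothetical schema: one table for all entities, one for all relationships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE edges (
        src  INTEGER NOT NULL REFERENCES nodes(id),
        type TEXT    NOT NULL,
        dst  INTEGER NOT NULL REFERENCES nodes(id)
    );
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    (1, "tag", "Surfing"),
    (2, "experience", "Sunrise surf lesson"),
    (3, "destination", "Hawaii"),
])
# Both the experience and the destination point at the single Surfing node.
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    (2, "tagged_by", 1),
    (3, "known_for", 1),
])

def linked_to(conn, tag_name):
    """Return (entity name, edge type) pairs for everything pointing at a tag."""
    return conn.execute(
        """
        SELECT n.name, e.type
        FROM edges e
        JOIN nodes n ON n.id = e.src
        JOIN nodes t ON t.id = e.dst
        WHERE t.name = ?
        ORDER BY n.name
        """,
        (tag_name,),
    ).fetchall()

surfing_links = linked_to(conn, "Surfing")
```

Because Surfing exists exactly once, adding a new product type only means adding new rows that point at it, rather than a new table with its own activity column.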
Structure of the graph
The taxonomy of our knowledge graph refers to the vocabulary that we use to describe our inventory and the world around us. The taxonomy is hierarchical (as shown above), so that we can map high-level concepts like “Sport” down to a very specific activity such as “Surfing.” The main constraint that we want to maintain is that the knowledge graph is Mutually Exclusive and Collectively Exhaustive, so that the taxonomy stays streamlined and avoids duplicate data. Because of the graph structure that we have, it’s very easy to scale this taxonomy to tens or hundreds of layers deep and still surface the relevant inventory for high-level concepts.
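As a rough sketch of how a hierarchical taxonomy can surface inventory for a high-level concept, the snippet below walks from a concept down to all of its descendants. The concept names beyond “Sport” and “Surfing”, the inventory items, and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy: each concept has at most one parent, so no concept
# appears in two branches (the "mutually exclusive" part of the constraint).
PARENT = {
    "Sport": None,
    "Water sport": "Sport",
    "Surfing": "Water sport",
    "Kayaking": "Water sport",
    "Food": None,
    "Sushi": "Food",
}

# Hypothetical inventory items, tagged with specific concepts.
INVENTORY = {
    "Surfing": ["Sunrise surf lesson"],
    "Kayaking": ["Bay kayak tour"],
    "Sushi": ["Sushi-making class"],
}

def build_children(parent):
    """Invert the child -> parent mapping into parent -> [children]."""
    children = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            children[par].append(child)
    return children

def inventory_for(concept, parent=PARENT, inventory=INVENTORY):
    """Collect inventory tagged with a concept or any of its descendants."""
    children = build_children(parent)
    found, stack = [], [concept]
    while stack:  # iterative depth-first walk, so depth is not limited by recursion
        current = stack.pop()
        found.extend(inventory.get(current, []))
        stack.extend(children.get(current, []))
    return sorted(found)
```

Here `inventory_for("Sport")` picks up items tagged two levels down, and the same walk works unchanged however many layers the taxonomy grows to.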
In the graph, we have nodes and edges. Nodes refer to any type of entity on the Airbnb platform (restaurants, neighborhoods, experiences, events, etc.). Edges refer to the types of relationships that exist between any of the entities in the graph. Under this model, there are different types of nodes for different types of entities and different types of edges for different types of relationships (located in, tagged by, etc.). From there, we have a flexible API to query for neighbors connected by certain types of relationships and can index our inventory items by the unique identifiers of their corresponding representation in the knowledge graph.
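The node and edge model described above might look something like the following in memory. This is a sketch under assumptions: the `KnowledgeGraph` class, its method names, and the identifiers are hypothetical and are not the real API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str    # unique identifier that inventory can be indexed by
    type: str  # e.g. "experience", "neighborhood", "tag"
    name: str

class KnowledgeGraph:
    """Hypothetical in-memory graph with typed nodes and typed edges."""

    def __init__(self):
        self._nodes = {}
        self._out = defaultdict(list)  # node id -> [(edge type, destination id)]

    def add_node(self, node):
        self._nodes[node.id] = node

    def add_edge(self, src_id, edge_type, dst_id):
        if src_id not in self._nodes or dst_id not in self._nodes:
            raise KeyError("both endpoints must be added before the edge")
        self._out[src_id].append((edge_type, dst_id))

    def neighbors(self, node_id, edge_type=None):
        """Return neighbors of a node, optionally filtered by edge type."""
        return [
            self._nodes[dst]
            for etype, dst in self._out.get(node_id, [])
            if edge_type is None or etype == edge_type
        ]

graph = KnowledgeGraph()
graph.add_node(Node("exp:1", "experience", "Sunrise surf lesson"))
graph.add_node(Node("nbh:1", "neighborhood", "Waikiki"))
graph.add_node(Node("tag:1", "tag", "Surfing"))
graph.add_edge("exp:1", "located_in", "nbh:1")
graph.add_edge("exp:1", "tagged_by", "tag:1")
```

Calling `graph.neighbors("exp:1", "located_in")` returns only the neighborhood node, while omitting the edge type returns every neighbor regardless of relationship.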
For example, you can see that we use refinement paths in the URL of search to populate experiences of a given type.
Once a critical mass of data is reached, we can start thinking about making automatic inferences based on the data already in the graph. For example, if something is tagged with “Nature” and “Walking,” maybe we can infer that it should be tagged with “Hiking” as well. We’re currently doing some work with training a text embedding model on the entities and relations in the graph to see what edges are likely to exist that do not currently exist. In the future, this kind of automatic inference of connections will allow us to quickly categorize inventory as it comes on our platform without requiring manual work.
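The “Nature” plus “Walking” example can be sketched with a hand-written rule. To be clear, this is a simple stand-in to show the shape of the idea, not the text embedding model mentioned above; the rule table and function name are hypothetical.

```python
# Hypothetical rules: if an item carries every tag in the premise,
# suggest the conclusion tag.
RULES = [
    (frozenset({"Nature", "Walking"}), "Hiking"),
]

def suggest_tags(tags, rules=RULES):
    """Return tags implied by the rules that the item does not already have."""
    tags = set(tags)
    return sorted(
        conclusion
        for premise, conclusion in rules
        if premise <= tags and conclusion not in tags
    )
```

A learned model would replace the fixed rule table with scores for candidate edges, but the output has the same form: a list of edges that probably should exist and do not yet.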
Location Data in the Knowledge Graph
Before the summer, the main use of the knowledge graph was searching our taxonomy for experiences associated with particular tags.
My main project for the summer was introducing the concept of locations. Why locations? Two main reasons:
Easily traverse Airbnb inventory by location
This goes back to what we discussed above with the graph structure enabling easy traversal along relationships. It intuitively makes sense for locations to be represented in the graph because geographical entities are hierarchical by nature (think country → city → neighborhood → restaurant, etc.).
Inferring attributes about locations for personalization
One of the ways we can improve the discovery experience for travelers coming to Airbnb is by personalizing the content we show by location. You might imagine that different types of food/activities are emphasized when you search for Tokyo and when you search for New York. Maybe a section for the best sushi places in Tokyo appears near the top while New York features a section on Broadway experiences. This type of personalization can be achieved when we link experiences to tags and concepts (like “Theatre” to “New York” and “Sushi” to “Tokyo”).
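One simple way to picture how a location could end up linked to a concept is to count the tags on the experiences located there. The sketch below assumes that counting approach and uses made-up data; it is not a description of how the inference is actually done.

```python
from collections import Counter

# Hypothetical (location, tags) pairs, one per experience.
EXPERIENCES = [
    ("Tokyo", ["Sushi", "Cooking"]),
    ("Tokyo", ["Sushi"]),
    ("Tokyo", ["Karaoke"]),
    ("New York", ["Theatre"]),
    ("New York", ["Theatre", "Music"]),
    ("New York", ["Pizza"]),
]

def top_tags(location, experiences=EXPERIENCES, k=1):
    """Return the k most frequent tags among experiences in a location."""
    counts = Counter(
        tag
        for loc, tags in experiences
        if loc == location
        for tag in tags
    )
    return [tag for tag, _ in counts.most_common(k)]
```

With this sample data, “Sushi” comes out on top for Tokyo and “Theatre” for New York, which is the kind of signal that could decide which section appears first on a search page.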
Structure of geo-hierarchy nodes/edges
As seen in the image above, we can now really easily represent the geographical relationships between things on Airbnb. In any given market, there are several neighborhoods which each contain tons of inventory items from Airbnb (Experiences, Homes, Places, and Restaurants).
Neighborhoods, Markets, and other location nodes have different types in the knowledge graph because different location types may require different types of information. Encoding the hierarchy explicitly also makes it easy to find the desired level of granularity. In addition, there is a single edge type, contains_location, that represents geographical hierarchies, which makes it simple to traverse from a high-level geographical node to the leaf inventory entities.
As a first proof-of-concept for the data, I launched an experiment with a Neighborhood section in our post home-booking emails to surface a description and experiences for the neighborhood that the guest is staying in, providing more local context for guests. You can imagine in the future having search result groupings for neighborhoods by what they’re known for (Neighborhoods great for Nightlife, Food, etc.) and personalizing what kinds of inventory we show across verticals (Experiences for activities that people do a lot in that location and Restaurants for specialty cuisines in that region).
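Having a single contains_location edge type means one traversal can go from any location node down to its inventory. Below is a minimal sketch, assuming node identifiers of the form `type:name` and sample data invented for illustration.

```python
from collections import deque

# Hypothetical contains_location edges: parent node -> child nodes.
CONTAINS_LOCATION = {
    "market:san-francisco": ["neighborhood:mission", "neighborhood:soma"],
    "neighborhood:mission": ["restaurant:taqueria", "experience:mural-walk"],
    "neighborhood:soma": ["home:loft"],
}

# Node types treated as locations rather than inventory (an assumption).
LOCATION_TYPES = {"market", "neighborhood"}

def node_type(node_id):
    """Read the node type from the hypothetical 'type:name' identifier."""
    return node_id.split(":", 1)[0]

def leaf_inventory(root, edges=CONTAINS_LOCATION):
    """Breadth-first walk along contains_location, collecting inventory nodes."""
    inventory, queue = [], deque([root])
    while queue:
        current = queue.popleft()
        if node_type(current) not in LOCATION_TYPES:
            inventory.append(current)
        queue.extend(edges.get(current, []))
    return inventory
```

Starting from the market returns inventory from every neighborhood it contains, while starting from a single neighborhood narrows the result to that level of granularity.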
The knowledge graph is the start of a foundation that empowers Airbnb to transform trip planning to be more intuitive and personalized than before. By providing the infrastructure and interface to accessing clean, structured data about the world around us, the knowledge graph can empower the Airbnb community to discover things they would have never gotten exposure to before. Imagine being able to see the information that you might need without having to explicitly type it into the search bar. Airbnb can provide the inspiration for the trip and then inform people about how to make the most of their trip in a particular destination. At every step of the trip planning and decision process, we can provide additional context and content that is personalized to each individual to make it easier than ever to go from that dream trip in your head to soaking in a new environment and meeting the amazing locals.
Thanks to my manager, Elizabeth Ford, the team on Knowledge Graph that helped with the post: Xiaoya Wei, Lei Shi, Manuel Ebert, and everyone else on Inform and Trip Platform who helped me out along the way, including Yizheng Liao and Chris Zhu for giving me more insight into data science/inference work at Airbnb and Xiaoyou Wang and Lumen Bigott for helping me launch the neighborhood email experiment! I had an awesome time this summer and really enjoyed getting to have ownership over something that will hopefully help the Airbnb community for years to come :) Shout out to all the amazing interns and people I met who made the summer so fun and rewarding! I’ll miss the ping pong table and the ramen bar :’(