Airbnb at KDD 2023
KDD (Knowledge Discovery and Data Mining) is a flagship conference in data science research. Hosted annually by a special interest group of the Association for Computing Machinery (ACM), it’s where you’ll learn about some of the most ground-breaking developments in data mining, knowledge discovery, and large-scale data analytics.
Airbnb had a significant presence at KDD 2023 with two papers accepted into the main conference proceedings and 11 talks and presentations. In this blog post, we’ll summarize our team’s contributions and share highlights from an exciting week of research talks, workshops, panel discussions, and more.
Deep learning and search ranking
Even though search ranking is a problem that researchers have been working on for decades, there are still many nuances to explore. For example, at Airbnb, guests are typically searching over a period of days or weeks, not minutes. And because Airbnb is a two-way marketplace, we’d also like ranking to account for factors like the potential for hosts to cancel a booking.
Optimizing Airbnb Search Journey with Multi-task Learning, our paper accepted at KDD 2023, presents Journey Ranker, a new multi-task deep learning model. The core insight here is that for this kind of long-term search task, we want to optimize for intermediate steps in the user journey.
The Journey Ranker base module assists guests in reaching positive milestones. There is also a Twiddler module that assists guests in avoiding negative milestones. The modules work off a shared feature representation of listing and guest context, and their output scores are combined.
Because of its modular design, Journey Ranker can be used whenever there are positive or negative milestones to consider. We’ve implemented it in several Airbnb search products, and in other products as well, to drive improvements in business metrics.
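To make the two-module idea concrete, here is a minimal Python sketch. It is not the production model or the formula from the paper: the milestone names, the linear heads, and the rule for combining scores (positive-milestone probabilities minus a weighted sum of negative-milestone probabilities) are all assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, features) -> float:
    return sum(w * f for w, f in zip(weights, features))


def journey_score(shared_repr, positive_heads, negative_heads, penalty=1.0):
    """Combine per-milestone predictions into one ranking score.

    shared_repr: feature vector shared by both modules (listing + guest context).
    positive_heads / negative_heads: dict of milestone name -> weight vector.
    All names and the combination rule are hypothetical.
    """
    # "Base" part: predicted probability of each positive milestone.
    base = sum(sigmoid(dot(w, shared_repr)) for w in positive_heads.values())
    # "Twiddler" part: predicted probability of each negative milestone.
    twiddle = sum(sigmoid(dot(w, shared_repr)) for w in negative_heads.values())
    # Assumed combination: reward positive milestones, penalize negative ones.
    return base - penalty * twiddle


# Usage: two positive milestones and one negative milestone on the same features.
score = journey_score(
    shared_repr=[0.4, 1.2],
    positive_heads={"click": [0.5, 0.1], "booking": [0.2, 0.3]},
    negative_heads={"cancellation": [-0.1, 0.4]},
)
```

Because both sets of heads read the same representation, adding a new milestone in this sketch only means adding another entry to one of the dictionaries.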
We also co-presented a tutorial on Data-Centric AI (DCAI). DCAI is a fast-growing field in deep learning: as model design matures, innovation is increasingly driven by data. We shared DCAI best practices and trends for developing training data, developing inference data, maintaining data, and creating benchmarks, with many examples from working with LLMs.
Online experimentation and measurement
Online experimentation (e.g., A/B testing) is a common way for organizations like Airbnb to make data-driven decisions. But high variance is frequently a challenge. For example, it’s hard to prove that a change in our search UX will drive value when bookings are infrequent and depend on a large number of interactions over a long period of time.
Our paper Variance Reduction Using In-Experiment Data: Efficient and Targeted Online Measurement for Sparse and Delayed Outcomes presents two new methods for variance reduction that rely exclusively on in-experiment data:
- A framework for a model-based leading indicator metric that continually estimates progress toward a delayed binary outcome.
- A counterfactual treatment exposure index that quantifies the amount a user is impacted by the treatment.
In testing, both methods achieved a variance reduction of 50% or more. These techniques have greatly improved our experimentation efficiency and impact.
With more than 50% variance reduction, the new model-based leading indicator metric (listing-view utility, on the right) aligns with the target uncancelled booking metric much better than other indicators such as listing-view with dates (on the left).
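Why does a 50% variance reduction matter so much? The sketch below uses the textbook sample-size formula for a two-sided, two-sample z-test, which is a standard approximation and not the method from our paper. The function name and the example numbers are hypothetical. Since required sample size scales linearly with metric variance, halving the variance roughly halves the traffic (or run time) needed to detect the same effect.

```python
from statistics import NormalDist


def required_sample_size(variance, min_detectable_effect, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sample z-test on a mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return 2 * variance * (z_alpha + z_power) ** 2 / min_detectable_effect ** 2


# Illustrative numbers only: same target effect, before and after variance reduction.
baseline = required_sample_size(variance=0.010, min_detectable_effect=0.001)
reduced = required_sample_size(variance=0.005, min_detectable_effect=0.001)
ratio = reduced / baseline  # about one half
```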
Another interesting challenge in online experimentation is avoiding interference bias, which can happen when you have competition between your A/B test subjects. Airbnb presented a keynote talk on this topic at KDD’s 2nd Workshop on Decision Intelligence and Analytics for Online Marketplaces. As an example, if you ran an A/B test where group B saw lower booking prices, they might “cannibalize” the bookings from group A. There are two imperfect solutions: clustering (isolating the options for participants) and switchbacks (grouping participants by time intervals).
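As a rough illustration of the switchback idea, the sketch below assigns every request that falls in the same time window to the same arm, so treatment and control never compete for the same inventory at the same moment. The window length, the salt, and the hash-based coin flip are assumptions made for this example, not a description of our experimentation platform.

```python
import hashlib


def switchback_arm(timestamp_s: int, window_s: int = 3600, salt: str = "exp-1") -> str:
    """Return the arm for a request, decided by its time window rather than its user."""
    window_index = timestamp_s // window_s
    # Hash the window index so consecutive windows get a pseudo-random arm.
    digest = hashlib.sha256(f"{salt}:{window_index}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


# Two requests 30 minutes apart inside the same one-hour window share an arm.
same_window = switchback_arm(1_700_000_000) == switchback_arm(1_700_000_000 + 60)
```

The trade-off is that the unit of randomization becomes the time window, so there are far fewer independent units than in a user-level test.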
Also at the workshop, we presented the paper The Price is Right: Removing A/B Test Bias in a Marketplace of Expirable Goods. It discusses the problem of lead-day bias, which arises where items like concert tickets, air travel, and Airbnb bookings vary in price based on the distance from their expiration date. This can wreak havoc on A/B tests. In the paper, we present several mitigation techniques, such as limited rollout, smart overlapping of experiments, and a Heterogeneous Treatment Effect (HTE) remixed estimator, to correct for bias and accelerate the R&D process.
Along with limited rollout and smart overlapping of experiments, the HTE-remixed estimator can provide a sufficiently robust estimate of the long-term experiment impact from the short-term result and significantly shorten the experiment run-time.
Causal inference for marketing and user journey optimization
In marketing, the million-dollar question is how much to spend per channel. This can be reframed as a causal inference problem: how many incremental conversions does each channel drive?
When we look at marketing activities across Nielsen’s Designated Marketing Areas (DMAs) we find moderate to strong correlation across channels. This makes it hard to isolate the impact of one channel from another. In fact, when we include the correlated channels in the same regression, the coefficients flip signs for most channels, a clear sign of multicollinearity.
Existing solutions to multicollinearity, such as shrinkage estimators, principal component analysis, and partial linear regression, are particularly helpful for prediction problems but work less well for our use case where we need to maintain business interpretability while isolating causality. Our approach, described in the paper Hierarchical Clustering as a Novel Solution to Multicollinearity, is to hierarchically cluster DMAs based on their similarity in marketing impressions over time. With such clustering, cross-channel correlation dropped by up to 43% and the channel coefficients no longer flip signs.
Not only does our method provide an intuitive and effective solution to multicollinearity, it also circumvents the need for complex transformation and preserves the interpretability of the data and the results throughout, empowering broad applications to causal inference problems.
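The clustering step itself can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the distance (one minus the Pearson correlation of impression time series), the average-linkage rule, and the toy DMA names are ours for this example, and the sketch does not cover how the resulting clusters feed into the downstream regression.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_dmas(series, n_clusters):
    """Agglomerative clustering of DMAs by similarity of impression time series.

    series: dict of DMA name -> list of impressions per period (hypothetical data).
    Uses average linkage on the distance 1 - correlation.
    """
    names = list(series)
    dist = {(a, b): 1 - pearson(series[a], series[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)  # average linkage
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy example: A and B ramp up together, C and D ramp down together.
toy = {"A": [1, 2, 3, 4, 5], "B": [2, 4, 6, 8, 11],
       "C": [5, 4, 3, 2, 1], "D": [10, 8, 6, 4, 1]}
groups = cluster_dmas(toy, n_clusters=2)
```

The pairwise search is quadratic per merge, which is fine for a couple of hundred DMAs; a library implementation would be the practical choice at larger scale.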
We presented this paper at the new KDD workshop, Causal Inference and Machine Learning in Practice: Use cases for Product, Brand, Policy, and beyond. Airbnb’s Totte Harinen co-organized this workshop, which strongly resonated with KDD’s audience: it had 12 papers and four invited talks from 37 authors at 14 institutions.
In addition, we were invited to present two talks and one poster at KDD’s 2nd Workshop on End-End Customer Journey Optimization, and joined the workshop’s panel discussion. One of these talks covered CLV (customer lifetime value) modeling. At Airbnb, we want to grow our brand and community by growing all users. Our CLV ecosystem applies two frameworks:
- The value of Airbnb customers. We use traditional ML approaches along with research into more customer-lifecycle-focused architectures (i.e., hidden Markov models, or HMMs). We augment this with demand-supply incrementality modeling to properly account for guest and host contributions to value.
- The value growth that Airbnb delivers to customers. By accounting for long-term incremental effects of booking on Airbnb along with incremental contributions from marketing and attribution strategies, we can measure incremental changes in CLV and optimize towards them.
Causal inference can also be applied to search. At the Customer Journey Optimization workshop, we presented our paper Low Inventory State: Identifying Under-Served Queries for Airbnb Search, which explored the problem of searches that return a low number of results. Whether or not that number is “too low” and will deter a guest from booking depends on search parameters and intent to book. For a given search query, we can use causal inference to determine the incremental effect of an additional result on the probability of booking. Our model outperforms non-causal methods and can assist with supply management as well.
Finally, our poster discussed how we measure the effects of national TV advertising campaigns. Using a third-party identity graph, we combined TV exposure data and demographic data with data on Airbnb onsite behavior. This let us resolve disparate datasets to a unique identifier and model individual households.
We use propensity score matching to estimate TV effects, and then scale these estimates to a nationally-representative population. We leverage this data to provide tactical insights for marketing and understand how long TV effects take to decay.
The plot above (from a simulated study, for illustration) shows the results of an analysis for a TV campaign that ran from August through October. We can see that the TV campaign was effective at increasing bookings for households that saw an Airbnb TV ad, and that it was more effective for one subgroup (red line) than for the other.
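For readers new to propensity score matching, here is a minimal sketch of the matching step. It assumes the propensity scores have already been estimated by some model (for example, a logistic regression on household features), matches each exposed household to its nearest unexposed neighbor with replacement, and averages the outcome differences. The function name and tuple layout are hypothetical, and the scaling to a nationally representative population is not shown.

```python
def matched_effect(units):
    """Average exposed-minus-matched-control outcome difference.

    units: list of (exposed, propensity, outcome) tuples, one per household.
    This estimates the effect on exposed households (1-nearest-neighbor matching).
    """
    treated = [(p, y) for exposed, p, y in units if exposed]
    control = [(p, y) for exposed, p, y in units if not exposed]
    if not treated or not control:
        raise ValueError("need at least one exposed and one unexposed unit")
    diffs = []
    for p_t, y_t in treated:
        # Closest unexposed household by propensity score (with replacement).
        _, y_c = min(control, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# Toy data: (saw TV ad, estimated propensity to be exposed, booked within window).
households = [
    (True, 0.80, 1.0),
    (True, 0.30, 0.0),
    (False, 0.75, 0.0),
    (False, 0.35, 0.0),
]
effect = matched_effect(households)
```

In practice one would also check covariate balance after matching and discard matches whose propensity scores are too far apart.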
Data science and analytics infra
How can you achieve science at scale in a medium-to-large engineering organization? At KDD’s 2nd Workshop on Applied Machine Learning Management, we shared Airbnb’s solution for data science reproducibility and reuse, Onebrain. The core of Onebrain is a coding standard for configuring data science projects entirely in YAML. Onebrain’s backend abstracts away CI/CD, configuration/dependency management, and command-line parsing. Since it’s “just code,” Onebrain projects can be checked into a version-controlled repo, and any repo can be a Onebrain repo.
User interaction with Onebrain happens through a CLI. With a single command, anyone can use an existing project as a template for their own work, or generate a one-click URL to spin up a server and run the project. Usage is growing fast with over 200 distinct projects and over 500 users at Airbnb within just a year.
While most of our research focuses on high-order data use-cases like models, data capture is essential as it’s the starting point for any analysis. Event logging libraries typically capture actions on and impressions of app components (buttons, sections, pages). But with this level of granularity, it can be difficult to abstract out user behavior, measure the total time spent on a surface, or understand the context surrounding an action.
At the 2nd Workshop on End-End Customer Journey Optimization, we spoke about a new type of client-side event called Sessions. Part of Airbnb’s client-side logging solution, Sessions provide a way to track user context and behaviors within the Airbnb product. Unlike traditional time-based sessions used in web analytics, these Sessions can be tied to various aspects of the Airbnb user experience. For example, they can be tied to specific surfaces like the checkout page, API calls used for observability, or even internal states of the app that abstract away complex UI components. The flexibility of Sessions allows us to capture a wide range of user interactions and better understand their journey throughout our platform.
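One thing surface-level sessions make easy is measuring total time spent on a surface. The sketch below shows the idea with an invented event shape: a tuple of session id, surface name, event kind ("start" or "end"), and timestamp in seconds. This schema is purely hypothetical and is not the actual Sessions logging format.

```python
from collections import defaultdict


def time_per_surface(events):
    """Sum session durations per surface.

    events: iterable of (session_id, surface, kind, ts) tuples, where kind is
    "start" or "end" and ts is a timestamp in seconds (hypothetical schema).
    """
    open_sessions = {}
    totals = defaultdict(float)
    for session_id, surface, kind, ts in sorted(events, key=lambda e: e[3]):
        if kind == "start":
            open_sessions[session_id] = (surface, ts)
        elif kind == "end" and session_id in open_sessions:
            start_surface, start_ts = open_sessions.pop(session_id)
            totals[start_surface] += ts - start_ts
    # Sessions that never received an "end" event are ignored in this sketch.
    return dict(totals)


events = [
    ("s1", "checkout", "start", 0.0),
    ("s2", "search", "start", 5.0),
    ("s2", "search", "end", 15.0),
    ("s1", "checkout", "end", 30.0),
    ("s3", "checkout", "start", 40.0),
    ("s3", "checkout", "end", 45.0),
]
durations = time_per_surface(events)
```

With only button and impression events, the same number would have to be inferred from gaps between unrelated events, which is exactly the difficulty described above.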
Conclusion
KDD is an amazing opportunity for data scientists from around the world, and across industry and academia, to come together and exchange learnings and discoveries. We were honored to be invited to share techniques we’ve developed through applied research at Airbnb. The strategies and insights we presented at KDD have been essential to improving Airbnb’s platform, business, and user experience. We’re constantly motivated by innovations happening around us, and we’re thrilled to give back to the community and eager to see what kinds of new applications and advancements may come about as a result.
At the bottom of the page, you’ll find a complete list of the talks and papers shared in this article along with the team members who contributed. If you can see yourself on our team, we encourage you to apply for an open position today.
List of papers and talks
Optimizing Airbnb Search Journey with Multi-task Learning [link]
Authors: Chun How Tan, Austin Chan, Malay Haldar, Jie Tang, Xin Liu, Mustafa Abdool, Huiji Gao, Liwei He, Sanjeev Katariya
Variance Reduction Using In-Experiment Data: Efficient and Targeted Online Measurement for Sparse and Delayed Outcomes [link]
Authors: Alex Deng, Michelle Du, Anna Matlin, Qing Zhang
Beyond the Simple A/B test: Mitigating Interference Bias at Airbnb
Speaker: Ruben Lobel
The Price is Right: Removing A/B Test Bias in a Marketplace of Expirable Goods [link]
Authors: Thu Le, Alex Deng
Unveiling the Guest & Host Journey: Session-Based Instrumentation on Airbnb Platform
Speaker: Shant Torosean
Devoted to Long-Term Adventure: Growing Airbnb Through Measuring Customer Lifetime Value
Speakers: Sean O’Donell, Jason Cai, Linsha Chen
Low Inventory State: Identifying Under-Served Queries for Airbnb Search [link]
Authors: Toma Gulea, Bradley Turnbull
Measuring TV Campaigns at Airbnb
Speakers: Adam Maidman, Sam Barrows
Tutorial: Data-Centric AI [link]
Presenters: Daochen Zha, Huiji Gao
Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference [link]
Authors: Yufei Wu, Zhiying Gu, Alex Deng, Jacob Zhu, Linsha Chen
Onebrain — Microprojects for Data Science [link]
Authors: Daniel Miller, Alex Deng, Narek Amirbekian, Navin Sivanandam, Rodolfo Carboni