How Airbnb Achieved Metric Consistency at Scale
At Airbnb, we lean on data to inform our critical decisions. We validate product ideas through randomized controlled experiments, and we track our business performance rigorously to ensure that we maximize value for our stakeholders. To achieve these goals, we needed to build a robust data platform that serves internal users’ end-to-end needs.
While we have previously shared how we ingest data into our data warehouse and how we enable users to conduct their own analyses with contextual data, we have not yet discussed the middle layer: how to properly model and transform data into accurate, analysis-ready datasets.
In this post, we will share our journey in building Minerva, Airbnb’s metric platform that is used across the company as the single source of truth for analytics, reporting, and experimentation. Specifically, we will set the context on why we built it, describe its core features and the ecosystem of tools it has enabled, and highlight the impact it has had on Airbnb. In upcoming posts, we will deep dive into the technology behind Minerva and share the lessons we learned along the way. By publishing this series, we hope our readers will appreciate the power of a system like Minerva and be inspired to create something similar for their organizations!
A Brief History of Analytics at Airbnb
Like many data-driven companies, Airbnb had a humble start at the beginning of its data journey. Circa 2010, there was only one full-time analyst at the company working on data, and his laptop was effectively the company’s data warehouse. Queries were often run directly against the production databases, and expensive queries occasionally caused serious incidents and took down Airbnb.com. In spite of the pitfalls, this simple solution helped Airbnb identify many growth opportunities over the years.
As Airbnb’s footprint continued to grow in the early 2010s, more data scientists joined the company, and data kept growing in both size and variety. It was around then that we went through the first phase of changes, upgrading and stabilizing our data infrastructure. We switched from Chronos to our home-grown (and now open-sourced) Apache Airflow for workflow orchestration and invested in building a set of highly critical data tables called `core_data`.
With `core_data` serving as the foundation, analytics at Airbnb began to blossom. First, we brought the culture of A/B testing to Airbnb by building and scaling Airbnb’s experimentation platform. We built an in-house data catalog, Dataportal, to organize and document our data, and we created Apache Superset (also since open sourced) so more users could analyze data independently and interactively. Last but not least, we focused on data education by launching Data University, a program that teaches non-data scientists useful skills in an effort to democratize data analysis at Airbnb.
While `core_data` brought several step-function changes to Airbnb’s data capabilities, our success did not come without significant cost. In fact, the proliferation of data and use cases caused serious growing pains, both for data producers and for data consumers.
First, as `core_data` continued to rise in popularity, more data producers wanted to use it for analytics, forecasting, and experimentation. New tables were created manually on top of `core_data` tables every other day, but there was no way to tell if similar tables already existed. The complexity of our warehouse continued to grow, and data lineage became impossible to track. When a data issue upstream was discovered and fixed, there was no guarantee that the fix would propagate to all downstream jobs. As a result, data scientists and engineers spent countless hours debugging data discrepancies, fighting fires, and often feeling unproductive and defeated.
For data consumption, we heard complaints from decision makers that different teams reported different numbers for very simple business questions, and there was no easy way to know which number was correct. Years ago, when Brian, our CEO, would ask simple questions like which city had the most bookings in the previous week, Data Science and Finance would sometimes provide diverging answers using slightly different tables, metric definitions, and business logic. Over time, even data scientists started to second guess their own data, confidence in data quality fell, and trust from decision makers degraded.
Overcoming Our Growing Pains with Minerva
As these pain points worsened, Airbnb embarked on a multi-year journey to revamp its data warehouse with the goal of drastically improving data quality at the company. As a first step, our data engineering team rebuilt several key business data models from scratch, which resulted in a set of certified, lean, normalized tables that do not use unnecessary joins. These vetted tables now served as the new foundation for our analytics warehouse.
Our work hardly stopped there, however. In order to translate these tables into insights, we needed to be able to programmatically join them together to create analysis-friendly datasets. We needed to be able to backfill data whenever business logic changed. Finally, we needed data to be presented consistently and correctly in different consumption tools.
This is when Minerva — Airbnb’s metric platform — came onto the scene. Minerva takes fact and dimension tables as inputs, performs data denormalization, and serves the aggregated data to downstream applications. The Minerva API bridges the gap between upstream data and downstream consumption, enabling Data Engineering teams the flexibility to modify core tables while maintaining support for various downstream consumers. This API serves a vital role in Airbnb’s next-generation data warehouse architecture.
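To make the denormalization step concrete, here is a minimal sketch in plain Python of joining a fact table to a dimension table and aggregating a metric. The table and column names here are invented for illustration; Minerva performs this work at warehouse scale rather than in-memory.

```python
# Minimal sketch of denormalization: join a bookings fact table to a
# listings dimension table, then aggregate a metric by a dimension.
# All table and column names below are invented for illustration.

fact_bookings = [
    {"listing_id": 1, "nights": 3},
    {"listing_id": 2, "nights": 2},
    {"listing_id": 1, "nights": 1},
]
dim_listings = {
    1: {"market": "Paris"},
    2: {"market": "Tokyo"},
}

def denormalize_and_aggregate(facts, dims, dim_key, metric):
    """Join facts to dims on listing_id, then sum `metric` grouped by `dim_key`."""
    totals = {}
    for row in facts:
        dim_value = dims[row["listing_id"]][dim_key]
        totals[dim_value] = totals.get(dim_value, 0) + row[metric]
    return totals

print(denormalize_and_aggregate(fact_bookings, dim_listings, "market", "nights"))
# {'Paris': 4, 'Tokyo': 2}
```

The point of centralizing this join-and-aggregate logic is that downstream consumers receive the aggregated result without ever re-implementing the join themselves.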
To date, we have more than 12,000 metrics and 4,000 dimensions in Minerva, with more than 200 data producers spanning across different functions (e.g., Data, Product Management, Finance, Engineering) and teams (e.g., Core Product, Trust, Payments). Most teams now regard Minerva as their preferred framework for analytics, reporting, and experimentation at Airbnb.
Data Production in Minerva
From an infrastructure perspective, Minerva is built on top of open-source projects. It uses Airflow for workflow orchestration, Apache Hive and Apache Spark as the compute engine, and Presto and Apache Druid for consumption. From metric creation through computation, serving, consumption, and eventually deprecation, Minerva covers the full life cycle of a metric.
- Metrics Definition: Minerva defines key business metrics, dimensions, and other metadata in a centralized Github repository that can be viewed and updated by anyone at the company.
- Validated Workflow: The Minerva development flow enforces best data engineering practices such as code review, static validation, and test runs.
- DAG Orchestration: Minerva performs data denormalization efficiently by maximizing data reuse and intermediate joined results.
- Computation Runtime: Minerva has a sophisticated computation flow that can automatically self-heal after job failures and has built-in checks to ensure data quality.
- Metrics / Metadata Serving: Minerva provides a unified data API to serve both aggregated and raw metrics on demand.
- Flexible Backfills: Minerva version controls data definitions, so major changes to the datasets are automatically tracked and backfilled.
- Data Management: Minerva has built-in capabilities such as cost attribution, GDPR selective deletion, data access control, and an auto-deprecation policy.
- Data Retention: Minerva establishes usage-based retention and garbage collection, so expensive but infrequently utilized datasets are removed.
The features above allow us to standardize metric creation, data computation, and data delivery. In the next post, we will dive deeper into each of these features and explain them in more detail!
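Since metric definitions live as configuration in a centralized repository and pass static validation before merging, a definition plus its validation check might look roughly like the following. This is a hypothetical sketch; Minerva’s actual schema is internal, and every field name below is invented.

```python
# Hypothetical sketch of a declarative metric definition and the kind of
# static validation enforced at review time. Minerva's real schema is
# internal; all field names and values here are invented.

nights_booked_metric = {
    "name": "nights_booked",
    "description": "Total nights booked across all confirmed bookings.",
    "owner": "team-core-data",       # invented owner handle
    "aggregation": "sum",
    "source_table": "fct_bookings",  # invented fact table
    "value_column": "nights",
    "dimensions": ["market", "listing_type"],
}

REQUIRED_FIELDS = {"name", "description", "owner", "aggregation",
                   "source_table", "value_column", "dimensions"}

def validate_metric(definition):
    """Reject any definition missing a required field, before it is merged."""
    missing = REQUIRED_FIELDS - definition.keys()
    if missing:
        raise ValueError(f"metric definition missing fields: {sorted(missing)}")
    return True

validate_metric(nights_booked_metric)
```

Keeping definitions declarative like this is what allows the same metric to be versioned, reviewed, and backfilled automatically when its logic changes.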
Data Consumption in Minerva
Minerva’s product vision is to allow users to “define metrics once, use them everywhere”. That is, a metric created in Minerva should be easily accessed in company dashboarding tools like Superset, tracked in our A/B testing framework ERF, or processed by our anomaly detection algorithms to spot business anomalies, just to name a few. Over the last few years, we have partnered closely with other teams to create an ecosystem of tools built on top of Minerva.
First, we partnered closely with the Analytics Product team to index all Minerva metrics and dimensions in the Dataportal, Airbnb’s data catalog. When a user searches for a metric, the Dataportal ranks Minerva metrics at the top of the search results. The Dataportal also surfaces contextual information, such as certification status, ownership, and popularity, so that users can gauge the relative importance of metrics. For most non-technical users, the Dataportal is their first entry point to metrics in Minerva.
Upon selecting a metric, users are redirected to Metric Explorer, a component of the Dataportal that enables out-of-the-box data exploration. On a metric page, users can see trends of a metric with additional slicing and drill down options such as `Group By` and `Filter`. Those who wish to dig deeper can click into the Superset view to perform more advanced analytics. Throughout this experience, Metric Explorer surfaces metadata such as metric owners, historical landing time, and metric description to enrich the data context. This design balances the needs of both technical and non-technical users so they can uncover data insights in-place seamlessly.
Historically, Airbnb’s Experimentation Reporting Framework (ERF) had its own experiment metrics repository called “metrics repo”. Experimenters could add any business metric to an experiment and compare the results of the control and treatment groups. Unfortunately, the metrics repo couldn’t serve use cases beyond experimentation, so we decided to integrate Minerva with ERF so that all base events for A/B tests are defined in and sourced from Minerva. Using the same source across experimentation and analytics means data scientists can be confident in their understanding of how certain experiments could affect the top-line business metrics.
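Because control and treatment read from the same metric definition, comparing experiment groups becomes an apples-to-apples calculation. A toy sketch, using invented per-user numbers rather than anything resembling real experiment data:

```python
# Toy sketch (invented data) of comparing one metric between the control
# and treatment groups of an experiment. Sourcing both groups from the
# same metric definition keeps the comparison apples-to-apples.

def mean(values):
    return sum(values) / len(values)

control_bookings = [0, 1, 0, 2, 1, 0]    # bookings per user, invented
treatment_bookings = [1, 1, 0, 2, 1, 1]  # bookings per user, invented

# Relative lift of the treatment group over control.
lift = mean(treatment_bookings) / mean(control_bookings) - 1
print(f"relative lift: {lift:+.1%}")
# relative lift: +50.0%
```

A real experimentation framework would of course add variance estimates and significance testing on top of this point estimate; the sketch shows only the shared-definition idea.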
Long before Airbnb became a public company, we adopted a practice of reviewing Airbnb’s business performance on weekly, monthly, and quarterly cadences. In these meetings, leaders across different functions meet and discuss the current state of the business. This type of meeting requires executive reports that are high-level and succinct. Data are often aggregated, trends are analyzed and plotted, and metric movements are presented as running aggregations (e.g., year to date) and period-over-period comparisons (e.g., year over year).
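The two presentation styles mentioned above can be sketched in a few lines of plain Python. The monthly values are invented for illustration:

```python
# Sketch of the two executive-report presentation styles: a running
# (year-to-date) aggregation and a year-over-year comparison for the
# same month across two years. The monthly values are invented.
from itertools import accumulate

monthly_2020 = [100, 120, 90]    # invented Jan-Mar values
monthly_2021 = [110, 150, 135]

# Running aggregation: cumulative sum gives the year-to-date series.
ytd_2021 = list(accumulate(monthly_2021))

def year_over_year(current, prior):
    """Ratio of this year's value to last year's for the same period."""
    return current / prior - 1

march_yoy = year_over_year(monthly_2021[2], monthly_2020[2])
print(ytd_2021)            # [110, 260, 395]
print(f"{march_yoy:+.0%}")  # +50%
```

XRF automates this kind of transformation for any user-specified list of Minerva metrics and dimensions, so every report derives its aggregates from the same definitions.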
To enable this type of reporting, we built an eXecutive Reporting Framework (XRF). XRF takes a list of user-specified Minerva metrics and dimensions and turns them into aggregated metric time series that are report-friendly. This framework automates a lot of the manual work and allows us to standardize high-fidelity, business-critical reports by leveraging the same Minerva metrics and dimensions used for analysis and experimentation.
Last but not least, Minerva data is exposed to Airbnb’s custom R and Python clients through Minerva’s API. This allows data scientists to query Minerva data in a notebook environment with ease. Importantly, the data surfaced in the notebook environment is computed exactly the same way as it is in the aforementioned tools, such as Superset and Metric Explorer. This saves data scientists enormous amounts of time, as they can pick the right tool for the job depending on the complexity of the analysis. Notably, this data API encourages lightweight prototyping of internal tooling, which can later be productionized and shared across the company. For example, data scientists have built a time series analysis tool and an email reporting framework on top of this API over the last two years.
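Minerva’s real client interface is internal, but to illustrate the shape of such a notebook workflow, here is an invented stand-in client that serves canned data; everything below (class name, method, arguments, values) is hypothetical:

```python
# Invented sketch of what a notebook query against a metric API might
# look like. Minerva's real client is internal; this MetricClient is a
# stand-in that returns canned data for illustration only.

class MetricClient:
    """Hypothetical client: in a real system this would call the metric
    API so notebooks, dashboards, and experiments share one definition."""

    _DATA = {  # canned results keyed by (metric, dimension)
        ("nights_booked", "market"): {"Paris": 4, "Tokyo": 2},
    }

    def query(self, metric, group_by):
        """Return the aggregated metric broken down by `group_by`."""
        return self._DATA[(metric, group_by)]

client = MetricClient()
print(client.query("nights_booked", group_by="market"))
# {'Paris': 4, 'Tokyo': 2}
```

The design point is that the client is a thin pass-through: all aggregation logic lives behind the API, so a notebook can never drift from what a dashboard shows.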
How We Responded To the COVID-19 Crisis with Minerva Data
As Minerva became a centerpiece of analytics at Airbnb, we saw again and again the power and productivity gains it brought to the data community at Airbnb. In this last section, we want to give a concrete example of how Minerva aided the business during the COVID-19 crisis.
In March 2020, global travel came to a halt due to COVID-19. Almost overnight, Airbnb bookings plummeted and cancellations skyrocketed. This was a scary moment for us, and it raised many important business questions: How was the coronavirus affecting our nights backlog? How was it affecting our occupancy rate? What was the financial impact of rising cancellations? How had the coronavirus altered travel demand in terms of travel distance? We needed to answer all these questions quickly and correctly.
In response to this influx of inquiries, our data science team gathered the questions and started to brainstorm how we could leverage data to answer them. Crucially, since many important business metrics and dimensions for supply, demand, finance, and customer support were already defined in Minerva, our Central Analytics team was able to wireframe an executive dashboard and roll out the initial version in just a few days. The COVID-19 dashboard quickly became the single authoritative source of truth and was reviewed closely by our executive team in the midst of the crisis. Since then, it has amassed more than 11,000 views and 1,500 distinct viewers. Not surprisingly, the COVID-19 dashboard was the most viewed Superset dashboard at Airbnb in 2020.
Insights generated from Minerva metrics also allowed the company to confidently navigate a rapidly changing landscape. For example, we uncovered market opportunities such as demand shifting to local travel and longer-term stays. These findings led us to redesign several important touch points of our product pages to meet the shift in user preference. In moments of crisis, the ability to answer questions and uncover insights is more important than ever. We were able to do this efficiently and effectively, thanks to the single source of truth data in Minerva!
In this post, we briefly summarized the history of Airbnb’s analytics journey, the growing pains we faced in the last few years, and why we built Minerva, Airbnb’s metric infrastructure. In particular, we covered how data is produced and consumed via Minerva. Toward the end of the post, we also highlighted a recent example of how Minerva helped Airbnb to react to the COVID-19 crisis.
In the next post, we will deep dive into Minerva’s technical architecture, including the design principles, the user development flow, as well as the data computation graph. In the last post of the series, we will introduce Minerva API, which is our single layer of data abstraction that made all the integrations outlined above possible. We will close the series by sharing the lessons that we’ve learned from building Minerva in the hope that these lessons will be helpful for others building similar systems.
Until then, stay tuned for our next post!
Minerva is made possible only because of the care and dedication from those who worked on it. We would also like to thank Lauren Chircus, Aaron Keys, Mike Lin, Adrian Kuhn, Krishna Bhupatiraju, Michelle Thomas, Erik Ritter, Serena Jiang, Krist Wongsuphasawat, Chris Williams, Ken Chen, Guang Yang, Jinyang Li, Clark Wright, Vaughn Quoss, Jerry Chu, Pala Muthiah, Kevin Yang, Ellen Huynh, and many more who partnered with us to make Minerva more accessible across the company. Finally, thank you Bill Ulammandakh for creating the beautiful visualization so we can use it as our header image!
Apache®, Apache Airflow, Apache Superset, Apache Hive, Apache Spark, and Apache Druid are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
Presto is the registered trademark of LF Projects, LLC.
GITHUB® is the exclusive trademark registered in the United States by GitHub, Inc.
All trademarks are the properties of their respective owners. Any use of these are for identification purposes only and do not imply sponsorship or endorsement.