Unified Payments Data Read at Airbnb
How we redesigned payments data read flow to optimize client integrations, while achieving up to 150x performance gains.
In recent years, Airbnb migrated most of its backend services from a monolith to a service-oriented architecture (SOA). This industry standard architecture brings countless benefits to a company that is at the scale of Airbnb; however, it is not free of challenges. With data scattered across many services, it’s difficult to provide all the information clients need in a simple and performant way, especially for complex domains such as payments. As Airbnb grew, this problem started to crop up for many new initiatives such as host earnings, tax form generation, and payout notifications, all of which required data to be read from the payments system.
In this blog post, we introduce Airbnb’s unified payments data read layer. This read layer was custom built to reduce the friction and complexity for client integrations, while greatly improving query performance and reliability. With this re-architecture, we were able to provide a greatly optimized experience to our host and guest communities, as well as for internal teams in the trust, compliance, and customer support domains.
Evolution of Airbnb’s Payments Platform
Payments is one of the earliest functionalities of the Airbnb app. Since our co-founder Nate’s first commit, Payments Platform has grown and evolved tremendously, and it continues to evolve at an even faster pace given our expanding global presence.
Similar to other companies, Airbnb started its journey with a monolithic application architecture. Since the feature set was initially limited, both write and read payment flows were “relatively” simple.
Predictably, this architecture couldn’t scale well with the rapid growth and expansion of our company. Payments, along with most other parts of the tech stack, started to migrate to the SOA architecture. This brought a significant overhaul of the existing architecture and provided many advantages, including:
- We had clear boundaries between different services, which enabled better domain ownership and faster iterations.
- Data was separated into domains in a very normalized shape, resulting in better correctness and consistency.
For more, take a peek at our blog post detailing the payments SOA migration.
New Architecture Introduces New Challenges
Payments SOA provided us with a more resilient, scalable, and maintainable payments system. During this long and complex migration, correctness of the system was our top priority. Data was normalized and scattered across many payments domains according to each team’s responsibilities. This subdivision of labor had an important side effect: presentation layers now often needed to integrate with multiple payments services to fetch all the required data.
At Airbnb, we believe in being transparent with our host and guest communities. Our surfaces related to payments and earnings display a range of details including fees, transaction dates, currencies, amounts, and total earnings. After the SOA migration, we needed to look into multiple services and read from even more tables than prior to the migration to get all the requested information. Naturally, this foundation brought challenges when we wanted to add new surfaces with payments data, or when we wanted to extend the existing surfaces to provide additional details. There were three main challenges that we needed to solve.
The first challenge was that clients now needed to understand the payments domain well enough to pick the correct services and APIs. For client engineers from other teams, this required a non-trivial amount of time investment and slowed down overall time to market. On the payments side, engineers needed to provide continuous consultation and guidance, occupying a significant portion of their work time.
The second challenge was that there were many instances in which we had to change multiple payments APIs at the same time in order to meet the client requirements. When there are too many touchpoints, it becomes hard to prioritize requests since many teams have to be involved. This problem also caused significant negative impact to time to market. We had to slow down or push back feature releases when the alignment and prioritization meetings did not go smoothly. Similarly, when payments teams had to update their APIs, they had to make sure that all presentation services adopted these changes, which slowed down progress on the payments system.
Last but not least, the technical quality of the complex read flows was not where we wanted it to be. Application-level aggregations worked fine for the average use case, but we had space for improvement when it came to our large hosts and especially for our prohosts, who might have thousands of yearly bookings on our platform. To have confidence in our system over the long term, we needed to find a solution that provided inherently better performance, reliability, and scalability.
Introducing the Payments Unified Data Read Layer
To achieve our ambitious goals for payments, we needed to re-think how clients integrate with our payments platform.
Unified Entry Points
Our first task was to unify the payments data read entry points. To accomplish this, we leveraged Viaduct, Airbnb’s data-oriented service mesh, where clients query for the “entity” instead of needing to identify dozens of services and their APIs. This new architecture required our clients to worry only about the requisite data entity rather than having to communicate with individual payments services.
In these entry points, we provided as many filtering options as possible so each API could hide filtering and aggregation complexity from its clients. This also greatly reduced the numbers of APIs we needed to expose.
Unified Higher-Level Data Entities
Having a single entry point is a good start, but it does not resolve all the complexity. In payments, we have 100+ data models, and it requires a decent amount of domain knowledge to understand their responsibilities clearly. If we just expose all of these models from a single entry point, there would still be too much context required for client engineers.
Instead of making our clients deal with this complexity, we opted to hide payments internal details as much as possible by coming up with higher-level domain entities. Through this process, we were able to reduce the core payments data to fewer than ten high level entities, which greatly reduced the amount of exposed payments internal details. These new entities also allowed us to guard clients against changes made in Payments platform. When we internally update the business logic, we keep the entity schema the same without requiring any migrations on the client side. Our principles for the new architecture were the following:
- Simple: Design for non-payments engineers, and use common terminology.
- Extensible: Maintain loose coupling with storage schema, and encapsulate concepts to protect from payments internal changes while allowing quick iterations.
- Rich: Hide away the complexity but not the data. If clients need to fetch data, they should be able to find it in one of the entities.
Materialize Denormalized Data
With unified entry points and entities, we greatly reduced the complexity for client onboardings. However, the “how” of fetching the data, combined with expensive application layer aggregations, was still a big challenge. While it’s important that clients are able to integrate with the payments system smoothly, our valued community should also enjoy the experience on our platform.
The core problem we identified was dependency on many tables and services during client queries. One of the promising solutions was denormalization–essentially, moving these expensive operations from query time to ingestion time. We explored different ways of pre-denormalizing payments data and materializing it reliably with less than 10 seconds replication lag. To our great luck, our friends in the Homes Foundation team were piloting a Read-Optimized Store Framework, which takes an event-driven lambda approach to materializing secondary indices. Using this framework, teams are able to get both near real-time data via database change capture mechanisms and historical data leveraging our daily database dumps stored in Hive. In addition, the maintenance requirements of this framework (e.g., single code for online and offline ingestion, written in Java) were much lower compared to other existing internal solutions..
After combining all of above improvements, our new payments read flow looked like the following:
We provide data in a reliable and performant way via denormalized read-optimized store indices.
Migrate and Elevate: Transaction History
The first test surface for the new unified data read architecture was Transaction History (TH). Hosts on our platform use the Transaction History page to view their past and future payouts along with top-level earning metrics (e.g., total paid out amount).
On the technical side, this was one of the most complex payments flows we had. There were many different details required, and the data was coming from 10+ payments tables. This had caused issues in the past, including timeouts, slow loading times, downtime due to hard dependencies, and slow iteration speed as a result of complex implementations. While doing the initial technical design for TH migration from Airbnb monolith to SOA, we took the hard path of re-architecting this flow instead of applying band-aids. This helped to ensure long-term success and provide the best possible experience to our host community.
This use case was a great fit for our unified read layer. Using the data used by TH as a starting point, we came up with a new API and high-level entity to serve all data read use cases from similar domains.
After locking down the entity and its schema, we started to denormalize the data. Thanks to the read-optimized store framework, we were able to denormalize all the data from 10+ tables into a couple of Elasticsearch indices. Not only did we greatly reduce the touchpoints of the query, we were also able to paginate and aggregate much more efficiently by leveraging the storage layer instead of doing the same operations on the application layer. After close to two years of work, we migrated 100% of traffic and achieved up to 150x latency improvements, while improving the reliability of the flow from ~96% to 99.9+%.
Unlocking New Experiences: Guest Payment History
Our next use case, called Guest Payment History, came out of Airbnb’s annual company-wide hackathon. This hackathon project aimed to provide a detailed and easy way for our guest community to track their payments and refunds. Similar to Transaction History, this scenario also required information from multiple payments services and databases, including many legacy databases.
Guest Payment History (GPH) also helped to showcase many benefits brought by the unified read layer: a new unified entity to serve GPH and future similar use cases, along with an extensible API which supported many different filters. We denormalized and stored data from legacy and SOA payment tables using the read-optimized store framework into a single Elasticsearch index, which reduced the complexity and cost of queries greatly.
We released this new page to our community with our 2021 Winter launch and achieved a huge reduction on customer support tickets related to questions about guest payments; which resulted in close to $1.5M cost savings for 2021. It also illustrated our move towards a stronger technical foundation with high reliability and low latency.
The architecture is very similar to TH, where data is provided to clients via unified API and schema, backed by a secondary store.
After exposing these new entities via TH and GPH, we started to onboard many other critical use cases to leverage the same flow in order to efficiently serve and surface payments data.
Microservice/SOA architectures greatly help backend teams to independently scale and develop various domains with minimal impact to each other. It’s equally important to make sure the clients of these services and their data will not be subject to additional challenges under this new industry-standard architecture.
In this blog post, we illustrated some potential solutions, such as unified APIs and higher-level entities to hide away the internal service and architectural complexities from the callers. We also recommend leveraging denormalized secondary data stores to perform expensive join and transformation operations during ingestion time to ensure client queries can stay simple and performant. As we demonstrated with multiple initiatives, complex domains such as payments can significantly benefit from these approaches.
If this type of work interests you, take a look at the following related positions:
We had many people at Airbnb contributing to this big re-architecture, but countless thanks to Mini Atwal, Yong Rhyu, Musaab At-Taras, Michel Weksler, Linmin Yang, Linglong Zhu, Yixiao Peng, Bo Shi, Huayan Sun, Wentao Qi, Adam Wang, Erika Stott, Will Koh, Ethan Schaffer, Khurram Khan, David Monti, Colleen Graneto, Lukasz Mrowka, Bernardo Alvarez, Blazej Adamczyk, Dawid Czech, Marcin Radecki, Tomasz Laskarzewski, Jessica Tai, Krish Chainani, Victor Chen, Will Moss, Zheng Liu, Eva Feng, Justin Dragos, Ran Liu, Yanwei Bai, Shannon Pawloski, Jerroid Marks, Yi He, Hang Yuan, Xuemei Bao, Wenguo Liu, Serena Li, Theresa Johnson, Yanbo Bai, Ruize Lu, Dechuan Xu, Sam Tang, Chiao-Yu Tuan, Xiaochen He, Gautam Prajapati, Yash Gulani, Abdul Shakir, Uphar Goyal, Fanchen Kong, Claire Thompson, Pavel Lahutski, Patrick Connors, Ben Bowler, Gabriel Siqueira, Jing Hao, Manish Singhal, Sushu Zhang, Jingyi Ni, Yi Lang Mok, Abhinav Saini, and Ajmal Pullambi. We couldn’t have accomplished this without your invaluable contributions.
All product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.