OMNI is an intuitive, homegrown platform that supports message creation, processing, and distribution to engage our guests and hosts at the right time and on the right channels.
Airbnb is a busy marketplace with millions of active users across the globe. Billions of messages are exchanged each year — both between Airbnb and our users, and between our host and guest communities. To facilitate the scaling of these messages, we invested in building a reliable, scalable, and cost-effective communication platform.
The Promotions and Communications Platform (internally called “OMNI”) forms a core pillar of Marketing Technology at Airbnb. Marketing Technology aims to provide a state-of-the-art platform and measurement tools that enable marketing and product teams to engage with our customers effectively. This platform is the bedrock for all promotional, marketing, and transactional communications at Airbnb. We serve nearly 50 billion communications every year to a growing community of guests and hosts, while bringing in millions of bookings through promotional campaigns.
In addition to supplying the systems and APIs that make these communications possible, we provide a self-service tool for creation and management of promotional campaigns. This tool enables product and marketing teams to deliver content to the Airbnb community and measure its efficacy. Our systems serve multiple product and business teams and use cases.
Over the course of the past few years, we have developed and battle-tested this technology platform to effectively meet our promotion and communication needs. In this blog post, we will introduce the business problems, discuss our overall platform architecture, and share some key learnings through the development process.
OMNI is the platform that we use to create, manage, and distribute content and messages to our community through multiple channels. The name OMNI refers to the ability of the platform to deliver omni-channel content. A few examples of content and messages supplied by OMNI are listed in Figure 1.
OMNI delivers two types of content:
- Transactional Content: Transactional content usually consists of automated, real-time messages that are triggered by a customer’s action on the company’s website or application. Examples of transactional messages include order confirmation notifications, order status emails, password reset emails, and email receipts.
- Promotional Content: Promotional content involves sending a message (e.g., an email or push notification) solely to communicate a special offer or a product catalog item. It can also include messages mandated by legal or regulatory policies, such as Terms of Service updates. A campaign or promotion usually requires a set of declarative configurations, which specify the content and delivery time period.
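To make the idea of a declarative campaign configuration concrete, here is a minimal Python sketch. The class and its field names (`name`, `channel`, `template_id`, `start`, `end`) are illustrative assumptions and do not reflect OMNI's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CampaignConfig:
    """Hypothetical declarative campaign definition (not OMNI's real schema)."""
    name: str
    channel: str        # where to deliver, e.g. "email" or "push"
    template_id: str    # which content to send
    start: datetime     # beginning of the delivery period (inclusive)
    end: datetime       # end of the delivery period (exclusive)

    def is_active(self, now: datetime) -> bool:
        """A campaign is deliverable only inside its configured period."""
        return self.start <= now < self.end


summer_promo = CampaignConfig(
    name="summer-promo",
    channel="email",
    template_id="tmpl-123",
    start=datetime(2024, 6, 1),
    end=datetime(2024, 7, 1),
)
```

Because the configuration only describes the content and the delivery period, the same definition can be validated, versioned, and reviewed before anything is sent.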
With this understanding, let’s dive a little deeper into the various components of our platform. Figure 2 is a modular diagram of platform features and dependencies. OMNI is built on top of the shared infrastructure at Airbnb, which leverages AWS as the primary cloud provider. It consists of two major subsystems: the promotion creation and management tool, called OMNI UI, and a set of backend services.
OMNI UI is a web application built to provide full life-cycle support for content creation and distribution. It consists of multiple components that handle campaign management, content building, approval processes, translation, and analytics. Campaign data is created, updated, and deleted in a backend database through the Campaign Service, which is also responsible for managing versions, providing pre-made templates, and enforcing access control. Content created through the Campaign Service can also be enqueued for the Translation Service. After a campaign has been launched, the tool allows for visualizing and understanding campaign performance via a suite of analytics dashboards. This makes it easy for non-technical teams to gain actionable insights all within the same tool.
To best understand the services, let’s take a look at how someone would create a new promotion or communication. In order to create content, a campaign creator needs to define the “what”, “who”, “when”, and “where”. This is illustrated by an example email sent to an Airbnb guest, seen in Figure 3.
Each function in the figure above is powered by one of the following backend services:
- Audience Service: This service defines the “who”, supporting functionality such as getting all eligible users in a specific channel or getting users based on a particular rule.
- Workflow Service: This service orchestrates the “when” by listening to delivery request events, evaluating delivery conditions, and eventually passing appropriate events to the Delivery Service.
- Optimization Service: This service performs message personalization via content ranking, user propensity-based personalization, and send time optimizations.
- Presentation Service: This service handles the “what” and “where” by enabling request validation, translation, and content generation.
- Rendering Service: This service provides rendering for content specific to different channels (e.g., emails or push notifications).
- Delivery Service: This service delivers the finalized message to Airbnb users through various vendor services (e.g., SendGrid, Twilio, FCM).
One of our key design principles is to build with omni-channel content delivery in mind. Table 1 illustrates how we serve some of the common use cases across the Airbnb product ecosystem. In the subsequent sections, we will dive deeper into a few of the services.
Audience Builder, seen in Figure 4, is a rich user interface that sits within the OMNI UI and is powered by the Audience Service. Our internal customers can create and target different user audiences by filtering on user attributes. Currently, we support 100+ such user attributes (both static and machine learning-derived), enabling our marketing teams to create audience segments. It is also possible to upload a predefined set of users (either from Airbnb’s data warehouse or CSV files) for customized audiences. A single audience can reach tens of millions of users for certain policy or promotional messages.
The Audience Service, the backend service supporting Audience Builder, has the high-level architecture shown in Figure 5.
ElasticSearch is used to serve queries aimed at getting target users. For example, a common use case is to find all users who are eligible for a nearby promotion. Audience Service also contains an internally built key-value store that holds additional object information, such as user information keyed by user ID. All user attributes are ingested into online storage through either batch pipelines from offline storage systems (e.g., Hive tables) or real-time streaming jobs from other online systems. Audience rules defined in Audience Builder are converted to queries against ElasticSearch and the key-value store to obtain information about the target user audience.
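As a rough illustration of how audience rules could be converted into a search query, the Python sketch below builds a query body using the standard `bool`/`filter`, `term`, and `range` clauses of the search engine's query DSL. The rule format (`field`, `op`, `value`) and the attribute names are assumptions made for this example; the actual rule representation used by Audience Builder is not described here.

```python
from typing import Any


def rules_to_query(rules: list[dict[str, Any]]) -> dict[str, Any]:
    """Translate simple audience rules into a bool/filter query body.

    The rule shape is hypothetical: {"field": ..., "op": ..., "value": ...}.
    """
    filters = []
    for rule in rules:
        field, op, value = rule["field"], rule["op"], rule["value"]
        if op == "eq":
            # Exact match on a keyword-style attribute.
            filters.append({"term": {field: value}})
        elif op in ("gt", "gte", "lt", "lte"):
            # Numeric or date comparison.
            filters.append({"range": {field: {op: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    # Filter context: all clauses must match, and no relevance scoring is needed.
    return {"query": {"bool": {"filter": filters}}}


query = rules_to_query([
    {"field": "country", "op": "eq", "value": "FR"},
    {"field": "bookings_last_year", "op": "gte", "value": 2},
])
```

Filter clauses suit audience selection because a user either belongs to a segment or does not, so scoring adds nothing.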
The Workflow Service, which orchestrates the “when”, is a queue-based delivery job processing system (see Figure 6).
A cron job runs every hour in the Workflow Service and processes scheduling-based campaigns. This job includes the following steps:
- Fetch all active campaigns from campaign storage
- Get reach estimation for each campaign from Audience Service
- Split the delivery quota among campaigns based on reach estimation size
- Fetch user IDs for active campaigns from Audience Service
- Loop through each user to check user states and decide top-ranked campaigns
- Post events to the Delivery Service to send messages at scheduled times
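The quota-splitting step above can be sketched as a proportional allocation. This is only one plausible policy, shown under the assumption that the quota is a single integer budget of sends per run; the function name and the rounding behavior are illustrative choices rather than the Workflow Service's actual logic.

```python
def split_quota(total_quota: int, reach: dict[str, int]) -> dict[str, int]:
    """Split a delivery quota across campaigns in proportion to estimated reach.

    Shares are rounded down, so a few sends may remain unallocated, and no
    campaign is given more sends than its estimated reach.
    """
    total_reach = sum(reach.values())
    if total_reach == 0:
        return {campaign: 0 for campaign in reach}
    return {
        campaign: min(size, total_quota * size // total_reach)
        for campaign, size in reach.items()
    }


# Hypothetical reach estimates, as if returned by the Audience Service.
quotas = split_quota(1000, {"summer-promo": 3000, "tos-update": 1000})
```

Here `summer-promo` has three times the estimated reach of `tos-update`, so it receives three quarters of the quota.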
An event-driven job matches an incoming real-time event (e.g., a new booking) to the corresponding user, then performs the following steps:
- Evaluate and update the state of the user in storage
- Call Audience Service to fetch all eligible campaigns for the user
- For eligible campaigns, determine the top-ranked campaign
- Update the job and user state storage with the latest information
- Post an event to the Delivery Service to send a scheduled message to the user
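The campaign-selection part of this procedure can be sketched as follows. The `Campaign` fields, the `rank_score` signal, and the use of an "already sent" set as user state are all assumptions made for illustration; the real eligibility and ranking logic lives in the Audience and Optimization Services.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Campaign:
    campaign_id: str
    trigger_event: str   # event type this campaign reacts to (hypothetical field)
    rank_score: float    # higher means higher priority (assumed ranking signal)


def pick_campaign(event_type: str, already_sent: set[str],
                  campaigns: list[Campaign]) -> Optional[Campaign]:
    """Return the top-ranked eligible campaign for one user, or None."""
    eligible = [
        c for c in campaigns
        if c.trigger_event == event_type and c.campaign_id not in already_sent
    ]
    return max(eligible, key=lambda c: c.rank_score, default=None)


campaigns = [
    Campaign("welcome", "booking_created", 0.4),
    Campaign("upsell", "booking_created", 0.9),
    Campaign("review", "trip_completed", 0.7),
]
chosen = pick_campaign("booking_created", already_sent=set(), campaigns=campaigns)
```

After a campaign is chosen, its ID would be added to the user's state so that the same message is not selected again on the next matching event.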
The Delivery Service sends messages reliably and efficiently to Airbnb’s users through offline channels (e.g., email, SMS, push notification). It is on the critical path for crucial use cases at Airbnb such as messaging, login, signup, and customer support. As shown in Figure 7, the Delivery Service takes delivery events as input, processes them, and sends the final message to Airbnb users through the appropriate vendors. Event processing leverages Amazon’s Simple Queue Service (SQS) to enqueue jobs to delivery workers for various channels. Typical end-to-end delivery latency through the Delivery Service is under 30 seconds, and all delivered messages are logged to a data store for analysis and debugging.
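The queue-per-channel flow can be illustrated with a small in-memory sketch. Python's standard `queue.Queue` stands in for SQS here, a plain list stands in for the delivery log, and the `send` callable stands in for a vendor API; the event fields and function names are hypothetical.

```python
import queue

CHANNELS = ("email", "sms", "push")

# One queue per channel; an in-memory stand-in for the real message queues.
channel_queues = {channel: queue.Queue() for channel in CHANNELS}
delivery_log = []  # stand-in for the data store used for analysis and debugging


def enqueue(event: dict) -> None:
    """Route a delivery event to the queue for its channel."""
    channel_queues[event["channel"]].put(event)  # KeyError on unknown channel


def run_worker(channel: str, send) -> int:
    """Drain one channel's queue, call the vendor stub, and log each outcome."""
    q = channel_queues[channel]
    processed = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return processed
        delivered = send(event)
        delivery_log.append(
            {"user_id": event["user_id"], "channel": channel, "delivered": delivered}
        )
        processed += 1


enqueue({"user_id": 1, "channel": "email", "body": "Your reservation is confirmed"})
enqueue({"user_id": 2, "channel": "sms", "body": "Your login code is on its way"})
sent = run_worker("email", send=lambda event: True)  # stub vendor that always succeeds
```

Keeping a separate queue per channel means a slow or failing vendor on one channel does not block workers on the others.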
The Delivery Service aims to provide a simple interface for internal users. It efficiently and robustly handles delivery-related issues, such as spam reputation, legal compliance, user experience, deliverability, reachability, optimization, and metrics. While the Delivery Service has been quite effective in supporting OMNI, there are still many areas for improvement. The areas that we’re currently focusing on include:
- Uptime and Deliverability: Important metrics for our team include uptime and delivery success rate. We are currently working toward a goal of 99.99% uptime and a 90% delivery success rate across all channels.
- Cost: Each year, millions are spent on maintaining the system and paying for vendor APIs, especially for SMS channels. We are investigating optimization opportunities and channel-shift ideas in an effort to bring down this spend.
- Reachability: There is limited geographic support for popular channels such as WhatsApp and WeChat. The system needs to be extended further for such channels so that Airbnb can better serve its customers in the future.
There was a lot to learn throughout the journey of building the promotions and communications platform from scratch. We share some key learnings below so that you can apply them to your own promotional and marketing efforts.
Include Content Governance Mechanisms: One of the biggest challenges we encountered when building the system was ensuring high quality content and appropriate review processes. Tooling made it easier to create and deliver a lot of content to users, but also increased the challenge of quality control. In hindsight, the approval processes, debug tooling, and systematic guardrails should have always been a mandatory part of a promotions and communications platform from the start.
Design for Unpredictable Traffic Patterns: Promotional content delivery tends to be spiky in nature, which translates to unpredictable traffic patterns on our underlying services. By integrating elastic scaling of resources, rate limiting, and back pressure mechanisms in our design, we ensured that promotional traffic would be isolated from the regular user traffic on our website.
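One common way to implement rate limiting for spiky traffic is a token bucket. The sketch below is a generic version of that technique, shown for illustration only; the text does not say which rate-limiting algorithm OMNI uses. Timestamps are passed in explicitly so the behavior is easy to follow and test.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative, not OMNI's implementation)."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = now             # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one request may proceed at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # third call exceeds the burst size
later = bucket.allow(now=1.0)                      # one token has been refilled
```

A rejected request can be re-enqueued or delayed by the caller, which is one simple form of back pressure: senders slow down instead of overwhelming downstream services.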
Scaling ElasticSearch Requires Intention: As the platform gained traction, we learned that operating ElasticSearch requires specialized expertise. Our storage strained under the load of an ever-growing set of attributes, some of which were added and used only once but were still continually processed. To manage this growth, we audited the data we store in ElasticSearch and added an allowlist process to limit indexed fields. We also refined our storage schema to make it sparse, and we now actively manage real-time and batch-updated fields.
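The allowlist idea can be sketched in a few lines of Python. The field names and the choice to drop `None` values (to keep documents sparse) are assumptions for illustration; the actual allowlist process is not detailed here.

```python
# Hypothetical allowlist of attributes that may be indexed.
INDEXED_FIELD_ALLOWLIST = frozenset({"user_id", "country", "last_booking_date"})


def to_indexable(doc: dict) -> dict:
    """Keep only allowlisted fields, and omit empty values to keep documents sparse."""
    return {
        key: value
        for key, value in doc.items()
        if key in INDEXED_FIELD_ALLOWLIST and value is not None
    }


raw = {
    "user_id": 42,
    "country": "FR",
    "last_booking_date": None,       # missing value, omitted from the index
    "one_off_experiment_flag": True, # not allowlisted, never indexed
}
indexable = to_indexable(raw)
```

Filtering at ingestion time keeps one-off attributes out of the index, so adding a new indexed field becomes a deliberate, reviewed change.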
We built OMNI as the unified and shared communication platform across Airbnb. Building products and features with OMNI brings many benefits: campaign life-cycle management, flexible audience selection, real-time traffic monitoring and analytics, and scalable delivery across multiple channels.
Developing a communication platform requires close collaboration with many cross-functional teams across the company, as well as strong support and commitment from leadership. An architecture based on distributed services and storage is critical to achieving the scalability and extensibility required for such a platform. We plan on authoring additional posts to deep-dive into the relevant technical details and features of major OMNI components (e.g., Optimization Service, Presentation Service, Rendering Service), as well as the challenges encountered through the design and implementation process. Stay tuned for follow-up posts.
If you are passionate about building distributed systems to solve communications-related technical challenges, then apply here to join our team.
This work was only possible through a massive amount of hard work and great support across our entire organization. Special thanks to Arjun Raman, Bita Gorjiara, David Yang, Emre Ozdemir, Ganesh Venkataraman, Haiqing Deng, Irene Kai, Jasmine Price, Laurie Jin, Mengting Li, Michael Endelman, Michael Kinoti, Min Li, Mukund Narasimhan, Nnenna John, Priyank Singhal, Sharvari Apte, Vidhur Vohra, Xin Tu, and Zhentao Sun.