Himeji: A Scalable Centralized System for Authorization at Airbnb
Over the last couple of years, Airbnb engineering moved from a monolithic Ruby on Rails architecture to a service-oriented architecture (SOA). In our Rails architecture, we had an API per resource to access the underlying data. These APIs had authorization checks to protect sensitive data. As there was a single way to access a resource’s data, managing these checks was easy. In the transition to SOA, we moved to a layered architecture in which data services wrap databases and presentation services hydrate from multiple data services. The initial approach to moving the permission checks from the monolith to SOA was to move these checks to presentation services. However, this led to several problems:
- Duplicate and difficult to manage authorization checks: Often multiple presentation services that provided access to the same underlying data had duplicate code for authorization checks. In some cases these checks became out of sync and difficult to manage.
- Fan out to multiple services: Most of these authorization checks required calling into other services. This was slow, the load was difficult to maintain, and it impacted overall performance and reliability.
To tackle these issues, we made two changes:
- We moved the authorization checks to data services, instead of performing authorization checks only in presentation services. This helped us alleviate duplicate and inconsistent check issues.
- We created Himeji, a centralized authorization system based on Zanzibar, which is called from the data layer. It stores permissions data and performs the checks as a central source of truth. Instead of fanning out at read time, we write all permissions data when resources are mutated. We fan out on the writes instead of the reads, given our read-heavy workload.
Himeji exposes a check API for data services to perform authorization checks. The API signature is as follows:
// Can the principal do relation on entity?
boolean check(entity, relation, principal)
A permissions check will look like the following, which states “can user 123 write to listing 10’s description?”:
check(entity: "LISTING : 10 : DESCRIPTION", relation: WRITE, principal: User(123))
This is interpreted by Himeji as the statement “is user 123 in the set of users that can write to listing 10’s description?”.
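To make the set-membership reading concrete, here is a minimal sketch in Python. It is not Himeji's implementation: the in-memory `STORED` set and the string encodings of entities and principals are assumptions made purely for illustration.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    entity: str      # e.g. "LISTING : 10 : DESCRIPTION"
    relation: str    # e.g. "WRITE"
    principal: str   # e.g. "User(123)"


# Hypothetical in-memory stand-in for the permissions store.
STORED = {
    RelationTuple("LISTING : 10 : DESCRIPTION", "WRITE", "User(123)"),
}


def check(entity: str, relation: str, principal: str) -> bool:
    """Is `principal` in the set of principals that hold `relation` on `entity`?"""
    members = {
        t.principal
        for t in STORED
        if t.entity == entity and t.relation == relation
    }
    return principal in members


print(check("LISTING : 10 : DESCRIPTION", "WRITE", "User(123)"))
```

The function first builds the set of principals for the (entity, relation) pair and then tests membership, which mirrors the way the check statement is phrased above.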
Similar to Zanzibar, the basic unit of storage for Himeji is a tuple in the form
entity # relation @ principal.
- An entity is the triple (entity type : entity id : entity part); this comes from a natural language approach: LISTING : 10 : DESCRIPTION → “listing 10’s description”.
- An entity id is the corresponding id in the data source of truth.
- An entity type defines the data that permissions apply to. Examples used in this post: LISTING and RESERVATION.
- An entity part is an optional component. Examples used in this post: DESCRIPTION and LOCATION.
- A relation describes the relationship, like WRITE, but can be specific to use cases; some examples include HOST for the host of a reservation and DENY_VIEW for denying access to a listing.
- A principal is either an authenticated user identity like User(123), or a reference to another entity like Reference(RESERVATION : 500).
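The tuple shape described in the list above can be sketched as a small data model. The class names, field names, and the `parse_tuple` helper below are hypothetical and only illustrate the `entity # relation @ principal` notation; they are not part of Himeji's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    entity_type: str                   # e.g. "LISTING"
    entity_id: str                     # id in the data source of truth
    entity_part: Optional[str] = None  # optional, e.g. "DESCRIPTION"

    def __str__(self) -> str:
        parts = [self.entity_type, self.entity_id]
        if self.entity_part is not None:
            parts.append(self.entity_part)
        return " : ".join(parts)


@dataclass(frozen=True)
class RelationTuple:
    entity: Entity
    relation: str
    principal: str  # kept as text, e.g. "User(123)"

    def __str__(self) -> str:
        return f"{self.entity} # {self.relation} @ {self.principal}"


def parse_tuple(text: str) -> RelationTuple:
    """Parse 'entity # relation @ principal' (hypothetical helper)."""
    # Split on the first '#' and first '@' so a principal may itself contain them.
    entity_text, rest = text.split("#", 1)
    relation, principal = rest.split("@", 1)
    fields = [field.strip() for field in entity_text.split(":")]
    if len(fields) not in (2, 3):
        raise ValueError(f"expected 2 or 3 entity fields, got {len(fields)}")
    return RelationTuple(Entity(*fields), relation.strip(), principal.strip())


print(parse_tuple("LISTING : 10 : DESCRIPTION # WRITE @ User(123)"))
```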
If we had to write a tuple for each exact permission that is checked, the volume of data and denormalization would grow exponentially. For example, we’d have to write both
LISTING : 10 # WRITE @ User(123) and
LISTING : 10 # READ @ User(123) for the listing owner to be able to both read and write.
Based on the Zanzibar configuration, we use a YAML-based configuration language that allows for the resolution of permissions checks via set algebra, allowing a developer to map a check to a set operation:
Suppose user 123 is the owner of listing 10. Then the database will have the tuple
LISTING : 10 # OWNER @ User(123).
When we request
check(entity: "LISTING : 10", relation: WRITE, userId: 123), Himeji interprets
LISTING # READ as the union of
WRITE, and transitively
LISTING # WRITE as the union of
OWNER. Therefore, it will fetch the following from its database, with any matches belonging to the set of
LISTING # WRITE:
Query LISTING : 10 # WRITE @ User(123) => Empty
Query LISTING : 10 # OWNER @ User(123) => Match User(123)
So for example, user 123 need only have
LISTING : 10 # OWNER @ User(123) to be in the
LISTING : 10 # WRITE set.
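The resolution described above can be sketched as follows. This is an illustration under stated assumptions, not Himeji's code: the `CONFIG` dictionary is a Python stand-in for the YAML configuration, and its shape, along with the `expand` and `check` helpers, is hypothetical.

```python
# Hypothetical stand-in for the configuration: each relation is the union of
# itself and the relations listed for it.
CONFIG = {
    "LISTING": {
        "READ": ["WRITE"],
        "WRITE": ["OWNER"],
    }
}

# Stored tuples as (entity, relation, principal) triples.
STORED = {("LISTING : 10", "OWNER", "User(123)")}


def expand(entity_type, relation, seen=None):
    """Return `relation` plus every relation it transitively unions in."""
    seen = set() if seen is None else seen
    if relation in seen:  # guard against cyclic configuration
        return seen
    seen.add(relation)
    for child in CONFIG.get(entity_type, {}).get(relation, []):
        expand(entity_type, child, seen)
    return seen


def check(entity, relation, principal):
    """True if any relation in the expanded set has a stored tuple."""
    entity_type = entity.split(":")[0].strip()
    return any(
        (entity, candidate, principal) in STORED
        for candidate in expand(entity_type, relation)
    )


print(sorted(expand("LISTING", "WRITE")))
print(check("LISTING : 10", "WRITE", "User(123)"))
```

For a WRITE check, the expanded set is WRITE and OWNER, matching the two queries listed above; only the OWNER lookup finds a stored tuple.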
We observed that entities at Airbnb frequently grant access to other entities as a result of their existence. For example, a guest of a reservation gains access to a listing’s location, along with other pieces of the listing’s information. We represent this use-case with a tuple where the principal is a reference to an entity, i.e.
LISTING : $id # RESERVATION @ Reference(RESERVATION : $reservationId). This allows us to express the concept that a user in the ‘guest’ set of a reservation that is in the ‘reservation’ set of a listing is in the
LISTING : LOCATION # READ set, minimizing the amount of data that needs to be stored:
- LISTING : $id # RESERVATION @ Reference(RESERVATION : $reservationId # GUEST)
Where this approach differs from Zanzibar is that such a tuple does not contain a relation (i.e. Reference(RESERVATION:$id # GUEST)) within the principal. The relation following a referenced entity is static and retrieved from configuration. Taking the listing example and then checking against other use cases, we found that typically a reference will be followed to multiple relations. In our product, there is no variance in the set of relations used between two entity types; a change in the set means a product change and applies across all entity types. If the set of relations between two entity types (i.e. Reference(RESERVATION:$id # GUEST), Reference(RESERVATION:$id # COTRAVELLER), Reference(RESERVATION:$id # BOOKER), … ) has size M, writing a tuple for each of these leads to N*M tuples for N references. By pulling the relation into configuration, we reduce the size of the stored data to N tuples.
At read execution time, suppose the following tuples are stored in the database:
LISTING : 10 # OWNER @ User(123)
LISTING : 10 # RESERVATION @ Reference(RESERVATION : 500)
RESERVATION : 500 # GUEST @ User(456)
Now, if a client sends a request like:
check(LISTING : 10 : LOCATION # READ, User(456))
then based on the configuration, Himeji issues the first DB fetch based on the information from the request and the above config:
Query LISTING : 10 # RESERVATION => Match Reference(RESERVATION:500)
Query LISTING : 10 # OWNER @ User(456) => Empty
Himeji will then issue the second DB fetch, substituting in the id of the reservation found, where a match indicates that user 456 is in the set of users allowed to read listing 10’s location.
Query RESERVATION : 500 # GUEST @ User(456) => Match User(456)
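The two fetches above can be sketched in a few lines of Python. This is a simplified illustration rather than Himeji's implementation: the constants standing in for the configuration (`DIRECT_RELATIONS`, `REFERENCE_RELATION`, `FOLLOWED_RELATIONS`) and the helper names are assumptions, and the tuples live in an in-memory set instead of a database.

```python
# Tuples from the example above, as (entity, relation, principal) triples.
STORED = {
    ("LISTING : 10", "OWNER", "User(123)"),
    ("LISTING : 10", "RESERVATION", "Reference(RESERVATION : 500)"),
    ("RESERVATION : 500", "GUEST", "User(456)"),
}

# Hypothetical configuration for LISTING : LOCATION # READ.
DIRECT_RELATIONS = ["OWNER"]     # relations checked on the listing itself
REFERENCE_RELATION = "RESERVATION"  # relation whose principals are references
FOLLOWED_RELATIONS = ["GUEST"]   # relations read on each referenced entity


def principals(entity, relation):
    """All principals stored for (entity, relation)."""
    return {p for (e, r, p) in STORED if e == entity and r == relation}


def check_location_read(listing_id, principal):
    listing = f"LISTING : {listing_id}"
    # First fetch: direct relations and references on the listing.
    if any(principal in principals(listing, r) for r in DIRECT_RELATIONS):
        return True
    references = principals(listing, REFERENCE_RELATION)
    # Second fetch: substitute each referenced entity and check its relations.
    for reference in references:
        target = reference[len("Reference("):-1]  # "RESERVATION : 500"
        if any(principal in principals(target, r) for r in FOLLOWED_RELATIONS):
            return True
    return False


print(check_location_read(10, "User(456)"))
```

Because the relation to follow (GUEST) comes from configuration, the stored reference tuple stays the same no matter how many relations are followed on the reservation.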
Architecture & Performance
Himeji is split into three layers:
- The orchestration layer receives requests from clients and is responsible for issuing fetches for data, according to configuration logic, and parses the results. The orchestration layer routes to the caching layer with consistent hashing.
- The caching layer, which is sharded and replicated (one instance per AZ per shard), is responsible for filtering in-memory and deduplicating loads from the database on misses. Each shard is assigned a set of data to own via consistent hashing. We target a ~98% hit rate on the cache.
- The data layer, which consists of logically sharded databases.
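The routing from the orchestration layer to cache shards relies on consistent hashing. A minimal sketch of a hash ring is shown below; the shard names, the number of virtual nodes, and the choice of SHA-256 are illustrative assumptions, not details of Himeji's deployment.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=64):
        if not shards:
            raise ValueError("at least one shard is required")
        # Each shard is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key):
        """Return the shard owning `key`: the first ring point after its hash."""
        index = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["cache-0", "cache-1", "cache-2"])
print(ring.shard_for("LISTING : 10"))
```

The property that matters for a cache tier is that removing or adding a shard only remaps the keys adjacent to that shard's ring points, so the remaining shards keep their warm entries.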
The most significant changes we made to Himeji over Zanzibar’s setup are to:
- Separate the request orchestration tier from the caching tier, so that the orchestration tier can be updated more easily without restarting the cache.
- Invalidate the cache shards based on published mutations from the databases.
- Use Amazon Aurora for database storage as part of our cloud journey, which differs from Zanzibar’s usage of Spanner.
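Mutation-driven invalidation, the second change in the list above, can be illustrated with a toy cache shard. This is a single-process sketch under simplifying assumptions: the `CacheShard` class, its `on_mutation` hook, and the dictionary standing in for the database are hypothetical, and the real system's mutation stream, sharding, and concurrency handling are not modeled.

```python
class CacheShard:
    """Toy cache shard: loads on miss, drops entries when a mutation is published."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._entries = {}
        self.db_loads = 0  # counts database loads caused by misses

    def get(self, key):
        if key not in self._entries:
            self.db_loads += 1
            self._entries[key] = self._load(key)
        return self._entries[key]

    def on_mutation(self, key):
        """Called for each published mutation; the next read reloads the key."""
        self._entries.pop(key, None)


# Hypothetical database contents keyed by (entity, relation).
database = {("LISTING : 10", "OWNER"): frozenset({"User(123)"})}
shard = CacheShard(lambda key: database.get(key, frozenset()))

key = ("LISTING : 10", "OWNER")
shard.get(key)                               # miss: loads from the database
database[key] = frozenset({"User(999)"})     # the database is mutated
shard.on_mutation(key)                       # the mutation is published
print(shard.get(key))                        # reloads the fresh value
```

Without the `on_mutation` call, the shard would keep serving the old owner until the entry was evicted, which is why invalidation is tied to the mutations published by the database.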
We implement the same reliability (hedging, tiered caching) and load shedding features as Zanzibar for availability.
Himeji has been serving checks in production for about a year. Its throughput has scaled up from 0 in March 2020 to 850k entities / sec in March 2021, while maintaining its availability and latency targets:
P50 Latency 1.8 ms
P95 Latency 7 ms
P99 Latency 12 ms
To cut down integration time and drive developer adoption, we built several tools:
- Configuration-based backfill: Migrating the existing permission checks into Himeji required us to backfill the permission tuples for existing entities. Instead of each data service owner building their own backfill flow, we built a generic solution based on Apache Airflow and Apache Spark. Service owners only have to provide a small config that indicates how their tuples should be formed from their database exports.
- Automatic code generation: To make onboarding easier, we provided scripts to auto-generate Java and Scala code.
- Thick client: We provided a thick HTTP client with logging, metrics, and migration rollout controls.
- UI tool for debugging and one-off tasks: Investigating one-off permission issues can get tedious and requires checking permission data written in the system, so we built a UI to analyze data and fix permissions issues.
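As an illustration of the configuration-based backfill idea in the list above, the sketch below forms tuples from exported rows using a template. Everything here is hypothetical: the config shape, the column names, and the `tuples_from_rows` helper are invented for the example, and it uses plain Python rather than the Airflow and Spark pipeline the real tool is built on.

```python
# Hypothetical backfill config: one template per tuple to emit for each row.
BACKFILL_CONFIG = {
    "tuple_templates": [
        "LISTING : {listing_id} # OWNER @ User({owner_id})",
    ]
}

# Hypothetical rows from a database export.
EXPORTED_ROWS = [
    {"listing_id": 10, "owner_id": 123},
    {"listing_id": 11, "owner_id": 456},
]


def tuples_from_rows(rows, config):
    """Yield one tuple string per (row, template) pair."""
    for row in rows:
        for template in config["tuple_templates"]:
            yield template.format(**row)


for line in tuples_from_rows(EXPORTED_ROWS, BACKFILL_CONFIG):
    print(line)
```

The appeal of this style is that a service owner describes only the mapping from columns to tuples, while the shared pipeline handles reading exports and writing to the permissions store.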
The Himeji authorization system, based on Zanzibar, unifies authorization data and logic for Airbnb. Prior to its introduction, maintaining consistency and performance across disjoint pieces of logic was difficult. Himeji utilizes a simple data model with a flexible logic configuration to centralize all product and data authorization. Himeji expands on Zanzibar’s scalability and performance attributes, and pushes latencies lower through its high hit rate tiered distributed cache. All these together result in Himeji storing tens of billions of relations and serving nearly a million entity authorizations a second while maintaining low latency and high availability.
Himeji was made possible through the contributions of many members of the team within Airbnb. We thank previous and current members of the team — Max Burkhardt, Alex Rosenblatt, Jefferson Lee, Divya Gupta, Clare Liu, Houkun Li, Leelakrishna Nukala, Karen Kim, Gary Leung, Ryan Flood, Tony Tran, and Gurer Kiratli. Additional thanks to our current and previous management that is incredibly supportive of this work — Anish Das Sarma, Vijaya Kaza, Jason Sobel, Bipin Suresh, Marc Blanchou, Raymie Stata, and Aristotle Balogh.
This work and many other exciting things are happening at Airbnb. If you want to join us, check out our Airbnb Careers page.
“Rails” and “Ruby on Rails” are registered trademarks of David Heinemeier Hansson.
Apache Kafka, Apache Airflow, Apache Spark and Apache are either registered trademarks or trademarks of The Apache Software Foundation in the United States and/or other countries.
AWS and Amazon Aurora are the trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
Java is a registered trademark of Oracle and/or its affiliates.
All trademarks are the properties of their respective owners. Any use of these is for identification purposes only and does not imply sponsorship or endorsement.