One Step Forward in Data Protection

By Lifeng Sang

Encrypting sensitive data is an important security measure that helps protect our users’ data, but it presents unique technical challenges around key storage and management. We built an encryption service, which we call Cipher, to address those challenges and to let engineers at Airbnb encrypt data easily and consistently across our infrastructure. Our goal with Cipher was to provide an easy-to-use, language-agnostic interface to any service that has encryption requirements.

Cipher abstracts away the complexities that come with encryption, such as algorithms, key bootstrapping, key distribution and rotation, access control, and monitoring. Service owners can focus on their service and rely on Cipher to take care of encryption. Cipher is responsible only for computation, not for storing ciphertext. The encryption keys used in Cipher never leave the service, which is fundamentally different from many other encryption solutions (e.g. using native encryption libraries directly), where the key material often coexists with the sensitive data.

There are a number of technical decisions that come with designing a secure encryption service. In the course of building Cipher, we made several important design choices that may benefit others who are looking to build something similar. We’ll cover each of these in detail below:

  • Segmentation between computation and storage
  • Authorization models
  • Encryption key storage
  • Monitoring and auditing access to the encrypted data

1. Cipher owns computation

There are several benefits to isolating computation from storage:

  • Simplicity: The Cipher service can be optimized for high availability and low latency, because it never needs an additional hop to a storage layer to read or write encrypted data. The complexity and maintenance burden of storage stay out of Cipher, which is designed to do one thing and do it really well.
  • Lower security risk: Since Cipher doesn’t own storage, obtaining and decrypting a large set of data becomes more difficult. An attacker would have to compromise both the service that owns the data and the Cipher service in order to decrypt it.
  • Flexibility: Leaving storage decisions to the teams who own the data means that data treatment can be individualized. The encrypted data, along with other business data, is owned by different services. Some of that data has different retention policies, some has strong requirements on transactional integrity, and so on.

2. Cipher builds the authorization model over the latest TLS 1.2

Cipher grants access only to authorized clients, so it needs to authenticate the client on every request. Cipher uses mutual TLS with X.509 certificates because (1) it provides transport security, so the data is encrypted over the wire; (2) it lets us identify the caller via the client certificate; (3) TLS is widely adopted in industry and supported by many HTTP clients, which makes integration, testing and maintenance easier; and (4) TLS performs reasonably well in terms of reliability and latency.
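To make the mutual TLS setup concrete, here is a minimal Python sketch using only the standard library `ssl` module. It is not Cipher's actual code. The function names and the file-path parameters are hypothetical, and reading the caller's identity from the certificate's common name is our assumption for illustration; a real deployment might use a different certificate field.

```python
import ssl
from typing import Optional


def build_server_context(cert_file: Optional[str] = None,
                         key_file: Optional[str] = None,
                         client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older than TLS 1.2
    ctx.verify_mode = ssl.CERT_REQUIRED           # the "mutual" half of mutual TLS
    if cert_file and key_file:
        # The server's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA that signs client certificates (hypothetical path).
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


def client_identity(peer_cert: dict) -> str:
    """Read the common name from the dict returned by SSLSocket.getpeercert().

    Assumption: the caller's identity lives in the certificate's commonName.
    """
    for rdn in peer_cert.get("subject", ()):
        for name, value in rdn:
            if name == "commonName":
                return value
    raise PermissionError("client certificate carries no commonName")
```

With `verify_mode` set to `ssl.CERT_REQUIRED`, the handshake itself fails for callers that present no certificate or one signed by an unknown CA, so application code only ever sees authenticated clients.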

We built a custom authorization model in Cipher on top of TLS to enforce which client is authorized to perform which action on which data. All data being encrypted is classified into resources, e.g. “RESOURCE_FOO”, “RESOURCE_BAR”, etc. Cipher uses a different encryption key for each resource, and each key is rotated periodically. When a client wants to encrypt a new type of data, it pre-registers a new resource in Cipher with an appropriate access policy. The resource name is provided along with the data in both encryption and decryption API calls, so Cipher knows which keys to use and whether the client is authorized to perform the requested action. The authorization model follows the “no more, no less” principle, i.e. access is granted only where it is explicitly specified. For example, the following configuration snippet in Cipher allows a resource to be encrypted only by client ‘foo’ and decrypted only by client ‘bar’:

{
//…
add(RESOURCE_FOO).allow(ENCRYPT).by(CLIENT_FOO);
add(RESOURCE_FOO).allow(DECRYPT).by(CLIENT_BAR);
//…
}
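For readers who want to experiment with this idea, here is a small Python sketch of a default-deny policy registry that mimics the `add(...).allow(...).by(...)` shape of the snippet above. It is an illustration under our own assumptions, not Cipher's implementation. The class names `AccessPolicy` and `_Rule` and the `is_allowed` method are hypothetical.

```python
from collections import defaultdict

ENCRYPT, DECRYPT = "ENCRYPT", "DECRYPT"


class AccessPolicy:
    """Default-deny registry mapping (resource, action) to the allowed clients."""

    def __init__(self):
        self._grants = defaultdict(set)

    def add(self, resource):
        """Start a rule for one resource."""
        return _Rule(self, resource)

    def is_allowed(self, client, action, resource):
        # Anything not explicitly granted is denied ("no more, no less").
        return client in self._grants.get((resource, action), ())


class _Rule:
    """Builder that records a single grant when by() is called."""

    def __init__(self, policy, resource):
        self._policy = policy
        self._resource = resource
        self._action = None

    def allow(self, action):
        self._action = action
        return self

    def by(self, client):
        if self._action is None:
            raise ValueError("call allow() before by()")
        self._policy._grants[(self._resource, self._action)].add(client)
        return self._policy


policy = AccessPolicy()
policy.add("RESOURCE_FOO").allow(ENCRYPT).by("CLIENT_FOO")
policy.add("RESOURCE_FOO").allow(DECRYPT).by("CLIENT_BAR")
```

In this sketch `policy.is_allowed("CLIENT_FOO", DECRYPT, "RESOURCE_FOO")` is false: the client that writes the data cannot read it back unless a separate rule says so.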

3. The master keys are encrypted using AWS KMS

Airbnb runs most services on Amazon Web Services (AWS), so we were well positioned to leverage Amazon’s Key Management Service (KMS) instead of using a customized Hardware Security Module (HSM). As AWS’s documentation details, AWS KMS itself uses Hardware Security Modules (HSMs) to protect the keys. When a new resource is provisioned, Cipher creates a random secret, which is versioned and rotated periodically and automatically. The secrets are used to encrypt and decrypt the client resources, and they never leave Cipher. The secrets are encrypted by a versioned master key and stored in a database for persistence. Instead of relying on dedicated security hardware to protect the master keys, we encrypt the master keys using AWS KMS in multiple regions for availability and disaster recovery. AWS KMS makes our system much simpler and easier to operate and maintain.
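The key hierarchy described above (KMS protects the master key, the master key protects per-resource secrets, and only encrypted copies are persisted) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `FakeKms` is an in-memory stand-in and not the AWS API, the `wrap`/`unwrap` pair is a toy stdlib-only cipher where production code would use a vetted algorithm such as AES-GCM, the database is a plain dict, and for brevity there is a single master key version and a single KMS region.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def wrap(key: bytes, plaintext: bytes) -> bytes:
    """TOY authenticated encryption for illustration only; not for real use."""
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unwrap(key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed integrity check")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


class FakeKms:
    """Hypothetical stand-in for AWS KMS: the root key never leaves this object."""

    def __init__(self):
        self._root = secrets.token_bytes(32)

    def encrypt(self, plaintext: bytes) -> bytes:
        return wrap(self._root, plaintext)

    def decrypt(self, blob: bytes) -> bytes:
        return unwrap(self._root, blob)


def new_database(kms) -> dict:
    """Persisted state holds only encrypted key material."""
    master_key = secrets.token_bytes(32)
    return {"wrapped_master": kms.encrypt(master_key), "wrapped_secrets": {}}


def provision(db: dict, kms, resource: str) -> int:
    """Create the next secret version for a resource; calling it again rotates."""
    master_key = kms.decrypt(db["wrapped_master"])
    versions = db["wrapped_secrets"].setdefault(resource, {})
    version = len(versions) + 1
    versions[version] = wrap(master_key, secrets.token_bytes(32))
    return version


def bootstrap(db: dict, kms) -> dict:
    """Decrypt every stored secret into memory as {resource: {version: key}}."""
    master_key = kms.decrypt(db["wrapped_master"])
    return {
        resource: {v: unwrap(master_key, blob) for v, blob in versions.items()}
        for resource, versions in db["wrapped_secrets"].items()
    }
```

Keeping old versions around after a rotation is what lets data encrypted under an earlier secret still be decrypted, while new writes pick up the latest version.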

4. Cipher provides centralized monitoring, alerting and auditing

All encryption and decryption actions go through Cipher, which gives us a single service to lock down and minimizes the risk of exposure. Because Cipher handles every encryption and decryption request, we can easily audit the usage of keys and log who/what/where/when/why for each resource accessed.
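As a rough illustration of what a who/what/where/when/why record could look like, here is a short Python sketch that serializes one audit event per request as a JSON line. The field names, the `AuditEvent` class and the `audit_line` helper are hypothetical; the source does not describe Cipher's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    client: str     # who: identity taken from the client certificate
    action: str     # what: ENCRYPT or DECRYPT
    resource: str   # what: the resource the action targets
    source: str     # where: the host or address the request came from
    timestamp: str  # when: ISO 8601, UTC
    reason: str     # why: caller-supplied purpose (an assumed field)
    allowed: bool   # outcome of the authorization check


def audit_line(client, action, resource, source, reason, allowed, now=None) -> str:
    """Build one JSON log line; `now` is injectable so tests are deterministic."""
    now = now or datetime.now(timezone.utc)
    event = AuditEvent(client, action, resource, source, now.isoformat(), reason, allowed)
    return json.dumps(asdict(event), sort_keys=True)
```

Logging denied requests as well as allowed ones (the `allowed` field) is a design choice in this sketch: unexpected denials are often the first sign of a misconfigured or misbehaving client.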

Architecture Overview

The architecture of Cipher is pretty straightforward, as illustrated below. Cipher bootstraps the resource encryption keys by reading their encrypted copies from a dedicated database. It then decrypts those keys using AWS KMS and keeps them only in volatile memory.

As you can see above, Cipher receives traffic from different clients over mutual TLS, extracts the identity from the client certificate for each individual call, checks the authorization model and proceeds only if the client is permitted to perform the requested action.

As previously mentioned, Cipher hides the complexities of computation, access control, bootstrapping, algorithm and key evolution, and secret rotation by providing simplified client libraries for the languages that are supported at Airbnb.

Conclusion

Efficiently encrypting data presents unique technical challenges. But investing in encryption can help defend against an attacker who is trying to exfiltrate large amounts of user data. Cipher attempts to make hardened encryption universally available to all engineering teams at Airbnb, so that we can more easily protect our sensitive data.

