Field Level Encryption in Azure CosmosDb Documents

Indranath Bardhan
Walmart Global Tech Blog
5 min read · Aug 29, 2020

In today's world, the security and privacy of customer data are of utmost importance. This becomes even more important when you start adopting the cloud.

Often, you also have to adhere to the regulatory compliance requirements of specific product domains. For example, if you work in e-commerce and handle users' credit card information, you must be PCI DSS compliant. If you work in healthcare, you probably have to be HIPAA compliant, depending on the kind of healthcare data you deal with.

Restricting access to your cloud PaaS (Platform as a Service) components via firewall settings may be the most basic security measure you can apply. However, that does not completely protect your data from internal teams such as cloud administrators and DBAs.

Encryption to the rescue: with encryption, full control of the data stays with the specific business owners and technology teams, and decrypted data is never revealed to unauthorised parties.

Use-case

As part of cloud migration, we had requirements to store JSON documents in Cosmos where some of the fields in the document are highly sensitive.

The encryption options on the table were as follows.

1. Store the entire document as an encrypted blob

  • Pros : Easier to implement.
  • Cons : No searchability on fields (reduces the database to a key-value store), higher cost of encryption and decryption, and loss of basic debuggability, i.e., even basic inspection of a document now requires decryption.

2. Encrypt only highly sensitive fields

  • Pros : Allows search on indexed fields, lower cost of encryption and decryption, enables basic debuggability.
  • Cons : Relatively complex to implement; search on encrypted fields still has to be solved.

Because of the advantages of Option 2 above, we chose to go ahead with it. In addition, encryption comes in two standard forms:

  • Randomized : Each encryption uses fresh random input (typically a random initialization vector alongside the CEK, the content encryption key), so the same text produces a different encrypted output every time. This is considered more secure, but you lose the ability to search on the field.
  • Deterministic : The CEK is used with no per-operation randomness, so the same text always produces the same encrypted output.

As mentioned above, we had search requirements on a few of these fields, so we had to settle for deterministic encryption.
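To make the difference between the two forms concrete, here is a minimal sketch using only the JDK's `javax.crypto` classes. It is not the scheme used in our library: the class name `FieldCipherSketch`, the separate `ivKey`, and the choice of AES-CBC with an IV derived from an HMAC of the plaintext are all assumptions made for illustration. A production scheme should also authenticate the ciphertext, which this sketch leaves out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch: AES-CBC where only the choice of IV differs between the two modes. */
final class FieldCipherSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Randomized: a fresh random IV per call, so equal inputs give different outputs. */
    static byte[] encryptRandomized(byte[] cek, String plaintext) {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return encrypt(cek, iv, plaintext);
    }

    /** Deterministic: the IV is derived from the plaintext, so equal inputs give equal outputs. */
    static byte[] encryptDeterministic(byte[] cek, byte[] ivKey, String plaintext) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(ivKey, "HmacSHA256"));
            byte[] digest = mac.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return encrypt(cek, Arrays.copyOf(digest, 16), plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Output layout: 16-byte IV followed by the AES-CBC ciphertext. */
    private static byte[] encrypt(byte[] cek, byte[] iv, String plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(cek, "AES"), new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the first 16 bytes, then decrypts the rest. Works for both modes. */
    static String decrypt(byte[] cek, byte[] encrypted) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(cek, "AES"),
                    new IvParameterSpec(encrypted, 0, 16));
            byte[] plain = cipher.doFinal(encrypted, 16, encrypted.length - 16);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the deterministic output is stable for a given key and input, it can be stored in an indexed field and matched with an equality filter. The randomized output cannot.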

Challenges

As CosmosDb does not provide out-of-the-box functionality for encrypting specific fields, we had to come up with a solution of our own.

We also had to make the solution seamless, so that all the complexity of encryption and decryption is absorbed by the query/DAO layer and the client does not need to get involved in the nitty-gritty, e.g., encrypting the text itself and passing it into the Cosmos query.

As per our information security requirements, we also had to rotate the encryption key periodically when using the deterministic mode.

Encryption Basics

Before we delve into the details of how we solved it, here are a few basics of the encryption approach.

  • Encryption is done with a standard key called the CEK (Content Encryption Key).
  • The CEK does not change in the lifetime of a system, because changing it would require re-encrypting all historical data, which carries substantial overhead.
  • To protect the CEK, it is stored in encrypted form in your configuration systems, encrypted with another key called the CMK (Content Master Key).
  • The CMK can be rotated. To support seamless rotation of the CMK, at any point in time there should be at least two versions of the encrypted CEK (one encrypted with the older CMK and another encrypted with the new CMK) that your application (the DAO layer in this case) can understand.
  • The CMK is usually backed or provided by an HSM (hardware security module).
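The CEK/CMK relationship and CMK rotation can be sketched with the JDK's AES key wrap cipher. This is only an illustration under assumptions: in a real deployment the CMK would stay inside the HSM and the wrap/unwrap calls would go to it, whereas here both keys are local `SecretKey` objects, and the class name `CekEnvelopeSketch` and its methods are hypothetical.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Illustrative sketch: a CEK stored only in wrapped (encrypted) form under a CMK. */
final class CekEnvelopeSketch {

    /** Generates a local AES key; stands in for both the CEK and an HSM-held CMK. */
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts the CEK with the CMK; the result is what the configuration system stores. */
    static byte[] wrap(SecretKey cmk, SecretKey cek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, cmk);
            return cipher.wrap(cek);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the CEK from its stored form using the CMK. */
    static SecretKey unwrap(SecretKey cmk, byte[] wrappedCek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, cmk);
            return (SecretKey) cipher.unwrap(wrappedCek, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** CMK rotation: the CEK itself is unchanged, it is only re-wrapped under the new CMK. */
    static byte[] rewrap(SecretKey oldCmk, SecretKey newCmk, byte[] wrappedUnderOld) {
        return wrap(newCmk, unwrap(oldCmk, wrappedUnderOld));
    }
}
```

During a rotation window, both the copy wrapped under the old CMK and the copy produced by `rewrap` would be kept, so that application instances holding either CMK version can still recover the same CEK and no document has to be re-encrypted.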

Summarizing the flow:

Figure: CMK-CEK flow

The application issues a parameterised query. The library (the Cosmos client library) holds the metadata for the columns that are encrypted.

1. The library obtains the encryption algorithm from the column metadata, and then the encrypted CEK and the location of the CMK.

2. The library contacts the KeyStore and retrieves the CMK to decrypt the CEK. It caches the decrypted CEK to reduce round trips (if allowed).

3. The library encrypts the parameters in the query using the decrypted CEK and sends the query to the server.

4. For a READ query, the encrypted columns returned in the result are decrypted by the library using the same CEK.
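Step 2 of the flow, including the optional caching, can be sketched as a small resolver. The class `CekResolverSketch` and the `unwrapCek` function are hypothetical names; the function stands in for whatever call actually contacts the KeyStore and decrypts the CEK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative sketch: resolve a decrypted CEK, caching it only when the metadata allows. */
final class CekResolverSketch {
    // Assumption: maps a CEK identifier to the decrypted CEK by contacting the KeyStore.
    private final Function<String, byte[]> unwrapCek;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    CekResolverSketch(Function<String, byte[]> unwrapCek) {
        this.unwrapCek = unwrapCek;
    }

    byte[] resolve(String cekId, boolean cacheable) {
        if (!cacheable) {
            // Never keep the plain CEK in memory longer than needed for this operation.
            return unwrapCek.apply(cekId);
        }
        // Contact the KeyStore only on the first request for this identifier.
        return cache.computeIfAbsent(cekId, unwrapCek);
    }
}
```

The trade-off is the one described in the flow: caching saves a KeyStore round trip per query, at the cost of keeping the decrypted CEK in process memory.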

Solution

Spring Data repositories usually do a very good job of abstracting CRUD methods, and we use spring-data-cosmosdb for our CosmosDb interactions. One approach was to add the encryption functionality as a wrapper on top of the Spring Data layer.

1. Define encryption metadata

We came up with an annotation that can be placed on the fields of Entity objects and that carries the encryption metadata for each field.

  • EncryptionType & EncryptionAlgorithm : These are self-explanatory; they denote the type of encryption and the algorithm used for encryption.
  • ceKid : This is the key identifier for the CEK in the configuration system, i.e., the key against which the encrypted CEK is stored. As different fields may have different CEKs, the metadata indicates the specific key identifier used for this field.
  • cacheable : A flag which indicates whether you can cache the unencrypted key or must decrypt it every time it needs to be used (this is a security call you need to make, as memory dumps can expose your CEK).
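An annotation carrying these four pieces of metadata might look like the sketch below. The attribute names follow the bullets above, but the annotation name `Encrypted`, the enum constants, the defaults, and the `CustomerDocument` entity are assumptions for illustration rather than our actual definitions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical enum values; the real set depends on what the crypto layer supports.
enum EncryptionType { DETERMINISTIC, RANDOMIZED }

enum EncryptionAlgorithm { AES_256_CBC_HMAC_SHA_256 }

/** Marks an entity field as encrypted and says how and with which CEK. */
@Retention(RetentionPolicy.RUNTIME) // must be visible to the mapping layer via reflection
@Target(ElementType.FIELD)
@interface Encrypted {
    EncryptionType encryptionType() default EncryptionType.DETERMINISTIC;

    EncryptionAlgorithm encryptionAlgorithm() default EncryptionAlgorithm.AES_256_CBC_HMAC_SHA_256;

    /** Identifier under which the encrypted CEK is stored in the configuration system. */
    String ceKid();

    /** Whether the decrypted CEK may be cached in memory. */
    boolean cacheable() default false;
}

/** Example entity: only cardNumber is sensitive, so only it carries the annotation. */
class CustomerDocument {
    String id;

    @Encrypted(ceKid = "customer-card-cek", cacheable = true)
    String cardNumber;
}
```

Keeping `RetentionPolicy.RUNTIME` is the important design choice here, since the metadata has to be read reflectively when entities are mapped to documents.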

2. Store the CEKs

We chose to use a separate Cosmos collection to store the encrypted CEKs. However, each CEK also needs to store the metadata of its master key alongside it. So we came up with POJOs for the CEK and the CMK, with the following master-key fields.

  • keyPathValue : The CMK key identifier in the HSM layer.
  • providerName : The name of an implementation of the HSM access layer. As this layer can vary between teams/organisations, we chose to keep it abstract behind a KeyStoreProvider interface.

The client-specific implementations of KeyStoreProvider can be registered with a KeyStoreManager on application startup, so that they become available to the crypto utility class.
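The pieces described in this step could be shaped roughly as follows. Only `keyPathValue`, `providerName`, `KeyStoreProvider`, and `KeyStoreManager` are names from the text; the class names `MasterKeyInfo` and `EncryptedContentKey`, the method signatures, and the static registry are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** CMK metadata stored alongside each encrypted CEK. */
final class MasterKeyInfo {
    final String keyPathValue; // CMK identifier in the HSM layer
    final String providerName; // which KeyStoreProvider can reach that CMK

    MasterKeyInfo(String keyPathValue, String providerName) {
        this.keyPathValue = keyPathValue;
        this.providerName = providerName;
    }
}

/** One stored CEK: its identifier, its encrypted value, and the CMK that encrypted it. */
final class EncryptedContentKey {
    final String ceKid;
    final byte[] encryptedValue;
    final MasterKeyInfo masterKey;

    EncryptedContentKey(String ceKid, byte[] encryptedValue, MasterKeyInfo masterKey) {
        this.ceKid = ceKid;
        this.encryptedValue = encryptedValue;
        this.masterKey = masterKey;
    }
}

/** Abstraction over the HSM access layer; each team supplies its own implementation. */
interface KeyStoreProvider {
    String name();

    byte[] decryptContentKey(String keyPathValue, byte[] encryptedCek);
}

/** Registry that makes the registered providers available to the crypto utilities. */
final class KeyStoreManager {
    private static final Map<String, KeyStoreProvider> PROVIDERS = new ConcurrentHashMap<>();

    private KeyStoreManager() {
    }

    /** Called on application startup for each client-specific provider. */
    static void register(KeyStoreProvider provider) {
        PROVIDERS.put(provider.name(), provider);
    }

    /** Finds the provider named in the CEK's metadata and asks it to decrypt the CEK. */
    static byte[] decrypt(EncryptedContentKey key) {
        KeyStoreProvider provider = PROVIDERS.get(key.masterKey.providerName);
        if (provider == null) {
            throw new IllegalStateException(
                    "No KeyStoreProvider registered as " + key.masterKey.providerName);
        }
        return provider.decryptContentKey(key.masterKey.keyPathValue, key.encryptedValue);
    }
}
```

Looking the provider up by `providerName` keeps the stored CEK documents free of any HSM-specific code, so two teams with different HSMs can share the same document shape.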

3. Extend the DocumentDbPersistentProperty to store the CryptoMetadata

The DocumentDbPersistentProperty class in spring-data-cosmosdb stores the field metadata, e.g., whether it's an id or partitionKey field. We extended it to store the crypto metadata from the annotation defined in Step 1.

4. Extend the MappingDocumentDbConverter

The MappingDocumentDbConverter is the EntityConverter class responsible for converting POJOs to Cosmos Document objects and vice versa. So, the functionality of encrypting fields (on write) and decrypting them (on read) was added by extending the converter.
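The idea behind the converter extension can be illustrated without the Spring Data classes. The sketch below is not the MappingDocumentDbConverter API: it uses a plain `Map` as a stand-in for a Cosmos Document, a marker annotation `SensitiveField` in place of the Step 1 annotation, and injected `encrypt`/`decrypt` functions in place of the real crypto utility. All of these names are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SensitiveField {
}

/** Example entity with one plain and one sensitive field. */
class PatientRecord {
    String id;
    @SensitiveField
    String ssn;
}

/** Illustrative sketch: encrypt annotated fields on write, decrypt them on read. */
final class EncryptingConverterSketch {
    private final UnaryOperator<String> encrypt;
    private final UnaryOperator<String> decrypt;

    EncryptingConverterSketch(UnaryOperator<String> encrypt, UnaryOperator<String> decrypt) {
        this.encrypt = encrypt;
        this.decrypt = decrypt;
    }

    /** Entity to document: annotated String fields are stored encrypted. */
    Map<String, Object> write(Object entity) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) continue;
            document.put(field.getName(), transform(field, get(field, entity), encrypt));
        }
        return document;
    }

    /** Document to entity: annotated String fields are decrypted before being set. */
    void read(Map<String, Object> document, Object target) {
        for (Field field : target.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) continue;
            Object value = transform(field, document.get(field.getName()), decrypt);
            try {
                field.setAccessible(true);
                field.set(target, value);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    private static Object transform(Field field, Object value, UnaryOperator<String> op) {
        boolean sensitive = field.isAnnotationPresent(SensitiveField.class);
        return (sensitive && value instanceof String) ? op.apply((String) value) : value;
    }

    private static Object get(Field field, Object entity) {
        try {
            field.setAccessible(true);
            return field.get(entity);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the transformation happens inside the conversion step, repository callers keep working with plain entities and never see the encrypted values.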

Conclusion

Encryption comes with some downsides that we should all be aware of.

  1. Convert all queries to bind queries : Earlier we passed prebuilt Strings as queries, because all Cosmos queries are transformed into POST requests and binding parameters gave us little advantage (unlike JDBC, where it enables query caching). With prebuilt Strings, however, the library cannot identify field metadata, i.e., whether a value belongs to an encrypted field and must therefore be encrypted before it is passed to the server as a query parameter.
  2. No range-based queries on encrypted columns : Encrypting a column breaks the order/sort semantics of the underlying text or number, so range-based queries cannot be supported on these columns.
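To show why bind queries matter, here is a rough sketch of encrypting parameter values before a query is sent. It does not use the Cosmos SDK types; `BindQuerySketch`, the parameter-to-field map, and the injected `encrypt` function are hypothetical, and the function is assumed to be deterministic so that equality filters still match stored values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Illustrative sketch: encrypt only those bind parameters that target encrypted fields. */
final class BindQuerySketch {

    /**
     * @param params          bind parameters by name, e.g. "@card" mapped to the plain value
     * @param paramToField    the document field each parameter is compared against
     * @param encryptedFields fields marked as encrypted in the entity metadata
     * @param encrypt         deterministic encryption of a single value (assumed)
     */
    static Map<String, Object> encryptParameters(Map<String, Object> params,
                                                 Map<String, String> paramToField,
                                                 Set<String> encryptedFields,
                                                 UnaryOperator<String> encrypt) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            Object value = entry.getValue();
            String field = paramToField.get(entry.getKey());
            if (value instanceof String && field != null && encryptedFields.contains(field)) {
                value = encrypt.apply((String) value);
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }
}
```

For a query such as `SELECT * FROM c WHERE c.cardNumber = @card AND c.status = @status`, only `@card` would be rewritten. With a prebuilt query String there is no such parameter boundary to hook into, which is why the conversion to bind queries was needed.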

Field-level encryption strikes a good balance between protecting customer data in the cloud and keeping basic debugging easy.

Acknowledgements

Co-Authored by Rakesh Pandit

With critical inputs from Srinivas Devarakonda and Krishna Kanth Annamraju
