The Guide to MongoDB Field Level Encryption

How to better protect your and your customers’ data

Stefan Pfaffel
Oct 9 · 7 min read
smiling woman sitting on floor holding up hand-drawn picture of a big lightbulb, with a background of hand-drawn symbols
Photo by Andrea Piacquadio from Pexels

As always, it’s fun to set up a new project, and most technologies nowadays are easy to integrate. But once you start thinking about operations and data security, things tend to become trickier. MongoDB provides three encryption options, and two of them are only available with an enterprise license.

  1. Encryption at rest is available from version 3.2 but only for enterprise customers.
  2. Automatic field-level encryption is only available on MongoDB 4.2 Enterprise and MongoDB Atlas 4.2.
  3. Manual field-level encryption is available on MongoDB 4.2 Community Edition, the free version.

Only paying licensees can use automatic MongoDB encryption. Revenue-wise, that’s not a bad decision by MongoDB, and it’s not a massive challenge for us, as we can still use explicit (manual) client-side field level encryption (CSFLE). We did some research and found some docs and guides, but no complete explanation or working solution.

We built a small MVP and a short test that writes data to a MongoDB instance and reads the same document afterward, and that began to bring all the pieces together.

Our initial test result:

Infrastructure Setup

MongoDB uses a concept called envelope encryption to encrypt and decrypt data. With envelope encryption, data is encrypted using a data key, and the data key itself is encrypted by another key called the master key. The advantage of this procedure is that your data key is never stored in plain text. In the MongoDB context, the data key is stored encrypted in a dedicated collection inside the database. Setup- and operation-wise, this adds some complexity, but it’s all for the greater good.
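To make the idea of an envelope more concrete, here is a minimal sketch using only Node's built-in crypto module. It is an illustration of the general principle, not MongoDB's actual implementation: the AES-256-GCM cipher, the 32-byte key sizes, and the helper names seal and unseal are assumptions chosen for brevity.

```javascript
const crypto = require("node:crypto");

// Encrypt a buffer with AES-256-GCM and return everything needed to decrypt it.
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Reverse of seal(); throws if the key is wrong or the data was tampered with.
function unseal(key, { iv, tag, ciphertext }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Illustrative keys only; in this sketch both are random 32-byte values.
const masterKey = crypto.randomBytes(32);
const dataKey = crypto.randomBytes(32);

// Writing: the data key encrypts the field, the master key encrypts the data key.
// Only wrappedDataKey and encryptedField would ever be stored.
const wrappedDataKey = seal(masterKey, dataKey);
const encryptedField = seal(dataKey, Buffer.from("jane@example.com", "utf8"));

// Reading: unwrap the data key with the master key, then decrypt the field.
const unwrappedDataKey = unseal(masterKey, wrappedDataKey);
const decryptedField = unseal(unwrappedDataKey, encryptedField).toString("utf8");
console.log(decryptedField);
```

The point of the sketch is that whoever holds only the stored values (the wrapped data key and the encrypted field) cannot read the data without the master key.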

Therefore, we need to create a master key and then use the master key to generate a data key. By definition, a local master key must be exactly 96 bytes long. A single shell command can create this random sequence and store it base64-encoded in a text file.
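If you prefer to stay in Node.js rather than the shell, the same key material can be produced with the built-in crypto module. This is a sketch under that assumption; the function name generateMasterKey is made up for this example, and writing the result to a file is left as a comment.

```javascript
const crypto = require("node:crypto");

const MASTER_KEY_BYTES = 96;

// Returns 96 cryptographically random bytes, base64-encoded.
function generateMasterKey() {
  return crypto.randomBytes(MASTER_KEY_BYTES).toString("base64");
}

const masterKeyBase64 = generateMasterKey();
console.log(masterKeyBase64);

// To persist the key, one could write it to a file that only the
// deployment process can read, for example:
// require("node:fs").writeFileSync("master-key.txt", masterKeyBase64, { mode: 0o600 });
```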

The creation of the data key is also a one-off task we run after the database's initial setup. The script below does exactly that.

  • Lines 1–8: Create the encryption options: the key vault collection __keys in the database encryption, and the master key.
  • Lines 10–13: Connect to the MongoDB instance and pass the encryption options.
  • Line 15: Get a reference to the key vault object.
  • Lines 17–20: Create a new data key for the key provider local with the alternative name www.

We will later use the data key's alternative name, www, in our application to reference the key. We suggest running the script not on the database server itself but on a disposable server or in a container:

  • Use a template engine to replace all variables on a build server. If you’re running on Kubernetes, mount the file as a secret volume to prevent it from being written to disk.
  • Populate environment variables, preferably of a container or a short-lived server, pass the variables to the mongo shell, and invoke the script.
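As a sketch of the environment-variable approach, the snippet below reads the base64-encoded master key from an environment object, checks its length, and builds the encryption options. The variable name MONGODB_MASTER_KEY and both function names are assumptions for this example. The options use the shape the Node.js driver expects for a local key (a Buffer); the mongo shell wraps the key in BinData instead.

```javascript
const MASTER_KEY_BYTES = 96;

// Decode and validate the master key taken from an environment-like object.
// MONGODB_MASTER_KEY is a hypothetical variable name.
function loadKmsProviders(env) {
  const encoded = env.MONGODB_MASTER_KEY;
  if (!encoded) {
    throw new Error("MONGODB_MASTER_KEY is not set");
  }
  const key = Buffer.from(encoded, "base64");
  if (key.length !== MASTER_KEY_BYTES) {
    throw new Error(`Master key must be ${MASTER_KEY_BYTES} bytes, got ${key.length}`);
  }
  return { local: { key } };
}

// The key vault lives in the collection __keys of the database encryption.
function buildEncryptionOptions(env) {
  return {
    keyVaultNamespace: "encryption.__keys",
    kmsProviders: loadKmsProviders(env),
  };
}

// Demo with a dummy key; in a real run you would pass process.env.
const demoEnv = { MONGODB_MASTER_KEY: Buffer.alloc(96, 7).toString("base64") };
const options = buildEncryptionOptions(demoEnv);
console.log(options.keyVaultNamespace, options.kmsProviders.local.key.length);
```

Failing early on a missing or truncated key avoids creating a data key that cannot be decrypted later.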

Client Application Setup

To enable client-side encryption, we need to install the required C libraries on our server or in our container first. The required C libraries are libbson and libmongocrypt.

We must also add the respective wrapper library as a dependency to our application. The Node.js wrapper’s npm package is called mongodb-client-encryption; the Java wrapper is called mongodb-crypt and is available on Maven Central.

Having created the encryption keys, we can proceed with our client application setup. We want to store data encrypted and enforce that specific fields cannot be stored unencrypted. This way, we prevent certain information from being stored in plain text, where it would be accessible to anyone with direct access to the database.

JSON Schema is the recommended means of performing schema validation.
docs.mongodb.com/

Requirements like encryption for particular fields can be added to MongoDB collections via schema definitions. A schema describes the structure and characteristics of a MongoDB document and can, therefore, define the following:

  • required and optional properties
  • property names and their type
  • min and max values
  • regular expressions that values must match
  • a set of predefined values in case of an enumeration

MongoDB recommends the usage of JSON Schema to describe documents. A JSON Schema is a JSON object that outlines requirements that will be used for schema validation.

According to the documentation,

“JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.

- Describes your existing data format(s).
- Provides clear human- and machine- readable documentation.
- Validates data which is useful for:
  - Automated testing.
  - Ensuring quality of client submitted data.”

Our user object contains a unique random id, a name, and an email address. We want all of these properties to be mandatory. Specifically, we enforce that:

  • id, name, and email address are set
  • id is a valid UUID
  • name and email address are stored encrypted
three colored rectangles representing the user object properties of id, name, and email address
The user object with id, name, and email property

The JSON file below shows a JSON Schema object that contains our requirements.

  • Lines 2–4: Metadata for this document: the title of the document and the type we’re going to define.
  • Lines 4–8: The keyword required defines an array of non-optional properties.
  • Lines 9–29: The keyword properties defines an object of known properties.
  • Lines 10–14: The property id must be a string and match the given regular expression.
  • Lines 15–21: The property name must be a string encrypted with the deterministic algorithm.
  • Lines 22–28: The property email must be a string encrypted with the non-deterministic algorithm.
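For orientation, here is a sketch of how such a schema could look, written as a JavaScript object so it can be checked directly. It is a reconstruction from the requirements listed above, not the original file, so its line numbers do not match the list. The title value and the exact UUID pattern are assumptions; the encrypt keyword and the two algorithm names are the ones MongoDB defines for field level encryption.

```javascript
const crypto = require("node:crypto");

// Assumed pattern: lowercase, hyphenated UUID as produced by crypto.randomUUID().
const UUID_PATTERN =
  "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$";

const userSchema = {
  title: "user",
  bsonType: "object",
  // All three properties are mandatory.
  required: ["id", "name", "email"],
  properties: {
    // Plain string that must look like a UUID.
    id: { bsonType: "string", pattern: UUID_PATTERN },
    // Encrypted string; deterministic so it can be used in queries.
    name: {
      encrypt: {
        bsonType: "string",
        algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic",
      },
    },
    // Encrypted string; randomized because we never query by email.
    email: {
      encrypt: {
        bsonType: "string",
        algorithm: "AEAD_AES_256_CBC_HMAC_SHA_512-Random",
      },
    },
  },
};

// Quick sanity check of the id pattern against a freshly generated UUID.
const idMatches = new RegExp(UUID_PATTERN).test(crypto.randomUUID());
console.log(idMatches);
```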

The deterministic algorithm ensures that the same value always encrypts to the same output. This is necessary to look up encrypted data because it allows us to reconstruct the encrypted value and therefore use it in database queries.

In contrast, the non-deterministic (randomized) algorithm ensures that encrypting equal values results in different outputs. Because the output changes with every encryption, it’s harder to infer the input value than with the deterministic algorithm. Security-wise, that’s a plus. The drawback is that we cannot query data encrypted with the randomized algorithm. Nevertheless, we can still query documents by other criteria.

Data encrypted with either of these algorithms can, in any case, be decrypted by an application that has access to the master key. So, regarding the algorithm, we mainly have to decide whether we want to use the encrypted data as a key for MongoDB queries. If yes, we have to use the deterministic algorithm. If not, we can use the randomized algorithm, which provides better data security.
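The difference between the two modes can be shown with a small experiment. The sketch below is a simplified stand-in, not MongoDB's AEAD_AES_256_CBC_HMAC_SHA_512 construction: it uses plain AES-256-CBC without authentication, and it derives the deterministic IV from an HMAC of the plaintext. All function and key names are assumptions for this example.

```javascript
const crypto = require("node:crypto");

// Illustrative keys; a real setup would derive these from the data key.
const encKey = crypto.randomBytes(32);
const ivKey = crypto.randomBytes(32);

// Encrypt with AES-256-CBC and prepend the IV so the value can be decrypted later.
function encryptWithIv(plaintext, iv) {
  const cipher = crypto.createCipheriv("aes-256-cbc", encKey, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, body]).toString("base64");
}

// Deterministic: the IV depends only on the plaintext, so equal inputs give equal outputs.
function encryptDeterministic(plaintext) {
  const mac = crypto.createHmac("sha256", ivKey).update(plaintext, "utf8").digest();
  return encryptWithIv(plaintext, mac.subarray(0, 16));
}

// Randomized: a fresh IV per call, so equal inputs give different outputs.
function encryptRandom(plaintext) {
  return encryptWithIv(plaintext, crypto.randomBytes(16));
}

// Both variants decrypt the same way because the IV travels with the ciphertext.
function decrypt(encoded) {
  const raw = Buffer.from(encoded, "base64");
  const decipher = crypto.createDecipheriv("aes-256-cbc", encKey, raw.subarray(0, 16));
  return Buffer.concat([decipher.update(raw.subarray(16)), decipher.final()]).toString("utf8");
}

console.log(encryptDeterministic("Jane") === encryptDeterministic("Jane"));
console.log(encryptRandom("Jane") === encryptRandom("Jane"));
```

Because the deterministic output is reproducible, it can serve as a query value; the randomized output cannot, since a second encryption of the same input never matches the stored one.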

The JSON schema shown above has to be added to the MongoDB collection to enable schema validation. In our case, we add the schema after the application has started and before the first query is executed. As our application is not running in an elastic environment and we do not expect traffic spikes, that is not a performance problem. Applications running in a high-traffic environment with dynamic scaling should update the schema from a dedicated application or container, so that not every new instance repeats the update and puts extra load on the database.

Anyway, here’s a class we use to create, cache, and retrieve connections to MongoDB instances. This class is also responsible for creating collections and enabling JSON Schema validation.

Lines 41–56: Create the collection if it cannot be found in the current set of collections.

Lines 58–64: Look up the schema object from the schemas folder and update the validator of the current collection accordingly. Set the validation level to strict so that every insert and every update, including updates to existing documents, is validated against the updated schema.
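The validator update itself boils down to a single collMod database command. The helper below is a sketch of how that command document could be assembled; the function name and the collection name users are assumptions, and the command would be sent with the driver's db.command(...) call, which is only shown as a comment here.

```javascript
// Build the collMod command that attaches a JSON Schema validator to a collection.
function buildValidatorCommand(collectionName, schema) {
  return {
    collMod: collectionName,
    validator: { $jsonSchema: schema },
    // strict: apply the validator to all inserts and all updates.
    validationLevel: "strict",
  };
}

// Hypothetical collection name and a shortened schema for the demo.
const command = buildValidatorCommand("users", {
  bsonType: "object",
  required: ["id", "name", "email"],
});
console.log(JSON.stringify(command, null, 2));

// With a connected Node.js driver Db instance, this would be sent as:
// await db.command(command);
```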

Inserts of new documents now fail because we still have to add the actual field encryption. The error message shown on the client side is very generic, but in our case, it’s directly related to the updated encryption requirement.

Our test now exits with the following message:

Now for the final task: let’s update the mongodb-connection class, shown in the snippet above, to handle encryption and decryption transparently. To do so, we need to pass the master key and the name of the encryption key collection to the ClientEncryption constructor, along with an active MongoClient.

ClientEncryption is a class from the mongodb-client-encryption package. Instances of ClientEncryption have the methods encrypt and decrypt, each of which returns a promise that resolves with the encrypted or decrypted data, respectively.

The last thing we need to implement is the actual encryption. This is the easiest step so far, so let’s jump straight to the result.

Lines 14–21: Before name and email are stored, they are encrypted. The encrypted model gets a UUID after the encryption.

Lines 23–31: To look up a user by name, we encrypt the name first, because the name is stored encrypted in our database.
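The two steps described above can be sketched as follows. This is not the original code: the function names are assumptions, and so that the snippet runs without a database, it uses a small stand-in object in place of a real ClientEncryption instance. The stand-in only mimics the encrypt(value, { keyAltName, algorithm }) call shape, which returns a promise.

```javascript
const crypto = require("node:crypto");

const KEY_ALT_NAME = "www"; // alternative name of the data key created earlier
const DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic";
const RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random";

// Encrypt name and email first, then assign the id.
async function toEncryptedUser(clientEncryption, user) {
  const [name, email] = await Promise.all([
    clientEncryption.encrypt(user.name, { keyAltName: KEY_ALT_NAME, algorithm: DETERMINISTIC }),
    clientEncryption.encrypt(user.email, { keyAltName: KEY_ALT_NAME, algorithm: RANDOM }),
  ]);
  return { id: crypto.randomUUID(), name, email };
}

// The name is stored encrypted, so the query value must be encrypted the same way.
async function buildFindByNameFilter(clientEncryption, name) {
  const encryptedName = await clientEncryption.encrypt(name, {
    keyAltName: KEY_ALT_NAME,
    algorithm: DETERMINISTIC,
  });
  return { name: encryptedName };
}

// Stand-in for a ClientEncryption instance; it tags values instead of encrypting them.
const fakeClientEncryption = {
  async encrypt(value, { algorithm }) {
    return `enc(${algorithm.split("-").pop()}:${value})`;
  },
};

toEncryptedUser(fakeClientEncryption, { name: "Jane", email: "jane@example.com" })
  .then((user) => console.log(user));
```

In a real application, the result of toEncryptedUser would be passed to insertOne, and the filter from buildFindByNameFilter to findOne, on the users collection.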

The final test results:

Summary

Out of the box, MongoDB provides two means of client-side field level encryption (CSFLE): automatic and manual CSFLE. Automatic CSFLE is a handy feature, as it automatically encrypts data based on JSON schemas. Unfortunately, it’s only available in MongoDB Enterprise and MongoDB Atlas. Everyone running MongoDB Community Edition has to use manual CSFLE, which is why we described here how to configure it correctly.

Setting up CSFLE required quite a few changes to MongoDB and the client application. In the end, these are essential steps to improving the integrity, authenticity, and overall security of our customers’ data.

Thanks for being here and thank you for reading.

Better Programming
