Securing Logstash With AWS KMS

Avihoo Mamka
Onfido Product and Tech
3 min read · Feb 19, 2018

At Onfido, we deal with a significant amount of data on a daily basis.

We store our data in the cloud, and while it’s obvious that data at rest must be secured, the same goes for data in transit. If anything, data in transit is riskier: on its way to its destination it can travel routes you never imagined, and one of those routes may well pass through an attacker just waiting for you to be careless.

We have built a robust, self-service data pipeline so that anyone who wants their data stored securely, yet easily accessible (given the right credentials), can do so. It is also built on the assumption that the person storing the data is not necessarily a security ninja, so all of the security concerns must be taken care of for them.

Our data pipeline architecture roughly looks like this:

Onfido’s data pipeline architecture

To ensure that data in transit is indeed secure, we needed to encrypt the data client-side before Logstash pushes it into S3. This seems like a reasonable, legitimate requirement, and indeed it is. Our only concern was that all of the existing Logstash plugins for encrypting data (at the time of writing) were based on a static cipher key. Such a key is still hard to crack if best practices are followed, but it is not the same as a dynamic key management service that takes the whole burden of managing and storing keys off our hands.

Our production systems are deployed in AWS, so it was only natural for us to choose AWS KMS.

This led us to develop, and recently open-source, the logstash-filter-cipher_kms plugin: a Logstash filter plugin, written in Ruby, that can encrypt and decrypt any data passing through Logstash using AWS KMS and all of its advantages. That way, as long as we have a well-defined mechanism for controlling key creation, deletion and rotation, we can simply use a key alias that always points to the key currently in use.
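To make the idea concrete, here is a minimal Ruby sketch of the envelope pattern the plugin builds on, calling AWS KMS directly through the AWS SDK for Ruby. The region and alias names are placeholders, and the plugin’s own configuration options are documented in its README; this is not the plugin’s internal code.

    require 'aws-sdk-kms' # gem 'aws-sdk-kms'

    kms = Aws::KMS::Client.new(region: 'eu-west-1') # placeholder region

    # Ask KMS for a fresh data key under an alias. KMS returns both the
    # plaintext key (used locally to encrypt the event) and an encrypted
    # copy (shipped alongside the ciphertext so it can be decrypted later).
    data_key = kms.generate_data_key(
      key_id:   'alias/logstash-pipeline', # placeholder alias; resolves to the current key
      key_spec: 'AES_256'
    )

    plaintext_key = data_key.plaintext       # use for encryption, then discard
    encrypted_key = data_key.ciphertext_blob # safe to ship with the record

    # Decryption later needs only the encrypted data key; KMS works out
    # which underlying key to use, so alias changes and rotation stay transparent.
    restored_key = kms.decrypt(ciphertext_blob: encrypted_key).plaintext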

With open-sourcing in mind while developing, we made sure our solution was as flexible as possible while still fitting our purpose. For example, we allow several authentication methods, to name a few (a sketch of how these map onto the AWS SDK for Ruby follows the list):

  • Static AWS keys
  • AWS profile
  • Shared credentials files
  • Instance profile (mainly for continuous deployment purposes)
  • ECS credentials
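As a rough illustration of what that flexibility means in practice, the following sketch shows the credential providers the AWS SDK for Ruby exposes for each of these options. The profile name, region and key values are placeholders, and the plugin’s own option names for selecting a method may differ (see the README).

    require 'aws-sdk-kms'

    # Static AWS keys (placeholder values; never hard-code real ones):
    static_creds = Aws::Credentials.new('AKIA_EXAMPLE', 'example-secret')

    # AWS profile / shared credentials file (~/.aws/credentials):
    shared_creds = Aws::SharedCredentials.new(profile_name: 'data-pipeline')

    # Instance profile (EC2) and ECS task credentials, fetched automatically
    # from the instance or container metadata endpoint:
    instance_creds = Aws::InstanceProfileCredentials.new
    ecs_creds      = Aws::ECSCredentials.new

    # Any of these can back the KMS client the filter uses:
    kms = Aws::KMS::Client.new(region: 'eu-west-1', credentials: shared_creds)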

In addition, we made sure we were following security best practices, consulting our security team and asking for continuous feedback on whether our approach met a high standard. We allow the following algorithms to be used:

  • AES-128 in CBC mode
  • AES-256 in CBC mode

As an additional precaution, we also added the option of a random IV (with configurable length) to make sure each record is encrypted in a unique way, and on top of that the option to wrap the encrypted record in Base64 to make sure it’s easy to transport.
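Conceptually, the per-record encryption then looks something like the sketch below: plain Ruby OpenSSL with AES-256-CBC, a fresh random IV per record, and Base64 wrapping. The data key would normally come from KMS as in the earlier sketch; here a random stand-in key is used so the example runs on its own, and none of this is taken verbatim from the plugin’s source.

    require 'openssl'
    require 'base64'
    require 'securerandom'

    # Stand-in 256-bit data key; in the pipeline this comes from KMS.
    data_key = SecureRandom.random_bytes(32)

    def encrypt_record(record, key)
      cipher = OpenSSL::Cipher.new('aes-256-cbc')
      cipher.encrypt
      cipher.key = key
      iv = cipher.random_iv                    # fresh random IV per record
      ciphertext = cipher.update(record) + cipher.final
      Base64.strict_encode64(iv + ciphertext)  # IV is not secret; prepend and wrap
    end

    def decrypt_record(encoded, key)
      raw = Base64.strict_decode64(encoded)
      iv, ciphertext = raw[0, 16], raw[16..-1] # AES-CBC uses a 16-byte IV
      decipher = OpenSSL::Cipher.new('aes-256-cbc')
      decipher.decrypt
      decipher.key = key
      decipher.iv  = iv
      decipher.update(ciphertext) + decipher.final
    end

    encoded = encrypt_record('{"user_id":42}', data_key)
    decrypt_record(encoded, data_key) # => '{"user_id":42}'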

There are more features, which can be found in our GitHub repository for this project.

Using the approach described above, we not only ensured our data remains encrypted and secure while in transit, but also kept it encrypted at rest on top of the server-side encryption applied to our S3 bucket, giving our data two layers of protection.
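For completeness, the second layer mentioned above, default server-side encryption on the bucket, can be switched on with the AWS SDK for Ruby roughly as follows; the bucket name and key alias are placeholders, not our real configuration.

    require 'aws-sdk-s3' # gem 'aws-sdk-s3'

    s3 = Aws::S3::Client.new(region: 'eu-west-1')

    # Default server-side encryption on the bucket: applied by S3 itself,
    # on top of the client-side encryption done in Logstash.
    s3.put_bucket_encryption(
      bucket: 'example-data-pipeline-bucket',   # placeholder bucket name
      server_side_encryption_configuration: {
        rules: [
          {
            apply_server_side_encryption_by_default: {
              sse_algorithm:     'aws:kms',
              kms_master_key_id: 'alias/s3-at-rest' # placeholder alias
            }
          }
        ]
      }
    )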

To summarize, data is a great thing to have, but keep in mind that there’s someone out there just waiting for you to make one simple mistake that will let them take advantage of that data. That’s why you constantly need to make room for security concerns in your overall solution.
