Secret Management Architectures: Finding the balance between security and complexity

Managing secrets (sensitive strings such as passwords and API keys) has traditionally received little attention in discussions about improving software delivery and reliability. Teams usually store secrets either as plain text within the source code or separately in configuration files. A limited level of security can be achieved by restricting access to these files, for example by placing them in a private repository. However, companies should ask themselves:

  • What happens when an employee leaves? Can we easily rotate the secrets?
  • If there is a security breach, how long would it take us to limit exposure?
  • If an external party has access to a secret, for how long is that a threat?
  • Can we audit who accessed a particular secret last month?
  • Can we have different access levels for secret groups? (e.g. dev vs. prod)
  • Can different cloud platforms and apps access the secrets easily?

The answers to these questions will depend on the particular company and project, but can serve as a starting point for defining a secret management architecture.

This post describes technologies that can be combined into a solution that makes sense for the project, balancing acceptable levels of security and complexity. Certain tools presented here, such as Hashicorp Vault, are very versatile and are sometimes dismissed as too complex. The main point, however, is that a feature should only be used if the cost/benefit of implementing it is reasonable for the project. The following diagram exemplifies this balance: the more robust and secure a solution is, the more complexity it traditionally adds to the development workflow.


Defining Goals

As you begin your journey towards developing a secret architecture, you should start by having an honest conversation with your team to assess:

  • Team skills
  • External threats
  • Internal threats
  • Auditing
  • Multiple datacenter support
  • Platform flexibility requirements

For each of these, the team should discuss current status, concerns, priority and expected timelines. Having this information will help you define basic features of the architecture to be implemented. Instead of attempting to introduce a complex secret management solution right away, I suggest starting by letting the team get comfortable with the technology. Over time, gradually evolve the complexity and sophistication of the implementation in sequential steps.


3 Levels of Complexity

To facilitate the discussion and visualize how a secret management architecture evolves as the team gets better acquainted with the technologies, we can define three levels of complexity:

Where:
1. Limited Access: secrets are stored in a repository or server that is not publicly accessible. For example, a private git repository.
2. Encrypted Secrets: the secrets are encrypted before being stored in the repository.
3. Management: an application provides high-level control over the secrets: which entities can access them, audit logs, and so on.

These three levels can be seen from an evolutionary standpoint, where the architecture grows from simpler to more complex as needed. They can also be used to assess the current status of a project and to set goals for where to go next.

A concern sometimes raised is “where does it end?” If you encrypt the secrets before storing them, where do you keep the encryption key? Isn’t that another secret? Aren’t you just adding another abstraction layer?

The question of whether we are adding another abstraction layer will be addressed at the end of this article, in the example architectures, but the short answer is “not necessarily”. As an organization, you will have to reach a consensus on how to store this encryption key. You may choose to employ a trusted entity like AWS, or rely on more complex mechanisms such as Hashicorp Vault’s cubbyhole tokens, as described here.

In the end, it’s important to emphasize that the goal of a secrets management architecture should be to achieve acceptable levels of security for the project — at some point the complexity of adding additional layers of security outweighs the benefits of completely eliminating any security risks.


Technology

In this section we will cover technologies that can be used to compose a secrets management architecture. They don’t all need to be used at the same time, and not all functionalities provided need to be used.

Git-crypt
Git-crypt is an open source project for encrypting files in a git repository.
Using git's clean and smudge filters, files matching the patterns listed in a .gitattributes file are encrypted when they are committed, so only ciphertext is pushed to the remote. To decrypt, you clone the repository and run git-crypt unlock, specifying the path to the key that was used to encrypt. Here is a sequence diagram describing this workflow:

git-crypt sequence diagram

A very practical feature of git-crypt is the ability to specify more than one key for a repository. This means that prod and dev secrets can be stored in the same place, but only users with access to the correct keys will be able to decrypt the secrets.

Example .gitattributes file:

#.gitattributes
secrets/dev/* filter=git-crypt diff=git-crypt
secrets/prod/* filter=git-crypt-prod diff=git-crypt-prod

Here we have the default git-crypt key encrypting the secrets found in the folder “secrets/dev”, while the git-crypt-prod key is used to encrypt the secrets in “secrets/prod”.
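As a small illustration of how the filter names map to keys, here is a sketch in Python that lists which git-crypt key protects each pattern. The helper `keys_by_pattern` is hypothetical and not part of git-crypt; it only assumes the naming convention shown above, where `filter=git-crypt` selects the default key and `filter=git-crypt-NAME` selects the key called NAME.

```python
# Illustrative helper (not part of git-crypt): list which git-crypt key
# protects each path pattern declared in a .gitattributes file.

def keys_by_pattern(gitattributes_text: str) -> dict[str, str]:
    """Map each path pattern to the name of the git-crypt key that encrypts it."""
    mapping: dict[str, str] = {}
    for raw_line in gitattributes_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attributes = line.split()
        for attribute in attributes:
            name, _, value = attribute.partition("=")
            if name != "filter":
                continue  # only the filter attribute selects the key
            if value == "git-crypt":
                mapping[pattern] = "default"
            elif value.startswith("git-crypt-"):
                mapping[pattern] = value[len("git-crypt-"):]
    return mapping


EXAMPLE = """\
#.gitattributes
secrets/dev/* filter=git-crypt diff=git-crypt
secrets/prod/* filter=git-crypt-prod diff=git-crypt-prod
"""

if __name__ == "__main__":
    for pattern, key in keys_by_pattern(EXAMPLE).items():
        print(f"{pattern} -> {key}")
```

A check like this could run in CI to confirm that no production path is accidentally assigned to the dev key.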

A side effect of local files being decrypted while remote files are encrypted is that git conflicts can be tricky to resolve. For example, say developer A and developer B both clone the same git repository to their local machines and begin working on their own branches. Developer A completes their task first and pushes their code back to the repository; Developer B completes their task a little later and also pushes. If both were working on the same secrets, a conflict will be flagged when Developer B pushes their changes. Merging the Developer B branch with the remote branch will generate the expected conflict warnings. However, while ordinary git conflicts are marked clearly in the code, the remote branch is encrypted, so conflicts in files managed by git-crypt won’t receive the usual conflict markers (such as “<<<<<<< HEAD”).

To address this:

  • Have a separate repository containing only secrets, no application code, to minimize the number of contributors.
  • When resolving a conflict, pull the remote branch and run a local diff to support the merge.

AWS Key Management Service (KMS)
Encryption key management, hosted by AWS.
Instead of storing secrets, the AWS Key Management Service can be used to encrypt an encryption key. Why would someone want to do that? Consider this: the encrypted version of the original encryption key can’t be used to decrypt any of your secrets, so it’s safe to store it alongside them. You are then free to discard the original encryption key (or keep it in a safe backup), and trust AWS to be available to decrypt the encrypted key when it is needed.

Here is the sequence diagram illustrating the workflow:

KMS sequence diagram
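The wrapping step in this workflow can be sketched in Python as follows. This is a minimal illustration, not the exact implementation used in any project: it assumes `kms` behaves like a boto3 KMS client (`boto3.client("kms")`), whose `encrypt` and `decrypt` calls return `CiphertextBlob` and `Plaintext` respectively. The function names are hypothetical.

```python
# Sketch of wrapping a data encryption key with AWS KMS.
# `kms` is expected to behave like boto3.client("kms"); it is passed in
# so the functions can be exercised without AWS credentials.

def wrap_key(kms, key_id: str, data_key: bytes) -> bytes:
    """Ask KMS to encrypt the data key; the result can be stored with the secrets."""
    response = kms.encrypt(KeyId=key_id, Plaintext=data_key)
    return response["CiphertextBlob"]


def unwrap_key(kms, wrapped_key: bytes) -> bytes:
    """Ask KMS to decrypt the stored blob and return the original data key."""
    response = kms.decrypt(CiphertextBlob=wrapped_key)
    return response["Plaintext"]
```

With a real client, `wrap_key(boto3.client("kms"), "alias/my-secrets-key", key_bytes)` would return the blob to commit next to the encrypted secrets; the alias shown here is a placeholder. Only principals allowed by the KMS key policy can call `decrypt`, which is what makes storing the wrapped key safe.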

One of the caveats of this solution is that you become dependent on AWS’s service availability. However, there will always be some external dependency regardless of solution, be it a datacenter across the street or a server in the IT room. This debate goes back to the level of trust in a cloud solution, which will not be covered in this post.

The clear advantage of this solution is the low complexity and cost involved, potentially serving as the foundation of a simple secrets management strategy.

Chef Vault
Chef is a provisioning tool that allows one to describe what an instance should look like — which users should be registered, which applications installed, which ports open, etc. These descriptions are stored in “recipes”, which are contained by “cookbooks”.

Chef Vault is a complement to that, allowing Chef Client Nodes to retrieve secrets from Chef Server.

Secrets can often be required to help complete a number of tasks during the provisioning process. For example, you may need passwords to install a database server or create new system users. In these situations, the instance being provisioned (Chef Client Node) will follow this workflow:

Chef-vault sequence diagram

When the operator pushes secrets to the Chef Server using the knife tool, they specify a query in the parameters that lists which nodes will have access to a set of secrets. Additional information about the query format can be found here. As an example, you can specify that the “dev” secrets can only be accessed by nodes registered in the “dev” Chef environment. These nodes won’t have access to subsequent “prod” secrets that might be registered.

Other provisioning solutions such as Ansible offer similar setups, with the clear drawback that access to the secrets will be limited to the instances being provisioned by these technologies. However, this could be an acceptable compromise depending on the use case of a particular project.

A particular limitation of Chef Vault is that the access list to the secrets is static: once the secrets are uploaded with an associated query, only nodes matching the query at that particular time are added to the access list. Any nodes added afterwards won’t have access to these secrets. To work around this limitation, the query must be executed again every time a new node is registered with Chef Server. This is not done automatically by the Chef Server and must be triggered manually.
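The snapshot behaviour described above can be illustrated with a toy model in Python. None of these names are Chef APIs; `VaultItem` and its methods are hypothetical and only mimic the idea that the query is evaluated when the secrets are uploaded and again on each explicit refresh.

```python
# Toy model of Chef Vault's static access list. These classes are not
# Chef APIs; they only illustrate the snapshot behaviour.
from dataclasses import dataclass, field


@dataclass
class VaultItem:
    environment: str                                  # stands in for the search query
    clients: set[str] = field(default_factory=set)    # nodes allowed to decrypt

    def refresh(self, nodes: dict[str, str]) -> None:
        """Re-run the query and rebuild the access list from the current nodes."""
        self.clients = {name for name, env in nodes.items() if env == self.environment}

    def can_read(self, node: str) -> bool:
        return node in self.clients


nodes = {"web-1": "dev", "db-1": "prod"}   # node name -> Chef environment
item = VaultItem(environment="dev")
item.refresh(nodes)                        # query evaluated at upload time

nodes["web-2"] = "dev"                     # node registered afterwards
before_refresh = item.can_read("web-2")    # False: the access list is a snapshot
item.refresh(nodes)
after_refresh = item.can_read("web-2")     # True once the query is re-run
```

In chef-vault itself, this re-evaluation corresponds to the `knife vault refresh` subcommand, which has to be invoked by an operator or an automated job.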

Hashicorp Vault
Allows secret storage with management capabilities.
Hashicorp Vault is a robust secret management tool whose features extend well beyond the scope of this post. In summary, it serves as a secret repository with access control lists, auditing and TTL (time to live) access to the secrets. Vault supports a variety of authentication mechanisms and secret storage backends, allowing highly available deployments.

Here’s a workflow highlighting the storage and retrieval of secrets from Hashicorp Vault:

Hashicorp Vault sequence diagram

The clear drawback of using Hashicorp Vault is the additional complexity it introduces. Even with a simple setup and limited features, it still requires a team member to be responsible for maintaining and managing the access list and the Vault server itself.

However, the benefits can be tremendous in projects with higher security requirements. Vault follows Hashicorp’s pattern of elegant solutions, separating different authentication and secret storage backends into independent units. It allows control over the secrets that is as fine-grained as needed, with policies and TTL settings that can be defined per secret or per group of secrets in a folder. Additionally, the enterprise version comes with tech support and allows replication across multiple datacenters. Vault is self-hosted and open source, limiting dependency on external vendors and allowing auditors to evaluate the code as needed to attest to the security of the solution.


Putting it All Together
The technologies described above do not need to be used in isolation — in fact they become even more powerful when combined. Here are two example architectures combining a few of these solutions:

Example 1: git-crypt + Chef Server

The above architecture, which uses git-crypt and Chef Vault encryption, was implemented in a client project. Prior to the engagement, this client’s secret management solution was at the first level of complexity (Limited Access): secrets were kept in plain text within the code and stored in a private git repository. The client needed a better solution quickly, but was constrained by the resources available to invest in security and by the skill level of the team.

Examining the above diagram, different contributors would begin by generating secrets that were stored in a separate private git repository using git-crypt. Jenkins, a CI solution, was responsible for fetching and decrypting the secrets, and for uploading them to Chef Server using Chef Vault.

In order to solve Chef Vault’s static access list limitation, Jenkins also had a task that ran every 5 minutes to refresh the access list.

This solution was well received: while it didn’t completely address secret rotation or management of the git-crypt keys, it suited the client’s requirements and will serve as the foundation for a more complex architecture as priorities are reassessed.

Example 2: git-crypt + Chef + Hashicorp Vault

As a second example, the above secret management architecture utilizing git-crypt, Chef and Hashicorp Vault was developed as part of a recent client solution. This particular client needed a solution to manage secrets used by Chef Server provisioning that would also provide ACLs, greater levels of security and reliability, and access to applications beyond just Chef provisioning.

Examining the diagram, the two steps required to access secrets stored in Hashicorp Vault are shown. First, an instance needed to authenticate with the Vault Server, which would return an access token with the permissions associated with that particular user/role. This deployment was hosted in AWS, so we used AWS EC2’s user-data to download a custom application from S3. The application authenticated against Vault and stored the returned access token in a specific local folder. This was done before the Chef provisioning started.

With the access token stored, chef-client was called, and provisioning would start. Hashicorp Vault has connectors to different languages and solutions: in this case we used a Ruby gem to communicate with Vault, sending the access token and retrieving the secrets.
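The project used a Ruby gem for this step; the same exchange can be sketched in Python against Vault's HTTP API, which expects the token in the `X-Vault-Token` header and serves secrets under `/v1/<path>`. This is only an illustration: the function names, the token file location, and the Vault address are hypothetical, and the response handling assumes a mount that returns the secret directly under the `data` field (as the original key/value backend does).

```python
# Sketch of reading a secret over Vault's HTTP API with a stored token.
# Paths, addresses and function names are placeholders for illustration.
import json
import urllib.request
from pathlib import Path


def read_token(token_file: str) -> str:
    """Load the access token that the authentication step saved locally."""
    return Path(token_file).read_text().strip()


def build_secret_request(vault_addr: str, secret_path: str, token: str) -> urllib.request.Request:
    """Build the GET request for a secret, passing the token in X-Vault-Token."""
    url = f"{vault_addr.rstrip('/')}/v1/{secret_path.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def fetch_secret(request: urllib.request.Request) -> dict:
    """Send the request and return the key/value pairs stored under "data"."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]
```

A provisioning script would chain these as `fetch_secret(build_secret_request(addr, "secret/app/db", read_token(path)))`. Keeping the request construction separate from the network call makes the token handling easy to test without a running Vault server.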

In an ideal world, all instances are disposable when using a cloud hosted solution. In such a scenario, the expiration of an access token’s TTL wouldn’t matter since secrets would only need to be accessed during initial provisioning. However, the client wanted to have the concept of “patching”, where an instance could be reprovisioned with updated Chef recipes without having to be recreated. To address the issue of Vault token TTL expiration, we created a cron job that would run the custom authentication app and fetch a new access token before the existing one expired.
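The decision such a scheduled job has to make can be sketched as follows. This is an assumption about how the check might be written, not the code used in the project: `needs_renewal` and the ten-minute safety margin are hypothetical.

```python
# Sketch of the check a scheduled job could run before re-authenticating.
from datetime import datetime, timedelta, timezone


def needs_renewal(issued_at: datetime, ttl: timedelta, now: datetime,
                  margin: timedelta = timedelta(minutes=10)) -> bool:
    """Return True when the token is within `margin` of expiring, or already expired."""
    expires_at = issued_at + ttl
    return now >= expires_at - margin


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)

early = needs_renewal(issued, ttl, now=issued + timedelta(minutes=30))  # False
late = needs_renewal(issued, ttl, now=issued + timedelta(minutes=55))   # True
```

The margin should be longer than the interval between job runs, otherwise a token could expire between two consecutive checks.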

These two example architectures demonstrate how the different technologies were combined, with compromises made, in order to fulfill each project’s requirements. In both cases, the end result was an increased security level and the option to expand to more complex setups as needed.


References

Having a Secret Management Architecture is just one of the steps in the quest for increasing security in a project. An excellent reference guide is the AWS Security Best Practices whitepaper. Despite being written by AWS, a good portion of the content is platform agnostic, and it has a great description of how to structure an ISMS (Information Security Management System).

https://aws.amazon.com/blogs/security/new-whitepaper-aws-cloud-security-best-practices/