An approach to Secret Management Architecture

CocCoc Techblog · Dec 12, 2019

Problems

Managing secrets and other sensitive data is a troublesome yet important task. A tremendous number of developers out there still store plaintext passwords on collaboration platforms such as GitLab, or keep credentials inside Docker images that anyone can pull. At CocCoc we use a self-hosted GitLab and Docker Harbor, both reachable only from the internal network, which helps mitigate leaking secrets to the outside world; it is, however, not a real solution to the problem.

Because the temptation to put secrets directly into git, or to hardcode credentials in the source code, is so strong, we procrastinated on restricting this practice in our fast-paced development environment. Over time, some of our adopted technologies, such as Ansible and Kubernetes, brought their own secret management support, respectively Ansible Vault and Kubernetes Secrets, which made managing secrets easier. The story did not end there, though: another problem arose. Secrets were now controlled in many places, even duplicated, and there was no unified way for services to retrieve them. And what about the services that are managed by neither Ansible nor Kubernetes?

Leaving things like that looked too amateurish, so we decided to design a Secret Management architecture in which all of these problems could be solved thoroughly and elegantly.

Design Goals

In the beginning stage, we defined a set of goals to guide the architecture and development. Let's review a few of these principles:

  • No secret on public resources: the ultimate goal.
  • Ease of use: the solution should be easily integrated with the current development workflow, including the Kubernetes ecosystem and traditional services running on hosts.
  • Centralized, simple yet full-featured: the solution must control secrets centrally, but it also needs to be highly available and scalable. Moreover, it should support the necessary features: secret retrieval and revocation, authentication and authorization, and so on.

Implementation

Tech choice: HashiCorp Vault, a simple secret management service that supports key-value storage, token-based authentication, and Kubernetes authentication, among other things. For the sake of simplicity, we don't introduce Vault itself here and just list the features of it that we use.

Start Simple

In the beginning, we wanted a Vault setup that simply works for our purpose: a central place to manage secrets!

Simple Vault Implementation

Tokens are used for authentication and authorization in Vault. In this implementation, the token is stored as a file on the host filesystem, with proper permissions. Services that need to retrieve secrets from Vault read the token file.

Vault itself needs a storage backend for its data. Although it supports many kinds of backends, from databases like MySQL to configuration stores like ZooKeeper / Etcd, we decided to keep the data in the filesystem for ease of management and to avoid deadlock situations, for example Etcd needing a secret from Vault to form a cluster while Vault needs Etcd to retrieve its data. For that matter, all of Vault's data files in the file storage are encrypted.
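For reference, a minimal Vault server configuration using the file storage backend might look like this sketch (paths and addresses are illustrative, not our actual setup):

# Sketch of a minimal Vault config; paths and addresses are illustrative.
storage "file" {
  path = "/var/lib/vault/data"   # Vault encrypts these data files at rest
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/vault.crt"
  tls_key_file  = "/etc/vault/tls/vault.key"
}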

The workflow for secret retrieval looks like this:

Secret retrieval workflow

Sample secret retrieval procedure:

Python 2 sample secret retrieval
# python vault.py admin/passwd/secret root-passwd
abcxyz
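The vault.py helper itself lives in a gist that is not reproduced here; a minimal sketch of what such a Vault class might look like, assuming the KV version 1 HTTP API and a token file under /etc/vault (both assumptions), is:

#!/usr/bin/env python
# vault.py: minimal sketch of a Vault KV (v1) client.
# The address and token path are illustrative, not the original implementation.
import sys
import requests

VAULT_ADDR = "https://vault.example.local:8200"  # assumed Vault address
TOKEN_FILE = "/etc/vault/token"                  # token provisioned on the host


class Vault(object):
    def __init__(self, addr=VAULT_ADDR, token_file=TOKEN_FILE):
        self.addr = addr
        with open(token_file) as f:
            self.token = f.read().strip()

    def read(self, path, key):
        # KV version 1: GET /v1/<path> returns {"data": {<key>: <value>, ...}}
        resp = requests.get(
            "%s/v1/%s" % (self.addr, path),
            headers={"X-Vault-Token": self.token},
        )
        resp.raise_for_status()
        return resp.json()["data"][key]


if __name__ == "__main__":
    path, key = sys.argv[1], sys.argv[2]
    print(Vault().read(path, key))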

By putting this Vault class into the universal Python library location on the system, the secret retrieval procedure can easily be integrated into currently running services and scripts.

This simple design solves the problem for traditional services, which can simply read the token file. But what about containerized services? With the current approach, we would need to either mount the token into the container or embed the token file, itself a kind of secret, into the container image. Moreover, for services deployed across many physical nodes, the token file has to exist on every host, making it present in many places.

Into the Kubernetes world

As stated above, Kubernetes services have different attributes than their traditional counterparts, and the token-based approach doesn't fit very well. Vault comes with another authentication method especially for Kubernetes, which is described below.

Vault — Kubernetes Auth workflow

The workflow can be explained simply like this (though the complete setup is not very straightforward):

  • The pod uses the service-account token (which is mounted by default into every pod) to ask for a Vault token that it can use to authenticate with Vault:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-x27q4 (ro)
  • This service-account token is used to authenticate with Vault's kubernetes-auth module.
  • Vault then checks with the K8s API server; if the setup between Vault and K8s is appropriate and the K8s account token is valid, Vault sends back a temporary Vault token (similar to the token-based authentication described earlier, but with a short TTL) that the pod uses to retrieve the secret. The whole exchange is sketched below.
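Roughly, the same exchange can be written down in a few lines of Python; the role name, secret path, and Vault address here are illustrative assumptions:

# Sketch of the Kubernetes-auth login exchange.
# Role name, secret path, and Vault address are illustrative.
import requests

VAULT_ADDR = "https://vault.example.local:8200"
SA_TOKEN = "/var/run/secrets/kubernetes.io/serviceaccount/token"

# 1. Read the service-account token that Kubernetes mounts into every pod.
with open(SA_TOKEN) as f:
    jwt = f.read()

# 2. Exchange it for a short-lived Vault token via the kubernetes auth method.
login = requests.post(
    "%s/v1/auth/kubernetes/login" % VAULT_ADDR,
    json={"role": "my-app", "jwt": jwt},
)
login.raise_for_status()
vault_token = login.json()["auth"]["client_token"]

# 3. Use the temporary Vault token to retrieve the secret as before.
secret = requests.get(
    "%s/v1/secret/my-app/config" % VAULT_ADDR,
    headers={"X-Vault-Token": vault_token},
)
print(secret.json()["data"])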

In production, a special container is used to avoid all this manual work: vault-env. It simply performs all the steps above to obtain the secret and exposes it as an ENVironment variable, which can then be injected into the main container.

The secret retrieval workflow for Kubernetes services:

Kubernetes Secret retrieval workflow

As can be seen, the secret can be injected into the main service in two ways:

  • File-based: after receiving the secret, vault-env writes it to an emptyDir volume shared between the containers inside the pod. The main container then reads the file to get the secret.
  • ENV-var-based: there is no easy way to pass an ENV variable from one container to another inside a pod. We need a trick: prepend the vault-env binary to the real command of the main service, so the environment prepared by vault-env is passed to the main process.

For example, suppose an SSL certificate and key are stored inside Vault and we need to get them into an nginx container. The design looks like this:

  • Vault-env runs as the init container, retrieves the secrets, and puts them into a shared emptyDir. Nginx then mounts the emptyDir at the appropriate location.
  • SSL_KEY and SSL_CERT are ENV vars whose values are transformed from Vault paths into the real secrets and then written into files.
Sample Kubernetes deployment with Vault-env
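The original manifest is embedded as a gist in the post; a hypothetical reconstruction, assuming the banzaicloud vault-env image and its convention of vault:-prefixed environment variables (both assumptions, along with all names and paths), might look like:

# Hypothetical reconstruction; image names, secret paths and the vault-env
# invocation are assumptions, not the original manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-ssl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-ssl
  template:
    metadata:
      labels:
        app: nginx-ssl
    spec:
      serviceAccountName: nginx-ssl        # bound to a Vault kubernetes-auth role
      initContainers:
        - name: vault-env
          image: banzaicloud/vault-env     # assumed vault-env image
          env:
            - name: VAULT_ADDR             # auth settings beyond this are omitted
              value: https://vault.example.local:8200
            # vault-env rewrites vault:-prefixed values into the real secrets
            - name: SSL_CERT
              value: vault:secret/nginx#ssl_cert
            - name: SSL_KEY
              value: vault:secret/nginx#ssl_key
          command:
            - vault-env
            - sh
            - -c
            - printf '%s' "$SSL_CERT" > /ssl/tls.crt; printf '%s' "$SSL_KEY" > /ssl/tls.key
          volumeMounts:
            - name: ssl
              mountPath: /ssl
      containers:
        - name: nginx
          image: nginx:1.17
          ports:
            - containerPort: 443
          volumeMounts:
            - name: ssl
              mountPath: /etc/nginx/ssl
              readOnly: true
      volumes:
        - name: ssl
          emptyDir: {}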

In the end, we've implemented a solution that works nicely with both traditional and modern Kubernetes services. Vault acts as a central Secret Management service: all the passwords, certificates, and private keys can be stored in one place and retrieved by the services holding valid tokens.

By Kubernetes, For Kubernetes

Vault has gradually become more important to our infrastructure. Virtually all microservices running in the system require one or more kinds of secrets, which makes them undeployable when Vault is down.

High Availability

To cope with the high availability issue, we decided to deploy Vault into the Kubernetes system, using CephFS as the Vault datastore backend (although Vault supports HA by itself when using clustered storage backends like ZooKeeper / Etcd or databases).

Vault HA architecture

The architecture is simply an expansion of the simple design from the beginning. When deploying Vault in a Kubernetes cluster without a network filesystem, the pod can only be scheduled onto the host containing the Vault data, and that ruins the main advantage of Kubernetes: container orchestration.

The tech choice for the network filesystem is CephFS, which has been used in our infrastructure for a long time: it is stable, simple to set up and, most importantly, supported by Kubernetes. With the help of CephFS, Vault can be scaled to more than one pod, orchestrated by Kubernetes, with all pods using the same datastore. Additionally, to keep the instances synchronized with each other, the cache needs to be disabled via disable_cache, so when one instance updates the data, the others re-read the storage and pick up the change.
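In Vault's configuration, the HA-relevant part then reduces to something like this sketch (the mount path is illustrative):

# Sketch: file storage on a CephFS-backed volume shared by all Vault pods.
disable_cache = true          # force re-reads so replicas see each other's writes

storage "file" {
  path = "/vault/data"        # mounted from the shared CephFS volume
}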

On top of the Vault pods sits a Kubernetes Service, which acts as service discovery, automatically adding or removing pods' IPs as they become available or unavailable. Vault is exposed to the world through this Service.
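A plain Service of roughly the following shape (names are illustrative) is all that is needed for that:

# Illustrative Service in front of the Vault pods.
apiVersion: v1
kind: Service
metadata:
  name: vault
spec:
  selector:
    app: vault            # matches all healthy Vault pods
  ports:
    - name: api
      port: 8200
      targetPort: 8200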

Disclaimer: this HA setup is not recommended by HashiCorp, as the file storage backend (even backed by a shared CephFS) is not officially supported for HA by Vault.

Auto Unseal

One more feature that Vault supports is sealing. Every time the service starts, it comes up sealed and can only be unsealed manually (or automatically using a rather complex procedure with AWS, which we don't use and won't cover here). This feature is unfortunately not very suitable for the high availability design above, because whenever a Vault pod restarts, a sysadmin has to step in and unseal the service.

Basically, to unseal Vault, 3 out of 5 unseal keys need to be entered. With the help of Kubernetes, after the Vault pod is completely up, we can start the unseal procedure automatically with a postStart lifecycle hook:
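The original hook is embedded as a gist; the idea can be sketched like this, assuming the unseal keys are somehow made available inside the container (how to store them safely is left out here and is an assumption of the sketch):

# Sketch: container lifecycle hook that unseals Vault right after start.
lifecycle:
  postStart:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          sleep 5                                # give the local listener time to come up
          export VAULT_ADDR=http://127.0.0.1:8200
          for f in /etc/vault/unseal/key1 /etc/vault/unseal/key2 /etc/vault/unseal/key3; do
            vault operator unseal "$(cat $f)"    # 3 keys reach the default threshold
          done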

Final Thoughts

There are many other Secret Management solutions out there, each with its pros and cons. The best thing about this design is that, using Kubernetes and a simple service, HashiCorp Vault, we could set up a fairly complete architecture that satisfies the pre-defined requirements. There is still work ahead on secret management, but the current solution solves the problem of exposed credentials, makes the infrastructure more secure, and looks a lot more professional!
