Setting up Vault at Doctrine

Ben Riou
Published in Inside Doctrine
May 4, 2023

Vault is the name of a HashiCorp product you’ve probably heard about. The simplest way to describe it is as a secret manager that stores secrets securely in a centralized place.

However, it’s way more than this. At Doctrine, we describe it as a comprehensive toolbox to manage access control across our infrastructure. Vault can achieve many things; however, without advanced configuration, it will not achieve anything. Although Vault comes with a web interface, it is first and foremost an API: whether you use the UI, the Vault CLI, or a Vault SDK, you are always talking to the Vault API. The Web UI exposes only a subset of the options, so you must often configure Vault directly via the API.

In this blog article, we’ll walk through our Vault implementation at Doctrine to give you a clear overview of what’s possible.

One product, two types of engines

To fully understand Vault, it helps to separate it into two kinds of engines:

  • The authentication engines, used to authenticate against Vault and obtain a Vault token linked to an ACL policy;
  • The secret engines, used to interact with your systems.

This difference is also very noticeable within the Vault UI itself.

Once you’ve successfully authenticated to Vault, a token is issued, allowing you to interact with the API. This token is linked to a role and a policy.

Where do we come from?

Before Vault, we were using multiple secret sources to manage our secrets. Having several systems presented numerous challenges we wanted to get rid of.

First, we had to implement separate mechanisms to interact with each system, and access management (as well as auditing) had to be duplicated on each of them.

Here is a short list of drawbacks for each system:

  • Kube Opaque: secrets are not encrypted (base64 encoded only)
  • CircleCI: secrets leaked in January 2023, and no access control is possible on the secrets
  • AWS Systems/Secrets Manager: requires AWS SDK integration and doesn’t operate on other cloud providers
  • Terraform output-based secrets: secrets are not dynamically updated on projects
  • KMS-encrypted secrets: secrets cannot be versioned (you have to manage versions yourself)

What if… we could find a centralized solution to all these issues?

Vault is the solution we retained because it is open source, self-manageable, and widely adopted.

Multiple secret management services were in use before Vault

What are we doing with Vault at Doctrine?

Almost everything, except the coffee (for now). Any machine (CI systems, Kubernetes pods, Airflow operators, …) or human interacts with Vault to retrieve credentials or authentication mechanisms:

  • To retrieve key/value combinations
  • To access databases
  • To authenticate to remote systems via SSH

Each usage matches a different secret engine, with its own way of operating and its own configuration. This is what we’re going to review in the remainder of the article.

Authentication to Vault

Before using the secret engines, the first step is to retrieve a Vault token via an authentication engine. Several authentication engines exist in Vault.

A password? What password?

We never use static credentials to access Vault: authentication is made through our enterprise directory (Google Workspace) and OIDC. Should the OIDC integration ever break, we keep a break-glass access (a static root token) stored in a physical vault.

The OIDC Authentication with Google

Have you read our article on our OIDC migration?

Configuring the OIDC authentication is pretty straightforward: a Google Workspace OIDC provider is configured via the API or CLI. Note that the roles attached to the authentication engine cannot be configured via the Vault Web UI. Developers can then authenticate with Google via the Vault user interface or the CLI.

OIDC authentication via the Vault CLI

Once the authentication sequence is complete, Vault assigns the role requested based on the user’s group membership.
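As a hedged sketch of this flow (the role name `dev` is illustrative, and this requires a reachable Vault server with the OIDC method enabled at its default `oidc/` path), the CLI login looks like:

```shell
# Opens a browser window for the Google OIDC flow, then stores
# the returned Vault token locally (typically in ~/.vault-token)
vault login -method=oidc role=dev
```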

Vault screen after a successful OIDC authentication

Usage of secrets engines

Once authenticated with Vault, a Vault token is assigned to you. It allows you to interact with the secret engines, based on a role and a linked policy.

Roles definition for humans and machines

Vault roles are configured with CRUD-style verbs (capabilities).

We started with two simple roles for humans: admin (engineers in charge of managing the Vault cluster) and dev (engineers who need to access the secrets).

For machines, additional roles were added along with other secret engines (e.g. CircleCI has its own role, and each application running on a pod has limited visibility).

A role assigns the policies (also named Vault ACLs in HashiCorp literature), and each policy targets paths.

Choosing the right naming convention for your Secret Engines is critical when setting up a Vault Cluster, because all the authorization schemes will be based on these paths.

Naming convention for our Secret Engines

As is often the case in IT projects, choosing the proper naming convention was, by far, the most challenging topic of our Vault migration. We needed to find something clear to understand, granular enough for a split per environment and per project, yet compliant with the Vault ACL engine limitations.

Some general rules to begin with: first, secret engines are provisioned via Terraform (infrastructure as code).

The naming for secret engines should be explicit enough to distinguish:

  • The linked environment (secrets differ between production and development);
  • The name of the project (or application);
  • The name of the sub-system in the application (for large-scale applications).

Because a sub-system of an application can require several secret engines at once, we also added:

  • The engine type.

And to detect any path invocation in our codebase, we’ve decided to always prefix our KV engine paths with /secret.

After several attempts, we ended up with two naming conventions: one for KV engines, and a second one for SSH/DB engines.
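To illustrate the shape of these conventions (the segment names below are hypothetical, not our actual paths):

```
# KV engines: always prefixed with /secret
secret/kv/<environment>/<project>/<sub-system>
# e.g. secret/kv/production/billing/backend

# SSH/DB engines: the engine type leads the path
db/<environment>/<project>
ssh/<environment>/<project>
```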

Access-Control List (ACL) for Secret engines

A user role (human or programmatic) is linked to one or more policies. Each policy has one dedicated ACL.

The ACL syntax is based on CRUD, with some specific additions, and the authorization verbs are tightly tied to the secret engine naming convention.

Now you understand why the naming convention is so critical: your authorization scheme will depend on it.
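As a minimal sketch in standard Vault policy HCL (the path is a hypothetical example following the convention above, not one of our real mounts), a dev policy scoped to a single project could look like:

```hcl
# Grant read access to one project's KV secrets only.
# Vault policies are deny-by-default: anything not listed is refused.
path "secret/kv/production/billing/*" {
  capabilities = ["read", "list"]
}
```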

Vault Backends

Vault runs on disposable Kubernetes pods, with a leader election at cluster boot time. By default, no data is stored on the Kubernetes pods. To persist the data (configuration, accesses), you need to define a storage backend for Vault.

Internal RAFT storage Backend

You can choose the recommended Raft internal storage backend (a Raft server is provided with the Vault engine on local port 8201). It implements HA, distributed storage across pods.

The advantage is also a drawback: the encrypted Vault data is stored on the same host where the Vault server process runs. It is simple to manage, but you lose your data in case of a disaster on your Kubernetes cluster.

Automating snapshots to external object storage (S3, Azure, GCS) is possible, but this is a paid Vault Enterprise feature.

AWS DynamoDB storage backend

Hence, we decided to go for an external storage backend. Consul is the external storage backend favored by HashiCorp; however, we did not want to maintain a Consul cluster for this sole purpose. Instead, we chose the AWS DynamoDB solution for several reasons:

  • It’s a reliable and proven technology
  • It’s serverless and pay-as-you-go
  • It provides PITR restore capabilities within a 35-day timespan, with one-second granularity

Setting up the AWS DynamoDB storage backend is straightforward, with only three lines of configuration.
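Those three lines are the DynamoDB storage stanza in the Vault server configuration; a sketch with placeholder region and table names could look like:

```hcl
storage "dynamodb" {
  ha_enabled = "true"
  region     = "eu-west-1"
  table      = "vault-backend"
}
```

Credentials are typically supplied via the usual AWS environment variables or an instance/pod role rather than in this file.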

Vault PITR restoration hiccups

Point-In-Time Recovery (PITR) allows retrieving the DynamoDB table content at any time within the last 35 days. This can be used to restore a previous Vault cluster state, for example to recover a secret engine that was accidentally deleted.

When performing the PITR restoration, you are asked for a timestamp (to which point in time would you like to restore?), and the content is restored to another DynamoDB table, ready to serve.

The last step is to instantiate a new Vault cluster pointing to the restored DynamoDB table; you can then access and retrieve any value via the UI or API.
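With the AWS CLI, the restoration step can be sketched as follows (table names and the timestamp are placeholders):

```shell
# Restore the Vault backend table to a point in time, into a new table
aws dynamodb restore-table-to-point-in-time \
  --source-table-name vault-backend \
  --target-table-name vault-backend-restored \
  --restore-date-time "2023-04-05T17:00:00Z"
```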

Vault Secret Engines

We’ll walk through our secrets engine implementation and our main takeaways.

Key-Value Version 2 (KVV2) engines

KVV2 is the simplest secret engine in Vault. Sets of key/value pairs are grouped into “secrets”, which are managed by a Key-Value engine. Version 2 of the Key-Value engine has some benefits, like storing multiple versions of a secret.

At Doctrine, we decided to manage the KVV2 engine provisioning via Terraform (infrastructure as code), but not their content: the key/values are populated directly within the Vault interface by accredited engineers.
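A minimal Terraform sketch of this provisioning, assuming the official Vault provider (the path and names are illustrative):

```hcl
# Mount a KV v2 engine at an illustrative project path
resource "vault_mount" "billing_kv" {
  path        = "secret/kv/production/billing"
  type        = "kv"
  options     = { version = "2" }
  description = "KV v2 engine for the billing project (illustrative)"
}
```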

Managing secret contents (key/values) via Terraform is not recommended for several reasons:

  • Secret values are stored in clear text in the Terraform state file
  • You have to manually retrieve the existing secret content, patch it, and re-upload it to the secret engine via a variable (otherwise, any existing secret content would be deleted)
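Day-to-day, the KV v2 engine is driven with the `vault kv` commands instead (the path is illustrative and assumes a mount like the one above):

```shell
# Write a new version of a secret
vault kv put secret/kv/production/billing/backend api_key=abc123

# Read the latest version
vault kv get secret/kv/production/billing/backend

# Read a specific older version
vault kv get -version=1 secret/kv/production/billing/backend
```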

DB Engines

Any access to our databases is controlled via Vault. The user first authenticates against Vault, assuming a role and a dedicated policy, then requests a set of database credentials from Vault via the DB engine.

Vault knows the database TCP endpoint and holds a static set of credentials, allowing it to create new users via a creation statement.

Once a request is handled by Vault, a temporary, disposable database user is created following the creation statement (a templated SQL request), and the credentials are returned to the user.

Once the database user lease is revoked or expires (its TTL is exceeded), a revocation statement (another templated SQL request) is executed to delete the transitory user.
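For PostgreSQL, a creation statement is templated SQL along these lines (the GRANT is illustrative; Vault substitutes the `{{name}}`, `{{password}}` and `{{expiration}}` placeholders):

```sql
-- Creation statement: create a short-lived login role
CREATE ROLE "{{name}}" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO "{{name}}";
```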

Here are a few challenges we’ve faced while setting up DB engines:

  1. The names of the temporary users created by Vault are very long. This can be adjusted via username templating.
  2. By default, Vault does not revoke users’ permissions on schemas other than public; consequently, transitory database users cannot be deleted because some permissions are still linked to them. The revocation statement should be configured accordingly:
-- Revocation statement
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM "{{name}}";
DROP OWNED BY "{{name}}";
DROP ROLE "{{name}}";

SSH Engines

The SSH engine allows you to connect via SSH to a remote host without holding its private key. The Vault server acts as a certificate authority and signs a disposable public key generated by the user.

Example of public/private key pair generation:

ssh-keygen -q -o -a 256 -t ed25519 -N "" -C "transitory SSH keypair" -f test_ben2

Example of a Vault SSH signature via the Vault CLI:

vault write ssh-client-signer/sign/my-role \
    public_key=@$HOME/.ssh/id_rsa.pub

Key              Value
---              -----
serial_number    c73f26d2340276aa
signed_key       ssh-rsa-cert-v01@openssh.com AAAAHHNzaC1...

Once the user has the Vault-signed public key returned by Vault, they can connect to the SSH host with three identity files: the disposable public and private keys, and the Vault-signed public key (linked to the disposable public key).

It’s possible to inspect the Vault-signed SSH certificate via ssh-keygen. The SSH certificate has a validity period, one or more principals (usernames used for the SSH connection), and a list of extensions (allowed options for the SSH connection).

$ ssh-keygen -L -f signed_key2.pub
signed_key2.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Public key: ED25519-CERT SHA256:eTANO9j4m767kwYUGgjT61UrqLYtiGd+qYXZw3oZdaA
        Signing CA: RSA SHA256:5PlJ6Gc+gwirKZ3QrqXrncUQ5A+//iW1Gi40ivYET8Y (using rsa-sha2-256)
        Key ID: "vault-oidc-116446655950636464998-79300d3bd8f89bbebb9306141a08d3eb552ba8b62d88677ea985d9c37a1975a0"
        Serial: 14304305573396308659
        Valid: from 2023-04-05T17:01:29 to 2023-04-05T18:01:59
        Principals:
                circleci
        Critical Options: (none)
        Extensions:
                permit-port-forwarding
                permit-pty

During the connection, the client authenticates to the remote SSH host with the disposable key pair and the SSH certificate. The SSH daemon running on the host has the Vault certificate authority declared in its configuration, with the TrustedUserCAKeys directive.

# /etc/ssh/sshd_config
# ...
TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem

It’s also possible to connect to a remote host via the Vault CLI, using a single-line command:

vault ssh -role=${assumed_role} -mode=ca -mount-point=${secret_engine_path} username@ssh_host


Kubernetes Pod credential injection at boot time

A last, but not least, Vault use case is to inject Vault secrets directly into Kubernetes containers. As we want the applications to be agnostic of the secret management system, the secrets are automatically mounted into the container file system or exposed via environment variables.

The configuration is done via Kubernetes annotations and a mutating webhook admission controller that parses any API call with CREATE or UPDATE HTTP verbs targeting pods (except the vault-agent-injector itself), and sends the pod metadata to the vault-agent-injector service (HTTPS endpoint /mutate).

If the pod metadata contains specific annotations, the vault-agent-injector edits the pod to add a vault-init init container and a vault-agent sidecar. These retrieve the required secrets (from a KV engine or a DB engine) and inject them into a file or into environment variables.
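The annotations live under the `vault.hashicorp.com/` prefix; a hedged sketch for a pod pulling one KV secret (the role name and secret path are illustrative) could look like:

```yaml
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "billing-backend"
    # Renders the secret to /vault/secrets/config inside the container
    vault.hashicorp.com/agent-inject-secret-config: "secret/data/kv/production/billing/backend"
```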

Once the pod mutation is complete, the vault-agent-injector adds an extra annotation: agent-inject-status: injected.

Kubernetes Pod credentials renewal at run time

Each secret is linked to an initial lease with a time-to-live (TTL) expiration. The first lease is retrieved from the authentication engine; then most secret engines provide short-lived credentials with an additional lease.

All the leases retrieved by a client are linked to the original authentication engine lease. They can expire individually, or globally if the authentication engine lease expires.

These expiring leases are especially important for database secret engines, as the provided credentials are dropped from the database once the expiration threshold is reached.

Vault has a lease renewal mechanism allowing a running application, via its vault-agent sidecar container, to declare that it is still running and should keep its access to the database through the disposable credentials.
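The same renewal can be performed manually through the Vault CLI (the lease ID below is a placeholder):

```shell
# Renew a database credential lease for another hour
vault lease renew -increment=3600 database/creds/my-role/abcdef1234
```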

The vault-agent sidecar is injected automatically by the vault-agent-injector mutating webhook admission controller, and embeds its own configuration (base64 encoded).

Conclusion

We have reviewed the multiple uses of Vault we’ve implemented at Doctrine, and we hope you enjoyed the panorama. Disposable database credentials, SSH certificates, or key/value storage: these are just a few of the capabilities Vault provides to manage the secrets of your infrastructure.

Benefits

One of the main benefits of this Vault migration is the centralisation and standardisation of secret storage, retrieval, and audit. A single source of secrets is easier to manage when it comes to secret rotation.

Additionally, we gained application secret isolation. Before Vault, several applications could use the same secret without any strong relationship between them; as a consequence, changing a value could have significant side effects. This is not the case anymore: each application has its own secrets.

Upcoming Challenges

Application secret isolation came with secret duplication. This is a compromise we made during the transition; we are now targeting unique secrets across our whole infrastructure.

Custom tooling for secret management. Vault’s tooling does not allow large-scale operations on key/value secrets, like scanning for a value or changing a wide range of secrets at once. We’ve decided to develop a custom Vault client to ease our daily secret management operations.
