Secure Credential Management on a Budget: DC/OS with HashiCorp’s Vault — Part 2

Racter
Published in MobileForGood · May 2, 2017

This article is a continuation of the Secure Credential Management on a Budget series.

Why the focus on ‘on a budget’? The context of this exploration is its use by my place of work, Praekelt.org — an NGO that delivers mobile solutions to improve the wellbeing of people around the world. To help ensure that the users taking advantage of our services and giving us their personal information don’t risk their digital agency from data breaches, we’re revising the way we work with the secrets guarding our data sources. Working creatively with the budget constraints that can come with the NGO space often means adapting open-source tools instead of paying for out-of-box or managed services.

In Part 1, we looked at setting up a replicated instance of Vault with a Zookeeper backend on DC/OS. In this instalment, we’ll examine and configure two different kinds of Vault-related policy sets: the Vault secret backend policies and the Vault ACL policies.

This tutorial will take you, with a reasonable amount of granularity, through Vault secret backend policies by setting up some resources to make use of Vault, namely PostgreSQL. We’ll then tie everything together by creating some Vault ACL policies that will allow our DC/OS applications to incorporate Vault into their workflows.

Every Vault tutorial ever (Credit for original image to unknown author)

If, by the time you are reading this paragraph, you have no idea what Vault is, how it works, or why you want to use it, make yourself a drink of your choice and read the intro and startup guide to Vault on the official site.

Lastly, these commands are tested on Debian Jessie servers. Although you should be able to adapt most (if not all) of these commands to your distro, consider this fair warning for the instructions that lie ahead. Do not mail me x-rays of your bum if they crash your Gentoo servers.

Let’s get started once more, with feeling:

Section 1: Basic Theory

Secret Backends

The official Vault documentation introduces secret backends as follows:

Secret backends are the components in Vault which store and generate secrets.

Some secret backends, such as “generic”, simply store and read secrets verbatim. Other secret backends, such as “aws”, create dynamic secrets: secrets that are made on demand.

In this tutorial we’ll be making use of the Database secret backend, which supports dynamic secrets. The below diagram shows where secret backends sit in the Vault internal model, and provides a list of secret backends that Vault supports:

Figure 1: Vault secret backends

Benefits of Vault’s Dynamic Secret Model

1. It prevents hard-coding static secrets

In many conventional systems, secrets for resources are hard-coded and stored in a location that is accessible to the consumers of those resources. Managing the logistics of securely storing secrets at rest is a problem that is solved by Vault’s core function as a credential management system. Dynamic secrets go further: they allow secrets to be generated lazily, when they are needed, rather than provisioned in advance in anticipation of use.

2. It helps narrow the spatial attack surface of secrets

The spatial attack surface of a secret increases proportionally with the number of system actors with knowledge of that secret: the more actors know a secret, the more subsystems an attacker could probe for weaknesses in order to obtain it. In theory, this is not an issue that dynamic secrets resolve on their own, as those dynamic secrets may guard access to the same resource — ie. dynamic secrets narrow the attack surface of secrets, not resources. However, combining dynamic secrets with namespaced access (eg. database schemas) and ACLs on the target resource can narrow the attack surface of the namespaced portion of the resource.

The ability to automate provisioning unique secrets for each actor that needs access to the same resource also helps with incident response when secrets are leaked. If each actor uses a unique secret for the same resource, the credentials with which unauthorised access to a resource was made can help to pinpoint the subsystem in which a weakness was found, and that subsystem can be examined for intrinsic or extrinsic security flaws.

3. It supports automated renewal and rotation of secrets

The temporal attack surface of a secret is proportional to the length of time that secret remains valid. This is why revoking access for actors that no longer require a resource is recommended practice. If you’re running a cluster setup, chances are that you’re using it to run ephemeral to semi-permanent tasks. If you’re using unique secrets for each task’s access to a resource, for the advantages discussed in point 2, then revoking a task’s secrets as soon as the task is no longer needed should form an integral part of your task lifecycle management.

To automate this requirement, Vault has a leasing system in place for every secret and token it controls. Each secret created is associated with its own unique lease ID, as well as a Time-to-Live parameter. Consumers of a secret are required to check in with Vault using the lease ID in order to confirm that the credentials are still in use, at which point the lease on that secret is renewed. The Time-to-Live parameter typically denotes the maximum period for which a secret remains valid without being renewed. With this system in place, tasks that no longer require a secret will cease to renew its lease, and the secret will be automatically revoked once its Time-to-Live window has elapsed.
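As an illustrative sketch of the lease lifecycle described above (a toy model, not Vault’s actual implementation — the class and lease ID below are hypothetical):

```python
import time

class Lease:
    """Toy model of Vault's lease bookkeeping for a single secret."""

    def __init__(self, lease_id, ttl_seconds, now=None):
        self.lease_id = lease_id
        self.ttl = ttl_seconds
        self.expires_at = (now if now is not None else time.time()) + ttl_seconds

    def renew(self, now=None):
        # A consumer checking in with the lease ID pushes the expiry forward.
        self.expires_at = (now if now is not None else time.time()) + self.ttl

    def is_expired(self, now=None):
        # Once the TTL window elapses without a renewal, the secret is revoked.
        return (now if now is not None else time.time()) >= self.expires_at

# A task that keeps renewing holds on to its secret; one that stops loses it.
lease = Lease("database/creds/psql-readwrite-public/abc123", ttl_seconds=30, now=0)
lease.renew(now=25)               # renewed inside the TTL window; expiry moves to t=55
print(lease.is_expired(now=50))   # False: still inside the renewed window
print(lease.is_expired(now=60))   # True: no renewal since t=25, so the lease lapsed
```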

Key rolling, ie. the ability to replace a secret with a new one, is simplified by the Vault API. A secret can be invalidated by requesting that Vault revoke the lease associated with a given lease ID. As lease IDs are constructed to contain the Vault path from which the secret was generated (as a prefix), leases may also be revoked wholesale at the policy level by supplying Vault with the common prefix of the leases to be revoked.
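Because lease IDs embed the Vault path they were generated from, revoking by prefix amounts to a prefix match over the active leases. A minimal sketch of the idea (the lease IDs below are made up):

```python
def revoke_prefix(active_leases, prefix):
    """Return the leases that a revoke-by-prefix request would invalidate."""
    return [lease for lease in active_leases if lease.startswith(prefix)]

active = [
    "database/creds/psql-readwrite-public/9f2c",
    "database/creds/psql-readonly-public/1a7b",
    "aws/creds/deploy/44d0",
]

# Revoking at the policy level: everything issued under database/creds/
print(revoke_prefix(active, "database/creds/"))
# ['database/creds/psql-readwrite-public/9f2c', 'database/creds/psql-readonly-public/1a7b']
```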

4. It increases the usefulness of audit logs during incident response

The lease ID associated with a secret changes whenever the lease on that secret is renewed. This means that examining a leaked secret’s access token can help pinpoint the approximate time at which the secret was leaked. Combined with the ability to identify and remediate the subsystem from which the secret was leaked (discussed in point 2), the added contextual information aids the execution of timely incident response manoeuvres.

Vault ACL Policies vs. Vault Secret Backend Policies

The core Vault program controls two different types of policies. Vault ACL policies map Vault auth backend roles to ACLs, which restrict the functionality a given Vault user can access when connected to the Vault API.

A secret backend policy maps roles on that backend (ie. AWS/Postgres) to privileges that the roles may have on that specific resource, for example read/write access to a specific Postgres schema. As the specifications and workflows for access control vary from resource to resource, each Vault backend carries its own backend policy store, the policy set of which must be specified per secret backend.

See the following image to disambiguate secret backend policies from Vault ACL policies:

Figure 2: Vault ACL policies vs. Vault secret backend policies

Section 2: Tying It All Together

Setting Up Your PostgreSQL Instance

The first step is to set up your PostgreSQL instance for dynamic secret compatibility. Here, you will create a permanent PostgreSQL user that Vault will use to create these dynamic secrets. We’ll assume you have a database with one schema in it, public.

There are a couple of additional steps you’ll need to take here, as PostgreSQL only allows the owners of database objects to drop or modify them. This can pose a problem when database objects are created by dynamically-generated users, which eventually expire and leave those objects orphaned. Additionally, for successful revocation of a role, all the privileges it was granted on other database objects like functions, sequences, and tables must also be revoked. This issue and some possible resolutions are described in this GitHub thread. For this tutorial, we will be using the workaround described by jdelic, where we create a role that our Vault user and dynamically-generated users can assume when creating, modifying, and revoking permissions and objects.

Log into your PostgreSQL instance as a superuser and create your vault role. Replace [password] below with your chosen password:

postgres=# CREATE ROLE vault WITH LOGIN ENCRYPTED PASSWORD '[password]' CREATEROLE;
postgres=# GRANT ALL PRIVILEGES ON DATABASE [database name] TO vault WITH GRANT OPTION;

Substitute [database name] with the database you wish to incorporate into your Vault workflow.

Next, you need to create the PostgreSQL role which owns the database and all its objects:

postgres=# CREATE ROLE db_owner WITH ENCRYPTED PASSWORD '[password]';
postgres=# ALTER DATABASE [database_name] OWNER TO db_owner;
postgres=# ALTER SCHEMA public OWNER TO db_owner;

Finally, grant the vault user membership to the db_owner role:

postgres=# GRANT db_owner TO vault;

Configuring the PostgreSQL Database Secret Backend

Now that we’ve set up our PostgreSQL database, we can move on to configuring the secret backend for Vault.

By default, Vault only mounts the generic secret backend. If you want to enable any of the other secret backends shown in Figure 1, you’ll have to mount it explicitly.

Ensure that your Vault server session is authenticated with the root token, or root-token-equivalent privileges.

As of version ≥ 0.7.1, Vault has deprecated individual database secret backends in favour of a generic Database secret backend that allows custom database types via an extendable plugin framework.

You can mount the Database backend on your instance by issuing the following command to any Vault node:

$ vault mount database

Once you’ve mounted the Database backend, the PostgreSQL plugin is available to use by default.

Now, we tell Vault about our vault Postgres role, so it can use it to generate dynamic Postgres secrets. We’ll call this postgres1:

$ vault write database/config/postgres1 \
plugin_name=postgresql-database-plugin \
allowed_roles="psql-readwrite-public" \
connection_url="postgresql://vault:[password]@[postgres hostname]:[postgres port name]/[database name]"

The key parameters for this command:

plugin_name is the applicable Database plugin for this connection. In this case, we’re using Vault’s built-in PostgreSQL plugin, which is called postgresql-database-plugin.

allowed_roles lists the Database roles that are allowed to use this connection. Here, we apply some forethought and decide that we will create a role called psql-readwrite-public in the next section, which will use this connection.

connection_url is the full PostgreSQL connection URL for your target database, including the username and password of your vault user.
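One practical gotcha when assembling connection_url: if your vault user’s password contains URL-reserved characters such as @ or /, it must be percent-encoded before being embedded in the URL. A small sketch using Python’s standard library (the password, hostname, and database name here are made up):

```python
from urllib.parse import quote

user = "vault"
password = "p@ss/word"   # hypothetical password containing URL-unsafe characters
host = "pg.example.internal"
port = 5432
database = "appdb"

# Percent-encode the password so the URL parses unambiguously.
connection_url = "postgresql://{}:{}@{}:{}/{}".format(
    user, quote(password, safe=""), host, port, database
)
print(connection_url)
# postgresql://vault:p%40ss%2Fword@pg.example.internal:5432/appdb
```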

Postgres Secret Backend Policies

So you’ve configured Vault with the ability to generate dynamic credentials with your choice of lease window. But you still need to specify what permissions these dynamically generated credentials/roles will be granted on the target Postgres schemas.

To do this, we’ll set up that psql-readwrite-public role we promised earlier. We want this policy to let us read and write on the public schema of our database. We provide the role name in the Vault path we write to:

$ vault write database/roles/psql-readwrite-public \
db_name=postgres1 \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"db_owner\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION; GRANT USAGE ON SCHEMA public TO \"{{name}}\"; GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO \"{{name}}\"; GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO \"{{name}}\";" \
revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public from \"{{name}}\"; REVOKE ALL PRIVILEGES ON SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
default_ttl="30s"

The parameters of interest in the above command:

db_name is the name of the database connection config to use in conjunction with this role. We use the connection we just created, postgres1.

creation_statements is the PostgreSQL command that Vault will execute as the vault user to create dynamic credentials. Note that we add the dynamically-generated user to the db_owner role group we created earlier to prevent orphaned database objects.

revocation_statements is the PostgreSQL command that Vault executes to revoke a dynamically-generated user. We include the commands to revoke privileges given to this dynamic user so that we are allowed to drop the role without incident.

default_ttl is the default lifetime for a set of credentials: without renewal, the credentials are revoked once this period elapses.

Note that your new policy name is psql-readwrite-public, so you would write to the database/roles/psql-readwrite-public path. In general, if your new policy name is [name], you would write its associated policy details to database/roles/[name]. Vault will create that path for you if it does not exist.

Do not alter any of the values in double braces — these will be dynamically populated by Vault. The only alteration you should make is substituting public with your own desired schema name. You’ll notice that this value, as well as any arguments specifying the scope of a user’s privileges, cannot be dynamically populated. This is a pain, but the behaviour exists for a good reason: if the schema name for the policy could be dynamically generated, you would not be able to restrict access on a per-schema level, only at the Vault ACL level. Any system actor with Vault authorisation to read credentials for that policy could simply request credentials to any schema it desires, breaking schema isolation. Put another way, if an attacker obtained the Vault credentials for a single system actor, hard-coding the scope of the backend privileges means that they can only request credentials with access to that specific portion of the database.
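To see roughly what Vault ends up running against PostgreSQL, it can help to render the creation template by hand. Vault performs this substitution itself at credential-generation time; the sketch below merely mimics it, and the generated username, password, and expiry shown are made up:

```python
# A shortened version of the creation_statements template from above.
creation_template = (
    "CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD "
    "'{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"db_owner\";"
)

def render(template, values):
    # Mimic Vault's {{placeholder}} substitution, for illustration only.
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template

sql = render(creation_template, {
    "name": "v-token-psql-rea-x7q2",       # hypothetical generated username
    "password": "A1a-examplepassword",     # hypothetical generated password
    "expiration": "2017-05-02 12:00:00+00",
})
print(sql)
```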

Here’s another example to create a PostgreSQL policy. We’ll make this a read-only policy to the public schema:

$ vault write database/roles/psql-readonly-public \
db_name=postgres1 \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN ENCRYPTED PASSWORD '{{password}}' VALID UNTIL '{{expiration}}' IN ROLE \"db_owner\" INHERIT NOCREATEROLE NOCREATEDB NOSUPERUSER NOREPLICATION; GRANT USAGE ON SCHEMA public TO \"{{name}}\"; GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
revocation_statements="REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM \"{{name}}\"; REVOKE ALL PRIVILEGES ON SCHEMA public FROM \"{{name}}\"; DROP ROLE IF EXISTS \"{{name}}\";" \
default_ttl="30s"

Finally, we can ask your Vault instance for dynamically-generated Postgres credentials. We request dynamic credentials by reading from the database/creds/ path, appending the name of the policy you wish to apply to the generated credentials:

$ vault read database/creds/psql-readwrite-public

The above command will request a credential pair which will let the holder read and write to the public schema on the target database.
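The read returns a key/value listing that includes the lease ID, lease duration, and the generated username and password. A sketch of pulling those fields out of the CLI output — note the sample output below is fabricated for illustration, and exact formatting varies by Vault version:

```python
# Hypothetical output of `vault read database/creds/psql-readwrite-public`.
sample_output = """\
lease_id           database/creds/psql-readwrite-public/3f1d
lease_duration     30s
lease_renewable    true
password           A1a-examplepassword
username           v-token-psql-rea-x7q2
"""

creds = {}
for line in sample_output.splitlines():
    key, value = line.split(None, 1)   # split on the first run of whitespace
    creds[key] = value

print(creds["username"], creds["password"])
```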

Configuring Vault ACL Policies

[For clarity, this section will use ACLs to refer to path permissions in Vault policies, policies to refer to a set of ACLs on Vault grouped under an identifier, actor to refer to a concrete, embodied participant in the Vault workflow, and role to refer to an abstract definition of how a system actor should participate in the Vault workflow.]

When you first initialise your Vault instance, it will supply you with a root token. The root token is associated with the root policy and grants you the highest privileges on the Vault instance. As its name suggests, it allows you to do pretty much anything once Vault is unsealed (except reading the /cubbyhole storage of other tokens).

This is very convenient for initial manual setup and maintenance tasks. However, in practice, we may want several actors in our software infrastructure to assume a role in the Vault workflow. Depending on the role an actor plays in your infrastructure and Vault workflow, it may only need a narrow subset of full Vault privileges in order to do its job. According to the Principle of Least Privilege, an actor in a system should have only the privileges it needs to do its job — no more.

Vault uses policies to restrict the privileges for authenticated Vault sessions. Each set of valid credentials for Vault’s auth backend has its own set of policies describing its privileges. In this section, we will add some new policies and mix-and-match them to give our system actors credentials with the appropriate privileges on Vault for them to play their roles in the Vault workflow.

For our fake-production cluster scenario, suppose you use your DC/OS setup to launch a single Docker-container-based webapp on the cluster. Your webapp is pretty simple and just needs read/write access to a PostgreSQL schema to do its work. You’ve made a PostgreSQL schema for your webapp to enjoy, called public.

On this system, you have a minimum of 2 DC/OS roles that need to speak with Vault: your web application, and your policy maintainer. We will refer to these roles as DC/OS App and DC/OS Vault Maintainer, respectively. Note that role designations are not supported/enforced explicitly by Vault — it is a convenience construct used in this section of the tutorial.

Your webapp will assume the DC/OS App role to request PostgreSQL credentials from Vault. Your policy-maintaining module will assume the DC/OS Vault Maintainer role to maintain secret backends and their secret backend policies.

We will use these role descriptions as a basis to compile our new Vault policies.

Configuring and writing Vault policies is pretty straightforward. Vault policies are written in HashiCorp Configuration Language (HCL), a JSON-compatible configuration format. Make a directory to store your Vault ACLs:

$ sudo mkdir /etc/vault/policies

Vault has two policies ready-made and waiting for you: root and default. Here’s what the default policy looks like:

# Allow tokens to look up their own properties
path "auth/token/lookup-self" {
capabilities = ["read"]
}
# Allow tokens to renew themselves
path "auth/token/renew-self" {
capabilities = ["update"]
}
# Allow tokens to revoke themselves
path "auth/token/revoke-self" {
capabilities = ["update"]
}
# Allow a token to look up its own capabilities on a path
path "sys/capabilities-self" {
capabilities = ["update"]
}
# Allow a token to renew a lease via lease_id in the request body
path "sys/renew" {
capabilities = ["update"]
}
# Allow a token to manage its own cubbyhole
path "cubbyhole/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
# Allow a token to wrap arbitrary values in a response-wrapping token
path "sys/wrapping/wrap" {
capabilities = ["update"]
}
# Allow a token to look up the creation time and TTL of a given
# response-wrapping token
path "sys/wrapping/lookup" {
capabilities = ["update"]
}
# Allow a token to unwrap a response-wrapping token. This is a convenience to
# avoid client token swapping since this is also part of the response wrapping
# policy.
path "sys/wrapping/unwrap" {
capabilities = ["update"]
}

You’ll see that the default policy will allow a role with this policy enabled to perform some basic meta-operations on its own Vault state. It’ll also allow the role to renew leases. As mentioned earlier, Vault allows several policies to be applied to a given role, which allows you to prevent redundancy when writing policies. We will apply the default policy to both DC/OS App and DC/OS Vault Maintainer roles when we generate the tokens later. For each role, its effective privilege list is the union of its applicable policies. Any paths not specified in the policies associated with a role are denied to the role by default.
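The union semantics can be sketched as a merge over per-path capability sets, with anything unlisted denied by default. This is an illustration of the rule only, not Vault’s actual policy resolution (which also handles glob precedence):

```python
def effective_capabilities(policies):
    """Merge policies: a role's privileges are the union over all its policies."""
    merged = {}
    for policy in policies:
        for path, caps in policy.items():
            merged.setdefault(path, set()).update(caps)
    return merged

# Abbreviated versions of the two policies this role will carry.
default_policy = {
    "auth/token/renew-self": {"update"},
    "sys/renew": {"update"},
}
dcos_app_policy = {
    "database/creds/psql-readwrite-public": {"read", "list"},
}

caps = effective_capabilities([default_policy, dcos_app_policy])
print(sorted(caps["database/creds/psql-readwrite-public"]))  # ['list', 'read']
# Paths not named in any attached policy are denied by default:
print(caps.get("database/config/postgres1", set()))          # set()
```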

Next, we’ll specify the rest of the DC/OS App role’s policy. There isn’t much to it, really. Our app just needs to be able to read PostgreSQL credentials for the public schema, with the psql-readwrite-public policy applied:

#Enables requesting credentials with the psql-readwrite-public Postgres policy applied
path "database/creds/psql-readwrite-public" {
capabilities = ["read", "list"]
}

Save this as dcos-app.hcl in /etc/vault/policies.

Now, we’ll look at the rest of DC/OS Vault Maintainer’s policy specs. Because it needs to do maintenance on secret backends and secret backend policies, it will need a few more privileges:

#Allows updating (but not reading) the Postgres connection details
path "database/config/postgres1" {
capabilities = ["create", "update", "delete"]
}
#Allows CRUDL operations on all Postgres backend policies
path "database/roles/psql-*" {
capabilities = ["create", "read", "update", "delete", "list"]
}

Save this as dcos-vault-maintainer.hcl in /etc/vault/policies.

Now that you have some policy specs for your two Vault-on-DC/OS roles, load these custom policies into Vault:

$ vault policy-write dcos-app [path/to/dcos-app.hcl]
$ vault policy-write dcos-vault-maintainer [path/to/dcos-vault-maintainer.hcl]

You can now test creating tokens for these roles. Create a token for your DC/OS App role with the following command:

$ vault token-create -orphan -policy=default -policy=dcos-app

We use the -orphan flag in our token creation command to ensure that the token is the root of its own token tree. Without it, the new token would be a child of the Vault token used to generate it. The lifetime of a child token is tied to that of its parent, meaning that revoking a parent token also revokes all of its child tokens. That is undesirable here, as you may want to revoke your root token in the future for security purposes. Note that the -policy flag is specified multiple times in the above command, once for each policy we wish to apply to the role.
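The effect of -orphan can be pictured as a toy token tree: revoking a token cascades to its entire subtree, while an orphan token sits outside that tree and survives. The token names and tree structure below are illustrative only:

```python
def revoke(token, children, revoked=None):
    """Revoking a token also revokes its entire subtree of child tokens."""
    revoked = revoked if revoked is not None else set()
    revoked.add(token)
    for child in children.get(token, []):
        revoke(child, children, revoked)
    return revoked

# Suppose app-token was created WITHOUT -orphan (a child of root), while
# maintainer-token was created WITH -orphan and so has no parent at all.
children = {"root": ["app-token"]}

swept = revoke("root", children)
print(sorted(swept))                      # ['app-token', 'root']
print("maintainer-token" in swept)        # False: orphans survive the cascade
```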

Generate a token for the DC/OS Vault Maintainer role like so:

$ vault token-create -orphan -policy=default -policy=dcos-vault-maintainer

Congratulations! Your DC/OS webapp, and whichever system component you wish to handle the maintenance of your PostgreSQL backend, can now use the appropriate token to manage and request database credentials from Vault.

End of Part 2

By the end of this tutorial, you should have a Vault configured to generate dynamic PostgreSQL credentials for a schema of your choosing. You should also have some new Vault policies set up for your two DC/OS system roles: your webapp and your Vault maintenance module.

Stay tuned for Part 3: ‘Till Death Do Us Part, where I share some of my thoughts and observations from attempting to solve the Secure Introduction Problem for Docker-on-DC/OS infrastructure…without buying anything.


How much stack could a full-stack stack if a full-stack could stack full? Security Engineer at Praekelt.org.