Dynamic Secrets with Terraform and Vault

Patrick Schulz
HashiCorp Solutions Engineering Blog
6 min read · Nov 16, 2019

This blog post is about how you can avoid static secrets inside your infrastructure as code by using Terraform and Vault’s dynamic secrets.

Workflow of the dynamic secret generation

One challenge that comes up when provisioning resources into a cloud using Terraform (or any other tool, like Ansible) is that you need some form of credentials to log into your cloud account. You don’t want to store those credentials inside your code, especially if you check your code into a version control system (VCS), for obvious reasons.

Even if you provide Terraform with your credentials in a secure way, like the secure variable store in Terraform Enterprise (or Cloud), and use them down the line inside your Terraform configuration, they will most likely end up somewhere inside the state file. When using Terraform Enterprise (or Cloud), the state file is already protected through built-in encryption and RBAC, and is stored centrally rather than on an individual’s personal computer. But even then, some customers are looking for higher levels of security.

Ideally, the credentials used are short-lived, so that if someone somehow gets access to the state file, they will not be able to use those credentials to cause harm, as they will most likely have expired already.

To solve this, you can combine Terraform and Vault into a secure solution based on short-lived credentials, so-called “dynamic secrets.”

For this to work, Vault first needs to be configured to be able to create dynamic secrets on your behalf. In this example, I’m going to use Azure as the cloud, but you can achieve the same for other cloud providers and other types of systems (e.g. databases, Active Directory, etc.) as well. I will not go into all the details of configuring the Azure Secret Engine in Vault. The details on how to set this up can be found here.

The important part is to make sure that you are using a short TTL for your dynamic secrets. You will need to balance the TTL versus the time it takes to provision your infrastructure, to avoid an expired secret during the deployment phase.
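As a reference point, the setup boils down to a few Vault CLI commands. This is only a minimal sketch: the role name `azure-temp-creds` matches the path used later in this post, but the Azure role assignment, scope, and the environment variables holding your Azure credentials are placeholders you would replace with your own values.

```shell
# Enable the Azure secrets engine (mounted at the default path "azure/")
vault secrets enable azure

# Give Vault the credentials it needs to manage service principals
# (the environment variables below are placeholders)
vault write azure/config \
    subscription_id=$AZURE_SUBSCRIPTION_ID \
    tenant_id=$AZURE_TENANT_ID \
    client_id=$AZURE_CLIENT_ID \
    client_secret=$AZURE_CLIENT_SECRET

# Create a role with a short TTL; balance it against your provisioning time
vault write azure/roles/azure-temp-creds \
    ttl=20m \
    azure_roles=-<<EOF
[{ "role_name": "Contributor",
   "scope": "/subscriptions/$AZURE_SUBSCRIPTION_ID" }]
EOF
```

The `ttl=20m` here lines up with the 1200-second lease we will see in the state file later on.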

I’ll assume that you’ve completed the Azure Secret Engine configuration and tested it successfully (e.g. via the CLI) at this point, as it will be needed moving forward.

Next, you will need to configure the Terraform Vault provider to retrieve credentials (in the case of Azure, service principals) from Vault during every plan and apply phase, in order to deploy resources within the cloud. Starting with version 2.4.0 of the Terraform Vault provider, you can also use AppRoles instead of tokens to authenticate against Vault. How to configure AppRoles can be found here.

I’ve also created a very simple Vault policy, attached to the AppRole I’m using, that allows the role to issue a read against the Azure Secret Engine and to create new child tokens. Even though the auth method is AppRole, Terraform still uses tokens: it issues itself a new token that is a child of the one given, with a short TTL, to limit the exposure of any requested secrets.

path "azure/creds/azure-temp-creds" {
  capabilities = ["read"]
}

path "auth/token/create" {
  capabilities = ["update"]
}
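Wiring the policy to an AppRole can be sketched with the CLI as follows. The policy and role names (`terraform-azure`, `terraform`) and the policy filename are my own choices, not something prescribed by Vault:

```shell
# Enable AppRole auth and load the policy shown above
vault auth enable approle
vault policy write terraform-azure terraform-azure.hcl

# Create an AppRole bound to that policy, with short-lived tokens
vault write auth/approle/role/terraform \
    token_policies=terraform-azure \
    token_ttl=20m

# Fetch the role_id and generate a secret_id to store as Terraform variables
vault read auth/approle/role/terraform/role-id
vault write -f auth/approle/role/terraform/secret-id
```

The `role_id` and `secret_id` from the last two commands are what you feed into the provider configuration below.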

Now let’s have a quick look at the Vault provider configuration, which is pretty straightforward, as it only describes the path and credentials used to log in.

provider "vault" {
  address = var.vault_addr

  auth_login {
    path = "auth/approle/login"

    parameters = {
      role_id   = var.login_approle_role_id
      secret_id = var.login_approle_secret_id
    }
  }
}

Terraform will not output the secrets used for the Vault authentication into your state file. In Terraform Enterprise (or Cloud), you can easily provide your AppRole role_id and secret_id via variables using the Terraform secure variable store. This essentially means the AppRole secrets are stored securely and will not be exposed during runtime.

Terraform Enterprise Variable Store

Such a configuration allows you to provide every workspace with its own AppRole and dynamic secrets, using a dedicated account if needed, or at least a custom role that fits the needs of the infrastructure provisioned inside the workspace.

With every plan and apply, Terraform logs in to Vault using the given AppRole and uses the “vault_generic_secret” data source to generate a fresh set of dynamic secrets on the fly.

data "vault_generic_secret" "azure" {
  path = "azure/creds/azure-temp-creds"
}

The last step is the actual cloud provider definition. Let’s now look at the Terraform provider definition for Azure and how the provider gets its secrets from the Vault provider.

One important aspect is that you may need to implement some form of dependency to work around the eventual consistency of the cloud’s IAM system. It usually takes a while for the secrets to become active across all cloud endpoints. If you do not implement the dependency, chances are high that the plan and apply will fail because the cloud credentials are not yet available.

The script below acts as the actual timer. It gets called from an external data source, which passes along the subscription_id argument and the desired wait period.

delay-vault-azure.sh (store it alongside the tf config files and make it executable via chmod +x):

#!/usr/bin/env bash
subscription_id=$1
sleep $2
echo "{ \"subscription_id\": \"$subscription_id\" }"

The corresponding external data source in the Terraform configuration:

data "external" "subscription_id" {
  program = ["./delay-vault-azure.sh", var.subscription_id, "120"]
}

Inside the actual provider definition, we retrieve the Azure tenant_id from regular variables and the subscription_id from the external data source, creating the dependency that makes the provider wait for the secrets to become available. The dynamically generated client_id and client_secret are sourced from the vault_generic_secret data source.

provider "azurerm" {
  tenant_id       = var.tenant_id
  subscription_id = data.external.subscription_id.result["subscription_id"]
  client_id       = data.vault_generic_secret.azure.data["client_id"]
  client_secret   = data.vault_generic_secret.azure.data["client_secret"]
}

Now let’s have a quick look into the state file after the Terraform apply has completed.

"resources": [{
  "mode": "data",
  "type": "vault_generic_secret",
  "name": "azure",
  "provider": "provider.vault",
  "instances": [
    {
      "schema_version": 0,
      "attributes": {
        "data": {
          "client_id": "a61699ab-6ecd-405c-b815-e38ba34fae0b",
          "client_secret": "b54648a5-77c4-a5ef-d48d-6f3aa6a2b38f"
        },
        "data_json": "{\"client_id\":\"a61699ab-6ecd-405c-b832-e38ba34fae0b\",\"client_secret\":\"b54648a5-77c4-a5ef-d57d-6f3aa6a2b38f\"}",
        "id": "3c4c3f23-34df-79c2-2653-9d5f3af2f0f4",
        "lease_duration": 1200,
        "lease_id": "azure/creds/azure-temp-creds/UvhySVzPDxtTFpeSgbTR6IxO",

You can see that the dynamically generated secrets are only good for 1200 seconds and that Vault maintains a lease with the ID “UvhySVzPDxtTFpeSgbTR6IxO”. There are no traces of the AppRole’s role_id or secret_id.

If we check the leases inside Vault, we will find it there. In case of a breach, you can easily revoke the active lease, and Vault will delete the secrets inside your Azure subscription even before the TTL has expired.
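Listing and revoking the lease can be done via the CLI; the lease ID below matches the one from the state file excerpt above:

```shell
# List active leases under the role's creds path
vault list sys/leases/lookup/azure/creds/azure-temp-creds

# Revoke the lease; Vault deletes the service principal in Azure right away
vault lease revoke azure/creds/azure-temp-creds/UvhySVzPDxtTFpeSgbTR6IxO
```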

Before wrapping up this post, I would like to mention that the same thing can be achieved for other cloud platforms like GCP or AWS as well. The process of setting up the GCP and AWS secret engines in Vault is fairly similar to what I’ve done with Azure, while the usage may differ slightly from platform to platform. With GCP, for example, the roleset configuration creates a new service account, and the usage (vault read) only issues new access tokens or service account keys (depending on the type used) for that account; it does not create a new account/service principal as in the Azure example. On the Terraform side, AWS even has a native data source for this use case, “vault_aws_access_credentials”, so you don’t need to rely on the “vault_generic_secret” data source; it also addresses the timer issue mentioned above.
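For AWS, the equivalent wiring could look like the sketch below using the “vault_aws_access_credentials” data source. The mount path `aws` and the role name `terraform-role` are assumptions for illustration; the data source waits until the issued credentials are actually usable, which is what replaces the timer script from the Azure example:

```
data "vault_aws_access_credentials" "creds" {
  backend = "aws"            # mount path of the AWS secrets engine in Vault
  role    = "terraform-role" # assumed role name configured in the engine
}

provider "aws" {
  region     = var.region
  access_key = data.vault_aws_access_credentials.creds.access_key
  secret_key = data.vault_aws_access_credentials.creds.secret_key
}
```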

Summary

The takeaway: there is no longer a need for any static secrets inside your code. The secrets are generated on demand, are short-lived in nature, and their potential risk is eliminated automatically once the TTL has expired. Also, the secrets used to authenticate against Vault are not exposed throughout the process. Overall, this improves security, as it eliminates the accidental exposure of secrets through a VCS, for example, and at the same time you no longer need to worry about password rotation for the cloud accounts used for provisioning.
