HashiCorp Vault and Terraform on Google Cloud — Security Best Practices

TL;DR

Use this guide when deploying Vault with Terraform on Google Cloud to build a production-hardened architecture that follows security best practices while still enabling DevOps teams and the business to succeed!

Overview

HashiCorp’s Terraform is a tool for provisioning and managing resources through structured configuration files, an approach commonly called infrastructure as code (IaC). Security always matters, and one of the most common security exposures is storing credentials or other secrets in configuration files. HashiCorp’s Vault addresses this with secrets management, which removes the need to store secrets such as credentials in configuration files.

In this post, I’ll describe a reference architecture for deploying and configuring Vault in GCP with Terraform. The architecture follows cloud security best practices and adheres to the Principle of Least Privilege. If you stick around long enough, I’ll also list some security best practices for each component of this system.

Reference Architecture

Reference Architecture for Vault and Terraform on GCP

Project boundaries

Within GCP, you can isolate groups of resources into projects with a hard boundary, which helps you adhere to the Principle of Least Privilege. Resources in each project have their own set of permissions and can only talk to one another if explicitly allowed. In this architecture, a Shared VPC spans multiple projects, which allows the application projects to communicate over the network with the Secrets project.

Project functionality

This architecture lays out 4 major components which I’ll describe and then provide some best practices.

IaC Project

This is the GCP project where the CI/CD pipeline for Terraform should be deployed. The pipeline’s service account in this project is granted a large number of Cloud IAM privileges, since it is responsible for creating and maintaining the rest of your infrastructure. If someone creates a trigger with undesirable behavior, the impact can be huge. As a result, monitoring this service account and reviewing code both become crucial.

As a caveat, this project is only really necessary if you are using the open source version of Terraform, since Terraform Enterprise can handle automated deployments for you. In any case, this project should contain the build system configuration (e.g. Jenkins, Spinnaker) needed to run terraform plan and terraform apply in an automated fashion. It should also send the right logs to the right people when a run fails. You should also store your Terraform state file in a GCS bucket within this project, protected by VPC Service Controls.
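Monitoring the IaC project’s service account can start with something as simple as filtering exported audit logs. The following is a minimal, hypothetical sketch. It assumes Cloud Audit Log entries exported as JSON in the LogEntry format. The service-account email and the list of watched method names are placeholders chosen for illustration.

```python
"""Flag sensitive Cloud Audit Log entries made by the Terraform service account.

Sketch only: entries are assumed to be LogEntry dicts exported as JSON
(e.g. via a log sink). The email and watched methods are hypothetical.
"""
from typing import Iterable, List

TERRAFORM_SA = "terraform-ci@iac-project.iam.gserviceaccount.com"  # hypothetical

# Substrings of audit-log methodName values worth a human look (illustrative).
WATCHED_METHODS = ("SetIamPolicy", "CreateServiceAccountKey", "DeleteProject")


def suspicious_entries(entries: Iterable[dict]) -> List[dict]:
    """Return entries where the Terraform SA called one of the watched methods."""
    flagged = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        principal = payload.get("authenticationInfo", {}).get("principalEmail")
        method = payload.get("methodName", "")
        if principal == TERRAFORM_SA and any(m in method for m in WATCHED_METHODS):
            flagged.append(entry)
    return flagged
```

Matching on substrings keeps the sketch tolerant of fully qualified method names. Routing the flagged entries to a channel that a human actually reads is the part that matters.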

Secrets Project

This GCP project includes the infrastructure a Vault cluster needs. That covers the cluster itself (which could run on GCE or GKE), the storage backend, an internal load balancer, and a bastion host (running on a Compute Engine VM) used to administer Vault through its API.

Typically, a bastion is placed on the public internet as a hardened VM whose only responsibility is to accept SSH connections. With Cloud IAP TCP forwarding (SSH tunnelling), you get the same functionality without giving the VM a public IP, which also shields it from internet-facing attacks such as DDoS. You may then ask why you need a bastion host at all. In many cases you don’t. A bastion host becomes useful when you maintain internal services over HTTP(S) rather than SSH: you connect to the bastion and reach the service from there. This means the Vault server itself doesn’t have to listen on port 22, only on 443, as it should. The same approach works for maintaining private GKE clusters. You can also shut down the bastion when you aren’t using it to save money.
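To make the bastion pattern concrete, here is a minimal sketch that builds the gcloud command an operator might run. The instance name, zone, and Vault load balancer IP are hypothetical. `--tunnel-through-iap` routes the SSH connection through Cloud IAP, so the bastion needs no public IP. Arguments after `--` are passed to ssh, where `-L` forwards a local port to Vault on 443.

```python
"""Build a gcloud command that reaches Vault through the bastion via Cloud IAP.

Sketch only: bastion name, zone, and the Vault ILB address are hypothetical.
"""
import shlex
from typing import List


def iap_vault_forward_cmd(bastion: str, zone: str, vault_ilb_ip: str,
                          local_port: int = 8200) -> List[str]:
    return [
        "gcloud", "compute", "ssh", bastion,
        "--tunnel-through-iap",   # connect via IAP; no public IP on the bastion
        f"--zone={zone}",
        "--",                     # everything after this is passed to ssh
        "-N",                     # run no remote command, just hold the tunnel
        "-L", f"{local_port}:{vault_ilb_ip}:443",  # local port -> Vault's ILB
    ]


cmd = iap_vault_forward_cmd("vault-bastion", "us-central1-a", "10.10.0.5")
print(shlex.join(cmd))
# gcloud compute ssh vault-bastion --tunnel-through-iap --zone=us-central1-a -- -N -L 8200:10.10.0.5:443
```

Running the command, for example with `subprocess.run(cmd, check=True)`, keeps the tunnel open. Vault traffic then goes to the forwarded local port, while the Vault servers themselves never expose SSH.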

You’ll notice Vault also sits behind an internal load balancer. Although the diagram doesn’t say so explicitly, this should be an internal TCP/UDP load balancer. The reason is that Vault can terminate TLS within the process itself, which gives you end-to-end encryption. An HTTPS load balancer would terminate TLS itself, so you would have to re-encrypt traffic to the backends to get the same effect. You might as well use the TLS-enabled TCP listener that Vault already provides.
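With a passthrough TCP load balancer, clients validate Vault’s own certificate rather than the load balancer’s. Here is a minimal client-side sketch using Python’s standard `ssl` module. The CA bundle path and hostname in the usage note are hypothetical.

```python
"""Client TLS settings for reaching Vault through a TCP (passthrough) ILB.

Sketch only: the load balancer does not terminate TLS, so the client
verifies the certificate presented by Vault itself.
"""
import ssl
from typing import Optional


def vault_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    # create_default_context() turns on certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    return ctx
```

A health check through the load balancer could then look like this: `urllib.request.urlopen("https://vault.internal.example/v1/sys/health", context=vault_tls_context("/etc/vault/ca.pem"))`.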

Version Control System

This is not a GCP project, but the system that stores your Terraform code. I’ll talk about some best practices for securely configuring Terraform in a production environment a bit later. In general, you should pick a version control system (VCS) that gives you fine-grained control over access to the master branch. For example, many VCSs let you require approval from multiple reviewers before a merge into master. They can also prevent history on a particular branch from being rewritten, such as by force-pushing a rebased branch. This level of control is important, especially when moving toward an automated system where a merge to a branch triggers another automated job.
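Using GitHub as one example VCS, the controls above map onto its branch-protection REST endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). The following is a minimal sketch that only builds the request. The owner, repository, and token are hypothetical, and other VCSs expose similar settings through their own APIs.

```python
"""Branch-protection request for the Terraform repo, using GitHub as an example.

Sketch only: owner, repo, and token are hypothetical placeholders.
"""
import json
import urllib.request


def protection_payload(approvals: int = 2) -> dict:
    return {
        "required_status_checks": None,   # no CI checks required in this sketch
        "enforce_admins": True,           # admins follow the same rules
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
            "dismiss_stale_reviews": True,  # new commits need fresh approvals
        },
        "restrictions": None,
        "allow_force_pushes": False,      # branch history cannot be rewritten
        "allow_deletions": False,
    }


def protection_request(owner: str, repo: str, branch: str,
                       token: str) -> urllib.request.Request:
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    return urllib.request.Request(
        url,
        data=json.dumps(protection_payload()).encode(),
        method="PUT",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
```

Sending the request with `urllib.request.urlopen` applies the rules. Keeping this configuration in code makes the protection itself reviewable.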

Application Project(s)

These GCP projects contain the resources that consume this shared infrastructure. They are maintained by the Terraform configuration files and need to access secrets from Vault to function. For example, say we have a Java app running in a container on GKE that needs to talk to a MySQL database. You might keep non-sensitive environment config, such as the MySQL host, in source code, but you would not want the credentials there. Instead of baking these values into the container image, you can use Vault to pull them into the container at run time. The same process works for GCE images as well.
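For the MySQL example, the container could fetch credentials when it starts. The following is a minimal sketch against Vault’s HTTP API. It assumes the database secrets engine is mounted at `database/` with a hypothetical role `app-mysql`, and that `VAULT_ADDR` and a Vault token are supplied at run time, for example by Vault Agent.

```python
"""Fetch MySQL credentials from Vault at container start instead of baking them in.

Sketch only: assumes the database secrets engine at `database/` and a
hypothetical role `app-mysql`. VAULT_ADDR/VAULT_TOKEN come from the runtime.
"""
import json
import os
import urllib.request
from dataclasses import dataclass


@dataclass
class DbCreds:
    username: str
    password: str
    lease_id: str
    lease_duration: int  # seconds until Vault revokes these credentials


def creds_request(vault_addr: str, token: str,
                  role: str = "app-mysql") -> urllib.request.Request:
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/database/creds/{role}",
        headers={"X-Vault-Token": token},
    )


def parse_creds(body: dict) -> DbCreds:
    return DbCreds(
        username=body["data"]["username"],
        password=body["data"]["password"],
        lease_id=body["lease_id"],
        lease_duration=body["lease_duration"],
    )


def fetch_creds() -> DbCreds:
    req = creds_request(os.environ["VAULT_ADDR"], os.environ["VAULT_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        return parse_creds(json.load(resp))
```

The MySQL host stays in ordinary config, and only the short-lived username and password come from Vault.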

Flow of the Architecture

The diagram above shows the primary flow of data and interactions from one entity to another. Starting from the top-left:

Finally, the application projects, depicted here as GKE clusters, pull secrets from Vault at startup as well as periodically using the Vault Agent.
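Periodic pulling amounts to refreshing a leased secret before its lease runs out. The following sketch illustrates that idea only; it is not Vault Agent’s actual algorithm. `fetch` is any callable returning `(secret, lease_duration_seconds)`, and the two-thirds refresh point is an assumption chosen for illustration.

```python
"""Re-fetch a leased secret before it expires, in the spirit of Vault Agent.

Sketch only: the 2/3-of-lease refresh point is an illustrative assumption.
"""
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def refresh_delay(lease_duration: int, fraction: float = 2 / 3) -> float:
    """Seconds to wait before refreshing; never less than one second."""
    return max(1.0, lease_duration * fraction)


def run_refresher(fetch: Callable[[], Tuple[T, int]],
                  on_update: Callable[[T], None],
                  iterations: int,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    for _ in range(iterations):
        secret, lease = fetch()
        on_update(secret)            # e.g. rewrite the file the app reads
        sleep(refresh_delay(lease))  # wake up well before the lease expires
```

Injecting `sleep` keeps the loop testable. In production, Vault Agent handles this renewal for you.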

So when should I use this?

This architecture should be applied when Terraform is the primary means of deploying Google Cloud infrastructure, with Vault deployed as part of that infrastructure for secrets management. Vault is not always the ideal solution for secrets management. If you only need static secrets in certain contexts, consider using Cloud KMS to encrypt them and storing the ciphertext in source code or GCS buckets. It’s perfectly fine to store secrets in source code if they are encrypted. Vault is ideal when disparate teams store secrets at a large scale, or when you need Vault’s dynamic secret generation.
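For the static-secret case, the Cloud KMS REST `encrypt` method takes base64-encoded plaintext and returns base64 ciphertext that can be committed safely. The following is a minimal sketch. The key path is hypothetical, and the access token is assumed to come from something like `gcloud auth print-access-token`.

```python
"""Encrypt a static secret with Cloud KMS before committing it to source or GCS.

Sketch only: the key resource name and access token are hypothetical.
"""
import base64
import json
import urllib.request

KEY = ("projects/secrets-project/locations/global/"
       "keyRings/static-secrets/cryptoKeys/app-config")  # hypothetical key


def encrypt_request(plaintext: bytes, token: str,
                    key: str = KEY) -> urllib.request.Request:
    body = {"plaintext": base64.b64encode(plaintext).decode("ascii")}
    return urllib.request.Request(
        f"https://cloudkms.googleapis.com/v1/{key}:encrypt",
        data=json.dumps(body).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def encrypt(plaintext: bytes, token: str) -> str:
    """Return base64 ciphertext, safe to store alongside the code."""
    with urllib.request.urlopen(encrypt_request(plaintext, token)) as resp:
        return json.load(resp)["ciphertext"]
```

Decryption is the mirror image through the `:decrypt` method. Access is then governed by who holds IAM permissions on the key, not by who can read the repository.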

Terraform Security Best Practices

The IaC project should be the single place from which Terraform deploys the Vault cluster and its environment. Terraform should run just like any other step in the build system: once a merge happens in the relevant repositories, a build job should run terraform apply. We won’t go into the details of a Terraform CI/CD pipeline here, but suffice it to say that the IaC project is where that pipeline should live. (This all assumes you are using the open source tool, since Terraform Enterprise handles automated deployment for you.) Some key points to make here are:
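For instance, the Terraform step in such a pipeline might look like the following minimal sketch. It assumes the build system provides credentials for the IaC project’s service account, and the plan file name is hypothetical. `-input=false` keeps Terraform from prompting. Applying a saved plan file applies exactly the plan produced in the previous step, without an interactive approval.

```python
"""The Terraform step a build job might run after a merge.

Sketch only: assumes the build system supplies the IaC service account's
credentials. The plan file name is a hypothetical placeholder.
"""
import subprocess
from typing import List

PLAN_FILE = "tfplan"


def terraform_steps(plan_file: str = PLAN_FILE) -> List[List[str]]:
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],  # saved plan: no prompt
    ]


def run(workdir: str, dry_run: bool = False) -> List[List[str]]:
    steps = terraform_steps()
    for cmd in steps:
        if not dry_run:
            # check=True raises on the first non-zero exit, failing the build job.
            subprocess.run(cmd, cwd=workdir, check=True)
    return steps
```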

In general, if a Terraform resource creates or manages sensitive values, or a data source reads one in, those secrets will be stored in plaintext in the state file. To get around this, you may need to hold secrets only temporarily, in memory, during a Terraform run. In that case, consider a null_resource with a local-exec provisioner. The provisioner’s command output is not written to state. The resource’s triggers are written to state, however, so keep secrets out of them.
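A quick way to see why the state bucket needs protecting is to scan a state file for secret-looking attributes. The following heuristic sketch assumes the JSON layout of state format version 4 (resources, then instances, then attributes). Its name patterns are illustrative and will miss secrets stored under unremarkable keys.

```python
"""Scan a Terraform state file for attributes that look like secrets.

Heuristic sketch: assumes state format v4 (resources -> instances ->
attributes). The key patterns are illustrative, not exhaustive.
"""
import json
import re
from typing import Iterator, List, Tuple

SECRET_KEY = re.compile(r"password|secret|token|private_key", re.IGNORECASE)


def find_secretish(state: dict) -> Iterator[Tuple[str, str]]:
    """Yield (resource address, attribute name) for non-empty secret-like values."""
    for res in state.get("resources", []):
        addr = f'{res.get("type")}.{res.get("name")}'
        for inst in res.get("instances", []):
            for key, value in (inst.get("attributes") or {}).items():
                if SECRET_KEY.search(key) and value:
                    yield addr, key


def scan_file(path: str) -> List[Tuple[str, str]]:
    with open(path) as f:
        return list(find_secretish(json.load(f)))
```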

Vault Security Best Practices

The Secrets project needs to be locked down further than most other projects, given the information it contains. This project holds secrets for every environment (Prod, Staging, Dev, etc.), so we use several key security controls here, including Cloud IAP, a bastion host, and VPC Service Controls. In addition to HashiCorp’s Vault Hardening Guide, here are some security best practices to keep in mind when using Vault with Terraform in Google Cloud environments.
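Least privilege applies inside Vault as well: each application’s policy should grant only the paths it needs. The following minimal sketch writes such a policy through Vault’s HTTP API (`PUT /v1/sys/policies/acl/:name`). The policy name, role path, and admin token are hypothetical.

```python
"""Write a least-privilege Vault ACL policy for one application.

Sketch only: policy name, role path, and admin token are hypothetical.
The application may only read its own dynamic database credentials.
"""
import json
import urllib.request

APP_POLICY = '''
path "database/creds/app-mysql" {
  capabilities = ["read"]
}
'''


def policy_request(vault_addr: str, token: str, name: str = "app-mysql-read",
                   policy: str = APP_POLICY) -> urllib.request.Request:
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/sys/policies/acl/{name}",
        data=json.dumps({"policy": policy}).encode(),
        method="PUT",
        headers={"X-Vault-Token": token},
    )
```

Since Terraform already drives this architecture, you would more likely manage the same policy with the Terraform Vault provider’s `vault_policy` resource. Calling the API directly just makes the shape of the policy explicit.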

Application Projects Best Practices

In this architecture, I used GKE clusters as an example, but this could represent any type of compute product with an application running on it, such as Compute Engine or Cloud Run. These applications should be attached to the Shared VPC with Vault and be able to pull dynamic secrets from Vault. These projects do not necessarily need to talk to one another. Where they shouldn’t, VPC Service Controls can further isolate project resources.
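Before a GKE workload can pull secrets, it has to authenticate to Vault. The following minimal sketch uses Vault’s Kubernetes auth method. It assumes the method is enabled at `auth/kubernetes` and that a hypothetical Vault role `java-app` is bound to the pod’s Kubernetes service account. For Compute Engine or Cloud Run, Vault’s GCP auth method would play the same role.

```python
"""Authenticate a GKE workload to Vault with the Kubernetes auth method.

Sketch only: assumes auth/kubernetes is enabled and a hypothetical Vault
role `java-app` is bound to the pod's Kubernetes service account.
"""
import json
import urllib.request

SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def login_request(vault_addr: str, jwt: str,
                  role: str = "java-app") -> urllib.request.Request:
    return urllib.request.Request(
        f"{vault_addr.rstrip('/')}/v1/auth/kubernetes/login",
        data=json.dumps({"role": role, "jwt": jwt}).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def client_token(login_response: dict) -> str:
    return login_response["auth"]["client_token"]


def login(vault_addr: str) -> str:
    # The pod's service-account JWT proves its identity to Vault.
    with open(SA_TOKEN_PATH) as f:
        jwt = f.read().strip()
    with urllib.request.urlopen(login_request(vault_addr, jwt)) as resp:
        return client_token(json.load(resp))
```

The returned client token is what the application (or Vault Agent on its behalf) then uses to read its dynamic secrets.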

Written by

Cloud Security Engineer at Google Cloud http://github.com/onetwopunch
