Set up a production-ready Vault cluster with Kubernetes and Azure

Abdessamad Bayzi
OCP digital factory
8 min read · Sep 23, 2020

Secret management is not optional; it must be done, and done securely, because:

  • Data breaches are a routine occurrence these days
  • They can result in lawsuits and fines that may ruin businesses
  • Securing your secrets prevents, or limits the scope of, a breach
  • Business secrets must also be protected from competitors

One of the best secret management tools on the market is Vault. Unlike other tools, it is well designed for DevSecOps: security is managed as code, while other tools are driven by manual configuration, and in most cases a vendor-specific consultant must be hired to maintain that snowflake server.

This blog post is not an introduction to Vault; it's a practical guide to deploying a production-ready, highly available Vault with ease using Kubernetes and a few scripted operations.

In this walkthrough, we follow the production hardening recommendations provided by HashiCorp, such as: end-to-end TLS, single tenancy, disabling the root token, not running as root, restricting storage access, enabling auditing…

At the end of this post, we’ll have the following architecture:

Requirements:

1 — Setup certificates:

As indicated in the production hardening recommendations, communication between Vault and Consul must be encrypted, therefore we will set up certificates to be used by:

  • Consul service
  • Vault service
  • 127.0.0.1
  • Consul cluster

For this purpose, we will use the Cloudflare SSL toolkit (cfssl and cfssljson), so make sure to install it before proceeding to the next step.

First, edit the files ca-conf/consul-csr.json and ca-conf/ca-csr.json to change the certificate information in the “names” section.
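As a sketch, a cfssl CSR file such as ca-conf/consul-csr.json is along these lines; every value below (CN, hosts, and the “names” entries) is a placeholder to replace with your own details:

```json
{
  "CN": "server.dc1.cluster.local",
  "hosts": [
    "server.dc1.cluster.local",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "MA",
      "ST": "Casablanca",
      "L": "Casablanca",
      "O": "example-org",
      "OU": "devops"
    }
  ]
}
```

The “hosts” list is what ends up in the certificate's Subject Alternative Names, which is why the Consul and Vault service names and 127.0.0.1 must appear there.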

Then make all scripts executable:

$ chmod +x scripts/*.sh

and run the create-certs script:

$ ./01-create-certs.sh

This will:

  • Initialize a Certificate Authority (CA)
  • Create a private key and sign the TLS certificate

At this point, you should have the following files generated in the ca directory:

ca-key.pem
ca.pem
consul-key.pem
consul.pem

2 — Setup HA storage backend — Consul cluster

Vault relies on an external storage backend for persistence, and this decoupling allows Vault to be managed immutably.

Currently, there are several storage backends that support high availability mode, including Consul, ZooKeeper and etcd. The Consul backend is the recommended HA backend, as it is used in production by HashiCorp and its customers with commercial support.

Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. To secure gossip communication with encryption, we'll generate an encryption key using the Consul CLI command consul keygen, which produces a cryptographically suitable key. So make sure to install the Consul CLI:

$ brew install consul

and run the setup-consul script

$ ./02-setup-consul.sh

The script will:

  • Create a consul service account
  • Create a secret with the previously created certificates
  • Create a secret with the gossip encryption key
  • Create Consul StatefulSet and Consul service

Note that we’re using node anti-affinity for more robustness. So make sure to have enough Kubernetes nodes, at least 3: the minimum number required to reach quorum in the consensus algorithm Consul uses for leader election.
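The anti-affinity rule in the Consul StatefulSet looks roughly like this (a sketch; the label key and value are assumptions and may differ in your manifests):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: consul          # hypothetical pod label
        topologyKey: kubernetes.io/hostname
```

With requiredDuringSchedulingIgnoredDuringExecution, the scheduler refuses to place two Consul pods on the same node, which is why fewer than 3 nodes leaves pods Pending.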

Once all Consul pods are in Running state:

$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
consul-0   1/1     Running   0          40s
consul-1   1/1     Running   0          23s
consul-2   1/1     Running   0          11s

make a port-forward and check all members are alive:

$ kubectl port-forward svc/consul 8500:8500
$ consul members
Node      Address          Status  Type    Build  Prot  DC   Segment
consul-0  172.17.0.2:8301  alive   server  1.8.0  2     dc1  <all>
consul-1  172.17.0.4:8301  alive   server  1.8.0  2     dc1  <all>
consul-2  172.17.0.5:8301  alive   server  1.8.0  2     dc1  <all>

Visit http://127.0.0.1:8500 in your web browser to access the Consul Web UI.

3 — Setup Vault cluster

As mentioned above, end-to-end encryption via TLS between the client, Vault, and Consul is recommended. Therefore, the first thing to do is to create a secret containing the certificates created earlier, then create a config map containing the Vault and storage configurations (vault/vault-cm.yaml).
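For reference, the Vault server configuration held in that config map is along these lines. This is a sketch only: the mount paths, service address, and port are assumptions, and your vault/vault-cm.yaml is authoritative:

```hcl
# TLS listener for clients, using the certificates generated earlier
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/tls/consul.pem"      # assumed mount path
  tls_key_file  = "/etc/tls/consul-key.pem"
}

# Consul as the HA storage backend, over TLS
storage "consul" {
  address       = "consul:8500"              # hypothetical service address
  scheme        = "https"
  path          = "vault/"
  tls_ca_file   = "/etc/tls/ca.pem"
  tls_cert_file = "/etc/tls/consul.pem"
  tls_key_file  = "/etc/tls/consul-key.pem"
}
```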

Node anti-affinity is also used with Vault, so make sure to have enough nodes in your cluster.

The last thing to mention is the auto-unseal feature. When a Vault server is started, it starts in a sealed state and it does not know how to decrypt data. Before any operation can be performed on the Vault, it must be unsealed. Unsealing is the process of constructing the master key necessary to decrypt the data encryption key.

The data stored by Vault is encrypted, and Vault needs the encryption key in order to decrypt it. The encryption key is also stored with the data (in the keyring), but encrypted with another key known as the master key.

Before Vault 1.0, the unseal operation used to be manual; fortunately, auto-unseal is now a built-in feature of the open-source version of Vault. It may be implemented using a cloud provider's key management service or Vault's transit secrets engine (for on-premise unsealing). For this guide, I'm using Azure Key Vault for auto-unsealing.
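Auto-unseal is wired into the Vault configuration through a seal stanza. A sketch of the azurekeyvault stanza is below; every value is a placeholder, and the credentials can alternatively be supplied via environment variables instead of being written into the config:

```hcl
seal "azurekeyvault" {
  tenant_id     = "00000000-0000-0000-0000-000000000000"
  client_id     = "00000000-0000-0000-0000-000000000000"
  client_secret = "<service-principal-secret>"
  vault_name    = "my-key-vault"       # the Azure Key Vault resource
  key_name      = "vault-unseal-key"   # the key created for auto-unseal
}
```

On startup, Vault asks Azure Key Vault to decrypt the master key with this key, so no operator has to type unseal key shares.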

As a prerequisite, I assume:

  • You have a Microsoft Azure account, with at least the Contributor role in a resource group
  • You have already created a resource group
  • You have jq installed, a lightweight and flexible command-line JSON processor

The setup-vault script will do the following:

  • Create a service principal in your Microsoft Azure account
  • Create a Key Vault resource in the specified resource group
  • Set a policy for the service principal, allowing it to manage Key Vault keys in your resource group
  • Create the master key in Azure Key Vault that will be used for the auto-unseal operation
  • Set up the Vault configuration in a config map, with the auto-unseal information
  • Create a Vault service account in your cluster
  • Create a secret with the certificates required to ensure end-to-end encryption
  • Deploy the Vault StatefulSet and its services

Now that the script's purpose is clear, edit 03-setup-vault.sh to set the resource group name and the subscription ID, then run it:

$ ./03-setup-vault.sh

Once all pods are in Running state, you can do the same as before: make a port-forward and check that all members are alive:

$ kubectl port-forward svc/consul 8500:8500
$ consul members
Node      Address          Status  Type    Build  Prot  DC   Segment
consul-0  172.17.0.2:8301  alive   server  1.8.0  2     dc1  <all>
consul-1  172.17.0.4:8301  alive   server  1.8.0  2     dc1  <all>
consul-2  172.17.0.5:8301  alive   server  1.8.0  2     dc1  <all>
vault-0   172.17.0.7:8301  alive   client  1.7.0  2     dc1  <default>
vault-1   172.17.0.8:8301  alive   client  1.7.0  2     dc1  <default>
vault-2   172.17.0.9:8301  alive   client  1.7.0  2     dc1  <default>

If you visit http://127.0.0.1:8500, you'll see that you have the Vault and Consul services. You may notice that the Vault checks are red; to be exact, the Vault Sealed Status check is red.

As explained before, Vault starts in a sealed state. Since this is the first time, we need to initialize and unseal it manually; in the future, Vault will do it automatically thanks to the auto-unseal feature.

$ kubectl port-forward svc/vault 8200:8200

Visit https://127.0.0.1:8200 in your web browser to access the Vault Web UI.

Fill in the form and download the keys. It's your responsibility to keep those keys secret. One recommended strategy is to give each key share to a different Vault admin.

Go back to the Consul Web UI at http://localhost:8500, and you'll see that all checks are OK.

From now on, the unseal operation will be done automatically. To be sure, delete all Vault pods:

$ kubectl delete po -l app=vault

Wait for Vault pods to be in Running state, and go to Consul Web UI: http://127.0.0.1:8500, you should see all checks are green.

4 — Setup Ingress with Traefik

Since the Vault service will be heavily used for secret management by other applications and users, we need to make it reachable. The simplest way is to use a NodePort service and point a DNS alias at your cluster; the cleanest way is to use Ingress resources.

Traefik is a modern HTTP reverse proxy and load balancer made to deploy microservices with ease. I highly recommend reading up on it.

For the deployment, we’ll be using a helm chart, with a custom value file:

# values.yaml
deployment:
  enabled: true
  replicas: 1
service:
  enabled: true
  type: NodePort

This sets the number of replicas and creates a NodePort service instead of a LoadBalancer.

$ cd ..
$ helm repo add traefik https://containous.github.io/traefik-helm-chart
$ helm repo update
$ helm install traefik traefik/traefik -f traefik/values.yaml

This Helm chart does not expose the Traefik dashboard by default, for security reasons. There are multiple ways to expose it; for instance, the dashboard can be reached through a port-forward:

$ kubectl port-forward $(kubectl get pods --selector "app.kubernetes.io/name=traefik" --output=name) 9000:9000

Visit http://127.0.0.1:9000/dashboard/#/ to access the Traefik dashboard.

You can then expose Vault through Traefik by creating an IngressRouteTCP resource:

$ kubectl apply -f traefik/ingress.yaml
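A sketch of what traefik/ingress.yaml contains is shown below; the hostname, service name, and entry point name are assumptions, and TLS passthrough is used so Vault keeps terminating TLS itself, preserving end-to-end encryption:

```yaml
apiVersion: traefik.containous.io/v1alpha1
kind: IngressRouteTCP
metadata:
  name: vault
spec:
  entryPoints:
    - websecure                              # assumed entry point name
  routes:
    - match: HostSNI(`vault.example.com`)    # placeholder hostname
      services:
        - name: vault                        # the Vault Kubernetes service
          port: 8200
  tls:
    passthrough: true   # forward raw TLS so Vault terminates it end-to-end
```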
$ TRAEFIK_PORT=$(kubectl get svc traefik -o json | jq '.spec.ports[1].nodePort')

Go to https://vault.example.com:$TRAEFIK_PORT.
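If you want to sanity-check that jq filter locally, you can run it against a sample service object (assuming jq is installed; the port index matches the default chart values, where the websecure port comes second):

```shell
# Build a minimal object resembling `kubectl get svc traefik -o json`
cat > /tmp/traefik-svc.json <<'EOF'
{"spec":{"ports":[{"name":"web","nodePort":30080},{"name":"websecure","nodePort":30443}]}}
EOF

# Same filter as above: pick the nodePort of the second port entry
jq '.spec.ports[1].nodePort' /tmp/traefik-svc.json   # prints 30443
```

If the chart values change the port ordering, select by name instead, e.g. `jq '.spec.ports[] | select(.name=="websecure") | .nodePort'`.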

Conclusion:

Congratulations! At this point, you have a production-ready Vault cluster. I decided to group all the commands into scripts so that you can take your time reading and understanding the purpose of each step, and so that the steps are easy to perform. These scripts are the result of many apply-and-debug cycles, so you won't have to face the same issues. If at some point you need to clean up, you'll find the following scripts in the scripts directory:

04-cleanup-vault.sh
05-cleanup-consul.sh

Take your time to read the scripts and adapt them to your needs.


Abdessamad Bayzi
OCP digital factory

DevOps & Platform Engineer: Docker/Kubernetes/OpenShift, CI-CD, GitLab-CI/Jenkins, Elastic stack, Ansible, Python/Java