Leveraging RAFT Storage for Resilient Secret Management: Deploying HashiCorp Vault on GKE (High Availability Mode)

PJ, Resident Architect, Google Cloud
Google Cloud - Community
August 15, 2024

What is HashiCorp Vault, and why Vault?

Because your secrets deserve a vacation in Fort Knox, not a sticky note on your monitor. Jokes apart, today's applications and systems rely on a multitude of secrets: passwords, API keys, certificates, and more. These sensitive pieces of information are the keys to your kingdom, and keeping them safe is paramount. That's where HashiCorp Vault comes in. Vault follows a "secret management as a service" model, providing a centralized, secure, and auditable way to manage and protect these critical assets. Among the benefits it offers over traditional secret management solutions: it acts as a static secret store for encrypted key-value pairs, a secret generation tool that creates dynamic, on-the-fly credentials, and a pass-through encryption service so that applications don't need to roll their own encryption.
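As a flavor of the static secret store in particular, here is a minimal sketch of storing and reading a key-value secret with the Vault CLI, assuming VAULT_ADDR and VAULT_TOKEN already point at an unsealed Vault server; the mount path, secret path, and field names are illustrative:

# Illustrative only: enable a KV v2 engine and write/read a static secret.
vault secrets enable -path=secret kv-v2          # skip if already mounted
vault kv put secret/webapp/config db_user=appuser db_pass='S3cr3t!'
vault kv get -field=db_pass secret/webapp/config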

Deployment Strategy

There are a few choices to make for a well-thought-out HashiCorp Vault deployment. They are discussed below and, at a high level, fall into three categories: choice of platform, Vault mode, and choice of backend datastore.

  1. While you can certainly run HashiCorp Vault on a standalone virtual machine (VM), deploying it on a Google Kubernetes Engine (GKE) cluster unlocks a whole new world of benefits. GKE brings enhanced manageability, robust security, effortless scalability, and cost efficiency to the table, making it an increasingly popular choice for organizations looking to safeguard their sensitive data. Whether you choose GKE Autopilot or Standard, you’ll be well on your way to harnessing the power of Vault within a resilient and scalable Kubernetes environment. This article will guide you through the key steps to set up a GKE cluster and deploy Vault in High Availability (HA) mode.
  2. Vault can be deployed in three different modes depending on the requirements and the environment it is deployed to. Dev mode is ideal for development and testing environments; it is pre-configured with an in-memory storage backend and automatically unseals and initializes itself on startup. Standalone mode, the lone ranger, is a single Vault server that uses a file storage backend. While it is a simple setup, it lacks the security and resilience needed for production environments. High Availability (HA) mode is where Vault shines in production. This mode creates three replicas of your Vault server for fault tolerance: if one server fails, the others step in seamlessly, ensuring uninterrupted access to your secrets. HA mode also uses a more sophisticated storage backend, such as Raft, for data consistency and durability.
  3. Third & a critical decision determining the resilience and high availability (HA) of your HashiCorp Vault deployment is the selection of the storage back-end. This back-end forms the bedrock upon which Vault’s fault tolerance and data replication mechanisms are constructed, playing a pivotal role in ensuring continuous access to your sensitive data. Currently, several storage back-ends support HA mode for Vault, with HashiCorp’s Integrated Storage emerging as the recommended default for new deployments. This integrated solution boasts a streamlined setup process and optimized performance, making it an attractive option for many users. However, Consul remains a widely used and fully supported back end, particularly favored in environments where Consul is already established for other purposes. While Integrated Storage and Consul are the most common choices, it’s worth noting that ZooKeeper and etcd also offer viable alternatives. Each of these back-ends possesses unique strengths and considerations, making the selection process a nuanced one that should align with your specific organizational requirements and infrastructure landscape. This tutorial will follow setting us Integrated Storage with HashiVault instance. Some relevant links are mentioned for reference.

Comparison between integrated and External storage

Comparison between integrated and Consul storage
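As a preview of the configuration used later in this walkthrough, the HA-with-Raft choice can also be captured in a Helm values file instead of --set flags. A minimal sketch follows; the file name is illustrative and the keys come from the HashiCorp Vault Helm chart (the replica count shown is the chart's usual default of three):

# Sketch of a values override for HA mode with Raft integrated storage.
cat > vault-ha-values.yaml <<'EOF'
server:
  ha:
    enabled: true      # run Vault in High Availability mode
    replicas: 3        # three Vault pods for fault tolerance
    raft:
      enabled: true    # use Integrated Storage (Raft) as the backend
EOF
# Could then be installed with: helm install vault hashicorp/vault -f vault-ha-values.yaml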

Implementation Details

Pre-requisites: This implementation requires a Google Cloud account, the Google Cloud CLI (gcloud), the Kubernetes CLI (kubectl), and the Helm CLI installed, plus a Google Cloud project. Please refer to the official Google Cloud documentation for getting started with project creation. The rest of this walkthrough assumes these prerequisites are already met.
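A quick sanity check that the required CLIs are available in your shell before proceeding:

# Confirm the tooling is installed and on the PATH.
gcloud version
kubectl version --client
helm version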

Set the GCP project as your current project

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ gcloud config set project <your-project-id>
Updated property [core/project].
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Enable the Google Kubernetes Engine API (container.googleapis.com)

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ gcloud services enable container.googleapis.com

Operation "operations/acf.p2-1045873905056-9d50ad21-8942-417d-a79a-7020fb07e443" finished successfully.
pjoura@cloudshell:~ (gke-asm-mtls-secure)$
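If you want to double-check that the API is now enabled for the project, one simple way is to filter the enabled services list:

# Should print container.googleapis.com if the API is enabled.
gcloud services list --enabled | grep container.googleapis.com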

Create a cluster in Autopilot mode named auto-cont-clust-hvault

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ gcloud container clusters create-auto auto-cont-clust-hvault

Creating cluster auto-cont-clust-hvault in us-west1… Cluster is being health-checked (master is healthy)…done.

Created [https://container.googleapis.com/v1/projects/gke-asm-mtls-secure/zones/us-west1/clusters/auto-cont-clust-hvault].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-west1/auto-cont-clust-hvault?project=gke-asm-mtls-secure
kubeconfig entry generated for auto-cont-clust-hvault.
NAME: auto-cont-clust-hvault
LOCATION: us-west1
MASTER_VERSION: 1.28.7-gke.1026000
MASTER_IP: 35.247.18.167
MACHINE_TYPE: e2-small
NODE_VERSION: 1.28.7-gke.1026000
NUM_NODES: 3
STATUS: RUNNING

pjoura@cloudshell:~ (gke-asm-mtls-secure)$
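The create-auto command already generated a kubeconfig entry in Cloud Shell. If you later work from a different shell or machine, the cluster credentials can be fetched again; the region below matches the LOCATION reported above:

# Re-fetch credentials for the Autopilot cluster into the local kubeconfig.
gcloud container clusters get-credentials auto-cont-clust-hvault --region us-west1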

While Vault conveniently launches in standalone mode with a file storage back-end by default, achieving high availability (HA) through the Raft Integrated Storage necessitates overriding these default settings. To harness the benefits of HA, you’ll need to install the latest version of the Vault Helm chart, specifically configuring it for HA mode with integrated storage.
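If the HashiCorp chart repository has not been added to your Helm client yet, add and refresh it first so that the hashicorp/vault chart can be resolved:

# One-time setup of the HashiCorp Helm chart repository.
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update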

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ helm install vault hashicorp/vault \
  --set='server.ha.enabled=true' \
  --set='server.ha.raft.enabled=true'
W0426 17:46:28.986256 1038 warnings.go:70] AdmissionWebhookController: mutated namespaceselector of the webhooks to enforce GKE Autopilot policies.
NAME: vault
LAST DEPLOYED: Fri Apr 26 17:46:19 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:

Thank you for installing HashiCorp Vault!
Now that you have deployed Vault, you should look over the docs on using
Vault with Kubernetes available here:
https://developer.hashicorp.com/vault/docs
Your release is named vault. To learn more about the release, try:
$ helm status vault
$ helm get manifest vault

pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Get all the pods within the default namespace

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
vault-0 0/1 Running 0 5m23s
vault-1 0/1 Running 0 5m23s
vault-2 0/1 Running 0 5m23s
vault-agent-injector-64c4d8fcc7-fpwfz 1/1 Running 0 5m23s
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

The vault-0, vault-1, and vault-2 pods show a "Running" state, yet they remain unready (0/1). This is because the readiness probe's status check returns a non-zero exit code, signaling that the pods are not yet prepared to accept traffic.

Meanwhile, the vault-agent-injector pod functions as a Kubernetes Mutation Webhook Controller. This controller acts as an interceptor, evaluating pod events and applying mutations to the pod configuration if specific annotations are present in the request.
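For context, the injector only mutates pods that carry its annotations. A purely illustrative sketch is shown below, patching a hypothetical deployment named webapp; the Vault role and secret path are placeholders and would also require a Kubernetes auth method and role to be configured in Vault first:

# Hypothetical example: annotations that trigger the vault-agent-injector.
kubectl patch deployment webapp --type merge -p '
spec:
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "webapp"
        vault.hashicorp.com/agent-inject-secret-config: "secret/data/webapp/config"
'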

Retrieve the status of Vault on the vault-0 pod. The status command shows that Vault is not initialized and is still sealed. Before Vault can authenticate with Kubernetes and manage secrets, it must be initialized and unsealed.

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault status
Key                Value
---                -----
Seal Type          shamir
Initialized        false
Sealed             true
Total Shares       0
Threshold          0
Unseal Progress    0/0
Unseal Nonce       n/a
Version            1.16.1
Build Date         2024-04-03T12:35:53Z
Storage Type       raft
HA Enabled         true
command terminated with exit code 2
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Initialize Vault with one key share and one key threshold. The operator init command below generates a root key and splits it into key shares (-key-shares=1), then sets the number of key shares required to unseal Vault (-key-threshold=1). These key shares are written to the output as unseal keys in JSON format (-format=json); here the output is redirected to a file named cluster-keys.json.

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault operator init -key-shares=1 -key-threshold=1 -format=json > cluster-keys.json
pjoura@cloudshell:~ (gke-asm-mtls-secure)$ cat cluster-keys.json | jq -r ".unseal_keys_b64[]"
7XGAQcwYa/om+v6pqE1O2omN2lEJc12wb4dkRDpwaR4=
pjoura@cloudshell:~ (gke-asm-mtls-secure)$ VAULT_UNSEAL_KEY=$(cat cluster-keys.json | jq -r ".unseal_keys_b64[]")
pjoura@cloudshell:~ (gke-asm-mtls-secure)$
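A single share and a threshold of one keeps this walkthrough simple, but it means a single key unseals everything. For anything beyond a demo you would typically split the root key across more shares and require a quorum, and keep cluster-keys.json somewhere safer than the working directory. An illustrative production-style initialization (the share counts are an example, not a recommendation for your environment):

# Example only: 5 key shares, any 3 of which are required to unseal Vault.
kubectl exec vault-0 -- vault operator init \
  -key-shares=5 -key-threshold=3 -format=json > cluster-keys.json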

Upon initialization, Vault knows where and how to access its storage, but it still cannot decrypt the data within. Unsealing is therefore the process of reconstructing the root key required to unlock the decryption key, ultimately granting access to the encrypted contents of Vault. Unseal Vault on the vault-0 pod; the command output shows that the Vault server is now initialized and unsealed.

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault operator unseal $VAULT_UNSEAL_KEY
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Total Shares            1
Threshold               1
Version                 1.16.1
Build Date              2024-04-03T12:35:53Z
Storage Type            raft
Cluster Name            vault-cluster-6940722a
Cluster ID              31e56d13-b223-7187-df26-6f9dc3d5fb79
HA Enabled              true
HA Cluster              https://vault-0.vault-internal:8201
HA Mode                 active
Active Since            2024-04-26T17:55:28.684602682Z
Raft Committed Index    51
Raft Applied Index      51
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Currently the Vault server running on the vault-0 pod is a single-node cluster. The next step is to join the other nodes to make it highly available. In HA mode, Vault requires a root token to list the peers (the nodes in the Vault cluster); otherwise it returns a 403 permission error.

Store the root token from the JSON file generated by operator init in a variable and log in

CLUSTER_ROOT_TOKEN=$(cat cluster-keys.json | jq -r ".root_token")
pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault login $CLUSTER_ROOT_TOKEN
Success! You are now authenticated. The token information displayed below is already stored in the token helper. You do NOT need to run "vault login" again. Future Vault requests will automatically use this token.
Key                  Value
---                  -----
token                <removed the output intentionally>
token_accessor       <removed the output intentionally>
token_duration       ∞
token_renewable      false
token_policies       ["root"]
identity_policies    []
policies             ["root"]
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

List the vault cluster peers. This displays the one node within the Vault cluster. The Vault servers on the other pods need to join this cluster and be unsealed.

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault operator raft list-peers
Node                                    Address                        State     Voter
----                                    -------                        -----     -----
db92143a-875a-e209-50b7-c1f1378372ae    vault-0.vault-internal:8201    leader    true
pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Join the Vault servers on vault-1 and vault-2 to the Vault cluster.

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-1 -- vault operator raft join http://vault-0.vault-internal:8200
Key       Value
---       -----
Joined    true

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-2 -- vault operator raft join http://vault-0.vault-internal:8200
Key       Value
---       -----
Joined    true

pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Unseal the Vault server on vault-1 & vault-2 with the unseal key

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ VAULT_UNSEAL_KEY=$(cat cluster-keys.json | jq -r ".unseal_keys_b64[]")
pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-1 -- vault operator unseal $VAULT_UNSEAL_KEY
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       1
Threshold          1
Unseal Progress    0/1
Unseal Nonce       n/a
Version            1.16.1
Build Date         2024-04-03T12:35:53Z
Storage Type       raft
HA Enabled         true
pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-2 -- vault operator unseal $VAULT_UNSEAL_KEY
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       1
Threshold          1
Unseal Progress    0/1
Unseal Nonce       n/a
Version            1.16.1
Build Date         2024-04-03T12:35:53Z
Storage Type       raft
HA Enabled         true
pjoura@cloudshell:~ (gke-asm-mtls-secure)$
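Since vault-1 and vault-2 go through the same join-and-unseal sequence, the two steps can also be scripted as a small loop; a sketch, assuming the pod names above and the VAULT_UNSEAL_KEY variable set earlier:

# Join each follower pod to the Raft cluster, then unseal it.
for i in 1 2; do
  kubectl exec vault-$i -- vault operator raft join http://vault-0.vault-internal:8200
  kubectl exec vault-$i -- vault operator unseal "$VAULT_UNSEAL_KEY"
done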

Listing the peers of this Vault cluster will now show all three nodes in HA mode

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl exec vault-0 -- vault operator raft list-peers
Node                                    Address                        State       Voter
----                                    -------                        -----       -----
db92143a-875a-e209-50b7-c1f1378372ae    vault-0.vault-internal:8201    leader      true
b36a6a5f-7839-3c2d-f154-512d2ce4c144    vault-1.vault-internal:8201    follower    true
9990dfb5-b521-2be3-682c-7c373d721664    vault-2.vault-internal:8201    follower    true

pjoura@cloudshell:~ (gke-asm-mtls-secure)$

Finally, list the pods, and Vault will show READY!! 🎉

pjoura@cloudshell:~ (gke-asm-mtls-secure)$ kubectl get pods
NAME READY STATUS RESTARTS AGE
vault-0 1/1 Running 0 4d4h
vault-1 1/1 Running 0 4d4h
vault-2 1/1 Running 0 4d4h
vault-agent-injector-64c4d8fcc7-fpwfz 1/1 Running 0 4d4h
pjoura@cloudshell:~ (gke-asm-mtls-secure)$
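From here you can talk to the cluster directly from Cloud Shell, for example by port-forwarding the active pod and pointing the Vault CLI or the browser UI at it; a sketch, assuming the Vault CLI is installed locally and the port-forward runs in the background or another terminal:

# Forward local port 8200 to the active Vault pod, then query it locally.
kubectl port-forward vault-0 8200:8200 &
export VAULT_ADDR=http://127.0.0.1:8200
vault status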
