HashiCorp Vault Operations Professional exam practice guide — Part 6

Vault HA cluster & data snapshots

Glen Yu
4 min read · Feb 16, 2024

This is the last part of my Vault Ops Pro exam guide, and I hope you have found it useful. As I mentioned at the very beginning, this is not meant to be an exhaustive guide, and the questions you get in the exam scenarios will not be as straightforward as some of the examples I presented. However, if you are able to follow along and complete all the challenges I laid out along the way, then I think you will be in very good shape.

Fellow HashiCorp Ambassador Bryan Krausen is an instructor and has produced many Udemy courses, the bulk of them related to HashiCorp tooling. You can find links to his courses and discount codes on Bryan’s GitHub page.

Vault HA cluster setup

HashiCorp recommends a 3- or 5-node HA setup for Vault, so feel free to spin up four more Vault server VMs; I will just be adding two more to my setup. The setup is actually very simple. For now, each new node's Vault server configuration file looks exactly the same as the one we ended with in part 2 of this guide (update the IPs throughout and the node_id in the storage stanza, of course). Start the Vault server but DO NOT initialize it. Instead, join it to the already active (and initialized) Vault server:

vault operator raft join http://10.128.0.44:8200

Optionally, if you know the servers’ IPs ahead of time, you can add retry_join blocks to the storage stanza of your Vault server config so that the node joins the cluster automatically. Here is what the final Vault server config file for the third node of my Vault cluster looks like:
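
As a rough sketch of such a config for the third node: the IPs and node_id are taken from the list-peers output shown further down, while the file name, storage path, listener settings and tls_disable are my assumptions for illustration (TLS is assumed off only because the join command above uses http). Treat it as a starting point rather than the exact file.

```shell
# Hypothetical output path; adjust to wherever your Vault config lives.
CONFIG_FILE="${CONFIG_FILE:-./vault-server03.hcl}"

cat > "$CONFIG_FILE" <<'EOF'
# Assumed values: storage path, listener settings and tls_disable are
# illustrative only. IPs and node_id match the cluster in this guide.
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "raft_node_server03"

  # One retry_join block per node this server should try to join.
  retry_join {
    leader_api_addr = "http://10.128.0.44:8200"
  }
  retry_join {
    leader_api_addr = "http://10.128.0.49:8200"
  }
}

listener "tcp" {
  address         = "10.128.0.50:8200"
  cluster_address = "10.128.0.50:8201"
  tls_disable     = true
}

api_addr     = "http://10.128.0.50:8200"
cluster_addr = "http://10.128.0.50:8201"
EOF
```

With retry_join in place, the node keeps trying the listed addresses until it finds the leader, so the manual vault operator raft join step is no longer needed.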

If you run vault operator raft list-peers, you should see something similar to the following:

Node                  Address             State       Voter
----                  -------             -----       -----
raft_node_server01    10.128.0.44:8201    leader      true
raft_node_server02    10.128.0.49:8201    follower    true
raft_node_server03    10.128.0.50:8201    follower    true

Voters vs non-voters

While it is out of the scope of this guide, I wanted to touch upon some of the differences between voter and non-voter Vault server nodes. Voter nodes contribute to quorum: they vote to elect a new leader should the existing one step down (intentionally or otherwise). This is the type of setup you should have if you followed the steps I listed above, and you can confirm this under the “Voter” column in the output of vault operator raft list-peers. Here, all reads and writes are forwarded to the leader node, so whether you have a 3- or 5-node HA cluster, at the end of the day it is still just one node doing all the work. In a large and/or busy environment, this can add a lot of latency to your requests.
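
To make the quorum arithmetic concrete, here is a small sketch (the function names are my own, not part of Vault): Raft needs a strict majority of voters to agree, which is why 3 voters tolerate one failure and 5 tolerate two.

```shell
# Raft quorum is a strict majority of the voter nodes.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Number of voters that can be lost while the cluster still has quorum.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for voters in 3 5; do
  echo "voters=$voters quorum=$(quorum_size "$voters") tolerated_failures=$(failure_tolerance "$voters")"
done
# voters=3 quorum=2 tolerated_failures=1
# voters=5 quorum=3 tolerated_failures=2
```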

Vault Enterprise introduces a new type of node called “performance standby nodes”. Performance standbys can handle read-only requests locally (writes are still forwarded to the leader). Enterprise also lets you join nodes to a Raft cluster as non-voters, which do not contribute to quorum, so you can add them for read scaling without changing the quorum size.

Backup & recovery

A high-availability Vault cluster will protect you from hardware failures or outages, but what about your data? Vault offers a snapshot feature, which is only available for the integrated storage (Raft) backend. Backing up your data is as simple as:

vault operator raft snapshot save mybackup-20240216.snapshot

NOTE: you can perform this from anywhere, but you need to be connected to your active/leader node
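
If you want to take backups regularly, a minimal sketch along these lines could help. The function names are hypothetical, and it assumes VAULT_ADDR points at the active/leader node and VAULT_TOKEN holds a token that is allowed to take snapshots.

```shell
# Build a date-stamped file name such as mybackup-20240216.snapshot.
snapshot_name() {
  printf 'mybackup-%s.snapshot' "$(date +%Y%m%d)"
}

# Save a snapshot under today's name. Assumes VAULT_ADDR and VAULT_TOKEN
# are already exported in the environment.
backup_vault() {
  vault operator raft snapshot save "$(snapshot_name)"
}
```

Calling backup_vault from a scheduled job then gives you one snapshot file per day.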

I will actually not be discussing the recovery process — that is my challenge to you in challenge #1 below.

CHALLENGE #1: recover deleted secret

  • Delete one (or more) of the KV secrets we created in some of the earlier parts of this guide:
vault kv delete mykv2/passwords/admin

vault kv metadata delete mykv2/passwords/admin

NOTE: for KV version 2 secrets, vault kv delete is only a soft delete, so you need to delete the metadata as well to remove the secret permanently (not required for KV version 1)

  • Verify that your secret no longer exists with a vault kv get.

CHALLENGE #2: promote another Vault node to active mode by putting the current active Vault node into standby

>> SPOILER ALERT!!! SOLUTION GUIDE BELOW!!! <<

Challenge solutions guide

SOLUTION #1

  • Restore the snapshot from your leader Vault server node:
vault operator raft snapshot restore ./mybackup-20240216.snapshot

The secret(s) should be available again.

SOLUTION #2

From your active Vault server node, simply run vault operator step-down. The node resigns from active duty, and the remaining nodes elect a new leader.
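
To confirm that leadership actually moved, you can compare the leader before and after the step-down. The helper below is a hypothetical convenience, not a Vault command; it just picks the leader row out of the list-peers table format shown earlier.

```shell
# Print the node ID of the current leader from `vault operator raft list-peers`
# output read on stdin (State is the third column of the table).
raft_leader() {
  awk '$3 == "leader" { print $1 }'
}

# Usage against a live cluster (not executed here):
#   before=$(vault operator raft list-peers | raft_leader)
#   vault operator step-down
#   sleep 5   # give the cluster a moment to elect a new leader
#   after=$(vault operator raft list-peers | raft_leader)
#   [ "$before" != "$after" ] && echo "leadership moved to $after"
```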

Glen Yu

Cloud Engineering @ PwC Canada. I'm a Google Cloud GDE, HashiCorp Ambassador and HashiCorp Core Contributor (Nomad). Also an ML/AI enthusiast!