Back up a Consul cluster using the `consul snapshot` command

When Consul is a Vault storage backend and runs on Kubernetes

Ionut Craciunescu
Wealth Wizards Engineering
May 9, 2018 · 6 min read

Taking full Consul backups for DR scenarios is actually quite straightforward: you just need to run `consul snapshot save backup.snap`. Job done! But what if my Consul cluster is a storage backend for Vault, runs on Kubernetes, and I want to encrypt my backup and ship it to an AWS S3 bucket? How can I go about running my backup tool of choice in a Kubernetes pod while keeping secrets management to a minimum?

Bash to the rescue! And Docker of course, and Vault’s awesome features👏!!

This post describes how to use the `consul snapshot save` command inside a Kubernetes pod, together with Vault's Kubernetes Auth Method, AppRole and AWS secrets engine, to back up a Consul cluster.

The goal is to provide a quick and easy way to restore Vault, mainly in a DR scenario, but not limited to that. The compute part of both Consul and Vault is defined in code, stored in git and can be brought back online in no time by running a single Bash script for each product. Backing up Vault's data, which is actually stored in Consul, is done with the `consul snapshot` command.

As described in the consul snapshot docs, the snapshot command has subcommands for saving, restoring, and inspecting the state of the Consul servers for disaster recovery. These are atomic, point-in-time snapshots which include key/value entries, service catalog, prepared queries, sessions, and ACLs. The command is available in Consul 0.7.1 and later.
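
For example, against a reachable Consul agent (with ACLs enabled, a management token is expected in `CONSUL_HTTP_TOKEN`):

```bash
# Take an atomic, point-in-time snapshot of the Consul servers
consul snapshot save backup.snap

# Inspect the snapshot metadata to verify it
consul snapshot inspect backup.snap

# Restore the snapshot into a (new) cluster
consul snapshot restore backup.snap
```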

There are some utilities out there for backing up a Consul cluster, and the best one I found so far is https://github.com/pshima/consul-snapshot, which can ship backups to AWS S3, runs as a daemon and has integrated backup monitoring 👌. The drawback of using it is that it cannot restore Consul ACLs 😞, which makes restoring our Vault service a slightly more complicated and lengthier process. Being able to restore Consul ACLs gives the fastest and simplest restore process: bring up a new Consul cluster and then just run `consul snapshot restore backup.snap`; all data is restored at this point. Now just start the Vault pods, unseal, and that's it!

So I’ve created a Docker container called consul-backup that runs some Bash wrapper scripts, which allow me to take Consul backups and also provide restore functionality. Since Consul is a storage backend for Vault and is running on Kubernetes, this is how it all hangs together:

  • The consul-backup pod uses the Vault Kubernetes Auth Method to obtain an initial Vault token
  • It uses that token to authenticate against Vault using the AppRole Auth Method
  • It uses the AWS Secrets Engine to get a dynamically generated set of AWS keys that allow access to the S3 backup bucket
  • It gets a secret from Vault to use for encrypting the backup before pushing it to S3
  • It gets a Consul management token to use when running `consul snapshot save`. Since Consul snapshots actually contain ACL tokens, the snapshot API requires a management token for snapshot operations and does not use a special policy.

Going into the details, below are the steps for how it was all set up, assuming these names for the clusters:

  • Kubernetes: kube.mydomain
  • Vault: vault.mydomain
  • Consul: consul.mydomain

These are Vault configuration steps:

1. Configure the Consul management token lease:
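
Roughly, this means enabling the Consul secrets engine, giving Vault a management token it can use against consul.mydomain, and defining a role that issues management tokens with a short lease. A minimal sketch; the mount path, role name and lease value are assumptions, and the lease parameter name differs between Vault versions:

```bash
# Enable the Consul secrets engine (mount path "consul/" assumed)
vault secrets enable consul

# Give Vault a Consul management token it can use to generate further tokens
vault write consul/config/access \
    address=consul.mydomain:8500 \
    token=<existing-consul-management-token>

# A role that issues management tokens with a short lease
# (older Vault versions use lease= instead of ttl=)
vault write consul/roles/consul-backup \
    token_type=management \
    ttl=1h
```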

2. Configure Vault AppRole Auth Method:

`bound_cidr_list` is optional; if set, it specifies the blocks of IP addresses which can perform the login operation.
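
A minimal sketch of this step, assuming the AppRole is called consul-backup and is attached to the policy from step 3 below (the token TTLs and CIDR block are illustrative):

```bash
# Enable the AppRole auth method
vault auth enable approle

# Create the consul-backup AppRole, bound to the policy defined in step 3
vault write auth/approle/role/consul-backup \
    policies=consul-backup \
    token_ttl=20m \
    token_max_ttl=30m \
    bound_cidr_list=10.0.0.0/16

# The role_id and secret_id are what the pod later exchanges for a Vault token
vault read auth/approle/role/consul-backup/role-id
vault write -f auth/approle/role/consul-backup/secret-id
```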

3. Create the policy for the above AppRole:
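
Something along these lines, granting read access to the paths the scripts use later (the exact paths and the secret holding the encryption passphrase are assumptions):

```bash
# consul-backup.hcl - policy attached to the AppRole
cat > consul-backup.hcl <<'EOF'
# Dynamic Consul management tokens
path "consul/creds/consul-backup" {
  capabilities = ["read"]
}

# Dynamic AWS keys for the S3 backup bucket
path "aws/creds/consul-backup" {
  capabilities = ["read"]
}

# Static secret holding the backup encryption passphrase
path "secret/consul-backup" {
  capabilities = ["read"]
}
EOF

vault policy write consul-backup consul-backup.hcl
```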

4. Configure the Vault Kubernetes Auth Method (a combined sketch of the sub-steps follows this list):

  • Enable Kubernetes Auth:
    vault auth enable -path=kube -plugin-name=kubernetes kubernetes
  • Create Kubernetes certificate chain:
  • Configure Vault to talk to Kubernetes:
  • Create a named role:
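
Putting those sub-steps together, a rough sketch. The `kube` path matches the enable command above; the API server port, the service account secret used for the reviewer JWT, the namespace, the role name and the `consul-backup-kube` policy (which should allow reading the AppRole role_id and secret_id) are all assumptions:

```bash
# Certificate chain and token of the service account Vault uses to validate JWTs
KUBE_CA_CERT=$(kubectl get secret <vault-reviewer-sa-secret> \
    -o jsonpath='{.data.ca\.crt}' | base64 --decode)
KUBE_REVIEWER_JWT=$(kubectl get secret <vault-reviewer-sa-secret> \
    -o jsonpath='{.data.token}' | base64 --decode)

# Configure Vault to talk to the Kubernetes API of kube.mydomain
vault write auth/kube/config \
    kubernetes_host="https://kube.mydomain:6443" \
    kubernetes_ca_cert="${KUBE_CA_CERT}" \
    token_reviewer_jwt="${KUBE_REVIEWER_JWT}"

# Named role: binds a Kubernetes service account to a Vault policy that can
# read the AppRole role_id and secret_id
vault write auth/kube/role/consul-backup \
    bound_service_account_names=consul-backup \
    bound_service_account_namespaces=default \
    policies=consul-backup-kube \
    ttl=15m
```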

5. Configure the Vault AWS Secrets Engine (again, a combined sketch follows the sub-steps):

  • Enable AWS secrets:
  • Configure the credentials that Vault uses to communicate with AWS to generate the IAM credentials:
  • Configure a role that maps a name in Vault to a policy or policy file in AWS. When users generate credentials, they are generated against this role:
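
Putting those sub-steps together, a sketch; the root credentials, region, role name and bucket name are assumptions, and the role syntax varies between Vault versions:

```bash
# Enable the AWS secrets engine
vault secrets enable aws

# Credentials Vault itself uses to create IAM users/keys
vault write aws/config/root \
    access_key="${AWS_ACCESS_KEY_ID}" \
    secret_key="${AWS_SECRET_ACCESS_KEY}" \
    region=eu-west-1

# Role mapping a Vault name to an inline IAM policy that only allows access
# to the backup bucket; credentials are generated against this role
# (on older Vault versions this was written as policy=@policy.json)
vault write aws/roles/consul-backup \
    credential_type=iam_user \
    policy_document=-<<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-consul-backup-bucket",
        "arn:aws:s3:::my-consul-backup-bucket/*"
      ]
    }
  ]
}
EOF
```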

Now that Vault is configured and all is in place, we can go into the details of the Bash scripts used by the consul-backup container. The container repo is published here: https://github.com/WealthWizardsEngineering/consul-backup. It also contains a sample Kubernetes CronJob so it can be run as a scheduled job on the cluster. consul-backup basically has three scripts, as shown below:

1. environment.sh: as the name indicates, this one sets up the environment by getting the required secrets from Vault. It does things like the following:
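
A simplified sketch of its flow (the real script lives in the repo; the Vault paths, role names, secret field names and the jq-based parsing here are assumptions):

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Log in with the Kubernetes Auth Method using the pod's service account JWT
KUBE_JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
KUBE_TOKEN=$(curl -s --request POST \
  --data "{\"jwt\": \"${KUBE_JWT}\", \"role\": \"consul-backup\"}" \
  "${VAULT_ADDR}/v1/auth/kube/login" | jq -r .auth.client_token)

# 2. Use that token to fetch the AppRole role_id/secret_id and log in via AppRole
ROLE_ID=$(curl -s -H "X-Vault-Token: ${KUBE_TOKEN}" \
  "${VAULT_ADDR}/v1/auth/approle/role/consul-backup/role-id" | jq -r .data.role_id)
SECRET_ID=$(curl -s -H "X-Vault-Token: ${KUBE_TOKEN}" --request POST \
  "${VAULT_ADDR}/v1/auth/approle/role/consul-backup/secret-id" | jq -r .data.secret_id)
VAULT_TOKEN=$(curl -s --request POST \
  --data "{\"role_id\": \"${ROLE_ID}\", \"secret_id\": \"${SECRET_ID}\"}" \
  "${VAULT_ADDR}/v1/auth/approle/login" | jq -r .auth.client_token)

# 3. Dynamic AWS keys for the S3 backup bucket
AWS_CREDS=$(curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" "${VAULT_ADDR}/v1/aws/creds/consul-backup")
export AWS_ACCESS_KEY_ID=$(echo "${AWS_CREDS}" | jq -r .data.access_key)
export AWS_SECRET_ACCESS_KEY=$(echo "${AWS_CREDS}" | jq -r .data.secret_key)

# 4. Encryption passphrase and a Consul management token
export BACKUP_PASSPHRASE=$(curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "${VAULT_ADDR}/v1/secret/consul-backup" | jq -r .data.passphrase)
export CONSUL_HTTP_TOKEN=$(curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "${VAULT_ADDR}/v1/consul/creds/consul-backup" | jq -r .data.token)
```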

2. backup.sh: it's quite obvious what it does, and thanks to Rich Marshall for adding InfluxDB events and a status to our current backup monitoring tool of choice. Here is what the script boils down to:
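
A sketch of its core steps, leaving out the InfluxDB reporting (the bucket name, encryption tool and variable names are assumptions; the full script is in the repo):

```bash
#!/usr/bin/env bash
set -euo pipefail

SNAPSHOT="consul-$(date +%Y%m%d-%H%M%S).snap"

# Take the snapshot (CONSUL_HTTP_TOKEN holds the management token from environment.sh)
consul snapshot save "${SNAPSHOT}"

# Verify the snapshot before shipping it anywhere
consul snapshot inspect "${SNAPSHOT}"

# Encrypt it with the passphrase fetched from Vault
openssl enc -aes-256-cbc -salt \
    -in "${SNAPSHOT}" -out "${SNAPSHOT}.enc" \
    -pass env:BACKUP_PASSPHRASE

# Ship it to the backup bucket using the dynamic AWS keys
aws s3 cp "${SNAPSHOT}.enc" "s3://my-consul-backup-bucket/${SNAPSHOT}.enc"
```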

3. restore.sh: it expects a brand new Consul cluster to restore to. It warns the user about the restore impact, asks for confirmation, and if answered ‘yes’ it proceeds with the restore. The restore is a full restore, most likely needed in a full DR scenario, but it can also be used for selective restore by doing a full restore to a new Consul instance followed by `consul kv export` and `consul kv import`. The important bits of the restore script are:
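
A sketch of that flow (the bucket name is an assumption, and here the backup object name is taken as the first argument):

```bash
#!/usr/bin/env bash
set -euo pipefail

echo "WARNING: this will overwrite ALL data in the target Consul cluster."
read -r -p "Type 'yes' to continue: " answer
[ "${answer}" = "yes" ] || { echo "Aborting."; exit 1; }

# Fetch and decrypt the chosen backup
aws s3 cp "s3://my-consul-backup-bucket/${1}" backup.snap.enc
openssl enc -d -aes-256-cbc \
    -in backup.snap.enc -out backup.snap \
    -pass env:BACKUP_PASSPHRASE

# Restore everything, ACLs included, into the brand new cluster
consul snapshot restore backup.snap
```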

And… this is it! Bash to the rescue 👍!! Now, this approach is specific to the use case described at the beginning of this article, but it shows how you can quickly add some Bash wrapper scripts around the `consul snapshot` command and create something fit for your purpose when needed. Happy scripting!!

P.S. The diagram below shows the relations between the Vault Kubernetes Auth Method, AppRole, Vault role and AWS secrets engine:
