HashiCorp Vault Performance Benchmark

Stenio Ferreira
HashiCorp Solutions Engineering Blog

--

Vault is HashiCorp’s solution for managing secrets. It supports modular and scalable architectures, allowing deployments as small as a dev server in a laptop all the way to a full-fledged high availability cluster for production environments.

Whenever I am introducing Vault to prospects as a Solution Engineer at HashiCorp, I am frequently asked two questions:

  • Flexibility is great, but is there a reference architecture for deploying Vault in production?
  • What is the load supported by this architecture?

For the first, there is an easy answer — this page has the official HashiCorp reference architecture, this other page has step-by-step documentation on how to deploy Vault, and we even have this page describing production hardening guidelines.

However, unfortunately, we do not have benchmarks for this reference architecture — measures such as average latency, number of concurrent requests per second supported, average number of errors after a certain threshold, etc. The main reason for this is that:

Vault is a highly customizable system which supports a wide array of workloads, therefore synthetic benchmarking wouldn’t necessarily be indicative or helpful.

If you change one of the variables of a deployment scenario, such as the secret backend, the underlying network latency, or the distribution of different workloads, the numbers can be completely different.

While I agree with that reasoning, I have met several companies whose teams understood their required Vault use cases, had control of their infrastructure, and wanted to run tests to validate their load thresholds and the point at which they should consider scaling Vault (side note — Vault Enterprise has Performance Standbys to support horizontal and vertical scaling).

The great news is that there are a variety of Git repositories to support this type of benchmarking. A few of these are listed in the references section — if you are an advanced practitioner, you can skip the following instructions and go directly to the repos. On the other hand, if you have less experience with Linux, perhaps you will find the step-by-step guide in this post helpful. As a bonus, you can also apply these techniques to benchmark other server-based/API solutions! :-)

In the following sections we will describe two ways of simulating workloads for Vault — locally, for a quick and easy (although unrealistic) test, and using the reference architecture. For these, we will be using the benchmarking code from the vault-guides Github repo.

Note: the awesome Terraform code and Lua scripts were created by Lance Larsen, Roger Berlind, and Kawsar Kamal with inspiration from Jacob Friedman.

Requirements

  • Server running benchmark scripts: There are a multitude of ways of creating scripts and using tools to simulate requests to a remote server. In this blog post, we will focus on tools made for macOS/Linux. These tools have underlying dependencies and haven’t been tested in Linux emulators for Windows, such as Git Bash (git-scm). Feel free to test and post results in the comments section; otherwise, you can just use a Linux virtual machine in your preferred cloud provider or with VirtualBox.
  • Platform where Vault is deployed: This blog post references macOS commands and AWS resources, however, the instructions can be adapted to other operating systems and deployment platforms.

Benchmarking — Local Test

This section describes how to set up a benchmark test locally on a Mac laptop, with the objective of introducing the benchmarking tools we will use.

The code has two dependencies:

  • LuaJIT: This is a just-in-time compiler for the Lua scripting language. While the linked page contains information on how to install it on different platforms, on macOS I prefer using the brew package manager. Once you have brew installed, you just need to run:
brew install luajit
  • wrk tool: An open source HTTP benchmarking tool used to simulate concurrent requests. The project page has install instructions for different platforms. Once again, on macOS you can use brew (a quick check of both installs follows below):
brew install wrk
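
With both dependencies installed, you can run a quick sanity check (a minimal check; the exact output will vary by version):

luajit -v
wrk -v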

Running the benchmarking tool

For a simple test, we can open a terminal window, download, and spin up a dev instance of Vault:

# Download and stand up a dev version of Vault (edit link to match your OS):
curl -o vault.zip https://releases.hashicorp.com/vault/1.3.2/vault_1.3.2_darwin_amd64.zip
unzip vault.zip
vault server -dev -dev-root-token-id=root -dev-listen-address=127.0.0.1:8200

With the Vault server running and listening for requests, we can open another tab on the terminal, clone the repository, and go to the benchmarking folder:

git clone https://github.com/hashicorp/vault-guides.git
cd vault-guides/operations/benchmarking/wrk-core-vault-operations

The README.md documents how the tests are written and how they can be updated. For a simple test of our local Vault, we can execute the commands:

# Prepare environment variables:
export VAULT_ADDR=http://127.0.0.1:8200
export VAULT_TOKEN=root
# Prepare Vault for simple test
vault auth enable userpass
vault write auth/userpass/users/loadtester password=benchmark policies=default
# Concurrent write of random secrets, with:
# 6 concurrent threads
# 16 connections in total (shared across the threads)
# 20 seconds to run the test
# 10000 secrets to write
# Write benchmark results to file prod-test-write-1000-random-secrets-t6-c16-20sec.log
nohup wrk -t6 -c16 -d20s -H "X-Vault-Token: $VAULT_TOKEN" -s write-random-secrets.lua $VAULT_ADDR -- 10000 > prod-test-write-1000-random-secrets-t6-c16-20sec.log &
# The above command will start a background process, which will finish in 20 seconds. On my computer, this is the resulting log file:
cat prod-test-write-1000-random-secrets-t6-c16-20sec.log
Number of secrets is: 10000
thread 1 created
Running 20s test @ http://127.0.0.1:8200
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 183.45us 72.36us 3.90ms 95.43%
Req/Sec 5.25k 250.41 5.53k 84.08%
104993 requests in 20.10s, 47.26MB read
Non-2xx or 3xx responses: 104993
Requests/sec: 5223.65
Transfer/sec: 2.35MB
thread 1 made 104995 requests including 104995 writes and got 104993 responses

There were 104,993 responses, and all of them were non-2xx/3xx, meaning every request failed. Looking into the write-random-secrets.lua code, we see that it writes to the path “secret”, which in Vault dev mode is already mounted as KV version 2, and KV v2 uses a different HTTP API than the KV v1 calls the script makes.
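
For context, here is the difference between the two HTTP APIs (a minimal sketch using curl; the secret name and value are made up, and the KV v1 example assumes the secret2 mount we are about to enable):

# KV version 1: the payload is the secret itself
curl -H "X-Vault-Token: $VAULT_TOKEN" -X POST -d '{"foo":"bar"}' $VAULT_ADDR/v1/secret2/my-secret
# KV version 2: note the extra /data/ path segment and the "data" wrapper
curl -H "X-Vault-Token: $VAULT_TOKEN" -X POST -d '{"data":{"foo":"bar"}}' $VAULT_ADDR/v1/secret/data/my-secret

The simplest fix is to enable a KV version 1 mount at a new path and point the script at it: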

# Update write-random-secrets.lua to use the path "secret2"
sed -i -e 's+/v1/secret/+/v1/secret2/+g' write-random-secrets.lua
# Enable a KV version 1 secrets engine at that path
vault secrets enable -path secret2 -version 1 kv
# Now re-run the benchmark and check the results
nohup wrk -t1 -c1 -d20s -H "X-Vault-Token: $VAULT_TOKEN" -s write-random-secrets.lua $VAULT_ADDR -- 10000 > prod-test-write-1000-random-secrets-t6-c16-20sec.log &
cat prod-test-write-1000-random-secrets-t6-c16-20sec.log
Number of secrets is: 10000
thread 1 created
Running 20s test @ http://127.0.0.1:8200
1 threads and 1 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 224.89us 240.09us 6.37ms 97.29%
Req/Sec 4.73k 166.61 5.12k 73.50%
94069 requests in 20.00s, 10.86MB read
Requests/sec: 4703.19
Transfer/sec: 555.75KB
thread 1 made 94070 requests including 94070 writes and got 94069 responses
# Meaning all requests were successful

Congrats! You just ran your first Vault performance benchmark test!

This is clearly not representative of a production environment; however, it shows that benchmarking Vault doesn’t need to be complicated. In the next section we will bring this closer to reality.

Benchmarking — Reference Architecture

For this next section, we will deploy a Vault cluster following the reference architecture. You could create everything by hand; however, since at HashiCorp we believe in infrastructure as code, we will use the Terraform code in this repository.

This code not only deploys the reference architecture, but it also includes supporting tools to help visualize the metrics in a more robust way:

  • InfluxDB: open source storage for telemetry data
  • Telegraf: an agent for collecting telemetry information and forwarding to InfluxDB
  • Grafana: web UI dashboard to visualize telemetry
  • Envoy: provides observability on Vault ingress traffic. In this deployment, the AWS load balancer points to the Envoy server, which in turn proxies requests to the Vault servers.

Finally, this code also installs Prometheus (an alternative to InfluxDB) and wrk2 (an alternative to the wrk tool); neither will be used in this blog.
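
For reference, Vault can emit its telemetry to a statsd-compatible sink, which an agent like Telegraf can ingest and forward to InfluxDB. The Terraform code in this repository wires this up for you; the sketch below only illustrates what the relevant Vault configuration stanza looks like (the config path and listener address are assumptions):

# Illustrative only: the benchmark repo's Terraform already configures telemetry.
# This appends a telemetry stanza pointing Vault at a local statsd listener,
# such as Telegraf's statsd input plugin.
cat <<'EOF' | sudo tee -a /etc/vault.d/vault.hcl
telemetry {
  statsd_address = "127.0.0.1:8125"
}
EOF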

Diagram: the deployed environment, including the Vault and Consul clusters, the Envoy proxy behind the load balancer, and the benchmark and telemetry servers.

Creating Machine Images

Before executing Terraform, we first must create machine images for Consul and Vault. This follows best practices of immutable infrastructure, which guarantees that your servers will always be identically configured.

Machine images can be created in many ways; for this blog post we will use HashiCorp Packer, outputting the images to AWS (you will need AWS API keys to continue).

To build the images, open your local terminal and execute:

# Download and unzip Packer (update link to your OS)
curl -o packer.zip https://releases.hashicorp.com/packer/1.5.4/packer_1.5.4_darwin_amd64.zip
unzip packer.zip

# Clone repo
git clone https://github.com/hashicorp/guides-configuration
cd guides-configuration/vault
# Set environment variables
# If you need help getting or creating AWS API keys, you can find more information in https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/
export AWS_ACCESS_KEY_ID=[YOUR ACCESS KEY HERE]
export AWS_SECRET_ACCESS_KEY=[YOUR KEY HERE]
export AWS_DEFAULT_REGION=[DESIRED REGION HERE]
export CONSUL_VERSION="1.7.1"
export VAULT_VERSION="1.3.2"
# You can use the enterprise URLs here if you prefer (you will need to add a license manually once deployed)
export VAULT_ENT_URL="https://releases.hashicorp.com/vault/1.3.2/vault_1.3.2_linux_amd64.zip"
export CONSUL_ENT_URL="https://releases.hashicorp.com/consul/1.7.1/consul_1.7.1_linux_amd64.zip"
# Release version of this image, just for reference
export RELEASE_VERSION="0.0.1"
packer build -only=amazon-ebs-ubuntu-16.04-systemd vault-aws.json
cd ../consul
packer build -only=amazon-ebs-ubuntu-16.04-systemd consul-aws.json
# This will run Packer twice, building two Ubuntu-based images, for Vault and Consul.

Note: this Packer code is from another project, provided here as a shortcut and reference. It will generate servers that have Vault running in dev mode. However, once we deploy our main Terraform code in the next section, the Vault servers will be configured to use Consul as the storage backend.

Setting Terraform Variables

Once we have the machine images created, we will now update the variables in the Terraform code.

In your terminal, execute:

git clone https://github.com/hashicorp/vault-guides.git
cd vault-guides/operations/benchmarking/terraform-aws-vault-benchmark
cp terraform.tfvars.example terraform.tfvars
export AWS_ACCESS_KEY_ID=[YOUR ACCESS KEY HERE]
export AWS_SECRET_ACCESS_KEY=[YOUR KEY HERE]
export AWS_DEFAULT_REGION=[DESIRED REGION HERE]

Update terraform.tfvars in your favorite text editor with the following (a tip on finding the AMI IDs created by Packer follows the listing):

owner="YOUR NAME HERE"
ttl="8"
region="us-east-1"
azs=["us-east-1a", "us-east-1b", "us-east-1c"]
env="DESIRED DEPLOYMENT NAME"
consul_ami="AMI CREATED WITH PACKER"
vault_ami="AMI CREATED WITH PACKER"
# You can change the instance sizes if you want. These follow the "Small Deployment" in the Vault Reference Architecture:

vault_instance_type="m5.large"
consul_instance_type="m5.xlarge"
telemetry_instance_type="m5.large"
benchmark_instance_type="m5.large"
envoy_instance_type="m5.large"
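
If you did not note the AMI IDs from the Packer output, you can look them up with the AWS CLI (a sketch; it assumes your AWS credentials are still exported and simply lists every AMI owned by your account):

# List your account's AMIs, oldest to newest, to find the Vault and Consul images
aws ec2 describe-images --owners self \
  --query 'sort_by(Images, &CreationDate)[].[ImageId,Name,CreationDate]' \
  --output table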

Executing Terraform

The easy part! On the terminal, execute:

terraform init
terraform plan
# Validate you are ok with the resources created
terraform apply

Initializing Vault

Once Terraform deploys the infrastructure, you will be able to connect to the Vault cluster to initialize it and get the root token. To connect, use the “envoy_http” URL output by Terraform:

...
I7nCxWrSxpB/vr/QVW4zjtCKxCGNvwGprAzfJ7QifkoklU6wG1JAV/VGWV1AUDaO
0Mf6NASS16crZu0=
-----END CERTIFICATE-----
envoy_http = vault-benchmark-stenio-envoy-3dcd3a58e7024136.elb.us-east-1.amazonaws.com:80
envoy_https = ...

Open that URL in your browser and the Vault UI will display the initialization screen.

You can enter “1” for both the key shares and key threshold fields and press “Initialize”.

The next screen will show your root token and the unseal/recovery key(s). In a real-world scenario, you would create an admin user and revoke the root token, but for this blog, simply store these safely for later use.

No additional commands will be needed to unseal Vault because this deployment is using KMS Auto Unseal.
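
If you prefer the CLI to the web UI, you can initialize from the bastion or benchmark server instead (a sketch; with KMS Auto Unseal the generated shares are recovery keys rather than unseal keys):

export VAULT_ADDR=https://vault.service.consul:8200
# Generates the root token and a single recovery key
vault operator init -recovery-shares=1 -recovery-threshold=1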

Configuring Grafana

Grafana has been deployed by Terraform and is configured to read data from InfluxDB, which receives it from Telegraf. However, we need to create dashboards to tell Grafana which information we want to visualize.

Grafana’s URL will be in the Terraform output:

...
envoy_https = vault-benchmark...
grafana = vault-benchmark-stenio-grafana-1921586461.us-east-1.elb.amazonaws.com:3000
key = -----BEGIN RSA PRIVATE KEY-----
MIIEpgIBAAK ...

When opening it in the browser, you will be asked for user credentials. Use “admin” for both the username and the password.

Once you log in, you will need to import the templates for the Vault, Consul, and Envoy dashboards.

Click on “Import”, and then on “Upload .json file”. The dashboard templates can be found in the folder vault-guides/operations/benchmarking/terraform-aws-vault-benchmark/grafana. Do this for vault.json, consul.json, and envoy.json.

Once you import these templates, you will be able to navigate between the dashboards by selecting them in the top left corner.

Setting Up Workloads

Now that Vault and Grafana are open for business, it is time to start sending requests!

Follow the instructions in the README to connect to the bastion host and, from there, to the benchmark server:

# Terraform created a pem file and placed it in this directory, so we add it to your ssh keys
ssh-add *.pem
ssh -A ubuntu@$(terraform output bastion)
# Success, we are in the bastion. To see the consul members, execute
consul members
# Connect to the benchmark server. The terraform code configured this bastion host with ssh forwarding. Thank you @lanceplarsen!
# (Replace the IP below with your benchmark server's private IP)
ssh 10.0.1.165

Running Benchmark Tests

Now that we are in the benchmark server, we can try it out by running the simple test we executed locally. Configure Vault and write multiple secrets by running the following on the benchmark server:

# Prepare environment variables:
export VAULT_ADDR=https://vault.service.consul:8200
export VAULT_TOKEN=YOUR_ROOT_TOKEN
# Prepare Vault for simple test
vault auth enable userpass
vault write auth/userpass/users/loadtester password=benchmark policies=default
# Since this is not in dev mode we shouldn't have anything in the "secret" path yet
vault secrets enable -path secret -version 1 kv
git clone https://github.com/hashicorp/vault-guides.git
cd vault-guides/operations/benchmarking/wrk-core-vault-operations/
# Concurrent write of random secrets, with:
# 6 concurrent threads
# 16 connections in total (shared across the threads)
# 20 seconds to run the test
# 10000 secrets to write
# Write benchmark results to file prod-test-write-1000-random-secrets-t6-c16-20sec.log
nohup wrk -t6 -c16 -d20s -H "X-Vault-Token: $VAULT_TOKEN" -s write-random-secrets.lua $VAULT_ADDR -- 10000 > prod-test-write-1000-random-secrets-t6-c16-20sec.log &
# Output I got as of 03/05/2020
cat prod-test-write-1000-random-secrets-t6-c16-20sec.log
Number of secrets is: 10000
thread 1 created
Number of secrets is: 10000
thread 2 created
Number of secrets is: 10000
thread 3 created
Number of secrets is: 10000
thread 4 created
Number of secrets is: 10000
thread 5 created
Number of secrets is: 10000
thread 6 created
Running 30s test @ https://vault.service.consul:8200
6 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 16.71ms 4.89ms 92.96ms 83.89%
Req/Sec 120.59 14.80 170.00 72.22%
21630 requests in 30.03s, 2.50MB read
Requests/sec: 720.32
Transfer/sec: 85.12KB
thread 1 made 3584 requests including 3584 writes and got 3581 responses
thread 2 made 3606 requests including 3606 writes and got 3605 responses
thread 3 made 3610 requests including 3610 writes and got 3608 responses
thread 4 made 3619 requests including 3619 writes and got 3617 responses
thread 5 made 3622 requests including 3622 writes and got 3620 responses
thread 6 made 3601 requests including 3601 writes and got 3599 responses

Now we can do another test writing and reading secrets:

# Write 1000 secrets:
wrk -t1 -c1 -d5m -H "X-Vault-Token: $VAULT_TOKEN" -s write-secrets.lua $VAULT_ADDR -- 1000
# Validate 1000th secret written:
vault read secret/read-test/secret-1000
# Benchmark reading 1000 secrets concurrently:
nohup wrk -t4 -c16 -d30s -H "X-Vault-Token: $VAULT_TOKEN" -s read-secrets.lua $VAULT_ADDR -- 1000 false > prod-test-read-1000-random-secrets-t4-c16-6hours.log &
# Output I got as of 03/05/2020
cat prod-test-read-1000-random-secrets-t4-c16-6hours.log
Number of secrets is: 1000
thread 1 created with print_secrets set to false
Number of secrets is: 1000
thread 2 created with print_secrets set to false
Number of secrets is: 1000
thread 3 created with print_secrets set to false
Number of secrets is: 1000
thread 4 created with print_secrets set to false
Running 30s test @ https://vault.service.consul:8200
4 threads and 16 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.76ms 4.09ms 207.32ms 88.49%
Req/Sec 465.15 39.43 590.00 66.58%
55589 requests in 30.02s, 22.95MB read
Requests/sec: 1851.97
Transfer/sec: 782.92KB
thread 1 made 13926 requests including 13926 reads and got 13921 responses
thread 2 made 13908 requests including 13908 reads and got 13904 responses
thread 3 made 13884 requests including 13884 reads and got 13880 responses
thread 4 made 13888 requests including 13888 reads and got 13884 responses

What Tests To Run

Note that the tests above used an existing Vault token ($VAULT_TOKEN) without exercising the authentication flow. Now is a great opportunity to think about the workloads you expect your Vault cluster to support — a few example questions to ask:

  • How many concurrent clients do you expect?
  • What authentication methods will you support?
  • How often will the clients authenticate?
  • What type of secrets will you support?
  • Are the requests only coming from the LAN (the benchmark server in our deployment) or also from the WAN (accessing the AWS load balancer from a remote computer)?
  • Will the differences between how wrk and wrk2 measure latency (wrk2 corrects for coordinated omission) impact your analysis, or is a rough latency estimate good enough?
  • Will infrastructure changes, such as moving the disks from burstable gp2 volumes to provisioned IOPS, provide significant benefits despite the increased cost?

Based on these answers, you can update the Lua scripts and the wrk command to match the desired behavior, tracking the real-time impact on the servers with Grafana and recording the results in the wrk log files.
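
For example, a rough way to approximate a mixed workload is to run the read and write scripts side by side with different thread counts and durations (a sketch reusing the scripts from this post; tune the numbers to match your answers above):

# Steady background reads for 30 minutes
nohup wrk -t2 -c8 -d30m -H "X-Vault-Token: $VAULT_TOKEN" -s read-secrets.lua $VAULT_ADDR -- 1000 false > mixed-reads-t2-c8-30min.log &
# A shorter burst of writes layered on top
nohup wrk -t4 -c16 -d5m -H "X-Vault-Token: $VAULT_TOKEN" -s write-secrets.lua $VAULT_ADDR -- 1000 > mixed-writes-t4-c16-5min.log &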

Performance Best Practices

To help come up with additional relevant questions, it is a good idea to understand a few workload behaviors that can push Vault to its limits:

Too many static secrets, users, policies, etc.: Vault stores static information in the storage backend (Consul, S3, Postgres, etc.), so the number of such objects is limited by the resources offered by that backend. This can be addressed by:

  • Understanding your workloads in advance, running benchmarks, and sizing your deployment adequately
  • Installing monitoring tools to alert once certain thresholds are reached (a quick way to spot-check Vault’s metrics is sketched after this list)
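
As a starting point for that kind of monitoring, you can spot-check Vault’s internal counters directly over its API (a sketch; the sys/metrics endpoint requires a token unless the listener is configured for unauthenticated metrics access):

# Pull a Prometheus-formatted snapshot of Vault's metrics
curl -s -H "X-Vault-Token: $VAULT_TOKEN" "$VAULT_ADDR/v1/sys/metrics?format=prometheus" | head -n 30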

Too many leases: Vault creates leases to track dynamic secrets and logins with valid TTLs; once a TTL expires, the corresponding lease is removed. If many requests create leases faster than the existing ones expire, the limit is again constrained by the resources of the storage backend. This can be addressed by:

  • Ensuring you set reasonable TTLs for dynamic secrets and auth methods
  • Leveraging lease renewal instead of creating new credentials every time (see the sketch after this list)
  • Installing monitoring tools to alert once certain thresholds are reached
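
For instance, TTLs on an auth method can be tightened with a tune command, and clients can renew an existing lease instead of requesting a fresh credential (a sketch; the TTL values and the lease ID are made up):

# Cap the token TTLs issued by the userpass auth method
vault auth tune -default-lease-ttl=1h -max-lease-ttl=4h userpass/
# Renew an existing lease rather than creating a brand-new credential
vault lease renew database/creds/readonly/EXAMPLE-LEASE-ID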

Too many transit requests / too many concurrent requests: The transit secrets engine (encryption as a service) is one of the most taxing workloads for a Vault server, since the server itself performs the encryption operations (a minimal transit setup is sketched after this list). Similarly, like any server accepting connections, Vault supports a finite number of concurrent client requests. Both are limited by the resources of the Vault server itself. This can be addressed by:

  • Understanding your workloads in advance, running benchmarks, and sizing your deployment adequately
  • Leveraging Vault Enterprise Performance Standbys
  • Installing monitoring tools to alert once certain limits are reached
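
To get a feel for the transit load specifically, you can enable the engine and drive encryption requests against a named key (a sketch; the key name and plaintext are made up):

# Enable the transit secrets engine and create a key
vault secrets enable transit
vault write -f transit/keys/benchmark
# Transit expects base64-encoded plaintext
vault write transit/encrypt/benchmark plaintext=$(echo -n "load test payload" | base64)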

Guide to Important Metrics

Additional information on telemetry metrics available for Vault and Consul can be found in the documentation (Vault here and Consul here) and in the Vault Cluster Monitoring Guide.

Other Benchmark Repositories

If you don’t like this reference deployment, or prefer using Python instead of Lua/wrk, here are a few other repositories to check out:

Summary

Hopefully this guide has been helpful as a reference on how to set up performance benchmarking and monitoring on Vault deployments. If you liked the monitoring setup with InfluxDB, Telegraf and Grafana, you can further expand it by setting up alerts as documented here.

Happy benchmarking!
