Vault debug

Pedro Coca
HashiCorp Solutions Engineering Blog
Dec 15, 2019

Could you please send me the “logs”?

We often need to gather data from a particular service or send that data to other people or teams. In our daily work we deal with Vault server logs, which contain information about internal Vault operations; with Vault audit trails, which include data about requests and responses; and with Vault telemetry, which shows how the different Vault subsystems are performing.

Gathering data from different places at different times complicates things: the resulting information is incomplete and inconsistent. As a result, triage, escalation and ultimately the resolution of configuration issues, bugs and so on might take more time.

Data about an initial Vault configuration at “t” can be gathered at t+1 with the CLI, at t+2 with the API, at t+3 from the telemetry, and again with the API at t+4. But that information might be affected by an event or configuration change that happens between the moments the samples were taken. In this case, between t+2 and t+3:

[Figure: evolution in time of a Vault interaction]

One of the features of Vault 1.3 is the ability to get a bundle of information that is consistent and complete with vault debug:

❯ vault debug
==> Starting debug capture...
Vault Address: http://127.0.0.1:8200
Client Version: 1.3.0
Duration: 2m0s
Interval: 30s
Metrics Interval: 10s
Targets: config, host, metrics, pprof, replication-status, server-status
Output: vault-debug-2019-12-14T19-24-22Z.tar.gz
==> Capturing static information...
2019-12-14T19:24:22.338Z [INFO] capturing configuration state
==> Capturing dynamic information...
2019-12-14T19:24:22.346Z [INFO] capturing metrics: count=0
2019-12-14T19:24:22.346Z [INFO] capturing pprof data: count=0
2019-12-14T19:24:22.346Z [INFO] capturing host information: count=0
2019-12-14T19:24:22.346Z [INFO] capturing server status: count=0

...
2019-12-14T19:26:22.345Z [INFO] capturing metrics: count=12
2019-12-14T19:26:22.345Z [INFO] capturing host information: count=4
2019-12-14T19:26:22.345Z [INFO] capturing server status: count=4
2019-12-14T19:26:22.345Z [INFO] capturing replication status: count=4
2019-12-14T19:26:22.590Z [INFO] capturing pprof data: count=4
Finished capturing information, bundling files...
Success! Bundle written to: vault-debug-2019-12-14T19-24-22Z.tar.gz

Vault debug lets you standardise on a debugging bundle, making consistent and complete data easier to consume. That leads to faster triage or faster escalation and, in both cases, faster resolution.

As a practical example, we will gather specific data about the replication process on our Vault clusters. It is important to know that some of the Vault debug targets require specific permissions to be queried. We will use a root token to avoid any permission issues and capture the replication-status target for 30 seconds at 5-second intervals:

vault debug -target replication-status -duration=30s -interval=5s -metrics-interval=5s -compress=false

This produces the following output:

==> Starting debug capture...
Vault Address: http://127.0.0.1:8200
Client Version: 1.3.0
Duration: 30s
Interval: 5s
Metrics Interval: 5s
Targets: replication-status
Output: vault-debug-2019-12-14T19-43-39Z
==> Capturing static information...
==> Capturing dynamic information...
2019-12-14T19:43:39.227Z [INFO] capturing replication status: count=0
2019-12-14T19:43:44.230Z [INFO] capturing replication status: count=1
2019-12-14T19:43:49.230Z [INFO] capturing replication status: count=2
2019-12-14T19:43:54.230Z [INFO] capturing replication status: count=3
2019-12-14T19:43:59.230Z [INFO] capturing replication status: count=4
2019-12-14T19:44:04.230Z [INFO] capturing replication status: count=5
2019-12-14T19:44:09.227Z [INFO] capturing replication status: count=6
Finished capturing information, bundling files...
Success! Bundle written to: vault-debug-2019-12-14T19-43-39Z
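
The count values in the log lines suggest a simple rule: one capture at the start, then one more per elapsed interval until the duration is reached. The sketch below encodes that rule so you can sanity-check a bundle before digging into it. The rule is inferred from the two runs shown in this post rather than taken from Vault's documentation, and the function name expected_captures is made up for illustration.

```python
def expected_captures(duration_s: int, interval_s: int) -> int:
    """Return how many samples a polled target should contain.

    Assumption (inferred from the logs in this post, not from Vault docs):
    a sample is taken at t=0 and then once per interval, up to and
    including the moment the duration ends.
    """
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return duration_s // interval_s + 1


# Second run in this post: 30s duration, 5s interval, counts 0 through 6.
print(expected_captures(30, 5))
# First run in this post: 2m duration with a 10s metrics interval, counts 0 through 12.
print(expected_captures(120, 10))
```

If a bundle holds fewer entries than this for a given target, the capture was probably interrupted or the target could not be queried, which is worth knowing before drawing conclusions from the data.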

Once the capture finishes, we can inspect the replication status with a simple cat of the relevant file in the debug output:

❯ cat ./vault-debug-2019-12-14T19-43-39Z/replication_status.json
[
{
"mode": "disabled",
"timestamp": "2019-12-14T19:43:39.227865Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:43:44.232115Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:43:49.23361Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:43:54.23268Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:43:59.232733Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:44:04.232732Z"
},
{
"mode": "disabled",
"timestamp": "2019-12-14T19:44:09.229048Z"
}
]

As simple as that!
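
For longer captures, reading the file by eye gets tedious. Here is a minimal Python sketch that summarises it, assuming the file is a JSON array of objects with "mode" and "timestamp" keys exactly as in the output above. The helper names parse_ts and summarize are hypothetical, and other targets in the bundle may use a different layout.

```python
import json
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2019-12-14T19:43:49.23361Z."""
    # The sample shows a variable number of fractional digits; also accept
    # a timestamp with no fractional part at all, just in case.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {value!r}")


def summarize(samples: list) -> dict:
    """Report the sample count, distinct modes, and gaps between samples."""
    times = [parse_ts(s["timestamp"]) for s in samples]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "samples": len(samples),
        "modes": sorted({s["mode"] for s in samples}),
        "gaps_seconds": gaps,
    }


# Two entries copied from the output above, used as inline sample data.
example = json.loads("""
[
  {"mode": "disabled", "timestamp": "2019-12-14T19:43:39.227865Z"},
  {"mode": "disabled", "timestamp": "2019-12-14T19:43:44.232115Z"}
]
""")
print(summarize(example))
```

To run it against a real bundle, load the file with json.load and pass the resulting list to summarize. More than one entry in "modes" means the replication mode changed during the capture window, which is exactly the kind of event that scattered manual samples tend to miss.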
