Apply some general system hardening to your platforms to protect your Vault secrets from attackers.

How I’d attack your HashiCorp Vault (and how you can prevent me): System Hardening.

Published in HashiCorp Solutions Engineering Blog · 14 min read · Mar 3, 2020

System hardening for Vault

As customers put a lot of trust in their HashiCorp Vault installations, it’s important to think about good old system hardening guidelines. Why do security guidelines recommend installing on VMs instead of Kubernetes? What are the chances an external or internal attack could compromise your Vault? I’ll put on my white hat 🤠 and show you exactly what I would try if I were trying to compromise your Vault from either an external connection or an internal vector within your DevSecOps team. You can use the information to help prevent anyone from getting through. These examples are not exhaustive — these are just some basic examples of system hardening and simple prevention steps.

As an attacker, my ultimate goal would be to get either a root token or a master/encryption key from your Vault. A root token would give me admin access to everything in your Vault, including all policies, Sentinel policies, and all namespaces. The first obvious defense is to revoke your root token in production. If you need a root token later, you can always generate a new one with a quorum of your unseal or recovery keys. Assuming you’ve revoked that root token as you should, I will attempt one of two options:

Vault server attack.
Critical communications attack (middleman).

For reasons we’ll get into, the installation method matters significantly. Vault is officially shipped as a zipped standalone binary or official Docker Hub image but for comparison, I will be demonstrating with both the zip distribution and the community Yum repo for Linux I maintain here: https://copr.fedorainfracloud.org/coprs/boeroboy/hashicorp/

Option 1: Compromising a Vault Server

General Linux system hardening guidelines have been established for quite a while but are rarely applied. A simple scan of a default RHEL/CentOS/Amazon Linux install using OpenSCAP and SCAP Workbench shows how many red flags are actually present against the PCI DSS profile. If attackers assume that a HashiCorp user is running on a default install, a scan like this gives them quite a bit of insight into a server’s potential weaknesses.

SCAP Workbench is a super handy and free tool that isn’t well-publicized.

Here we see a simple RHEL 8 PCI DSS compliance scan against a fresh default install of CentOS 8 or RHEL 8. There is plenty of red to behold. A few years back Anaconda got the brilliant option to pre-install an OpenSCAP profile compliant OS but it doesn’t always work and most people tend to skip it. Some of these items are simple to enforce, such as PAM policies for expiring, rotating, or disabling local passwords. Other policies are harder to enforce. Luckily, OpenSCAP profiles specify a remediation script that can attempt to fix these violations for you.

Yep, this is the default EL8 scan result. Do you feel dirty yet?

Some of my favorite tests involve RPM package integrity. If your Vault has been installed with Terraform or a manual unzip and custom SystemD units, there is no guarantee your Vault binary is untampered unless you compare signed checksums. If this isn’t the case, a simple commit to your CI pipelines can result in a Vault being deployed (or redeployed) with a compromised binary. No OS integrity checks will verify a zipped binary you’ve dropped into your system. It’s up to deployers to manually verify checksums against our signed checksum list. This is a major risk for any production Linux deployment and the reason that Yum and RPM packages have the option to sign and verify with an org’s GPG keys. Not only are RPMs signed and verified but they also contain metadata about which files are provided by the packages and what their checksums, ownership, timestamps, etc. were at time of packaging.
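The manual check described above can be sketched roughly as follows. This is a minimal illustration, assuming GNU sha256sum is available and that the release zip and its checksum list have already been downloaded. The function name verify_release is hypothetical, not part of any HashiCorp tooling.

```shell
# verify_release ZIP SUMS  (hypothetical helper name)
# Succeeds only if the SHA-256 digest of ZIP matches its entry in SUMS,
# where SUMS uses the usual "<hash>  <filename>" layout.
verify_release() {
    zip_path=$1
    sums_path=$2
    zip_name=$(basename "$zip_path")
    # Pick the expected hash for this file out of the checksum list.
    expected=$(awk -v name="$zip_name" '$2 == name { print $1 }' "$sums_path")
    if [ -z "$expected" ]; then
        echo "no checksum listed for $zip_name" >&2
        return 2
    fi
    actual=$(sha256sum "$zip_path" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

A checksum match only proves something if the checksum list itself is trusted, so the list's detached GPG signature should be checked first (for example with gpg --verify against the publisher's public key) before calling a helper like this from a pipeline.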

If you’re using our ZIP distribution with your own SystemD units and config, it’s much easier to completely own your Vault with a minor commit to your Terraform or CD pipeline. A custom build of Vault can do anything from leak keys and secrets to corrupt and hold your data hostage. Immediately these options come to mind:

Changing the SystemD unit to run Vault service with different permissions or config.
Switching the download URL to my own compromised Vault build gives me everything I need with no system flags on nefarious behavior.
If you’re using Consul or other network storage, changing the process user has no effect on storage access, since TCP/HTTP has no concept of user permissions or ownership beyond app-level session tokens and ACLs. If you’re using Raft or filesystem storage, I may not be able to access the storage path as a different user, which gives Raft and file storage a slight security edge.
If attackers have root access, or something like Ansible tooling running as root, they can potentially accomplish all of the above. That is a no-brainer and the reason nobody should get root.

Let’s assume you are using valid secure software repos and packaging for your deployment. If attackers try either of the first two options above, they run a huge risk of being detected. Most Linux estates have a hardening guideline that is enforced and regularly monitored. Scheduled OpenSCAP profile checks can make sure nobody is installing unapproved software within your platform. I’ll use the COPR secure yum repo as an example on the same RHEL 8 VM I used to install and break a single-node Vault instance. This approach rests on the assumption that compromising any node within a Vault cluster is catastrophic to the entire cluster.

Imperative: Installing Vault securely via repo

Software packaging and repository options are currently in planning for HashiCorp Engineering [UPDATE: Available now]. Some customers roll their own packages from our official ZIP releases. Other community members have even created Yum repositories for distributing packages with automatic updates. There is one thing you should watch for on any repository, as required by the PCI compliance profile: the gpgcheck=1 flag must be set on every repo. It’s one thing to publish a repo of public RPMs, but it’s another step to associate a set of signing keys with that repo and to sign all packages with it. Windows MSI packages use the same concept, which is why users occasionally see warnings about an “untrusted publisher” with Java or other installers.
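As a rough sketch of that gpgcheck audit, the helper below lists repo files containing at least one section that does not set gpgcheck=1 explicitly. The function name unsigned_repos is hypothetical, and the check is deliberately stricter than dnf itself, which lets a repo inherit gpgcheck from the [main] section of its global configuration.

```shell
# unsigned_repos DIR  (hypothetical helper name)
# Prints every *.repo file in DIR that contains a section without an
# explicit gpgcheck=1 line. Inherited defaults are NOT considered.
unsigned_repos() {
    for repo_file in "$1"/*.repo; do
        [ -f "$repo_file" ] || continue
        awk '
            # A new [section] starts: judge the previous one, then reset.
            /^\[/ { if (seen && !ok) bad = 1; seen = 1; ok = 0 }
            /^[ \t]*gpgcheck[ \t]*=[ \t]*1[ \t]*$/ { ok = 1 }
            END { if (seen && !ok) bad = 1; exit(bad ? 0 : 1) }
        ' "$repo_file" && echo "$repo_file"
    done
    return 0
}
```

On an Enterprise Linux host this would typically be pointed at /etc/yum.repos.d, and any file it prints deserves a closer look before packages from that repo are trusted.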

By default, all Enterprise Linux distributions ship with secure repos and the default installation passes this test. Unfortunately, even some well-known hardware vendors ship critical system software with unsigned packages and this test often gets overlooked. Luckily the Fedora COPR (Cool Other Package Repo) build system automatically builds signed packages using a private key that it never even shows me. Access to the build system is protected via SAML credentials. This way the packages I’ll be using are signed and will pass an audit or integrity scan. Personally I’d remove any unsigned packages from any critical system.

Each rule in an OpenSCAP profile includes explanation, how it’s checked, and remediation.

If unsigned packages are allowed, anybody can just repackage a custom-built binary or config and distribute it to your servers with no guarantee that it is a HashiCorp build. Also some installations run as non-root users or groups. Combine that with a default umask and oftentimes the Vault binary is user or group writable, where anybody in an attacking group can quietly overwrite the binary or replace it with a script to inject whatever they want into your SystemD service. The binary must always be root-owned with no write permissions. So let’s install securely using the RPM which takes care of all of this for us.
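The ownership rule above can be spot-checked with something like the following sketch. It assumes GNU stat and find, and the function name binary_is_locked_down is hypothetical. It also rejects symlinks, since a link can point somewhere writable.

```shell
# binary_is_locked_down PATH [UID]  (hypothetical helper name)
# Succeeds when PATH is a regular file (not a symlink), owned by UID
# (default 0, i.e. root), and not writable by group or others.
binary_is_locked_down() {
    bin_path=$1
    want_uid=${2:-0}
    [ -f "$bin_path" ] && [ ! -L "$bin_path" ] || return 1
    [ "$(stat -c '%u' "$bin_path")" = "$want_uid" ] || return 1
    # find prints the path only if a group- or world-write bit is set.
    [ -z "$(find "$bin_path" -maxdepth 0 -perm /022)" ]
}
```

For example, binary_is_locked_down /usr/bin/vault || echo "vault binary needs attention" could run from a scheduled job. The path /usr/bin/vault is an assumption about where the package installs the binary.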

Importing the repository’s public key and installing Vault only after verifying it.

If attackers have manipulated your binary, it won’t install because GPG verification fails. This is a good thing for you as it prevents them from mangling your delivery stream and updates. Beyond protecting against delivery tampering, RPM packaging brings some secondary benefits. I can easily tell if anybody has touched or modified a binary or config file:

Help! Someone touched Vault! The modify timestamp has changed from the signed packaging and the systemd unit has been altered too! Note that config files are marked with a “c” next to them; they are usually expected to be modified, so they are omitted from compliance checks. At this point, the entire system has potentially fallen out of PCI compliance. Someone should call forensics and run yum reinstall vault.
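To make that reading of the verification output concrete, here is a small sketch that filters rpm -V style output on stdin and keeps only entries that are not marked as config files. The function name flag_rpm_changes is hypothetical, the sample format is the standard rpm verify layout (flags, optional attribute marker, path), and paths containing spaces are not handled.

```shell
# flag_rpm_changes  (hypothetical helper name)
# Reads `rpm -V` output on stdin and prints the paths of modified or
# missing files, skipping entries marked "c" (config files).
flag_rpm_changes() {
    awk '
        NF == 2              { print $2 }   # no attribute marker present
        NF == 3 && $2 != "c" { print $3 }   # marker other than config
    '
}
```

A typical invocation would be rpm -V vault | flag_rpm_changes, where the package name vault is an assumption. Any path it prints is a packaged, non-config file that no longer matches what was signed.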

Touching Vault won’t really do anything but the much more obvious compromise is to replace Vault with a custom build that compromises keys. The following attack is much more serious, transparently replacing the Vault binary with a symlink to somewhere world-writable:

Regular system scans and OpenSCAP profiles verify all packages to flag this kind of warning. Obviously I’ve demonstrated with root access which is a given but it may be possible to edit Vault configuration without root using the CI/CD pipeline or sneaky DevOps code commits. A Terraform heredoc can be off by one char and completely derail your organization. OpenSCAP’s PCI DSS profile scans all RPM files for integrity and is often scheduled regularly.

Vault Configuration

What about configuration files? Odds are admins won’t have root access but will have some way to modify Vault config files. These files provide the arguments and environment variables to my Vault service including the commands for how to run everything. Take my second favorite option when I don’t have root but I am in the operations team. I may not be able to manipulate your systemd unit but I can sure as heck read it for the important lines:

Combined with your vault.hcl config, I can find your storage option. If you’re using local files or Raft, I can potentially compromise permissions on that filesystem path if it’s owned by the Vault service user and I can get your service process to run extra commands.

This is where systemd novices can make or break you. What type of systemd unit is specified to start your service? The default type (simple) is fine: as long as the command starts with the full path to your trusted Vault binary, the unit starts nothing but Vault. Other types may open up vulnerabilities if they’re not carefully written. Type “oneshot” allows multiple commands to be run, including scripts intended to run as the Vault user. Type “forking” may allow intermediate processes to be forked instead of Vault, so please be careful. The best way to prevent your Vault service user from changing permissions or group ownership on a filesystem is with SELinux contexts, so definitely DON’T DISABLE SELinux or AppArmor.
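A quick review of a unit file along those lines might look like the sketch below. The function name unit_looks_safe is hypothetical, the rules are only the ones named in the paragraph above (no oneshot or forking, every start command must begin with the trusted binary), and drop-in override files are not considered.

```shell
# unit_looks_safe UNIT_FILE TRUSTED_BINARY  (hypothetical helper name)
# Fails if the unit uses Type=oneshot or Type=forking, has no start
# command, or has any ExecStartPre/ExecStart/ExecStartPost line whose
# first word is not exactly TRUSTED_BINARY.
unit_looks_safe() {
    awk -v bin="$2" '
        /^Type[ \t]*=/ {
            type = $0
            sub(/^[^=]*=[ \t]*/, "", type)
        }
        /^Exec(StartPre|Start|StartPost)[ \t]*=/ {
            line = $0
            sub(/^[^=]*=[ \t]*/, "", line)
            split(line, words, /[ \t]+/)
            if (words[1] != bin) bad = 1
            execs++
        }
        END {
            if (type == "oneshot" || type == "forking") bad = 1
            if (execs == 0) bad = 1
            exit(bad ? 1 : 0)
        }
    ' "$1"
}
```

Because the comparison is exact, a start command carrying a systemd prefix such as "+" or "-" is also flagged, which seems like a reasonable default for a unit that should do one thing only.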

[UPDATE 10/FEB/2021] There is now a dedicated SELinux policy and guide published here: https://www.hashicorp.com/blog/hardening-hashicorp-vault-with-selinux

Use our SELinux policy for a solid Enforcing strategy.

sudo dnf install https://github.com/hashicorp/vault-selinux-policies/releases/download/v0.1.5/vault_selinux-0.1.5-1.fc32.noarch.rpm

Check the repo out for releases or yum install your release directly as above.

Kernel and FIPS

If you’re tied to FIPS compliance, know that FIPS is a lot more complex than a simple automated scan. FIPS is an extended set of requirements that an application or system must adhere to, including storage, encryption methods, etc. Running a kernel in FIPS mode is wise but is only part of a compliant plan. The Linux kernel parameter “fips=1” ensures that the kernel does not allow encryption standards that are banned by the FIPS standards. Note that this applies only within the kernel itself. A kernel running in FIPS mode can still run proprietary application code that makes use of older encryption stacks. Be sure to check whether your release supports FIPS mode: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/considerations_in_adopting_rhel_8/security_considerations-in-adopting-rhel-8
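Whether the running kernel was actually booted in FIPS mode can be read back from procfs. The sketch below wraps that check. The function name kernel_fips_enabled is hypothetical, while /proc/sys/crypto/fips_enabled is the kernel's own flag file.

```shell
# kernel_fips_enabled [FLAG_FILE]  (hypothetical helper name)
# Succeeds when the kernel reports FIPS mode. FLAG_FILE defaults to the
# kernel's procfs flag and is overridable only to ease testing.
kernel_fips_enabled() {
    flag_file=${1:-/proc/sys/crypto/fips_enabled}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}
```

As the paragraph above says, a positive result here covers the kernel only and says nothing about the crypto used by applications running on top of it.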

Vault is audited for FIPS 140-2 compliance for data in transit and at rest, but a weak or older kernel may provide an easier attack vector at the system level.

Hostage

If attackers compromise your config, they can potentially gain read or write access to your storage. Sure, your data is encrypted at rest, but that won’t prevent them from moving or encrypting your storage and holding that data for ransom. Understand that encryption at rest isn’t a safety guarantee! The best defense against ransomware on your storage is to have multiple replicas in your cluster and, ideally, DR replication to a separate cluster managed by a different local team. Always keep and test backups too!

Listeners

If attackers have full control of your config they can also add a local listener with or without TLS. If they spin up a new listener without encryption, they can trick users into using that new endpoint via an encrypted proxy, or by modifying or adding Consul service definitions for Vault. Attackers could then listen on the inside to all bidirectional traffic while a user still thinks they are on valid TLS, just on a different port. Be careful to always lock down your established Vault endpoint with TLS!
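One cheap guard against a quietly added plaintext listener is to scan the config for tls_disable being switched on. This is a minimal sketch that assumes an HCL config with one setting per line. The function name plaintext_listeners is hypothetical and JSON-format configs are not handled.

```shell
# plaintext_listeners CONFIG  (hypothetical helper name)
# Prints "line-number:text" for every line that enables tls_disable and
# succeeds only if at least one such line exists.
plaintext_listeners() {
    grep -nE '^[[:space:]]*tls_disable[[:space:]]*=[[:space:]]*"?(1|true)"?' "$1"
}
```

Run against the live config from a scheduled job or a pipeline gate, any output is a reason to ask who added that listener and why.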

Plugins

If attackers have control over your config they are probably able to set the plugin directory. A simple line of plugin_directory = "/tmp" points Vault at a world-writable directory, so anyone can drop a custom plugin there and possibly run arbitrary code within the Vault process. In this case, the attacker may choose to change permissions on any system resources owned by the Vault user. It also allows read access to any of Vault’s plugin internals, as described here: https://www.vaultproject.io/docs/internals/plugins/
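The plugin directory risk can be checked mechanically. The sketch below pulls plugin_directory out of an HCL config and fails if the directory is writable by group or others. The function name plugin_dir_is_safe is hypothetical, GNU find is assumed, and only a simple quoted single-line setting is recognized.

```shell
# plugin_dir_is_safe CONFIG  (hypothetical helper name)
# Succeeds if CONFIG sets no plugin_directory, or if the configured
# directory exists and is not group- or world-writable.
plugin_dir_is_safe() {
    plugin_dir=$(sed -n 's/^[[:space:]]*plugin_directory[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
    # No plugin_directory configured: nothing to check.
    [ -n "$plugin_dir" ] || return 0
    [ -d "$plugin_dir" ] || return 1
    # find prints the directory only if group or others can write to it.
    [ -z "$(find "$plugin_dir" -maxdepth 0 -perm /022)" ]
}
```

With plugin_directory = "/tmp" this fails immediately, since /tmp is world-writable on any ordinary Linux install.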

Docker / Kubelet

A lot of people are using Kubernetes or OpenShift to install Vault these days. It’s a super simple way to get a cluster up and running. The ultimate warning is whoever has control over kubectl can potentially execute arbitrary commands within your pods, and edit ConfigMaps. Also, there are plenty of unverified Docker images out in the wild, so be sure to always use HashiCorp’s official images. If someone has direct access to your Kubelets or even kubectl exec within your Vault pods, your Vault can be vulnerable to all of the above.

Memory Attack

None of the attacks above have compromised the actual master or encryption keys used by Vault to store secrets at rest. They have relied on a running unsealed Vault to give up its information. What if an attacker is able to access your keys directly? Then they could potentially decrypt your Vault data from raw storage even without having a Vault instance or unseal key to worry about.

When a host is compromised by someone as root or wheel group they can typically list all running processes on a machine including its cgroups, pods, and containers. Vault already runs with mlock so that nothing critical can be swapped to disk, but root can still access all memory in a host. Often you hear about people accessing raw memory but how is that done exactly? It depends on the kernel you’re running but in older kernels /dev/mem was a root-readable device where you could access all memory in a host. That path has since become protected by a recent kernel patch for obvious reasons, but certain enterprise kernels downstream may not have this patched. Rather than scan through all memory in a system, the /proc/$PID filesystem narrows it down to make things simple for us:
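As an illustration of how little effort that narrowing takes, the sketch below filters the contents of a /proc/$PID/maps file down to writable mappings, which is where heap data lives. The function name writable_ranges is hypothetical. It only reads the map listing from stdin and does not touch process memory itself.

```shell
# writable_ranges  (hypothetical helper name)
# Reads /proc/<pid>/maps content on stdin and prints the address range
# and backing path (or "[anonymous]") of every writable mapping.
writable_ranges() {
    awk '$2 ~ /^.w/ { print $1, ($6 == "" ? "[anonymous]" : $6) }'
}
```

An attacker with root would feed it something like the output of cat /proc/$PID/maps for the Vault process. For defenders, the point is that finding the interesting ranges is a one-liner.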

Here we’ve listed all of the memory ranges currently reserved by Vault. You must be root to access these process maps even if your Vault service is running as a different user. Now it’s simple enough to use gdb to dump the contents of each range to disk for searching. In fact, here is a widely used script to dump process memory to disk for troubleshooting:

When run as root this will dump any process memory to disk which I can browse for keys at will. This includes all containers on the machine! If your Vault has been initialized recently, I can pick out your master key with a simple grep. If attackers wanted to compromise your Vault early and lurk undetected for a long term intrusion, this is the ideal way to lie in wait.

[EDIT] Hibernation Attack

After publishing this article I realized I had forgotten one of my favourite attacks, and one that doesn’t even take root. Use extreme caution with hibernation, aka ACPI state S4. Does your tin/hypervisor/cloud support hibernation? If you don’t know, you should check. Hibernation can be very handy and isn’t a security issue by itself. This state goes beyond sleep/suspend in that it writes the contents of memory to swap on disk so you can power a machine completely off without losing memory. It speeds up boot while saving energy. It can also obliterate any protections you set up with mlock. Depending on your OS and polkit policies, you may not even need root to hibernate a machine. If your machine has no swap, you are safe and hibernate is not an option. If your admin team can create a swapfile, you still run a risk.

$ systemctl hibernate

There you have it. One command run from the wheel group hibernates a machine, writes your Vault memory to disk in plain text, and shuts down the machine to state S4. Now all I need to do is pull the disk or image, read that swap file, and I’ve got your keys. Hibernate can be enabled in KVM and thus some of the cloud providers support it. I can reproduce this on KVM and GCP but I haven’t tried with others. I don’t think cloud providers or hypervisors should disable S4 globally, but it may be wise to offer an option to disable it within each VM.
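Since the attack depends on swap being present, a simple first check is whether the host has any active swap at all. The sketch below reads the kernel's swap table. The function name active_swap is hypothetical, while /proc/swaps is the standard procfs file.

```shell
# active_swap [SWAPS_FILE]  (hypothetical helper name)
# Prints each active swap device or file and succeeds if there is at
# least one. SWAPS_FILE defaults to /proc/swaps (first line is a header).
active_swap() {
    awk 'NR > 1 { print $1; found = 1 } END { exit(found ? 0 : 1) }' "${1:-/proc/swaps}"
}
```

Where swap cannot be removed, one common mitigation is to mask the systemd hibernate targets (for example systemctl mask hibernate.target hybrid-sleep.target). Treat that as a suggestion to evaluate for your platform rather than a complete fix.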

Option 1 Summary

The config file and systemd unit are critical to your Vault security. Anything that’s a file should be locked down with both filesystem permissions and AppArmor or SELinux contexts. This includes config and file or Raft storage. All binaries should be owned by root and come from signed packaging. Definitely establish a secure platform with a tuned OpenSCAP profile and don’t run Vault without SELinux enabled. Clustering makes manipulation attacks difficult, as all nodes in a cluster would need to have their storage or config compromised at once. On the other hand, compromising one node would be a sneaky way to listen undetected to Vault operations.

Option 2: Traffic sniffing

If attackers are able to execute a man-in-the-middle attack, there is obvious potential to compromise tokens, secrets, and authentication credentials. There are three different traffic streams that can be compromised:

Client-Cluster traffic.
Server-server traffic (intra-cluster).
Cluster-Cluster replication traffic.

If attackers can manipulate your replication traffic, they can potentially corrupt your replica and ruin a DR failover. Always use valid TLS and no proxy. Make sure non-TLS connections are disabled. Ideally, set an SELinux policy that Vault can only listen publicly on port 8200.

A lot of enterprise orgs use TLS inspectors but Vault is definitely not something you want to access through a TLS inspector or packet shaper. Make sure to add exceptions where you can.

Conclusion

The simplest way to have your Vault compromised is to allow root on an environment. Direct access, especially root access, can compromise secrets that aren’t even accessible by the Vault API. Disable even SSH if possible. Network attacks are possible too but offer fewer options than a direct server or cluster attack. Ensure all possible hardening guidelines are followed and all privileged access is revoked during production usage. Attack vectors are easily missed in CI/CD pipelines, so be careful to inspect all pull requests closely.