Creating Robust SSH with HashiCorp Vault Certificates
Secure shell (SSH) is a standard way of accessing server infrastructure in the IT industry. By contrast, no single standard solution exists for managing users’ SSH access in an automated and secure manner. Let us explore how we tackled this challenge at GoodData.
Our Requirements
Given the diverse and globally distributed platform infrastructure we operate at GoodData, secure shell (SSH) is a core way of accessing our systems in order to perform necessary operations and debugging. It is therefore vital for us to have an SSH authorization method that is:
- secure in accordance with the latest industry standards;
- robust in its performance; and
- easy to use for our employees.
After relying on FreeIPA for SSH access management for several years, we made a strategic decision in late 2020 to switch to Okta as the company-wide identity management solution. This transition gave us an opportunity to refresh our access control setup and bring it in line with current security standards.
Searching for a Solution
We identified insufficient scalability as the primary shortcoming of our previous SSH setup. With FreeIPA, every SSH login attempt requires the user’s public SSH key to be verified against the FreeIPA server. This made FreeIPA a single point of failure: whenever our FreeIPA servers were unavailable because of an outage or maintenance, no user could log in to the target machines.
We set out to find and implement a solution that would avoid this pitfall; one that would be robust enough to support daily operations on the scale we require, i.e. managing access to hundreds of servers distributed across the globe.
Removing long-lived SSH credentials from the ecosystem was another important goal. Static SSH keys stored on employees’ devices carried an inherent security risk, as they could be stolen and used to access our infrastructure. While we had partially mitigated this risk using multi-factor authentication, we were aiming for a solution that would rely only on short-lived, dynamic credentials. With such credentials, a leak poses far less risk, because the leaked credential expires very quickly.
After extensive research, we selected HashiCorp Vault to provide SSH access management via its dynamic SSH certificates engine.
Technical Design
Since Vault had already been present in our infrastructure as a store for static key-value credentials, extending its role by adding the SSH certificate signing backend was fairly straightforward.
User Authentication
For user access, Vault has been integrated with Okta using the OIDC authentication method. We defined several user login roles corresponding to varying levels of access granted to distinct groups of users; each role is tied to a specific OIDC claim value provided from Okta. Every Vault login role, in turn, allows signing SSH certificates by a corresponding SSH backend role. The SSH certificate key ID configuration ensures that user role information is preserved in the resulting signed certificate.
For example, if a user first.last belongs to group1, they can log in to Vault using login role group1. Vault policy will also allow them to sign a certificate using SSH role group1, and the resulting certificate’s key ID will be okta-first.last:group1.
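To make the signing step more concrete, here is a minimal sketch of the request a client could send to Vault’s SSH secrets engine once the user holds a Vault token. The mount path ssh, the helper names, and the example address are assumptions for illustration, not details from our setup. The key ID format shown would correspond to a role configured with a template such as {{token_display_name}}:{{role_name}}, which is likewise an assumption about how the key ID is produced.

```python
import json
import urllib.request


def build_sign_request(vault_addr, token, role, public_key, mount="ssh"):
    """Build (but do not send) a request asking Vault to sign an SSH public key.

    `mount` is a hypothetical mount path for the SSH secrets engine.
    """
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/sign/{role}"
    body = json.dumps({"public_key": public_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
    )


def expected_key_id(token_display_name, role):
    """Key ID under the assumed '<token display name>:<role>' format."""
    return f"{token_display_name}:{role}"


# Illustration only: the address, token and key below are placeholders.
request = build_sign_request(
    "https://vault.example.com/", "example-token", "group1", "ssh-ed25519 AAAA..."
)
print(request.get_method(), request.full_url)
print(expected_key_id("okta-first.last", "group1"))
```

Sending the request (for example with urllib.request.urlopen) would return a JSON document whose data.signed_key field holds the certificate to save next to the private key.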
SSH User Provisioning
In the absence of a centralized SSH authorization server, a challenge remains: how to determine which users are allowed to connect to a specific server? To resolve this, we defined two distinct SSH login scenarios that we need to support:
- login to externally exposed jump stations; and
- further hops to internally accessible servers.
On jump stations, we require users to log in under a personal non-privileged account, in order to provide a safe environment for accessing systems with personal credentials (such as AWS IAM Identity Center logins or reading secrets from Vault itself). Therefore, a simple Python script runs periodically on the jump station servers to fetch a list of relevant users from the Okta API and create a system account for each of them.
On all other servers, a single shared user account is sufficient for all users logging in via SSH; therefore, no periodic user synchronization from Okta is required.
Splitting the use cases this way limits the need to deliver Okta credentials to a handful of jump station nodes, while preserving the level of user access separation we require.
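The periodic synchronization on jump stations could be sketched roughly as follows. This is not our actual script: the function names are hypothetical, the list of Okta logins is taken as an input rather than fetched from the Okta API, and the sketch only covers account creation.

```python
import pwd


def existing_local_accounts():
    """Names of all accounts currently known to the system."""
    return {entry.pw_name for entry in pwd.getpwall()}


def plan_new_accounts(okta_logins, existing_accounts):
    """Return the logins that still need a local account, sorted and de-duplicated."""
    return sorted(set(okta_logins) - set(existing_accounts))


def useradd_command(login):
    """Argument list that would create a non-privileged account with a home directory.

    Passing a list (not a shell string) to subprocess.run avoids shell injection.
    """
    return ["useradd", "--create-home", "--shell", "/bin/bash", login]


# Dry run with made-up data: print what would be executed.
for login in plan_new_accounts(["first.last", "jane.doe"], {"root", "jane.doe"}):
    print(" ".join(useradd_command(login)))
```

A real run would pass existing_local_accounts() as the second argument and execute each command with subprocess.run(..., check=True) as root.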
User Authorization
The process of authorizing users utilizes the sshd AuthorizedPrincipalsCommand directive; three pieces of setup are pre-delivered by our configuration management tool to facilitate this:
- the Vault SSH certificate authority public key;
- a list of Vault SSH roles allowed to log in to a given server; and
- an authorization script to be called upon user login.
When a user attempts to log in to a server, sshd calls AuthorizedPrincipalsCommand, passing the %i (certificate’s key ID) and %u (username) parameters to our authorization script. The script parses the user’s Vault role from the key ID (making use of its fixed format described above) and compares it against the list of roles allowed to log in. On jump stations, it additionally verifies that the user is logging in under their own personal account name. A second-factor verification is also hooked into the login process on jump stations.
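The decision logic of such an authorization script might look like the sketch below. It is an illustration under assumptions, not our production script: the function names, the KEY_ID_PREFIX constant, and the jump_station flag are hypothetical, and the key ID is assumed to follow the okta-first.last:group1 format from the earlier example.

```python
KEY_ID_PREFIX = "okta-"  # assumed prefix, taken from the example key ID


def parse_key_id(key_id):
    """Split 'okta-first.last:group1' into ('first.last', 'group1').

    Returns None when the key ID does not match the expected format.
    """
    identity, separator, role = key_id.rpartition(":")
    if not separator or not role or not identity.startswith(KEY_ID_PREFIX):
        return None
    user = identity[len(KEY_ID_PREFIX):]
    if not user:
        return None
    return user, role


def is_authorized(key_id, username, allowed_roles, jump_station=False):
    """Decide whether the certificate holder may log in as `username`."""
    parsed = parse_key_id(key_id)
    if parsed is None:
        return False
    cert_user, role = parsed
    if role not in allowed_roles:
        return False
    # On jump stations the account name must match the certificate owner.
    if jump_station and cert_user != username:
        return False
    return True


print(is_authorized("okta-first.last:group1", "first.last", {"group1"}, jump_station=True))
```

In a complete script, the key ID and username would come from the command-line arguments that sshd fills in for %i and %u. On success the script would print the principals sshd should accept, and on failure it would print nothing; which principals those are depends on how the certificates are issued and is left out here.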
Benefits and Lessons
Using Vault’s SSH certificates engine, we have been able to transition to a much more robust and secure login system than before. Users across the company appreciate the system’s stability; since authorization is done with information statically available to SSH servers and no live communication with Vault is needed at login time, user experience is much smoother than previously. Security has also been improved by eliminating static, long-lived SSH keys in favor of quickly expiring SSH certificates.
We have encountered minor challenges with the new setup. Having a shared system account for all users on the majority of servers makes auditing user actions slightly harder; we have worked around this by parsing the certificate key ID into an environment variable upon login. For the same reason, commands such as who no longer tell individual users apart on these servers. Retraining users for the new Vault-based SSH login workflow was an initial hurdle that we overcame with time.
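As one possible illustration of the auditing workaround, the sketch below extracts the key ID from the text that ssh-keygen -L prints for a certificate. How the certificate is obtained at login time and how the value is exported into the session are not covered here; the function name and the sample listing are hypothetical.

```python
import re

# Matches a line such as:   Key ID: "okta-first.last:group1"
KEY_ID_LINE = re.compile(r'^\s*Key ID:\s*"(?P<key_id>[^"]*)"\s*$', re.MULTILINE)


def extract_key_id(listing):
    """Return the key ID found in `ssh-keygen -L` output, or None if absent."""
    match = KEY_ID_LINE.search(listing)
    return match.group("key_id") if match else None


# Abbreviated, made-up sample of certificate details.
sample = '''
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Key ID: "okta-first.last:group1"
        Serial: 0
'''
print(extract_key_id(sample))
```

The extracted value could then be exported into the login session under a variable name of your choosing, so that audit logs can attribute actions on the shared account to an individual.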
Overall, though, we conclude that the new way of doing SSH at GoodData provides more security and a better user experience than the past solution based on static SSH keys.