SSH Certificate Injection at scale for HashiCorp Boundary sessions

Danny Knights
HashiCorp Solutions Engineering Blog
Nov 27, 2023


This blog assumes basic knowledge of Boundary and Vault. For more information about each product, please visit the Boundary and Vault pages, respectively.

In my previous blog, I discussed a multi-hop worker deployment for Boundary, which addressed environments where inbound network traffic into private networks is prohibited. In that tutorial, lateral access was restricted within the network and the resources remained secure in their private network(s). However, other security issues remained: credential storage, rotation, and distribution for the end users who actually connect to the resources.

In this blog, I will highlight the operational and security benefits that leveraging Boundary and Vault provides to an organisation. I will also walk through key aspects of a Boundary and Vault deployment that provides dynamic, ephemeral, injected credentials for an SSH session, all driven through Terraform. This will allow practitioners to take a working example of this integration to demo, explore, and potentially use as a baseline for any proof-of-value. Furthermore, I will discuss how organisations can leverage Boundary to achieve certificate management at scale, which includes the ability to audit actions and interactions within Boundary and on target resources.

This example is not intended as a best practice or HashiCorp recommended practice; rather, it shows how this can be achieved and gives practitioners a working example to explore and build upon. The code for the deployment can be found here.

Setting the scene

Before we start this journey, I wanted to briefly discuss an example workflow that many organisations we speak to have created and built upon over time.

Focusing specifically on credential management: the credentials are most likely long-lived, static, and seldom rotated. Yet once we have completed the ticketing process to obtain all the required credentials, we hand them over to differing personas within, or outside of, our organisation to use. This, combined with a workflow that promotes lateral access within a network, drastically increases the risk of network breaches and data exfiltration, which has contributed to the year-on-year increase in the costs that businesses pay as a result.

Not only do we want the differing personas to have a more focused catalogue of targets they can connect to, based on their role within the organisation, but we also want a more transparent workflow, where users connect securely without ever knowing what the credentials are.

Boundary and Vault integration

Boundary can offer brokered and injected credentials using Vault, which provides dynamic, ephemeral secrets. With brokered credentials, Boundary returns the credentials to the end user, who then uses them to log into the target. The end user is therefore exposed to the credentials, but because Vault provides them dynamically, we can enforce a strict TTL and control their lifecycle despite them being presented to the end user.
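To make the brokered case concrete, a brokered credential library in Terraform might look like the following. This is a sketch: the database/creds/readonly path is a hypothetical Vault database secrets engine role and is not part of this deployment.

```hcl
# Hypothetical brokered credential library: Boundary fetches dynamic
# database credentials from Vault and returns them to the end user.
resource "boundary_credential_library_vault" "db_creds" {
  name                = "database-creds"
  description         = "Brokered dynamic credentials from Vault"
  credential_store_id = boundary_credential_store_vault.vault_cred_store.id
  path                = "database/creds/readonly" # assumed Vault role path
  http_method         = "GET"
}
```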

From version 0.12, Boundary added support for SSH certificate injection by leveraging the Vault SSH secrets engine. With injected credentials, Vault provides Boundary with short-lived SSH certificates that Boundary injects into a session to facilitate a secure SSH connection. The end user never sees the credentials, and once the session is terminated, the credentials terminate with it. This drastically improves the security posture and the overall workflow of secure connectivity to resources.

Deployed architecture

Before going into some of the specific configuration aspects, it is important to visualise what we are going to deploy. The diagram below is not meant to represent a production environment; it is designed to be created quickly in order to demonstrate the feature within Boundary and Vault. We will deploy a single VPC and a public subnet, which our target will reside in, to facilitate ease of access and show the SSH injection capability.

Key pair generation

Depending on your configuration, there are two methods of generating the key pair that Vault will eventually sign. Who generates the key pair depends on which Vault endpoint you select: /issue or /sign.

With the /issue endpoint, which was added in Vault 1.12.0, Vault generates the key pair and then signs it. The /sign endpoint results in the Boundary controllers generating the key pair and sending it to Vault.
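For reference, the Vault side of this setup can be sketched in Terraform as follows. The mount path ssh-client-signer matches the policies used later in this blog; the Terraform resource names are my own.

```hcl
# Mount the SSH secrets engine at the path referenced throughout this blog.
resource "vault_mount" "ssh_client_signer" {
  type = "ssh"
  path = "ssh-client-signer"
}

# Have Vault generate the CA key pair that will sign the SSH certificates.
resource "vault_ssh_secret_backend_ca" "ssh_ca" {
  backend              = vault_mount.ssh_client_signer.path
  generate_signing_key = true
}
```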

We will be using the /sign endpoint, so the traffic flow, as per the diagram below, will be as follows:

  1. Boundary will generate the key pair which will need to be signed by the CA (Vault).
  2. Vault will sign the certificate and send it back to the Boundary controllers.
  3. The Boundary controllers will then send the credentials to the worker associated with the target so it can be injected into the session.

Boundary target type

When configuring the target ‘type’, you can set it to TCP or SSH. If you set it to TCP, you must include the default_port attribute in the Terraform resource to ensure the correct port is configured for the connection to the target (22 in the case of SSH). Because we are using Terraform to manage the lifecycle of our targets, each time a target is destroyed and re-created, a new target with a new target ID is created within Boundary. If we want the target ‘type’ to be SSH instead of TCP, in order to leverage the injected application credentials feature, it has to be set before we apply our Terraform configuration. It cannot be changed after the target is created.

In order to establish a session with an SSH target, it must be configured with at least one injected application credential. It is not possible to add injected application credentials to a TCP target: attempting to do so will fail, whether via Terraform or the CLI.
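Putting this together, an SSH target with an injected credential source can be sketched as below. The scope, host set, and credential library references are assumptions based on this deployment.

```hcl
resource "boundary_target" "ssh_target" {
  name            = "ssh-target"
  type            = "ssh" # must be set before the first apply; cannot be changed later
  default_port    = 22
  scope_id        = boundary_scope.project.id
  host_source_ids = [boundary_host_set_static.servers.id]

  # Injected application credentials are only valid on SSH targets.
  injected_application_credential_source_ids = [
    boundary_credential_library_vault_ssh_certificate.vault_ssh_cert.id
  ]
}
```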

Configuration with Terraform

As with all HashiCorp products, everything can be created and managed via Terraform; Boundary and Vault are no different and have official providers to consume. I am not going to cover the general AWS resource creation, networking, or security groups, as these should be familiar to most and are already covered within my Git repo. I will focus on the Boundary Vault token, the Vault policies, and ensuring our target trusts Vault as the CA.

Boundary Vault token

The token that Vault creates for Boundary needs certain attributes. It needs to be periodic, so that it can be renewed indefinitely and does not expire in a way that would adversely affect the operation of the integration between the two products. It also needs to be an orphan token, so that the revocation of a parent token does not revoke it and cause the same issues. As a result, the configuration should look similar to the below:

resource "vault_token" "boundary_vault_token" {
  display_name = "boundary-token"
  policies     = ["boundary-controller", "ssh-policy"]
  no_parent    = true
  renewable    = true
  ttl          = "24h"
  period       = "24h"
}

In relation to how the Vault token lifecycle is managed, there is a job scheduling process that synchronises the Boundary controllers to ensure that only one controller is renewing or revoking Vault credentials at any given time. This mitigates overloading Vault, and the Boundary database, with duplicate requests.

The renewal process is initiated from the controllers, and they track any renewals that need to be done. However, in a production environment, Vault may not sit in a public network as it does in this example. In that instance, the request to Vault will likely be routed through a worker if the Boundary controller does not have access to the private network. This is where we would leverage worker filters to direct the request to the correct worker associated with Vault. There is a tutorial on injected credentials with private Vault here, for reference.
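In that scenario, the worker filter is set on the Vault credential store itself. A minimal sketch follows; the "vault" tag and resource names are assumptions, and worker filters on credential stores require HCP Boundary or Boundary Enterprise.

```hcl
resource "boundary_credential_store_vault" "private_vault" {
  name     = "private-vault-store"
  scope_id = boundary_scope.project.id
  address  = var.vault_addr
  token    = vault_token.boundary_vault_token.client_token

  # Route Vault API calls through workers tagged with "vault".
  worker_filter = "\"vault\" in \"/tags/type\""
}
```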

Vault policies

Two policies are required in Vault: one for the Boundary controllers, the other for the SSH secrets engine.

Boundary controllers policy

The following Vault policy needs to be created. These are the Vault endpoints Boundary needs in order to maintain the token, regardless of which Vault secrets engines the credential will be used to interact with.

If you are configuring multiple Vault credential stores in Boundary, each of the tokens will need this policy so that Boundary can renew the token as well as renew/revoke any leases from secrets issued by that token. Separate policies can then be attached for the specific secrets engines each store needs.

resource "vault_policy" "boundary_controller_policy" {
  name   = "boundary-controller"
  policy = <<EOT
path "auth/token/lookup-self" {
  capabilities = ["read"]
}

path "auth/token/renew-self" {
  capabilities = ["update"]
}

path "auth/token/revoke-self" {
  capabilities = ["update"]
}

path "sys/leases/renew" {
  capabilities = ["update"]
}

path "sys/leases/revoke" {
  capabilities = ["update"]
}

path "sys/capabilities-self" {
  capabilities = ["update"]
}
EOT
}

It is worth highlighting that all of the above, with the exception of “sys/leases/revoke”, are part of an unmodified ‘default’ policy in Vault, which every new token is given by default. However, to ensure a more holistic and comprehensive policy, these have been combined into a dedicated policy for the Boundary controllers.

SSH secrets engine policy

The following Vault policy needs to be created for the SSH secrets engine:

resource "vault_policy" "ssh-policy" {
  name   = "ssh-policy"
  policy = <<EOT
path "ssh-client-signer/issue/boundary-client" {
  capabilities = ["create", "update"]
}

path "ssh-client-signer/sign/boundary-client" {
  capabilities = ["create", "update"]
}
EOT
}

It is worth re-clarifying that, depending on who creates the SSH key pair (Boundary or Vault), you can remove either the /issue or the /sign path. If Boundary will consistently create the key pairs, you can remove the /issue path from the policy. Conversely, if it will always be Vault, you can remove the /sign path.

Trusting Vault as the CA

As part of this demo, it is convenient to use cloud-init configuration to obtain the public key from Vault and make the requisite changes in the sshd_config file (or other relevant file, depending on the architecture you are deploying to).

data "cloudinit_config" "ssh_trusted_ca" {
  gzip          = false
  base64_encode = true

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOF
      sudo curl -o /etc/ssh/trusted-user-ca-keys.pem \
        --header "X-Vault-Namespace: admin" \
        -X GET \
        ${var.vault_addr}/v1/ssh-client-signer/public_key
      echo "TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem" | sudo tee -a /etc/ssh/sshd_config
      sudo systemctl restart sshd.service
    EOF
  }
}

For production environments, it may be more prudent to utilise HashiCorp Packer to create a golden image with the relevant changes to the sshd configuration embedded.
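As a sketch of that Packer approach (the variables, region, and instance details below are assumptions), the same sshd changes could be baked into the image at build time:

```hcl
variable "base_ami" {
  type = string # base image to build from (assumption)
}

variable "vault_addr" {
  type = string
}

source "amazon-ebs" "base" {
  ami_name      = "ssh-ca-trusted"
  instance_type = "t3.micro"
  region        = "eu-west-2"
  source_ami    = var.base_ami
  ssh_username  = "ec2-user"
}

build {
  sources = ["source.amazon-ebs.base"]

  # Fetch the Vault CA public key and trust it in sshd, once, at build time.
  provisioner "shell" {
    inline = [
      "curl -o /tmp/trusted-user-ca-keys.pem ${var.vault_addr}/v1/ssh-client-signer/public_key",
      "sudo mv /tmp/trusted-user-ca-keys.pem /etc/ssh/trusted-user-ca-keys.pem",
      "echo 'TrustedUserCAKeys /etc/ssh/trusted-user-ca-keys.pem' | sudo tee -a /etc/ssh/sshd_config",
    ]
  }
}
```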

Boundary certificate management at scale?

Using SSH keys is, from an operational perspective, trivial: you add your public key to a host and authenticate yourself using your public/private key pair.

The problem with SSH key pairs, however, comes with scale. Users have to manage their public keys across all of their infrastructure, trusted access to a machine is defined by the individual key pair, and it’s not uncommon to find unknown public keys in the authorized_keys file of your servers. As you connect to more hosts, best practice would encourage SSH rekeying. Users often do not do this, due to the complexity of managing tens or hundreds of public keys for sets of hosts, and these key pairs become permanently trusted across your infrastructure.

SSH certificate-based authentication extends key-based authentication with digital signatures. The user’s authenticity is based on possession of a certificate signed by the trusted certificate authority (CA). Instead of having to provision and deprovision keys across your infrastructure, SSH certificates allow you to specify:

  • How long a certificate is valid for.
  • Who will gain access to a target host.
  • Which users the certificate holder can login as.

Policy becomes centralised, and you avoid having to manage the authorized_keys file. SSH certificates, unlike SSH key pairs, are short-lived and expire automatically.
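Those controls map directly onto the Vault SSH role that the credential library calls. Below is a sketch of a boundary-client role; the TTL and user values are illustrative, not taken from this deployment.

```hcl
resource "vault_ssh_secret_backend_role" "boundary_client" {
  backend                 = "ssh-client-signer"
  name                    = "boundary-client"
  key_type                = "ca"
  allow_user_certificates = true
  ttl                     = "5m"       # how long a certificate is valid for
  allowed_users           = "*"        # who can be named as a principal
  default_user            = "ec2-user" # which user the certificate holder logs in as
  default_extensions = {
    permit-pty = ""
  }
}
```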

Boundary has a concept of credential libraries: a credential library is a resource that provides credentials of the same type and same access level from a single credential store. There is a specific credential library type for SSH certificate injection, as shown below.

resource "boundary_credential_library_vault_ssh_certificate" "vault_ssh_cert" {
  name                = "ssh-certs"
  description         = "Vault SSH Cert Library"
  credential_store_id = boundary_credential_store_vault.vault_cred_store.id
  path                = "ssh-client-signer/sign/boundary-client"
  username            = "ec2-user"
}

The potential issue here is accountability and traceability. We define a username to match the default username of the device we are connecting to: on an Amazon Linux EC2 instance, the default username is ec2-user; on an Ubuntu flavour, it is ubuntu.

However, if multiple users wish to connect to the same target at the same time and we need that accountability and traceability with a feature like SSH session recording, how can we correlate who did what if we have the same username?

With Boundary, we have multiple ways to address this. If your organisation isn’t using SSH session recording and a username is shared on the target host, there will still be separate sessions in Boundary associated with separate users. Within the audit events, or via the data warehouse, this information can be examined when investigating a potential incident on a target. There may be a little work to do here, but the information is there to extract and analyse.

If SSH session recording is enabled, then Boundary provides even more, in an arguably more readily available state. When users authenticate with their IdP, or with a username and password, to access a target, the metadata associated with the recording of that session will highlight which user accessed the resource, even if the username is shared on the target device.

The SSH session recording screenshot below shows the Boundary username danny.knights, yet we are using a single username (ec2-user) to access this resource, providing that granularity into what operations were performed on a target and by whom, irrespective of a shared username on the target.

From an organisation’s perspective, there may be good reasons for sharing usernames on the hosts, such as making it easier to configure Unix permissions, common reusable scripts, and resource limits, and simplifying the overall provisioning of the hosts.

For other organisations, though, there may be compelling reasons to have multiple users on a host, even as fine-grained as a 1:1 mapping between users on the host and Boundary users; for example, if users each have their own scripts or files on that host. In this situation, Boundary, as of release 0.11, can use Vault credential library parameter templating. This passes information about a Boundary user or account when making a call to Vault, so we can pivot away from statically hard-coding the username and instead have Vault generate certificates with the username taken from the user’s authentication into Boundary. This means that when we authenticate into Boundary with our IdP, and our username is an email address, e.g. danny.knights@hashicorp.com, we can do the following within the credential library configuration in Terraform:

resource "boundary_credential_library_vault_ssh_certificate" "vault_ssh_cert" {
  name                = "ssh-certs"
  description         = "Vault SSH Cert Library"
  credential_store_id = boundary_credential_store_vault.vault_cred_store.id
  path                = "ssh-client-signer/sign/boundary-client"
  username            = "{{truncateFrom .Account.Email \"@\"}}"
}

Now when our certificate is created, our username will be our IdP login email with the “@hashicorp.com” removed.

Of course, we also have to think about the target we’re connecting to: we must ensure that the target has a local user that matches the username we are templating. Again, for this demonstration, this can easily be done with cloud-init configuration:

data "cloudinit_config" "ssh_trusted_ca" {
  gzip          = false
  base64_encode = true

  part {
    content_type = "text/x-shellscript"
    content      = <<-EOF
      sudo adduser danny.knights
    EOF
  }
}

When we gain access to the target, it will be as our own username. Utilising the templating feature gives more fine-grained control when it comes to understanding who is connecting to targets. From a configuration standpoint, we are not polluting our Terraform code base with multiple credential libraries and multiple duplicates of a single target, each specifically associated with a particular user.

Do I really need Vault?

As with most questions, the answer from a Solutions Engineer tends to be “it depends”, and that rings true here. Your organisation’s appetite for risk and rotation comes into question when making that decision. With Vault, we get the ephemeral credential benefits, and the fact that the credentials are not stored anywhere during the workflow means we remove both the worry of credential rotation and the risk of credential exfiltration.

However, Boundary does have a native credential store with the ability to configure a credential using a username, a private key, and an optional passphrase. In Terraform, it looks like this:

resource "boundary_credential_ssh_private_key" "static_ssh_key" {
  credential_store_id = boundary_credential_store_static.static_cred_store.id
  private_key         = file("../boundary.pem")
  username            = "ec2-user"
}

This resource takes the static credentials from the boundary.pem file and adds them to Boundary, so when users connect to the target, they do so without entering any username or needing to know or possess the .pem file associated with the target, i.e. the credentials are injected. However, this method carries more operational overhead around the lifecycle of the static credential.
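For completeness, the static credential store referenced above is itself a simple resource. A sketch, where the scope reference is an assumption:

```hcl
resource "boundary_credential_store_static" "static_cred_store" {
  name        = "static-cred-store"
  description = "Static credentials for SSH targets"
  scope_id    = boundary_scope.project.id
}
```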

For example, if your organisation has 200 hosts and you do not want to use Vault to create the credentials, you have to manually create a single SSH key pair and add the private key to the Boundary static credential store. The public key then needs to be added as an authorised key on all 200 hosts. This is possible via a tool like Terraform or a custom script; however, this single credential is now being reused across all 200 hosts, which increases the risk and blast radius if it is ever exposed. It would be prudent to rotate this credential regularly as part of a good security posture, but that takes time and operational effort. The other time organisations are willing to rotate it is when reacting to an incident. In that situation, you would need to:

  • Update the Boundary static credential store.
  • Terminate all sessions that are actively using the old key.
  • Update all hosts with the new public key and remove the old public key.

This whole process isn’t made any easier by trying to avoid sharing the secret between hosts or between users. It just means there are more keys to distribute between hosts!

The real benefit of using Vault SSH certificates is that the hosts only need to be provided with the public key of the CA. You can then have short-lived, narrowly scoped credentials with very little operational overhead and cost to the business, whilst improving the organisation’s security posture and overall workflow.

Summary

The integration of HashiCorp Boundary and Vault, focusing on SSH credential management, introduces a robust solution for dynamically creating and managing SSH certificates at scale. Vault’s capability to generate SSH certificates on-demand adds an extra layer of security, reducing the risk associated with static credentials or manually generated certificates.

Boundary dynamically injects the certificates during runtime, ensuring fine-grained access control and reducing the potential for unauthorised access or credential exposure. This approach optimises security, streamlines compliance, and simplifies administrative tasks, all within a centralised, scalable framework. It’s particularly beneficial for organisations operating in multi-cloud environments, offering a comprehensive, centralised solution to efficiently create, manage, and secure SSH certificates in a dynamic and highly secure manner.

To try this solution for yourself, the GitHub repo can be found here. To sign up for a free trial of Boundary and Vault, the HashiCorp Cloud Platform provides you with $50 of free usage. Enjoy!
