Introducing the Uber SSH Certificate Authority

Peter Moody, systems security engineer

Starting today, security professionals can use Uber’s PAM-USSH module to continuously reauthenticate users based on the validity of their SSH certificates. The module is free to use under the MIT License here:

SSH uses public key cryptography to authenticate network connections. Users and hosts generate a public/private keypair, keeping the private key secure while making the public portion available on remote machines. To enhance security, many enterprises require a second factor (e.g. Duo) in addition to pubkey auth to protect against the use of stolen private keys. This is a very common model as it’s significantly more secure than relying solely on phishable credentials like passwords.

There are several challenges with this model, however. First, all of these public keys must be managed, and if that is done incorrectly it can easily introduce additional security risks. Second, the longer a private key remains valid, the more likely it is that control of it will be lost. Keys should therefore ideally expire after a set amount of time, requiring users to reauthenticate. But SSH keys don’t expire on their own, and the process for manually invalidating them is often brittle. Finally, constantly prompting for two-factor can lead to warning fatigue and ultimately condition employees to carelessly accept two-factor requests.

To address these issues, Uber’s security team created its own SSH Certificate Authority, which gives us the ability to inventory public keys, automatically expire SSH keys, and improve host authentication.

SSH Isn’t Enough

Typically, when pubkey authentication is enabled and a public key has been added to a user’s authorized_keys file, sshd will trust any connection from someone holding the corresponding private key. It will trust that connection even if it’s coming from an untrusted machine like a personal iPhone, a former employee’s laptop, or a hotel kiosk. Nothing binds the private key to an individual machine, and nothing in the format of the key itself invalidates it when the relationship with the user changes.

Relying on per-user authorized_keys files is error-prone and should be avoided if possible. Traditionally, security teams have relied on an automated configuration management tool like Puppet or Ansible to manage authorized_keys files. But if someone manages to install an authorized_keys file that the management process doesn’t know about, or the process fails to remove a public key, access could accidentally be granted to an unauthorized user.

Another important aspect of SSH that rarely gets the attention it deserves is how clients use host keys to authenticate hosts. Each time a client connects, the host presents its public key and the client compares it against the host keys saved in its known_hosts file. If the hostname isn’t found, the user is usually given the choice to save the unknown key and proceed, or to reject it and terminate the connection. If the hostname is found but the key doesn’t match, the user is shown a key mismatch error and told to remove the existing key if they want to connect.
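This trust-on-first-use decision can be modeled in a few lines. The Go sketch below is a simplification: it keeps known hosts in an in-memory map from hostname to a key string, whereas real known_hosts files store full public keys (optionally with hashed hostnames). The hostnames and key strings are placeholders.

```go
package main

import "fmt"

// HostKeyStatus is the outcome of comparing a presented host key
// against the client's saved known_hosts entries.
type HostKeyStatus int

const (
	KeyMatch    HostKeyStatus = iota // hostname known, key matches: connect silently
	KeyUnknown                       // hostname never seen: user is prompted to accept
	KeyMismatch                      // hostname known, key differs: hard error
)

func (s HostKeyStatus) String() string {
	switch s {
	case KeyMatch:
		return "match"
	case KeyUnknown:
		return "unknown"
	default:
		return "mismatch"
	}
}

// checkHostKey models the client's decision. knownHosts maps a hostname to
// the key saved on a previous connection (placeholder strings here).
func checkHostKey(knownHosts map[string]string, host, presented string) HostKeyStatus {
	saved, ok := knownHosts[host]
	if !ok {
		return KeyUnknown
	}
	if saved == presented {
		return KeyMatch
	}
	return KeyMismatch
}

func main() {
	known := map[string]string{"db01.example.com": "SHA256:aaaa"}
	fmt.Println(checkHostKey(known, "db01.example.com", "SHA256:aaaa"))  // match
	fmt.Println(checkHostKey(known, "web01.example.com", "SHA256:bbbb")) // unknown
	fmt.Println(checkHostKey(known, "db01.example.com", "SHA256:cccc"))  // mismatch
}
```

Note that the only thing tying a host to its key is the client's memory of an earlier connection, which is exactly what breaks when a host is reinstalled.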

Conventional advice found online for dealing with these errors is to accept the unknown host key, first removing the mismatched one if present. This is so common that ssh-keygen(1) provides a flag (-R) to remove a previously saved host key. The rationale is that on large production networks, a mismatch is almost always caused by a host being automatically reinstalled. This advice is wrong.

We wouldn’t dream of telling someone to ignore SSL certificate errors on other systems, such as a banking website or social media, so why would we ever give this advice for SSH connections?

SSH Certificates

SSH certificates were introduced nearly seven years ago with OpenSSH 5.4 and have been used at Uber for several years. They’re part of SSH’s pubkey authentication system. At a high level, an SSH certificate is made up of:

[[ssh-key + metadata] + signature + signing key]

SSH certificates were perfectly suited to solving the problems of key lifecycle, management, and authentication for a number of reasons:

  1. Auto expiration: Metadata includes (among other things) a validity period, so we can easily auto-expire certificates. If an employee loses access (e.g. they leave the company), their existing cert will expire and they won’t be able to generate a new one.
  2. Two-factor authentication: The two-factor check can be moved from when a key is first presented (e.g. on a bastion or jumphost) to when the key is obtained, reducing the number of times an authorized user has to enter a two-factor code. This cuts down on warning fatigue.
  3. CA public key: The only key needed to verify an SSH user certificate is the public portion of the signing key (the CA public key). Rather than adding an entry to every authorized_keys file and worrying about removing it when the employee no longer needs access, we can distribute a single public key and let sshd verify the certificate’s signature.
  4. Host keys: Points 1 and 3 also apply to host certificates. Host certificates are time-bound like user certificates, and clients need only a single public key to authenticate all hosts presenting host certs. Instead of asking users to accept new host keys or remove existing entries when a host is reinstalled, we can distribute a single public key to every managed client so it can authenticate the host certificate presented by any corporate host. This has exciting implications for things like transparently reprovisioning machines or load-balanced SSH bastions.

The Uber SSH Certificate Authority & PAM-USSH Module

At Uber, we needed something that was interoperable with our existing authorization protocols and could issue both user and host certificates. Nothing like this existed on the market, so we decided to write our own. We wrote it in Go because it has the best SSH library support, including native support for certificates. The final solution became the Uber SSH Certificate Authority (USSHCA).

SSH certificates are great for authenticating a user at a single point in time, namely when they first access a given machine. After that point, however, the login session remains active even if the certificate expires. To resolve this problem, we also wrote a PAM module that authenticates a user based on the continued validity of their SSH certificate. Today, we’re making this module available to the security community for free here:

How it Works

USSHCA issues SSH certificates to authenticated Uber employees. The certificate lifespan, as well as where and how certificates can be used, is configurable on a per-user or per-group basis. Because the CA authenticates users through PAM (pam(8)), it supports additional authentication options, including LDAP/AD, Duo, and local passwords, as well as pubkey auth or even no auth at all.

An employee gets a USSH certificate by running the ussh command. This connects to the USSHCA, performs the PAM conversation, and forwards the client’s ssh agent to the CA. If the client successfully authenticates, the CA generates a new SSH key, populates the associated certificate with the configured information (validity period, the user it’s valid for, the options permitted, etc.), and adds both the key and the certificate to the remote agent. They are added with a timeout telling the agent to remove them when the certificate expires.

In practice, almost no one runs ussh directly. If the ussh command is given any arguments, it executes the equivalent of ‘ssh argv[1:]’. So, thanks to a centrally managed ssh_config, all of this happens automatically when a user tries to connect to an Uber host without a valid cert.
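A wrapper with this pass-through behavior might look like the following. The certificate-fetching step is only a placeholder, sshCommand is a hypothetical name, and exec.Command is used for simplicity; a real tool could instead replace its own process image (for example with syscall.Exec on Unix).

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// sshCommand builds the command the wrapper runs: everything after the
// program name is passed to ssh unchanged, like `ssh argv[1:]`.
func sshCommand(args []string) *exec.Cmd {
	cmd := exec.Command("ssh", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		// Placeholder: with no arguments, only obtain a certificate.
		fmt.Println("would request a certificate from the CA")
		return
	}
	// Placeholder: a real wrapper would first ensure a valid cert is in the agent.
	cmd := sshCommand(os.Args[1:])
	if err := cmd.Run(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode()) // propagate ssh's own exit status
		}
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```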

Host Certificates are Critical

The metadata for SSH certificates also includes a list of valid principals. For user certificates, the principals are the users or groups for whom the certificate is valid. For host certificates, they are the hostnames for which the certificate is accepted. Including principals in certificates provides the following features:

  1. Load-balanced SSH connections: Host certs allow multiple hosts to share a principal without sharing the same private key. Before host certs, if two hosts had different keys but the same hostname, clients would report a host key mismatch when connecting to the second host.
  2. Confident authentication: You can avoid flooding users with the “can’t verify host key” warnings that ssh clients display when connecting to a host for the first time. Since the client validates the certificate’s signature rather than the underlying host key, it doesn’t matter that the host hasn’t been seen before. Furthermore, when a host cert is validated, the host key isn’t saved.
  3. Prevent error messages: Using host certs prevents “host key has changed” errors after a host is reinstalled. Before certificates, reinstallation typically meant regenerating host keys, so anyone connecting to the host afterward would be presented with an SSH error about changed host keys.

The last two features are particularly important for an enterprise security team. When connecting to production or development infrastructure from authorized machines, employees don’t see host key warnings. If they do, they immediately know something is wrong and can report it to the security team for investigation.

Moving Forward

We’re already moving toward certificate revocation and experimenting with different key types. RSA is universally supported across the industry, but some newer algorithms are more cost-effective to generate and verify. USSHCA already supports newer key types, and we look forward to collaborating with the community on future developments.