Single Sign On via Consensus

Kareem Moussa
Published in The Coinbase Blog, Dec 13, 2018

The Infrastructure Team at Coinbase aims to enable any engineer in the company to quickly and securely access and deploy complex infrastructure. This effort started with our secure deployment pipeline, Codeflow, was extended by our infrastructure codification tooling, GeoEngineer, and has been put to use by our blockchain infrastructure project, Snapchain.

Our latest project to empower engineers was to make it easy and safe to elevate their own permissions temporarily to perform complex infrastructure changes.

Everything that engineers do at Coinbase is locked down by a mechanism that implements consensus. To interact with any production environment, a quorum of engineers must approve the permissions, code, and configuration. This creates strict guardrails around changes to our production environments and leaves an audit trail. It also enables us to secure customers’ funds with confidence.
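The quorum rule fits in a few lines. What follows is a minimal illustration, not Coinbase's implementation. It assumes approvals arrive as reviewer usernames, that authors cannot approve their own changes, and that the required count comes from per-repository configuration. `has_quorum` and its parameters are hypothetical names.

```python
from typing import Iterable, Set


def has_quorum(author: str, approvals: Iterable[str], qualified: Set[str], required: int) -> bool:
    """Return True if at least `required` distinct qualified reviewers,
    other than the author, approved the change (hypothetical policy)."""
    if required < 1:
        # A quorum of zero would bypass consensus entirely.
        raise ValueError("required must be at least 1")
    # Deduplicate approvals, drop the author, and keep only qualified reviewers.
    valid = {r for r in approvals if r != author and r in qualified}
    return len(valid) >= required


qualified_reviewers = {"alice", "bob", "carol"}
print(has_quorum("dave", ["alice", "alice", "bob"], qualified_reviewers, 2))  # True
print(has_quorum("alice", ["alice", "bob"], qualified_reviewers, 2))          # False
print(has_quorum("dave", ["bob", "mallory"], qualified_reviewers, 2))         # False
```

Counting distinct reviewers matters: a single reviewer approving twice, or an author approving their own change, must never satisfy the quorum.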

Our philosophy of consensus also applies to access to critical services such as AWS and GitHub, since our production services depend on them. In the past, we manually onboarded employees onto such services with consensus and an audit trail. Manually provisioning accounts was easy for us until this year: in 2018, Coinbase experienced incredible hypergrowth, growing from 200 to almost 600 employees, and the number of employees joining each week increased dramatically. Manual provisioning turned into operational toil, which made it an obvious candidate for automation.

To eliminate this source of toil, we built a Single Sign On (SSO) system that upholds our consensus philosophy by protecting every change to a user’s permissions with consensus. The system had to meet the following requirements to satisfy our high security and productivity standards:

  • Reduce the manual toil of maintaining user accounts through centralized management
  • Full codification of users’ permissions
  • Audit trail of users’ permissions over time
  • MFA for all authentication, ideally push-based
  • Highly available and 12-factor, allowing for blue/green deploys
  • Minimal surface area for vulnerabilities
  • Scale with ease to 10x more engineers and 10x more critical services
  • Work with our existing workflows, e.g. `assume-role`

To build this identity provider (a service that authenticates users on behalf of other services), we use a combination of SAML, LDAP, and consensus.

SAML (Security Assertion Markup Language) is the de facto enterprise SSO protocol. It is used to send cryptographically signed assertions about a principal (for example, their permissions) to service providers like AWS and GitHub, which use these assertions to authorize users on their platforms. SAML profiles describe the different request-response protocols that identity providers and service providers can use to communicate with each other. SAML bindings describe which lower-level communication and messaging mechanisms carry the messages exchanged in each step of a SAML profile.
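Consider the HTTP-POST binding as a concrete example. It carries a SAML message as a base64-encoded form field, named `SAMLResponse` for responses, which the browser submits to the service provider. The sketch below assumes the response XML has already been built and signed. `encode_post_binding`, `decode_post_binding`, `post_form`, and the example URL are all hypothetical.

```python
import base64
import html


def encode_post_binding(saml_response_xml: str) -> str:
    """Base64-encode a SAML message for the HTTP-POST binding (no DEFLATE step)."""
    return base64.b64encode(saml_response_xml.encode("utf-8")).decode("ascii")


def decode_post_binding(encoded: str) -> str:
    """Reverse the encoding, as the receiving service provider would."""
    return base64.b64decode(encoded).decode("utf-8")


def post_form(acs_url: str, saml_response_xml: str, relay_state: str = "") -> str:
    """Render an HTML form that posts the response to the service provider's
    Assertion Consumer Service (ACS) URL when submitted."""
    # Base64 output only uses [A-Za-z0-9+/=], so it is safe inside an attribute.
    fields = ('<input type="hidden" name="SAMLResponse" value="'
              + encode_post_binding(saml_response_xml) + '"/>')
    if relay_state:
        fields += ('<input type="hidden" name="RelayState" value="'
                   + html.escape(relay_state, quote=True) + '"/>')
    return ('<form method="post" action="' + html.escape(acs_url, quote=True) + '">'
            + fields + '<input type="submit" value="Continue"/></form>')


print(post_form("https://sp.example.com/acs", "<samlp:Response/>"))
```

In practice, identity providers usually add a small script so the browser submits this form automatically.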

LDAP (Lightweight Directory Access Protocol) is a tried-and-true directory service protocol, typically used to represent organizations in a tree-like structure. It also has secure native authentication mechanisms for users.
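Each LDAP entry is named by a distinguished name (DN), which reads from the leaf up to the root, for example `uid=alice,ou=users,dc=example,dc=com`. The sketch below checks whether an entry falls under a base DN, which is exactly the question a subtree-scoped search answers. The DNs, the `example.com` suffix, and both helper functions are hypothetical. The parser is also deliberately simplified.

```python
from typing import List, Tuple


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, leaf first.

    Naive on purpose: it ignores escaped commas and multi-valued RDNs, and it
    lowercases values, assuming case-insensitive matching (true for dc, ou, cn, uid).
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.strip().lower(), value.strip().lower()))
    return pairs


def in_subtree(entry_dn: str, base_dn: str) -> bool:
    """True if entry_dn is base_dn itself or any entry beneath it, i.e. what an
    LDAP search with subtree scope rooted at base_dn could return."""
    entry, base = parse_dn(entry_dn), parse_dn(base_dn)
    if len(entry) < len(base):
        return False
    return entry[len(entry) - len(base):] == base


alice = "uid=alice,ou=users,dc=example,dc=com"
print(in_subtree(alice, "ou=users,dc=example,dc=com"))   # True
print(in_subtree(alice, "ou=groups,dc=example,dc=com"))  # False
print(in_subtree(alice, "DC=Example,DC=com"))            # True
```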

In order to understand how consensus is used to protect changes to users’ permissions, we will first explain how consensus is used at Coinbase.

Consensus at Coinbase

Software development process at Coinbase utilizing consensus. (Heimdall is licensed under CC BY-SA 3.0).

In the software development process at Coinbase, engineers can only deploy code to production environments if it meets a specific set of checks and requirements. These checks and requirements are numerous, but one of the key requirements is that all deployed git branches must be verified to have consensus by a tool we wrote called Heimdall. Heimdall enforces an immutable git history in which every commit has consensus.

The general software development process to deploy code to production environments is as follows:

  1. The engineer creates a pull request to a protected branch with immutable history (i.e., master).
  2. N qualified reviewers engage in a code review process, where N is configurable on a per-repository basis and depends on how sensitive the repository is.
  3. After all qualified reviewers are satisfied that the code is of high quality, they may approve the pull request (ensuring consensus). A webhook then notifies Heimdall that all commits of the pull request have consensus.
  4. The engineer merges the pull request into the branch with immutable history. Heimdall marks the new merge commit with consensus. This ensures that all commits to the protected branch have consensus.
  5. The engineer attempts to deploy a commit to a production environment with our secure deployment pipeline Codeflow.
  6. Codeflow asks Heimdall if the commit has consensus. If and only if it has consensus the deploy initiates!
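The steps above can be modeled as a small ledger of commits that have consensus. This is a hypothetical sketch of the behavior described, not Heimdall's actual interface. `ConsensusLedger`, `deploy`, and the rule that a merge commit only gains consensus when all of its parents already have it are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ConsensusLedger:
    """Hypothetical stand-in for Heimdall: remembers which commit SHAs have consensus."""
    approved: Set[str] = field(default_factory=set)

    def on_pull_request_approved(self, commit_shas: List[str]) -> None:
        # Step 3: the approval webhook marks every commit in the pull request.
        self.approved.update(commit_shas)

    def on_merge(self, merge_sha: str, parent_shas: List[str]) -> None:
        # Step 4 (assumed rule): a merge commit only gains consensus
        # if every one of its parents already has it.
        if not all(sha in self.approved for sha in parent_shas):
            raise PermissionError(f"merge {merge_sha} has a parent without consensus")
        self.approved.add(merge_sha)

    def has_consensus(self, sha: str) -> bool:
        return sha in self.approved


def deploy(ledger: ConsensusLedger, sha: str) -> str:
    """Steps 5 and 6: the pipeline asks the ledger before deploying."""
    if not ledger.has_consensus(sha):
        raise PermissionError(f"refusing to deploy {sha}: no consensus")
    return f"deploying {sha}"


ledger = ConsensusLedger({"base"})            # existing head of master
ledger.on_pull_request_approved(["c1", "c2"])
ledger.on_merge("m1", ["base", "c2"])
print(deploy(ledger, "m1"))                   # deploying m1
print(ledger.has_consensus("c3"))             # False
```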

The Single Sign On System

Architecture of the Single Sign On system.

In our LDAP configuration, we have two directories: users and groups.

The groups directory describes which groups users belong to. Service providers translate these group memberships into permissions specific to their service.
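How group names become service-specific permissions depends on the service provider. For AWS, a SAML assertion carries the `https://aws.amazon.com/SAML/Attributes/Role` attribute, whose values are comma-separated pairs of an IAM role ARN and a SAML provider ARN. The sketch below assumes a hypothetical static mapping from group names to roles. The account ID, role names, and provider name are placeholders.

```python
from typing import Dict, List

# Hypothetical placeholders, not real accounts, roles, or providers.
SAML_PROVIDER_ARN = "arn:aws:iam::123456789012:saml-provider/ExampleIdP"
GROUP_TO_ROLE_ARN: Dict[str, str] = {
    "aws-readonly": "arn:aws:iam::123456789012:role/ReadOnly",
    "aws-admin": "arn:aws:iam::123456789012:role/Admin",
}


def aws_role_values(groups: List[str]) -> List[str]:
    """Map a user's groups to values for AWS's Role SAML attribute.

    Groups with no AWS role (e.g. "eng") are ignored, so membership in an
    unrelated group never grants AWS access.
    """
    return [
        f"{GROUP_TO_ROLE_ARN[group]},{SAML_PROVIDER_ARN}"
        for group in sorted(groups)
        if group in GROUP_TO_ROLE_ARN
    ]


print(aws_role_values(["eng", "aws-admin"]))
```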

When engineers would like to elevate their permissions for a service, they make a pull request to the repository that is used to build the groups directory. This repository is protected by consensus with Heimdall, and its contents are used to update the groups directory, which is served from a read-only filesystem. The git commit history creates an audit trail, which is one of our compliance requirements.
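Building the groups directory from the repository could look something like the sketch below. It renders group membership as LDIF, the standard text format for LDAP entries. It assumes the repository stores a JSON object mapping group names to usernames and that groups use the standard `groupOfNames` object class. The file format, `BASE_DN`, and `groups_to_ldif` are hypothetical, and values are not escaped.

```python
import json
from typing import Dict, List

BASE_DN = "dc=example,dc=com"  # hypothetical directory suffix


def groups_to_ldif(groups: Dict[str, List[str]]) -> str:
    """Render group membership as LDIF groupOfNames entries under ou=groups."""
    entries = []
    for name in sorted(groups):
        members = sorted(set(groups[name]))
        if not members:
            # groupOfNames requires at least one member in the standard schema.
            raise ValueError(f"group {name} has no members")
        lines = [
            f"dn: cn={name},ou=groups,{BASE_DN}",
            "objectClass: groupOfNames",
            f"cn: {name}",
        ]
        lines += [f"member: uid={uid},ou=users,{BASE_DN}" for uid in members]
        entries.append("\n".join(lines))
    # LDIF separates entries with a blank line.
    return "\n\n".join(entries) + "\n"


# A file like this would live in the consensus-protected repository.
groups_json = '{"aws-admin": ["alice"], "github-owners": ["bob", "alice"]}'
print(groups_to_ldif(json.loads(groups_json)))
```

Sorting groups and members makes the output deterministic, so the same commit always produces the same directory.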

The users directory contains information about users as well as their cryptographically hashed passwords. Users authenticate against this directory and complete MFA by approving a push notification from Duo Push.
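The two-factor check can be sketched as follows. The article does not say which hashing scheme the users directory uses, so this sketch assumes PBKDF2 from Python's standard library. `mfa_push` is a hypothetical callable standing in for a Duo Push request; it is not Duo's API.

```python
import hashlib
import hmac
import os
from typing import Callable, Optional, Tuple

ITERATIONS = 200_000  # hypothetical work factor; tune to your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, PBKDF2-HMAC-SHA256 digest) for storage in the users directory."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(stored: Tuple[bytes, bytes], password: str,
                 mfa_push: Callable[[], bool]) -> bool:
    """Check the password first; only then ask for the second factor."""
    salt, expected = stored
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, expected):
        return False
    # Stand-in for a push request: returns True only if the user approves.
    return mfa_push()


stored = hash_password("correct horse battery staple")
print(authenticate(stored, "correct horse battery staple", lambda: True))   # True
print(authenticate(stored, "wrong password", lambda: True))                 # False
print(authenticate(stored, "correct horse battery staple", lambda: False))  # False
```

Sending the push only after the password matches avoids notifying users about guesses from someone who does not know their password.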

To blue/green deploy LDAP in a highly available mode, its instances must be stateless, as the 12-factor methodology requires. We therefore use the slapd-sql module for the users directory and store its data in Postgres (Amazon RDS) instead of on disk.

Our SAML identity provider, which service providers interact with, uses LDAP as the source of truth when authenticating and authorizing users.

We decided to build a custom SAML identity provider for these critical services instead of using a third-party one, because third-party identity providers require an administrative user/account to provision permissions. This would violate our consensus requirement: a single user/account could make changes on its own, and could maliciously elevate a user’s permissions and compromise our system. A third party also introduces vendor risk. If the vendor were compromised, an attacker could maliciously create SAML responses that elevate an unauthorized user’s permissions to a critical service and then perform malicious infrastructure changes. We were not willing to accept this risk for our identity service.

Creating a custom SAML identity provider also allowed for future flexibility and integrations with various service providers.

Putting It All Together

Full process of signing into a service provider.

To set up a service provider to integrate with our system, we use consensus mechanisms to create a trust relationship between the service provider and our identity provider, as described in the diagram. We configure both with public X.509 certificates as well as SAML metadata, which allows them to communicate with each other and verify each other’s authenticity.
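On the identity provider side, this trust relationship is typically expressed as SAML metadata that the service provider imports. The minimal sketch below shows only the elements that identify the provider, its signing certificate, and its SSO endpoint. The URLs and certificate string are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("md", MD)
ET.register_namespace("ds", DS)


def idp_metadata(entity_id: str, sso_url: str, cert_b64: str) -> str:
    """Build minimal SAML 2.0 metadata describing an identity provider:
    its entity ID, the X.509 certificate it signs with, and its SSO endpoint."""
    root = ET.Element(f"{{{MD}}}EntityDescriptor", {"entityID": entity_id})
    idp = ET.SubElement(root, f"{{{MD}}}IDPSSODescriptor", {
        "protocolSupportEnumeration": "urn:oasis:names:tc:SAML:2.0:protocol",
    })
    # Service providers use this certificate to verify signed SAML responses.
    key = ET.SubElement(idp, f"{{{MD}}}KeyDescriptor", {"use": "signing"})
    key_info = ET.SubElement(key, f"{{{DS}}}KeyInfo")
    x509_data = ET.SubElement(key_info, f"{{{DS}}}X509Data")
    ET.SubElement(x509_data, f"{{{DS}}}X509Certificate").text = cert_b64
    # Where browsers are sent to authenticate.
    ET.SubElement(idp, f"{{{MD}}}SingleSignOnService", {
        "Binding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect",
        "Location": sso_url,
    })
    return ET.tostring(root, encoding="unicode")


print(idp_metadata("https://idp.example.com/metadata",
                   "https://idp.example.com/sso",
                   "BASE64-ENCODED-DER-CERTIFICATE"))
```

The service provider publishes matching metadata of its own, including its Assertion Consumer Service URL, so each side knows where to send messages and which key to trust.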

We use the widely adopted Web Browser SSO Profile to integrate with service providers. The process to sign into a service provider is as follows:

  1. The user agent (typically the user’s web browser) requests authentication from the identity provider.
  2. The identity provider delegates to LDAP for authentication. In our implementation, we use Duo Push for MFA. If the user successfully authenticates, information about that user’s groups is returned.
  3. The identity provider then packages this information in a signed SAML response and returns it to the user agent.
  4. The user agent sends the SAML response to the service provider on behalf of the identity provider. The service provider interprets the response and authorizes the user. This grants the user the requested permissions based on the response.
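Steps 2 and 3 can be sketched as a function that takes the groups LDAP returned for an authenticated user and packages them into a SAML response. This is an illustrative skeleton under stated assumptions, not Coinbase's identity provider. The `groups` attribute name, the authentication context class, the five-minute validity window, and all URLs are hypothetical. The XML signature that step 3 requires is omitted, because producing it needs an XML-DSig library and the identity provider's private key.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
ET.register_namespace("samlp", SAMLP)
ET.register_namespace("saml", SAML)


def _ts(t: datetime) -> str:
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")  # UTC xs:dateTime


def _el(parent, ns: str, tag: str, attrs: Optional[Dict[str, str]] = None,
        text: Optional[str] = None):
    node = ET.SubElement(parent, f"{{{ns}}}{tag}", attrs or {})
    node.text = text
    return node


def build_response(issuer: str, username: str, groups: List[str], audience: str,
                   acs_url: str, now: Optional[datetime] = None,
                   ttl: timedelta = timedelta(minutes=5)) -> str:
    """Package an authenticated user's groups into an unsigned SAML response."""
    now = now or datetime.now(timezone.utc)
    expires = _ts(now + ttl)
    resp = ET.Element(f"{{{SAMLP}}}Response", {
        "ID": "_" + uuid.uuid4().hex, "Version": "2.0",
        "IssueInstant": _ts(now), "Destination": acs_url})
    _el(resp, SAML, "Issuer", text=issuer)
    status = _el(resp, SAMLP, "Status")
    _el(status, SAMLP, "StatusCode", {"Value": "urn:oasis:names:tc:SAML:2.0:status:Success"})
    assertion = _el(resp, SAML, "Assertion", {
        "ID": "_" + uuid.uuid4().hex, "Version": "2.0", "IssueInstant": _ts(now)})
    _el(assertion, SAML, "Issuer", text=issuer)
    # Who the assertion is about; only the bearer posting to acs_url may use it.
    subject = _el(assertion, SAML, "Subject")
    _el(subject, SAML, "NameID", text=username)
    confirmation = _el(subject, SAML, "SubjectConfirmation",
                       {"Method": "urn:oasis:names:tc:SAML:2.0:cm:bearer"})
    _el(confirmation, SAML, "SubjectConfirmationData",
        {"Recipient": acs_url, "NotOnOrAfter": expires})
    # Validity window and intended audience (the service provider's entity ID).
    conditions = _el(assertion, SAML, "Conditions",
                     {"NotBefore": _ts(now), "NotOnOrAfter": expires})
    restriction = _el(conditions, SAML, "AudienceRestriction")
    _el(restriction, SAML, "Audience", text=audience)
    # How the user authenticated; this class reference is an assumption.
    authn = _el(assertion, SAML, "AuthnStatement", {"AuthnInstant": _ts(now)})
    context = _el(authn, SAML, "AuthnContext")
    _el(context, SAML, "AuthnContextClassRef",
        text="urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport")
    # The group memberships returned by LDAP, one AttributeValue per group.
    statement = _el(assertion, SAML, "AttributeStatement")
    attribute = _el(statement, SAML, "Attribute", {"Name": "groups"})
    for group in groups:
        _el(attribute, SAML, "AttributeValue", text=group)
    return ET.tostring(resp, encoding="unicode")


print(build_response("https://idp.example.com", "alice", ["eng", "aws-admin"],
                     "https://sp.example.com", "https://sp.example.com/acs"))
```

A real identity provider signs the assertion (or the whole response) before handing it to the browser. Each service provider then validates that signature, the audience, and the time window before granting access.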

This process allows engineers at Coinbase to seamlessly log in to critical services where their permissions are protected via consensus.

Conclusion

Our SAML identity provider has automated onboarding engineers onto our critical services, so we no longer need to provision them manually. In addition to the security benefits, this saves us a lot of time and is one of the best ways to handle the hypergrowth of our engineering team. It also lets us focus on what we do best: building the most secure and innovative cryptocurrency platform in the industry.

If you are interested in cool infrastructure challenges such as this, and want to create a platform that empowers engineers with both speed and security, we’re hiring! Check out the careers page at coinbase.com/careers.

Unless otherwise indicated, all images provided herein are by Coinbase.

