Secrets Management in a Cloud Agnostic World

Published in Cruise, Aug 1, 2019

Open-sourcing Daytona: Automated Secrets Injection, Fast & Secure.
Authors: Mike Ruth & Brian Nuszkowski

“Private keys used to sign software leaked to ACME Co’s public code repository”

Stop us if you’ve heard this one before. Secrets management is hard and, in the days of tools like Trufflehog, finding secrets in public repositories is easy: just point and click. So from day one, Cruise’s Security team set out to make secrets management a primary consideration for application and service owners throughout our software development lifecycle. Before we examine the specifics of what that looks like, we should take a moment to consider some of the challenges that every organization and individual faces when storing secrets.

Thinking About the Secrets Management Problem

One of the first questions that needs to be asked is, “Where should we store secrets? In source? In build artifacts?” Most of the industry agrees that these are not the correct locations, and a primary reason for this lies in an organization’s authorization model. Are employees who are able to read source also the same privileged entities that should be reading application secrets? Probably not. In fact, we’d contend that no humans should be reading application secrets at all.

Sure, we could use something like KMS to store encrypted secrets in source, but one noteworthy downside (among several) is the challenge of reusing source code and artifacts across multiple diverse environments. Is our dev-instance application going to be using the same database as our prod-instance application? Hopefully not; we need different secrets for them. But we’re getting ahead of ourselves.

“Where we store secrets depends on what applications and services need them, and where they’re running.”

Some organizations may only have a single location for deploying their applications: just in AWS, just in GCP, just in Kubernetes. It might make sense for these organizations to focus on using the supplied cloud provider’s features here: Secrets Manager, KMS, Kubernetes Secrets. Increasingly, however, organizations (Cruise included) aren’t as homogeneous in their deployment locations. With heterogeneous cloud environments, we lose the interoperability that these features originally provided us. What happens when an application running in one cloud environment needs access to a resource in another? This challenge determines whether we choose to store secrets in multiple locations (depending on the applications that need them) or instead centralize on a single secrets management location.

Establishing Identity and Authentication

If we consider using cloud providers’ respective secrets technologies for our applications, KMS and object storage look pretty alluring. Secrets are encrypted, their storage is easily attached to underlying infrastructure, and cloud init or pre-exec functionality is available to decrypt them prior to the application starting. But how does this work for our other environments? What about a car that’s moving around the city and needs to authenticate back to these applications?

Any service that’s deployed into a cloud environment acquires its respective secrets by first verifying it’s allowed to access the secrets management location in which they’re being stored. How do services do this? They need to authenticate.

Authentication validates that an identity is who it claims to be. So first we need an identity associated with our services that can be authenticated. In the figure below, we can see three different service identity primitives that are used to authenticate with primary cloud infrastructure and platforms.

Figure: Cloud and Platform Identity Primitives (AWS IAM Roles, GCP Service Accounts, and Kubernetes Service Accounts).

AWS IAM Roles: These can be used in many different ways, including in tandem with the AWS Metadata Service and with AWS Signature V4 Signing. Perhaps the best part of AWS’ IAM Roles is that they can be attached to AWS resources (EC2 Instances, for example). They’ll allow applications to authenticate to other AWS resources implicitly, requiring no additional configuration steps to attain proper credential material.

GCP Service Accounts: Similar to AWS, the GCP Metadata Service will sign incoming JWT requests for the Service Account (SA) that is being defined in the request. By default, this is the GCP SA attached to the GCE instance VM. GCP Service Accounts can also sign JWTs if they have the proper IAM permissions. This is required when the Metadata Service is unavailable, such as for authenticating Google Cloud Functions. Also similar to AWS, because GCP SAs are attached to GCP resources, they’ll enable applications to authenticate to GCP resources implicitly.
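To make the Metadata Service flow above a little more concrete, here is a minimal Python sketch that builds (but does not send) the request a GCE instance would make for a signed identity JWT. The `default` segment in the URL refers to the Service Account attached to the instance. The function name and the audience value are hypothetical; the audience a given Vault role expects depends on how that role is configured.

```python
import urllib.parse
import urllib.request

# Well-known GCE metadata endpoint for identity tokens of the attached SA.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def build_identity_token_request(audience: str) -> urllib.request.Request:
    """Build, without sending, a request for a signed identity JWT."""
    query = urllib.parse.urlencode({"audience": audience, "format": "full"})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        # The metadata server rejects requests that lack this header.
        headers={"Metadata-Flavor": "Google"},
    )


if __name__ == "__main__":
    # "vault/example-role" is a placeholder audience, not a real role.
    request = build_identity_token_request("vault/example-role")
    print(request.get_method(), request.full_url)
```

Sending this request only works from inside GCP, which is why the sketch stops at constructing it.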

Kubernetes Service Accounts: All Kubernetes (K8s) requests are authenticated through the API Server. For services running on Kubernetes to make these requests, a Kubernetes service account must be bound to the service’s pod. Each Kubernetes SA has a JWT that is provisioned by the internal Kubernetes CA upon the SA’s creation, and must have associated RBAC rolebindings to be granted permission to make requests. Akin to the previous two primitives, Kubernetes SAs enable applications to authenticate to Kubernetes resources implicitly.
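As an illustration of how a Kubernetes SA's identity travels inside its JWT, the Python sketch below decodes the claims of a token without verifying its signature. Inside a pod, the real token is normally mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. To stay runnable anywhere, the example builds an unsigned stand-in token with the claim names used by classic service account tokens. The helper names and the `default` namespace are hypothetical.

```python
import base64
import json


def _b64url_encode(data: dict) -> str:
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_claims(jwt: str) -> dict:
    """Return the JWT payload WITHOUT verifying it. For inspection only."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(_b64url_decode(payload))


def make_unsigned_example_token(namespace: str, name: str) -> str:
    """Build a fake, unsigned token shaped like a service account JWT."""
    header = {"alg": "none", "typ": "JWT"}
    claims = {
        "iss": "kubernetes/serviceaccount",
        "sub": f"system:serviceaccount:{namespace}:{name}",
        "kubernetes.io/serviceaccount/namespace": namespace,
        "kubernetes.io/serviceaccount/service-account.name": name,
    }
    return f"{_b64url_encode(header)}.{_b64url_encode(claims)}.fake-signature"


if __name__ == "__main__":
    token = make_unsigned_example_token("default", "awesome-app")
    print(read_claims(token)["sub"])
```

A verifier such as the API Server or Vault checks the signature before trusting these claims; skipping that step is only acceptable for debugging.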

Centralized Secrets Management

We see that each of these providers and platforms has its own way of performing authentication that can work fairly seamlessly — without services needing to manage credentials manually. But consider the question presented earlier: what happens when an application running in Kubernetes needs to access a GCS bucket, or when a GCE VM needs to access an object in S3? A JWT associated with a Kubernetes service account will not be identifiable to a GCP resource (GKE’s new Workload Identity notwithstanding), nor will a JWT associated with a GCP SA be identifiable to an AWS resource. The onus is placed back on the service to manage these credentials, with the added complexity of credential rotation, revocation, and the varying implementations and patterns of each ecosystem. Simply put, we lose the interoperability we previously had when located within a single cloud or platform.

When the responsibility of understanding the nuances of these varying implementations and patterns falls back on services and their owners, it generates a large amount of friction. In our experience, this ends up undermining all the best practices Cruise tries to implement, and was the impetus for designing a solution to solve secrets management in a cloud-agnostic way. So how can we simplify this? Does each cloud implementation actually need to vary?

Enter HashiCorp Vault. Like many of HashiCorp’s products, Vault looks to solve a specific set of problems for many different clouds and platforms. Using Vault as our central secrets store allows us to begin creating consistent patterns for storing secrets, as well as authenticating identities. These patterns stay roughly the same regardless of the ecosystem our services are being deployed into. Why is this? In part, it is due to how we delegate secret paths in Vault.

In order to make sure secrets management is multi-tenant and environment-aware, we use a conventional namespacing pattern for application secrets. This allows us to reduce the blast radius of leaked secrets. We don’t share secrets across environments, so a compromise of a dev-instance application does not yield access to prod-instance secrets.

Figure: Vault secret pathing patterns (a table of Groups, Permissions, and Path).
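The exact paths from the figure are not reproduced here, so the following Python sketch uses a made-up layout, `secret/<environment>/<app>/<name>`, purely to illustrate the idea of environment-aware namespacing. The function name and the layout are assumptions, not Cruise's actual convention.

```python
def secret_path(environment: str, app: str, name: str) -> str:
    """Build a namespaced secret path (hypothetical layout, for illustration)."""
    for part in (environment, app, name):
        # Reject empty or nested segments so a caller cannot escape its namespace.
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"secret/{environment}/{app}/{name}"


if __name__ == "__main__":
    print(secret_path("dev", "awesome-app", "db-password"))
    print(secret_path("prod", "awesome-app", "db-password"))
```

Because the environment is part of the path, the dev and prod copies of a secret never share a location, which is what keeps a dev compromise from reaching prod secrets.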

These patterns also stay fairly consistent across cloud environments for standardizing the authentication of service identities by using Vault Auth Backends. Auth backends leverage authentication methods that are supported by Vault and delegate identity to the respective provider and platform: Okta, AWS, GCP, Kubernetes, and others. To ensure Vault is aware of service identities, role bindings against these Vault backends are required. These roles bind the service identity in the associated provider or platform to Vault Policies, which scope CRUD-like permissions to individual secrets paths. We can begin to see how all of the service identity primitives discussed earlier can now be leveraged to authenticate against the same secrets storage location in Vault, with granularly scoped permissions through the use of secrets pathing and Vault policies.
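The relationship between service identities, roles, and policies can be modeled in a few lines. The Python sketch below is a simplified toy model, not Vault's implementation: a role is bound to one namespace and service account, and each policy maps a path (optionally ending in `*` as a suffix glob) to a set of capabilities. All class names, role names, and paths are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Policy:
    name: str
    rules: Dict[str, Set[str]]  # path or "prefix*" -> capabilities


@dataclass
class Role:
    bound_namespace: str
    bound_service_account: str
    policies: List[Policy]


def _path_matches(rule: str, path: str) -> bool:
    # A trailing "*" matches any suffix; otherwise the path must match exactly.
    if rule.endswith("*"):
        return path.startswith(rule[:-1])
    return path == rule


def is_allowed(role: Role, namespace: str, service_account: str,
               path: str, capability: str) -> bool:
    """Check identity binding first, then look for a policy granting access."""
    if (namespace, service_account) != (role.bound_namespace,
                                        role.bound_service_account):
        return False
    return any(
        _path_matches(rule, path) and capability in capabilities
        for policy in role.policies
        for rule, capabilities in policy.rules.items()
    )


if __name__ == "__main__":
    dev_policy = Policy("awesome-app-dev", {"secret/dev/awesome-app/*": {"read"}})
    dev_role = Role("default", "awesome-app", [dev_policy])
    print(is_allowed(dev_role, "default", "awesome-app",
                     "secret/dev/awesome-app/db-password", "read"))
    print(is_allowed(dev_role, "default", "awesome-app",
                     "secret/prod/awesome-app/db-password", "read"))
```

The second check fails because no policy attached to the dev role covers the prod path, which is the granular scoping the paragraph above describes.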

What we end up losing in this process, however, is the convenience of applications being able to authenticate to cloud resources implicitly. That is, application owners now have an extra set of configuration steps to generate their identity material (PKCS #7 certs, signed JWTs, etc.), perform Vault authentication, and fetch their secrets. So again we asked ourselves, “Can we simplify this? Does each application need to individually manage this additional configuration overhead?”
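To show what those extra steps look like for a Kubernetes workload, here is a hedged Python sketch of the two HTTP calls involved: a login against a Kubernetes auth mount, then a read using the returned token. The requests are built but not sent, so the example runs without a Vault server. The Vault address, mount name, role name, and secret path are placeholders, and the read assumes a KV version 1 style path.

```python
import json
import urllib.request


def build_login_request(vault_addr: str, auth_mount: str,
                        role: str, jwt: str) -> urllib.request.Request:
    """Step 1: exchange the service account JWT for a Vault token."""
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/{auth_mount}/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_from_login_response(body: bytes) -> str:
    """Step 2: pull the client token out of the login response body."""
    return json.loads(body)["auth"]["client_token"]


def build_read_request(vault_addr: str, secret_path: str,
                       client_token: str) -> urllib.request.Request:
    """Step 3: read a secret, presenting the token in the Vault header."""
    return urllib.request.Request(
        f"{vault_addr}/v1/{secret_path}",
        headers={"X-Vault-Token": client_token},
        method="GET",
    )


if __name__ == "__main__":
    login = build_login_request(
        "https://vault.example.com", "kubernetes", "example-role", "<sa-jwt>"
    )
    print(login.get_method(), login.full_url)
    # A canned response body stands in for the real login reply.
    token = token_from_login_response(b'{"auth": {"client_token": "s.example"}}')
    read = build_read_request("https://vault.example.com",
                              "secret/dev/awesome-app", token)
    print(read.get_method(), read.full_url)
```

Every application that talks to Vault directly has to carry some version of this logic, plus error handling and token renewal, which is the overhead the paragraph above refers to.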

Enter Daytona: Secrets Fast and Secure

As we implemented the same authentication flows for numerous services against each of the different Vault backends, we realized that these too followed patterns that lend themselves to automation. Previously, authenticating to Vault and retrieving secrets from a server or container was a delicate balance of shell scripts or potentially lengthy HTTP implementations. Instead, a single binary can be used to accomplish most of these goals. So we did just that. Enter Daytona: secrets fast and secure. Daytona allows application owners to offload the majority of the planning about how their services authenticate to Vault, fetch secrets, and place them in the locations where their applications expect them to be. Let’s see how Daytona works.

While Daytona is a Go binary that largely replicates functionality from the existing Vault client, it also looks to automate a handful of the tasks mentioned above, with the assumption that our application, acting as the client, will not be interactive. We deploy Daytona packaged as an image, which can then be leveraged in a few different ways depending on the environment: as a sidecar, as a Kubernetes initContainer, or as an entrypoint. With support for the AWS, GCP, and Kubernetes Vault auth backends, Daytona can authenticate to Vault using the respective Vault auth backend role to acquire a Vault token and then, with the appropriate Vault policies, fetch its intended secrets. Once secrets have been fetched, it injects them into the location where the application expects its secrets at deploy time. This is either an environment variable (if used as an entrypoint) or one or more files in a memory-mounted tmpfs volume. Since the Daytona configuration dictates how it will authenticate and acquire secrets, this leaves application owners with significantly fewer implementation decisions and a consistent pattern for how secrets are provided to applications across environments. Let’s see an example of what this looks like using an application deployed to a Kubernetes pod.
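The two injection targets mentioned above, files and environment variables, can be sketched in Python as follows. This is an illustration of the general technique only and does not reflect Daytona's actual file naming or behavior. The function names and the one-file-per-key layout are assumptions.

```python
import os
import tempfile
from typing import Dict, List, MutableMapping


def write_secret_files(secrets: Dict[str, str], secret_dir: str) -> List[str]:
    """Write each secret to its own file, readable only by the owner."""
    os.makedirs(secret_dir, mode=0o700, exist_ok=True)
    written = []
    for key, value in secrets.items():
        path = os.path.join(secret_dir, key)
        # Create the file with restrictive permissions before writing to it.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(value)
        written.append(path)
    return written


def export_secret_env(secrets: Dict[str, str],
                      environ: MutableMapping[str, str]) -> None:
    """Copy secrets into an environment mapping under upper-cased names."""
    for key, value in secrets.items():
        environ[key.upper()] = value


if __name__ == "__main__":
    example = {"db_password": "not-a-real-secret"}
    # A temporary directory stands in for the memory-backed volume.
    with tempfile.TemporaryDirectory() as secret_dir:
        print(write_secret_files(example, secret_dir))
    env: Dict[str, str] = {}
    export_secret_env(example, env)
    print(sorted(env))
```

In a real deployment the target directory would be the tmpfs mount, so the secret material never touches persistent disk.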

Figure: A pod YAML file in which Cruise’s Daytona is used as an initContainer.

As we can see from the pod YAML file in the figure above, using Daytona as an initContainer defines:

How we plan to authenticate back to Vault (K8S_AUTH)
Which specific Vault auth backend we plan to use (K8S_AUTH_MOUNT)
And its respective Vault role (VAULT_AUTH_ROLE) which checks for the defined K8s service account — awesome-app.
Once authentication completes, Daytona fetches the set of Vault secrets (VAULT_SECRETS_APP, VAULT_SECRETS_GLOBAL)
And places them in a memory-mounted tmpfs volume, vault-secrets (SECRET_PATH).

In addition to defining environment variables, we also use a couple of Kubernetes-defined security controls (securityContext), including running Daytona as a user other than root, and preventing escalation to root while it’s running.
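Since the original YAML figure is not reproduced in this text, the Python sketch below builds an approximate equivalent as a dictionary, which can be serialized to JSON and applied like any other manifest. It uses only the environment variable names listed above, the awesome-app service account, and the vault-secrets volume. The image names, mount paths, user ID, role and mount values, and the "true" value for K8S_AUTH are placeholders chosen for illustration, and a real deployment likely needs further settings that the article does not list.

```python
import json
from typing import Dict


def daytona_init_container(image: str, env: Dict[str, str]) -> dict:
    """Describe Daytona as an initContainer that shares the secrets volume."""
    return {
        "name": "daytona",
        "image": image,
        "securityContext": {
            "runAsUser": 9999,  # placeholder non-root UID
            "allowPrivilegeEscalation": False,
        },
        "volumeMounts": [{"name": "vault-secrets", "mountPath": "/home/vault"}],
        "env": [{"name": name, "value": value} for name, value in env.items()],
    }


def pod_manifest(app_image: str, daytona_image: str) -> dict:
    """Assemble a pod whose app container reads secrets written by Daytona."""
    env = {
        "K8S_AUTH": "true",
        "K8S_AUTH_MOUNT": "example-k8s-auth-mount",
        "VAULT_AUTH_ROLE": "example-vault-role",
        "VAULT_SECRETS_APP": "secret/path/to/app",
        "VAULT_SECRETS_GLOBAL": "secret/path/to/global",
        "SECRET_PATH": "/home/vault/secrets",
    }
    volume_mount = {"name": "vault-secrets", "mountPath": "/home/vault"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "awesome-app"},
        "spec": {
            "serviceAccountName": "awesome-app",
            # medium: Memory makes the emptyDir a tmpfs volume.
            "volumes": [{"name": "vault-secrets", "emptyDir": {"medium": "Memory"}}],
            "initContainers": [daytona_init_container(daytona_image, env)],
            "containers": [{
                "name": "awesome-app",
                "image": app_image,
                "volumeMounts": [volume_mount],
            }],
        },
    }


if __name__ == "__main__":
    manifest = pod_manifest("example.com/awesome-app:latest",
                            "example.com/daytona:latest")
    print(json.dumps(manifest, indent=2))
```

Because the initContainer runs to completion before the application container starts, the secrets are already present in the shared volume by the time the application reads them.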

Putting it All Together

Deploying applications in heterogeneous cloud environments requires a reliance on users to self-manage their service’s credentials and secrets when communicating between clouds. Centralizing secrets was the chosen solution to remove that reliance on our users.
Vault enables applications to still rely on the underlying cloud infrastructure’s identity primitives to authenticate and acquire secrets. Using Vault’s auth backends, roles, and policies provides a consistent pattern for authentication and granular authorization across these cloud environments.
To simplify secrets retrieval, Cruise created Daytona. This enables automation of secrets fetching for applications, and reduces the number of decisions application owners face when deploying their applications.

We can see all of these items demonstrated in the secrets injection workflow diagram below.

Figure: Daytona secrets injection as a K8s initContainer.

But wait, there’s more! Experience Daytona for yourself.

In an effort to facilitate a better story and strategy around cross-cloud secrets management, Cruise recently open sourced Daytona. Anyone interested can find it on Cruise’s GitHub page.

Interested in helping us build a more secure autonomous vehicle? Check out our open positions.