Workload Identity Federation for On-Premise Workloads with SPIFFE

Christoph Grotz
Google Cloud - Community
5 min read · Apr 9, 2022

Today I want to share a very cool project from the CNCF with you: the Secure Production Identity Framework for Everyone (SPIFFE), a universal identity control plane for distributed systems. I highly recommend watching the keynote Kelsey Hightower gave at KubeCon two years ago, which gives a great overview of SPIFFE and what it’s used for.

But since you might not want to watch the video just yet, here is a quick TL;DR of what SPIFFE is trying to do. Imagine you have an application running on a VM. How does the application access a database server? Easy, you configure the username and password, maybe as an environment variable. We all know that handling secrets in this fashion is a big pain point. Rotating them and actually keeping them secret can be a tedious task. All of this usually comes from not considering a simple question: What is the identity of our application?

In the cloud there is usually a service account concept that binds an identity directly or indirectly to a workload. In traditional datacenter architectures, VMs don’t have explicit identities. Sure, they have a name, but they don’t have an identity with which they can authenticate and authorise against other services. Some organisations use mTLS for this and place certificates on each machine. When using Kubernetes, there are handy tools like Istio that try to make this process easier. But an interoperable approach is often still elusive. For example, Google Cloud APIs won’t accept mTLS for AuthN and AuthZ. SPIFFE tries to fill this gap, and with Spire the CNCF provides a production-ready implementation of it. If you have workloads running in your datacenter, you really should check out SPIFFE and Spire.

Many DevOps teams find it quite challenging to manage on-premise workloads that communicate with GCP APIs. I want to show you how combining SPIFFE with Google Workload Identity Federation solves this interoperability problem. In the field you can find different approaches to this challenge, with various pitfalls:

  • Some people will export a GCP IAM Service Account Key and place it on the machine. This is the worst thing you can do! Don’t do that.
  • You can use Workload Identity Federation and an OIDC-based Identity Provider. The workload uses a client credentials grant to authenticate. This is much better, but still means you have a clientID and clientSecret that you need to rotate and keep secret.
  • You could take a look at SPIFFE, set up Spire and integrate that with Workload Identity Federation.

SPIFFE and Workload Identity Federation

You can easily try this out by setting up Spire on your own local machine, following the great getting started guide. You will need to make a small change, though, since we need a discoverable OIDC configuration. The easiest way I found to run this small POC was the following:

  1. Create a GCS bucket. In the conf/server/server.conf file add an entry: jwt_issuer = "https://<bucket_name>.storage.googleapis.com".
  2. Add the openid-configuration document shown below to the bucket under .well-known/openid-configuration (make sure to replace the placeholders).
  3. Get the JWKS from the Spire server by calling bin/spire-agent api fetch jwt -audience test. The output contains a JSON object with the keys. Place the whole object, including the keys keyword, in a file called keys in the root folder of your bucket.
  4. In the Google Cloud console, create a new Workload Identity Pool and Provider, and set the issuer to https://<bucket_name>.storage.googleapis.com as well.
  5. Add a new service account that you want to use from your on-premise workload. You will need to grant the SPIFFE ID of the test workload access to the service account:
gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT_EMAIL \
--role=roles/iam.workloadIdentityUser \
--member="principal://iam.googleapis.com/projects/<project_number>/locations/global/workloadIdentityPools/<workload_pool_id>/subject/<SPIFFE_ID>"
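The member string in the command above is easy to get wrong, so here is a minimal Python sketch that assembles it. It assumes the provider maps google.subject to the sub claim of the JWT (which for a SPIFFE JWT is the SPIFFE ID); the function name and the example values are hypothetical.

```python
def workload_identity_member(project_number: str, pool_id: str, spiffe_id: str) -> str:
    """Build the IAM member string used with add-iam-policy-binding.

    Assumption: the provider maps google.subject to the JWT "sub" claim,
    so the subject is the SPIFFE ID itself.
    """
    if not spiffe_id.startswith("spiffe://"):
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    return (
        "principal://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/subject/{spiffe_id}"
    )


if __name__ == "__main__":
    # Hypothetical values, replace them with your own.
    print(workload_identity_member("123456789", "my-pool", "spiffe://example.org/myservice"))
```

The printed value can be passed as the --member argument of the gcloud command.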

Example OpenID discovery document:

{
  "issuer": "https://<bucket_name>.storage.googleapis.com",
  "jwks_uri": "https://<bucket_name>.storage.googleapis.com/keys",
  "authorization_endpoint": "",
  "response_types_supported": [
    "id_token",
    "access_token"
  ],
  "subject_types_supported": [],
  "id_token_signing_alg_values_supported": [
    "RS256",
    "ES256",
    "ES384"
  ]
}
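If you would rather not replace the placeholders by hand, the document can be generated. The following is a small sketch in Python that fills in the bucket name; the function name and the bucket name my-spire-bucket are made up for illustration, and the field values are copied from the example above.

```python
import json


def build_discovery_document(bucket_name: str) -> dict:
    """Return the discovery document from this post for the given bucket."""
    # The issuer must match jwt_issuer in server.conf and the issuer
    # configured on the Workload Identity Provider.
    issuer = f"https://{bucket_name}.storage.googleapis.com"
    return {
        "issuer": issuer,
        "jwks_uri": f"{issuer}/keys",
        "authorization_endpoint": "",
        "response_types_supported": ["id_token", "access_token"],
        "subject_types_supported": [],
        "id_token_signing_alg_values_supported": ["RS256", "ES256", "ES384"],
    }


if __name__ == "__main__":
    # "my-spire-bucket" is a hypothetical bucket name.
    print(json.dumps(build_discovery_document("my-spire-bucket"), indent=2))
```

Save the output as .well-known/openid-configuration in the bucket.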

You can now request a SPIFFE JWT, exchange it at the Security Token Service API for a federated token, and use that token to impersonate the service account (here is how to do it if you want to run those steps manually).
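To give an idea of what the manual steps look like, here is a rough Python sketch of the two HTTP requests involved: the token exchange against the STS API and the generateAccessToken call on the IAM Credentials API. The function names are my own, the provider ID is whatever you chose when creating the provider, and the sketch uses only the standard library. Treat it as an illustration under those assumptions, not as the reference way to do it.

```python
import json
import urllib.request

STS_URL = "https://sts.googleapis.com/v1/token"
IAM_CREDENTIALS_URL = "https://iamcredentials.googleapis.com/v1"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def build_sts_request(spiffe_jwt, project_number, pool_id, provider_id):
    """Build the request that exchanges the SPIFFE JWT for a federated token."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    body = {
        "audience": audience,
        "grantType": "urn:ietf:params:oauth:grant-type:token-exchange",
        "requestedTokenType": "urn:ietf:params:oauth:token-type:access_token",
        "scope": CLOUD_PLATFORM_SCOPE,
        "subjectTokenType": "urn:ietf:params:oauth:token-type:jwt",
        "subjectToken": spiffe_jwt,
    }
    return urllib.request.Request(
        STS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_impersonation_request(federated_token, service_account_email):
    """Build the request that turns the federated token into a service account token."""
    url = (
        f"{IAM_CREDENTIALS_URL}/projects/-/serviceAccounts/"
        f"{service_account_email}:generateAccessToken"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"scope": [CLOUD_PLATFORM_SCOPE]}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {federated_token}",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(build_sts_request(...)) should return a JSON object with an access_token field, which you then pass to build_impersonation_request; the second response carries the service account token in accessToken.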

SPIFFE GCP Proxy

To make the whole process more user friendly and better integrated with the gcloud SDK and the Google Cloud libraries, I created a small proxy tool, spiffe-gcp-proxy, that you can run on your machine alongside the spire-agent and the workload. It provides a GCP Metadata server compatible way of retrieving access tokens.

What’s happening here on a high level

The workflow this proxy implements executes the following steps:

  1. The workload requests a GCP Auth token via http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/<sa>/token.
  2. The proxy requests a JSON Web Token (JWT) from the local Spire Agent.
  3. The Spire Agent requests the JWT from the Spire Server.
  4. The proxy takes the SPIFFE JWT and requests a new token from the Security Token Service (STS) API.
  5. STS verifies that the passed SPIFFE JWT is valid: it retrieves the .well-known/openid-configuration and the JWKS from the issuer and checks the token against them. (In this POC, the openid-configuration is in fact loaded from GCS.)
  6. STS creates and passes back an access token. This token can only be used to impersonate a service account.
  7. The proxy requests a service account access token using the “impersonation token”.
  8. The proxy responds to the workload with the GCP IAM access token for the service account.
  9. The workload can use the access token to authenticate and authorise against Google Cloud APIs.
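The steps above can be sketched in a few lines of Python. This is not the code of the actual proxy, just an illustration of the control flow: the three callables passed in (fetch_spiffe_jwt, exchange_with_sts, impersonate) are hypothetical stand-ins for the calls to the Spire Agent, the STS API and the IAM Credentials API, and the response mimics the JSON shape the metadata server uses for tokens.

```python
import re
from datetime import datetime, timezone

# Path the workload calls in step 1; the capture group is the <sa> part.
TOKEN_PATH = re.compile(
    r"^/computeMetadata/v1/instance/service-accounts/([^/]+)/token$"
)


def handle_token_request(path, fetch_spiffe_jwt, exchange_with_sts, impersonate, now=None):
    """Run steps 2 to 8 for one metadata-style token request.

    Assumed (hypothetical) signatures of the injected callables:
      fetch_spiffe_jwt() -> str                      # steps 2-3
      exchange_with_sts(spiffe_jwt) -> str           # steps 4-6
      impersonate(federated_token, service_account)
          -> (access_token, expires_at)              # step 7, timezone-aware datetime
    """
    match = TOKEN_PATH.match(path.split("?", 1)[0])  # ignore any query string
    if match is None:
        raise ValueError(f"unsupported metadata path: {path}")
    service_account = match.group(1)

    spiffe_jwt = fetch_spiffe_jwt()
    federated_token = exchange_with_sts(spiffe_jwt)
    access_token, expires_at = impersonate(federated_token, service_account)

    # Step 8: answer in the shape of a metadata server token response.
    now = now or datetime.now(timezone.utc)
    expires_in = max(0, int((expires_at - now).total_seconds()))
    return {
        "access_token": access_token,
        "expires_in": expires_in,
        "token_type": "Bearer",
    }
```

Passing the three steps in as functions keeps the flow easy to test with fakes, and a real proxy would wrap this in an HTTP server listening where the workload expects the metadata server.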

This reduces the steps you need to do on your machine to simply calling gcloud compute instances list. The proxy will take care of retrieving and exchanging the various tokens, and you can just use the existing automagic in the Google tools.

The approach with spiffe-gcp-proxy works well for VMs that run a single workload or require a single identity. If you are considering running your workload on a local Kubernetes cluster, you can still use the basic approach, but you would probably extend the mechanism a little to cope with multiple workloads with potentially different identities running on the system.

In my opinion, if you are running workloads on-premise, SPIFFE is definitely something you should check out. It’s also a technology to keep on your radar, since I believe it’s going to become much more ubiquitous in the future.
