Untangling the Multi-cloud Identity and Access Control Problem

Brandon Lum
Universal Workload Identity
8 min read · Sep 1, 2021

When an organization moves to a multi-cloud environment, one of the first questions a developer will ask is “How do I access my S3 bucket in AWS from my GCP cluster?” (or any other combination of cloud services and providers). This is an unsurprising request. However, today’s solutions to this problem are surprisingly inadequate, especially when security and compliance are considered.

CISOs (Chief Information Security Officers) and security engineers/operators need to ensure proper identity and access management across each component. In this blogpost, we explain why today’s solutions fall short and propose a shift in perspective on managing identity and access: moving from a “platform view” of identity to an “organization view” of identity. After all, we already take a central organizational view when we manage user identities, so why not do the same for our workloads?

A simple scenario

In this blogpost we will use the example of an organization that runs a Kubernetes/OpenShift container platform across three different clouds: IBM Cloud, Google Cloud Platform (GCP) and Amazon Web Services (AWS). The organization has workloads in each of the clusters, and they have to access resources across the different clouds. For example, a pod in IBM Cloud has to access an Amazon S3 bucket, or an EKS (Amazon Elastic Kubernetes Service) pod has to access a Cloudant database in IBM Cloud.

Today, each cloud provider solves the identity and access problem well within its own infrastructure. Each cloud provider comes with an identity provider (and root of trust), which attests the identity of each pod (and the compute node it runs on) and provides the X.509 certificates or JWTs that allow the workloads to authenticate themselves. Each cloud provider also has its own Identity and Access Management (IAM) service. These services control access to resources within the cloud and are able to authenticate and authorize the identities provided by the cloud provider’s infrastructure. Below is a diagram depicting a multi-cloud setup of the above scenario, together with the relevant identity and access components for each cloud.

Equipped with the identity provider and IAM services in each cloud, a CISO or security engineer/operator has everything they need to manage identity and access for workloads. Therefore, in a single-cloud environment the solutions provided are more than sufficient to meet security and compliance requirements. The issue does not lie with any particular provider, but with managing identities and access across different providers.

Current multi-cloud identity/access solutions

Let’s circle back to our initial developer’s ask: “How do I access resources in <Cloud_provider_A> from my workload in <Cloud_provider_B>?”

There are two main ways to do this today. One is very simple but insecure, and the other is safer, but requires quite a bit of overhead to manage at scale. Let’s start with the simple solution!

Provision credential for another cloud

If you’ve ever generated an API key or token and copied it into a cluster via Kubernetes secrets, you already know the main idea behind the first method. An administrator generates a long-lived credential that is given to a workload in another cloud, which uses it to authenticate requests for access to a resource. This method is far from ideal for multiple reasons. First, the generated token is long-lived, so the ability to manage risk and control the blast radius is very limited: there is no continuous attestation of the workload. Second, it requires an administrator to perform this action. This places a lot of trust in a person (or a component) and complicates compliance (the administrator needs certification for data use, and the process is hard to audit).

Below is a depiction of creating an identity document in GCP and passing it to a workload in IBM Cloud so that the IBM Cloud workload can access a resource in GCP.

Federating identity with other clouds

The more common method of handling identity and access management across clouds is federation. This is done by establishing trust between the identity provider in each cloud and the identity and access management service of every other cloud.

Concretely, the IAM system of each cloud is told to authenticate the identities of workloads in every other cloud. This is usually done via OIDC or by exposing a trust bundle (consisting of a certificate authority public key/certificate) that can be used to authenticate the provenance of a workload’s identity. In terms of security, this is strictly better than the previous method, as credentials are short-lived and an administrator doesn’t need to handle any secret credentials.
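As a rough illustration of what such a federation entry amounts to, the Python sketch below models the IAM configuration as a table of trusted issuers and checks the claims of a token. It assumes the token's signature has already been verified against the issuer's published keys, and that the audience is a single string. The issuer URL, the audience value, and the names `TrustedIssuer` and `validate_claims` are hypothetical and not taken from any provider's API.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedIssuer:
    """One federation entry: an external identity provider this IAM trusts."""
    issuer: str           # value expected in the token's "iss" claim
    audience: str         # value expected in the token's "aud" claim
    max_ttl_seconds: int  # reject tokens with a longer lifetime than this


# Hypothetical configuration: the IAM of cloud A trusts workloads from cloud B.
TRUSTED = {
    "https://oidc.cloud-b.example": TrustedIssuer(
        issuer="https://oidc.cloud-b.example",
        audience="cloud-a-iam",
        max_ttl_seconds=3600,
    ),
}


def validate_claims(claims, now=None):
    """Check the claims of a token whose signature was already verified."""
    now = time.time() if now is None else now
    entry = TRUSTED.get(claims.get("iss"))
    if entry is None:
        return False  # issuer was never federated
    if claims.get("aud") != entry.audience:
        return False  # token was minted for a different consumer
    iat, exp = claims.get("iat"), claims.get("exp")
    if iat is None or exp is None:
        return False
    if not (iat <= now < exp):
        return False  # not yet valid, or expired
    # Enforce short-lived credentials.
    return exp - iat <= entry.max_ttl_seconds
```

Every cloud that should accept these workloads needs its own equivalent of the `TRUSTED` table, which is where the management overhead discussed next comes from.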

The diagram below shows the configuration of GCP IAM such that it can authenticate IBM Cloud identities. The GCP IAM system can now authenticate and authorize access to GCP resources from IBM Cloud workloads.

Problems with cloud-to-cloud federation

However, this method doesn’t come without some drawbacks. There are several issues with performing federation when dealing with workload identities.

First, support for federation today varies between clouds. For example, for an AWS workload to access a Google Cloud resource, it is fairly easy to configure via the new beta identity-pool feature in gcloud. However, the opposite direction (a GCP workload accessing AWS) is not. This blogpost details how it can be done, and it involves additional components that need to be instantiated within a GCP Kubernetes cluster in order for trust to be extended in that direction.

Second, there is the problem of identity schema and schema consistency across clouds. Every cloud’s identity has its own schema. An identity prefix of “example.org/eu-de/prod/” may mean something totally different in different clouds. In one, “eu-de” may refer to the datacenter region; in another, it may refer to the name of a cluster, or even the configured language! The same applies to additional fields such as security, policy, role groups, etc. Because each cloud has its own identity schema, establishing trust and making sense of these identities is not as simple as just connecting their CAs. Some clouds provide mechanisms to map certain fields, but doing this across multiple clouds can become overly complex very quickly. This makes policies and security controls susceptible to misconfiguration.
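To make the ambiguity concrete, here is a small Python sketch in which the same identity string is interpreted under two schemas. Both schemas, the field names, and the helper functions are hypothetical and do not correspond to any real provider.

```python
# Two hypothetical identity schemas; neither is taken from a real provider.
SCHEMA_CLOUD_A = ("region", "environment")
SCHEMA_CLOUD_B = ("cluster", "environment")


def parse_identity(identity, schema):
    """Split 'domain/field1/field2/' and label the fields per schema."""
    domain, *parts = identity.strip("/").split("/")
    if len(parts) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(parts)}")
    return {"domain": domain, **dict(zip(schema, parts))}


def in_region(parsed, region):
    """A policy rule written with cloud A's schema in mind."""
    return parsed.get("region") == region


identity = "example.org/eu-de/prod/"
as_a = parse_identity(identity, SCHEMA_CLOUD_A)  # "eu-de" is a region
as_b = parse_identity(identity, SCHEMA_CLOUD_B)  # "eu-de" is a cluster name

print(in_region(as_a, "eu-de"))  # True
print(in_region(as_b, "eu-de"))  # False: cloud B has no "region" field
```

The rule does not fail loudly on cloud B's identities; it just evaluates differently, which is the kind of misconfiguration that is easy to miss.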

Finally, there is the issue of scale. Doing all this between one or two cloud services is manageable. However, as the number of clouds and providers grows, so does the management problem. One needs to manage the trust relationships, identity schemas, policies, roles, etc. for each pair of clouds. This means a quadratic (more specifically, n choose 2) complexity of managing identity and access. The diagram below shows the number of pairwise configurations that need to be done for just 3 providers.
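The same count can be computed directly. The short Python sketch below enumerates the provider pairs from the scenario and shows how the number grows; the function name `pairwise_configs` is only illustrative.

```python
from itertools import combinations
from math import comb

providers = ["IBM Cloud", "GCP", "AWS"]

# Every unordered pair of providers needs its own trust relationship.
pairs = list(combinations(providers, 2))
assert len(pairs) == comb(len(providers), 2)


def pairwise_configs(n):
    """Number of provider pairs to federate: n choose 2 = n(n-1)/2."""
    return comb(n, 2)


for n in (3, 5, 10):
    print(f"{n} providers -> {pairwise_configs(n)} pairwise configurations")
# 3 providers -> 3 pairwise configurations
# 5 providers -> 10 pairwise configurations
# 10 providers -> 45 pairwise configurations
```

If each direction of a pair has to be configured separately, as the AWS/GCP example above suggests can be the case, the count doubles to n(n-1).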

Taking a step further: Organizational(/Universal) Identity

Historically, much of the workload identity and access management tooling has been built upon the idea that an organization uses one platform: the cloud provider of its choice. Since an organization had all its workloads with a single provider, the platform’s notion of workload identity worked well as the organization’s notion of workload identity.

However, when multi-cloud is involved, there is now a disconnect between the organizational and platform identity. Therefore, it is necessary to (re-)define the notion of organizational workload identity in the multi-cloud scenario.

If we take a look at how we manage users today, an organization has a user directory (either Microsoft Active Directory or an LDAP server) that each platform connects to. These identities are all centrally managed in a single place and are universal throughout the organization. We can draw on this idea and apply it to workload identities to create the notion of organizational/universal identity.

Organizational Workload Identity with SPIFFE/SPIRE + Tornjak

We propose a way to define organizational identity by creating a workload identity abstraction across the compute stack of all platforms. Defining a consistent identity schema provides a standardized approach to creating policies and authorization rules across multiple clouds. This creates a central and universal organizational root of trust (or root CA) and identity framework for all workloads that can be consumed by any resource owner or cloud provider.

Examples of technologies that can do this are SPIFFE, SPIRE and Tornjak.

  • SPIFFE is an open-source standard for securely identifying software systems in dynamic and heterogeneous environments.
  • SPIRE is an implementation of the principles embodied in the SPIFFE standards. It manages platform and workload attestation, provides APIs for controlling attestation policies, and coordinates certificate issuance and rotation.
  • Tornjak is a management plane for SPIRE, providing an interface for operators to centrally manage SPIRE servers and related components.

In the following diagram, we show how we can use SPIFFE/SPIRE + Tornjak to establish a workload identity component across the compute infrastructure in various clouds. These components are responsible for attesting the underlying compute nodes and workloads, and for providing the workloads with identities that attest to infrastructure-specific properties. For example, the metadata of each workload’s identity (X.509/JWT) can be inspected to determine which cloud provider it is on, its datacenter and the container image it is running. All the workload identities are then issued under an organizational root of trust (root CA), which exposes an OIDC discovery service. This can then be configured the same way across all cloud providers and services.
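As a sketch of how such metadata might be read from an identity, the Python snippet below parses a SPIFFE ID, which has the form `spiffe://<trust domain>/<path>`. The path layout used here (alternating key/value segments such as `provider/aws/region/eu-de`) is a hypothetical organizational schema chosen for illustration; SPIFFE itself leaves the structure of the path to the organization.

```python
from urllib.parse import urlsplit


def parse_spiffe_id(spiffe_id):
    """Split a SPIFFE ID into its trust domain and key/value path fields.

    Assumes a hypothetical schema in which the path is a sequence of
    key/value segments, e.g. /provider/aws/region/eu-de/...
    """
    parts = urlsplit(spiffe_id)
    if parts.scheme != "spiffe" or not parts.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) % 2 != 0:
        raise ValueError("path must consist of key/value pairs")
    fields = dict(zip(segments[0::2], segments[1::2]))
    return parts.netloc, fields


trust_domain, fields = parse_spiffe_id(
    "spiffe://example.org/provider/aws/region/eu-de/cluster/prod-1/ns/payments/sa/api"
)
print(trust_domain)        # example.org
print(fields["provider"])  # aws
print(fields["region"])    # eu-de
```

Because every workload in every cloud would carry an identity in the same layout, a consumer only needs to understand this one schema.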

Because the identity framework spans the organization, the identity schema is consistent across all platforms, and the configuration of each identity consumer becomes simple. Teams managing access control and policies no longer need to be familiar with the identity schemas of multiple providers and only have to configure a single OIDC identity provider. A demo of this can be seen with Vault and S3.
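To illustrate why a consistent schema simplifies policy, the Python sketch below defines one access rule over identity fields and applies it to workloads running on different providers. The resource name, the field names, and the `POLICIES` and `is_allowed` names are hypothetical; a real deployment would express this in the policy language of the consuming service.

```python
# Hypothetical policy: fields that must match for access to be granted.
# Fields not listed (such as "provider") are deliberately unconstrained.
POLICIES = {
    "s3://payments-archive": {"region": "eu-de", "ns": "payments", "sa": "api"},
}


def is_allowed(resource, identity_fields):
    """Grant access only if every required field matches (default deny)."""
    required = POLICIES.get(resource)
    if required is None:
        return False
    return all(identity_fields.get(k) == v for k, v in required.items())


# Identity fields as they might be extracted from verified workload identities.
on_aws = {"provider": "aws", "region": "eu-de", "ns": "payments", "sa": "api"}
on_ibm = {"provider": "ibm", "region": "eu-de", "ns": "payments", "sa": "api"}
other = {"provider": "gcp", "region": "eu-de", "ns": "billing", "sa": "api"}

print(is_allowed("s3://payments-archive", on_aws))  # True
print(is_allowed("s3://payments-archive", on_ibm))  # True
print(is_allowed("s3://payments-archive", other))   # False
```

The rule is written once and holds regardless of which cloud the workload runs on, instead of being restated per provider schema.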

Where to next? n=??

As the number of providers and consumers of services grows, reducing the complexity of multiple identity implementations across different platforms will provide better control over the security and auditability of identity and access in an organization. We present an approach, “Organizational Workload Identity” (or Universal Workload Identity), that helps to solve the problems of current identity and access management approaches in the multi-cloud world.

If you are interested in learning more, come by our KubeCon + CloudNativeCon talk: “Untangling the Multi-Cloud Identity and Access Problem With SPIFFE Tornjak”

Our Kubecon + CloudNativeCon talk recording.

Brandon Lum
Universal Workload Identity

Security, Container Cloud, IBM Research. #NablaContainers, Encrypted OCI Containers, Portieris, SPIFFE Tornjak