GCP IAM Integration for LitmusChaos with Workload Identity
In this blog, we’ll look at how to integrate GCP IAM with LitmusChaos using Workload Identity, so that GCP experiments can run in a keyless manner when Google Kubernetes Engine (GKE) is the execution plane.
Introduction
To execute LitmusChaos GCP experiments, you need to authenticate with GCP using a service account before accessing the target resources. Usually, the only way to provide the service account credentials to the experiment is a service account key, but on a GKE cluster a keyless method of authentication is available as well. You therefore have two ways of providing the service account credentials to your GKE cluster:
- Using Secrets: As you would normally do, you can create a secret containing the GCP service account key in your GKE cluster, which the experiment uses to authenticate and access your GCP resources.
- IAM Integration: When you’re using a GKE cluster, you can bind a GCP service account to a Kubernetes service account through an IAM policy binding, which the experiment can then use for keyless authentication with GCP Workload Identity. We’ll discuss this method in the following sections.
Why use Workload Identity for GCP authentication?
A Google API request can be made using a GCP IAM service account, which is an identity that an application uses to make calls to Google APIs. You might create individual IAM service accounts for the users who execute GCP experiments to enforce role-based access control, then download the keys and store them as Kubernetes secrets that you rotate manually. Not only is this time-consuming, but service account keys also stay valid for ten years (or until you manually rotate them), so an unaccounted-for key could give an attacker extended access in the event of a breach or compromise. Because of this potential blind spot and the management cost of key inventory and rotation, storing service account keys as secrets is not an optimal way of authenticating GKE workloads.
Workload Identity allows you to restrict the possible “blast radius” of a breach or compromise while enforcing the principle of least privilege across your environment. It accomplishes this by automating workload authentication best practices, eliminating the need for workarounds, and making it simple to implement recommended security best practices.
- Following the principle of least privilege, your workloads have only the permissions they need to fulfill their role. This minimizes the breadth of a potential compromise by not granting broad permissions.
- Unlike service account keys with a 10-year lifetime, credentials supplied through Workload Identity are valid only for a short time, decreasing the blast radius in the case of a compromise.
- The risk of unintentional disclosure of credentials due to human error is greatly reduced because Google manages the namespace service account credentials for you. It also eliminates the need for you to manually rotate these credentials.
Enabling service accounts to access GCP resources
STEP 1: Enable Workload Identity
You can enable Workload Identity on clusters and node pools using the Google Cloud CLI or the Google Cloud Console. Workload Identity must be enabled at the cluster level before you can enable Workload Identity on node pools. Workload Identity can be enabled for an existing cluster as well as a new cluster.
To enable Workload Identity on a new cluster using the console, choose to create a GKE cluster and, aside from the usual configuration, enable Workload Identity under Security.
You can also use the gcloud tool to create the Kubernetes cluster with Workload Identity enabled using the following command:
gcloud container clusters create CLUSTER_NAME \
--region=COMPUTE_REGION \
--workload-pool=PROJECT_ID.svc.id.goog
Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- COMPUTE_REGION: the Compute Engine region of your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
- PROJECT_ID: your Google Cloud project ID.
You can enable Workload Identity on an existing Standard cluster by using the gcloud CLI or the Cloud Console. Existing node pools are unaffected, but any new node pools in the cluster use Workload Identity. In the Cloud Console, modify the same setting as shown above. Alternatively, you can use the following command:
gcloud container clusters update CLUSTER_NAME \
--region=COMPUTE_REGION \
--workload-pool=PROJECT_ID.svc.id.goog
Replace the following:
- CLUSTER_NAME: the name of your existing cluster.
- COMPUTE_REGION: the Compute Engine region of your cluster. For zonal clusters, use --zone=COMPUTE_ZONE.
- PROJECT_ID: your Google Cloud project ID.
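Since existing node pools are unaffected by the cluster-level change, it can help to confirm the cluster setting and then switch an existing node pool over explicitly. The following is a minimal sketch, not a definitive procedure: the variable values and the two function names are hypothetical placeholders, and it assumes a regional cluster and an authenticated gcloud CLI.

```shell
# Hypothetical placeholder values; replace them with your own.
CLUSTER_NAME="my-cluster"
COMPUTE_REGION="us-central1"
PROJECT_ID="my-project"
NODEPOOL_NAME="default-pool"

# Prints the cluster's workload pool.
# An empty result suggests Workload Identity is not enabled on the cluster.
show_workload_pool() {
  gcloud container clusters describe "$CLUSTER_NAME" \
    --region="$COMPUTE_REGION" \
    --format="value(workloadIdentityConfig.workloadPool)"
}

# Switches an existing node pool to the GKE metadata server,
# which is what makes Workload Identity available to pods on that pool.
enable_on_node_pool() {
  gcloud container node-pools update "$NODEPOOL_NAME" \
    --cluster="$CLUSTER_NAME" \
    --region="$COMPUTE_REGION" \
    --workload-metadata=GKE_METADATA
}

# The value show_workload_pool is expected to print once the feature is on.
expected_pool="${PROJECT_ID}.svc.id.goog"
echo "expected workload pool: ${expected_pool}"
```

Call show_workload_pool first, and run enable_on_node_pool only for node pools that should host the experiment pods. Updating a node pool recreates its nodes, so plan for the disruption.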
STEP 2: Configure LitmusChaos to use Workload Identity
Assuming that you already have LitmusChaos installed in your GKE cluster as well as the Kubernetes service account you want to use for your GCP experiments, execute the following steps.
1. Get credentials for your cluster.
gcloud container clusters get-credentials CLUSTER_NAME
Replace CLUSTER_NAME with the name of your cluster that has Workload Identity enabled.
2. Create an IAM service account for your application or use an existing IAM service account instead. You can use any IAM service account in any project in your organization. For Config Connector, apply the IAMServiceAccount object for your selected service account. To create a new IAM service account using the gcloud CLI, run the following command:
gcloud iam service-accounts create GSA_NAME \
--project=GSA_PROJECT
Replace the following:
- GSA_NAME: the name of the new IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project for your IAM service account.
Alternatively, you can also use the Cloud Console UI to create a new GCP IAM Service Account.
3. Ensure that this service account has all the roles required for interacting with the Compute Engine resources, including VM instances and persistent disks, that the GCP experiments you intend to run will target. You can grant additional roles either using the Cloud Console or using the following command:
gcloud projects add-iam-policy-binding PROJECT_ID \
--member "serviceAccount:GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com" \
--role "ROLE_NAME"
Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.
- ROLE_NAME: the IAM role to assign to your service account, like roles/spanner.viewer.
4. Allow the Kubernetes service account that will run the GCP experiments to impersonate the GCP IAM service account by adding an IAM policy binding between the two service accounts. This binding allows the Kubernetes service account to act as the IAM service account.
gcloud iam service-accounts add-iam-policy-binding GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
Replace the following:
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.
- PROJECT_ID: your Google Cloud project ID.
- KSA_NAME: the name of the Kubernetes service account to be used for LitmusChaos GCP experiments. If you’re using ChaosCenter, then by default the litmus-admin service account is used for all the experiments.
- NAMESPACE: the namespace in which the Kubernetes service account to be used for LitmusChaos GCP experiments is present. If you’re using ChaosCenter, then by default litmus is the namespace.
5. Annotate the Kubernetes service account to be used for LitmusChaos GCP experiments with the email address of the GCP IAM service account.
kubectl annotate serviceaccount KSA_NAME \
--namespace NAMESPACE \
iam.gke.io/gcp-service-account=GSA_NAME@GSA_PROJECT.iam.gserviceaccount.com
Replace the following:
- KSA_NAME: the name of the Kubernetes service account to be used for LitmusChaos GCP experiments. If you’re using ChaosCenter, then by default the litmus-admin service account is used for all the experiments.
- NAMESPACE: the namespace in which the Kubernetes service account to be used for LitmusChaos GCP experiments is present. If you’re using ChaosCenter, then by default litmus is the namespace.
- GSA_NAME: the name of your IAM service account.
- GSA_PROJECT: the project ID of the Google Cloud project of your IAM service account.
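Before moving on, it may be worth checking that the binding from step 4 and the annotation from step 5 refer to the same identities. The sketch below is one way to do that under stated assumptions: all variable values are hypothetical placeholders (the namespace and Kubernetes service account follow the ChaosCenter defaults mentioned above), and the two function names are illustrative rather than part of any tool.

```shell
# Hypothetical placeholder values; replace them with your own.
GSA_NAME="chaos-gsa"
GSA_PROJECT="my-project"
PROJECT_ID="my-project"
NAMESPACE="litmus"          # ChaosCenter default namespace
KSA_NAME="litmus-admin"     # ChaosCenter default service account

# The two identity strings used in steps 4 and 5.
gsa_email="${GSA_NAME}@${GSA_PROJECT}.iam.gserviceaccount.com"
wi_member="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Prints the annotation on the Kubernetes service account.
# The output should equal $gsa_email.
show_annotation() {
  kubectl get serviceaccount "$KSA_NAME" --namespace "$NAMESPACE" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'
}

# Prints the IAM policy of the GCP service account.
# Look for $wi_member under roles/iam.workloadIdentityUser.
show_gsa_policy() {
  gcloud iam service-accounts get-iam-policy "$gsa_email" \
    --project="$GSA_PROJECT"
}

echo "annotation should be: ${gsa_email}"
echo "policy member should be: ${wi_member}"
```

If either value differs from what the functions print, the experiment pod will most likely fail to authenticate, so it is cheaper to catch a typo here than in a failed chaos run.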
STEP 3: Update ChaosEngine Manifest
When creating a new workflow with GCP experiments, edit the manifest YAML and add the following value to the ChaosEngine manifest field .spec.experiments[].spec.components.nodeSelector to schedule the experiment pod on nodes that use Workload Identity:
iam.gke.io/gke-metadata-server-enabled: "true"
As an example, if you’re adding the GCP VM Instance Stop experiment, you’d make the following change:
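A minimal sketch of where the selector sits, assuming the experiment is named gcp-vm-instance-stop and the ChaosCenter defaults (litmus namespace, litmus-admin service account) are in use. The engine name and the output file name are hypothetical, and the experiment's environment variables are omitted for brevity. The snippet writes an illustrative ChaosEngine to a local file so you can compare it against your own workflow manifest.

```shell
# Write an illustrative ChaosEngine to a local file for comparison.
# Assumptions: the engine name and file name are placeholders, and the
# experiment's env section is left out to keep the focus on nodeSelector.
cat > chaosengine-gcp-vm-instance-stop.yaml <<'EOF'
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: gcp-vm-engine            # hypothetical name
  namespace: litmus
spec:
  engineState: active
  chaosServiceAccount: litmus-admin
  experiments:
    - name: gcp-vm-instance-stop
      spec:
        components:
          # Schedule the experiment pod only on Workload Identity nodes.
          nodeSelector:
            iam.gke.io/gke-metadata-server-enabled: "true"
EOF

# Show the line carrying the selector to confirm it was written.
grep -n 'gke-metadata-server-enabled' chaosengine-gcp-vm-instance-stop.yaml
```

The value is quoted because Kubernetes label values are strings; an unquoted true would be parsed as a YAML boolean and rejected.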
STEP 4: Update ChaosExperiment Manifest
Remove cloud-secret at .spec.definition.secrets in the ChaosExperiment manifest, as we are not using a secret to provide our GCP service account credentials.
As an example, if you’re adding the GCP VM Instance Stop experiment, you’d remove the following:
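The block to delete typically has the shape shown in the comment below. The mountPath value is an assumption, so match whatever your manifest actually contains. The helper function is a hypothetical convenience, not part of LitmusChaos: it reports whether a manifest file still references cloud-secret after your edit.

```shell
# Shape of the block to remove from .spec.definition in the ChaosExperiment
# (mountPath is an assumption; your manifest may differ):
#
#   secrets:
#     - name: cloud-secret
#       mountPath: /tmp/

# Hypothetical helper: succeeds only if the given manifest file
# no longer mentions cloud-secret.
check_no_cloud_secret() {
  if grep -q 'cloud-secret' "$1"; then
    echo "cloud-secret is still referenced in $1" >&2
    return 1
  fi
  echo "ok: no cloud-secret reference in $1"
}

# Usage (workflow.yaml is a placeholder for your edited manifest):
#   check_no_cloud_secret workflow.yaml
```

A leftover reference matters because the experiment pod would try to mount a secret that may not exist in the namespace, and the pod could then fail to start.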
Now you can run your GCP experiments with a keyless authentication provided by GCP using Workload Identity.
Summary
To conclude, we set up keyless authentication for the GCP experiments executing in a GKE cluster that belongs to the target GCP environment. We saw how to leverage Workload Identity on the GKE node pools to assign an IAM identity to our Kubernetes workloads, which was then used to authenticate the LitmusChaos GCP experiments.