Configuring Kubernetes-based TF providers from AWS to GCP with Workload Identity Federation (part 2/2)

Eunice Bello
Zencore Engineering
5 min read · Nov 2, 2023

If you followed the first part of this article (Connecting AWS and GCP workloads with Workload Identity Federation), you know that we configured a self-hosted CI orchestration agent in AWS to deploy GCP resources. In most cases that configuration is enough to run your TF deployments. Sometimes, though, you also need Kubernetes-based resources, either because your application needs them, because a module you use requires them, or for any other reason you can think of.

That’s why we will create a GKE cluster and a Helm chart release, to dive a little further into our use case and explain how to configure the providers in this scenario. We will also describe the most common issues you may encounter and how to fix them.

Use case example diagram with kubernetes-based resources

Example code

We’ll create the GKE cluster using the terraform-google-modules kubernetes-engine module and deploy a simple Redis release. Check out the example code:

module "beta_autopilot_private_cluster" {
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-autopilot-private-cluster"

name = "my-cluster"
project_id = "my-project"
region = "us-central1"
network = "default"
subnetwork = "default"
ip_range_pods = "my-pods-range"
ip_range_services = "my-service-range"

enable_private_endpoint = false
enable_private_nodes = true
master_ipv4_cidr_block = "172.30.0.0/28"
master_authorized_networks = []

}

resource "helm_release" "redis" {
name = "my-redis"
repository = "oci://registry-1.docker.io/bitnamicharts"
chart = "redis"
version = "18.2.0"
}
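If you are starting from an empty configuration, the example also assumes the Google and Helm providers are declared somewhere in your root module. A minimal sketch (the version constraints are illustrative; check the module documentation for its exact provider requirements):

terraform {
  required_providers {
    # Used by the kubernetes-engine module; the beta-* submodules also use google-beta
    google = {
      source  = "hashicorp/google"
      version = ">= 4.80"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.80"
    }
    # Used by the helm_release resource
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.11"
    }
  }
}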

Now let’s see how authentication can be done.

Authentication configs

Kubernetes-based providers can easily authenticate to GCP with the following configuration:

data "google_client_config" "provider" {}

data "google_container_cluster" "my_cluster" {
name = "my-cluster"
location = "us-central1"

depends_on = [module.beta_autopilot_private_cluster]
}

provider "helm" {
kubernetes {
host = "https://${data.google_container_cluster.my_cluster.endpoint}"
token = data.google_client_config.provider.access_token

cluster_ca_certificate = base64decode(data.google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)
}
}

Unfortunately, in our scenario this configuration can throw the following error:

| Error: could not get apiVersions from Kubernetes: could not get apiVersions from Kubernetes: unknown

This tells us we are facing an authentication issue: the Kubernetes provider is not able to fetch a token to access the GKE cluster.

This is probably because the CI agent is running outside of Google Cloud, or in an environment without gcloud. Even with Workload Identity Federation configured, the provider can’t find the context and credentials it needs to connect to the cluster.

One way to fix this is to explicitly add the Application Default Credentials (ADC) configuration to the pipeline workflow as a pre-deployment step, and then use the gke-gcloud-auth-plugin to tell the provider to use ADC.

This will look something like:

...

- |
  gcloud iam workload-identity-pools create-cred-config \
    projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<POOL_ID>/providers/<PROVIDER_ID> \
    --service-account=<SERVICE_ACCOUNT_EMAIL> \
    --aws \
    --output-file=creds.json
- gcloud auth login --cred-file=creds.json --quiet
- echo GOOGLE_APPLICATION_CREDENTIALS=/$(pwd)/creds.json >> $_ENV

...

It is important to set the GOOGLE_APPLICATION_CREDENTIALS environment variable, because this is what the provider will use to authenticate. The above is just an example; the exact commands might vary depending on the CI agent you are using.

Also make sure the gke-gcloud-auth-plugin is installed on your agent runner.
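How you install it depends on how the Google Cloud SDK was set up on the runner image; for example (package names vary by distribution and SDK version):

# If gcloud was installed with the Google Cloud SDK installer/archive:
gcloud components install gke-gcloud-auth-plugin

# If gcloud came from Google's apt repository (Debian/Ubuntu images):
sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin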

Now, update the providers to use the credentials:

data "google_client_config" "provider" {}

data "google_container_cluster" "my_cluster" {
name = "my-cluster"
location = "us-central1"

depends_on = [module.beta_autopilot_private_cluster]
}

provider "helm" {

kubernetes {
host = "https://${data.google_container_cluster.my_cluster.endpoint}"
cluster_ca_certificate = base64decode(data.google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)

exec {
api_version = "client.authentication.k8s.io/v1beta1"
args = ["--use_application_default_credentials"]
command = "gke-gcloud-auth-plugin"
}
}
}
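If you also use the kubernetes provider directly (for namespaces, secrets, and so on, alongside the Helm release), it accepts the same exec-based configuration. A sketch reusing the data sources from above:

provider "kubernetes" {
  host                   = "https://${data.google_container_cluster.my_cluster.endpoint}"
  cluster_ca_certificate = base64decode(data.google_container_cluster.my_cluster.master_auth[0].cluster_ca_certificate)

  # Same plugin-based authentication as the helm provider, backed by ADC
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "gke-gcloud-auth-plugin"
    args        = ["--use_application_default_credentials"]
  }
}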

All done! You should be able to deploy helm releases or any other kubernetes-based apps to your cluster now. 🚀

CI agent runner successful output example
Redis release from the example code running on GKE

Authentication configs without CI orchestration agents

If you want to try out the configuration without actually installing and configuring an orchestration agent, you can simply use an EC2 instance and run terraform commands from it.
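In that setup you run the same credential bootstrap by hand on the instance before invoking Terraform, roughly (reusing the placeholders from the pipeline example):

# Create the credential configuration for the AWS workload and activate it
gcloud iam workload-identity-pools create-cred-config \
  projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<POOL_ID>/providers/<PROVIDER_ID> \
  --service-account=<SERVICE_ACCOUNT_EMAIL> \
  --aws \
  --output-file=creds.json
gcloud auth login --cred-file=creds.json --quiet

# Point ADC at the generated file, then run Terraform as usual
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/creds.json"
terraform init
terraform apply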

In that case, after the previous setup you might start getting errors like:

│ Error: 10 errors occurred:
...
│ * clusterroles.rbac.authorization.k8s.io is forbidden: User "<SA_EMAIL>" cannot create resource "<RESOURCE_NAME>" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoles.create"] permission(s).
...
│ * clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "<SA_EMAIL>" cannot create resource "<RESOURCE_NAME>" in API group "rbac.authorization.k8s.io" at the cluster scope: requires one of ["container.clusterRoleBindings.create"] permission(s).
...
│ * roles.rbac.authorization.k8s.io is forbidden: User "<SA_EMAIL>" cannot create resource "<RESOURCE_NAME>" in API group "rbac.authorization.k8s.io" in the namespace "<NAMESPACE>": requires one of ["container.roles.create"] permission(s).
...

This is because the provider is now authenticated, but the identity doesn’t have any permissions inside the cluster yet, so you’ll need to grant it RBAC permissions.

You can do that by connecting to the GKE cluster and running the following:

kubectl create serviceaccount cluster-admin-sa
kubectl create clusterrolebinding cluster-admin-sa --clusterrole=cluster-admin --user="<AWS_WORKLOAD_SERVICE_ACCOUNT_EMAIL>"

Replace AWS_WORKLOAD_SERVICE_ACCOUNT_EMAIL with the email of the GCP service account that runs the workload on behalf of AWS.

This gives the service account cluster-admin access, so it can do anything inside the cluster.
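Assuming kubectl on that machine is authenticating as the federated service account, you can verify that the binding took effect with:

# Should both return "yes" once the cluster-admin binding is in place
kubectl auth can-i create clusterroles
kubectl auth can-i '*' '*'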

Conclusion

Google Workload Identity Federation offers a secure, efficient, and interoperable solution for managing identities and access control in a multi-cloud environment. By simplifying access management and enhancing security, it empowers organizations to make the most of their cloud resources while maintaining a high level of control and compliance.

If you want to learn more about Workload Identity Federation, check out the first part of this article: Connecting AWS and GCP workloads with Workload Identity Federation (part 1/2).

Curious about other GCP tools? Visit our site to learn more about different GCP solutions and how they can help take your business and applications to the next level.
