Full CI CD on Github Actions enabling Keyless Authentication and Workload Identity

Mazlum Tosun
Google Cloud - Community
9 min read · Oct 25, 2023

1. Explanation of the use case presented in this article

This article shows a full example of a CI CD pipeline with Github Actions, using keyless authentication through Workload Identity Federation.

Workload Identity Federation relies on OIDC behind the scenes and removes the need for a long-lived Service Account key.

Long-lived keys have to be rotated and managed, and the security best practice is to avoid them altogether.

Here you can see a schema that illustrates the interactions between GitHub Actions and Google Cloud, with Workload Identity Federation.

For an introduction to this topic and more details, you can check this article:

This article aims to be closer to a real-world setup, with a full CI CD pipeline, infrastructure automated with Terraform and no manual actions.

Github Actions is the popular, built-in CI CD tool offered by Github.

Previously we needed a Service Account key for CI CD pipelines developed with Github Actions on Google Cloud, but today we can use a better approach that needs no key, based on Workload Identity Federation.

To illustrate this example, we will use a real world use case with the deployment and execution of a Dataflow Flex Template.

We wrote a separate article with the deployment of Flex Template with Cloud Build.

Here you can see the diagram of this use case :

I also created a video on this topic in my GCP Youtube channel, please subscribe to the channel to support my work for the Google Cloud community :

English version

French version

Some explanations :

  • The project is hosted on the following Github repository
  • The entrypoint of the CI CD part with Github Actions is the set of resources for Workload Identity Federation. These resources are created with Terraform and Cloud Build
  • The Dataflow Flex Template consists of a Docker image and a spec file stored in Cloud Storage
  • There are two manual jobs in Github Actions. The first deploys the Flex Template: it publishes the Docker image to Artifact Registry and uploads the spec file to GCS. The second runs the Flex Template, which starts the Dataflow job

2. Structure of the project

2.1 Environment variables

Set the following environment variables :

# Terraform part for Workload Identity Federation concerning the CI CD with Github Actions
export PROJECT_ID={{project_id}}
export LOCATION=europe-west1
export TF_STATE_BUCKET={{terraform_state_bucket}}
export TF_STATE_PREFIX=testmazlum
export GOOGLE_PROVIDER_VERSION="= 4.47.0"

export REPO_NAME=internal-images
export IMAGE_NAME="dataflow/team-league-java"
export IMAGE_TAG=latest
export METADATA_FILE="config/metadata.json"
export METADATA_TEMPLATE_FILE_PATH="gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java"
export SDK_LANGUAGE=JAVA
export FLEX_TEMPLATE_BASE_IMAGE=JAVA11
export JAR=target/teams-league-0.1.0.jar
export FLEX_TEMPLATE_JAVA_MAIN_CLASS="fr.groupbees.application.TeamLeagueApp"
export JOB_NAME="team-league-java"

export TEMP_LOCATION=gs://mazlum_dev/dataflow/temp
export STAGING_LOCATION="gs://mazlum_dev/dataflow/staging"
export SA_EMAIL=sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com
export INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json"
export SIDE_INPUT_FILE="gs://mazlum_dev/team_league/input/json/input_team_slogans.json"
export TEAM_LEAGUE_DATASET=mazlum_test
export TEAM_STATS_TABLE=team_stat
export JOB_TYPE=team_league_java_ingestion_job
export FAILURE_OUTPUT_DATASET=mazlum_test
export FAILURE_OUTPUT_TABLE=job_failure
export FAILURE_FEATURE_NAME=team_league

2.2 Python local environment

The Python local environment uses PipEnv as a package manager and to automate the creation of virtual env.

You can check this video from my GCP Youtube channel that shows :

  • How to set up a comfortable local Python environment with PyEnv, PipEnv, DirEnv and IntelliJ IDEA, and navigate through all the files, classes and methods
  • How to automate the creation of the virtual env for our Python project

2.3 The logic of Dataflow Flex Template part

For the Dataflow Flex Template part, I wrote a dedicated article. Feel free to read it for an in-depth explanation.

2.4 The CI CD logic with Cloud Build to create the entrypoint for Github Actions

2.4.1 The Terraform part

The entrypoint of the CI CD with Github Actions is the set of resources for Workload Identity Federation.

We want to create these resources with a lightweight and serverless approach. Cloud Build is a good candidate for that.

We also use Terraform and IaC to create them.

The Terraform code is put in the infra folder. We prepared some local variables in locals.tf :

locals {
  github_account_name = "tosun-si"
  github_repo_name    = "dataflow-java-ci-cd"
}

The resources from the main.tf file :

resource "google_iam_workload_identity_pool" "github_actions_ci_cd_pool" {
  project                   = var.project_id
  workload_identity_pool_id = "gb-github-actions-ci-cd-pool"
  display_name              = "Pool CI CD Github actions"
  description               = "Pool for CI CD Github actions"
}

resource "google_iam_workload_identity_pool_provider" "github_actions_ci_cd_provider" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github_actions_ci_cd_pool.workload_identity_pool_id
  workload_identity_pool_provider_id = "gb-github-actions-ci-cd-provider"
  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.repository" = "assertion.repository"
  }
  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

resource "google_service_account_iam_member" "dataflow_ci_cd_workload_identity_user" {
  service_account_id = "projects/${var.project_id}/serviceAccounts/sa-dataflow-dev@${var.project_id}.iam.gserviceaccount.com"
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_actions_ci_cd_pool.name}/attribute.repository/${local.github_account_name}/${local.github_repo_name}"
}

We need to create :

  • A Workload Identity pool
  • A Workload Identity Provider with the pool created previously
  • The provider needs an attribute mapping for the subject and the repository, because we want to grant access to Google Cloud to a single GitHub repository only
  • The issuer, in this case, is the OIDC token endpoint of Github Actions
  • The last resource grants the role workloadIdentityUser on the Service Account to the identities of the pool whose repository attribute matches our GitHub repository. This role allows these identities to act as the Service Account.
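
To make the repository restriction more concrete, here is a minimal sketch that assembles the same member string as the Terraform interpolation above. The helper name build_principal_set is hypothetical, and the sketch assumes the pool name follows the projects/<project number>/locations/global/workloadIdentityPools/<pool id> pattern that also appears in the workflow files further down.

```shell
# Hypothetical helper: builds the principalSet member that limits
# impersonation of the Service Account to a single GitHub repository.
# Arguments: project number, pool id, "owner/repo".
build_principal_set() {
  local project_number="$1" pool_id="$2" repo="$3"
  printf 'principalSet://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s/attribute.repository/%s\n' \
    "$project_number" "$pool_id" "$repo"
}

# Values taken from this article (project number from the workflow files).
build_principal_set "975119474255" "gb-github-actions-ci-cd-pool" "tosun-si/dataflow-java-ci-cd"
```

Any workflow running from another repository presents a different repository attribute, so it falls outside this principalSet and cannot act as the Service Account.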

2.4.2 The Cloud Build part

There are two Cloud Build jobs, each with its own YAML file:

  • Plan
  • Apply

We use Terragrunt because most of our examples are based on this tool, but it’s not mandatory in this case. We could have used Terraform directly.

The create-workload-identity-ci-cd-github-actions-plan.yaml file :

steps:
  - name: alpine/terragrunt:1.3.6
    script: |
      terragrunt run-all init
      terragrunt run-all plan --out tfplan.out
    dir: 'infra'
    env:
      - 'TF_VAR_project_id=$PROJECT_ID'
      - 'TF_VAR_env=$_ENV'
      - 'TF_STATE_BUCKET=$_TF_STATE_BUCKET'
      - 'TF_STATE_PREFIX=$_TF_STATE_PREFIX'
      - 'GOOGLE_PROVIDER_VERSION=$_GOOGLE_PROVIDER_VERSION'

The create-workload-identity-ci-cd-github-actions-apply.yaml file :

steps:
  - name: alpine/terragrunt:1.3.6
    script: |
      terragrunt run-all init
      terragrunt run-all plan --out tfplan.out
      terragrunt run-all apply --terragrunt-non-interactive tfplan.out
    dir: 'infra'
    env:
      - 'TF_VAR_project_id=$PROJECT_ID'
      - 'TF_VAR_env=$_ENV'
      - 'TF_STATE_BUCKET=$_TF_STATE_BUCKET'
      - 'TF_STATE_PREFIX=$_TF_STATE_PREFIX'
      - 'GOOGLE_PROVIDER_VERSION=$_GOOGLE_PROVIDER_VERSION'

These jobs create the Workload Identity Federation resources from the Terraform code located in the infra folder.
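
As an illustration, the plan job could be submitted from the root of the repository with gcloud builds submit. The sketch below is an assumption about how the substitutions used in the YAML files are passed: the helper build_substitutions is hypothetical, and the values dev and my-tf-state-bucket are placeholders, not values from this project.

```shell
# Hypothetical helper: joins the user-defined substitutions that the
# Cloud Build YAML files read (_ENV, _TF_STATE_BUCKET, ...).
build_substitutions() {
  printf '_ENV=%s,_TF_STATE_BUCKET=%s,_TF_STATE_PREFIX=%s,_GOOGLE_PROVIDER_VERSION=%s' \
    "$1" "$2" "$3" "$4"
}

# Placeholder values; in practice they come from the variables of section 2.1.
SUBSTITUTIONS="$(build_substitutions "dev" "my-tf-state-bucket" "testmazlum" "= 4.47.0")"

# The command is printed instead of executed, so the sketch has no side effects.
# Running the printed command from the repository root would submit the plan job.
printf 'gcloud builds submit --project="$PROJECT_ID" --region="$LOCATION" --config create-workload-identity-ci-cd-github-actions-plan.yaml --substitutions "%s" .\n' "$SUBSTITUTIONS"
```

The apply job would be submitted the same way, with the apply YAML file passed to --config.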

2.5 The CI CD logic with Github Actions

For the Github Actions part, we decided to create manual pipelines to deploy and run the Flex Template.

Manual pipelines are represented by Workflow Dispatch in Github Actions.

We have two separate pipelines, one for the deployment and the other to run the template.

2.5.1 The Flex Template deployment part

The logic of the Flex Template deployment part is defined in the dataflow-deploy-template-github.yaml file:

name: Deploy Dataflow Flex Template

env:
  PROJECT_ID: {{project_id}}
  LOCATION: europe-west1
  CI_SERVICE_NAME: github-actions

  REPO_NAME: internal-images
  IMAGE_NAME: 'dataflow/team-league-java'
  IMAGE_TAG: latest
  METADATA_FILE: 'config/metadata.json'
  METADATA_TEMPLATE_FILE_PATH: 'gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java'
  SDK_LANGUAGE: 'JAVA'

  WORKLOAD_IDENTITY_PROVIDER: 'projects/975119474255/locations/global/workloadIdentityPools/gb-github-actions-ci-cd-pool/providers/gb-github-actions-ci-cd-provider'
  SA_CI_CD_EMAIL: 'sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com'

on:
  workflow_dispatch:

jobs:
  deploy-template:

    runs-on: ubuntu-latest

    permissions:
      contents: 'read'
      id-token: 'write'

    steps:
      - name: 'Checkout'
        uses: 'actions/checkout@v3'

      - name: 'Google auth'
        id: 'auth'
        uses: 'google-github-actions/auth@v1'
        with:
          workload_identity_provider: '${{ env.WORKLOAD_IDENTITY_PROVIDER }}'
          service_account: '${{ env.SA_CI_CD_EMAIL }}'

      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v1'
        with:
          version: '>= 444.0.0'
          project_id: '${{ env.PROJECT_ID }}'

      - name: 'Docker auth'
        run: |-
          gcloud auth configure-docker ${{ env.LOCATION }}-docker.pkg.dev

      - name: 'Build And Publish Flex Template Docker image'
        run: |-
          REPO_PATH="${{ env.LOCATION }}-docker.pkg.dev/${{ env.PROJECT_ID }}/${{ env.REPO_NAME }}/${{ env.IMAGE_NAME }}/${{ env.CI_SERVICE_NAME }}:${{ env.IMAGE_TAG }}"
          docker build -t "${REPO_PATH}" .
          docker push "${REPO_PATH}"

      - name: 'Create Flex Template Spec file'
        run: |
          scripts/create_flex_template_spec_file_gcs.sh

The env block initializes some environment variables.

The on block indicates that the pipeline is manual, with the workflow_dispatch option.

The job runs in the ubuntu-latest environment.

We need to set the permissions because Workload Identity Federation relies on OIDC: id-token: 'write' lets the job request an OIDC token from GitHub, which is then exchanged for a short-lived access token on Google Cloud.
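
To see what the attribute mapping of the provider works on, it can help to look at the claims carried by such an OIDC token. The sketch below decodes the payload segment of a JWT. The function name decode_jwt_payload and the sample claims are made up for illustration; only the sub and repository fields mirror the mapping defined in the Terraform part.

```shell
# Decode the claims (middle) segment of a JWT so the fields used by the
# attribute mapping (sub, repository) can be inspected.
decode_jwt_payload() {
  local payload
  payload="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # JWTs use unpadded base64url; restore the padding that base64 expects.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Made-up claims, shaped like the ones GitHub issues for a workflow run.
claims='{"sub":"repo:tosun-si/dataflow-java-ci-cd:ref:refs/heads/main","repository":"tosun-si/dataflow-java-ci-cd"}'
segment="$(printf '%s' "$claims" | base64 | tr -d '=\n' | tr '/+' '_-')"

# Prints the claims JSON back from a fake "header.payload.signature" token.
decode_jwt_payload "header.${segment}.signature"

# Inside a job that has id-token: 'write', a real token could be fetched with:
#   curl -sS -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
#     "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=<audience>" | jq -r '.value'
```

In the pipeline itself none of this is written by hand: the google-github-actions/auth action requests the token and performs the exchange.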

Then we write the following steps :

  • The checkout, to use the current repo
  • The provided google-github-actions/auth action authenticates the workflow to Google Cloud via the Workload Identity Provider and the associated Service Account
  • The provided google-github-actions/setup-gcloud action sets up the gcloud SDK
  • The next step configures Docker authentication through a gcloud command, because the step after it builds the Flex Template Docker image and publishes it to Artifact Registry
  • The last step launches a bash script that creates the Flex Template spec file and uploads it to Cloud Storage
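
The create_flex_template_spec_file_gcs.sh script is not reproduced in this article. As a rough sketch only, it could wrap gcloud dataflow flex-template build as shown below. This assumes the spec file is named with the -$CI_SERVICE_NAME.json suffix that the run script in section 2.5.2 reads, and that the image path matches the one pushed in the previous step. The helpers image_path and spec_path are hypothetical, and my-project is a placeholder.

```shell
# Defaults mirror the env block of the workflow; my-project is a placeholder.
PROJECT_ID="${PROJECT_ID:-my-project}"
LOCATION="${LOCATION:-europe-west1}"
CI_SERVICE_NAME="${CI_SERVICE_NAME:-github-actions}"
REPO_NAME="${REPO_NAME:-internal-images}"
IMAGE_NAME="${IMAGE_NAME:-dataflow/team-league-java}"
IMAGE_TAG="${IMAGE_TAG:-latest}"
METADATA_FILE="${METADATA_FILE:-config/metadata.json}"
METADATA_TEMPLATE_FILE_PATH="${METADATA_TEMPLATE_FILE_PATH:-gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java}"
SDK_LANGUAGE="${SDK_LANGUAGE:-JAVA}"

# Same image path as the one built and pushed by the workflow.
image_path() {
  printf '%s-docker.pkg.dev/%s/%s/%s/%s:%s' \
    "$LOCATION" "$PROJECT_ID" "$REPO_NAME" "$IMAGE_NAME" "$CI_SERVICE_NAME" "$IMAGE_TAG"
}

# Spec file location, with the suffix expected by the run script.
spec_path() {
  printf '%s-%s.json' "$METADATA_TEMPLATE_FILE_PATH" "$CI_SERVICE_NAME"
}

# Printed instead of executed so the sketch has no side effects.
echo gcloud dataflow flex-template build "$(spec_path)" \
  --image "$(image_path)" \
  --sdk-language "$SDK_LANGUAGE" \
  --metadata-file "$METADATA_FILE"
```

Removing the echo would run the build for real, which requires the authentication set up by the previous steps.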

The graph view in Github Actions

Click on the Actions tab in the Github repository and, on the left side, select the Deploy Dataflow Flex Template workflow, which is a manual job with Workflow Dispatch. Then run the workflow from a branch, main in this example:

We can access the graph of the current workflow:

And access the job log:

2.5.2 The Flex Template run part

The logic of the Flex Template run part is defined in the dataflow-run-template-github.yaml file:

name: Run Dataflow Flex Template

env:
  PROJECT_ID: gb-poc-373711
  LOCATION: europe-west1
  CI_SERVICE_NAME: github-actions

  METADATA_TEMPLATE_FILE_PATH: "gs://mazlum_dev/dataflow/templates/team_league/java/team-league-java"

  JOB_NAME: "team-league-java"
  TEMP_LOCATION: "gs://mazlum_dev/dataflow/temp"
  STAGING_LOCATION: "gs://mazlum_dev/dataflow/staging"
  SA_EMAIL: "sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com"
  INPUT_FILE: "gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json"
  SIDE_INPUT_FILE: "gs://mazlum_dev/team_league/input/json/input_team_slogans.json"
  TEAM_LEAGUE_DATASET: "mazlum_test"
  TEAM_STATS_TABLE: "team_stat"
  JOB_TYPE: "team_league_java_ingestion_job"
  FAILURE_OUTPUT_DATASET: "mazlum_test"
  FAILURE_OUTPUT_TABLE: "job_failure"
  FAILURE_FEATURE_NAME: "team_league"

  WORKLOAD_IDENTITY_PROVIDER: 'projects/975119474255/locations/global/workloadIdentityPools/gb-github-actions-ci-cd-pool/providers/gb-github-actions-ci-cd-provider'
  SA_CI_CD_EMAIL: 'sa-dataflow-dev@gb-poc-373711.iam.gserviceaccount.com'

on:
  workflow_dispatch:

jobs:
  run-template:

    runs-on: ubuntu-latest

    permissions:
      contents: 'read'
      id-token: 'write'

    steps:
      - name: 'Checkout'
        uses: 'actions/checkout@v3'

      - name: 'Google auth'
        id: 'auth'
        uses: 'google-github-actions/auth@v1'
        with:
          workload_identity_provider: '${{ env.WORKLOAD_IDENTITY_PROVIDER }}'
          service_account: '${{ env.SA_CI_CD_EMAIL }}'

      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v1'
        with:
          version: '>= 444.0.0'
          project_id: '${{ env.PROJECT_ID }}'

      - name: 'Run Flex Template And Dataflow Job'
        run: |
          scripts/run_dataflow_job.sh

The principle is similar to the deployment part. We use the same Actions for the authentication and the setup of the gcloud sdk.

The run_dataflow_job.sh script runs the Flex Template:

echo "#######Run the Dataflow Flex Template pipeline"

gcloud dataflow flex-template run "$JOB_NAME-$CI_SERVICE_NAME-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location "$METADATA_TEMPLATE_FILE_PATH-$CI_SERVICE_NAME.json" \
  --project="$PROJECT_ID" \
  --region="$LOCATION" \
  --temp-location="$TEMP_LOCATION" \
  --staging-location="$STAGING_LOCATION" \
  --parameters serviceAccount="$SA_EMAIL" \
  --parameters inputJsonFile="$INPUT_FILE" \
  --parameters inputFileSlogans="$SIDE_INPUT_FILE" \
  --parameters teamLeagueDataset="$TEAM_LEAGUE_DATASET" \
  --parameters teamStatsTable="$TEAM_STATS_TABLE" \
  --parameters jobType="$JOB_TYPE" \
  --parameters failureOutputDataset="$FAILURE_OUTPUT_DATASET" \
  --parameters failureOutputTable="$FAILURE_OUTPUT_TABLE" \
  --parameters failureFeatureName="$FAILURE_FEATURE_NAME"

The graph view for the run works the same way as for the deployment part: we have a manual pipeline with Workflow Dispatch and we run the workflow:

We can access the graph of the current workflow:

And access the job log:

Conclusion

This article showed a complete and concrete example of CI CD pipelines with Github Actions enabling keyless authentication.

Workload Identity Federation is a really interesting feature, because it relies on OIDC and short-lived tokens for the authentication.

Github Actions supports Workload Identity Federation, which is more secure because there is no long-lived Service Account key to download. There is no need to manage such keys or to pass them as secrets in Github Actions.

Moreover, an existing open source Action, google-github-actions/auth, handles the authentication with Workload Identity Federation.

This is the best practice to get more robust and secure CI CD pipelines with Github Actions and to avoid managing long-lived keys.

All the code shared on this article is accessible from my Github repository :

I also share interesting sources on this topic :

If you like my articles, videos and want to see my posts, follow me on :

Mazlum Tosun
GDE Cloud | Head of Data & Cloud GroupBees | Data | Serverless | IAC | Devops | FP