How to use a GCP Cloud Build trigger to run a Dataflow job through GCP Composer

Brian Yu Zhang
3 min readAug 12, 2020

I work as a data engineer at a fintech company. My manager assigned me the task of deploying a data pipeline on Google Cloud Platform.

This pipeline looks like the picture below.

general idea of data pipeline

When there is a pull request in the GitHub repository, GitHub triggers Cloud Build through a webhook set up beforehand. Cloud Build simply runs the Google Cloud SDK command `gsutil rsync` to sync the GitHub repo to the DAGs folder in Composer's Cloud Storage bucket.

In Cloud Build, I also added another command that triggers the Composer Airflow job after the sync finishes updating the Cloud Storage repo.

This Airflow job actually runs a simple Dataflow job, and the data is eventually saved to Google BigQuery.

There are lots of articles about syncing a GitHub repo to the DAGs Cloud Storage bucket, so this article will focus on how to use Cloud Build to trigger a Composer Airflow job.

Sync the GitHub repo to Cloud Storage through Cloud Build

You can easily use the web interface to set up a trigger for GitHub. However, most settings should live in the YAML and JSON files in your repo. Here is the setting in my repo's cloudbuild.yaml file:

steps:
- name: gcr.io/cloud-builders/gsutil
  id: Sync github repo to DAGs folder
  args: ["-m", "rsync", "-r", "-d", "./dags", "gs://to/my/DAGfolder/in/google/storage"]

The equivalent command in the Google Cloud SDK is:

gsutil -m rsync -r -d ./dags gs://to/my/DAGfolder/in/google/storage

Make sure to set up the _GSC_BUCKET environment variable so that the build runs against the source folder in the current directory.
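For reference, here is a minimal sketch of how that variable could be wired into the build file, assuming `_GSC_BUCKET` is a user-defined Cloud Build substitution (the `substitutions:` block and bucket value shown are illustrative placeholders, not the author's actual config):

```yaml
# Sketch of cloudbuild.yaml with the bucket path factored into a
# user-defined substitution; the value shown is a placeholder.
substitutions:
  _GSC_BUCKET: "gs://to/my/DAGfolder/in/google/storage"
steps:
- name: gcr.io/cloud-builders/gsutil
  id: Sync github repo to DAGs folder
  args: ["-m", "rsync", "-r", "-d", "./dags", "${_GSC_BUCKET}"]
```

Substitution variables can also be set per trigger in the Cloud Build web UI instead of in the file itself.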

Cloud Build gave me an error when I put `rsync` first in `args`; it works when `-m` comes first, because `-m` is a top-level `gsutil` option that must precede the command.

Use Cloud Build to trigger GCP Composer

In the official Cloud Build documentation, there is not much material about triggering Composer.

It only covers GKE, Cloud Run, App Engine, Cloud Functions, and Firebase.

I figured it could still be achieved, because I can run the Google Cloud SDK to trigger Composer. In the YAML file, I added a step like this:

- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'gcloud'
  args: ["composer", "environments", "run", "composer-cluster-name", "--location=your-location", "--project=project-ID", "trigger_dag", "--", "AIRFLOW_ID"]

This YAML is equivalent to the Google Cloud SDK command:

gcloud composer environments run composer-cluster-name --location=your-location --project=project-ID trigger_dag -- AIRFLOW_ID
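Putting the two steps together, the full cloudbuild.yaml would look roughly like this (a sketch reusing the placeholder bucket path, cluster name, and DAG ID from above):

```yaml
steps:
# Step 1: sync the repo's dags/ folder to the Composer DAGs bucket
- name: gcr.io/cloud-builders/gsutil
  id: Sync github repo to DAGs folder
  args: ["-m", "rsync", "-r", "-d", "./dags", "gs://to/my/DAGfolder/in/google/storage"]
# Step 2: trigger the Airflow DAG once the sync has finished
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'gcloud'
  args: ["composer", "environments", "run", "composer-cluster-name",
         "--location=your-location", "--project=project-ID",
         "trigger_dag", "--", "AIRFLOW_ID"]
```

Cloud Build runs steps sequentially by default, so the DAG is only triggered after the sync step succeeds.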

When I triggered this build job, I got an error like this:

It seems I had not properly set up privileges for the Cloud Build service account.

Because Cloud Build uses a service account to access other services, go to the IAM & Admin > IAM page and add the Composer User role to cloudbuild-service-account@cloudbuild.gserviceaccount.com.
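The same role grant can also be done from the command line; a sketch, assuming the project ID and service-account address are placeholders you substitute with your own values:

```
# Grant the Composer User role to the Cloud Build service account.
# The project ID and account address below are placeholders.
gcloud projects add-iam-policy-binding project-ID \
  --member="serviceAccount:cloudbuild-service-account@cloudbuild.gserviceaccount.com" \
  --role="roles/composer.user"
```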

After that, the problem was solved.

I did not cover the Dataflow and BigQuery parts here. If you have questions, feel free to leave a comment.


Brian Yu Zhang

I am a data engineer at PRA Group. PRA Group (Nasdaq: PRAA) is a global leader in acquiring and collecting nonperforming loans through its local subsidiaries.