Implementing CI/CD in Cloud Composer Using Cloud Build and GitHub — Part 2

Amarachi Ogu
Feb 21

In Part 1 of this blog, we explored various approaches for preventing errors in Cloud Composer workflows. In this part, we’ll dive deeper into the technical details and implement CI/CD for our stock data workflow.

Architecture Diagram

The following diagram depicts the implementation of CI/CD in a Cloud Composer workflow. Please note that this is for illustrative purposes only and is not a definitive CI/CD process for Cloud Composer; this design was chosen because it fits our use case.


In this architecture, the code is developed in the local environment and pushed to a GitHub branch, then a pull request is created. The pull request triggers Cloud Build to run a validation test on the DAG (the pre-submit job). If the test passes and the code is merged into the main branch, another Cloud Build job is triggered to sync the DAG to the development Cloud Composer environment. When everything is working without errors, you can then manually promote the DAG to production.

File Layout

The file layout can be seen in this GitHub repo.

dags: This directory contains the DAG and the DAG test files. The stock_data_dag.py file is the stock data DAG we developed in a previous blog, which gets deployed to Cloud Composer.
stock_data_dag_test.py contains DAG validation tests to ensure the reliability and correctness of the workflows.

To test and validate your DAGs, you can take advantage of various tools and frameworks, such as the DagBag class in Airflow, the pytest module, or the Cloud Composer DAG Testing Utility. These tools provide a simple and effective way to write and run tests for your DAGs, covering a wide range of use cases and scenarios.
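Before any of these frameworks run, a quick local smoke test is simply to import the DAG file, which surfaces syntax and import errors early. A minimal sketch, assuming you are in the repository root and the packages in requirements.txt are installed:

# A DAG file that imports cleanly will at least load into Airflow's DagBag
python3 dags/stock_data_dag.py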

requirements-composer.txt: contains the packages required to update the Cloud Composer environment.

requirements-test.txt: contains the packages required by the DAG test file.

requirements.txt: contains the packages required by the DAG file.

test-dags.cloudbuild.yaml: a YAML configuration file for the Cloud Build DAG validation checks. It runs tests on the DAGs after a pull request is made to the GitHub repo, automating the testing process and helping ensure their quality and stability. It defines three steps to be executed by the Cloud Build job (roughly equivalent shell commands are sketched after the list):

  • Install the dependencies needed by our DAGs.
  • Install the dependencies needed by our unit tests.
  • Run the DAG tests to validate their correctness and reliability.
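In shell terms, the three steps amount to roughly the following commands (a sketch; in the actual YAML file, each step runs in its own builder container):

# Step 1: install the DAG dependencies (e.g. yfinance)
pip install -r requirements.txt

# Step 2: install the test dependencies (e.g. pytest)
pip install -r requirements-test.txt

# Step 3: run the DAG validation tests
python3 -m pytest -v dags/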

utils: This directory contains a script, add_dags_to_composer.py, which syncs the DAGs with your Cloud Composer environment after they are merged to the main branch of the repository, and a requirements.txt file that lists the packages required by add_dags_to_composer.py.

This utility script copies DAG files from the dags/ directory in the repository to a temporary directory, ignoring non-DAG Python files. It then uploads all files from the temporary directory to the dags/ folder in the Cloud Composer environment using the Cloud Storage client library.

dags-to-composer.cloudbuild.yaml: this is a YAML configuration file for the Cloud Build sync job. It performs the sync after the pull request is approved and the code is merged to the main branch in the GitHub repository. It defines three steps to be executed by the Cloud Build job (again sketched as commands after the list):

  • Install the dependencies needed by the DAGs utility script.
  • Update the Cloud Composer environment to install the yfinance dependency.
  • Run the utility script to sync the DAGs in the repository with the Cloud Composer environment.
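These steps amount to roughly the following commands (a sketch; the ${_...} values are substitution variables set on the trigger, described below, and the --dags_directory/--dags_bucket flags assume the utility script parses them as in Google's sample):

# Step 1: install the utility script's dependencies
pip install -r utils/requirements.txt

# Step 2: install the PyPI packages (e.g. yfinance) into the Composer environment
gcloud composer environments update ${_COMPOSER_NAME} \
  --location=${_COMPOSER_REGION} \
  --update-pypi-packages-from-file=requirements-composer.txt

# Step 3: copy the repository DAGs to the environment's dags/ folder
python3 utils/add_dags_to_composer.py \
  --dags_directory=${_DAGS_DIRECTORY} \
  --dags_bucket=${_DAGS_BUCKET}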

The complete code can be seen in this GitHub repo.

Prepare the environment

To establish a continuous integration and delivery (CI/CD) process that synchronizes DAGs in the Composer environment, we will follow the steps below, with GitHub as our version control system:

  1. Create two Airflow environments, to serve as development and production environments.
  2. Create the Cloud Build triggers.

Create two Airflow environments, to serve as development and production environments.

Development Environment

gcloud composer environments create dev-environment \
--location us-central1 \
--image-version composer-1.20.5-airflow-2.3.4 \
--service-account "example-account@example-project.iam.gserviceaccount.com"

Production Environment

gcloud composer environments create prod-environment \
--location us-central1 \
--image-version composer-1.20.5-airflow-2.3.4 \
--service-account "example-account@example-project.iam.gserviceaccount.com"
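Environment creation can take a while. Once both commands finish, you can confirm the environments are up:

# List the Composer environments in the region
gcloud composer environments list --locations=us-central1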

Create the Cloud Build triggers

We will create two Cloud Build triggers: one for the pre-submission check and the other to sync DAGs to the Cloud Composer environment. Follow the guide on building repositories from GitHub to create a GitHub app-based trigger.

Pre-submission check Cloud Build trigger configurations:

See also the guide on creating and managing build triggers.

Configure the Cloud Build Trigger as follows:

Name: dag-tests
Region: us-central1
Description: Pre-submission check — run a validation test on DAGs
Event: Pull Request

Source — Repository: choose your repository
Source — Base branch: .* (any branch; change to your repository’s base branch pattern, if required)
Source — Comment Control: not required

Build Configuration — Cloud Build configuration file: test-dags.cloudbuild.yaml (the path to your build file)

Then click on create.
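If you prefer the command line to the console, an equivalent trigger can be created with gcloud (a sketch; replace the repo owner and name with your own, and make sure the repository is already connected to Cloud Build):

gcloud builds triggers create github \
  --name="dag-tests" \
  --region=us-central1 \
  --description="Pre-submission check - run a validation test on DAGs" \
  --repo-owner="your-github-username" \
  --repo-name="your-repo" \
  --pull-request-pattern=".*" \
  --build-config="test-dags.cloudbuild.yaml"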

DAG sync job Cloud Build trigger configurations:

Configure the Cloud Build Trigger as follows:

Name: add-dags-to-composer
Region: us-central1
Description: DAG sync job
Event: Push to a branch

Source — Repository: choose your repository
Source — Base branch: ^main$ (change main to the name of your repository’s base branch, if required)

Click on ‘Show Included and ignored file filters’
Source — Included files filter (glob): dags/**

Build Configuration — Cloud Build configuration file: dags-to-composer.cloudbuild.yaml (the path to your build file)

In the Advanced configuration, add two substitution variables:

_DAGS_DIRECTORY — the directory where the DAGs are located in your repository. In this case, it is dags/.

_DAGS_BUCKET — the Cloud Storage bucket that contains the dags/ folder in your development Cloud Composer environment without the gs:// prefix. In this case, us-central1-dev-environment-aabfb162-bucket.

_COMPOSER_NAME — dev-environment (change to the name of your Cloud Composer environment, if required)

_COMPOSER_REGION — us-central1

Then click on create.
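The same trigger via gcloud would look roughly like this (a sketch; substitute your own repo details and bucket name):

gcloud builds triggers create github \
  --name="add-dags-to-composer" \
  --region=us-central1 \
  --description="DAG sync job" \
  --repo-owner="your-github-username" \
  --repo-name="your-repo" \
  --branch-pattern="^main$" \
  --build-config="dags-to-composer.cloudbuild.yaml" \
  --included-files="dags/**" \
  --substitutions=_DAGS_DIRECTORY=dags/,_DAGS_BUCKET=us-central1-dev-environment-aabfb162-bucket,_COMPOSER_NAME=dev-environment,_COMPOSER_REGION=us-central1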

Please note that if you don’t remove the “dags/” suffix from the Cloud Storage bucket name in the _DAGS_BUCKET substitution variable, the DAG sync job will fail with a NotFound error. This is because in Cloud Storage, “dags/” is considered a folder that is part of the object name, not the bucket name.
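If you are unsure of the bucket name, you can read it off the environment itself; the output includes the gs:// prefix and the trailing /dags folder, both of which should be stripped for _DAGS_BUCKET:

# Prints e.g. gs://us-central1-dev-environment-aabfb162-bucket/dags
gcloud composer environments describe dev-environment \
  --location=us-central1 \
  --format="get(config.dagGcsPrefix)"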

Also, if you are using the default Cloud Build service account, you need to grant it permission to manage Cloud Composer environments: the Composer Administrator (composer.admin) role and the Service Account User (iam.serviceAccountUser) role.
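Both roles can be granted with gcloud; a sketch, where example-project and PROJECT_NUMBER stand in for your own project ID and project number:

gcloud projects add-iam-policy-binding example-project \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/composer.admin"

gcloud projects add-iam-policy-binding example-project \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"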

After the triggers have been created, go to the Triggers page and confirm that both triggers are listed.

That’s it! The workflow has been automated. To confirm that everything is working optimally, take the following steps:

  • Make a change to a DAG and push that change to a development branch in your repository.
  • Open a pull request against the main branch of your repository.
  • Cloud Build runs unit tests to check that your DAG is valid.
  • Once your pull request is approved and merged into your main branch, Cloud Build syncs your development Cloud Composer environment with the new changes.

Note: If you run an update on a Cloud Composer environment and there are no changes from the previous update, the gcloud composer environments update command will throw an error that looks like this:

ERROR: (gcloud.composer.environments.update) INVALID_ARGUMENT: No change in configuration. Must specify a change to configuration.software_configuration.pypi_dependencies

To suppress that specific error message, the following Bash command can be used:

gcloud composer environments update ${_COMPOSER_NAME} --location=${_COMPOSER_REGION} --update-pypi-packages-from-file=requirements-composer.txt 2>&1 | (grep -v 'No change in configuration. Must specify a change to configuration.software_configuration.pypi_dependencies' || true)

The 2>&1 redirects any error messages to standard output, and grep -v filters out the specific error message. The || true ensures that the grep command does not exit with an error status if it fails to find any matches.

  • Verify that the DAG behaves as expected in your development environment.

Here, the development environment has been updated with the yfinance package.


And the stock_data_dag.py DAG file has been successfully uploaded to the GCS bucket and executed without errors.


If your DAG in the development environment works as expected, you can manually sync the DAG to your production Cloud Composer environment.

To deploy to the production environment, it is recommended to choose a time that is least disruptive if things were to go wrong, and to have a human review the change and deploy it to production carefully.

You could then run a similar Cloud Build job. But instead of running all of those tests, it pulls from the main branch of the GitHub repository and runs the same DAG sync Cloud Build job, this time pointed at the production environment.
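Alternatively, a lightweight manual promotion can be done entirely with gcloud, without any Cloud Build job at all (a sketch, assuming the same file layout as the repo):

# Install the PyPI dependencies (e.g. yfinance) into the production environment
gcloud composer environments update prod-environment \
  --location=us-central1 \
  --update-pypi-packages-from-file=requirements-composer.txt

# Upload the DAG file to the production environment's dags/ folder
gcloud composer environments storage dags import \
  --environment=prod-environment \
  --location=us-central1 \
  --source="dags/stock_data_dag.py"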

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this blog, delete the resources used.
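For this setup, that means the two Composer environments and the two Cloud Build triggers (a sketch; note that deleting an environment does not delete its Cloud Storage bucket):

# Delete the Composer environments
gcloud composer environments delete dev-environment --location=us-central1
gcloud composer environments delete prod-environment --location=us-central1

# Delete the Cloud Build triggers
gcloud builds triggers delete dag-tests --region=us-central1
gcloud builds triggers delete add-dags-to-composer --region=us-central1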

Thanks for reading. You are welcome to follow me on LinkedIn or Twitter.
