Backporting the latest Google Cloud operators in Apache Airflow and Cloud Composer

Israel Herraiz
Google Cloud - Community
3 min read · Nov 20, 2020
Photo by Raul Cacho Oses on Unsplash

The Google Cloud operators for Apache Airflow offer a convenient way to connect to services such as BigQuery, Dataflow, and Dataproc from your DAG. If you are not on the latest Airflow version, you might be stuck with a set of operators that lack the latest available functionality.

Because cloud services evolve quickly, this can prevent you from using the latest Google Cloud features in your data pipelines. In this post we show how to backport the latest version of the Google Cloud operators, with a focus on Cloud Composer (Google Cloud’s managed version of Apache Airflow), so you can use them without being forced to upgrade your Airflow version.

For this post, let’s assume that you are using Airflow 1.10.6. At the time of writing, the latest version of Airflow available with Cloud Composer is 1.10.10. So in this example, the Google Cloud operators available by default in your system are the ones that shipped with Airflow 1.10.6.

For instance, the latest version of the DataprocSubmitJobOperator has an additional option that is handy if the Airflow web UI restarts. You can either reattach to an existing job or launch it again (see AIRFLOW-3211).

What if you are on an older version of Airflow and want to use that behavior? How do you update Composer to bring those operators to your current environments?

You don’t have to update your existing Airflow version. You can bring the latest available version of the Google Cloud operators to your Airflow version deployed in your Composer instance, without any downtime or cluster upgrade procedure. The procedure shown here works for Airflow 1.10.* and will allow you to bring any operator from Airflow 2.* to your existing Airflow. Full details about these backport packages are given in the Airflow documentation.

Assuming that your Composer instance is called <COMPOSER_INSTANCE>, located in the region <COMPOSER_REGION>, and that you have permissions to update the instance, to install a backport package in Cloud Composer just run:

gcloud composer environments update --location=<COMPOSER_REGION> \
    --update-pypi-package=apache-airflow-backport-providers-google <COMPOSER_INSTANCE>

The update will take a few minutes. After that, any new DAG run will have the new operators available.
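If you would rather install a fixed release than whatever is newest on PyPI, the same flag accepts a pip-style version specifier. The following is a minimal sketch, not a definitive recipe: the environment name, the region, and the `pypi_requirement` helper are hypothetical, and the version shown is only an example. It prints the command instead of running it, so you can review it first.

```shell
# Hypothetical values; replace them with your own environment name and region.
COMPOSER_INSTANCE="my-composer-env"
COMPOSER_REGION="europe-west1"

# Release to pin; 2020.11.13 is only an example value.
BACKPORT_VERSION="2020.11.13"

# Hypothetical helper: build the requirement in pip's PACKAGE==VERSION form.
pypi_requirement() {
    printf 'apache-airflow-backport-providers-google==%s\n' "$1"
}

# Print the full command instead of running it, so it can be reviewed first.
echo gcloud composer environments update "$COMPOSER_INSTANCE" \
    --location="$COMPOSER_REGION" \
    --update-pypi-package="$(pypi_requirement "$BACKPORT_VERSION")"
```

Once the printed command looks right, drop the leading `echo` to actually apply the update.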

How do I check that the update succeeded?

After the update, you might wonder whether it actually succeeded. To find out, connect to one of the workers and check the installed Python dependencies. Once you have created the necessary kubectl configuration, find out the name of one of the Airflow worker pods. First, check the namespace where the Airflow workers are running, with

kubectl get namespaces

You will get an output similar to this one (the namespace where the Airflow worker pods are running is the one whose name starts with composer-):

NAME                                      STATUS   AGE
composer-1-10-4-airflow-1-10-6-f15cbe22   Active   160d
default                                   Active   160d
kube-node-lease                           Active   160d
kube-public                               Active   160d
kube-system                               Active   160d

Now let's obtain the pod names in that namespace (replace composer-1-10-4-airflow-1-10-6-f15cbe22 with your namespace).

kubectl -n composer-1-10-4-airflow-1-10-6-f15cbe22 get pods

You will get a list like the following. We need to take note of one of the worker pod names:


NAME                                 READY   STATUS    RESTARTS   AGE
airflow-scheduler-6b5cff877f-j9l4t   2/2     Running   0          27m
airflow-worker-d977fffc4-6l9vb       2/2     Running   0          25s
airflow-worker-d977fffc4-kktz2       2/2     Running   0          27s

Now we can connect to one of the workers:

kubectl exec -it -n <COMPOSER_NAMESPACE> <WORKER_NAME> -- /bin/bash

where <WORKER_NAME> could be either airflow-worker-d977fffc4-6l9vb or airflow-worker-d977fffc4-kktz2 in the list above, and <COMPOSER_NAMESPACE> is composer-1-10-4-airflow-1-10-6-f15cbe22 in this example.
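If you do this often, the two lookups above can be scripted instead of copied by hand. This is a rough sketch under two assumptions taken from the example output in this post: the Composer namespace name starts with composer-, and worker pods are named airflow-worker-*. The function names are hypothetical helpers, not part of kubectl.

```shell
# Print the first namespace whose name starts with "composer-".
# Reads the output of `kubectl get namespaces` on stdin.
composer_namespace() {
    awk '$1 ~ /^composer-/ { print $1; exit }'
}

# Print the first pod named airflow-worker-* whose STATUS column is Running.
# Reads the output of `kubectl -n <namespace> get pods` on stdin.
first_running_worker() {
    awk '$1 ~ /^airflow-worker-/ && $3 == "Running" { print $1; exit }'
}

# Usage against a live cluster (needs a working kubectl configuration):
#   NS=$(kubectl get namespaces | composer_namespace)
#   POD=$(kubectl -n "$NS" get pods | first_running_worker)
#   kubectl exec -it -n "$NS" "$POD" -- /bin/bash
```

Both helpers only filter text, so you can try them on saved kubectl output before pointing them at a real cluster.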

Then you can check the new dependency that we just installed:

pip freeze | grep airflow-backport

You should get an output similar to this one (maybe with a newer version, depending on when you run the command):

apache-airflow-backport-providers-google==2020.11.13
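For a scripted check, the same `pip freeze` output can be reduced to just the version number. The sketch below assumes the package name used throughout this post; `backport_google_version` is a hypothetical helper that exits with a non-zero status when the package is not listed.

```shell
# Print the installed version of the Google backport package.
# Reads `pip freeze` output on stdin; exits non-zero if the package is absent.
backport_google_version() {
    awk -F'==' '
        $1 == "apache-airflow-backport-providers-google" { print $2; found = 1 }
        END { exit !found }
    '
}

# Usage inside the worker:
#   pip freeze | backport_google_version || echo "backport package not installed"
```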

Cloud Composer offers a managed service for Apache Airflow. When you create a Composer instance, you will be pinned to a specific Airflow version, unless you undergo an upgrade for the full cluster. However, in order to enjoy the latest version of the Google Cloud operators, you do not need to update the Airflow version: you can backport the latest operators, even from Airflow 2.x, to any Composer instance using Airflow 1.10.x.
