How to set up a Cloud Composer environment and connect it to BigQuery

Hemant Anand Khandare
4 min read · Feb 25, 2022


If you want to set up Airflow in your GCP cloud environment, what are the options available in GCP?

There are two ways:

1. Spin up a GCE instance in GCP and install Airflow on it yourself.

2. Use the GCP-managed native service, Cloud Composer.

We will explore the second option here.

As usual, go to the GCP console and click on the hamburger menu.

Go to the Analytics section and click on Composer.

Click on Create and select Composer 1, which is sufficient for small development work.

Give the environment a name and select the location where you wish to install Composer.

Select the machine type and set the disk size to 100 GB.

Please note that if you set the disk size to 20 GB, environment creation may fail with the error below.

“Your environment could not complete its creation process because it could not successfully initialize the Airflow database. This can happen when the GKE cluster is unable to reach the SQL database over the network.”

As per Google's recommendation, we should use at least 100 GB of disk for Composer version 1.17.10.

Now scroll down, click the Create button, and wait for the environment to be created.

You will see your environment name on the page; click on it and go to Environment Configuration to see the details.

The next step is to create a BigQuery connection in Airflow.
We need a service account to run the BigQuery scripts automatically.

Go to IAM & Admin and click on Service Accounts.

On the Service Accounts page, click on Create Service Account.

Give it a name and click Create and Continue.

Under Roles, assign the basic role "Editor" for development work, scroll down, and click Done.
Please note that it is always advised to grant the least privileges on resources; in production, prefer narrower roles over "Editor".

You will see the service account name on the page; click on it and go to the Keys tab.

Click on Add Key, select "Create new key", choose JSON, and download the key file.

Note down the JSON key file name.

Go to the Cloud Composer page and click on the DAGs folder link.

This opens the environment's GCS bucket. Create a folder with the name "conn".

Click on the "conn" folder and upload the JSON key downloaded earlier into this folder.
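If you prefer to script the upload instead of using the console, a minimal sketch with the google-cloud-storage client could look like the following. The bucket name, key file name, and folder here are placeholders; the bucket name should be your Composer environment's bucket, and the call assumes application-default credentials are available.

```python
def key_blob_path(folder, key_file):
    """Build the object path inside the bucket, e.g. conn/my-key.json."""
    return f"{folder}/{key_file}"


def upload_key(bucket_name, key_file, folder="conn"):
    """Upload the service-account key file into the folder inside a GCS bucket.

    Requires the google-cloud-storage package; imported inside the function
    so this sketch loads even where the package is not installed.
    """
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(key_blob_path(folder, key_file))
    blob.upload_from_filename(key_file)
```

Inside the environment, the DAGs bucket is mirrored under /home/airflow/gcs/dags on the Airflow workers, so a key uploaded under the dags folder is readable there by file path.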

The next step is to configure the connection in Airflow UI.

Click on the Airflow UI from the main page.

Go to Admin > Connections.

Create a new connection of type Google Cloud Platform, point it at the uploaded key file, set the scope to "https://www.googleapis.com/auth/cloud-platform", and save the connection.
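Equivalently, Airflow can pick up a connection from an environment variable named AIRFLOW_CONN_<CONN_ID>, which avoids clicking through the UI. A minimal sketch that builds such a connection URI (the project id and key path below are placeholder values) might be:

```python
from urllib.parse import quote_plus


def gcp_conn_uri(key_path, project_id,
                 scope="https://www.googleapis.com/auth/cloud-platform"):
    """Build an Airflow connection URI for a Google Cloud Platform connection.

    Exporting the result as e.g. AIRFLOW_CONN_MY_GCP_CONNECTION defines the
    connection without touching the Airflow UI.
    """
    extras = {
        "extra__google_cloud_platform__key_path": key_path,
        "extra__google_cloud_platform__project": project_id,
        "extra__google_cloud_platform__scope": scope,
    }
    query = "&".join(f"{k}={quote_plus(v)}" for k, v in extras.items())
    return f"google-cloud-platform://?{query}"


# Example with placeholder values:
uri = gcp_conn_uri("/home/airflow/gcs/dags/conn/my-key.json", "my-project")
```

The extra__google_cloud_platform__ prefix matches the field names the GCP connection form uses in the Airflow 1.x UI shipped with Composer 1.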

Now that the connection is ready, we can write DAGs and use this connection inside the task operators.

For example:

# Airflow 1.x import path; in Airflow 2 the operator lives under
# airflow.providers.google.cloud.operators.bigquery
from airflow.contrib.operators.bigquery_check_operator import BigQueryCheckOperator

# check operator defined; src_sql1 holds the SQL to validate
check_count = BigQueryCheckOperator(
    task_id="check_count",
    sql=src_sql1,
    use_legacy_sql=False,
    gcp_conn_id="my_gcp_connection",
)

Thanks for your time.

Enjoy!!
