Connecting GCP Composer tasks to Cloud SQL

Arik Liber
3 min read · Mar 1, 2019


In this short article, I will outline the steps required to connect to a GCP SQL instance, both from a local development machine and from GCP Composer tasks running on a Kubernetes cluster.

While everything in this article is already floating around the web, I wanted to document it clearly and as explicitly as possible for those who follow.

What’s the problem?

If you are coming to GCP from AWS, as I did, chances are you had an RDS instance and connected to the database using a VPN or a bastion host in the same VPC.

To connect to GCP SQL, however, and especially from a Composer cluster, you will need something else called the Cloud SQL Proxy.

What’s Cloud SQL Proxy?

TL;DR “The Cloud SQL Proxy provides secure access to your Cloud SQL Second Generation instances without having to whitelist IP addresses or configure SSL … works by having a local client, called the proxy, running in the local environment. Your application communicates with the proxy with the standard database protocol used by your database. The proxy uses a secure tunnel to communicate with its companion process running on the server.”

Sounds complicated, do I have to do it?

There are other ways, like whitelisting your IP and connecting directly, but if you are planning to connect to GCP SQL from a Composer job, you'd better get used to the Cloud SQL Proxy.

Connecting locally

Now that we understand the what and the why: the easiest way to bridge your development environment and GCP SQL is to run the Cloud SQL Proxy on your local machine, as documented here:

TL;DR Locate your JSON key for GCP and run the following command:

  1. Replace key.json with the path to your JSON key
  2. Replace <PROJECT>:<REGION>:<INSTANCE> with the instance connection name, found on the “Instance details” page:
Grab the “Instance connection name”

Since I have a local MySQL on my dev machine that uses port 3306 (and you probably do too), I started the proxy on port 3309.
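Roughly, the invocation looks like this, using the standard cloud_sql_proxy binary's flags; key.json and the instance connection name are the placeholders from the steps above:

```shell
# key.json and <PROJECT>:<REGION>:<INSTANCE> are placeholders: replace them
# with your service-account key path and your instance connection name.
# tcp:3309 binds the proxy to local port 3309 instead of the default 3306.
./cloud_sql_proxy \
  -instances=<PROJECT>:<REGION>:<INSTANCE>=tcp:3309 \
  -credential_file=key.json
```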

Test the connection by running the following command (Mac):

Replace <USER> with your SQL username, probably root.
If the above didn't work, try plain mysql instead of the /usr/... path.
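As a sketch, assuming the mysql client is on your PATH, the test connection looks like this:

```shell
# Connect through the local proxy on port 3309 (not the MySQL default 3306).
# <USER> is a placeholder for your SQL username; -p will prompt for a password.
mysql -u <USER> -p --host 127.0.0.1 --port 3309
```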

Connecting from Composer tasks (Kubernetes cluster)

TL;DR Follow these steps:

The official article by Google on the matter is a good start, but the second official document mentioned above is exactly what we need for our Composer setup, for two main reasons:

  1. The “sidecar” container pattern mentioned in the first is not the best approach: if you are using a Kubernetes operator, as I do (more on that in a later post), it makes no sense (even if it is possible) to launch a sidecar in a short-lived task container.
  2. The solution documented in the second one creates a service that is accessible from any pod in the cluster, which is exactly what we need with our Kubernetes operator. It does so by binding the proxy to 0.0.0.0.
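Concretely, the difference comes down to the address the proxy binds to inside its pod. A sketch of the relevant invocation (the instance connection name is a placeholder):

```shell
# Binding to 0.0.0.0 (instead of the default 127.0.0.1) makes the proxy
# reachable from other pods via the Kubernetes service placed in front of it.
/cloud_sql_proxy -instances=<PROJECT>:<REGION>:<INSTANCE>=tcp:0.0.0.0:3306
```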

Testing

After following the tutorial above, you can verify the proxy is up by running kubectl get pods in your terminal and making sure a cloudsqlproxy pod is present and in Running status.

You can then tunnel into the Airflow scheduler (grab its name from the list above; it will look something like `airflow-scheduler-86cb5dbc4f-cgrtd`) like so:

and then try to connect to the database, as we did from our local machine:
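From inside the pod the command is the same, except the host is the proxy's in-cluster service rather than 127.0.0.1. The service name below is an assumption; use whatever name the tutorial's manifest gave it:

```shell
# <USER> is a placeholder; 'cloudsqlproxy-service' is an assumed service name.
# Port 3306 is fine here since there is no local MySQL inside the pod.
mysql -u <USER> -p --host cloudsqlproxy-service --port 3306
```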

If you connected successfully, rest assured your pods will have access to that database as well.

Good luck!
