Saving 75% with a weekday Cloud Composer development environment

Cloud Composer

Google Cloud Platform (GCP) provides a number of managed data tools. We mainly use Cloud Composer — GCP’s managed Airflow service — to schedule our data pipelines, with all of our data ending up in BigQuery, GCP’s data warehouse, for analysis.

We previously kept development DAGs alongside our production DAGs in a single Composer instance. This made environment separation messy: we were checking filenames for _dev when setting environment variables. At the beginning of this year we decided to streamline our DAG development by running separate development and live Composer instances.

As of April 2019, you can’t pause and resume a Composer environment; you can only create or destroy one. To minimise cost and overhead, we have our production Composer instance schedule a development instance that spins up at 9:45am and tears down at 6:15pm every weekday.
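The combined effect of the two schedules can be sketched as a simple predicate — a hypothetical helper for illustration, not part of our DAGs — that says whether the dev environment should exist at a given moment:

```python
import datetime as dt

def dev_env_should_be_up(now: dt.datetime) -> bool:
    """True when the dev Composer environment is scheduled to exist:
    weekdays (Mon-Fri), between the 9:45 spin-up and the 18:15 teardown."""
    if now.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        return False
    return dt.time(9, 45) <= now.time() < dt.time(18, 15)

print(dev_env_should_be_up(dt.datetime(2019, 4, 1, 11, 0)))  # a Monday morning → True
print(dev_env_should_be_up(dt.datetime(2019, 4, 6, 11, 0)))  # a Saturday → False
```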

Here’s what our spin up DAG looks like:

import datetime as dt

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

OWNER = 'jlaird'
DAG_NAME = 'composer-dev-environment-setup'

default_args = {
    'owner': OWNER,
    'depends_on_past': False,
    'start_date': dt.datetime(2019, 1, 10),
    'retries': 3,
    'retry_delay': dt.timedelta(minutes=5),
}

dag = DAG(DAG_NAME,
          default_args=default_args,
          description='Spin up the development Composer environment',
          schedule_interval='45 9 * * 1-5',
          catchup=False)

# BashOperators are run from a temporary location so a cd is necessary to
# ensure relative paths in the Bash scripts aren't broken
bash_command = """
cd ~/gcs/dags/scripts/composer
chmod u+x composer_setup.sh
./composer_setup.sh dev 
"""
# Note the space at the end of the above command, after dev, due to:
# https://airflow.apache.org/howto/operator.html#jinja-template-not-found

BashOperator(
    task_id='composer_setup',
    dag=dag,
    bash_command=bash_command,
)

As installing the PyPI packages takes a while, we spin up at 9:45 so that the environment is ready shortly after 10.

The bash file composer_setup.sh referenced looks like:

#!/usr/bin/env bash
set -exo pipefail

source composer_settings.sh ${1}

set -u

# beta must be used to provide --airflow-version
gcloud beta composer environments create ${COMPOSER_NAME} \
    --location=${COMPOSER_LOCATION} \
    --airflow-configs=core-dags_are_paused_at_creation=True \
    --image-version=composer-1.5.0-airflow-1.10.1 \
    --disk-size=20GB \
    --python-version=2 \
    --node-count=3 \
    --labels env=${ENVIRONMENT}

COMPOSER_GCS_BUCKET_DATA_FOLDER=$(gcloud composer environments describe ${COMPOSER_NAME} --location ${COMPOSER_LOCATION} | grep 'dagGcsPrefix' | grep -Eo "\S+/")data

echo "Data folder is ${COMPOSER_GCS_BUCKET_DATA_FOLDER}"

# Copy environment's variables file and service account credentials from our cni-analytics GCS bucket
gsutil cp ${ENV_VARIABLES_JSON_GCS_LOCATION} ${COMPOSER_GCS_BUCKET_DATA_FOLDER}
gsutil cp ${CREDENTIALS_JSON_LOCATION} ${COMPOSER_GCS_BUCKET_DATA_FOLDER}

echo "Importing environment variables from ${COMPOSER_INSTANCE_DATA_FOLDER}/${ENV_VARIABLES_JSON_NAME}..."

# Import environment's variables file
gcloud composer environments run ${COMPOSER_NAME} \
    --location ${COMPOSER_LOCATION} variables -- \
    -i ${COMPOSER_INSTANCE_DATA_FOLDER}/${ENV_VARIABLES_JSON_NAME}

echo "Importing pypi packages from ./pypi_packages..."

# Install PyPi packages from file
gcloud composer environments update ${COMPOSER_NAME} \
    --location ${COMPOSER_LOCATION} \
    --update-pypi-packages-from-file=pypi_packages

echo "Setting up bigquery_gdrive connection"

gcloud composer environments run ${COMPOSER_NAME} \
    --location ${COMPOSER_LOCATION} connections -- --add \
    --conn_id=bigquery_gdrive --conn_type=google_cloud_platform \
    --conn_extra='{"extra__google_cloud_platform__project": "our-project",
"extra__google_cloud_platform__key_path": "/home/airflow/gcs/data/service-account.json",
"extra__google_cloud_platform__scope": "https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive"}'

echo "Done"

This installs the PyPI packages and copies the environment variables over from our GCS bucket. We also override the Airflow config with core-dags_are_paused_at_creation=True so that DAGs have to be explicitly turned on, which stops them backfilling automatically.
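The data folder derived in the script is just the environment's dagGcsPrefix with dags swapped for data. The path arithmetic can be sketched in Python (a hypothetical helper — the script does the same job with grep):

```python
def data_folder_from_dag_prefix(dag_gcs_prefix: str) -> str:
    """Given a Composer environment's dagGcsPrefix (e.g. gs://<bucket>/dags),
    return the sibling data/ folder, which is mounted at /home/airflow/gcs/data."""
    bucket_root = dag_gcs_prefix.rstrip('/').rsplit('/', 1)[0]  # strip the trailing /dags
    return bucket_root + '/data'

print(data_folder_from_dag_prefix('gs://europe-west1-dev-abc123-bucket/dags'))
# → gs://europe-west1-dev-abc123-bucket/data
```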

We also create a new connection, bigquery_gdrive, because we want Composer to be able to interact with CSVs hosted on Google Drive.
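The --conn_extra blob is plain JSON; building it programmatically (a sketch for illustration, not something our setup script does) avoids shell-quoting mistakes:

```python
import json

# These keys are the extras that Airflow's google_cloud_platform connection
# type reads; the project id and key path mirror the setup script above.
conn_extra = json.dumps({
    "extra__google_cloud_platform__project": "our-project",
    "extra__google_cloud_platform__key_path": "/home/airflow/gcs/data/service-account.json",
    "extra__google_cloud_platform__scope": ",".join([
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive",
    ]),
})
print(conn_extra)
```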

The composer_settings.sh referenced:

#!/usr/bin/env bash

ENVIRONMENT=$1

case ${ENVIRONMENT} in
    dev|live)
        COMPOSER_NAME=${ENVIRONMENT}
        ;;
    *)
        echo "usage: ./composer_setup.sh {dev|live}" 1>&2
        exit 99
        ;;
esac

COMPOSER_INSTANCE_DATA_FOLDER=/home/airflow/gcs/data
COMPOSER_LOCATION=europe-west1
echo "Operating on environment ${COMPOSER_NAME}" 1>&2
ENV_VARIABLES_JSON_NAME=airflow_env_variables_${ENVIRONMENT}.json
ENV_VARIABLES_JSON_GCS_LOCATION=gs://analytics/credentials/${ENV_VARIABLES_JSON_NAME}

CREDENTIALS_JSON_NAME=service-account.json
CREDENTIALS_JSON_LOCATION=gs://analytics/credentials/${CREDENTIALS_JSON_NAME}

When we want to keep the development environment on overnight, we can simply switch the spin-up/teardown DAGs off. When we switch them back on, we don’t want them to run once for each day missed; using catchup=False in both DAGs prevents this.

Note: catchup=False does not currently behave as expected in Airflow 1.10, Composer 1.5.0.

Our teardown DAG:

import datetime as dt

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

OWNER = 'jlaird'
DAG_NAME = 'composer-dev-environment-teardown'

default_args = {
    'owner': OWNER,
    'depends_on_past': False,
    'start_date': dt.datetime(2019, 1, 10),
    'retries': 3,
    'retry_delay': dt.timedelta(minutes=5),
}

dag = DAG(DAG_NAME,
          default_args=default_args,
          description='Tear down the development Composer environment',
          schedule_interval='15 18 * * 1-5',
          catchup=False)

# BashOperators are run from a temporary location so a cd is necessary to
# ensure relative paths in the Bash scripts aren't broken
bash_command = """
cd ~/gcs/dags/scripts/composer
chmod u+x composer_teardown.sh
./composer_teardown.sh dev 
"""
# Note the space at the end of the above command, after dev, due to:
# https://airflow.apache.org/howto/operator.html#jinja-template-not-found

BashOperator(
    task_id='composer_teardown',
    dag=dag,
    bash_command=bash_command,
)

composer_teardown.sh:

#!/usr/bin/env bash
set -exo pipefail

source composer_settings.sh ${1}

set -u

COMPOSER_GCS_BUCKET=$(gcloud composer environments describe ${COMPOSER_NAME} --location ${COMPOSER_LOCATION} | grep 'dagGcsPrefix' | grep -Eo "\S+/")
COMPOSER_GCS_BUCKET_DATA_FOLDER=${COMPOSER_GCS_BUCKET}data

# Export Airflow variables to a file in the instance's data folder
gcloud composer environments run ${COMPOSER_NAME} \
    --location ${COMPOSER_LOCATION} variables -- \
    -e ${COMPOSER_INSTANCE_DATA_FOLDER}/${ENV_VARIABLES_JSON_NAME}

# Overwrite saved environment's variables file in analytics GCS bucket
gsutil cp ${COMPOSER_GCS_BUCKET_DATA_FOLDER}/${ENV_VARIABLES_JSON_NAME} ${ENV_VARIABLES_JSON_GCS_LOCATION}

gcloud composer environments delete ${COMPOSER_NAME} \
    --location=${COMPOSER_LOCATION} \
    --quiet

# Remove GCS bucket as it doesn't get cleaned up when the Composer instance gets deleted
gsutil -m rm -r ${COMPOSER_GCS_BUCKET}

This exports our variables back to the same GCS bucket as above. We chose to persist the dev environment’s variables across teardowns, but some may prefer them to be ephemeral.
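If you do persist variables, a quick diff of the freshly exported dev file against the previously saved one helps spot drift before the next spin-up. A hypothetical helper (not part of our scripts) with made-up variable names:

```python
def changed_variables(old: dict, new: dict) -> dict:
    """Return the keys whose values differ from, or are absent in,
    the previously saved variables file."""
    return {k: v for k, v in new.items() if old.get(k) != v}

# Illustrative contents of the two exported JSON files
old = {"gcs_bucket": "analytics", "batch_size": "100"}
new = {"gcs_bucket": "analytics", "batch_size": "500", "new_flag": "on"}
print(changed_variables(old, new))
# → {'batch_size': '500', 'new_flag': 'on'}
```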

Bear in mind that because the Composer instance is destroyed, you won’t be able to recover the logs via the Airflow UI. However, Composer has built-in Stackdriver integration, so you will have logs stored for up to 30 days afterwards!

Before introducing this development environment, our Cloud Composer costs were ~£250/month. With this approach, the development environment adds only ~£60 to our monthly Cloud Composer costs. That tracks: an environment that is up for roughly 25% of the week (8.5 hours × 5 days out of 168 hours) adds roughly 25% of the cost of an always-on instance.
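The back-of-the-envelope arithmetic, using the figures above (the per-hour rate is implied, not measured):

```python
ALWAYS_ON_MONTHLY_COST = 250           # £/month for an always-on instance (from above)
hours_up_per_week = 8.5 * 5            # 9:45–18:15, Monday to Friday
uptime_fraction = hours_up_per_week / (24 * 7)

print(f"uptime fraction: {uptime_fraction:.1%}")  # → 25.3%
print(f"expected added cost: ~£{ALWAYS_ON_MONTHLY_COST * uptime_fraction:.0f}/month")
# → ~£63/month, in line with the ~£60 we observe
```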

By spinning the environment up and down, that’s ~£190/month saved for the business to not invest in Bitcoin! Bargain.