Setting up Airflow on a local Kubernetes cluster using Helm

Rafay Aleem
Published in Uncanny Recursions
Jun 10, 2020 · 3 min read

In this post, I will cover the steps to set up a production-like Airflow scheduler, worker, and webserver on a local Kubernetes cluster. In a later post, I will use the same K8s cluster to schedule ETL tasks using Airflow.

Quickstart a Kubernetes single-node cluster

If you are on a Mac, it’s just a matter of ticking the Enable Kubernetes checkbox in Docker Desktop’s preferences and restarting the Docker app. This will enable a single-node K8s cluster on your local machine.
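Once Docker has restarted, you can confirm that the cluster is reachable. This is a quick sanity check, assuming the kubectl that ships with Docker Desktop and its docker-desktop context:

# Point kubectl at the Docker Desktop cluster and verify the node is Ready
kubectl config use-context docker-desktop
kubectl get nodes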

Once you have K8s running, install the K8s dashboard by running the following:

# Install dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
# Find list of secrets
kubectl get secrets
# Copy the token value from the output of this command
kubectl describe secret default-token-rcsnr
# Access the dashboard
kubectl proxy

Now that you have the token and the proxy is running, just hit the dashboard URL it serves and use your token to log in to the K8s dashboard.
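If you prefer a one-liner, you can decode the token straight out of the secret instead of copying it from the describe output. The secret name default-token-rcsnr is from my cluster; yours will differ:

# Print the dashboard login token directly from the secret
kubectl get secret default-token-rcsnr -o jsonpath="{.data.token}" | base64 --decode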

Setting up Airflow using Helm

Helm is a package manager for Kubernetes (think Homebrew, if you are familiar with Mac). A more thorough introduction can be found here. You can install it on your local machine like this:

brew install helm
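You can verify the installation (Helm 3 is assumed throughout this post) with:

# Print the installed Helm client version
helm version --short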

Add the official Helm stable charts repository:

helm repo add stable https://kubernetes-charts.storage.googleapis.com/
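After adding the repo, it is worth refreshing the local chart index and confirming that the airflow chart shows up. These are standard Helm 3 commands:

# Refresh the chart index and confirm the airflow chart is available
helm repo update
helm search repo stable/airflow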

You will also need to clone my repo before we proceed further:

git clone git@github.com:mrafayaleem/etl-series.git

Now, change the path on line 12 in chapter1/airflow-helm-config.yaml to the absolute path for your local machine.
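If you would rather do this from the shell than an editor, a macOS-style sed substitution works too. This assumes the file still contains my path, shown in the snippet below; replace the second path with your own clone location:

# Swap my dags path for your own absolute path (macOS sed syntax)
sed -i '' 's|/Users/aleemr/powerhouse/etl-series/dags|/absolute/path/to/etl-series/dags|' chapter1/airflow-helm-config.yaml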

In the following snippet, I am creating a volume from my local directory. Then, through extraVolumeMounts, I am mounting that volume at the mountPath inside each of the Airflow pods (scheduler, webserver and worker).

###################################
# Airflow - Common Configs
###################################
airflow:
  extraVolumeMounts: # this will get the volume and mount it to that path in the container
    - name: dags
      mountPath: /opt/airflow/dags # location in the container it will put the directory mentioned below.

  extraVolumes: # this will create the volume from the directory
    - name: dags
      hostPath:
        path: "/Users/aleemr/powerhouse/etl-series/dags" # For you this is something like /<absolute-path>/etl-series/dags

With this, you will be able to write dags on your local machine and let Airflow running inside K8s pick them up from there.

Install Airflow with the following command:

helm install airflow stable/airflow -f chapter1/airflow-helm-config.yaml --version 7.2.0

Check the status of that installation:

helm list
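It can take a minute or two for everything to come up. You can watch the pods until the scheduler, web, worker, postgresql and redis pods are all Running (pod names are prefixed with the release name, airflow, in this setup):

# Watch the Airflow pods until they are all Running
kubectl get pods --namespace default -w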

Once deployed, run the following to port-forward the Airflow webserver and open it at http://localhost:8080. You can also check the state of the deployment on the K8s dashboard at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/node?namespace=default

export POD_NAME=$(kubectl get pods --namespace default -l "component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl port-forward --namespace default $POD_NAME 8080:8080

You should see the sample dag that I have added in the repo. Enable the dag using the toggle, trigger it and see it in action.

Running example_dag
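If you prefer the command line, you can also unpause and trigger the dag from inside the web pod. This sketch assumes the dag id is example_dag, the Airflow 1.10-style CLI that ships with this chart version, and the POD_NAME variable exported above:

# Optional: unpause and trigger the dag from the CLI instead of the UI
kubectl exec --namespace default $POD_NAME -- airflow unpause example_dag
kubectl exec --namespace default $POD_NAME -- airflow trigger_dag example_dag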

Congratulations! You have deployed your first dag on a K8s cluster using Airflow.

In the next post, I will cover KubernetesExecutor and KubernetesPodOperator in detail.

Where to go next from here

  • Read up on the differences between Replica Sets and Stateful Sets in K8s.
  • Figure out why airflow-postgresql, airflow-redis-master and airflow-worker are deployed under Stateful Sets while everything else is in Replica Sets (the command below lists both).
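A quick way to see how the chart split things up:

# List which Airflow components run as StatefulSets vs ReplicaSets
kubectl get statefulsets,replicasets --namespace default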
