Setting up Airflow on a local Kubernetes cluster using Helm
In this post, I will cover the steps to set up a production-like Airflow scheduler, worker, and webserver on a local Kubernetes cluster. Later on, I will use the same K8s cluster to schedule ETL tasks using Airflow.
Quickstart a Kubernetes single-node cluster
If you are on a Mac, it’s just a matter of ticking the Enable Kubernetes checkbox in Docker Desktop’s preferences and restarting the Docker app. This will enable a single-node K8s cluster on your local machine.
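Once Docker restarts, a quick sanity check that the cluster is actually up:
# Confirm kubectl can talk to the new single-node cluster
kubectl cluster-info
kubectl get nodes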
Once you have K8s running, install the K8s dashboard by running the following:
# Install dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml

# Find list of secrets
kubectl get secrets

# Copy the token value from the output of this command
kubectl describe secret default-token-rcsnr

# Access the dashboard
kubectl proxy
Now that you have the token and the proxy is running, open the dashboard at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ and use your token to log in.
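If you’d rather not copy the token out of the describe output by hand, you can decode it directly. This is a minimal sketch; default-token-rcsnr is the secret name on my machine, so substitute the one you got from kubectl get secrets:
# Print just the login token; replace the secret name with yours
kubectl get secret default-token-rcsnr -o jsonpath="{.data.token}" | base64 --decode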
Setting up Airflow using Helm
Helm is a package manager for Kubernetes (think Homebrew, if you are familiar with macOS). A more thorough introduction can be found here. You can install it on your local machine like this:
brew install helm
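You can confirm the install worked and check which version you got:
# Confirm the Helm client is installed
helm version --short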
Add the official Helm stable charts repository:
helm repo add stable https://kubernetes-charts.storage.googleapis.com/
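After adding the repo, refresh your local chart index and, optionally, confirm the Airflow chart is available (helm search repo is the Helm 3 form of this command):
# Refresh the local chart cache
helm repo update

# List available versions of the chart (we will pin 7.2.0 below)
helm search repo stable/airflow --versions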
You will also need to clone my repo before we proceed further:
git clone git@github.com:mrafayaleem/etl-series.git
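The rest of the commands assume you are running them from the root of the cloned repo:
cd etl-series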
Now, change the path on line 12 in chapter1/airflow-helm-config.yaml to the absolute path for your local machine.
In the following snippet, I am creating a volume from my local directory and then, through extraVolumeMounts, mounting it at the mountPath inside each of the Airflow pods (scheduler, webserver, and worker).
###################################
# Airflow - Common Configs
###################################
airflow:
  extraVolumeMounts: # mounts the volume at the given path in each container
    - name: dags
      mountPath: /opt/airflow/dags # location in the container where the directory below will appear
  extraVolumes: # creates the volume from the host directory
    - name: dags
      hostPath:
        path: "/Users/aleemr/powerhouse/etl-series/dags" # for you, this is something like /<absolute-path>/etl-series/dags
With this, you will be able to write dags on your local machine and let the Airflow instance running inside K8s pick them up from there.
Install Airflow with the following command:
helm install airflow stable/airflow -f chapter1/airflow-helm-config.yaml --version 7.2.0
Check the status of that installation:
helm list
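It can take a few minutes for all of the pods (scheduler, webserver, workers, postgresql, redis) to come up. If you prefer the terminal over the dashboard, you can watch them with:
# Watch the Airflow pods until everything is Running
kubectl get pods --namespace default -w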
Once deployed, you can open up http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/node?namespace=default to check on the cluster in the dashboard. To reach the Airflow webserver, run the following and open the URL it prints:
export POD_NAME=$(kubectl get pods --namespace default -l "component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl port-forward --namespace default $POD_NAME 8080:8080
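Before opening the UI, you can also double-check that the hostPath volume actually made it into the pod; this is just a sanity check, reusing the POD_NAME set above:
# List the dags directory inside the webserver pod; you should see the sample dag from the repo
kubectl exec --namespace default $POD_NAME -- ls /opt/airflow/dags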
You should see the sample dag that I have added in the repo. Enable the dag using the toggle, trigger it, and see it in action.
Congratulations! You have deployed your first dag on a K8s cluster using Airflow.
In the next post, I will cover KubernetesExecutor and KubernetesPodOperator in detail.
Where to go next from here
- Read up on the differences between ReplicaSets and StatefulSets in K8s.
- Figure out why airflow-postgresql, airflow-redis-master and airflow-worker are deployed as StatefulSets while everything else runs under ReplicaSets (the snippet below will show you which is which).
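As a starting point for that second question, you can list both resource types and see how the chart split the workloads up:
# See which workloads the chart created as StatefulSets vs. ReplicaSets
kubectl get statefulsets --namespace default
kubectl get replicasets --namespace default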