Setting up Airflow on Kubernetes with AWS EFS
Working with Kubernetes can be a very complex task; setting up Airflow on Kubernetes and getting Airflow to mount an AWS Elastic File System (EFS) volume instead of an AWS Elastic Block Store (EBS) volume can be even more complex.
In this article, we will walk through a step-by-step process for setting up Airflow on Kubernetes and mounting an AWS EFS volume to it. Before we get started, let's talk about the prerequisites.
Prerequisites:
- Have a running Kubernetes cluster (AWS EKS) with at least one worker node. — tip: you can easily get this up and running with eksctl.
- Have kubectl, helm, and git installed.
- (Optional) Import the created cluster to Rancher 2.x to have a beautiful visual representation of your Kubernetes resources.
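To make the eksctl tip concrete, here is a minimal cluster config sketch. The cluster name, region, instance type, and node count below are illustrative assumptions, not values from this article — adjust them for your environment.

```yaml
# cluster.yaml -- hypothetical eksctl cluster definition.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: airflow-cluster   # placeholder name
  region: us-east-1       # placeholder region
nodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 2    # at least one worker node is required
```

You would apply it with `eksctl create cluster -f cluster.yaml`.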
Download the Helm chart for Airflow and change some configuration
At this point, I assume that Kubernetes is already running and that kubectl is authenticated against the right cluster. The first step is to get the Helm chart for Airflow from GitHub. We will use the KubernetesExecutor in the Airflow config so as to have seamless integration with Kubernetes. You can explore the airflow-kube-helm repository for more information.
$ git clone https://github.com/BrechtDeVlieger/airflow-kube-helm.git
$ cd airflow-kube-helm
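For reference, the executor choice boils down to one setting in airflow.cfg. The chart sets this for you through its values file; the snippet below is shown only to illustrate what using the KubernetesExecutor means at the config level.

```ini
# airflow.cfg -- the setting that makes Airflow launch one
# Kubernetes pod per task instance instead of using Celery
# workers or local subprocesses.
[core]
executor = KubernetesExecutor
```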
Initialize Helm in Kubernetes
$ kubectl apply -f airflow/tiller.yaml
$ helm init --service-account tiller
Build the Docker image
We need to build a customized version of the puckel/docker-airflow image.
$ ./examples/kube/docker/build-docker.sh <YOUR/IMAGE/URL> <TAG>
Now, push the newly built docker image to your repository.
$ docker push <YOUR/IMAGE/URL>:<TAG>
Replace the image URL and tag in examples/kube/dags-volume-readwritemany/values.yaml
Create the EFS volume
Creating an AWS EFS volume takes only three simple steps on the AWS console; you can check this out in the AWS documentation.
Tip: Make sure the EFS volume is in the same VPC as the nodes in the cluster and make sure the security group of the nodes allows access to the EFS resource.
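EFS is mounted over NFS, so "allows access" here means the node security group must be able to reach the EFS mount target on TCP port 2049. As a sketch (both security group IDs below are placeholders), such a rule could be added with the AWS CLI:

```shell
# Allow the worker nodes' security group to reach the EFS mount
# target's security group on the NFS port (2049).
# <EFS-MOUNT-TARGET-SG> and <NODE-SG> are placeholders.
aws ec2 authorize-security-group-ingress \
  --group-id <EFS-MOUNT-TARGET-SG> \
  --protocol tcp \
  --port 2049 \
  --source-group <NODE-SG>
```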
Deploy the EFS provisioner
Kubernetes includes some PersistentVolume (PV) types by default, including GCEPersistentDisk, AWSElasticBlockStore, AzureFile, AzureDisk, etc. Each of these PV types defines a specific provisioner that is used to create its PVs.
Since Kubernetes does not support all PV types by default, the kubernetes-incubator external-storage repository holds many more provisioners for Kubernetes. That is where the EFS provisioner we need is located.
The EFS provisioner is a kubernetes deployment that runs a pod with access to the AWS EFS resource. It acts as an EFS broker, allowing other pods to mount the EFS resource as a PV.
The Kubernetes manifest file below defines a ConfigMap, Deployment, StorageClass and PersistentVolumeClaim for the EFS provisioner.
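The sketch below is an abridged version of the kubernetes-incubator efs-provisioner example; the file system ID, region, and the example.com/aws-efs provisioner name are placeholders you must replace with your own values, and your manifest.yaml may differ in detail.

```yaml
# manifest.yaml (sketch) -- ConfigMap, Deployment, StorageClass
# and PersistentVolumeClaim for the EFS provisioner.
apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-provisioner
data:
  file.system.id: fs-12345678       # placeholder: your EFS ID
  aws.region: us-east-1             # placeholder: your region
  provisioner.name: example.com/aws-efs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efs-provisioner
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:latest
          env:
            - name: FILE_SYSTEM_ID
              valueFrom:
                configMapKeyRef: {name: efs-provisioner, key: file.system.id}
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef: {name: efs-provisioner, key: aws.region}
            - name: PROVISIONER_NAME
              valueFrom:
                configMapKeyRef: {name: efs-provisioner, key: provisioner.name}
          volumeMounts:
            - name: pv-volume
              mountPath: /persistentvolumes
      volumes:
        - name: pv-volume
          nfs:
            # placeholder: your EFS DNS name
            server: fs-12345678.efs.us-east-1.amazonaws.com
            path: /
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: example.com/aws-efs
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs
spec:
  storageClassName: aws-efs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
```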
Before you use the manifest file for the EFS provisioner, you will need to create a namespace.
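namespace.json can be as simple as the following; the namespace name airflow is an assumption that matches the Helm release used later in this article.

```json
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "airflow"
  }
}
```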
$ kubectl apply -f namespace.json
You can now “apply” this file by running:
$ kubectl apply -f manifest.yaml -n <NAMESPACE>
Deploy the NFS Server
We need an NFS server to provision ReadWriteMany volumes.
Run:
$ kubectl apply -f nfs-server.yaml -n <NAMESPACE>
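For orientation, nfs-server.yaml typically looks something like the sketch below, which follows the common volume-nfs example (a Deployment exposing the NFS, mountd, and rpcbind ports plus a Service in front of it); the repo's actual file may differ.

```yaml
# nfs-server.yaml (sketch) -- in-cluster NFS server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8
          ports:
            - {name: nfs, containerPort: 2049}
            - {name: mountd, containerPort: 20048}
            - {name: rpcbind, containerPort: 111}
          securityContext:
            privileged: true   # NFS server needs elevated privileges
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  ports:
    - {name: nfs, port: 2049}
    - {name: mountd, port: 20048}
    - {name: rpcbind, port: 111}
  selector:
    role: nfs-server
```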
Deploy EFS Volumes
Deploy the volumes to be mounted by the Airflow components. We will deploy one volume for DAGs and another for logs.
$ kubectl apply -f volumes.yaml -n <NAMESPACE>
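As a sketch, volumes.yaml amounts to two ReadWriteMany claims against the EFS-backed storage class — one for DAGs and one for logs. The claim names, storage class name, and sizes below are illustrative assumptions.

```yaml
# volumes.yaml (sketch) -- ReadWriteMany claims for DAGs and logs.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-dags
spec:
  storageClassName: aws-efs   # assumed EFS-backed storage class
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-logs
spec:
  storageClassName: aws-efs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```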
Install Airflow using helm
That was a lot of groundwork; let's get back to Airflow. Deploying Airflow is a simple one-liner:
$ cd airflow-kube-helm
$ helm upgrade --install airflow airflow/ --namespace airflow --values examples/kube/dags-volume-readwritemany/values.yaml
The output should be along the lines of:
Release "airflow" does not exist. Installing it now.
NAME: airflow
LAST DEPLOYED: Thu Mar 14 16:21:02 2019
NAMESPACE: airflow
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
airflow-cluster-access 1s

==> v1/ConfigMap
NAME DATA AGE
airflow-config 1 1s
airflow-init 1 1s
airflow-init-dags 1 1s
airflow-nginx 1 1s
airflow-postgresql 0 1s
[---]
You can then run kubectl get all -n airflow to check the status of your pods (and other resources).
[UPDATE] Solving permission issues in the scheduler
Sometimes after the setup is done, you might experience a permission issue in the “scheduler” when running tasks from the Airflow dashboard. A fix for this is to add roles to the airflow namespace.
Create a file named role.yaml with the following configuration.
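The original article does not reproduce the configuration, so the sketch below is a reasonable assumption: a Role and RoleBinding granting the default service account in the airflow namespace the pod permissions the KubernetesExecutor needs. The role name, subject, and verb lists are illustrative, not taken from the article.

```yaml
# role.yaml (sketch) -- assumed RBAC for the KubernetesExecutor.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-pod-manager
  namespace: airflow
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-pod-manager
  namespace: airflow
subjects:
  - kind: ServiceAccount
    name: default        # assumed: the account the scheduler runs as
    namespace: airflow
roleRef:
  kind: Role
  name: airflow-pod-manager
  apiGroup: rbac.authorization.k8s.io
```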
and then run:
$ kubectl apply -f role.yaml
Whew! And we are done…