Setting up Airflow on Kubernetes with AWS EFS

Emehinola Idowu
Published in Terragon Tech Blog
4 min read · Mar 15, 2019


An AWS EFS volume mounted on Airflow running with the KubernetesExecutor on Kubernetes.

Working with Kubernetes can be a very complex task. Setting up Airflow on Kubernetes and getting Airflow to mount an AWS Elastic File System (EFS) volume instead of an AWS Elastic Block Store (EBS) volume can be even more complex.

In this article, we are going to walk through a step-by-step process for setting up Airflow on Kubernetes and mounting an AWS EFS volume to it. But before we get started, let's talk about the prerequisites.

Prerequisites:

  • Have a running Kubernetes cluster (AWS EKS) with at least one worker node. — tip: you can easily get this up and running with eksctl.
  • Get kubectl, helm and git installed.
  • (Optional) Import the created cluster to Rancher 2.x to have a beautiful visual representation of your Kubernetes resources.

Download the Helm chart for Airflow and change some configuration

At this point, I assume that Kubernetes is already running and that kubectl is authenticated against the right cluster. The first step is to get the Helm chart for Airflow from GitHub. We will be using the KubernetesExecutor in the Airflow config so as to have seamless integration with Kubernetes. You can explore airflow-kube-helm for more information.
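Concretely, the executor choice comes down to one setting that the chart ends up writing into airflow.cfg:

```ini
[core]
# Launch each task in its own short-lived pod on the cluster,
# instead of on long-running Celery or local workers.
executor = KubernetesExecutor
```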

$ git clone https://github.com/BrechtDeVlieger/airflow-kube-helm.git
$ cd airflow-kube-helm

Initialize Helm in Kubernetes

$ kubectl apply -f airflow/tiller.yaml
$ helm init --service-account tiller

Build the docker image

We need to build a customized version of the puckel/docker-airflow image.

$ ./examples/kube/docker/build-docker.sh <YOUR/IMAGE/URL> <TAG>

Now, push the newly built docker image to your repository.

$ docker push <YOUR/IMAGE/URL>:<TAG>

Replace the image URL and tag in examples/kube/dags-volume-readwritemany/values.yaml
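The relevant section of that values.yaml looks roughly like the excerpt below; the exact key names can differ between chart versions, so verify them against the file you cloned:

```yaml
airflow:
  image:
    repository: <YOUR/IMAGE/URL>  # the image you pushed above
    tag: <TAG>
```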

Create the EFS volume

Creating an AWS EFS volume takes only three simple steps on the AWS console. You can check this out in the AWS Documentation.

Tip: Make sure the EFS volume is in the same VPC as the nodes in the cluster, and make sure the security group of the nodes is allowed to reach the EFS resource over NFS (TCP port 2049).

Deploy the EFS provisioner

Kubernetes includes some Persistent Volume (PV) types by default, including GCEPersistentDisk, AWSElasticBlockStore, AzureFile, AzureDisk, etc. Each of these PV types defines a specific provisioner that is used to create its PVs.

Since Kubernetes does not support all PV types by default, the kubernetes-incubator external-storage repository holds many more provisioners for Kubernetes. That is where the EFS provisioner we need is located.

The EFS provisioner is a kubernetes deployment that runs a pod with access to the AWS EFS resource. It acts as an EFS broker, allowing other pods to mount the EFS resource as a PV.

The Kubernetes manifest file below defines a ConfigMap, Deployment, StorageClass and PersistentVolumeClaim for the EFS provisioner.
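A condensed sketch of such a manifest, based on the kubernetes-incubator external-storage EFS provisioner example, is shown here. The image and the provisioner name are the upstream defaults; substitute your own file system ID and region:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-provisioner
data:
  file.system.id: <FILE_SYSTEM_ID>   # the fs-... ID from the EFS console
  aws.region: <REGION>
  provisioner.name: example.com/aws-efs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: efs-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: efs-provisioner
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:latest
          env:
            - name: FILE_SYSTEM_ID
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: file.system.id
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: aws.region
            - name: PROVISIONER_NAME
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: provisioner.name
          volumeMounts:
            - name: pv-volume
              mountPath: /persistentvolumes
      volumes:
        - name: pv-volume
          nfs:
            server: <FILE_SYSTEM_ID>.efs.<REGION>.amazonaws.com
            path: /
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: aws-efs
provisioner: example.com/aws-efs
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs
  annotations:
    volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
```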

Before you use the manifest file for the EFS provisioner, you will need to create a namespace.
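The contents of namespace.json are not reproduced here; a minimal version could look like the following (the name airflow is an assumption, so use whatever namespace you plan to deploy into):

```json
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": {
    "name": "airflow"
  }
}
```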

$ kubectl apply -f namespace.json

You can now apply this file by running:

$ kubectl apply -f manifest.yaml -n <NAMESPACE>

Deploy the NFS Server

We will need an NFS server to provision ReadWriteMany volumes.
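The nfs-server.yaml referenced here ships with the chart's examples. As a rough sketch of what such a manifest typically contains (the image and port numbers are common defaults for an in-cluster NFS server, not verified against the repository):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: k8s.gcr.io/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true   # the in-cluster NFS server needs this
          volumeMounts:
            - name: export
              mountPath: /exports
      volumes:
        - name: export
          persistentVolumeClaim:
            claimName: efs   # back the NFS export with the EFS claim
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
spec:
  selector:
    role: nfs-server
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
```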

Run:

$ kubectl apply -f nfs-server.yaml -n <NAMESPACE>

Deploy EFS Volumes

Deploy the volumes to be mounted by the Airflow components. We will deploy one volume for dags and another for logs.
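A sketch of what volumes.yaml typically contains for this setup: a ReadWriteMany PersistentVolume backed by the in-cluster NFS service, plus a matching claim, once for dags and once for logs. The service DNS name, paths and sizes below are illustrative, not taken from the repository:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.<NAMESPACE>.svc.cluster.local
    path: "/dags"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Gi
# Repeat the PV/PVC pair with the name airflow-logs and path "/logs".
```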

$ kubectl apply -f volumes.yaml -n <NAMESPACE>

Install Airflow using helm

That’s a lot of groundwork; let’s get back to Airflow. Deploying Airflow is a simple one-liner:

$ cd airflow-kube-helm
$ helm upgrade --install airflow airflow/ --namespace airflow --values examples/kube/dags-volume-readwritemany/values.yaml

The output should be along the lines of:

Release "airflow" does not exist. Installing it now.
NAME: airflow
LAST DEPLOYED: Thu Mar 14 16:21:02 2019
NAMESPACE: airflow
STATUS: DEPLOYED
RESOURCES:
==> v1/ClusterRole
NAME AGE
airflow-cluster-access 1s
==> v1/ConfigMap
NAME DATA AGE
airflow-config 1 1s
airflow-init 1 1s
airflow-init-dags 1 1s
airflow-nginx 1 1s
airflow-postgresql 0 1s
[---]

You can then run kubectl get all -n airflow to check the status of your pods (and other resources).

[UPDATE] Solving permission issues in scheduler

Sometimes after the setup is done, you might experience a permission issue in the scheduler when running tasks from the Airflow dashboard. A fix for this is to add roles to the airflow namespace.

Create a file named role.yaml with the following configuration,
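The role configuration itself is not reproduced in the post. Below is a sketch of the kind of Role and RoleBinding that resolves scheduler permission errors with the KubernetesExecutor, which needs to create and watch task pods. The service account name airflow is an assumption; match it to the one your deployment actually uses:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-scheduler
  namespace: airflow
rules:
  # Let the scheduler manage the per-task pods it launches
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-scheduler
  namespace: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflow-scheduler
subjects:
  - kind: ServiceAccount
    name: airflow        # assumption: the scheduler's service account
    namespace: airflow
```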

and then run:

$ kubectl apply -f role.yaml

Whew! And we are done…

Before You Go —

Did you know that you can give up to 50 👏’s by pressing down on the 👏 button? Give it a try if you liked this article!


Emehinola Idowu
Terragon Tech Blog

SRE/DevOps Engineer — I write about things I have learnt so I can come back and read them when I am stuck.