In the previous tutorial, we didn’t delve into the concept of Executors in Airflow. In Airflow, Executors define a mechanism by which task instances get run.

Repo link: https://github.com/mrafayaleem/etl-series

When you spun up an Airflow cluster using Helm in the previous tutorial, you might have noticed a scheduler and a worker in your docker ps output. These two components are part of how the CeleryExecutor works in Airflow. We are not going to talk in detail about this type of Executor in this post. For more details, see this link.

The other type that I want to touch on briefly is the LocalExecutor. This type of Executor runs a task in a local process inside the scheduler instance. …
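To make the idea of "running a task in a local process" concrete, here is a minimal sketch in plain Python. It is not Airflow's LocalExecutor: the class name ToyLocalExecutor, its methods, and the task keys are all hypothetical and exist only for illustration. It shows the general pattern of queuing commands and running each one as a subprocess on the same machine.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


class ToyLocalExecutor:
    """Hypothetical stand-in for an executor (not Airflow's class).

    Each queued command is run as a subprocess on the local machine.
    """

    def __init__(self, parallelism=2):
        self.parallelism = parallelism  # max commands running at once
        self.queued = {}                # task key -> command (list of str)

    def queue_command(self, key, command):
        self.queued[key] = command

    def _run(self, command):
        # An exit code of 0 is treated as success, anything else as failure.
        completed = subprocess.run(command, capture_output=True, text=True)
        return "success" if completed.returncode == 0 else "failed"

    def run_all(self):
        keys = list(self.queued)
        with ThreadPoolExecutor(max_workers=self.parallelism) as pool:
            states = list(pool.map(self._run, (self.queued[k] for k in keys)))
        self.queued.clear()
        return dict(zip(keys, states))


executor = ToyLocalExecutor(parallelism=2)
executor.queue_command("extract", [sys.executable, "-c", "print('extracting')"])
executor.queue_command("broken", [sys.executable, "-c", "raise SystemExit(1)"])
states = executor.run_all()
print(states)
```

The point of the sketch is only that the scheduling side and the running side live on the same host. A distributed executor would instead hand each command to a separate worker.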

In this post, I will cover the steps to set up a production-like Airflow scheduler, worker, and webserver on a local Kubernetes cluster. Later on, I will use the same K8s cluster to schedule ETL tasks using Airflow.

If you are on a Mac, it’s just a matter of ticking a checkbox and restarting the Docker app. This will enable a single-node K8s cluster on your local machine.

Once you have K8s running, install the K8s dashboard by running the following:

# Install dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml
# Find list of secrets
kubectl get secrets
# Copy the token value from the output of this command
kubectl describe secret…

This series is meant to cover a broad range of topics involved in setting up a production-grade ETL pipeline. I have broken it down into chapters, and more of them will be added as the series moves along.

I am a Mac user, so everything here is written in the context of software and tools installed on OS X. You should be able to find and install equivalent versions of those for your own setup.

Feel free to leave any feedback on my Twitter handle or through my website.

Chapter 1 - Orchestration basics: Setting up Airflow in a local Kubernetes cluster using helm

Chapter 2 - Introduction to KubernetesExecutor and KubernetesPodOperator


Rafay Aleem

Data Engineer at PointClickCare. Based in Toronto. Music aficionado who likes playing guitar and is an Eric Clapton fan.

