In the previous tutorial, we didn’t delve into the concept of Executors in Airflow. In Airflow, Executors define a mechanism by which task instances get run.

Repo link: https://github.com/mrafayaleem/etl-series

When you spun up an Airflow cluster using helm in the previous tutorial, you might have noticed a scheduler and a…


In this post, I will cover the steps to set up a production-like Airflow scheduler, worker and webserver on a local Kubernetes cluster. Later on, I will use the same K8s cluster to schedule ETL tasks using Airflow.

Quickstart a Kubernetes single-node cluster

If you are on Mac, it’s just a matter of ticking a checkbox…


This series is meant to cover a broad range of topics involved in setting up a production-grade ETL pipeline. I have broken it down into chapters, and more of them will be added as the series moves along.

I am a Mac user, so everything written here is in the context of software and tools installed on OS X. You should be able to find and install equivalent versions of those for your own setup.

Feel free to leave any feedback on my Twitter handle or through my website.

Chapter 1 - Orchestration basics: Setting up Airflow in a local Kubernetes cluster using helm

Chapter 2 - Introduction to KubernetesExecutor and KubernetesPodOperator


In this post, we aim for a deeper understanding of Random Forest by working with OHLC (open-high-low-close) data for Tesla stock over a 5-year period. We will keep unrelated discussion to a minimum and try to focus on everything that is related to our understanding of Random Forest for…


Imagine you have to rewrite an existing web service to move to a new payment gateway (PSP or payment service provider) due to various business use cases. Your first thought might be to replace the old one with the new one in its entirety and roll that out. That is a naive…


At dubizzle OLX, we recently reached a roadblock on a project where we had to change the schema on around 130 tables in our production database with master/slave replication. The simplest possible solution would have been to recreate the same tables with new schemas, copy the data and update our…


31 Mar 2016

v1.0

Note: This post is open to suggestions that can help achieve fairer results with these benchmarks. It is a versioned post, and I will increment the version if anything related to these benchmarks changes. Changes can be tracked here as well.

In my previous post, I…


19 Mar 2016

Note: A complementing repository for this post can be found at: https://github.com/mrafayaleem/kafka-jython

With the release of Kafka 0.9.0, the consumer API was redesigned to remove the dependency between the consumer and Zookeeper. Prior to the 0.9.0 release, the Kafka consumer depended on Zookeeper for storing its offsets and…

Rafay Aleem

Data Engineer at PointClickCare. Based in Toronto. Music aficionado who likes playing guitar and is an Eric Clapton fan.
