Airflow — Dynamic DAG Layout

Saumalya Sarkar
2 min read · May 30, 2020

Hello World!!!

Having used several mainstream ETL and workflow management tools extensively before moving to Airflow as a workflow manager, I have found Airflow to be by far the most fun and the easiest to develop with and manage. The main reason, I think, is that Airflow brings the configuration-as-code methodology to the table. I will assume that the reader has basic knowledge (the whats and whys) of Airflow; if not, give the official documentation a read.

In this article, I will go through the process of creating a DAG (hardcore ETL folks, read: workflow) layout dynamically. Let’s start by creating a DAG layout from a user-defined configuration file that is read whenever the DAG file is parsed.

Let’s have a look at the configuration file structure first (feel free to use any config structure, just remember to modify the config reading and handling parts accordingly):

File: https://bitbucket.org/saumalya75/airflowoncontainer/src/master/airflow_home/dags/dag_configuration_demo.json
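The linked file is not reproduced here, so the following is only an illustrative sketch of what such a configuration might look like. Every key name (`dag_id`, `default_args`, `tasks`, `upstream`, `retry_delay_seconds`) is an assumption made for this example and is not taken from the actual `dag_configuration_demo.json`.

```python
import json

# Hypothetical configuration: the key names below are illustrative only
# and are NOT copied from the linked dag_configuration_demo.json.
CONFIG_TEXT = """
{
  "dag_id": "configurable_data_pipeline_demo",
  "default_args": {
    "owner": "airflow",
    "start_date": "2020-05-01",
    "retries": 1,
    "retry_delay_seconds": 300
  },
  "tasks": [
    {"task_id": "extract", "upstream": []},
    {"task_id": "transform", "upstream": ["extract"]},
    {"task_id": "load", "upstream": ["transform"]}
  ]
}
"""

# json.loads turns the JSON text into plain Python dicts and lists.
config = json.loads(CONFIG_TEXT)

# Each task entry names itself and the tasks it depends on.
task_ids = [task["task_id"] for task in config["tasks"]]
print(task_ids)
```

Note that the date is stored as a plain string, because JSON has no date type. That is why the date-time values need explicit handling later on.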

Now we will have a look at how the configuration file is used. One thing we need to handle explicitly is the date-time objects, because JSON has no native date type, so I wrote the following helper method to match the configuration:

File: https://bitbucket.org/saumalya75/airflowoncontainer/src/master/airflow_home/dags/configurable_data_pipeline_demo.py
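As a rough idea of what such a helper can look like (this is a minimal sketch, not the helper from the linked file), the function below converts a date string from the configuration into a `datetime`. The name `to_datetime` and the default format `%Y-%m-%d` are assumptions made for illustration.

```python
from datetime import datetime


def to_datetime(value, fmt="%Y-%m-%d"):
    """Convert a config string such as "2020-05-01" into a datetime.

    Hypothetical helper: the name and the default format are assumptions.
    Values that are already datetime objects are returned unchanged.
    """
    if isinstance(value, datetime):
        return value
    # strptime raises ValueError if the string does not match the format,
    # which surfaces a bad config value early, at DAG parse time.
    return datetime.strptime(value, fmt)


start = to_datetime("2020-05-01")
print(start.isoformat())
```

If the configuration uses a different date layout, only the `fmt` argument needs to change.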

And then, read the configuration file:

File: https://bitbucket.org/saumalya75/airflowoncontainer/src/master/airflow_home/dags/configurable_data_pipeline_demo.py
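Reading the file itself needs nothing beyond the standard library. The sketch below assumes the JSON file sits in the same folder as the DAG file; the function names `load_config` and `config_path_beside` are hypothetical and chosen for this example.

```python
import json
import os


def config_path_beside(module_file, file_name):
    """Build an absolute path to a config file stored next to a DAG module."""
    return os.path.join(os.path.dirname(os.path.abspath(module_file)), file_name)


def load_config(path):
    """Read a JSON configuration file and return it as a Python dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

Inside a DAG file this could be called as `load_config(config_path_beside(__file__, "dag_configuration_demo.json"))`. Resolving the path from `__file__` avoids depending on the working directory of the process that parses the DAG.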

Once the configuration data is loaded from JSON into a native Python dictionary, I had to handle some date-time configuration values, as follows:

File: https://bitbucket.org/saumalya75/airflowoncontainer/src/master/airflow_home/dags/configurable_data_pipeline_demo.py
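To make this step concrete, here is a minimal sketch of normalising the date-time entries of a `default_args` dictionary. `start_date`, `end_date` and `retry_delay` are standard Airflow task arguments; the input key `retry_delay_seconds`, the date format and the function name are assumptions made for this example and may differ from the linked code.

```python
from datetime import datetime, timedelta


def normalize_default_args(raw_args, date_format="%Y-%m-%d"):
    """Return a copy of default_args with date-time values as real objects."""
    args = dict(raw_args)  # copy, so the loaded config is left untouched

    # Date strings become datetime objects.
    for key in ("start_date", "end_date"):
        if isinstance(args.get(key), str):
            args[key] = datetime.strptime(args[key], date_format)

    # Hypothetical convention: the delay is stored as a number of seconds
    # and converted to the timedelta that Airflow expects for retry_delay.
    if "retry_delay_seconds" in args:
        args["retry_delay"] = timedelta(seconds=args.pop("retry_delay_seconds"))

    return args


raw = {"owner": "airflow", "start_date": "2020-05-01", "retry_delay_seconds": 300}
default_args = normalize_default_args(raw)
print(default_args["start_date"], default_args["retry_delay"])
```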

And now the fun part: I used the DAG context manager and looped over the configuration to create the DAG layout dynamically. The following is the layout generation part. Please keep in mind that while the Airflow server is up, the scheduler re-parses the DAG file and regenerates the layout whenever it refreshes the DagBag, so a change in the configuration file is picked up on the next refresh.

File: https://bitbucket.org/saumalya75/airflowoncontainer/src/master/airflow_home/dags/configurable_data_pipeline_demo.py
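The looping idea can be sketched in two passes: first create one task per configuration entry, then wire the dependencies. The code below is an illustration under stated assumptions, not the code from the linked file. The config keys `task_id` and `upstream`, the function `build_layout` and the `StubTask` class are all hypothetical; `StubTask` only stands in for an Airflow operator so that the sketch runs without Airflow installed. The one real Airflow call it relies on is `set_downstream`, which Airflow operators provide.

```python
class StubTask:
    """Stand-in for an Airflow operator so the sketch runs without Airflow."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream_ids = []

    def set_downstream(self, other):
        # Airflow operators expose set_downstream; this stub just records it.
        self.downstream_ids.append(other.task_id)


def build_layout(task_specs, make_task):
    """Create one task per config entry, then wire the dependencies.

    task_specs: list of dicts with hypothetical keys "task_id" and "upstream".
    make_task:  callable that turns one spec into a task object.
    """
    # First pass: create every task, so the order of entries does not matter.
    tasks = {spec["task_id"]: make_task(spec) for spec in task_specs}

    # Second pass: connect each task to the tasks it depends on.
    for spec in task_specs:
        for upstream_id in spec.get("upstream", []):
            tasks[upstream_id].set_downstream(tasks[spec["task_id"]])
    return tasks


specs = [
    {"task_id": "load", "upstream": ["transform"]},
    {"task_id": "extract", "upstream": []},
    {"task_id": "transform", "upstream": ["extract"]},
]
layout = build_layout(specs, lambda spec: StubTask(spec["task_id"]))
for task_id, task in layout.items():
    print(task_id, "->", task.downstream_ids)
```

In a real DAG file, the `build_layout` call would sit inside the `with DAG(...) as dag:` block and `make_task` would return an actual operator or sensor, so that every task created in the loop is attached to that DAG.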

And that’s how you do it. In Pythonic terms, it’s straightforward. There are several other ways to shape a workflow, such as Airflow’s branching operators (for example, BranchPythonOperator), but I find this approach the most flexible and the easiest to handle.

Credit where it’s due:

Although I published the article, it was a joint venture with Rohith Karnati. He is a Python enthusiast, DevOps ninja and Airflow developer. Kudos to his efforts.

All related code is available in the Bitbucket repository at https://bitbucket.org/saumalya75/airflowoncontainer. Clone it, fork it, use it.

Quick note: I have used a custom operator and a custom sensor here; please follow my next blog post to see how those can be incorporated.

P.S.: Any suggestions will be highly appreciated. And although I have been writing Python for quite a while now, this is my first (and most certainly not my last) attempt at Medium. I hope someone finds it useful. I am available at saumalya75@gmail.com and linkedin.com/in/saumalya-sarkar-b3712817b .

Saumalya Sarkar

Pythonista. Messi Maniac. Data Engineer. ML Practitioner. AI Enthusiast.