TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Break Up a Big Airflow DAG into Multiple Files

4 min read · Aug 11, 2021


Man on dock at sunset slices pineapple with knife
Photo by Daniel Lincoln on Unsplash

I was working on an Airflow DAG file that was growing into the hundreds of lines. Making changes required bouncing back and forth around the file and taking notes on a scratchpad to get everything right. Once I got to the point of opening multiple views of the DAG file in an IDE, I knew it was time to stop and find a way to break up the DAG into smaller pieces.

With the advent of TaskGroups in Airflow 2, it’s both conceptually and practically easier to break a big DAG into pieces. The pieces can be reused and, of course, they’re easier to update and maintain.

TaskGroups are just UI groupings for related tasks, but the groupings tend to be logical ones. The tasks in a TaskGroup can be bundled up and abstracted away, which makes it easier to build a DAG from units that are larger than individual tasks. That said, TaskGroups aren’t the only way to group tasks and move them out of your DAG file. You could also have a logical chunk of tasks that doesn’t sit within a TaskGroup. The downside of this latter approach is that you lose the benefit of collapsing the tasks into a single node in the web UI graph view of a DAG run.


Published in TDS Archive
