Beneath the Surface: A Closer Look at 4 Airflow Internals
Four Apache Airflow internals you might have missed
I have been working with Airflow for more than three years now, and overall I am quite comfortable with it. It’s a powerful orchestrator that helps me build data pipelines quickly and in a scalable fashion, and it comes with batteries included for most of the things I want to implement.
Recently, while preparing for an Airflow certification, I came across many things I had literally no clue about. That was my motivation for writing this article and sharing with you a few Airflow internals that have totally blown my mind!
1. Scheduler only parses files containing certain keywords
The Airflow Scheduler will only parse files that contain both of the strings "airflow" and "dag" in the code! Yes, you’ve heard this right! If a file under the DAG folder is missing either of these two keywords, it will simply not be parsed by the scheduler.
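To make the rule more concrete, here is a minimal sketch of what such a keyword check could look like. It is a simplified illustration rather than Airflow's actual code, and it assumes the check requires both strings and ignores case, which is how the Airflow configuration reference describes this safe mode. The function name `looks_like_dag_file` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Both strings must appear somewhere in the file (assumption: matched case-insensitively).
KEYWORDS = (b"dag", b"airflow")


def looks_like_dag_file(path):
    """Hypothetical helper: cheap text check run before a file is actually parsed."""
    content = Path(path).read_bytes().lower()
    return all(keyword in content for keyword in KEYWORDS)


def demo():
    """Write two sample files to a temp directory and run the check on each."""
    with tempfile.TemporaryDirectory() as tmp:
        pipeline = Path(tmp) / "pipeline.py"
        pipeline.write_text("from airflow import DAG\n")

        helper = Path(tmp) / "helpers.py"
        helper.write_text("def add(a, b):\n    return a + b\n")

        return looks_like_dag_file(pipeline), looks_like_dag_file(helper)
```

Note that this is a plain substring match on the file's text, so a comment mentioning both words would be enough to pass it, while a helper module that mentions neither is skipped without ever being imported.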
If you want to lift this requirement, you can simply set the DAG_DISCOVERY_SAFE_MODE configuration setting to False. In that case, the scheduler will parse all Python files under your DAG folder (/dags).
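As a rough sketch of where this setting lives: the option is `dag_discovery_safe_mode` in the `[core]` section of `airflow.cfg`, and Airflow also reads configuration from environment variables named `AIRFLOW__{SECTION}__{KEY}`. The helper `airflow_env_var` below is hypothetical and only builds that variable name. Setting it through `os.environ` is purely for illustration, since in practice the variable has to be present in the environment of the Airflow process that parses your DAG files.

```python
import os


def airflow_env_var(section, key):
    """Hypothetical helper: build the env var name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("core", "dag_discovery_safe_mode")

# Only affects this Python process and its children; a running scheduler
# needs the variable in its own environment before it starts.
os.environ[name] = "False"
```

The equivalent entry in `airflow.cfg` would be `dag_discovery_safe_mode = False` under `[core]`.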