Here Is What I Learned Using Apache Airflow over 6 Years
A journey with Apache Airflow, from experiment to hassle-free production
Apache Airflow has undoubtedly been the most popular open-source project in data engineering for years. It gained popularity at the right time, alongside "The Rise of the Data Engineer," and its core concept of making code the first-class citizen for data pipelines (a.k.a. ETL), instead of drag-and-drop, was a milestone. Apache Airflow became an Apache Incubator project in March 2016 and a top-level Apache project in January 2019. I have worked with Apache Airflow as a user since 2017, and along the way I have also contributed to it. Today, I want to share my journey with Airflow and what I learned over 6 years.
What is Airflow
Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. — Airflow Official Documentation
Apache Airflow was created by Maxime Beauchemin, an ex-Airbnb and ex-Facebook engineer. He developed Airflow at Airbnb (you can tell from the project name). However, the core idea that inspired him came from an internal tool used at Facebook.
The primary users of Apache Airflow are data engineers, or any engineers who need to schedule workflows, mainly ETL (Extraction, Transformation…