Should You Use Apache Airflow?

Why Data Engineers Love/Hate Airflow

Ben Rogojan
SeattleDataGuy By SeattleDataGuy

Photo by T K on Unsplash

Data pipelines are a key component in any company’s data infrastructure.

One framework that many companies use to manage their data extracts and transforms is Airflow. Some run everything through Airflow and its various operators, while others use Airflow to orchestrate other components such as Airbyte and dbt.

For those unaware, Airflow was developed back in 2014 at Airbnb to help manage the company’s ever-growing need for complex data pipelines. It rapidly gained popularity outside of Airbnb because it was open-source and met a lot of the needs data engineers had.

Now, nearly a decade later, many of us have started to see the cracks in its armor. We have seen its Airflow-isms (Sarah Krasnik).

In turn, this has led to many new frameworks in the Python data pipeline space such as Prefect and Dagster.

Many of us still rely on Airflow. But Airflow has its fair share of quirks and limitations, many of which don’t become obvious until a team attempts to productionize and manage Airflow in a far more demanding data culture.

So in this update, I wanted to talk about why data engineers love/hate Airflow. Of course, I didn’t just want to state my opinion. Instead, I have interviewed several…

#Data #Engineer, Strategy Development Consultant and All Around Data Guy #deeplearning #dataengineering #datascience #tech https://linktr.ee/SeattleDataGuy