A Beginner’s Guide to Data Engineering — The Series Finale
From ETL Pipelines To Data Engineering Frameworks
At Last, The Finale
In Part I of this series, we learned that a robust data foundation is an important prerequisite for pursuing analytics, be it business intelligence, experimentation, or machine learning. In Part II, we delved into the specifics of Airflow and discussed techniques such as data modeling, star schemas, and normalization. Furthermore, we walked through examples of data pipelines and covered ETL best practices. These are all important skills for becoming an effective data scientist.
As companies climb the hierarchy of data analytics and scale beyond a small team, complexity and development costs typically increase. At Airbnb, more than 100 contributors have authored Airflow pipelines. This makes enforcing ETL best practices, upholding data quality, and standardizing workflows increasingly challenging. Luckily, one of the antidotes to complexity is the power of abstraction, and data engineering is no exception to this principle.
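To make the idea of abstraction concrete, here is a minimal, hypothetical sketch of capturing a recurring ETL pattern in a reusable, config-driven helper. The names (`make_etl_job`, `fct_daily_revenue`, the config keys) are illustrative, not Airbnb's actual framework; the point is that a pattern many contributors would otherwise hand-write is defined once and parameterized.

```python
# Hypothetical sketch of abstracting a common ETL pattern.
# All names here are illustrative, not a real framework's API.

def make_etl_job(config):
    """Build an ETL job (a plain callable) from a declarative config."""
    def run(records):
        # Extract: keep only the columns the config asks for.
        extracted = [{k: r[k] for k in config["columns"]} for r in records]
        # Transform: apply the configured row-level transformation.
        transformed = [config["transform"](r) for r in extracted]
        # Load: here we simply return the rows; a real job would write
        # them to the warehouse table named by config["target_table"].
        return transformed
    return run

# Two pipelines can now share one pattern, differing only in config.
daily_revenue = make_etl_job({
    "columns": ["date", "amount_cents"],
    "transform": lambda r: {**r, "amount_usd": r["amount_cents"] / 100},
    "target_table": "fct_daily_revenue",  # illustrative table name
})

rows = [{"date": "2018-01-01", "amount_cents": 1250, "extra": "ignored"}]
print(daily_revenue(rows))
# → [{'date': '2018-01-01', 'amount_cents': 1250, 'amount_usd': 12.5}]
```

In a real deployment the same idea usually surfaces as a DAG factory or a YAML-driven framework on top of Airflow, but the design choice is identical: contributors declare *what* their pipeline does, and the framework owns *how* the steps are wired together and validated.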
In data engineering, abstraction often means identifying and automating ETL patterns that are common in people's workflows. In this final post, we will define the concept of a data…