A Beginner’s Guide to Data Engineering — The Series Finale

From ETL Pipelines To Data Engineering Frameworks

Robert Chang
12 min read · Jun 24, 2018
Image credit: Well-designed Data Engineering Frameworks Can Open a Lot of Doors and New Possibilities :)

At Last, The Finale

From Part I of this series, we learned that a robust data foundation is an important prerequisite for pursuing analytics, be it business intelligence, experimentation, or machine learning. From Part II, we delved into the specifics of Airflow and discussed techniques such as data modeling, star schemas, and normalization. Furthermore, we walked through examples of data pipelines and covered ETL best practices. These are all important skills for becoming an effective data scientist.

As companies climb up the hierarchy of data analytics and scale beyond a small team, complexity and development costs typically increase. At Airbnb, we have more than 100 contributors who have authored Airflow pipelines. This makes enforcing ETL best practices, upholding data quality, and standardizing workflows increasingly challenging. Luckily, one of the antidotes to complexity is the power of abstraction. This principle, of course, is no exception when it comes to data engineering.

In data engineering, abstraction often means identifying and automating ETL patterns that are common in people’s workflows. In this final post, we will define the concept of a data…
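To make the idea of “identifying and automating common ETL patterns” a bit more concrete, here is a minimal sketch (not from the article itself) of a config-driven Airflow DAG factory. The table names, SQL paths, and the `run_hive_query` / `check_row_count` commands are hypothetical; the point is that a framework can stamp out standardized wait → transform → check pipelines so individual authors no longer hand-write the same boilerplate.

```python
# Hypothetical sketch: a factory that generates standardized daily ETL DAGs.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator


def build_daily_agg_dag(table_name: str, sql_path: str) -> DAG:
    """Create a daily aggregation DAG that follows one shared ETL pattern."""
    dag = DAG(
        dag_id=f"daily_agg__{table_name}",
        start_date=datetime(2018, 6, 1),
        schedule_interval="@daily",
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
        catchup=False,
    )
    with dag:
        # Every generated pipeline gets the same retries and data quality check,
        # so best practices are enforced by the framework rather than by each author.
        run_transform = BashOperator(
            task_id="run_transform",
            bash_command=f"run_hive_query --sql {sql_path} --ds {{{{ ds }}}}",  # hypothetical CLI
        )
        check_quality = BashOperator(
            task_id="check_quality",
            bash_command=f"check_row_count --table {table_name} --ds {{{{ ds }}}}",  # hypothetical CLI
        )
        run_transform >> check_quality
    return dag


# Usage: each new metrics table only needs one line of configuration.
for table, sql in [("listing_bookings", "sql/listing_bookings.sql"),
                   ("guest_signups", "sql/guest_signups.sql")]:
    globals()[f"daily_agg__{table}"] = build_daily_agg_dag(table, sql)
```

This is the kind of abstraction a data engineering framework provides: pipeline authors supply configuration, and the framework handles scheduling, retries, and quality checks uniformly.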

