A Beginner’s Guide to Data Engineering — Part I

Data Engineering: The Close Cousin of Data Science

Image credit: A beautiful former slaughterhouse / warehouse at Matadero Madrid, designed by Iñaqui Carnicero


Organization of This Beginner’s Guide

My First Industry Job out of Graduate School

Image credit: Me building ETL pipelines diligently (guy in blue in the middle)

The Hierarchy of Analytics

Source: Monica Rogati’s fantastic Medium post “The AI Hierarchy of Needs”

Building Data Foundations & Warehouses

Source: Jeff Hammerbacher’s slide from UC Berkeley CS 194 course

ETL: Extract, Transform, and Load

Source: Vineet Goel’s “Why Robinhood uses Airflow?” Medium post
Source: Arthur Wiedmer’s workshop from DataEngConf SF 2017

Choosing ETL Frameworks

Source: Marton Trencseni’s comparison between Luigi, Airflow, and Pinball

Two Paradigms: SQL- vs. JVM-Centric ETL

Video Source: Josh Wills’ Keynote @ DataEngConf SF 2016

Wrapping Up Beginner’s Guide — Part I

Data @Airbnb, previously @Twitter. Opinions are my own.
