A Beginner’s Guide to Data Engineering — Part I

Data Engineering: The Close Cousin of Data Science

Robert Chang
Jan 8, 2018
Image credit: A beautiful former slaughterhouse / warehouse at Matadero Madrid, architected by Iñaqui Carnicero

Motivation

Organization of This Beginner’s Guide

My First Industry Job out of Graduate School

Image credit: Me building ETL pipelines diligently (guy in blue in the middle)

The Hierarchy of Analytics

Source: Monica Rogati’s fantastic Medium post “The AI Hierarchy of Needs”

Building Data Foundations & Warehouses

Source: Jeff Hammerbacher’s slide from UC Berkeley CS 194 course

ETL: Extract, Transform, and Load

Source: Vineet Goel’s “Why Robinhood uses Airflow?” Medium post
Source: Arthur Wiedmer’s workshop from DataEngConf SF 2017
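To make the three stages concrete, here is a minimal sketch in plain Python using only the standard library. It assumes a hypothetical CSV of user events and uses an in-memory SQLite database as a stand-in for the warehouse. The function names `extract`, `transform`, and `load` are illustrative and are not taken from Airflow or any other framework; in production, each stage would more likely be a separate task run by an orchestrator.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might be a log file or an API export.
RAW_CSV = """user_id,event,ts
1,login,2018-01-08T09:00:00
2,login,2018-01-08T09:05:00
1,search,2018-01-08T09:10:00
3,LOGIN,2018-01-09T10:00:00
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize event names and derive a date column."""
    cleaned = []
    for row in rows:
        cleaned.append(
            (
                int(row["user_id"]),
                row["event"].strip().lower(),  # "LOGIN" -> "login"
                row["ts"][:10],  # keep only the YYYY-MM-DD part
            )
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, event TEXT, ds TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Downstream consumers query the loaded table instead of the raw file.
daily_logins = conn.execute(
    "SELECT ds, COUNT(*) FROM events WHERE event = 'login' GROUP BY ds ORDER BY ds"
).fetchall()
print(daily_logins)  # [('2018-01-08', 2), ('2018-01-09', 1)]
```

Keeping the stages as separate functions loosely mirrors how pipelines are often split into tasks that can be rerun independently when one step fails.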

Choosing ETL Frameworks

Source: Marton Trencseni’s comparison between Luigi, Airflow, and Pinball
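Workflow frameworks such as Airflow and Luigi organize a pipeline as tasks with upstream dependencies, which Airflow represents explicitly as a directed acyclic graph (DAG). The sketch below does not use any of these frameworks. It relies on Python's standard-library `graphlib` (Python 3.9+) to illustrate the core idea a scheduler builds on: a task runs only after all of its upstream tasks finish, and tasks that do not depend on each other can run in the same batch. The task names and the `schedule_in_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of upstream tasks it needs.
pipeline = {
    "extract_events": set(),
    "extract_users": set(),
    "clean_events": {"extract_events"},
    "join_users": {"clean_events", "extract_users"},
    "daily_summary": {"join_users"},
}


def schedule_in_batches(graph):
    """Group tasks into batches; tasks within a batch have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        # A real scheduler would launch these in parallel and mark each one
        # done only after it succeeds; here we pretend they finish instantly.
        sorter.done(*ready)
    return batches


for i, batch in enumerate(schedule_in_batches(pipeline), start=1):
    print(i, batch)
# 1 ['extract_events', 'extract_users']
# 2 ['clean_events']
# 3 ['join_users']
# 4 ['daily_summary']
```

The acyclic requirement is what makes such an ordering possible at all: if two tasks each waited on the other, neither could ever start, which is why `prepare()` rejects cycles up front.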

Two Paradigms: SQL- vs. JVM-Centric ETL

Video Source: Josh Wills’ Keynote @ DataEngConf SF 2016
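Roughly, the two paradigms differ in how a transformation is expressed: declaratively in SQL on engines such as Hive or Presto, or as programs in JVM languages such as Java or Scala. The sketch below computes the same aggregate both ways. As a plain substitution, Python stands in for a JVM language and SQLite stands in for a SQL engine; the bookings data is hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical bookings data: (city, nights).
bookings = [("Madrid", 3), ("Paris", 2), ("Madrid", 1), ("Tokyo", 5), ("Paris", 4)]

# SQL-centric: declare *what* result you want and let the engine plan it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (city TEXT, nights INTEGER)")
conn.executemany("INSERT INTO bookings VALUES (?, ?)", bookings)
sql_result = conn.execute(
    "SELECT city, SUM(nights) FROM bookings GROUP BY city ORDER BY city"
).fetchall()

# Code-centric (Python standing in for Java/Scala here): spell out *how*
# to compute the same aggregate, step by step.
totals = defaultdict(int)
for city, nights in bookings:
    totals[city] += nights
code_result = sorted(totals.items())

print(sql_result)  # [('Madrid', 4), ('Paris', 6), ('Tokyo', 5)]
print(sql_result == code_result)  # True
```

The SQL version is shorter and leaves execution to the engine, while the code version gives full control (custom logic, unit tests, reusable functions) at the cost of more moving parts.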

Wrapping Up Beginner’s Guide — Part I

Data @Airbnb, previously @Twitter. Thoughtfully opinionated, weakly held. Opinions are my own.