A Beginner’s Guide to Data Engineering — Part I

Data Engineering: The Close Cousin of Data Science

Robert Chang
Jan 8, 2018
Image credit: A beautiful former slaughterhouse / warehouse at Matadero Madrid, architected by Iñaqui Carnicero

Motivation

Organization of This Beginner’s Guide

My First Industry Job out of Graduate School

Image credit: Me building ETL pipelines diligently (guy in blue in the middle)

The Hierarchy of Analytics

Source: Monica Rogati’s fantastic Medium post “The AI Hierarchy of Needs”

Building Data Foundations & Warehouses

Source: Jeff Hammerbacher’s slide from UC Berkeley CS 194 course

ETL: Extract, Transform, and Load

Source: Vineet Goel’s “Why Robinhood uses Airflow?” Medium Post
Source: Arthur Wiedmer’s workshop from DataEngConf SF 2017

Choosing ETL Frameworks

Source: Marton Trencseni’s comparison between Luigi, Airflow, and Pinball

Two Paradigms: SQL- vs. JVM-Centric ETL

Video Source: Josh Wills’ Keynote @ DataEngConf SF 2016

Wrapping Up Beginner’s Guide — Part I
