Building a Serverless Data Lake Pipeline on AWS

Joey Yi Zhao
Jul 31 · 8 min read

General Data Lake Pipeline

Data lake pipeline

What is the challenge?

What is Glue?

How does Glue ETL work?

Four Steps

How do I discover my data?

Glue Data Catalog: Crawler

Crawler classifier

Crawler partition

How do I build the ETL?

Keep data in sync

Sample Code


References
