7 End-To-End Data Engineering Projects for FREE

Darshil Parmar
3 min read · Dec 12, 2022

Here is the list of projects that you can do to kick-start your career in Data Engineering for FREE!

Photo by Christina @ wocintechchat.com on Unsplash

Data Engineering is a booming career right now, and one of the biggest problems people face is being unable to apply what they learned in courses.

You've learned Python, SQL, Spark, and Airflow, but you don't know how to connect these components to build a complete project.

I have created 7 End-To-End Data Engineering projects that can help you understand the bigger picture behind these systems.

1. Building Data Model and Writing ETL Job

Data modeling is an essential part of Data Engineering (DO NOT SKIP THIS).

What will you learn?
✅ Python
✅ SQL
✅ Building Data Models
✅ Basics of DBMS
✅ Writing ETL Job
✅ Querying Data Programmatically
✅ PostgreSQL
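
To make "data model plus ETL job" concrete, here is a minimal sketch of a tiny star schema (one dimension table, one fact table) with separate extract, transform, and load steps, followed by a programmatic query. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a database server; the table names and sample rows are hypothetical, not taken from the project. With PostgreSQL you would swap in a driver such as psycopg2, which uses %s placeholders instead of ?.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in a real project this would come from a file or API.
RAW_CSV = """order_id,customer,city,amount
1,Alice,Mumbai,120.50
2,Bob,Pune,80.00
3,Alice,Mumbai,45.25
"""

# A tiny star schema: one dimension table and one fact table that references it.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    city        TEXT NOT NULL
);
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer (customer_id),
    amount      REAL NOT NULL
);
"""


def extract(raw):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: cast types and split rows into customer and order records."""
    customers = {}  # name -> city, de-duplicated
    orders = []     # (order_id, customer name, amount)
    for row in rows:
        customers[row["customer"]] = row["city"]
        orders.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return customers, orders


def load(conn, customers, orders):
    """Load: insert dimension rows first, then fact rows that point at them."""
    conn.executemany("INSERT INTO dim_customer (name, city) VALUES (?, ?)", customers.items())
    ids = dict(conn.execute("SELECT name, customer_id FROM dim_customer"))
    conn.executemany(
        "INSERT INTO fact_orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
        [(order_id, ids[name], amount) for order_id, name, amount in orders],
    )
    conn.commit()


def run_etl(raw):
    """Create the schema in an in-memory database and run extract -> transform -> load."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    customers, orders = transform(extract(raw))
    load(conn, customers, orders)
    return conn


if __name__ == "__main__":
    conn = run_etl(RAW_CSV)
    # Query the model programmatically: total spend per customer.
    for name, total in conn.execute(
        "SELECT c.name, SUM(f.amount) FROM fact_orders AS f "
        "JOIN dim_customer AS c USING (customer_id) GROUP BY c.name ORDER BY c.name"
    ):
        print(name, total)
```

Running it prints `Alice 165.75` and `Bob 80.0`. Loading the dimension before the fact table is what lets the fact rows store a compact customer_id instead of repeating names and cities.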

2. Build ETL Pipeline on AWS Cloud using Glue, Athena, Lambda, and Redshift

What will you learn?
✅ Python
✅ SQL
✅ Cloud Computing Basics
✅ AWS Services: Athena, Glue, Redshift, S3, IAM
✅ Creating Data Pipeline
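
One common way to wire these services together (assumed here; the project itself may be structured differently) is to have a new file landing in S3 trigger a Lambda function, which then kicks off the next step such as a Glue job. The sketch below shows only the first part: a Lambda handler that reads the bucket and object key out of an S3 event notification. The bucket and key names are hypothetical, and the call that would start Glue (for example via boto3) is deliberately left out so the code runs locally.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the S3 objects that triggered this invocation.

    A real pipeline would go on to call AWS (for example via boto3) to start a
    Glue job or crawler; that step is omitted so this sketch runs locally.
    """
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps({"new_objects": objects})}


if __name__ == "__main__":
    # Trimmed-down "ObjectCreated" event with a hypothetical bucket and key.
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "my-raw-data"},
                    "object": {"key": "sales/2022/12/daily+report.csv"}}}
        ]
    }
    print(lambda_handler(sample_event, None))
```

Decoding the key with unquote_plus matters because a file named "daily report.csv" shows up in the event as "daily+report.csv".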

3. Covid Data Analysis Project using Python, SQL and AWS

This will be your first end-to-end Data Engineering project, built on Covid-19 data.

What will you learn?
✅ Python
✅ SQL
✅ Building Data Model
✅ AWS Services: Athena, Glue, Redshift, S3, IAM
✅ Creating Data Pipeline
✅ PostgreSQL
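
Covid-19 datasets are often published as cumulative totals per country and date, so a typical transform step is deriving daily new cases before loading the data into your model. Here is a minimal sketch of that step, assuming input rows of (country, ISO date, cumulative confirmed); the exact dataset and columns used in the project may differ.

```python
from collections import defaultdict


def daily_new_cases(records):
    """Turn cumulative confirmed counts into daily new cases per country.

    records: iterable of (country, iso_date, cumulative_confirmed) tuples.
    Returns {country: [(iso_date, new_cases), ...]} sorted by date.
    The first recorded day counts its whole cumulative total as new cases.
    """
    by_country = defaultdict(list)
    for country, day, cumulative in records:
        by_country[country].append((day, cumulative))

    result = {}
    for country, rows in by_country.items():
        rows.sort()  # ISO dates (YYYY-MM-DD) sort correctly as strings
        previous = 0
        series = []
        for day, cumulative in rows:
            # Clamp at zero: data corrections sometimes lower the cumulative total.
            series.append((day, max(cumulative - previous, 0)))
            previous = cumulative
        result[country] = series
    return result


if __name__ == "__main__":
    sample = [("IN", "2020-03-02", 5), ("IN", "2020-03-01", 3), ("IN", "2020-03-03", 4)]
    print(daily_new_cases(sample))
```

Clamping negative differences to zero is one choice among several; you could also flag those rows for review instead.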

4. YouTube Data Analysis (End-To-End Data Engineering Project)

This 3-hour-long project walks you through a complete Data Engineering workflow from start to finish.

What will you learn?
✅ Python and PySpark
✅ SQL
✅ How to understand the business problem
✅ AWS Services: Athena, Glue, Redshift, S3, IAM, Lambda, QuickSight
✅ Building Data Pipeline and Scheduling it
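
A recurring task with YouTube data is turning nested category JSON into a flat lookup and joining it onto the video records, so each video gets a readable category name. The sketch below assumes category files shaped like the YouTube Data API videoCategories response (an "items" list whose entries have an "id" and a "snippet" with a "title"); the video field names are hypothetical. In PySpark this would typically be a DataFrame join, but plain Python keeps the idea visible and runnable anywhere.

```python
import json


def category_lookup(category_json):
    """Flatten a videoCategories-style JSON document into {category_id: title}."""
    doc = json.loads(category_json)
    # The readable name is nested under "snippet"; ids are strings in the JSON.
    return {int(item["id"]): item["snippet"]["title"] for item in doc.get("items", [])}


def enrich_videos(videos, lookup):
    """Attach a category name to each video row; unknown ids become 'Unknown'."""
    return [
        {**video, "category_name": lookup.get(int(video["category_id"]), "Unknown")}
        for video in videos
    ]


if __name__ == "__main__":
    raw = json.dumps({"items": [{"id": "10", "snippet": {"title": "Music"}}]})
    videos = [{"video_id": "abc123", "category_id": "10"}]
    print(enrich_videos(videos, category_lookup(raw)))
```

Casting both sides to int avoids a silent mismatch between string ids in the JSON and numeric ids in the CSV.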

5. Twitter Data Pipeline using Airflow and AWS

If you want to learn the basics of Airflow and how to build a data pipeline, this project is for you.

What will you learn?
✅ Python
✅ Basics of Airflow
✅ Working with Twitter Data and the Tweepy Package
✅ Python Package: Pandas
✅ Writing ETL job and storing data on S3
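
The core of an Airflow pipeline like this is an ordinary Python function that an operator such as Airflow's PythonOperator calls on a schedule. Here is a minimal sketch of such a callable, assuming each tweet is a dict shaped like Twitter API v1.1 tweet JSON (user.screen_name, text, favorite_count, retweet_count, created_at). It uses the standard csv module instead of Pandas and writes to a local path instead of S3, and the function names are hypothetical.

```python
import csv
from pathlib import Path

FIELDS = ["user", "text", "like_count", "retweet_count", "created_at"]


def transform_tweets(raw_tweets):
    """Keep only the fields we need and flatten the nested user object."""
    return [
        {
            "user": t["user"]["screen_name"],
            "text": t["text"],
            "like_count": t.get("favorite_count", 0),
            "retweet_count": t.get("retweet_count", 0),
            "created_at": t["created_at"],
        }
        for t in raw_tweets
    ]


def run_twitter_etl(raw_tweets, output_path):
    """Transform tweets and write them to CSV; returns the number of rows written.

    In the real project the output would go to S3 rather than a local file.
    """
    rows = transform_tweets(raw_tweets)
    with Path(output_path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the ETL logic in a plain function like this makes it easy to test outside Airflow; the DAG then only decides when it runs.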

6. Extract Data From Amazon Website using Python

Learn how to extract data from one source and build a dataset from it.

What will you learn?
✅ HTML Basics
✅ Web Scraping fundamentals
✅ Python
✅ Packages (BeautifulSoup, Pandas, NumPy)
✅ How to build a CSV file
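
The scraping idea boils down to walking the HTML, picking out the elements that hold the data you want, and writing them as rows of a CSV file. The project uses BeautifulSoup; the sketch below swaps in Python's built-in html.parser so it runs with no installs. The class names product-title and product-price are hypothetical (real Amazon markup is different and changes often), and fetching the page is left out.

```python
import csv
import io
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects text from <span class="product-title"> and <span class="product-price">.

    The class names are hypothetical placeholders; assumes these spans are not nested.
    """

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field the current <span> is filling, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "product-title" in classes:
            # A title starts a new product record.
            self.products.append({"title": "", "price": ""})
            self._field = "title"
        elif "product-price" in classes and self.products:
            self._field = "price"

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            item = self.products[-1]
            item[self._field] = item[self._field].strip()
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.products[-1][self._field] += data


def products_to_csv(html):
    """Parse the HTML and return the extracted products as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(parser.products)
    return buffer.getvalue()


if __name__ == "__main__":
    page = '<div><span class="product-title">USB Hub</span><span class="product-price">$9.99</span></div>'
    print(products_to_csv(page))
```

Products without a price still get a row with an empty price, which is usually easier to clean later than a silently dropped record.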

7. Stock Market Real-Time Data Analysis using Kafka, AWS, and Python

What will you learn?
✅ Build a Real-Time simulation app using Python
✅ Understand the basics of Kafka (Broker, Producer, Consumer, ZooKeeper, and more)
✅ How to install Kafka on EC2 (or any VM)
✅ Write Producer and Consumer code in Python
✅ Build a real-time streaming pipeline and store data in S3
✅ Analyze Data in Real-Time using Athena
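
Before installing Kafka, it can help to see the producer/consumer pattern in isolation. The sketch below simulates it in memory: a queue.Queue stands in for a Kafka topic, a dict stands in for S3 (object key to JSON-lines body), and the stock ticks are randomly generated for a hypothetical symbol "ABC". With a real broker you would replace the queue with the kafka-python library's KafkaProducer and KafkaConsumer; the batching and key naming here are illustrative choices, not the project's exact design.

```python
import json
import queue
import random
import threading


def produce(topic, n_messages, seed=42):
    """Simulated producer: emits JSON-encoded stock ticks, then a None sentinel."""
    rng = random.Random(seed)
    for i in range(n_messages):
        tick = {"seq": i, "symbol": "ABC", "price": round(100 + rng.uniform(-1, 1), 2)}
        topic.put(json.dumps(tick).encode("utf-8"))  # Kafka messages are raw bytes
    topic.put(None)  # end-of-stream marker for this simulation only


def consume(topic, sink, batch_size=3):
    """Simulated consumer: decodes messages and writes them to `sink` in batches."""
    batch = []

    def flush():
        if batch:
            key = f"stock_market/batch_{len(sink):05d}.json"
            sink[key] = "\n".join(json.dumps(record) for record in batch)
            batch.clear()

    while True:
        message = topic.get()
        if message is None:
            break
        batch.append(json.loads(message.decode("utf-8")))
        if len(batch) >= batch_size:
            flush()
    flush()  # write any partial final batch


def run_pipeline(n_messages=7, batch_size=3):
    """Run the consumer in a background thread while the producer publishes."""
    topic = queue.Queue()
    sink = {}
    consumer = threading.Thread(target=consume, args=(topic, sink, batch_size))
    consumer.start()
    produce(topic, n_messages)
    consumer.join()
    return sink


if __name__ == "__main__":
    for key, body in run_pipeline().items():
        print(key, body.count("\n") + 1, "records")
```

Writing newline-delimited JSON files under a common prefix is a layout Athena can query directly once a table is defined over that S3 location.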

Follow me for more. I'm planning to write more detailed content on Medium.

Thank you for reading :)

Darshil Parmar

Data Engineering | Building @DataVidhya | YouTube (120k+)