7 End-to-End Data Engineering Projects That Set You Apart from the Rest
In the ever-evolving field of data engineering, standing out is about demonstrating practical skills and innovative thinking. For aspiring data engineers, this means not just understanding the theory but also showing that you can handle real-world data challenges from start to finish. Here are seven end-to-end data engineering projects that can significantly strengthen your portfolio and set you apart from the competition.
1. Realtime Change Data Capture Streaming | End to End Data Engineering Project
In this video, we dive deep into Change Data Capture (CDC) and how it can be implemented for real-time data streaming using a powerful tech stack. You will integrate Docker, Postgres, Debezium, Kafka, Apache Spark, and Slack to build an efficient, responsive data pipeline.
You will learn how to:
- Configure and save data into a PostgreSQL database
- Capture changes on PostgreSQL with Debezium
- Stream data into Kafka
- Add a streaming layer on top of Kafka with Apache Spark, Flink, Storm, or ksqlDB (a minimal sketch follows this list)
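As a taste of what that streaming layer can look like, here is a minimal sketch of reading Debezium change events from Kafka with PySpark Structured Streaming. The topic name, envelope fields, and connection details are assumptions for illustration, not the exact values used in the video.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Requires the spark-sql-kafka package matching your Spark version.
spark = SparkSession.builder.appName("cdc-streaming-sketch").getOrCreate()

# Assumed topic name produced by the Debezium Postgres connector.
TOPIC = "postgres.public.transactions"

# Simplified view of the Debezium envelope: only the fields we care about here.
payload_schema = StructType([
    StructField("op", StringType()),      # c = create, u = update, d = delete
    StructField("before", StringType()),  # row state before the change (JSON)
    StructField("after", StringType()),   # row state after the change (JSON)
])

# Read the raw change events from Kafka.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", TOPIC)
       .option("startingOffsets", "earliest")
       .load())

# Kafka values arrive as bytes; parse the Debezium payload out of the JSON envelope.
changes = (raw
           .selectExpr("CAST(value AS STRING) AS json")
           .select(from_json(col("json"),
                             StructType([StructField("payload", payload_schema)])).alias("event"))
           .select("event.payload.*"))

# Write the parsed change stream to the console; swap in your own sink (Slack alert, table, etc.).
query = changes.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```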
2. Reddit Data Pipeline Engineering | AWS End to End Data Engineering
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
With this project, you will learn:
- Apache Airflow with Celery and PostgreSQL
- Docker
- Using the Reddit API
- AWS Glue
- Amazon S3
- Amazon Athena
- Amazon Redshift data warehousing
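To illustrate the orchestration side, here is a minimal sketch of an Airflow DAG that pulls Reddit posts and stages them in S3 for the downstream Glue, Athena, and Redshift steps. The subreddit, bucket, file paths, and credentials are hypothetical placeholders, not the project's actual code.

```python
from datetime import datetime

import boto3
import praw
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_reddit(**context):
    """Pull the latest posts from a subreddit and stage them as a local CSV."""
    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder credentials
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="reddit-pipeline-sketch",
    )
    with open("/tmp/reddit_posts.csv", "w") as f:
        f.write("id,title,score\n")
        for post in reddit.subreddit("dataengineering").hot(limit=100):
            f.write(f"{post.id},{post.title.replace(',', ' ')},{post.score}\n")

def upload_to_s3(**context):
    """Push the staged CSV to S3 so Glue/Athena/Redshift can pick it up downstream."""
    s3 = boto3.client("s3")
    s3.upload_file("/tmp/reddit_posts.csv", "my-reddit-bucket", "raw/reddit_posts.csv")

with DAG(
    dag_id="reddit_pipeline_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_reddit", python_callable=extract_reddit)
    upload = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
    extract >> upload  # Glue crawler, Athena queries, and the Redshift load would follow `upload`
```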
3. Realtime Socket Streaming
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP sockets, Apache Spark, an OpenAI LLM, Kafka, and Elasticsearch. It covers each stage, from data acquisition and processing, through sentiment analysis with ChatGPT, to producing to a Kafka topic and indexing into Elasticsearch.
This project showcases how to:
- Transfer data over a TCP/IP socket connection
- Use OpenAI's GPT-4 with Apache Spark
- Stream and visualise data with Apache Kafka
- Replicate data into Elasticsearch
- Visualise data in Kibana (and other tools such as Power BI, Tableau, etc.)
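To show how the pieces connect, here is a minimal sketch that reads lines from a TCP socket with Spark Structured Streaming and labels sentiment through the OpenAI API inside a UDF. The host, port, model name, and prompt are assumptions for illustration; the video's actual implementation may batch requests or structure this differently.

```python
from openai import OpenAI
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("socket-sentiment-sketch").getOrCreate()

def classify_sentiment(text: str) -> str:
    """Ask the OpenAI API to label a piece of text as POSITIVE, NEGATIVE, or NEUTRAL."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Reply with one word: POSITIVE, NEGATIVE, or NEUTRAL."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

sentiment_udf = udf(classify_sentiment, StringType())

# Each line arriving on the socket becomes a row with a single `value` column.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

scored = lines.withColumn("sentiment", sentiment_udf(col("value")))

# Print to the console here; the project produces to a Kafka topic and then into Elasticsearch.
query = scored.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```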
4. Sales Analytics with Apache Flink | End to End Data Engineering Project
This repository contains an end-to-end data engineering project using Apache Flink, focused on performing sales analytics. The project demonstrates how to ingest, process, and analyze sales data, showcasing the capabilities of Apache Flink for big data processing.
You will learn how to:
- Use Apache Flink for data processing
- Set up a new project with Apache Flink
- Apply different aggregation techniques with Apache Flink
- Understand the source-sink relationship in Apache Flink
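For a flavour of the Flink side, here is a minimal PyFlink Table API sketch that aggregates sales amounts per product. The in-memory sample rows stand in for the project's real sources and sinks, and the video itself may use the Java API rather than Python.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment
from pyflink.table.expressions import col

# Batch mode is enough for this tiny in-memory example; the project runs in streaming mode.
t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Sample sales rows standing in for a real Kafka or filesystem source.
sales = t_env.from_elements(
    [
        ("laptop", 1200.0),
        ("phone", 800.0),
        ("laptop", 1100.0),
        ("tablet", 450.0),
    ],
    ["product", "amount"],
)

# Total revenue per product: the kind of aggregation the project builds on.
revenue = (sales
           .group_by(col("product"))
           .select(col("product"), col("amount").sum.alias("total_amount")))

# Collect and print the results; a real pipeline would write to a sink table instead.
with revenue.execute().collect() as results:
    for row in results:
        print(row)
```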
5. End to End Data Engineering On Azure
This project delivers end-to-end data processing and visualization of visa numbers in Japan using an Azure-hosted Spark cluster with PySpark and Plotly. The Spark cluster runs in Docker containers on Azure.
You will learn how to:
- Use Docker Compose to set up a Spark cluster on Azure
- Understand the master-worker relationship in Apache Spark
- Write custom Spark SQL scripts for data processing
- Write a custom Spark sink for data output
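Here is a minimal sketch of a PySpark job that connects to a standalone Spark master (as you would get from a Docker Compose cluster) and runs a Spark SQL query. The master URL, file path, and column names are placeholders rather than the ones used in the video.

```python
from pyspark.sql import SparkSession

# "spark-master" is the assumed hostname of the master container from docker-compose.
spark = (SparkSession.builder
         .appName("visa-analytics-sketch")
         .master("spark://spark-master:7077")
         .getOrCreate())

# Load the raw data and expose it to Spark SQL as a temporary view.
visas = (spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv("/data/visa_numbers.csv"))  # placeholder path
visas.createOrReplaceTempView("visas")

# A custom Spark SQL script: yearly totals per country (assumed column names).
yearly_totals = spark.sql("""
    SELECT country, year, SUM(number_of_visas) AS total_visas
    FROM visas
    GROUP BY country, year
    ORDER BY year, total_visas DESC
""")

# Write the aggregated output; a result like this then feeds the Plotly charts.
yearly_totals.write.mode("overwrite").parquet("/data/output/yearly_totals")

spark.stop()
```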
6. Modern Data Engineering with DBT (Data Build Tool) and BigQuery
This project showcases a deep dive into the powerful combination of DBT and BigQuery, the game-changers in modern data engineering.
You will learn how to:
- Set up DBT and BigQuery from scratch
- Link DBT and BigQuery
- Write SQL-based Transformations with DBT
- Convert Tables to Views with DBT and vice versa
- Seed data to BigQuery with DBT
- Write unit tests with DBT
- Generate Documentation with DBT
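The day-to-day DBT workflow behind those bullet points boils down to a handful of CLI commands. Here is a minimal Python sketch that drives them in order; the model selector is a hypothetical name, and in practice you would usually run these commands directly in a terminal against a BigQuery target configured in profiles.yml.

```python
import subprocess

def dbt(*args: str) -> None:
    """Run a dbt CLI command and fail loudly if it errors."""
    subprocess.run(["dbt", *args], check=True)

dbt("debug")                              # verify the BigQuery connection
dbt("seed")                               # load seed CSVs into BigQuery
dbt("run", "--select", "staging_orders")  # build a model (hypothetical name)
dbt("test")                               # run schema and data tests
dbt("docs", "generate")                   # generate the documentation site
```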
Before we go into the last project…
If you’ve come this far, consider giving me a FOLLOW, LIKE, and SUBSCRIBE on the YouTube channel for more content like this.
7. AWS EMR (Elastic Map Reduce) For Data Engineers
This project demonstrates the use of Amazon Elastic Map Reduce (EMR) for processing large datasets using Apache Spark. It includes a Spark script for ETL (Extract, Transform, Load) operations, AWS command line instructions for setting up and managing the EMR cluster, and a dataset for testing and demonstration purposes.
You will learn how to:
- Set up AWS EMR from scratch
- Create Spark jobs to be submitted to the cluster
- Submit Spark jobs to the cluster on AWS
- Combine the different technologies on the AWS cloud
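For a sense of the AWS side, here is a minimal boto3 sketch that spins up an EMR cluster and submits a Spark step. The region, release label, instance types, S3 paths, and script name are assumptions for illustration; the video itself works through the equivalent AWS command line instructions.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")  # region is an assumption

# Launch a small EMR cluster with Spark installed (instance types and counts are placeholders).
cluster = emr.run_job_flow(
    Name="data-engineering-sketch",
    ReleaseLabel="emr-6.15.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    LogUri="s3://my-emr-logs-bucket/",  # placeholder bucket
)
cluster_id = cluster["JobFlowId"]

# Submit a Spark ETL script stored in S3 as a step on the running cluster.
emr.add_job_flow_steps(
    JobFlowId=cluster_id,
    Steps=[{
        "Name": "spark-etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/scripts/etl_job.py"],  # placeholder script path
        },
    }],
)
```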
The code for all the videos is available on my GitHub:
And that’s a wrap!
If you are interested in any of the topics below:
— Python
— Data Engineering
— Data Science
— SQL
— Cloud Platforms (AWS/GCP/Azure)
— Machine Learning
— Artificial Intelligence
Like and Follow me on all platforms:
- Github: airscholar
- Twitter: @YusufOGaniyu
- Linkedin: Yusuf Ganiyu
- Youtube: CodeWithYu
- Medium: Yusuf Ganiyu
I share content daily on LinkedIn, X, Medium, and YouTube.
More courses are available on datamasterylab.com