Building a Data Engineering Project: Part 1
“You can have data without information, but you cannot have information without data.” — Daniel Keys Moran
On August 18 and 25, 2022, I hosted a two-part series for the Women Who Code Python track: Building a Data Engineering Portfolio Project.
A big shoutout to Stephanie Rideout for arranging the series and hosting the talks.
The series included a project walkthrough and a review of concepts such as data modeling, Apache Spark, and Apache Airflow.
The slides are linked here.
The YouTube recording is linked at the end of the article.
In this article, I will highlight key details from the series and add information that I was unable to cover during the live events.
First, a little backstory…
While learning the concepts of data engineering, I came across an issue. I had put in the effort to learn SQL, data modeling, and Python, and I had gained enough knowledge to talk about each of these concepts individually. But when it came time to bring them together into a coherent whole, to create a project showcasing the skills I had learned, I struggled. I did not know how to get started or how to come up with project ideas that would help my endeavor…