Unlocking Insights: Azure Synapse Analytics Bike Share Data Warehouse Project
I recently completed the Data Engineering with Azure specialization on Udacity. It took me three months to finish. It was very challenging but also very rewarding. I learned a lot and upgraded my data engineering skills and cloud knowledge.
In this blog I would like to share my experiences with anyone who might be interested in this specialization. I sincerely recommend this course and hope that this blog will be valuable to other people.
You can find the code related to this project in my GitHub repository.
Project introduction
This was the first big project in the course: the Azure Synapse Analytics Bike Share Data Warehouse Project.
The project offers a great chance to work with real-world data and build a complete data warehouse using Azure Synapse Analytics.
We’ll dive into the interesting process of designing a star schema and organising Divvy’s bike share data into a structured, usable format.
Project Overview
The project is all about Divvy, a popular bike-sharing program in Chicago, Illinois. People can easily grab bikes from stations using kiosks or a mobile app. Divvy shares anonymous data about bike trips, forming the foundation of our project. Our main goal was to create a solid data warehouse using Azure Synapse Analytics, a key tool in Microsoft’s Azure lineup.
Project Tasks
Task 1: Setting Up Azure Resources
We start by creating the necessary Azure resources, such as an Azure Database for PostgreSQL and an Azure Synapse workspace. The built-in serverless SQL pool and a database within the Synapse workspace are vital for the later tasks.
Task 2: Creating a Star Schema
This task is about designing a clear structure based on the given relational schema and business needs. The star schema includes fact tables for trips and payments, accompanied by the relevant dimension tables. The schema covers trip duration, rider age, payment amount, and other important details.
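To make two of those measures concrete, here is a minimal Python sketch of how trip duration and rider age could be derived from raw timestamps. The function names, the choice of minutes, and the choice of whole years are my own assumptions for illustration, not the definitions used in the project.

```python
from datetime import date, datetime


def trip_duration_minutes(started_at: datetime, ended_at: datetime) -> float:
    """Return the trip length in minutes (hypothetical unit choice)."""
    return (ended_at - started_at).total_seconds() / 60


def rider_age_at(birthday: date, on: date) -> int:
    """Return the rider's age in whole years on the given date."""
    # Subtract one year if the birthday has not yet occurred in that year.
    had_birthday = (on.month, on.day) >= (birthday.month, birthday.day)
    return on.year - birthday.year - (0 if had_birthday else 1)


start = datetime(2021, 6, 1, 8, 0)
end = datetime(2021, 6, 1, 8, 25)
print(trip_duration_minutes(start, end))             # 25.0
print(rider_age_at(date(1990, 6, 2), start.date()))  # 30
```

In the warehouse itself these values would more likely be computed in SQL while building the fact tables. The Python version just shows the logic.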
Task 3: Getting Data Ready in PostgreSQL
To mimic a real production setup, we prepare data in PostgreSQL using a Python script. This sets the stage for further analysis and processing.
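The general idea of such a load script can be sketched as follows. This is not the script from the project, only a minimal illustration that assumes the source data is a header-less CSV and that the load goes through a standard Python DB-API cursor. The `load_csv` helper, the `rider` table, and its columns are hypothetical names. The demo runs against in-memory SQLite so that it works without a PostgreSQL server.

```python
import csv
import io
import sqlite3


def load_csv(cursor, table, columns, rows_file, placeholder="%s"):
    """Insert every CSV row into `table` through a DB-API cursor.

    `table` and `columns` are trusted, hard-coded names here; never build
    them from user input. Returns the number of rows inserted.
    """
    marks = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({marks})"
    rows = [tuple(row) for row in csv.reader(rows_file)]
    cursor.executemany(sql, rows)
    return len(rows)


# Demo against in-memory SQLite so the sketch runs without a PostgreSQL server.
# With a PostgreSQL driver such as psycopg2 the default "%s" placeholder applies.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rider (rider_id INTEGER, first_name TEXT)")
sample = io.StringIO("1,Ada\n2,Grace\n")
inserted = load_csv(cur, "rider", ["rider_id", "first_name"], sample, placeholder="?")
conn.commit()
print(inserted)  # 2
```

For large files a bulk mechanism such as PostgreSQL's COPY is usually faster than row inserts. The row-based version is shown only because it is easy to follow.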
Task 4: Moving Data to Azure Blob Storage
Using the Azure Synapse workspace, we employ an ingest wizard to move data from PostgreSQL to Azure Blob Storage. The resulting text files in Blob Storage lay the foundation for our data warehouse.
Task 5: Loading Data into External Tables
This task involves exposing the data in Blob Storage as staging tables, created as external tables, within the data warehouse. The script-generating function in Synapse makes this transition smooth, ensuring the data is ready for further shaping.
Task 6: Shaping Data into the Star Schema
Transforming the staged data into the desired star schema is the final critical step. We achieve this using SQL scripts and operations like CREATE EXTERNAL TABLE AS SELECT (CETAS), shaping the data into the intended format that matches our star schema.
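To show the shape of a CETAS statement, here is a small Python sketch that renders one as text. The `build_cetas` helper and every object name in the example (table, location, data source, file format, and columns) are hypothetical placeholders, not the names used in the project. The external data source and file format are assumed to already exist in the database.

```python
import textwrap


def build_cetas(table, location, data_source, file_format, select_sql):
    """Render a CREATE EXTERNAL TABLE AS SELECT statement as text."""
    query = textwrap.dedent(select_sql).strip()
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        f"WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = [{data_source}],\n"
        f"    FILE_FORMAT = [{file_format}]\n"
        f")\n"
        f"AS\n"
        f"{query};"
    )


# All object and column names below are hypothetical placeholders.
statement = build_cetas(
    table="dbo.dim_rider",
    location="star/dim_rider",
    data_source="bikeshare_storage",
    file_format="csv_file_format",
    select_sql="""
        SELECT rider_id, first_name, last_name, birthday
        FROM dbo.staging_rider
    """,
)
print(statement)
```

The printed statement could then be run in a Synapse SQL script. Each fact and dimension table in the star schema would get its own statement with a matching SELECT.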
Conclusion
The Bike Share Data Warehouse Project offers an exciting journey into the world of data engineering and analytics using Azure Synapse Analytics. Completing this project gave me valuable experience in designing data solutions and extracting meaningful insights.
For junior data engineers and new developers like me, the Azure Synapse Analytics Bike Share Data Warehouse Project was incredibly helpful. It helped me to build essential skills and gain practical experience. The project allowed me to dive into the details of data warehousing, from creating effective schemas to turning raw data into useful information.
Working with real data, simulating actual environments, and using Azure Synapse Analytics not only improved my technical abilities but also deepened my understanding of cloud-based data solutions.
Last but not least, after completing this course you will also be rewarded with an official certificate from Udacity, which you can use to showcase your skills on your resume and your LinkedIn profile.
If you liked this blog, you can follow me on Twitter/X to get notified about new blogs like this one, or you can subscribe to my newsletter for free.