Azure End-To-End Data Engineering Project: Harnessing the GitHub Student Pack for Students and Beginners

Kaouthar EL BAKOURI
12 min read · Nov 8, 2023


Bring your Hot Chocolate and Enjoy Reading 🍵

🚀 Hey, future data wizards and Azure enthusiasts! Ready to dive into the captivating world of Azure data engineering but not sure where to begin? I come bearing fantastic news! Thanks to the GitHub Student Pack, you can open the door to a full year of thrilling Azure adventures, no strings attached.

This guide is tailored specifically for students and Azure beginners. I'll take you on an exhilarating journey through an entire end-to-end Azure data engineering project. Even better, I'll reveal the secrets to securing your very own Azure account for a full year by leveraging the incredible benefits of the GitHub Student Pack. Just imagine: a whole year dedicated to hands-on learning and experimenting with Azure's potent data tools, all right at your fingertips!

What you will learn:

  • Access Azure via GitHub Student Pack for One Year.
  • Data Ingestion with Azure Data Factory (ADF).
  • Exploring Azure Data Lake Storage (ADLS) and Its Capabilities.
  • Performing Data Transformation and Advanced Analytics with Databricks.
  • Using Azure Synapse Analytics for Data Warehousing and Analytics Capabilities.
  • Harnessing Power BI for Data Visualization and Business Intelligence.

Unlocking Azure with the GitHub Student Pack:

  • Go to the GitHub Education page (GitHub Student Developer Pack - GitHub Education) and sign in or create an account if you haven't already.
  • Confirm your student status by providing necessary identification.
  • Once verified, go to the "GitHub Student Developer Pack" section and browse the available offers.
  • Search for the Azure benefit within the pack.
  • You will likely be directed to the Azure portal. Follow the instructions provided to activate your Azure subscription using the GitHub Student Pack offer.
  • Once your Azure subscription is activated, log in to the Azure portal using your newly acquired credentials and start exploring the wide range of services and tools available.

General overview of the project:

Beginning with the 2021 Olympics data stored in GitHub's CSV files (accessible here: kaoutharElbakouri/2021-Olympics-in-Tokyo-Data (github.com)), this project employs Azure Data Factory (ADF) to smoothly bring this information into the raw layer of Azure Data Lake Storage (ADLS). Moving forward, Azure Databricks takes the lead, refining the dataset and storing the processed data in ADLS's transformed layer. Azure Synapse Analytics steps in, primarily for robust data warehousing and detailed analysis, allowing deeper exploration and insights. Finally, Power BI visualizes these insights, marking the completion of a step-by-step process and providing a rich and comprehensive view of the 2021 Olympics dataset.

Here's the diagram which shows the workflow of the project we're going to build:

Diagram Project Overview

We are going to split this project into the following sections:

1๏ธโƒฃ. โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฌโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ตโ€‹โฆ‚

The initial step under the Azure for Students subscription is to create a resource group, specifying its name and region.

create an Azure resource group

This is our created resource group, which will act as a container to hold related Azure resources for efficient management and organization:
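
If you prefer scripting over clicking through the portal, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-resource). The subscription ID, group name, and region below are placeholders, not values taken from this project:

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

# Placeholders: use your own subscription ID, resource group name, and region.
credential = DefaultAzureCredential()
resource_client = ResourceManagementClient(credential, "<subscription-id>")

# Create (or update) the resource group in the chosen region.
resource_client.resource_groups.create_or_update(
    "tokyo-olympic-rg",
    {"location": "westeurope"},
)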

2๏ธโƒฃ. โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฌโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡นโ€‹โฆ‚

Building upon the resource group, the next step is to create an Azure Data Lake Storage (ADLS) account, i.e. a storage account with the hierarchical namespace enabled.

  • Select the created resource group.
  • Click on the โ€œCreate a resourceโ€ button within the resource group.
  • In the search bar, type โ€œStorage accountโ€ and select the option for creating a new storage account.
  • Enter the necessary details for the storage account, such as:

โ˜‘๏ธ Subscription: Choose your Azure for Students subscription.

โ˜‘๏ธ Resource group: Select created resource group

โ˜‘๏ธ Storage account name: Enter โ€œtokyoolympicdatastorage.โ€

โ˜‘๏ธ Region: Choose the appropriate region (e.g., West Europe).

โ˜‘๏ธ Performance, Replication, and Access tier: Select as per your requirements.

โ˜‘๏ธ Hierarchical namespace: Enable this option to create Azure Data Lake Storage (ADLS).

Azure Data Lake Storage
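
This step can also be scripted. Here is a rough sketch with azure-mgmt-storage, where the account name, resource group, region, and SKU are assumptions to adapt; is_hns_enabled=True is what turns the account into ADLS Gen2:

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
storage_client = StorageManagementClient(credential, "<subscription-id>")

# begin_create returns a poller; .result() waits until provisioning finishes.
poller = storage_client.storage_accounts.begin_create(
    "tokyo-olympic-rg",
    "tokyoolympicdatastorage",
    {
        "location": "westeurope",
        "kind": "StorageV2",
        "sku": {"name": "Standard_LRS"},
        "is_hns_enabled": True,  # hierarchical namespace = ADLS Gen2
    },
)
account = poller.result()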

The next step is to create a container. Inside the ADLS account, locate and select the "Containers" section, click "New Container", enter the desired name for the container (for example, "tokyoolympicdata"), then click Create.

ADLS container

The next step is to create two directories within this container. Navigate to the container and click "Add Directory". Enter the directory name "raw-data" and confirm the creation. Click "Add Directory" again, enter the directory name "transformed-data", and confirm the creation. This is a common way to organize data into raw and transformed layers.

ADLS container Directories
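
Again, for readers who like to automate, here is a small sketch with azure-storage-file-datalake that creates the container (file system) and both directories; the account URL simply reuses the storage account name from above:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = DefaultAzureCredential()
service = DataLakeServiceClient(
    account_url="https://tokyoolympicdatastorage.dfs.core.windows.net",
    credential=credential,
)

# Create the container, then the raw and transformed directories inside it.
file_system = service.create_file_system("tokyoolympicdata")
file_system.create_directory("raw-data")
file_system.create_directory("transformed-data")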

3๏ธโƒฃ. โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ณโ€‹โฆ‚ โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ปโ€‹ โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ธโ€‹ โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ฒโ€‹ โ€‹๐Ÿ‡ฌโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ญโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡งโ€‹ โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ธโ€‹ โœโ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ผโ€‹-โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹โœ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡พโ€‹ ๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ซโ€‹:

Now that we've established the "raw-data" directory, we're set to initiate the ingestion process using Azure Data Factory (ADF). This involves transferring data from CSV files stored in GitHub to Azure Data Lake Storage (ADLS).

To do so, we need to create an ADF resource:

โ˜‘๏ธ From the Azure portal, in the search bar, type โ€œData Factoryโ€ and select โ€œData Factoryโ€ from the search results.

โ˜‘๏ธ Click โ€œCreateโ€ to begin the setup of a new Azure Data Factory resource.

โ˜‘๏ธ Enter the necessary information:

  • Name: Provide a unique name for your ADF resource.
  • Subscription: Choose your subscription (Azure for Students).
  • Resource Group: Select the already created resource group.
  • Version: Choose the ADF version you want to use.

โ˜‘๏ธ Once everything is validated, click โ€œCreateโ€ to initiate the provisioning of your new Azure Data Factory resource.

Here is our created ADF resource:

ADF

Once your Azure Data Factory (ADF) resource is created, you can begin working on it by launching ADF Studio.

Our data source in GitHub comprises five CSV files.

Data source

Let's concentrate on ingesting the "Athletes" CSV file from our GitHub data source into Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF). The same steps will be replicated for the ingestion of the other files as well.

โš’๏ธ Create the necessary Linked services:

To handle each CSV file, it's essential to create two linked services: one for the source and one for the destination. We begin by creating the linked service that connects Azure Data Factory (ADF) to our data source.

Under the ADF (Azure Data Factory) Manage hub, you can find the option for "Linked Services". By clicking "New" and choosing "HTTP" as the connection type (our data is on GitHub), you'll initiate the process of creating a new linked service:

Athletes HTTP Linked Service

Once you've selected the "HTTP" connection type, proceed by naming the linked service, providing the base URL, selecting "Anonymous" for authentication, and opting for "AutoResolveIntegrationRuntime" for the integration runtime setting.

For the Base URL, open the Athletes CSV file in GitHub, click "Raw", and copy the resulting link (raw links have the form https://raw.githubusercontent.com/&lt;user&gt;/&lt;repo&gt;/&lt;branch&gt;/&lt;file&gt;.csv).

Athletes Base URL

Let's create the Azure Data Lake Storage linked service by following the same steps as for the previous linked service, but instead of HTTP as the connection type, choose Azure Data Lake Storage Gen2 and provide the necessary details:

ADLS Linked Service

Here are our created linked services:

Linked services

โš’๏ธ Create the Datasets:

Now that our source and destination linked services have been created, we can create the Datasets that will be referenced by the ADF activities. To do that, under the ADF Author hub, locate the Datasets section and create a new Dataset.

For the source Dataset, we need to choose HTTP as the data store type.

Source Dataset

Then choose CSV for the format type:

Then name your Dataset and specify the already created Linked service (do not forget to publish the Dataset):

Now let's create the destination Dataset following the same steps, but with ADLS as the data store type instead of HTTP, and set the required properties for our Dataset.

Now that our two Datasets have been created, we can proceed with creating the ADF activity to do the ingestion from source to destination.

โš’๏ธ ADF Ingestion Activity:

Under the Author hub, locate the Pipelines section and create a new pipeline, then select the Copy activity from the Activities section.

Athletes Ingestion Pipeline

The next step is to configure the source and sink for our activity:

Activity source and Sink

Now we can start debugging our pipeline by clicking on Debug.

Our pipeline succeeded, and we can see the Athletes data loaded into the ADLS container (raw-data).

After following the same steps for the remaining CSV files, all of the CSV files are now located in the raw-data folder within our ADLS container:

ADLS raw-data

4๏ธโƒฃ. โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ตโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฉโ€‹ โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฒโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ณโ€‹ โ€‹๐Ÿ‡ผโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ญโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡งโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ฐโ€‹โ€‹๐Ÿ‡ธโ€‹โฆ‚ โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ญโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ธโ€‹ โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฒโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ฉโ€‹ โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ท

Once data has been ingested into the raw layer of Azure Data Lake Storage (ADLS), the next stage involves processing this data using Databricks and then storing it in the transformed layer within ADLS.

For that, we first need to create a Databricks resource. From the Azure portal, search for "Databricks" and select "Azure Databricks" from the available services. Click "Create" and fill in the required information, such as resource group, workspace name, region, and pricing tier.

Azure Databricks

Once created, click Launch Workspace to open the Databricks workspace. From there, locate the Compute section (this is where you manage clusters) and click Create to set up a cluster based on your configuration.

For me, a cluster is already created and ready to execute Spark code:

Cluster

Now, we can write simple code to get data from the ADLS raw-data layer, transform it, and put it into the ADLS transformed-data layer. To do so, we need to create a notebook by clicking "New Notebook" and make sure the created Spark cluster is selected:

Databricks Notebook

We have to create a connection between Databricks and ADLS so we can easily access the data. To do that, from the Azure portal, search for App registrations and click "New registration" to register a new application, providing the name of the app:

app registrations

This is our registered app. From that we need the Client ID and Tenant ID:

Within this application, locate the "Certificates & secrets" section and create a secret that we will need for the connection between Databricks and ADLS:

Secret Key

Keep the value of the created secret for later use:

Now, we can use the three credentials (client ID, tenant ID, and the secret value) to connect Databricks to ADLS.

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "34f9f5d3-593c-4389-a3af-78adfecc736b",  # client ID
    "fs.azure.account.oauth2.client.secret": "yaf8Q~FEMX87TQw5a03cSNm6CrOL2vGxQ-qh8dfM",  # value of the secret key
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/eb12f8ec-35f2-415d-97bf-0e34301876a7/oauth2/token",  # tenant ID is embedded in the URL
}

# Mount the ADLS container (container@storage-account) onto DBFS.
dbutils.fs.mount(
    source="abfss://tokyoolympicdata@tokyouolympicdatastorage.dfs.core.windows.net",
    mount_point="/mnt/tokyoolymic",
    extra_configs=configs,
)
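
A quick side note: hard-coding the client secret in a notebook is fine for a demo, but a Databricks secret scope is the safer habit. Assuming a scope named adls-scope and a key named client-secret were created beforehand (hypothetical names), the secret line above could become:

# Hypothetical scope/key names; create them first with the Databricks CLI or API.
client_secret = dbutils.secrets.get(scope="adls-scope", key="client-secret")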

To have permission to access the files stored in ADLS, we need to explicitly grant this access using the credentials of the registered application. In other words, we need to give this app permission to access the Data Lake. Under our ADLS container, click on Access Control (IAM), then click Add role assignment:

app access

Select the "Storage Blob Data Contributor" role, click Next, then click Select members and select the registered app (app1):

To check that the connection to ADLS was created successfully, we can run this code within our notebook:

%fs
ls "/mnt/tokyoolymic"
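
The same check can also be run from a Python cell, which is convenient if you want to work with the listing programmatically:

# Lists the mounted container; display() renders the result as a table in Databricks.
display(dbutils.fs.ls("/mnt/tokyoolymic"))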

Now, we can read our CSV files:

athletes = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/mnt/tokyoolymic/raw-data/athletes.csv")
coaches = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/mnt/tokyoolymic/raw-data/Coaches.csv")
entriesgender = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/mnt/tokyoolymic/raw-data/EntriesGender.csv")
medals = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/mnt/tokyoolymic/raw-data/Medals.csv")
teams = spark.read.format("csv").option("header","true").option("inferSchema","true").load("/mnt/tokyoolymic/raw-data/Teams.csv")

We can print the athletes data:
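
For example, a couple of quick sanity checks on the athletes DataFrame:

athletes.printSchema()   # column names and the types inferred from the CSV
athletes.show(5)         # first five rows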

We can go over the Apache Spark functions and do whatever transformations we need on our data (the goal here is not to do the transformations 😇 but just to show how we can use Databricks, build the connection, and so on).
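
Just to make the idea concrete, here is an illustrative sketch of the kind of transformation you could apply. It assumes the entriesgender file has Discipline, Female, Male, and Total columns; treat those column names as assumptions about the dataset:

from pyspark.sql import functions as F

# Cast the count columns to integers and derive the share of female entries per discipline.
entriesgender_transformed = (
    entriesgender
    .withColumn("Female", F.col("Female").cast("int"))
    .withColumn("Male", F.col("Male").cast("int"))
    .withColumn("Total", F.col("Total").cast("int"))
    .withColumn("FemaleShare", F.round(F.col("Female") / F.col("Total"), 2))
)
entriesgender_transformed.show(5)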

The next step is to write the (transformed 🌟) data to the transformed-data layer of ADLS:

athletes.repartition(1).write.mode("overwrite").option("header",'true').csv("/mnt/tokyoolymic/transformed-data/athletes")
coaches.repartition(1).write.mode("overwrite").option("header","true").csv("/mnt/tokyoolymic/transformed-data/coaches")
entriesgender.repartition(1).write.mode("overwrite").option("header","true").csv("/mnt/tokyoolymic/transformed-data/entriesgender")
medals.repartition(1).write.mode("overwrite").option("header","true").csv("/mnt/tokyoolymic/transformed-data/medals")
teams.repartition(1).write.mode("overwrite").option("header","true").csv("/mnt/tokyoolymic/transformed-data/teams")

Here are the folders that contain the written data, along with the metadata files:

For example, within the athletes folder we have four files (the first three, typically _started_…, _committed_…, and _SUCCESS, are metadata written by Spark/Databricks, and the last one, part-0000…, holds the actual data):

Now that our data is transformed 😜 and stored in the transformed-data layer of ADLS, the next step is to load it into Azure Synapse Analytics, where we can use SQL to analyze the data, derive insights, or build a dashboard on top of it 🌟

5๏ธโƒฃ. โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ฒโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฟโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฐโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฌโ€‹โ€‹๐Ÿ‡ชโ€‹ (โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ธโ€‹) โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฟโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ตโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹-๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ธโ€‹ โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ปโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ฉโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ธโ€‹:

From the Azure portal, search for Azure Synapse Analytics, click "Create", and provide the required details:

Synapse Analytics

Once the deployment is complete, we can move forward and load our data into Synapse Analytics. To do so, open Synapse Studio, click on the Data hub, then click the + button and choose Lake database:

Create the Database and name it:

Once the database is created, we can create the table and load the data (choose create table from data lake):

Then provide the table details: the name, the linked service (choose the default one), and the input file. For the input file, navigate to the file that contains the athletes data inside the transformed-data layer of ADLS (make sure to choose the file that contains the data, not the metadata).

After following the same procedure for the remaining data files, our Synapse database now contains the expected tables:

These tables provide the foundation for running SQL queries to perform analysis and derive insights:
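
If you want to run such queries from outside Synapse Studio, here is a hedged sketch using pyodbc against the workspace's serverless SQL endpoint. The server name, database name, table and column names, and authentication mode are all assumptions to adapt to your own setup:

import pyodbc

# Placeholders: fill in your serverless SQL endpoint, lake database name, and Azure AD user.
connection = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=<workspace-name>-ondemand.sql.azuresynapse.net;"
    "Database=<lake-database-name>;"
    "Authentication=ActiveDirectoryInteractive;"
    "UID=<user@domain>;"
)

cursor = connection.cursor()
# Example analysis: athlete count per country (assumes an athletes table with a Country column).
cursor.execute(
    "SELECT Country, COUNT(*) AS athlete_count "
    "FROM athletes GROUP BY Country ORDER BY athlete_count DESC"
)
for row in cursor.fetchmany(10):
    print(row.Country, row.athlete_count)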

6๏ธโƒฃ. โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡จโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฌโ€‹ โ€‹๐Ÿ‡ตโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ผโ€‹โ€‹๐Ÿ‡ชโ€‹โ€‹๐Ÿ‡ทโ€‹ โ€‹๐Ÿ‡งโ€‹โ€‹๐Ÿ‡ฎโ€‹ โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ดโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฟโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ทโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ตโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ชโ€‹ โ€‹๐Ÿ‡ซโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ทโ€‹ โ€‹๐Ÿ‡ฉโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฆโ€‹ โ€‹๐Ÿ‡ปโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡บโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ฟโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡นโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ดโ€‹โ€‹๐Ÿ‡ณโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฉโ€‹ โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ณโ€‹โ€‹๐Ÿ‡ฆโ€‹โ€‹๐Ÿ‡ฑโ€‹โ€‹๐Ÿ‡พโ€‹โ€‹๐Ÿ‡ธโ€‹โ€‹๐Ÿ‡ฎโ€‹โ€‹๐Ÿ‡ธโ€‹

The final step involves establishing a connection between Power BI and Synapse, enabling the use of Power BI for comprehensive data analysis and visualization based on the Synapse database tables.

Open Microsoft Power BI Desktop, click "Get Data", search for Azure Synapse Analytics SQL as the data source, then click Connect:

Go to the Synapse Analytics workspace overview and copy the "Serverless SQL endpoint":

In Power BI, insert the copied string for the server name and Connect:

Finally, we can select the tables we want, load them into Power BI, and start our analysis and dashboarding work 🌟

๐“‚๐’ถ๐’น๐‘’ ๐“Œ๐’พ๐“‰๐’ฝ ๐ฟ๐‘œ๐“‹๐‘’ ๐ต๐“Ž ๐’ฆ๐’ถ๐‘œ๐“Š๐“‰๐’ฝ๐’ถ๐“‡ ๐ธ๐ฟ ๐ต๐’œ๐’ฆ๐’ช๐’ฐ๐‘…๐ผ ๐Ÿ‘ฉโ€๐Ÿ’ป
