Data Engineering with Azure (Part 1): Deploy the Synapse Analytics and Data Factory Service

Ahmad Maulana Malik Fattah
Data Engineering Indonesia
4 min read · Sep 3, 2022
Photo by Claudio Schwarz on Unsplash

Well, first, what is ‘Data Engineering’?

On the AI hierarchy of needs, data is the most fundamental element. We need good data to build a good and robust AI. And maybe you’ve heard the famous quote:

“There is no good AI model without good data”.

Then, how could we have good data? And, what job role is responsible for this? The answer: Data Engineer.

A Data Engineer is responsible for ensuring that other data teams have ready-to-consume data. Their main job is to build and maintain data pipelines that ingest data from various sources, transform it as needed, and store it in an easy-to-access form. This activity is often called ‘data engineering’.

Take a look at the picture below. Data engineering covers the three activities at the bottom of the pyramid.

Photo by Hackernoon

Microsoft Azure provides two main services related to data engineering: Data Factory and Synapse Analytics. These two enable data engineers to build data pipelines in a cloud environment.

In this article, we are going to build a simple data pipeline with Data Factory. The pipeline will ingest data from an Azure SQL Database and store it in a SQL pool. This part covers how to deploy the required services.

Let’s go through it!

We need three Azure resources to be deployed:

  • SQL Database
  • Synapse Analytics
  • Data Factory

In case you don’t already have an Azure account, you can follow the previous article here.

Deploy the SQL Database resource first. We will use this database as the data source. Set the “Authentication method” to “Use SQL authentication”. The username and password we provide here will be used for authentication in Data Factory later. In the “Additional settings” tab, let’s use the “Sample” data source provided by Azure.
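If you prefer the command line over the portal, the same deployment can be sketched with the Azure CLI. The resource group, location, and admin credentials below are assumptions for illustration; the server and database names follow the ones used in this article.

```shell
# Assumed resource group and location; adjust to your environment.
az group create --name rg-dataeng --location southeastasia

# Logical SQL server with SQL authentication (credentials are placeholders).
az sql server create \
  --name nawasena \
  --resource-group rg-dataeng \
  --location southeastasia \
  --admin-user sqladmin \
  --admin-password '<YourStrongPassword>'

# Database pre-loaded with Azure's sample data.
az sql db create \
  --name nawasenadb \
  --server nawasena \
  --resource-group rg-dataeng \
  --sample-name AdventureWorksLT \
  --service-objective Basic
```

The `--sample-name AdventureWorksLT` flag is what the portal's “Sample” option maps to under the hood.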

Once the resource has been deployed, the table structure will look like the one below. Here, ‘nawasenadb’ is the SQL database, ‘nawasena’ is the SQL server, and the ‘SalesLT*’ tables are loaded from the Azure sample data.

Azure SQL Database — Tables Structure
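If you want to check the loaded SalesLT tables from code rather than from the portal, you can connect with SQL authentication. The sketch below builds the ODBC connection string; the server and database names are the ones used in this article, while the credentials and driver version are assumptions to adjust for your setup.

```python
def build_sql_conn_str(server: str, database: str, user: str, password: str) -> str:
    """Build an ODBC connection string for Azure SQL Database (SQL authentication)."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{server}.database.windows.net,1433;"
        f"Database={database};"
        f"Uid={user};Pwd={password};"
        "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )

conn_str = build_sql_conn_str("nawasena", "nawasenadb", "sqladmin", "<password>")

# With pyodbc installed, you could then list the sample tables:
# import pyodbc
# with pyodbc.connect(conn_str) as conn:
#     rows = conn.execute(
#         "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
#         "WHERE TABLE_SCHEMA = 'SalesLT'"
#     ).fetchall()
#     print([r.TABLE_NAME for r in rows])
```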

Next, deploy a Synapse Analytics resource. Make sure to enable the public endpoint when you configure its network. Here is what the resource group of Synapse Analytics would look like.

Azure Synapse Analytics resource group
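The Synapse workspace can also be deployed from the CLI. A workspace needs a Data Lake Storage Gen2 account and file system behind it; the names below (and the resource group) are assumptions for illustration.

```shell
# ADLS Gen2 storage account for the workspace (hierarchical namespace enabled).
az storage account create \
  --name nawasenadls \
  --resource-group rg-dataeng \
  --location southeastasia \
  --sku Standard_LRS \
  --enable-hierarchical-namespace true

# The Synapse workspace itself (credentials are placeholders).
az synapse workspace create \
  --name nawasena-synapse \
  --resource-group rg-dataeng \
  --storage-account nawasenadls \
  --file-system synapsefs \
  --sql-admin-login-user sqladmin \
  --sql-admin-login-password '<YourStrongPassword>' \
  --location southeastasia
```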

Seems good! Now, let’s deploy a SQL pool inside this Synapse Analytics resource. The SQL pool will be used as the target storage for the ingested data.

Go to Azure Synapse Studio and choose the Synapse workspace you created before. On the sidebar, choose the Data menu and create a SQL pool from there. Let’s just call it synapsesqlpool.

Create a SQL Pool in Synapse Studio
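The same SQL pool can be created from the CLI. The workspace and resource group names are the assumed ones from the earlier sketch; `DW100c` is the smallest (cheapest) performance level and is only a suggestion.

```shell
# Dedicated SQL pool inside the Synapse workspace.
az synapse sql pool create \
  --name synapsesqlpool \
  --workspace-name nawasena-synapse \
  --resource-group rg-dataeng \
  --performance-level DW100c
```

Remember that a dedicated SQL pool bills while it is running, so pause it when you are not using it.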

Next, go to the Azure Portal and use the search bar to find the Data Factory service. Open the service once you’ve found it. Then, hit the Create button and start setting up the service.

In the Basics tab, we can configure the region and the name for the resource. Note that the name must be globally unique. When you’ve completed the basic configuration, click the Next button.

Data Factory ‘Basic’ setup

Tick the Configure Git later checkbox. Then, hit the Review + create button to deploy the service.

Data Factory ‘Git configuration’ setup
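For completeness, Data Factory can be deployed from the CLI as well; the `datafactory` commands live in a CLI extension. The factory name below is an assumption and must be globally unique, as noted above.

```shell
# The Data Factory commands are shipped as an Azure CLI extension.
az extension add --name datafactory

az datafactory create \
  --resource-group rg-dataeng \
  --factory-name nawasena-adf \
  --location southeastasia
```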

If the deployment succeeds, you can go to the resource. On the resource’s dashboard, you will see Open Azure Data Factory Studio. Let’s open it.

Data Factory service’s dashboard
Data Factory Studio launch page.

Data Factory Studio is a web-based interface where we can use the complete feature set of Data Factory.

Then, what’s next?

Let’s jump to the next part to get deeper into the Data Factory features!

Data Engineer || Love to work with data, both in engineering and analytics parts || s.id/who-is-ammfat