Data Engineering with Azure (Part 2): Build Data Ingestion Pipeline with Azure Data Factory
In Part 1 of this series, we prepared all the resources we need to build our pipeline. In this part, we tackle the main objective: building a pipeline that ingests data from an Azure SQL Database and moves it into a dedicated SQL pool.
Let’s go tackle it!
Configure Connection to the Data Source
Go to the Data Factory Studio we already visited in the previous part. There, select the Data Factory name so it matches the resource we’ve deployed. After that, click Continue.
You should now see the Data Factory Studio dashboard. Since we want to ingest data, let’s choose the Ingest menu.
Our task is simply to copy data from a data source, so choose the Built-in copy task. For now, let’s not schedule the pipeline; make sure Run once now is selected in the schedule configuration. Click Next once you have completed the configuration.
A sidebar will appear for setting up the source connection. We want to ingest data from an Azure SQL Database resource, so search for ‘SQL’ and choose Azure SQL Database.
Next, configure how to connect to the resource. Here, let’s pick the server from the Azure subscription instead of entering it manually. Make sure the server, database, and account credentials match the Azure SQL Database resource we deployed in Part 1: the server name is ‘nawasena’, the database name is ‘nawasenadb’, and the username and password are the SQL Server admin credentials.
When you are done with the configuration, run Test connection to check that everything is configured correctly. If the test succeeds, hit the Create button to create the connection to the Azure SQL Database resource.
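If you prefer to script this step instead of clicking through the Studio, the same linked service can be defined with the azure-mgmt-datafactory Python SDK. The sketch below is only a minimal outline: the subscription ID, resource group, factory name, and credentials are placeholder assumptions you would replace with your own values from Part 1.

```python
# A minimal sketch of registering the Azure SQL Database linked service in code.
# Requires: pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDatabaseLinkedService,
    SecureString,
)

subscription_id = "<your-subscription-id>"   # assumption: fill in your own
rg_name = "nawasena-rg"                      # assumption: your resource group name
df_name = "nawasena-adf"                     # assumption: your Data Factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Connection string pointing at the server and database from Part 1.
sql_connection = SecureString(
    value="Server=tcp:nawasena.database.windows.net,1433;"
          "Database=nawasenadb;User ID=<sql-admin>;Password=<password>;"
)

ls_source = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=sql_connection)
)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "AzureSqlDatabaseSource", ls_source
)
```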
Next, in the Source data store section, set the data source configuration so it matches the connection we’ve just made. We can also choose which columns of the SQL table we want to ingest. When you’ve completed this, hit the Next button.
Great! Next, let’s…
Configure Connection to the Destination Data Store
Now you should see a new pop-up for setting the destination data store. Create a New connection and choose Azure Synapse Analytics as the destination.
Configure the connection to the SQL pool resource we created inside Synapse Analytics earlier. As with the source data store, run Test connection for the destination too. If everything is okay, hit the Create button.
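The destination connection can be scripted the same way. In the Python SDK, the Azure Synapse Analytics connector is exposed as AzureSqlDWLinkedService; the sketch below reuses adf_client, rg_name, and df_name from the previous sketch, and the workspace, pool, and credential values are assumptions.

```python
# A sketch of the destination (Azure Synapse Analytics / SQL pool) linked service.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDWLinkedService,
    SecureString,
)

synapse_connection = SecureString(
    value="Server=tcp:<synapse-workspace>.sql.azuresynapse.net,1433;"
          "Database=<dedicated-sql-pool>;User ID=<sql-admin>;Password=<password>;"
)

ls_sink = LinkedServiceResource(
    properties=AzureSqlDWLinkedService(connection_string=synapse_connection)
)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "SynapseSqlPoolSink", ls_sink
)
```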
You will be directed to map the columns from the source to the destination data store. Ensure every column in the data source has a counterpart in the destination. Next, under Column mapping, we can choose whether or not to convert the data types. In this article, let’s uncheck Type conversion so the data types in the destination store remain the same as in the source. Then, hit the Next button.
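For reference, the same explicit column mapping can also be expressed in code as a TabularTranslator on the copy activity. The sketch below registers hypothetical source and sink datasets (a dbo.Customers table with Id and Name columns is assumed purely for illustration) and builds the mapping dictionary that we attach to the copy activity later.

```python
# A sketch of source/sink datasets plus an explicit column mapping.
# Table and column names are hypothetical examples.
from azure.mgmt.datafactory.models import (
    DatasetResource,
    AzureSqlTableDataset,
    AzureSqlDWTableDataset,
    LinkedServiceReference,
)

source_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureSqlDatabaseSource"
)
sink_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="SynapseSqlPoolSink"
)

adf_client.datasets.create_or_update(
    rg_name, df_name, "SourceCustomers",
    DatasetResource(properties=AzureSqlTableDataset(
        linked_service_name=source_ls_ref, table_name="dbo.Customers"
    )),
)
adf_client.datasets.create_or_update(
    rg_name, df_name, "SinkCustomers",
    DatasetResource(properties=AzureSqlDWTableDataset(
        linked_service_name=sink_ls_ref, table_name="dbo.Customers"
    )),
)

# Explicit one-to-one column mapping, mirroring the Column mapping step in the UI.
column_mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Id"}, "sink": {"name": "Id"}},
        {"source": {"name": "Name"}, "sink": {"name": "Name"}},
    ],
}
```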
Nice, we’re almost done! Finally, let’s complete the last few settings.
Configure Staging Area
Now we are in the Settings section. Give the data ingestion task a name and a description.
We can also configure a staging area between the source and destination resources. A staging area is an intermediate zone used for data processing during ETL (Extract, Transform, Load) tasks. Our simple copy does not strictly require one, but let’s keep Enable staging checked to get familiar with the feature (and it is needed by the PolyBase copy method we will select below).
In the Staging account linked service, click New.
Create the staging connection to an Azure Blob Storage resource. Next, expand the Advanced section and set the Copy method to PolyBase. Uncheck the Use type default setting and hit the Next button.
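In SDK terms, the staging and PolyBase options correspond to properties of the copy activity. Below is a sketch that wires together the linked services, datasets, and mapping from the earlier sketches; the Blob Storage linked service name (AzureBlobStaging) and the container path are assumptions.

```python
# A sketch of the copy activity with staging enabled and PolyBase as the copy method.
# "AzureBlobStaging" is an assumed name for the Blob Storage linked service created
# in the staging step; the container path is also an assumption.
from azure.mgmt.datafactory.models import (
    PipelineResource,
    CopyActivity,
    AzureSqlSource,
    SqlDWSink,
    DatasetReference,
    LinkedServiceReference,
    StagingSettings,
)

copy_activity = CopyActivity(
    name="CopySqlToSqlPool",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceCustomers")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkCustomers")],
    source=AzureSqlSource(),
    # allow_poly_base mirrors the "Copy method: PolyBase" option in the UI.
    sink=SqlDWSink(allow_poly_base=True),
    translator=column_mapping,  # the TabularTranslator dict from the previous sketch
    enable_staging=True,
    staging_settings=StagingSettings(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStaging"
        ),
        path="staging-container",
    ),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "IngestSqlToSynapse", pipeline)
```

Staged loading through Blob Storage is what lets PolyBase bulk-load data into a dedicated SQL pool quickly, which is why the Studio surfaces it under the Advanced copy options.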
Now, after all that configuration, let’s review the Summary and make sure everything is well configured. Finally, hit Next to deploy our pipeline.
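The Studio handles deployment for us, but for completeness, the SDK counterpart of the ‘Run once now’ option is simply triggering a pipeline run after the pipeline has been created, as in this sketch (reusing the names from the previous sketches):

```python
# A sketch of triggering the pipeline manually, the SDK counterpart of "Run once now".
run_response = adf_client.pipelines.create_run(
    rg_name, df_name, "IngestSqlToSynapse", parameters={}
)
print(f"Started pipeline run: {run_response.run_id}")
```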
Monitor Deployed Pipeline
After the deployment has finished successfully, go to the Monitor section to get insight into the pipeline status. On the Pipeline runs page, choose our deployed pipeline. We can monitor the activities by clicking the glasses icon to the right of the activity name.
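The same run status and activity details are available programmatically. A sketch, again assuming the client and run from the earlier sketches:

```python
# A sketch of checking the pipeline run and its activities from code,
# assuming run_response from the previous sketch.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id)
print(f"Pipeline run status: {pipeline_run.status}")

# Query the individual activity runs (what the glasses icon shows in the Studio).
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1),
)
query_response = adf_client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, run_response.run_id, filter_params
)
for activity_run in query_response.value:
    print(activity_run.activity_name, activity_run.status, activity_run.error)
```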
Nice, that’s all done!
We’ve explored the ingestion features of Azure Data Factory. It has many other features we can explore and use in industry. And besides the Data Factory Studio, Azure also provides SDKs for interacting with Data Factory from code, as the Python sketches above illustrate.
Stay tuned for the next Data Engineering exploration article, and let me know if you have any thoughts about this one!