Data Engineering with Azure (Part 2): Build Data Ingestion Pipeline with Azure Data Factory
In Part 1 of this series, we prepared all the resources we need to build our pipeline. In this part, we tackle the main objective: building a pipeline that ingests data from an Azure SQL Database and moves it into a dedicated SQL pool.
Let’s go tackle it!
Configure Connection to the Data Source
Go to the Data Factory Studio we already visited in the previous part. There, select the Data Factory name so it matches the resource we’ve deployed. After that, click Continue.
You should now see the Data Factory Studio dashboard. Since we want to ingest data, let’s choose the Ingest menu.
Our task is simply to copy data from a data source, so choose the Built-in copy task. For now, let’s not schedule the pipeline; make sure Run once now is selected in the schedule configuration. Click Next once you have completed the configuration.
A sidebar will appear for setting up the source connection. We want to ingest data from an Azure SQL Database resource, so search for ‘SQL’ and choose Azure SQL Database.
Next, configure how to connect to the resource. Here, let’s pick the server from the Azure subscription instead of entering it manually. Make sure the server, database, and account credentials match the Azure SQL Database resource we deployed in Part 1: the server name is ‘nawasena’, the database name is ‘nawasenadb’, and the username and password are the SQL Server admin credentials.
When you are done with the configuration, run Test connection to check that everything is configured correctly. If the test succeeds, hit the Create button to create the connection to the Azure SQL Database resource.
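If you prefer to script this step instead of clicking through the Studio, the same linked service can be defined with the azure-mgmt-datafactory Python SDK. The sketch below is only a minimal outline: the subscription ID, resource group, factory name, and credentials are placeholder assumptions you would replace with your own values from Part 1.

```python
# A minimal sketch of registering the Azure SQL Database linked service in code.
# Requires: pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDatabaseLinkedService,
    SecureString,
)

subscription_id = "<your-subscription-id>"   # assumption: fill in your own
rg_name = "nawasena-rg"                      # assumption: your resource group name
df_name = "nawasena-adf"                     # assumption: your Data Factory name

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Connection string pointing at the server and database from Part 1.
sql_connection = SecureString(
    value="Server=tcp:nawasena.database.windows.net,1433;"
          "Database=nawasenadb;User ID=<sql-admin>;Password=<password>;"
)

ls_source = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=sql_connection)
)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "AzureSqlDatabaseSource", ls_source
)
```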
Next, in the Source data store section, set the data source configuration so it matches the connection we’ve just made. We can also choose which columns of the SQL table we want to ingest. When you’ve completed this, hit the Next button.
Great! Next, let’s…
Configure Connection to the Destination Data Store
Now you should see a new pop-up for setting the destination data store. Create a New connection and choose Azure Synapse Analytics as the destination.
Configure the connection to the SQL pool resource we created inside Synapse Analytics earlier. As with the source data store, run Test connection for the destination too. If everything is okay, hit the Create button.
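The destination connection can be scripted the same way. In the Python SDK, the Azure Synapse Analytics connector is exposed as AzureSqlDWLinkedService; the sketch below reuses adf_client, rg_name, and df_name from the previous sketch, and the workspace, pool, and credential values are assumptions.

```python
# A sketch of the destination (Azure Synapse Analytics / SQL pool) linked service.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDWLinkedService,
    SecureString,
)

synapse_connection = SecureString(
    value="Server=tcp:<synapse-workspace>.sql.azuresynapse.net,1433;"
          "Database=<dedicated-sql-pool>;User ID=<sql-admin>;Password=<password>;"
)

ls_sink = LinkedServiceResource(
    properties=AzureSqlDWLinkedService(connection_string=synapse_connection)
)
adf_client.linked_services.create_or_update(
    rg_name, df_name, "SynapseSqlPoolSink", ls_sink
)
```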
You will be directed to map the columns from the source to the destination data store. Ensure every column in the data source has a counterpart in the destination. Next, under Column mapping, we can choose whether or not to convert the data types. In this article, let’s uncheck Type conversion so the data types in the destination store remain the same as in the source. Then, hit the Next button.
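For reference, the same explicit column mapping can also be expressed in code as a TabularTranslator on the copy activity. The sketch below registers hypothetical source and sink datasets (a dbo.Customers table with Id and Name columns is assumed purely for illustration) and builds the mapping dictionary that we attach to the copy activity later.

```python
# A sketch of source/sink datasets plus an explicit column mapping.
# Table and column names are hypothetical examples.
from azure.mgmt.datafactory.models import (
    DatasetResource,
    AzureSqlTableDataset,
    AzureSqlDWTableDataset,
    LinkedServiceReference,
)

source_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureSqlDatabaseSource"
)
sink_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="SynapseSqlPoolSink"
)

adf_client.datasets.create_or_update(
    rg_name, df_name, "SourceCustomers",
    DatasetResource(properties=AzureSqlTableDataset(
        linked_service_name=source_ls_ref, table_name="dbo.Customers"
    )),
)
adf_client.datasets.create_or_update(
    rg_name, df_name, "SinkCustomers",
    DatasetResource(properties=AzureSqlDWTableDataset(
        linked_service_name=sink_ls_ref, table_name="dbo.Customers"
    )),
)

# Explicit one-to-one column mapping, mirroring the Column mapping step in the UI.
column_mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Id"}, "sink": {"name": "Id"}},
        {"source": {"name": "Name"}, "sink": {"name": "Name"}},
    ],
}
```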
Nice, we’re almost done! Finally, let’s complete the last few settings.
Configure Staging Area
Now we are in the Settings section. Give the data ingestion task a name and a description.
We can also configure a staging area between the source and destination resources. A staging area is an intermediate zone used for data processing during ETL (Extract, Transform, Load) tasks. Our simple copy does not strictly require one, but let’s keep Enable staging checked to get familiar with the feature (and it is needed by the PolyBase copy method we will select below).
In the Staging account linked service, click New.
Create the staging connection to an Azure Blob Storage resource. Next, expand the Advanced section and set the Copy method to PolyBase. Uncheck the Use type default setting and hit the Next button.
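In SDK terms, the staging and PolyBase options correspond to properties of the copy activity. Below is a sketch that wires together the linked services, datasets, and mapping from the earlier sketches; the Blob Storage linked service name (AzureBlobStaging) and the container path are assumptions.

```python
# A sketch of the copy activity with staging enabled and PolyBase as the copy method.
# "AzureBlobStaging" is an assumed name for the Blob Storage linked service created
# in the staging step; the container path is also an assumption.
from azure.mgmt.datafactory.models import (
    PipelineResource,
    CopyActivity,
    AzureSqlSource,
    SqlDWSink,
    DatasetReference,
    LinkedServiceReference,
    StagingSettings,
)

copy_activity = CopyActivity(
    name="CopySqlToSqlPool",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceCustomers")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkCustomers")],
    source=AzureSqlSource(),
    # allow_poly_base mirrors the "Copy method: PolyBase" option in the UI.
    sink=SqlDWSink(allow_poly_base=True),
    translator=column_mapping,  # the TabularTranslator dict from the previous sketch
    enable_staging=True,
    staging_settings=StagingSettings(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureBlobStaging"
        ),
        path="staging-container",
    ),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "IngestSqlToSynapse", pipeline)
```

Staged loading through Blob Storage is what lets PolyBase bulk-load data into a dedicated SQL pool quickly, which is why the Studio surfaces it under the Advanced copy options.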
Now, after all that configuration, let’s review the Summary and make sure everything is well configured. Finally, hit Next to deploy our pipeline.
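The Studio handles deployment for us, but for completeness, the SDK counterpart of the ‘Run once now’ option is simply triggering a pipeline run after the pipeline has been created, as in this sketch (reusing the names from the previous sketches):

```python
# A sketch of triggering the pipeline manually, the SDK counterpart of "Run once now".
run_response = adf_client.pipelines.create_run(
    rg_name, df_name, "IngestSqlToSynapse", parameters={}
)
print(f"Started pipeline run: {run_response.run_id}")
```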
Monitor Deployed Pipeline
After the deployment has finished successfully, go to the Monitor section to get insight into the pipeline status. On the Pipeline runs page, choose our deployed pipeline. We can monitor the activities by clicking the glasses icon to the right of the activity name.
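The same run status and activity details are available programmatically. A sketch, again assuming the client and run from the earlier sketches:

```python
# A sketch of checking the pipeline run and its activities from code,
# assuming run_response from the previous sketch.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run_response.run_id)
print(f"Pipeline run status: {pipeline_run.status}")

# Query the individual activity runs (what the glasses icon shows in the Studio).
filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1),
)
query_response = adf_client.activity_runs.query_by_pipeline_run(
    rg_name, df_name, run_response.run_id, filter_params
)
for activity_run in query_response.value:
    print(activity_run.activity_name, activity_run.status, activity_run.error)
```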
Nice, that’s all done!
We’ve explored the ingestion features of Azure Data Factory. It has many other features we can explore and use in industry. And besides the Data Factory Studio, Azure also provides SDKs for interacting with Data Factory from code, as the Python sketches above illustrate.
Stay tuned for the next Data Engineering exploration article, and let me know if you have any thoughts about this one!