Copy Data from Blob Storage to Azure SQL Using Azure Data Factory

Sleepy Turtle
4 min read · Jun 12, 2022


In this article, I will demonstrate how to copy data from Blob storage to an Azure SQL database using Azure Data Factory. This is a common scenario when you have application logs in Blob storage and want to build an analytics dashboard on top of them.

Once your logs are in SQL, you can build an analytics dashboard on platforms such as Power BI that use the SQL database as input. In this post, I will describe how to connect the Data Factory, SQL database, and storage account resources and copy the data into a SQL table.

Getting Started

  1. The first thing we need is an Azure subscription and a resource group. I have created a temporary resource group called Blog, and within it I will create a Data Factory, a SQL database, and a storage account resource.

2. Create Data Factory Resource:

  • Click on Create New Resource in the resource group and search for Data Factory. (If you prefer to script this step, a sketch using the Azure Python SDK follows this list.)
Screenshot: Create Data Factory resource
  • Once you have selected the name and region, you will need to configure the Git repository in order to save your changes and collaborate with others.
  • I am going to skip this for now, but once the Data Factory is created, I will connect it to my GitHub account.
  • Go to the Data Factory resource and click Open Azure Data Factory Studio.
Screenshot: the Data Factory resource
  • To connect your data factory with GitHub, the repository needs to belong to the same user; otherwise, you need to add your Azure account's email as a collaborator on the GitHub repository.
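If you'd rather script this step than click through the portal, the azure-mgmt-datafactory package can create the factory for you. Below is a minimal sketch, assuming a recent (track 2) version of the SDK; the subscription ID, region, and factory name are placeholders you would swap for your own values:

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"  # placeholder
rg_name = "Blog"                            # resource group from step 1
df_name = "blog-adf-demo"                   # hypothetical factory name

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Create (or update) the Data Factory in the resource group's region
df = adf_client.factories.create_or_update(
    rg_name, df_name, Factory(location="eastus"))
print(df.provisioning_state)
```

The adf_client, rg_name, and df_name variables are reused in the later sketches in this post.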

3. Create Storage account resource:

  • Click on Create New Resource in the resource group and search for Storage account.
  • After you have created your storage account, go to Containers, create a new container called blogadf, and upload a sample JSON file. (A sketch of scripting this upload is shown after this list.)
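If you want to script the upload, here is a minimal sketch using the azure-storage-blob package; the connection string is a placeholder and sample.json is a hypothetical local file name:

```python
# pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient

# Connection string from the storage account's "Access keys" blade (placeholder)
conn_str = ("DefaultEndpointsProtocol=https;AccountName=<account>;"
            "AccountKey=<key>;EndpointSuffix=core.windows.net")

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("blogadf")

# Create the container if it does not exist yet
if not container.exists():
    container.create_container()

# Upload a sample JSON log file
with open("sample.json", "rb") as data:
    container.upload_blob(name="sample.json", data=data, overwrite=True)
```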

4. Create SQL database:

  • Click on Create New Resource in the resource group and search for Azure SQL.
  • Look for SQL database and click Create. (A sketch of creating the destination table is shown after this step.)
Screenshot: Create SQL database
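
The Copy data activity needs a destination table to write into. As a sketch, the snippet below creates a hypothetical dbo.AppLogs table with pyodbc; the column names are assumptions for a simple log record and should match whatever fields your JSON actually contains:

```python
# pip install pyodbc  (also requires the Microsoft ODBC Driver for SQL Server)
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;"
    "Uid=<your-user>;Pwd=<your-password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

# Hypothetical destination table for simple application-log records
conn.cursor().execute("""
    IF OBJECT_ID('dbo.AppLogs', 'U') IS NULL
    CREATE TABLE dbo.AppLogs (
        Id      INT IDENTITY PRIMARY KEY,
        LogTime DATETIME2,
        Level   NVARCHAR(20),
        Message NVARCHAR(MAX)
    );
""")
conn.commit()
```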

You’ve now built all of the resources needed for this demonstration. It’s now time to link all three resources together.

Theory

Linked Service: A linked service serves as the connection between the data factory and a data store. It allows you to connect your data factory to external resources, such as the storage account and SQL database used here.
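
As a rough illustration, the two linked services we need can also be created through the Python SDK (reusing the adf_client, rg_name, and df_name from the Data Factory sketch above). The linked-service names and connection strings below are placeholders:

```python
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AzureStorageLinkedService,
    AzureSqlDatabaseLinkedService, SecureString)

# adf_client, rg_name, df_name come from the earlier Data Factory sketch

# Linked service to the storage account that holds the blogadf container
storage_ls = LinkedServiceResource(properties=AzureStorageLinkedService(
    connection_string=SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")))
adf_client.linked_services.create_or_update(
    rg_name, df_name, "BlobStorageLS", storage_ls)

# Linked service to the Azure SQL database
sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
    connection_string=SecureString(
        value="Server=tcp:<server>.database.windows.net;Database=<db>;"
              "User ID=<user>;Password=<password>")))
adf_client.linked_services.create_or_update(
    rg_name, df_name, "AzureSqlLS", sql_ls)
```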

Datasets: According to the documentation, “A dataset is a named view of data that simply points or references the data you want to use in your activities as inputs and outputs.”

Before you create a dataset, you must create a linked service to link your data store to the service. You need to create two linked services: one for Azure Storage and one for Azure SQL Database. Next, create two datasets: a blob dataset associated with the Azure Storage linked service, and a SQL dataset associated with the Azure SQL linked service.
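
Continuing the Python SDK sketch, the two datasets could be defined roughly as follows; the dataset names, the sample.json file, and the dbo.AppLogs table are assumptions carried over from the earlier snippets:

```python
from azure.mgmt.datafactory.models import (
    DatasetResource, AzureBlobDataset, AzureSqlTableDataset,
    LinkedServiceReference, JsonFormat)

blob_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="BlobStorageLS")
sql_ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="AzureSqlLS")

# Blob dataset pointing at the sample JSON file in the blogadf container
blob_ds = DatasetResource(properties=AzureBlobDataset(
    linked_service_name=blob_ls_ref,
    folder_path="blogadf",
    file_name="sample.json",
    format=JsonFormat()))
adf_client.datasets.create_or_update(rg_name, df_name, "BlobJsonDataset", blob_ds)

# SQL dataset pointing at the destination table (hypothetical dbo.AppLogs)
sql_ds = DatasetResource(properties=AzureSqlTableDataset(
    linked_service_name=sql_ls_ref,
    table_name="dbo.AppLogs"))
adf_client.datasets.create_or_update(rg_name, df_name, "SqlLogsDataset", sql_ds)
```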

Below is a screenshot of how you can create datasets and linked services in the portal:

Right-click on Datasets and select New dataset.

Screenshot: Blob linked service

Follow the same process for the SQL dataset.

Now that the datasets are ready, we have to create a pipeline that consists of a Copy data activity.

Click on Pipelines and create a new one. Then drag a Copy data activity onto the pipeline canvas. The source for the Copy data activity is the blob dataset.

In the sink, select your SQL dataset, which is where the data will be copied to.
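
For reference, the same pipeline expressed through the Python SDK sketch (reusing adf_client and the dataset names above) pairs a BlobSource with a SqlSink inside one Copy activity:

```python
from azure.mgmt.datafactory.models import (
    CopyActivity, DatasetReference, BlobSource, SqlSink, PipelineResource)

copy_activity = CopyActivity(
    name="CopyLogsBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="BlobJsonDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SqlLogsDataset")],
    source=BlobSource(),   # read from the blob dataset
    sink=SqlSink())        # write into the SQL dataset

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyLogsPipeline", pipeline)
```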

Finally, map the JSON fields to the SQL columns in the Mapping tab and click Save at the top.
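
Programmatically, that column mapping lives in the copy activity's translator property. The snippet below is only a sketch of what it might look like with TabularTranslator; the JSON field names are hypothetical, and the exact mapping shape is worth verifying against your SDK version:

```python
from azure.mgmt.datafactory.models import TabularTranslator

# Hypothetical mapping from JSON fields to the dbo.AppLogs columns
copy_activity.translator = TabularTranslator(mappings=[
    {"source": {"path": "$['timestamp']"}, "sink": {"name": "LogTime"}},
    {"source": {"path": "$['level']"},     "sink": {"name": "Level"}},
    {"source": {"path": "$['message']"},   "sink": {"name": "Message"}},
])

# Re-publish the pipeline so the mapping takes effect
adf_client.pipelines.create_or_update(rg_name, df_name, "CopyLogsPipeline", pipeline)
```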

Now it's time to test the pipeline. To run a test, click Debug at the top.

After the pipeline runs successfully, go to the SQL database and check if the data has been copied.
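
The Debug run happens in the Studio UI, but as a sketch, the same check can be scripted: trigger a run, poll its status, and count the rows that landed in the (hypothetical) dbo.AppLogs table:

```python
import time
import pyodbc

# Trigger the pipeline and wait for it to finish
run = adf_client.pipelines.create_run(rg_name, df_name, "CopyLogsPipeline", parameters={})
while True:
    status = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(10)
print("Pipeline finished with status:", status)

# Verify that rows arrived in the destination table
conn = pyodbc.connect("<same connection string as before>")
count = conn.cursor().execute("SELECT COUNT(*) FROM dbo.AppLogs").fetchone()[0]
print("Rows copied:", count)
```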

That's it for today. Next time we will work on the Data Flow activity.
