Setting Up Azure Data Factory

Vitu Hansakul
3 min read · May 18, 2024


Azure Data Factory (ADF) is a powerful service for moving and transforming data in the cloud. In this guide, I’ll demonstrate how to set up an Azure Data Factory instance and create a data pipeline to copy data from an input Blob Storage container to an output Blob Storage container.

This example covers the basics of setting up a pipeline and can be extended to move or transform data between various sources and destinations.

Lab Preparation Steps

1. Create a Resource Group

  • Go to Azure Portal > Resource groups > Add.
  • Choose your subscription and name your resource group (e.g., DemoDataFactory).
  • Click “Review + create” and then “Create.”
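
If you prefer to script the setup, a rough equivalent with the Azure SDK for Python is shown below. It assumes you are already signed in (for example via `az login`), that the AZURE_SUBSCRIPTION_ID environment variable is set, and that the "eastus" region is just an illustrative choice.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

# Create (or update) the resource group used throughout this guide.
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update(
    "DemoDataFactory",        # resource group name from the portal steps
    {"location": "eastus"},   # assumed region; pick whichever is closest to you
)
```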

2. Create a Blob Storage Account

  • Go to your Resource group > Add > Type “Storage account” > Create.
  • Fill in the details and name your storage account (e.g., blobstorage123).
  • Click “Review + create” and then “Create.”
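
The portal form maps to a single call with the storage management SDK. The sketch below reuses the example names from this guide and assumes the "eastus" region; remember that storage account names must be globally unique.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

storage_client = StorageManagementClient(credential, subscription_id)
poller = storage_client.storage_accounts.begin_create(
    "DemoDataFactory",   # resource group from the previous step
    "blobstorage123",    # storage account name (3-24 lowercase letters/digits)
    {
        "location": "eastus",          # assumed region
        "sku": {"name": "Standard_LRS"},
        "kind": "StorageV2",
    },
)
account = poller.result()              # wait for provisioning to finish
print(account.provisioning_state)
```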

3. Create Blob Containers

  • Go to your Blob Storage account > Containers > Add.
  • Create a container named “source” and another named “destination.”
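
A minimal sketch of the same step in Python, assuming the storage account's connection string (from "Access keys" in the portal) is exported as AZURE_STORAGE_CONNECTION_STRING:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Create both containers used by the pipeline.
for name in ("source", "destination"):
    blob_service.create_container(name)
```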

4. Upload a JSON File to the Source Container

  • Go to the “source” container > Upload > Select a JSON file (e.g., demo.json) > Upload.
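
The upload can likewise be scripted. This sketch assumes demo.json sits in the current working directory and reuses the same connection-string environment variable:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Upload demo.json into the "source" container.
with open("demo.json", "rb") as data:
    blob_service.get_blob_client(container="source", blob="demo.json").upload_blob(
        data, overwrite=True
    )
```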

Creating the Data Factory

1. Add Data Factory to the Resource Group

  • Go to your Resource group > Add > Type “Data Factory” > Create.
  • Fill in the details, name your data factory (e.g., DataFactory123), and click “Review + create.”
  • If you don’t have a Git repository, select “Configure Git later” > Review + create > Create.
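
For a scripted alternative, the data factory itself can be created with the azure-mgmt-datafactory package. The names below reuse the examples from this guide; the region is an assumption, and data factory names must be globally unique.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

adf_client = DataFactoryManagementClient(credential, subscription_id)
factory = adf_client.factories.create_or_update(
    "DemoDataFactory",                 # resource group
    "DataFactory123",                  # data factory name
    Factory(location="eastus"),        # assumed region
)
print(factory.provisioning_state)
```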

Setting Up the Pipeline in the Data Factory

1. Create Linked Services

  • Go to your Data Factory > Author & Monitor > Manage > Linked services > New.
  • Select Azure Blob Storage > Continue > Fill in details > Test connection > Create.
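
A rough Python equivalent is shown below. The linked service name “BlobStorageLinkedService” is an illustrative choice, and the connection string is assumed to be in AZURE_STORAGE_CONNECTION_STRING; in production you would typically reference a Key Vault secret instead of passing the string directly.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# Linked service pointing at the Blob Storage account created earlier.
linked_service = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value=os.environ["AZURE_STORAGE_CONNECTION_STRING"]
        )
    )
)
adf_client.linked_services.create_or_update(
    "DemoDataFactory", "DataFactory123", "BlobStorageLinkedService", linked_service
)
```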

2. Create Datasets

  • Go to Author > Datasets > Add new dataset.
  • Select Azure Blob Storage > Continue > Select format (Binary) > Continue.
  • Fill in details, link to the “source” container, and create.
  • Repeat for the “destination” container.
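
The same two Binary datasets can be defined in code. The dataset names below are illustrative and reference the linked service from the previous step; depending on your SDK version, the explicit type= arguments on the reference objects may be optional.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    BinaryDataset,
    DatasetResource,
    LinkedServiceReference,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="BlobStorageLinkedService"
)

# One Binary dataset per container: source and destination.
for ds_name, container in (("SourceDataset", "source"), ("DestinationDataset", "destination")):
    dataset = DatasetResource(
        properties=BinaryDataset(
            linked_service_name=ls_ref,
            location=AzureBlobStorageLocation(container=container),
        )
    )
    adf_client.datasets.create_or_update(
        "DemoDataFactory", "DataFactory123", ds_name, dataset
    )
```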

3. Create the Pipeline

  • Go to Author > Pipelines > Add new pipeline.
  • Drag “Copy data” to the pipeline.
  • Set up Source and Sink (Destination) datasets.
  • Publish the pipeline.
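
Sketched in Python, the pipeline is a single Copy activity wiring the source dataset to the destination dataset; the pipeline and activity names are illustrative, and the dataset names match the earlier sketch.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BinarySink,
    BinarySource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# A single Copy activity: source container -> destination container.
copy_activity = CopyActivity(
    name="CopyBlobData",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="DestinationDataset")],
    source=BinarySource(),
    sink=BinarySink(),
)
adf_client.pipelines.create_or_update(
    "DemoDataFactory",
    "DataFactory123",
    "CopyPipeline",
    PipelineResource(activities=[copy_activity]),
)
```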

4. Run the Pipeline

  • Add a trigger to run the pipeline on a schedule, or use “Trigger now” to run it manually.
  • Monitor the pipeline run to ensure it succeeds.
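
A manual run can also be started and checked from code, as in this sketch (the fixed 30-second wait is a simplification; a real script would poll in a loop until the run completes):

```python
import os
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# Kick off a run of the pipeline created above.
run = adf_client.pipelines.create_run(
    "DemoDataFactory", "DataFactory123", "CopyPipeline", parameters={}
)

time.sleep(30)  # give the run a moment to start and finish
status = adf_client.pipeline_runs.get("DemoDataFactory", "DataFactory123", run.run_id)
print(status.status)  # e.g. "Succeeded"
```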

5. Verify Data Movement

  • Go to your Blob Storage account > Containers > Check that the JSON file is in both the “source” and “destination” containers.
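
To verify from code instead, list the blobs in the “destination” container and check that demo.json appears:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# The copied file should now show up alongside the original in "source".
destination = blob_service.get_container_client("destination")
print([blob.name for blob in destination.list_blobs()])  # expect "demo.json"
```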

Conclusion

Azure Data Factory makes it easy to move and transform data between various sources and destinations. By following this guide, you can set up a data factory instance and create a pipeline to automate data movement. This functionality is essential for data analytics solutions.

Note: These steps are accurate at the time of writing and may change in the future as the Azure portal and Data Factory UI evolve.
