Setting Up Azure Data Factory

Vitu Hansakul
3 min read · May 18, 2024


Azure Data Factory (ADF) is a powerful service for moving and transforming data in the cloud. In this guide, I’ll demonstrate how to set up an Azure Data Factory instance and create a data pipeline to copy data from an input Blob Storage container to an output Blob Storage container.

This example covers the basics of setting up a pipeline and can be extended to move or transform data between various sources and destinations.

Lab Preparation Steps

1. Create a Resource Group

  • Go to Azure Portal > Resource groups > Add.
  • Choose your subscription and name your resource group (e.g., DemoDataFactory).
  • Click “Review + create” and then “Create.”
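
If you prefer to script the setup, a rough equivalent with the Azure SDK for Python is shown below. It assumes you are already signed in (for example via `az login`), that the AZURE_SUBSCRIPTION_ID environment variable is set, and that the "eastus" region is just an illustrative choice.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

# Create (or update) the resource group used throughout this guide.
resource_client = ResourceManagementClient(credential, subscription_id)
resource_client.resource_groups.create_or_update(
    "DemoDataFactory",        # resource group name from the portal steps
    {"location": "eastus"},   # assumed region; pick whichever is closest to you
)
```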

2. Create a Blob Storage Account

  • Go to your Resource group > Add > Type “Storage account” > Create.
  • Fill in the details and name your storage account (e.g., blobstorage123).
  • Click “Review + create” and then “Create.”
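
The portal form maps to a single call with the storage management SDK. The sketch below reuses the example names from this guide and assumes the "eastus" region; remember that storage account names must be globally unique.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

storage_client = StorageManagementClient(credential, subscription_id)
poller = storage_client.storage_accounts.begin_create(
    "DemoDataFactory",   # resource group from the previous step
    "blobstorage123",    # storage account name (3-24 lowercase letters/digits)
    {
        "location": "eastus",          # assumed region
        "sku": {"name": "Standard_LRS"},
        "kind": "StorageV2",
    },
)
account = poller.result()              # wait for provisioning to finish
print(account.provisioning_state)
```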

3. Create Blob Containers

  • Go to your Blob Storage account > Containers > Add.
  • Create a container named “source” and another named “destination.”
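
A minimal sketch of the same step in Python, assuming the storage account's connection string (from "Access keys" in the portal) is exported as AZURE_STORAGE_CONNECTION_STRING:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Create both containers used by the pipeline.
for name in ("source", "destination"):
    blob_service.create_container(name)
```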

4. Upload a JSON File to the Source Container

  • Go to the “source” container > Upload > Select a JSON file (e.g., demo.json) > Upload.
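
The upload can likewise be scripted. This sketch assumes demo.json sits in the current working directory and reuses the same connection-string environment variable:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Upload demo.json into the "source" container.
with open("demo.json", "rb") as data:
    blob_service.get_blob_client(container="source", blob="demo.json").upload_blob(
        data, overwrite=True
    )
```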

Creating the Data Factory

1. Add Data Factory to the Resource Group

  • Go to your Resource group > Add > Type “Data Factory” > Create.
  • Fill in the details, name your data factory (e.g., DataFactory123), and click “Review + create.”
  • If you don’t have a Git repository, select “Configure Git later” > Review + create > Create.
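
For a scripted alternative, the data factory itself can be created with the azure-mgmt-datafactory package. The names below reuse the examples from this guide; the region is an assumption, and data factory names must be globally unique.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

credential = DefaultAzureCredential()
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]

adf_client = DataFactoryManagementClient(credential, subscription_id)
factory = adf_client.factories.create_or_update(
    "DemoDataFactory",                 # resource group
    "DataFactory123",                  # data factory name
    Factory(location="eastus"),        # assumed region
)
print(factory.provisioning_state)
```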

Setting Up the Pipeline in the Data Factory

1. Create Linked Services

  • Go to your Data Factory > Author & Monitor > Manage > Linked services > New.
  • Select Azure Blob Storage > Continue > Fill in details > Test connection > Create.
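
A rough Python equivalent is shown below. The linked service name “BlobStorageLinkedService” is an illustrative choice, and the connection string is assumed to be in AZURE_STORAGE_CONNECTION_STRING; in production you would typically reference a Key Vault secret instead of passing the string directly.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# Linked service pointing at the Blob Storage account created earlier.
linked_service = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value=os.environ["AZURE_STORAGE_CONNECTION_STRING"]
        )
    )
)
adf_client.linked_services.create_or_update(
    "DemoDataFactory", "DataFactory123", "BlobStorageLinkedService", linked_service
)
```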

2. Create Datasets

  • Go to Author > Datasets > Add new dataset.
  • Select Azure Blob Storage > Continue > Select format (Binary) > Continue.
  • Fill in details, link to the “source” container, and create.
  • Repeat for the “destination” container.
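
The same two Binary datasets can be defined in code. The dataset names below are illustrative and reference the linked service from the previous step; depending on your SDK version, the explicit type= arguments on the reference objects may be optional.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    BinaryDataset,
    DatasetResource,
    LinkedServiceReference,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

ls_ref = LinkedServiceReference(
    type="LinkedServiceReference", reference_name="BlobStorageLinkedService"
)

# One Binary dataset per container: source and destination.
for ds_name, container in (("SourceDataset", "source"), ("DestinationDataset", "destination")):
    dataset = DatasetResource(
        properties=BinaryDataset(
            linked_service_name=ls_ref,
            location=AzureBlobStorageLocation(container=container),
        )
    )
    adf_client.datasets.create_or_update(
        "DemoDataFactory", "DataFactory123", ds_name, dataset
    )
```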

3. Create the Pipeline

  • Go to Author > Pipelines > Add new pipeline.
  • Drag “Copy data” to the pipeline.
  • Set up Source and Sink (Destination) datasets.
  • Publish the pipeline.
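
Sketched in Python, the pipeline is a single Copy activity wiring the source dataset to the destination dataset; the pipeline and activity names are illustrative, and the dataset names match the earlier sketch.

```python
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BinarySink,
    BinarySource,
    CopyActivity,
    DatasetReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# A single Copy activity: source container -> destination container.
copy_activity = CopyActivity(
    name="CopyBlobData",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="DestinationDataset")],
    source=BinarySource(),
    sink=BinarySink(),
)
adf_client.pipelines.create_or_update(
    "DemoDataFactory",
    "DataFactory123",
    "CopyPipeline",
    PipelineResource(activities=[copy_activity]),
)
```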

4. Run the Pipeline

  • Add a trigger to run the pipeline on a schedule, or use “Trigger now” to run it manually.
  • Monitor the pipeline run to ensure it succeeds.
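
A manual run can also be started and checked from code, as in this sketch (the fixed 30-second wait is a simplification; a real script would poll in a loop until the run completes):

```python
import os
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
)

# Kick off a run of the pipeline created above.
run = adf_client.pipelines.create_run(
    "DemoDataFactory", "DataFactory123", "CopyPipeline", parameters={}
)

time.sleep(30)  # give the run a moment to start and finish
status = adf_client.pipeline_runs.get("DemoDataFactory", "DataFactory123", run.run_id)
print(status.status)  # e.g. "Succeeded"
```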

5. Verify Data Movement

  • Go to your Blob Storage account > Containers > Check that the JSON file is in both the “source” and “destination” containers.
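
To verify from code instead, list the blobs in the “destination” container and check that demo.json appears:

```python
import os

from azure.storage.blob import BlobServiceClient

blob_service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# The copied file should now show up alongside the original in "source".
destination = blob_service.get_container_client("destination")
print([blob.name for blob in destination.list_blobs()])  # expect "demo.json"
```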

Conclusion

Azure Data Factory makes it easy to move and transform data between various sources and destinations. By following this guide, you can set up a data factory instance and create a pipeline to automate data movement. This functionality is essential for data analytics solutions.

Note: These steps are accurate at the time of writing and may change in the future as the Azure portal and Data Factory UI evolve.
