File Transfer from Azure Blob to AWS S3: Step-by-Step Guide

Sarath Chandran
Litmus7 Systems Consulting
Sep 22, 2023

Nowadays, many organizations follow a multi-cloud approach: a strategy that leverages cloud computing services from at least two different cloud providers. This gives organizations the flexibility to optimize performance, control costs and avoid vendor lock-in. In such multi-cloud scenarios, we often reach a point where we need to transfer data from one cloud platform to another.

Here, we will walk through one such scenario: transferring data from Azure Blob Storage to an AWS S3 bucket.

At a high level, these are the steps:

  • Create a Python script to transfer the file from Blob Storage to S3 (the script is given at the end of this post).
  • Create an Azure Batch account and configure the Batch pool.
  • Create an ADF pipeline with a Custom activity, and connect it to Azure Batch to run the data transfer script.

Now, in more detail, here are the steps to follow:

  • Create an Azure Batch account by providing necessary details.
Fig 1: Create Azure Batch Account
Fig 2: Create Azure Batch Account
  • Go to the resource and, under the Pools tab, create a new pool by providing the required details as shown below (a scripted alternative for this pool setup is sketched after this list):
Fig 3: Create new Pool
Fig 4: Create new Pool
  • In the next step, make sure to provide at least 2 target dedicated nodes.
Fig 5: Create new Pool
Fig 6: Start task configuration
  • After this, submit the details.
  • Once the nodes are up and running and the allocation state becomes steady, the Azure Batch pool is ready to use.
  • From the Keys section of the Azure Batch account, copy the primary access key.
  • Now, in ADF, create a Custom activity and provide the Batch account details. The key and endpoint details are available in the Keys section of the Azure Batch account.
Fig 7: ADF Custom activity
Fig 8: ADF Custom activity Keys
  • Place the Python code for transferring the file to S3 inside a container in Azure ADLS or Blob Storage, and specify this linked service and path in the Custom activity's Settings tab. Also provide the command to trigger the Python script as shown below:
Fig 9: ADF Custom activity Settings
  • Note that the data copy script also requires additional Python packages for Azure, AWS (boto3) and so on. In that case, modify the pool's start task command as shown below:

cmd /c "python-3.11.4-amd64.exe /quiet InstallAllUsers=1 PrependPath=1 Include_test=0 && pip install azure-storage-blob && pip install s3fs && pip install pandas && pip install boto3"

  • Also, if the above start task modification is made to an existing pool in Azure Batch, make sure that the compute nodes are restarted for the change to take effect.
  • Job run status can be tracked from the Jobs tab in Azure Batch (a small script for checking this programmatically is sketched after this list).
Fig 10: Azure Batch Job status
  • Error logs, if any, will be available in the task's stderr.txt file, and standard output (for example, print statements) will be available in the stdout.txt file.
Fig 11: Azure Batch Job stdout
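
For those who prefer scripting over the portal, the pool setup described above can also be done with the azure-batch Python SDK. The snippet below is only a minimal sketch under assumptions that are not in the original post: a Windows Server 2022 Batch image, a placeholder pool id, VM size, account name/key/URL, and the same start task command shown earlier. Adjust these to match your environment, and note that the Python installer referenced in the start task must be supplied to the node (for example as a start task resource file).

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

# Placeholder account details (assumptions, not from the original post)
batch_account_name = "<batch_account_name>"
batch_account_key = "<batch_account_key>"
batch_account_url = "https://<batch_account_name>.<region>.batch.azure.com"

credentials = SharedKeyCredentials(batch_account_name, batch_account_key)
batch_client = BatchServiceClient(credentials, batch_url=batch_account_url)

# Windows pool with two dedicated nodes and the Python/package start task from above.
# The start task runs elevated so the Python installer can install for all users.
new_pool = batchmodels.PoolAddParameter(
    id="blob-to-s3-pool",
    vm_size="standard_d2s_v3",
    target_dedicated_nodes=2,
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="microsoftwindowsserver",
            offer="windowsserver",
            sku="2022-datacenter",
            version="latest"),
        node_agent_sku_id="batch.node.windows amd64"),
    start_task=batchmodels.StartTask(
        command_line='cmd /c "python-3.11.4-amd64.exe /quiet InstallAllUsers=1 PrependPath=1 '
                     'Include_test=0 && pip install azure-storage-blob && pip install s3fs && '
                     'pip install pandas && pip install boto3"',
        wait_for_success=True,
        user_identity=batchmodels.UserIdentity(
            auto_user=batchmodels.AutoUserSpecification(
                scope=batchmodels.AutoUserScope.pool,
                elevation_level=batchmodels.ElevationLevel.admin))))

batch_client.pool.add(new_pool)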
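
Job and task status, along with the stdout.txt and stderr.txt files mentioned above, can also be checked programmatically with the same SDK. This is a minimal sketch assuming a placeholder job id and the same placeholder account details; it is not part of the original walkthrough.

from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

credentials = SharedKeyCredentials("<batch_account_name>", "<batch_account_key>")
batch_client = BatchServiceClient(credentials, batch_url="https://<batch_account_name>.<region>.batch.azure.com")

job_id = "blob-to-s3-job"  # assumed job id, not from the original post

for task in batch_client.task.list(job_id):
    print(task.id, task.state)
    # stdout.txt holds print output; stderr.txt holds any error logs
    stdout_bytes = b"".join(batch_client.file.get_from_task(job_id, task.id, "stdout.txt"))
    print(stdout_bytes.decode("utf-8", errors="replace"))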

Python script

from azure.storage.blob import ContainerClient
import boto3

print("Reading data from Azure Blob...")

# Source (Azure Blob Storage) details
src_storage_accnt = "storage_accnt_name"
src_container = "container_name"
src_file = "source_file_name"
connect_str = "DefaultEndpointsProtocol=https;AccountName=<account_name>;AccountKey=<*********>==;EndpointSuffix=core.windows.net"

# Client scoped to the source container
container_client = ContainerClient.from_connection_string(conn_str=connect_str, container_name=src_container)

try:
    # Download the blob and read its full content into memory
    blob_data = container_client.download_blob(src_file)
    src_blob = blob_data.readall()
except Exception as e:
    print("Exception occurred while reading data from blob")
    raise Exception("Data read exception") from e

print("Data read from Azure Blob successfully...")
print("Writing data to AWS S3...")

# Target (AWS S3) details
tgt_bucket = "target_bucket_name"
tgt_directory = "target_directory_name"
tgt_file = "target_file_name"
tgt_user = "target_user_name"

try:
    access_key = "***********"
    secret_key = "*************************"
    aws_object_key = tgt_directory + "/" + tgt_file

    # Upload the blob content as a single S3 object
    s3 = boto3.client("s3", aws_access_key_id=access_key, aws_secret_access_key=secret_key)
    s3.put_object(Body=src_blob, Bucket=tgt_bucket, Key=aws_object_key)
except Exception as e:
    print("Exception occurred while writing data to bucket")
    raise Exception("Data write exception") from e

print("Data written to AWS S3 successfully...")
