Migration of data from an on-premises NFS file share to AWS S3 using AWS DataSync

chinmay mandal
7 min read · Apr 28, 2024


What is AWS DataSync?

AWS DataSync is a managed data transfer service that helps migrate data from on-premises storage to the AWS Cloud, between clouds, or within the AWS Cloud.

It lets us quickly, easily, and securely transfer file or object data to, from, and between AWS storage services.

In this walkthrough, we are going to migrate data from an on-premises NFS (NAS) share to Amazon S3.

Architecture to follow

  1. Web browser to DataSync agent, TCP 80: used by your computer to obtain the agent activation key. After successful activation, DataSync closes the agent’s port 80.
  2. DataSync agent to DataSync service, TCP 1024–1064: control traffic between the DataSync agent and the AWS service.
  3. DataSync agent to DataSync service, TCP 443: data transfer from the DataSync VM to the AWS service.
  4. DataSync service to Amazon S3, TCP 443.
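The port rules above can be sketched with the AWS CLI. This is only an illustration; the security group IDs and the workstation CIDR below are placeholders you would replace with your own values:

```shell
# Placeholder security group IDs -- replace with your own.
AGENT_SG=sg-0agent00000000000
ENDPOINT_SG=sg-0endpoint0000000

# (2) Control traffic: agent -> DataSync VPC endpoint, TCP 1024-1064
aws ec2 authorize-security-group-ingress \
  --group-id "$ENDPOINT_SG" \
  --protocol tcp --port 1024-1064 \
  --source-group "$AGENT_SG"

# (3) Data transfer: agent -> DataSync VPC endpoint, TCP 443
aws ec2 authorize-security-group-ingress \
  --group-id "$ENDPOINT_SG" \
  --protocol tcp --port 443 \
  --source-group "$AGENT_SG"

# (1) Activation: your workstation -> agent, TCP 80
aws ec2 authorize-security-group-ingress \
  --group-id "$AGENT_SG" \
  --protocol tcp --port 80 \
  --cidr 203.0.113.10/32  # replace with your workstation's public IP
```

These commands require an AWS account and credentials; run them only after adjusting the IDs.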

Prerequisites

  1. AWS Accounts
  2. AWS VPC with subnets and routing
  3. S3 bucket
  4. S3 and AWS DataSync VPC endpoints

Steps to Follow

We are going to install the DataSync agent on an EC2 instance. The agent communicates with the DataSync service through the DataSync VPC endpoint, and with S3 through the S3 VPC endpoint.

Step 1: Verify S3 bucket creation

Create a simple S3 bucket that will serve as the migration destination.
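If you prefer the CLI to the console, the bucket can be created with a single command (the bucket name below is a placeholder and must be globally unique):

```shell
# Create the destination bucket in us-east-1 (placeholder name).
aws s3api create-bucket \
  --bucket my-datasync-demo-bucket \
  --region us-east-1
```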

Step 2: Verify the S3 and AWS DataSync VPC endpoint

In the VPC console, create a VPC endpoint. Search for the service name datasync and follow the wizard to create the DataSync VPC endpoint.

AWS DataSync VPC endpoint

In the AWS DataSync VPC endpoint’s security group, make sure to allow inbound TCP 443 and TCP 1024–1064 from your DataSync agent.

Similarly, create an S3 interface VPC endpoint (service name s3, endpoint type Interface).

S3 VPC Endpoint
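Both endpoints can also be created from the CLI. The sketch below assumes a single subnet and security group; the VPC, subnet, and security group IDs are placeholders:

```shell
# Placeholder IDs -- replace with your own.
VPC_ID=vpc-0123456789abcdef0
SUBNET_ID=subnet-0123456789abcdef0
SG_ID=sg-0123456789abcdef0
REGION=us-east-1

# DataSync interface VPC endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id "$VPC_ID" \
  --vpc-endpoint-type Interface \
  --service-name "com.amazonaws.$REGION.datasync" \
  --subnet-ids "$SUBNET_ID" \
  --security-group-ids "$SG_ID"

# S3 interface VPC endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id "$VPC_ID" \
  --vpc-endpoint-type Interface \
  --service-name "com.amazonaws.$REGION.s3" \
  --subnet-ids "$SUBNET_ID" \
  --security-group-ids "$SG_ID"
```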

Step 3: Install the AWS DataSync agent on EC2

Create the EC2 instance:

AWS provides a dedicated DataSync AMI; first create an EC2 instance from it.

Use the following AWS CLI command to get the latest DataSync Amazon Machine Image (AMI) ID for your AWS Region.

aws ssm get-parameter --name /aws/service/datasync/ami --region us-east-1
{
    "Parameter": {
        "Name": "/aws/service/datasync/ami",
        "Type": "String",
        "Value": "ami-02ffe626aeecac317",
        "Version": 104,
        "LastModifiedDate": "2024-04-24T16:58:18.766000+00:00",
        "ARN": "arn:aws:ssm:us-east-1::parameter/aws/service/datasync/ami",
        "DataType": "text"
    }
}
Navigate to the EC2 console and create an instance from the AMI fetched above.

Fill in the instance name and choose the fetched AMI.

Choose m5.xlarge as the instance type, and select the required VPC and subnet.

Choose a security group that allows inbound TCP 80 from your web browser (for activation) and the ports required to reach your NFS or SMB file share.

The EC2 instance is now created.
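The console steps above can also be scripted: fetch the AMI ID from the public SSM parameter, then launch the instance. The subnet, security group, and key pair names are placeholders:

```shell
# Fetch the current DataSync AMI ID from the public SSM parameter.
AMI_ID=$(aws ssm get-parameter \
  --name /aws/service/datasync/ami \
  --region us-east-1 \
  --query 'Parameter.Value' --output text)

# Launch the agent instance (subnet, SG, and key name are placeholders).
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m5.xlarge \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --key-name my-key-pair \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datasync-agent}]'
```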

Install the agent on the EC2 instance

Navigate to AWS DataSync, click Transfer data, then click Agents.

Choose Deploy agent as Amazon EC2 instance, choose VPC endpoint as the endpoint type, and select the earlier created VPC endpoint from the dropdown.

Then choose the subnet and the earlier created security group.

For the activation key there are two options; we are going to choose Automatically get the activation key from your agent. Provide the agent’s IP address and make sure it is reachable from your browser over port 80.

Click Get key. DataSync retrieves the activation key and moves to the next page with the key filled in.

Click Create agent.
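Registering the agent can also be done from the CLI once you have the activation key. Every ARN and the key below are placeholders; with a VPC endpoint deployment, the endpoint ID, subnet ARN, and security group ARN are all required:

```shell
# Placeholder activation key and ARNs -- replace with your own.
aws datasync create-agent \
  --agent-name nfs-migration-agent \
  --activation-key AAAAA-BBBBB-CCCCC-DDDDD-EEEEE \
  --vpc-endpoint-id vpce-0123456789abcdef0 \
  --subnet-arns arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0123456789abcdef0 \
  --security-group-arns arn:aws:ec2:us-east-1:111122223333:security-group/sg-0123456789abcdef0
```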

Step 4: Create a source location for AWS DataSync

Now that the DataSync agent is created, navigate to Locations under Data transfer and click Create location.

Our source location is an on-premises NFS share. Provide the IP address or domain name of the NFS server, along with the mount path.

Click on create.
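The equivalent CLI call is sketched below; the NFS server address, export path, and agent ARN are placeholders:

```shell
# Create the NFS source location (placeholder server, path, and agent ARN).
aws datasync create-location-nfs \
  --server-hostname 192.168.1.50 \
  --subdirectory /export/data \
  --on-prem-config AgentArns=arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0
```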

Step 5: Create a destination location for AWS DataSync

Our destination is Amazon S3.

Navigate again to Locations under Data transfer and click Create location.

Choose Amazon S3 as the location type and select the earlier created bucket from the dropdown. Choose an IAM role with the permissions below, or let DataSync generate one.

Click on create location.

The IAM role needs the following S3 permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSDataSyncS3BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "Account-Number"
                }
            }
        },
        {
            "Sid": "AWSDataSyncS3ObjectPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:AbortMultipartUpload",
                "s3:DeleteObject",
                "s3:GetObject",
                "s3:GetObjectTagging",
                "s3:GetObjectVersion",
                "s3:GetObjectVersionTagging",
                "s3:ListMultipartUploadParts",
                "s3:PutObject",
                "s3:PutObjectTagging"
            ],
            "Resource": "arn:aws:s3:::<bucket-name>/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "Account-Number"
                }
            }
        }
    ]
}
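With the role in place, the S3 destination location can also be created from the CLI. The bucket ARN, prefix, and role ARN below are placeholders:

```shell
# Create the S3 destination location (placeholder bucket, prefix, and role).
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-datasync-demo-bucket \
  --subdirectory /migrated \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-access-role
```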

Step 6: Create and start your AWS DataSync task

Now that the source and destination locations are created, let’s create the migration task.

Navigate to Tasks under Data transfer and click Create task.

Choose an existing location as the source location and click Next.

Choose an existing location as the destination location and click Next.

Provide a task name. You can transfer everything from the source or only specific files and objects, and you can also specify patterns to exclude.

Under transfer options, choose the settings required for your data transfer.

You can schedule the task to run automatically, or run it manually.

You can optionally generate a detailed report for the DataSync transfer task.

You can also specify a CloudWatch log group to track logs about errors and transfers.

Now the data transfer task has been created.
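The same task can be created and started from the CLI. The location, log group, and task ARNs below are placeholders; the options shown correspond to the "transfer only data that has changed" behavior used later in this walkthrough:

```shell
# Create the task (placeholder ARNs; TransferMode=CHANGED skips unchanged data).
aws datasync create-task \
  --name nfs-to-s3-migration \
  --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-source0000000000 \
  --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-dest00000000000 \
  --options VerifyMode=ONLY_FILES_TRANSFERRED,TransferMode=CHANGED \
  --cloud-watch-log-group-arn arn:aws:logs:us-east-1:111122223333:log-group:/aws/datasync

# Kick off an execution manually (equivalent to "Start with defaults").
aws datasync start-task-execution \
  --task-arn arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0
```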

Testing:

To test, let’s upload a few files to the source NFS location.

Verify that the destination doesn’t have any objects or files yet.

Click Start with defaults.

Now navigate to the History section to check on the execution.

The history shows the 4 files that were transferred.

Verify the S3 bucket; you will be able to see the transferred files along with the metadata file.

We got a few errors as well.

Check the CloudWatch logs for more details on the errors.

Below are the CloudWatch logs for a successful data transfer.
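Execution status and errors can also be inspected from the CLI. The execution ARN and log group name below are placeholders:

```shell
# Check the status and byte counts of a task execution (placeholder ARN).
aws datasync describe-task-execution \
  --task-execution-arn arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0/execution/exec-0123456789abcdef0

# Search the log group for transfer errors (placeholder log group name).
aws logs filter-log-events \
  --log-group-name /aws/datasync \
  --filter-pattern ERROR
```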

Below is a screenshot of a transfer run on changed data only: DataSync skips data that was already transferred, because we chose Transfer only data that has changed earlier.

Transfer only data that has changed

Conclusion:

AWS DataSync is one of the best services for migrating files or objects from an on-premises NFS or SMB location to the AWS Cloud. The scenario we demonstrated above migrates data from NFS to S3. During the migration, make sure you have opened the correct ports for communication, and that you have set the correct permissions on the NFS share.


References

  1. https://docs.aws.amazon.com/datasync/latest/userguide/what-is-datasync.html
