Seamless Data Synchronization With AWS DataSync

Akshay Waditke
Ankercloud Engineering
5 min read · Mar 15, 2022

AWS DataSync is a data transfer service that makes it easy to automate moving data between on-premises storage and AWS storage services.

AWS DataSync allows you to copy large datasets with millions of files, without having to build custom solutions with open source tools or license and manage expensive commercial network acceleration software. It automates the scheduling of transfer activities, validates copied data, and uses a purpose-built network protocol and multi-threaded architecture to achieve very high efficiency on the wire. You can use DataSync to migrate active data to AWS, archive data to free up on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the cloud for analysis and processing.

With the help of DataSync, we can copy data between:

On-premises data storage:

  • Network File System (NFS) file servers
  • Server Message Block (SMB) file servers
  • Hadoop Distributed File System (HDFS)
  • On-premises (self-managed) object storage

AWS data storage:

  • Amazon Simple Storage Service (Amazon S3) buckets
  • Amazon EFS file systems
  • Amazon FSx for Windows File Server file systems
  • Amazon FSx for Lustre file systems

Here are some useful features of DataSync:

  • A single DataSync agent is capable of saturating a 10 Gb/s network link.
  • Supports disaster recovery and replication of data for long-term archival.
  • DataSync is well suited to synchronizing data sets between environments that keep changing.
  • Data transfer is fast thanks to the multi-threaded architecture.
  • DataSync auto-scales cloud resources to support higher-volume transfers and makes it easy to add agents on-premises.
  • Uses AWS Direct Connect or internet links to AWS and is ideal for one-time data migrations, recurring data processing workflows, and automated replication for data protection and recovery.
  • All your data is encrypted in transit with TLS. For data at rest, DataSync supports default encryption for S3 buckets using Amazon S3-managed encryption keys (SSE-S3) as well as Amazon EFS file system encryption.
  • DataSync ensures that your data remains intact both in transit and at rest by checking integrity.
  • Task scheduling lets you run a task periodically to detect and copy changes from your source storage system to the destination.
  • DataSync supports VPC endpoints (powered by AWS PrivateLink) to move files directly into your Amazon VPC.
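
To put the 10 Gb/s figure from the list above in perspective, here is a rough back-of-the-envelope sketch that estimates the best-case time needed to move a dataset over a network link. The function name, the `utilization` parameter and the 100 TB example are hypothetical and only for illustration; real transfers are usually slower because of protocol overhead, file counts and storage throughput.

```python
def estimate_transfer_hours(
    dataset_bytes: int, link_gbps: float, utilization: float = 1.0
) -> float:
    """Best-case hours to move dataset_bytes over a link of link_gbps gigabits/s.

    utilization is the fraction of the link the transfer can actually use
    (1.0 means the link is fully saturated).
    """
    if link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_gbps must be > 0 and utilization in (0, 1]")
    total_bits = dataset_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * utilization
    return total_bits / usable_bits_per_second / 3600


# Hypothetical example: 100 TB (decimal) over a fully saturated 10 Gb/s link.
hours = estimate_transfer_hours(100 * 10**12, link_gbps=10)
print(f"{hours:.1f} hours")  # prints "22.2 hours"
```

The same function shows how quickly the estimate grows when only part of the link is available, for example by passing `utilization=0.5`.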

There are two ways to transfer data with DataSync:

  1. Transfer data between on-premises storage and AWS
  2. Transfer data between AWS storage services

Steps For Data Transfer:

  1. Deploy an agent: Deploy a DataSync agent and associate it with your AWS account via the Management Console or API. The agent will be used to access your NFS server or SMB file share to read data from it or write data to it.
  2. Create a data transfer task: Create a task by specifying the location of your data source and destination, and any options you want to use to configure the transfer, such as the desired task schedule.
  3. Start the transfer: Start the task and monitor data movement in the console or with Amazon CloudWatch.
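
Steps 2 and 3 above can also be scripted. The following is a minimal sketch, assuming the agent from step 1 is already deployed and activated and that data moves from an NFS server to an S3 bucket. The function name, the task name and all ARNs, hostnames and paths passed in are hypothetical placeholders; `client` is expected to behave like a boto3 DataSync client, and the method and parameter names reflect our understanding of that API and should be checked against the current documentation.

```python
from typing import Any


def create_and_start_transfer(
    client: Any,
    agent_arn: str,
    nfs_hostname: str,
    nfs_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
) -> str:
    """Create source/destination locations and a task, then start the task.

    Returns the ARN of the task execution that was started.
    """
    # Source location: the on-premises NFS export, read through the agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_hostname,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination location: an S3 bucket accessed via an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # Step 2: the task ties the source and destination together.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="nfs-to-s3-example",  # hypothetical task name
    )
    # Step 3: start one run (execution) of the task.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

With boto3 installed and AWS credentials configured, the client would typically be created with `boto3.client("datasync")` and passed in as the first argument. Taking the client as a parameter keeps the function easy to test with a stand-in object.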

Components of DataSync:

Agent: A virtual machine (VM) that’s used to read data from or write data to a self-managed location. An agent isn’t required when transferring between AWS storage services in the same AWS account.

The agent can be deployed on four platforms: VMware ESXi, KVM, Hyper-V, or Amazon EC2.

Location: Any source or destination location that’s used in the data transfer, such as Amazon S3, Amazon EFS, Amazon FSx for Windows File Server, Amazon FSx for Lustre, Network File System (NFS), Server Message Block (SMB), Hadoop Distributed File System (HDFS), or self-managed object storage.

Task: A source location and a destination location, and a configuration that defines how data is transferred.

A task always transfers data from the source to the destination.
The configuration can include options such as task scheduling, bandwidth limit, and so on. A task is the complete definition of a data transfer.
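
As an illustration of the configuration options just mentioned, the sketch below builds the bandwidth limit and schedule portion of a task definition as plain Python data. The helper name is hypothetical. The `Options`, `BytesPerSecond`, `Schedule` and `ScheduleExpression` keys, and the use of -1 for "no limit", follow our reading of the DataSync `CreateTask` API and should be verified against the current reference before use.

```python
from typing import Any, Dict, Optional


def build_task_settings(
    bandwidth_mbps: Optional[float] = None,
    schedule_expression: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments describing a bandwidth limit and a schedule.

    bandwidth_mbps is given in megabits per second; the API is assumed to
    expect bytes per second, with -1 meaning unlimited.
    """
    if bandwidth_mbps is None:
        bytes_per_second = -1
    else:
        # megabits/s -> bits/s -> bytes/s
        bytes_per_second = int(bandwidth_mbps * 1_000_000 / 8)

    settings: Dict[str, Any] = {"Options": {"BytesPerSecond": bytes_per_second}}
    if schedule_expression:
        settings["Schedule"] = {"ScheduleExpression": schedule_expression}
    return settings


# Hypothetical example: cap the transfer at 100 Mbit/s and run daily at 02:00 UTC.
settings = build_task_settings(100, "cron(0 2 * * ? *)")
print(settings["Options"]["BytesPerSecond"])  # prints 12500000
```

The resulting dictionary could be unpacked into a task creation call alongside the source and destination location ARNs.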

Task execution: An individual run of a task, which includes information such as the start time, end time, bytes written, and status.
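
A task execution can be tracked programmatically as well as in the console. The following is a minimal polling sketch, assuming a client object that behaves like a boto3 DataSync client. The helper name and the polling interval are hypothetical, and treating `SUCCESS` and `ERROR` as the terminal statuses reflects our understanding of the API rather than a guarantee.

```python
import time
from typing import Any, Callable, Dict

# Statuses assumed to mean the execution has finished.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    client: Any,
    task_execution_arn: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task execution until it finishes and return a short summary."""
    while True:
        response = client.describe_task_execution(
            TaskExecutionArn=task_execution_arn
        )
        if response["Status"] in TERMINAL_STATUSES:
            return {
                "status": response["Status"],
                "start_time": response.get("StartTime"),
                "bytes_written": response.get("BytesWritten"),
            }
        # Not finished yet: wait before asking again.
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the waiting behaviour can be replaced in tests; in normal use the default `time.sleep` is fine.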

Security is important when it comes to data, so let’s quickly understand security in DataSync:

Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data centre and network architecture that is built to meet the requirements of the most security-sensitive organizations. Security is a shared responsibility between AWS and you.

The shared responsibility model describes this as security of the cloud and security in the cloud:

  • Security of the cloud: AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of AWS security as part of the AWS compliance programs. To learn about the compliance programs that apply to AWS DataSync, see AWS services in scope by compliance program.
  • Security in the cloud: Your responsibility is determined by the AWS service that you use. You are also responsible for other factors including the sensitivity of your data, your company’s requirements, and applicable laws and regulations.

Benefits:

  • Automates management of processes and infrastructure
  • Easy to move data between on-premises storage and AWS
  • Automatic encryption of data
  • Up to 10 times faster than open-source tooling
  • Purpose-built network protocol and parallel, multi-threaded architecture
  • Move data cost-effectively with DataSync’s flat, per-gigabyte pricing

Conclusion:

AWS DataSync migrates data to AWS with end-to-end security, reduces the cost of moving on-premises data, rapidly migrates file and object data to the cloud, and eases data management.

If you are interested in knowing more about the benefits of AWS DataSync and how it can help you migrate data seamlessly, write to us at info@ankercloud.com and we will get back to you!
