Step-by-Step Guide to Data Migration with AWS Storage Gateway File Gateway (AWS Academy)
In this guide, we will use the AWS Storage Gateway File Gateway service to attach a Network File System (NFS) mount to an on-premises data store. We will then replicate that data to an S3 bucket in AWS. Additionally, we will configure advanced Amazon S3 features, like Amazon S3 lifecycle policies and cross-Region replication.
Outcomes after completing this guide:
- Configure a File Gateway with an NFS file share and attach it to a Linux instance
- Migrate a set of data from the Linux instance to an S3 bucket
- Create and configure a primary S3 bucket to migrate on-premises server data to AWS
- Create and configure a secondary S3 bucket to use for cross-Region replication
- Create an S3 lifecycle policy to automatically manage data in a bucket
Before jumping in further, let's review the basics of AWS Storage Gateway and File Gateway.
- AWS Storage Gateway:
->AWS Storage Gateway is the overarching service provided by Amazon Web Services.
->It offers several gateway types, each tailored to a different storage use case: File Gateway, Volume Gateway (which provides block storage volumes), and Tape Gateway (which provides a virtual tape library, or VTL, interface).
->It enables you to seamlessly integrate your on-premises applications with AWS cloud storage services such as Amazon S3 and Amazon S3 Glacier.
- File Gateway:
->File Gateway is a specific type of gateway provided by AWS Storage Gateway.
->It presents a file interface to the on-premises applications, allowing us to store files as objects in Amazon S3 buckets.
->It supports industry-standard file protocols such as NFS (Network File System) and SMB (Server Message Block).
->File Gateway is ideal for scenarios where you need to extend existing on-premises file storage to the cloud without modifying your applications.
Task 1: Reviewing the lab architecture
- This lab environment uses a total of three AWS Regions. A Linux EC2 instance that emulates an on-premises server is deployed to the us-east-1 (N. Virginia) Region. The Storage Gateway virtual appliance is deployed to the same Region as the Linux server. In a real-world scenario, the appliance would be deployed in a VMware vSphere or Microsoft Hyper-V environment, or as a physical Storage Gateway appliance.
- The primary S3 bucket is created in the us-east-2 (Ohio) Region. Data from the Linux host is copied to the primary S3 bucket. This bucket can also be called the source.
- The secondary S3 bucket is created in the us-west-2 (Oregon) Region. This secondary bucket is the target for the cross-Region replication policy. It can also be called the destination.
Task 2: Creating the primary and secondary S3 buckets
- Before we configure the File Gateway, we must create the primary S3 bucket (the source) to which the data will be migrated. We will also create the secondary bucket (the destination) that will be used for cross-Region replication.
- In the search box to the right of Services, search for and choose S3 to open the S3 console.
- Choose Create bucket then configure these settings:
->Bucket name: Create a name that you can remember easily. It must be globally unique.
->Region: US East (Ohio) us-east-2
->Bucket Versioning: Enable
- For cross-Region replication, we must enable versioning on both the source and destination buckets. But why?
->Ensuring Consistency Across Regions: Versioning in Amazon S3 allows you to keep multiple versions of an object in the same bucket. When cross-region replication is enabled, versioning ensures that all versions of objects, including any subsequent updates or deletions, are replicated to the destination bucket in the target region. This helps maintain data consistency across regions.
->Handling Concurrent Updates: Versioning helps in scenarios where multiple users or applications might be updating the same object concurrently. With versioning enabled, each update to an object generates a new version, allowing Amazon S3 to accurately replicate these changes to the destination bucket in the target region.
->Recovery and Rollback: Versioning provides a built-in mechanism for recovering from accidental deletion or modification of objects. If an object is deleted or overwritten, the previous versions are retained in the bucket. This capability is valuable when replicating data across regions, as it allows you to roll back to a previous version of an object in case of unintended changes or deletions.
->Compliance and Data Governance: Versioning helps organizations meet compliance requirements and enforce data governance policies. By retaining multiple versions of objects, organizations can track changes over time and maintain a complete audit trail of data modifications, which is especially important when replicating data across regions for disaster recovery or compliance purposes.
- Choose Create bucket
- Repeat the previous steps in this task to create a second bucket with the following configuration:
->Bucket name: Create a name you can easily remember. It must be globally unique.
->Region: US West (Oregon) us-west-2
->Versioning: Enable
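- As an alternative to the console, both buckets could be created and versioned from the AWS CLI. The following is a minimal sketch; the bucket names my-crr-source-bucket and my-crr-destination-bucket are placeholders that you would replace with your own globally unique names:
# Create the source bucket in us-east-2 and turn on versioning
aws s3api create-bucket --bucket my-crr-source-bucket --region us-east-2 --create-bucket-configuration LocationConstraint=us-east-2
aws s3api put-bucket-versioning --bucket my-crr-source-bucket --versioning-configuration Status=Enabled
# Create the destination bucket in us-west-2 and turn on versioning
aws s3api create-bucket --bucket my-crr-destination-bucket --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
aws s3api put-bucket-versioning --bucket my-crr-destination-bucket --versioning-configuration Status=Enabled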
Task 3: Enabling cross-Region replication
- Now that we have created our two S3 buckets and enabled versioning on them, we can create a replication policy.
- Select the name of the source bucket that we created in the US East (Ohio) Region.
- Select the Management tab and under Replication rules select Create replication rule
- Configure the Replication rule:
->Replication rule name: crr-full-bucket
->Status: Enabled
->Source bucket:
=>For Choose a rule scope, select Apply to all objects in the bucket
->Destination:
=>Choose a bucket in this account
=>Choose Browse S3 and select the bucket we created in the US West (Oregon) Region.
=>Select Choose path
=>IAM role: S3-CRR-Role
=>Note: To find the AWS Identity and Access Management (IAM) role, in the search box, enter: S3-CRR (this role was pre-created with the required permissions for this lab)
- Choose Save. When prompted about whether to replicate existing objects, choose No, and then choose Submit
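- For reference, the same rule can be applied with the AWS CLI. This is a hedged sketch: the bucket names, account ID, and the replication.json file name are placeholders, and S3-CRR-Role is the pre-created lab role referenced above:
# replication.json
# {
#   "Role": "arn:aws:iam::<account-id>:role/S3-CRR-Role",
#   "Rules": [{
#     "ID": "crr-full-bucket", "Status": "Enabled", "Priority": 1, "Filter": {},
#     "DeleteMarkerReplication": { "Status": "Disabled" },
#     "Destination": { "Bucket": "arn:aws:s3:::my-crr-destination-bucket" }
#   }]
# }
aws s3api put-bucket-replication --bucket my-crr-source-bucket --replication-configuration file://replication.json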
- Return to and select the link to the bucket we created in the US East (Ohio) Region.
- Choose Upload to upload a file from the local computer to the bucket. For this lab, use a small file that does not contain sensitive information, such as a blank text file.
- Choose Add files, locate and open the file, then choose Upload
- Wait for the file to upload, then choose Close. Return to the bucket we created in the US West (Oregon) Region.
- The file that we uploaded should also now have been copied to this bucket.
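- The upload and the replication check can also be done from the AWS CLI; a quick sketch, assuming the placeholder bucket names used earlier and a local file named test-file.txt:
aws s3 cp test-file.txt s3://my-crr-source-bucket/
# Replication is asynchronous, so wait briefly, then list the destination bucket
aws s3 ls s3://my-crr-destination-bucket/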
Task 4: Configuring the File Gateway and creating an NFS file share
- In this task, we will deploy the File Gateway appliance as an Amazon Elastic Compute Cloud (Amazon EC2) instance. We will then configure a cache disk, select an S3 bucket to synchronize the on-premises files to, and select an IAM policy to use. Finally, we will create an NFS file share on the File Gateway.
- In the search box to the right of Services, search for and choose Storage Gateway to open the Storage Gateway console.
- At the top-right of the console, verify that the current Region is N. Virginia.
- Choose Create gateway then begin configuring the Step 1: Set up gateway settings:
->Gateway name: File Gateway
->Gateway time zone: Choose GMT -5:00 Eastern Time (US & Canada), Bogota, Lima
->Gateway type: Amazon S3 File Gateway
->Host platform: Choose Amazon EC2, then choose Customize your settings, and then choose the Launch instance button. (A new tab opens to the EC2 instance launch wizard. This link automatically selects the correct Amazon Machine Image (AMI) that must be used for the File Gateway appliance.)
- In the Launch an instance screen, begin configuring the gateway as described:
->Name:File Gateway Appliance
->AMI from catalog: Accept the default aws-storage-gateway AMI.
->Instance type: Select the t2.xlarge instance type
->Key pair name — required: Choose the existing vockey key pair.
- Configure the network and security group settings for the gateway.
->Next to Network settings, choose Edit, then configure:
=>VPC: On-Prem-VPC
=>Subnet: On-Prem-Subnet
=>Auto-assign public IP: Enable
=>Under Firewall (security groups), choose Select an existing security group.
->For Common security groups:
=>Select the security group with FileGatewayAccess in the name
=>Note: This security group is configured to allow traffic through ports 80 (HTTP), 443 (HTTPS), 53 (DNS), 123 (NTP), and 2049 (NFS). These ports enable the activation of the File Gateway appliance. They also enable connectivity from the Linux server to the NFS share that we will create on the File Gateway.
=>Also select the security group with OnPremSshAccess in the name
=>Note: This security group is configured to allow Secure Shell (SSH) connections on port 22.
=>Verify that both security groups now appear as selected (details on each will appear in boxes in the console).
- Configure the storage settings for the gateway.
->In the Configure storage panel, notice that there is already an entry to create one 80 GiB root volume.
->Choose Add new volume
->Set the size of the new EBS volume to 150 GiB
- Finish creating the gateway.
->In the Summary panel on the right, keep the number of instances set to 1, and choose Launch instance
- Monitor the status of the deployment and wait for Status Checks to complete.
- Select the File Gateway instance, then in the Details tab below, locate the Public IPv4 address and copy it. We will use this IP address when we complete the File Gateway deployment.
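- If you prefer the command line, the same public IP can be retrieved with a describe-instances query; a sketch that assumes the instance Name tag is File Gateway Appliance, as configured above:
aws ec2 describe-instances --region us-east-1 --filters "Name=tag:Name,Values=File Gateway Appliance" --query "Reservations[].Instances[].PublicIpAddress" --output text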
- Return to the AWS Storage Gateway tab in the browser. It should still be at the Set up gateway on Amazon EC2 screen.
- Check the box next to I completed all the steps above and launched the EC2 instance, then choose Next
- Configure the Step 2: Connect to AWS settings:
->In the Gateway connection options:
=>For IP address, paste in the Public IPv4 address that we copied from the File Gateway Appliance instance
->For the Service endpoint, select Publicly accessible.
->Choose Next
- In the Step 3: Review and activate settings screen, choose Activate gateway
- Configure the Step 4: Configure gateway settings:
->CloudWatch log group: Deactivate logging
->CloudWatch alarms: No Alarm
->A Successfully activated gateway File Gateway Appliance message displays.
->In the Configure cache storage panel, we will see a message that the local disks are loading.
->Wait for the local disks status to show that it finished processing (approximately 1 minute).
->Choose Configure
- Start creating a file share.
->Wait for File Gateway status to change to Running.
->From the left side panel, choose File shares.
->Choose Create file share
- On the Create file share screen, configure these settings:
->Gateway: Select the name of the File Gateway that we just created (which should be File Gateway)
->File share protocol: NFS
->Amazon S3 bucket name: Choose the name of the source bucket that we created in the US East (Ohio) us-east-2 Region in Task 2.
->Choose Customize configuration
->For File share name, use share and choose Next.
- On the Amazon S3 storage settings screen, configure these settings:
->Storage class for new objects: S3 Standard
->Access your S3 bucket: Use an existing IAM role
->IAM role: Paste the FgwIamPolicyARN, which we can retrieve by following these instructions –
=>Choose the Details dropdown menu above these instructions
=>Select Show
=>Copy the FgwIamPolicyARN value
->Choose Next
- In the File access settings screen, accept the default settings.
->Choose Next
- Scroll to the bottom of the Review and create screen, then select Create
- Monitor the status of the deployment and wait for Status to change to Available, which takes less than a minute.
- Select the file share that we just created by choosing the link.
- At the bottom of the screen, note the command to mount the file share on Linux. We will need it for the next task.
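- For reference, an equivalent NFS file share could be created with the AWS CLI. This is only a sketch; the client token, gateway ARN, account ID, role ARN, and bucket name shown here are placeholders you would replace with the values from your own environment:
aws storagegateway create-nfs-file-share \
  --client-token my-unique-token \
  --gateway-arn arn:aws:storagegateway:us-east-1:<account-id>:gateway/<gateway-id> \
  --role <FgwIamPolicyARN> \
  --location-arn arn:aws:s3:::my-crr-source-bucket \
  --file-share-name share \
  --default-storage-class S3_STANDARD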
Task 5: Mounting the file share to the Linux instance and migrating the data
- Before we can migrate data to the NFS share that we created, we must first mount the share. In this task, we will mount the NFS share on a Linux server, then copy data to the share.
- Connect to the On-Prem Linux Server instance using PuTTY.
- We should now be connected to the instance.
- On the Linux instance, to view the data that exists on this server, enter the following command:
ls /media/data
- We should see 20 image files in the .png format.
- Create the directory that will be used to synchronize data with the S3 bucket by using the following command:
sudo mkdir -p /mnt/nfs/s3
- The -p flag tells mkdir to create any missing parent directories: if /mnt/nfs already exists, only the s3 directory is created inside it; if /mnt/nfs doesn't exist, both /mnt/nfs and s3 are created.
- Mount the file share on the Linux instance by using the command that we located in the Storage Gateway file shares details screen at the end of the last task.
sudo mount -t nfs -o nolock,hard <File-Gateway-appliance-private-IP-address>:/share /mnt/nfs/s3
- The above command mounts the NFS share exported by the File Gateway appliance onto the local directory /mnt/nfs/s3, so we can access files and directories in the share as if they were part of the local filesystem. Replace <File-Gateway-appliance-private-IP-address> with the appliance's private IP; the nolock option disables NFS file locking, and hard makes the client keep retrying if the server becomes temporarily unreachable.
- Verify that the share was mounted correctly by entering the following command:
df -h
- The above command lists the mounted filesystems with human-readable sizes. Verify that the file share appears in the output, mounted at /mnt/nfs/s3.
- Now that we have created the mount point, we can copy the data that we want to migrate to Amazon S3 into the share by using this command:
cp -v /media/data/*.png /mnt/nfs/s3
- The above command copies all PNG files from /media/data to /mnt/nfs/s3; the -v flag prints each file name as it is copied.
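- Before checking in S3, we can sanity-check the copy on the Linux side by comparing file counts; a quick sketch:
ls /mnt/nfs/s3
# The two counts below should both report 20
ls /media/data/*.png | wc -l
ls /mnt/nfs/s3/*.png | wc -l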
Task 6: Verifying that the data is migrated
- We have finished configuring the gateway and copying data into the NFS share. Now, we will verify that the configuration works as intended.
- In the Services search box, search for and choose S3 to open the S3 console.
- Select the bucket that we created in the US East (Ohio) Region.
->Verify that the 20 image files are listed.
- Return to the Buckets page and select the bucket that we created in the US West (Oregon) Region.
->Verify that the image files were replicated to this bucket, based on the policy that we created earlier.
- Congratulations, we successfully migrated data to Amazon S3 by using AWS Storage Gateway in File Gateway mode! After the data is stored in Amazon S3, we can act on it like native Amazon S3 data. In this lab, we created a replication policy to copy the data to a secondary Region. We could also perform other operations, such as configuring a lifecycle policy. For example, we could automatically migrate infrequently used data from S3 Standard to Amazon S3 Glacier for long-term storage, which can reduce costs (a sketch of such a rule follows below).
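- As a follow-on example, such a lifecycle rule could be applied with the AWS CLI. This is a hedged sketch; the bucket name, the lifecycle.json file, and the 30/90-day thresholds are illustrative assumptions, not values from the lab:
# lifecycle.json
# {
#   "Rules": [{
#     "ID": "archive-old-images", "Status": "Enabled", "Filter": {},
#     "Transitions": [
#       { "Days": 30, "StorageClass": "STANDARD_IA" },
#       { "Days": 90, "StorageClass": "GLACIER" }
#     ]
#   }]
# }
aws s3api put-bucket-lifecycle-configuration --bucket my-crr-source-bucket --lifecycle-configuration file://lifecycle.json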
Key Learnings
- AWS Storage Gateway File Gateway Service: Understanding the functionality and utilization of the AWS Storage Gateway File Gateway service is crucial. This service enables seamless integration between on-premises environments and AWS cloud storage. Specifically, the File Gateway service provides a way to attach a Network File System (NFS) mount to an on-premises data store, allowing existing applications to access data stored in Amazon S3 as if it were local storage.
- Data Replication to S3 Bucket: Learning the process of replicating data from an on-premises Linux instance to an S3 bucket in AWS is essential for efficient data management. AWS Storage Gateway facilitates this replication by acting as a bridge between the on-premises environment and AWS cloud storage. By configuring a File Gateway with an NFS file share, users can migrate data from their Linux instances to S3 buckets seamlessly.
- Advanced Amazon S3 Features: Configuring advanced Amazon S3 features such as lifecycle policies and cross-Region replication enhances data management capabilities. Lifecycle policies enable automatic management of objects stored in S3 buckets, allowing users to define rules for transitioning data between storage classes or deleting objects based on predefined criteria. Cross-Region replication provides redundancy and disaster recovery capabilities by replicating data from a source bucket in one AWS Region to a destination bucket in another Region.
- Primary and Secondary S3 Buckets: Creating primary and secondary S3 buckets involves configuring bucket settings such as bucket name, region, and versioning. Enabling versioning ensures that multiple versions of objects are retained in the bucket, facilitating data recovery and compliance requirements. Primary buckets serve as the source for data replication, while secondary buckets act as the destination for replicated data, supporting cross-Region replication policies.
- Cross-Region Replication: Enabling cross-Region replication between S3 buckets involves creating replication rules to specify which objects are replicated and defining IAM roles to grant permissions for replication. Cross-Region replication enhances data durability and availability by replicating objects across multiple AWS Regions, providing resilience against Region-wide outages and improving data locality for global applications.
- File Gateway Configuration: Deploying and configuring the File Gateway appliance involves setting up gateway settings, network configurations, storage settings, and creating NFS file shares. By selecting an appropriate EC2 instance type, configuring network and security group settings, and provisioning storage volumes, users can ensure optimal performance and scalability of the File Gateway appliance. Creating NFS file shares allows on-premises applications to access data stored in S3 buckets via NFS protocols, enabling seamless integration with existing workflows.
- Data Migration: Mounting NFS file shares on Linux instances and migrating data to S3 buckets involves executing commands to mount the shares and copy data from local directories to the mounted shares. Verifying the data migration ensures that data is successfully transferred to the S3 buckets, allowing users to confirm the integrity and completeness of the migration process.
If you found value and enlightenment in this blog post, I encourage you to express your appreciation by giving it a clap! Also click the follow button to stay connected and receive notifications about future posts. Let’s venture on this path together and delve deeper into the fascinating realms of Cloud and Security.