Event Driven Transfers from Amazon S3 to Google Cloud Storage using Storage Transfer Service

Aman Puri
Google Cloud - Community
6 min read · Dec 13, 2022

Google Cloud Platform has officially announced the Public Preview of Event Driven Transfers in Storage Transfer Service. This lets us transfer objects from an Amazon S3 bucket to a Google Cloud Storage (GCS) bucket, or from one GCS bucket to another, in response to events whenever a new object is created in the source.

Current Scenario

Until now, Storage Transfer Service could only run transfer jobs on a particular schedule. With a minimum interval of 1 hour between repeated runs, this works for periodic batch copies, but it is not fast enough for customers who need to continuously sync data between Amazon S3 and GCS. Customers with a multi-cloud ecosystem need objects transferred to GCS as soon as they are created or modified in the source, which was not possible.

How does Event Driven Transfer Work?

Storage Transfer Service can listen to event notifications in AWS or Google Cloud to automatically transfer data that has been added or updated in the source location.

Benefits of Event Driven Transfer

Because event-driven transfers listen for changes to the source bucket, updates are copied to the destination in near-real time. Storage Transfer Service doesn’t need to execute a list operation against the source, saving time and money.

Use cases include:

  • Event-driven analytics: Replicate data from AWS to Cloud Storage to perform analytics and processing.
  • Cloud Storage replication: Enable automatic, asynchronous object replication between Cloud Storage buckets.
  • DR/HA setup: Replicate objects from a source to a backup destination on the order of minutes.
  • Live migration: Event-driven transfer can power low-downtime migration, on the order of minutes of downtime, as a follow-up step to a one-time batch migration.

Steps to set up Event Driven Transfers between Amazon S3 and GCS:

Prerequisites include:

  1. An Amazon S3 bucket as a source
  2. Google Cloud Storage bucket as a destination

Steps:

  1. In the AWS console, go to Simple Queue Service (SQS) and click Create queue.
AWS page for SQS

2. Enter a name for the queue, scroll down to the Access policy section, and click Advanced. You should see a JSON object similar to the one below. Note the Principal.AWS and Resource values from that JSON.

Access Policy section while creating an SQS queue

3. In the JSON template below, replace PRINCIPAL.AWS and RESOURCE_ARN with the Principal.AWS and Resource values you just noted. Also replace S3_BUCKET_ARN with your source S3 bucket's Amazon Resource Name, or ARN (you can find the ARN in the S3 bucket's Properties section). Paste the resulting JSON into the Access policy section and click Create queue.

{
  "Version": "2012-10-17",
  "Id": "example-ID",
  "Statement": [
    {
      "Sid": "example-statement-ID",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "RESOURCE_ARN",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "PRINCIPAL.AWS"
        },
        "ArnLike": {
          "aws:SourceArn": "S3_BUCKET_ARN"
        }
      }
    }
  ]
}

This queue will receive a notification whenever an object is created or modified in the source S3 bucket.
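If you would rather script the queue creation, the sketch below does the equivalent with boto3. The region, account ID, and names are hypothetical placeholders; note that an SQS queue's ARN is deterministic (arn:aws:sqs:REGION:ACCOUNT:QUEUE_NAME), so the policy can reference it before the queue exists.

import json
import boto3

# Hypothetical placeholder values; substitute your own.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
QUEUE_NAME = "event-queue"
S3_BUCKET_ARN = "arn:aws:s3:::my-source-bucket"

# A queue ARN is deterministic, so it can be built before the queue exists.
queue_arn = f"arn:aws:sqs:{REGION}:{ACCOUNT_ID}:{QUEUE_NAME}"

# Access policy equivalent to the JSON template above.
policy = {
    "Version": "2012-10-17",
    "Id": "example-ID",
    "Statement": [
        {
            "Sid": "example-statement-ID",
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "SQS:SendMessage",
            "Resource": queue_arn,
            "Condition": {
                "StringEquals": {"aws:SourceAccount": ACCOUNT_ID},
                "ArnLike": {"aws:SourceArn": S3_BUCKET_ARN},
            },
        }
    ],
}

sqs = boto3.client("sqs", region_name=REGION)
resp = sqs.create_queue(QueueName=QUEUE_NAME, Attributes={"Policy": json.dumps(policy)})
print("Queue URL:", resp["QueueUrl"])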

4. Once the SQS queue is created, note the ARN of the queue, which has a format similar to the one below:

arn:aws:sqs:us-east-1:1234567890:event-queue
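If you created the queue programmatically, you can also look up its ARN with boto3 instead of copying it from the console; the queue URL below is a hypothetical placeholder carried over from the previous sketch:

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # assumed region

# The queue URL comes from the create_queue response in the previous sketch.
attrs = sqs.get_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/event-queue",
    AttributeNames=["QueueArn"],
)
print(attrs["Attributes"]["QueueArn"])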

5. Next, enable event notifications on Amazon S3. Go to the source S3 bucket and open Properties. Scroll down to Event notifications and click Create event notification.

Create an event notification in the Source S3 bucket

6. Specify a name for this event. In the Event types section, select All object create events. Under Destination, select SQS queue and choose the queue you created for this transfer. Click Save changes.
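The same notification can be configured with boto3, as sketched below; the bucket name, notification ID, and queue ARN are placeholders. One caveat worth knowing: put_bucket_notification_configuration replaces the bucket's entire notification configuration, so include any existing configurations in the call as well.

import boto3

s3 = boto3.client("s3", region_name="us-east-1")  # assumed region

# "s3:ObjectCreated:*" corresponds to "All object create events" in the console.
s3.put_bucket_notification_configuration(
    Bucket="my-source-bucket",  # hypothetical bucket name
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "Id": "sts-event-transfer",  # hypothetical notification name
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:event-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)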

7. Create an AWS Access Key ID and Secret Access Key OR a Federated Identity ARN with sufficient permissions to run the job; you can follow the steps here. You can also set the custom permissions required to run the job using the JSON below, replacing AWS_BUCKET_NAME and AWS_QUEUE_ARN with the source S3 bucket name and the SQS queue ARN (a scripted sketch of this step appears after the list below).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:ChangeMessageVisibility",
        "sqs:ReceiveMessage",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::AWS_BUCKET_NAME",
        "arn:aws:s3:::AWS_BUCKET_NAME/*",
        "AWS_QUEUE_ARN"
      ]
    }
  ]
}

Once created, note the following information:

  • For a user, note the access key ID and secret key.
  • For a Federated Identity role, note the Amazon Resource Name (ARN), which has the format arn:aws:iam::AWS_ACCOUNT:role/ROLE_NAME.
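As mentioned in step 7, here is a minimal boto3 sketch that creates the policy, a dedicated user, and an access key. The user and policy names are hypothetical, and the permissions mirror the JSON policy above:

import json
import boto3

iam = boto3.client("iam")

# Hypothetical placeholder values.
BUCKET = "my-source-bucket"
QUEUE_ARN = "arn:aws:sqs:us-east-1:123456789012:event-queue"

policy_doc = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:DeleteMessage",
                "sqs:ChangeMessageVisibility",
                "sqs:ReceiveMessage",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
            ],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
                QUEUE_ARN,
            ],
        }
    ],
}

policy = iam.create_policy(
    PolicyName="sts-event-transfer-policy",  # hypothetical name
    PolicyDocument=json.dumps(policy_doc),
)
iam.create_user(UserName="sts-transfer-user")  # hypothetical name
iam.attach_user_policy(
    UserName="sts-transfer-user",
    PolicyArn=policy["Policy"]["Arn"],
)
key = iam.create_access_key(UserName="sts-transfer-user")
print("Access key ID:", key["AccessKey"]["AccessKeyId"])
print("Secret access key:", key["AccessKey"]["SecretAccessKey"])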

8. In the GCP console, create a transfer job with Amazon S3 as the source and Google Cloud Storage as the destination, and click Next step.

9. For the source, enter the S3 bucket name (with a path, if any), provide the Access Key ID and Secret Access Key or the Federated Identity role ARN, and click Next step.

10. For the destination, choose the destination GCS bucket and click Next step.

11. In Choose how and when to run job, switch the transfer type to Event-driven and enter the SQS queue ARN. You can also schedule when the transfer should start listening to the SQS queue and, optionally, when it should stop. Click Next step to modify any additional settings provided by Storage Transfer Service, or click Create to create the job.
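Steps 8 to 11 can also be scripted with the google-cloud-storage-transfer Python client instead of the console. The sketch below is a minimal, assumed version: the project, buckets, credentials, and queue ARN are placeholders, and the event_stream field is what makes the job event-driven rather than scheduled.

from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# All values below are hypothetical placeholders.
job = {
    "project_id": "my-gcp-project",
    "status": storage_transfer.TransferJob.Status.ENABLED,
    "transfer_spec": {
        "aws_s3_data_source": {
            "bucket_name": "my-source-bucket",
            "aws_access_key": {
                "access_key_id": "AWS_ACCESS_KEY_ID",
                "secret_access_key": "AWS_SECRET_ACCESS_KEY",
            },
        },
        "gcs_data_sink": {"bucket_name": "my-destination-bucket"},
    },
    # The SQS queue ARN goes in the event stream; this replaces a schedule.
    "event_stream": {
        "name": "arn:aws:sqs:us-east-1:123456789012:event-queue",
    },
}

response = client.create_transfer_job({"transfer_job": job})
print("Created job:", response.name)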

12. Storage Transfer Service will now listen to the SQS queue. Whenever an object is created or modified in the Amazon S3 bucket, the service will synchronise it with the destination GCS bucket, transferring the newly created or modified S3 objects to GCS almost immediately.
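To sanity-check the pipeline end to end, you can upload a test object to S3 and poll the destination GCS bucket until it appears. A rough sketch with hypothetical bucket and object names:

import time

import boto3
from google.cloud import storage

# Upload a test object to the source S3 bucket (hypothetical names).
boto3.client("s3").put_object(
    Bucket="my-source-bucket", Key="test/hello.txt", Body=b"hello"
)

# Poll the destination GCS bucket until the object is replicated.
gcs_bucket = storage.Client().bucket("my-destination-bucket")
for _ in range(60):  # poll for up to ~5 minutes
    if gcs_bucket.blob("test/hello.txt").exists():
        print("Object replicated to GCS")
        break
    time.sleep(5)
else:
    print("Object not seen yet; check the transfer job status in the console")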

IMPORTANT NOTE: This service is currently in Public Preview and only supports S3-to-GCS and GCS-to-GCS transfers.

Thanks for Reading! 😊

References:

You can read about Event Driven Transfers in the official documentation.

Storage Transfer Service Release Notes
