Migrating S3 Buckets Across AWS Accounts 🌩

Out With the Old, In with the New

Amazon Simple Storage Service is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of data, at any time, from anywhere on the web.

A year ago, we decided to segregate our AWS account and migrate data to two new AWS accounts: DEV and PROD. We built Codeflow as a tool to deploy all our apps to Kubernetes, and migrating S3 buckets was one of the last steps before we could completely deprecate the legacy account. 💥🍾

How do you organize a file-dumping site that was used for three years with no structure or naming conventions defined or enforced? First you get some aspirin and a bucket of coffee, then head to the internet and start researching how to transfer 100+ buckets across accounts…

I did just that, and after some googling I found this article: How do I transfer ownership of Amazon S3 objects to a different AWS account?

I got my test up and running in no time and it worked like a charm. This is easy when you want to transfer a small number of buckets, but what do you do when you want to re-organize and transfer 100+ buckets? Automate everything!

The first thing I did was export all the bucket names to a spreadsheet and ask all stakeholders to fill in the blanks:

Then, we defined new names for the buckets following a new naming convention:

<region>-<org>-<project-name> => us-east-1-checkr-documents
PRO TIP: Start using naming conventions everywhere as early as possible!!!

When testing, the most time-consuming part was creating the policies and adding the right permissions for the copy user. I started building a simple CLI tool with Cobra; one thing led to another, and in a couple of hours I came out with a tool called s3-sync that automates all the steps with one command. It’s now open source and really easy to extend if you need any additional features.

The first thing you need to do is copy the example config config.yaml to config.prod.yaml and fill in the correct source and destination account information.

# Source account
account_number: 945671751555
aws_access_key_id: ...
aws_secret_access_key: ...
aws_region: us-east-1

# Destination account
account_number: 945671751555
aws_user: username
aws_access_key_id: ...
aws_secret_access_key: ...
aws_region: us-east-1

# Options
enable_bucket_versioning: true
sync_sse: AES256

# Bucket mapping (source-name: destination-name)
saso-test-1: us-east-1-checkr-saso-test-1
saso-test-2: us-east-1-checkr-saso-test-2

There are also two additional options you can enable on the destination buckets: versioning (enable_bucket_versioning) and encryption (sync_sse).

The last part of the config is the bucket mapping, source-name: destination-name. You can define as many buckets as you want, and if you created the spreadsheet you can copy the names straight from there.

Once you are done with the configuration, you can run go run main.go sync --config config.prod.yaml (I didn't create a binary, but you can always post a comment and if there is interest in using this tool …).

s3-sync uses the AWS CLI under the hood, so make sure you have it installed.

Under the hood…

1. A user policy is created and attached to the destination user with the required bucket permissions.

2. A new policy is set on the source bucket (if a policy already exists, the new Statement is merged with the existing policy).

3. Bucket versioning is set if enabled (with a little coding it’s possible to add other bucket features here).

4. The data is synced with the AWS CLI — aws s3 sync s3://saso-test-1 s3://us-east-1-checkr-saso-test-1
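For reference, those four steps can be sketched with plain AWS CLI commands. This is a hedged sketch, not the exact code s3-sync runs: the bucket names, user, and account number are the examples from the config above, the policy documents are minimal read-only examples, and DRY_RUN=1 prints each aws command instead of executing it.

```shell
#!/bin/sh
# Sketch of the four s3-sync steps using the AWS CLI directly.
# Names below are examples from the config; policies are illustrative.
SRC_BUCKET="saso-test-1"
DST_BUCKET="us-east-1-checkr-saso-test-1"
DST_USER="username"
DST_ACCOUNT="945671751555"
DRY_RUN=1

# With DRY_RUN=1, print the aws command instead of running it.
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# 1. User policy for the destination user, granting read access to the
#    source bucket.
cat > /tmp/user-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::${SRC_BUCKET}", "arn:aws:s3:::${SRC_BUCKET}/*"]
  }]
}
EOF
run aws iam put-user-policy --user-name "$DST_USER" \
  --policy-name s3-sync-copy --policy-document file:///tmp/user-policy.json

# 2. Bucket policy on the source bucket letting the destination user read it.
cat > /tmp/bucket-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::${DST_ACCOUNT}:user/${DST_USER}" },
    "Action": ["s3:ListBucket", "s3:GetObject"],
    "Resource": ["arn:aws:s3:::${SRC_BUCKET}", "arn:aws:s3:::${SRC_BUCKET}/*"]
  }]
}
EOF
run aws s3api put-bucket-policy --bucket "$SRC_BUCKET" \
  --policy file:///tmp/bucket-policy.json

# 3. Enable versioning on the destination bucket.
run aws s3api put-bucket-versioning --bucket "$DST_BUCKET" \
  --versioning-configuration Status=Enabled

# 4. Copy the data, encrypting objects at the destination.
run aws s3 sync "s3://${SRC_BUCKET}" "s3://${DST_BUCKET}" --sse AES256
```

Set DRY_RUN=0 (with real credentials configured) to actually execute the commands.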

The tool does not delete any buckets, but it does update policies, so be careful when running it in production. I suggest you first test it with example buckets and then move on to production data.
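When testing on example buckets, you can also preview what a sync would copy without transferring anything, using the AWS CLI’s --dryrun flag. A small sketch (the helper name and bucket names are just the examples from above, not part of s3-sync):

```shell
# Wrap `aws s3 sync` so a non-destructive preview is the default;
# --dryrun prints the planned copies without performing them.
sync_bucket() {
  src="$1"
  dst="$2"
  mode="${3:-preview}"
  if [ "$mode" = "preview" ]; then
    aws s3 sync "s3://$src" "s3://$dst" --dryrun
  else
    aws s3 sync "s3://$src" "s3://$dst"
  fi
}

# Preview first, then run for real once the output looks right:
# sync_bucket saso-test-1 us-east-1-checkr-saso-test-1 preview
# sync_bucket saso-test-1 us-east-1-checkr-saso-test-1 apply
```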

By using this we were able to quickly migrate our data and finally bring some sanity to our bucket names. Hooray!