Syncing S3 buckets from different providers/endpoints (part 1)

Bo Bracquez
5 min read · Nov 4, 2019


In a previous post I set up LocalStack in Docker and created an S3 bucket. Now what I want to achieve is to sync a local/mocked S3 bucket with a live AWS S3 bucket. Sounds fairly easy. Right…?

AWS CLI (docs)

What do the AWS CLI docs say? They advise us to use the aws s3 sync command. We already know that we can set the --endpoint parameter for the AWS CLI tools. However, we cannot define two different endpoints for a single command, so we cannot just use aws s3 sync on its own. The vanilla AWS CLI tools are not the way to go for this.
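To illustrate the limitation, here is a rough sketch; the LocalStack endpoint and bucket names are just placeholders for this example:

# one endpoint per invocation is fine, e.g. pointing the CLI at LocalStack
$ aws --endpoint-url=http://192.168.99.100:4572 s3 ls
# but a single sync call only accepts one endpoint, so both buckets below
# resolve against that same endpoint; there is no way to say
# "source = LocalStack, destination = live AWS" in one invocation
$ aws --endpoint-url=http://192.168.99.100:4572 s3 sync s3://local-bucket s3://live-bucket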

Rclone

Rclone — rsync for cloud storage

Rclone is a utility that lets you sync cloud storage. This sounds promising! Looking at the S3 documentation, we can see that it has support for "8 / Any other S3 compatible provider \ "Other"":

Choose your S3 provider.
Choose a number from below, or type in your own value
 1 / Amazon Web Services (AWS) S3
   \ "AWS"
 2 / Ceph Object Storage
   \ "Ceph"
 3 / Digital Ocean Spaces
   \ "DigitalOcean"
 4 / Dreamhost DreamObjects
   \ "Dreamhost"
 5 / IBM COS S3
   \ "IBMCOS"
 6 / Minio Object Storage
   \ "Minio"
 7 / Wasabi Object Storage
   \ "Wasabi"
 8 / Any other S3 compatible provider
   \ "Other"
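For completeness, here is roughly what an Rclone-based sync could look like, assuming two remotes were configured, one pointing at AWS and one at the LocalStack endpoint (the remote and bucket names below are made up for this sketch):

$ rclone sync aws-live:mybucket localstack:mybucket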

What ‘concerns’ me is that in the future I want to sync other AWS services too, and using a lot of different tools feels a bit clunky. While Rclone might work for S3 buckets, it certainly does not work for DynamoDB. I believe I would feel more secure sticking to the AWS CLI tools, but how, since they do not offer this functionality out of the box? Well, let’s think outside the box!


Thinking outside the box

What if we could do this: AWS S3 bucket -> local filesystem, and then local filesystem -> mocked S3 bucket? This sounds doable to me, certainly because we can already sync S3 buckets to the local filesystem, so in fact it will just be a matter of switching AWS profiles!

When looking at the docs, we can see how to sync to the local filesystem and back.

S3 bucket to local filesystem

$ aws s3 sync s3://mybucket .
download: s3://mybucket/test.txt to test.txt
download: s3://mybucket/test2.txt to test2.txt

Local filesystem to S3 bucket

$ aws s3 sync . s3://mybucket
upload: test.txt to s3://mybucket/test.txt
upload: test2.txt to s3://mybucket/test2.txt

As we can see, this theory seems viable. Of course, we need to define the endpoint for our local S3 bucket. But what about our AWS profile?
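Putting both directions together, the round trip could look roughly like this; the endpoint matches my LocalStack setup from the previous post, so yours may differ, and the bucket name is just an example:

$ aws s3 sync s3://mybucket .
$ aws --endpoint-url=http://192.168.99.100:4572 s3 sync . s3://mybucket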

AWS CLI Profiles

We know that in order to use the AWS CLI tools we need to configure them; how else are they going to know who we are and what we are trying to do? :)

This can be done by invoking aws configure, which will prompt you with several fields to fill in. An example can be found below (grabbed from the referenced page).

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: json

The AWS CLI supports using any of multiple named profiles that are stored in the config and credentials files. You can configure additional profiles by using aws configure with the --profile option, or by adding entries to the config and credentials files.

We can use named profiles to make our lives easier. This sounds like a good approach compared to e.g. using export on Unix systems to set your credentials, and it is cross-platform. I went on an adventure and started configuring this, and then *poof*, I remembered something when I got to configuring the credentials for my local environment… I did not need any sort of valid credentials for LocalStack! This made everything a whole lot easier. We can just set up our AWS CLI tools with our valid AWS credentials and use them both for our ‘live’ S3 buckets and the locally mocked S3 buckets, because LocalStack does not validate/use any actual credentials. I went onto LocalStack’s GitHub to confirm this, and there I found a GitHub comment that explains this in more detail.
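That said, if you prefer to keep things separated anyway, a dummy named profile for LocalStack works just as well, precisely because the credentials are never validated. The profile name and values below are arbitrary:

$ aws configure --profile localstack
AWS Access Key ID [None]: test
AWS Secret Access Key [None]: test
Default region name [None]: us-east-1
Default output format [None]: json

$ aws --profile localstack --endpoint-url=http://192.168.99.100:4572 s3 ls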

The final preparation step

As my final preparation step I wanted to find a way to make using LocalStack with the AWS CLI tools easier and smoother. Who likes typing the --endpoint parameter every time? It sounds too much like copy-paste to me. I investigated some projects that tackle this issue; I even found a project built around LocalStack to aid people with exactly this. Sadly, its last updates were from ‘last year’, it did not seem that active, and it would have been yet another tool to rely on while we can do it with maybe one line of ‘code’ in bash. And that one line is… alias! The nice part is that alias also works in Windows’ PowerShell.

A Bash alias is essentially nothing more than a keyboard shortcut, an abbreviation, a means of avoiding typing a long command sequence. If, for example, we include alias lm="ls -l | more" in the ~/.bashrc file, then each lm typed at the command-line will automatically be replaced by a ls -l | more. This can save a great deal of typing at the command-line and avoid having to remember complex combinations of commands and options. Setting alias rm="rm -i" (interactive mode delete) may save a good deal of grief, since it can prevent inadvertently deleting important files. http://tldp.org/LDP/abs/html/aliases.html

By using alias we can set up something like the following. Please note that your endpoint may be different. Maybe there are better ways to set this up, but this seems to work fine for me (at the moment). I am certainly open to suggestions on how to improve this.

$ alias s3local='aws --endpoint=http://192.168.99.100:4572 s3'
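With that in place, the local commands become just as short as the live ones, for example:

$ s3local mb s3://mybucket
$ s3local sync . s3://mybucket
$ s3local ls

To make the alias survive new shell sessions, it can be added to ~/.bashrc (or your shell's equivalent).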

Next up, I will be integrating everything described here in order to sync S3 buckets between a live and a local environment. Maybe I will even throw in a full sync, as in listing all the buckets and then syncing them, but that depends on how things work out, of course… I am not a magician :p.
