Working with S3 tools can be a little difficult, especially when you have a large set of data to download or upload.
I had a requirement on one of my projects to help a customer download a large amount of data from an AWS S3 bucket to a local Windows Server. The data consisted of more than 10 million files, which makes the job tiresome for any download tool. To save me from days of manual download operations, AWS provides CLI tools that can help automate the task!
That was not the end of the story, but the very beginning of it :)
After running the AWS CLI tool for 30 minutes to download files from the S3 bucket, I realized that the download speed was topping out at 4.5 MiB/s, i.e. about 4.72 megabytes per second. That translates to roughly 8.49 GB in 30 minutes of operation. This was far too slow!
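The unit conversion behind those figures is easy to get slightly wrong, so here is a minimal sketch for checking it. The function names are my own, and it assumes the rate stays constant for the whole window, which a real transfer will not do exactly.

```python
# Unit conversion for a measured transfer rate.
MIB = 1024 ** 2  # bytes in one mebibyte
MB = 1000 ** 2   # bytes in one megabyte
GB = 1000 ** 3   # bytes in one gigabyte


def mib_per_s_to_mb_per_s(rate_mib_s: float) -> float:
    """Convert a rate in MiB/s to MB/s."""
    return rate_mib_s * MIB / MB


def gb_transferred(rate_mib_s: float, seconds: float) -> float:
    """Total decimal gigabytes moved at a constant rate."""
    return rate_mib_s * MIB * seconds / GB


if __name__ == "__main__":
    rate = 4.5        # MiB/s, the ceiling observed with the AWS CLI
    window = 30 * 60  # 30 minutes, in seconds
    print(f"{mib_per_s_to_mb_per_s(rate):.2f} MB/s")
    print(f"{gb_transferred(rate, window):.2f} GB in 30 minutes")
```

At this rate, a bucket holding several terabytes would take days, which is what pushed me to look for alternatives.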
I started looking at other tools to speed up the process. These are the criteria I decided to consider:
- Multi-threaded operations
- Download speed (which depends heavily on multi-threading)
- Use AWS transfer acceleration feature to speed up the process
- Most important: The tool must work on Windows!
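To show why multi-threading matters so much with millions of small files, here is a rough simulation. It does not touch S3 at all: fake_download is a hypothetical stand-in that just sleeps to mimic per-request latency, and the worker count is an arbitrary choice of mine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(key: str, latency: float = 0.05) -> str:
    """Stand-in for one S3 GET: sleeps to mimic per-request latency."""
    time.sleep(latency)
    return key


def download_sequential(keys):
    """Fetch one object at a time, paying the latency for each in turn."""
    return [fake_download(k) for k in keys]


def download_parallel(keys, workers: int = 16):
    """Fetch with a pool of worker threads so the waits overlap."""
    # map() preserves input order, so results line up with keys.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, keys))


if __name__ == "__main__":
    keys = [f"file-{i}.dat" for i in range(64)]
    for fn in (download_sequential, download_parallel):
        start = time.perf_counter()
        fn(keys)
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

With many small objects, most of the time goes to waiting on each request rather than moving bytes, so overlapping the waits is where the gain comes from.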
I tested several tools against these criteria. To my surprise, s5cmd passed every test. However, there was no direct way to build an “.exe” or “.msi” installer for Windows. This is where my research scope expanded!
After a lot of research and reading, I concluded that there was no article or set of steps available on the Internet for using s5cmd on Windows! Hence, I decided to write down the steps for readers.
There are a few prerequisites for using s5cmd on Windows:
Step # 1: Install Git for Windows.
Step # 2: Install Go for Windows. It provides the go command used in Step # 5.
Step # 3: Install the AWS CLI tool. This is required to set up the AWS credentials that s5cmd will use.
Step # 4: Once the AWS CLI tool is installed, configure it with the ‘aws configure’ command and enter your AWS Access Key ID and Secret Access Key. Leave the region as default if your S3 bucket is in us-east-1; otherwise, specify the bucket's region here.
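If you want to confirm that the configuration step actually stored your keys, a small check like the following can help. It is only a sketch: it assumes the AWS CLI wrote the shared credentials file to its default location under your user profile, and has_credentials is a helper name I made up.

```python
import configparser
from pathlib import Path

# Default location of the shared credentials file
# (on Windows, Path.home() resolves to your %USERPROFILE% folder).
DEFAULT_CREDENTIALS = Path.home() / ".aws" / "credentials"


def has_credentials(path=DEFAULT_CREDENTIALS, profile: str = "default") -> bool:
    """Return True if the profile has both an access key ID and a secret key."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )


if __name__ == "__main__":
    print("credentials found" if has_credentials() else "credentials missing")
```

The check only looks at whether the keys are present; it does not verify that they are valid or that they grant access to your bucket.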
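Step # 5: Install s5cmd by running this command in the command prompt (on Go 1.17 and later, where go get no longer installs binaries, use go install github.com/peak/s5cmd/v2@latest instead):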
go get github.com/peak/s5cmd
Voilà! You are ready. Just type s5cmd in the command prompt, and you should see the list of options associated with s5cmd.
Here is the command to copy the entire bucket (note: the bucket path and the destination must be in double quotes to avoid any path breaks/delimiters; in current s5cmd releases, global options such as --stat go before the cp subcommand)
s5cmd --stat cp "s3://bucketname/*" "F:\S3Download"
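If you would rather drive the copy from a script, for example to schedule it or log the exit code, something like this sketch could work. It assumes a current s5cmd release, where global options such as --stat and --retry-count are placed before the cp subcommand; build_cp_command and run_copy are names I invented for illustration.

```python
import shutil
import subprocess


def build_cp_command(source: str, destination: str, retry_count=None):
    """Build the argument list for an s5cmd copy.

    Assumes a current s5cmd release, where global options such as
    --stat and --retry-count come before the subcommand.
    """
    cmd = ["s5cmd", "--stat"]
    if retry_count is not None:
        cmd += ["--retry-count", str(retry_count)]
    cmd += ["cp", source, destination]
    return cmd


def run_copy(source: str, destination: str, retry_count=None) -> int:
    """Run the copy and return the s5cmd exit code."""
    if shutil.which("s5cmd") is None:
        raise FileNotFoundError("s5cmd not found on PATH")
    # Passing a list (no shell) means paths with spaces need no extra quoting,
    # and the * wildcard reaches s5cmd untouched.
    return subprocess.run(build_cp_command(source, destination, retry_count)).returncode


if __name__ == "__main__":
    # Placeholder bucket and folder, as in the command above.
    # This only prints the command; call run_copy() to start the transfer.
    print(" ".join(build_cp_command("s3://bucketname/*", r"F:\S3Download", retry_count=20)))
```

Building the command as a list keeps the quoting problem out of the picture, since each path is passed to s5cmd as a single argument.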
Here comes the interesting part, the results!
Comparison of s5cmd vs. the AWS CLI tool over 30 minutes of operation:
As you can see, s5cmd does a better job of downloading large files from an S3 bucket, and this is possible only with the inbuilt AWS Transfer Acceleration in s5cmd.
Here are some useful s5cmd commands:
- Check the bucket size
s5cmd du --humanize "s3://bucketname/*"
- List the contents of the bucket
s5cmd ls "s3://bucketname/*"
- Copy the entire S3 bucket
s5cmd --retry-count 20 --stat cp "s3://bucketname/*" "path to local drive folder"