AWS S3 bucket: bulk copy or rename files from Windows
Recently at Friend Theory we needed to bulk move and copy multiple files at once on our AWS S3 buckets, renaming them according to a specific pattern. Here’s the approach I used and how I did it.
1. Install and Configure AWS CLI
Install the AWS CLI following the official instructions in the AWS documentation.
You can check your installation succeeded by running:
$ aws --version
Run the initial configuration to allow the CLI to connect to your AWS account:
$ aws configure
AWS Access Key ID [None]: SGFJHCGHDUHGKJ84EXAMPLE
AWS Secret Access Key [None]: QWERTYUIOPASDFGHJKLEXAMPLE
Default region name [None]: us-west-2
Default output format [None]: text
(The access key ID and secret come from an IAM User identity; these credentials are created in the AWS console. Usually you would already have one or more created, or you can create a new one just for this.)
More info here: https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
2. Moving and renaming single files on AWS S3
The aws s3 cp and aws s3 mv commands work more or less like the native UNIX cp and mv commands:
$> aws s3 cp s3://bucket-name/path/to/source-file.ext destination-file.ext
$> aws s3 mv s3://bucket-name/path/to/source-file.ext destination-file.ext
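If you would rather script this step, here is a minimal Python sketch of the same single-file copy/move, assuming the boto3 SDK is installed and your credentials are already configured; the bucket and key names are placeholders.

```python
# Sketch of `aws s3 cp` / `aws s3 mv` for one object, using a boto3-style
# S3 client. The client is passed in so the helper can be exercised
# without real AWS access.
def copy_object(s3, bucket, src_key, dst_key, move=False):
    """Copy one object within a bucket; delete the source when move=True."""
    s3.copy_object(Bucket=bucket,
                   CopySource={"Bucket": bucket, "Key": src_key},
                   Key=dst_key)
    if move:  # `mv` is a copy followed by a delete
        s3.delete_object(Bucket=bucket, Key=src_key)

# Usage (assumption: boto3 installed and `aws configure` already run):
# import boto3
# copy_object(boto3.client("s3"), "bucket-name",
#             "path/to/source-file.ext", "destination-file.ext")
```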
3. Script to bulk rename files on AWS S3 bucket
Without further ado, here’s the script I ended up using (details and explanations below so you can adapt it to your needs):
$> aws s3api list-objects --bucket friend-theory-dev --prefix "test/profile-pictures/" --delimiter "/" | ForEach-Object { $_.split("`t")[2] } | Select-String -Pattern 100x100.jpg | ForEach-Object -Process {$outputFile = $_ -replace '100x100', 'sm'; $outputFile = $outputFile -replace '/profile-pictures', '/users-pictures'; aws s3 cp s3://friend-theory-dev/$_ s3://friend-theory-dev/$outputFile }
Let’s take this piece by piece:
Listing objects:
$> aws s3api list-objects --bucket sample-bucket --prefix "folderA" --delimiter "/"
OWNER hello 123456789
CONTENTS "1234567890" folderA/11111-100x100.jpg 2019-06-17T16:29:39.000Z 44193 STANDARD
OWNER hello 123456789
CONTENTS "1234567890" folderA/11111-500x500.jpg 2019-06-17T16:29:39.000Z 44193 STANDARD
OWNER hello 123456789
CONTENTS "1234567890" folderA/22222-100x100.jpg 2019-06-17T16:29:39.000Z 50071 STANDARD
...
This first part uses the lower-level aws s3api list-objects command, which outputs a list of S3 objects.
- The --bucket parameter specifies the name of the bucket.
- The --prefix parameter specifies the path (folder) within the bucket. The --delimiter "/" is there to prevent recursion if there were folders within folderA.
- Note: if the default output format of your AWS CLI configuration is JSON, you will have to add an extra parameter, --output text, to ask for text output.
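The listing step can also be sketched in Python with boto3 (my own parallel, not part of the original one-liner; assumption: boto3 installed and credentials configured). The Delimiter="/" argument plays the same non-recursion role as in the CLI call.

```python
# Sketch of the listing step against a boto3-style client. The helper
# just pulls the object keys out of a list_objects_v2 response dict.
def object_keys(response):
    """Return the object keys from a list_objects_v2 response."""
    return [obj["Key"] for obj in response.get("Contents", [])]

# Usage (assumption: boto3 installed, credentials configured):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_objects_v2(Bucket="sample-bucket",
#                           Prefix="folderA/", Delimiter="/")
# print(object_keys(resp))
```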
Splitting the output:
As usual in UNIX or PowerShell, we use the pipe "|" to pass the output of one command as input to the following one:
| ForEach-Object { $_.split("`t")[2] }
For each of the objects returned (i.e. each line of the text output), we apply the split function to the $_ string (the $ sign references a variable, and the underscore is the default name of the input passed to the function).
So we split each line of the previous output on the TAB character `t (a backtick-escaped t, PowerShell's equivalent of \t).
Finally, we take index 2 of that split, which is the object key (the name of the file).
Output:
folderA/11111-100x100.jpg
folderA/11111-500x500.jpg
folderA/22222-100x100.jpg
...
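For comparison, here is the same split step in Python (a sketch of mine; the sample line mirrors the text output shown above):

```python
# Split one tab-separated line of `list-objects` text output and keep
# field index 2, the object key.
line = ('CONTENTS\t"1234567890"\tfolderA/11111-100x100.jpg'
        '\t2019-06-17T16:29:39.000Z\t44193\tSTANDARD')
key = line.split("\t")[2]
print(key)  # folderA/11111-100x100.jpg
```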
Matching with a Pattern
This part is optional. You don’t need it if you want to move or copy ALL of the files in your folder.
We use the PowerShell Select-String cmdlet (similar to grep in UNIX) to match the lines that contain the pattern we need. In this case it's pretty simple: every file name that contains "100x100.jpg" will be matched.
| Select-String -Pattern 100x100.jpg
But Select-String can be quite powerful. See the full documentation.
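One detail worth knowing: -Pattern takes a regular expression, so the dot in 100x100.jpg actually matches any character (harmless here). An equivalent filter in Python, as a sketch, with the dot escaped to make the intent explicit:

```python
import re

keys = [
    "folderA/11111-100x100.jpg",
    "folderA/11111-500x500.jpg",
    "folderA/22222-100x100.jpg",
]
# Keep only keys matching the pattern, like Select-String above.
matched = [k for k in keys if re.search(r"100x100\.jpg", k)]
print(matched)  # ['folderA/11111-100x100.jpg', 'folderA/22222-100x100.jpg']
```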
Rename the files
The last part of the script again runs ForEach-Object on the previous output, with a -Process block that computes the destination path and file name:
| ForEach-Object -Process {$outputFile = $_ -replace '100x100', 'sm'; $outputFile = $outputFile -replace 'folderA/', 'folderB/'; aws s3 cp s3://sample-bucket/$_ s3://sample-bucket/$outputFile }
The first part is an assignment: we take the initial input $_, replace the characters "100x100" with "sm" (that was my requirement; this is where you substitute your own renaming rule), and assign the result to the variable $outputFile.
This part can be removed if you don’t need the files to be renamed but only moved or copied to another folder.
$outputFile = $_ -replace '100x100', 'sm';
folderA/11111-100x100.jpg is replaced by folderA/11111-sm.jpg
The second part is another assignment: in the previous variable we replace "folderA/" with "folderB/" (note that the key has no leading slash, so the pattern must not start with one). This part can be removed if you don't need to change folders but only rename files.
$outputFile = $outputFile -replace 'folderA/', 'folderB/';
folderA/11111-sm.jpg becomes folderB/11111-sm.jpg
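The two replacements chain as a pure string transformation; in Python (a sketch) the same rename rule is:

```python
# Apply the two rename rules in sequence, exactly as the PowerShell
# -replace chain does: pattern substitution, then folder substitution.
src = "folderA/11111-100x100.jpg"
dst = src.replace("100x100", "sm").replace("folderA/", "folderB/")
print(dst)  # folderB/11111-sm.jpg
```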
Finally, the last part calls the AWS CLI to copy the file from its initial path and name (held in the $_ variable) to the destination held in $outputFile:
aws s3 cp s3://sample-bucket/$_ s3://sample-bucket/$outputFile
If you need to move the files instead of copying them, just change cp to mv.
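For completeness, the whole pipeline can be put together in Python with boto3 (my own sketch, not the article's method; bucket, prefix, and patterns are placeholders). The rename logic is kept in a pure function so it can be checked without touching AWS.

```python
# End-to-end sketch: list, filter, rename, copy.
def plan_renames(keys):
    """Return (source, destination) pairs for keys matching the pattern."""
    return [
        (k, k.replace("100x100", "sm").replace("folderA/", "folderB/"))
        for k in keys
        if "100x100.jpg" in k
    ]

# Usage (assumption: boto3 installed, credentials configured):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_objects_v2(Bucket="sample-bucket",
#                           Prefix="folderA/", Delimiter="/")
# keys = [o["Key"] for o in resp.get("Contents", [])]
# for src, dst in plan_renames(keys):
#     s3.copy_object(Bucket="sample-bucket",
#                    CopySource={"Bucket": "sample-bucket", "Key": src},
#                    Key=dst)
```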
THAT’S IT!
All in all, this combines many different concepts, but the result is a powerful and versatile command that can perform a lot of maintenance on AWS S3 buckets directly from your Windows machine.
Credit: Gerard Vivancos for his article on how to do this under UNIX, which I used as a base to do this under Windows (http://gerardvivancos.com/2016/04/12/Single-and-Bulk-Renaming-of-Objects-in-Amazon-S3/).