Looping through an S3 Bucket and performing actions using the AWS-CLI and BASH

I was recently tasked with moving our S3 objects into a different organizational structure to support our CloudFront distribution standards. In this blog post, I am going to walk through the steps it took me to do so.

Check your credentials and policy

The first step to working with the AWS CLI is to set up and verify that you have the security credentials for any operations you plan on using. To do this, go to the IAM Management Console and head over to Users. Click on the user you plan on using and take a glance at its policy (if you do not have a user or a policy, you will need to create one using the tools provided in the console).

Note: You should not edit your bucket policy for these operations. Editing your bucket policy is like setting global permissions; you want to grant very specific permissions to a specific user instead.

For S3 operations, you will need a policy like this:

{
  "Effect": "Allow",
  "Condition": {
    "Bool": {
      "aws:MultiFactorAuthPresent": "true"
    }
  },
  "Action": [
    "s3:*"
  ],
  "Resource": [
    "arn:aws:s3:::bucket-name",
    "arn:aws:s3:::bucket-name/*"
  ]
}

Action: Actions define all of the operations that are allowed by this policy. In this case, the wildcard s3:* states that all S3 actions are allowed. If you want to minimize what is allowed, specific S3 actions can be listed instead.

Resource: Resources must point specifically to the S3 bucket that is desired. List each bucket that the user should be able to perform the actions on. Note that if you only have "arn:aws:s3:::bucket-name/*", then you will not be able to perform actions on the bucket itself (calling ListObjects on the bucket would not be allowed), which is why having both entries here is important.

Condition: Any special conditions are listed here. In my case, two-factor authentication is required for this policy to be used. More on how to do that with the AWS CLI later.

Using the AWS-CLI

AWS Configure

After installing the AWS CLI (I personally used Homebrew), it is now important to configure it. Simply type aws configure in the terminal, then enter the Access Key ID and Secret Access Key that you got when you set up your user, your region name, and your preferred output format (probably json).
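If you prefer to skip the interactive prompts, the same values can be set non-interactively with aws configure set. This is a sketch; the key values shown are placeholders, not real credentials:

```shell
# Non-interactive equivalent of the aws configure prompts.
# These write to ~/.aws/credentials and ~/.aws/config.
aws configure set aws_access_key_id AKIA...       # placeholder Access Key ID
aws configure set aws_secret_access_key wJal...   # placeholder Secret Access Key
aws configure set region us-east-1                # your region may differ
aws configure set output json
```

This can be handy in provisioning scripts where no human is at the keyboard.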

2-Factor Auth in the cli

If two-factor authentication is required in the policy (and it really should be), it adds an extra step every time you use the CLI:

aws sts get-session-token --serial-number arn:aws:iam::12345678:mfa/<username> --token-code 796568

The serial number is listed in the IAM Management Console under Assigned MFA device on the user you are looking at. The token code is whatever the 2-factor authentication app gives you.

Once you enter these fields correctly, it will return an object like this:

{
  "Credentials": {
    "SecretAccessKey": "secretAccessKeyString",
    "SessionToken": "Session Token",
    "Expiration": "2017-03-13T15:27:29Z",
    "AccessKeyId": "Access Key ID"
  }
}

You now need to make these credentials available to the CLI by exporting them as environment variables in your terminal:

export AWS_ACCESS_KEY_ID=accessKeyID
export AWS_SECRET_ACCESS_KEY=secretAccessKey
export AWS_SESSION_TOKEN=sessionToken

Note: because these are exported only in your current terminal window, if you open a new terminal window you will have to re-export these values.
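Rather than copying the fields by hand, the response can be parsed with jq (which this post already uses later for the JSON import). This is a sketch; the creds variable below is a made-up stand-in for the real get-session-token output:

```shell
# Made-up sample of the get-session-token response (real values redacted)
creds='{"Credentials":{"SecretAccessKey":"exampleSecret","SessionToken":"exampleToken","Expiration":"2017-03-13T15:27:29Z","AccessKeyId":"EXAMPLEKEYID"}}'

# Turn the Credentials object into export statements, then eval them
eval "$(echo "$creds" | jq -r '.Credentials | "export AWS_ACCESS_KEY_ID=\(.AccessKeyId)\nexport AWS_SECRET_ACCESS_KEY=\(.SecretAccessKey)\nexport AWS_SESSION_TOKEN=\(.SessionToken)"')"

echo "$AWS_ACCESS_KEY_ID"   # EXAMPLEKEYID
```

In practice you would pipe the real command straight through: eval "$(aws sts get-session-token ... | jq -r ...)".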

At this point you should be able to perform any S3 actions on the buckets stated in the policy.

Performing S3 Actions

In the case of my task, I needed to export some information from our database and convert it into a lookup table so I could reorganize the structure of our S3 objects. To do this, I needed an associative array, which is not supported in the version of Bash that ships with macOS. So first I had to install Bash 4.0 or higher.
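A quick way to confirm you are running a new enough Bash before the script does any real work (a sketch; the check variable is just for demonstration):

```shell
# Associative arrays need Bash 4+; the bash that ships with macOS is 3.2.
if (( BASH_VERSINFO[0] < 4 )); then
  echo "Bash 4+ required; on macOS try: brew install bash" >&2
  exit 1
fi

declare -A check=([works]=yes)   # fails at parse time on Bash 3.x
echo "${check[works]}"           # yes
```

Remember that brew installs the new bash alongside the system one, so your script's shebang must point at the Homebrew binary, not /bin/bash.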

Importing JSON into bash

declare -A myArray
# Read the "key=value" lines emitted by jq into the associative array
while IFS="=" read -r key value
do
  myArray[$key]="$value"
done < <(jq -r 'to_entries|map("\(.key)=\(.value)")|.[]' ~/Desktop/profiles.json)

This converts a flat JSON object into a Bash associative array.
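For illustration, here is the same import run end to end against a hypothetical profiles.json, a flat object mapping old IDs to new IDs (the IDs and the /tmp path are made up for this example):

```shell
# Made-up sample of what profiles.json might contain
cat > /tmp/profiles.json <<'EOF'
{ "oldID-1": "newID-a", "oldID-2": "newID-b" }
EOF

declare -A myArray
# jq emits one "key=value" line per entry; read splits on the first "="
while IFS="=" read -r key value
do
  myArray[$key]="$value"
done < <(jq -r 'to_entries|map("\(.key)=\(.value)")|.[]' /tmp/profiles.json)

echo "${myArray[oldID-1]}"   # newID-a
```

Note this simple key=value scheme assumes the values themselves contain no "=" characters.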

Validate the import

It wouldn't hurt to make sure no mistakes were made in this process. Take a minute to loop through your array and make sure everything looks good before you start mutating S3 objects:

count=0
for key in "${!myArray[@]}"
do
  count=$((count+1))
  echo "$key = ${myArray[$key]}"
done
echo "$count"

Looping through an s3 Bucket

origin="bucket-name/path/to/folder/"
count=0
for path in $(aws s3 ls "$origin")
do
  oldID=${path%/}            # folder name with the trailing slash removed
  newID=${myArray[$oldID]}   # gets the newID
  if [[ "$newID" != "" ]]; then
    destination="s3://$origin$newID/$path"
    aws s3 cp "s3://$origin$path" "$destination" --recursive
    count=$((count+1))
    echo "transferred $count images"
  fi
done

aws s3 ls "$origin" lists every item at that path; the for loop then iterates over the whitespace-separated tokens of that output.

oldID=${path%/} takes the listed folder name and removes the trailing slash.

newID=${myArray[$oldID]} retrieves the new ID from the previously created associative array

if [[ "$newID" != "" ]]; then ensures that none of the operations are performed on empty lookups. Some form of validation is required for these scripts; otherwise you will perform your actions on other tokens that appear when you call ls, such as PRE, the prefix marker that aws s3 ls prints before each folder, which does not point to an actual S3 object.

aws s3 cp "s3://$origin$path" "$destination" --recursive finally performs the command. In this case we are only copying instead of moving, so that reverting in the case of an error is much easier and safer. The --recursive flag copies every file within each folder of the current directory using the same path information.
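To see the filtering in action without touching AWS, here is a dry-run sketch of the same loop over made-up aws s3 ls output (the listing string, bucket name, and IDs are all hypothetical):

```shell
# Made-up ID mapping and a made-up 'aws s3 ls' listing; folders show as "PRE name/"
declare -A myArray=([abc123]=new-1 [def456]=new-2)
listing="PRE abc123/
PRE zzz999/
PRE def456/"

count=0
for path in $listing           # word-splits into: PRE abc123/ PRE zzz999/ ...
do
  oldID=${path%/}              # "PRE" tokens survive this, but won't match the map
  newID=${myArray[$oldID]}
  if [[ "$newID" != "" ]]; then
    echo "would copy s3://bucket/$oldID/ -> s3://bucket/$newID/"
    count=$((count+1))
  fi
done
echo "$count folders matched"   # 2 folders matched
```

The PRE tokens and the unmapped zzz999/ folder fall through the empty-newID check, so only the two mapped folders would be copied.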

More information on the S3 commands you can run with the AWS CLI: http://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html