Using The AWS S3 CLI to Expand Storage

Elpsy
9 min read · Nov 12, 2016

Because I dual boot Arch Linux and macOS on the same machine, I typically end up lacking in storage space on one or both of my partitions. The problem is I’m an ebook hoarder. Even after reading them, I refuse to throw them in the trash, and I know that as soon as they’re on my backup drives, I’ll never see them again. So I decided to use an AWS S3 bucket to provide myself with some breathing room.

After signing up for AWS and creating a user, we will walk through how to create a bucket, send files, receive files, and sync files, all from the AWS CLI. I'll be focusing on macOS; however, the Linux experience should be similar, and the principles will be the same for Windows, though it may require substantial modifications.

What we will not be doing

We won't be attempting to mount the bucket as some sort of network file system (Dropbox, OwnCloud style). Though there are a few wrappers, such as ExpanDrive, that allow this kind of setup with an AWS S3 bucket, the goal of this post is to tackle a real use case for everyday computer use as well as to familiarize ourselves with the AWS CLI. I'll touch on how to mount a bucket towards the end, but I won't go into great detail.

Signing Up For AWS

Sign up for the AWS free tier. Please read what you're agreeing to. You will be required to enter a credit/debit card, so be extremely careful about what instances, machines, or objects you create, as "free tier" doesn't mean everything is completely free (typically my bill is between 50 cents and 5 dollars); if you spin up an enterprise machine and leave it running, you're going to have a bad time.

S3 buckets are pretty cheap and shouldn't cost more than a few cents per month.

Creating a User and User Group

We need to create a user and download their credentials in order to use the AWS CLI. There are two ways to accomplish this task.

Web Console:

If you are just signing up or have never created an AWS user, you can use the AWS web console. This is quite straightforward and can be accomplished here (aws guide). Let's create the user group first. Click on "Groups", then "Create New Group", and attach the "AdministratorAccess" policy. As you can guess, this policy should be used with great caution, and if you plan on using S3 to store sensitive files, I recommend attaching only the minimum privileges necessary to complete the task. If you only want to manage your S3 instance, you could attach the "AmazonS3FullAccess" policy instead.

Next, click on "Users", then "Create New Users", enter a name, and make sure the checkbox to generate an access key is checked. It is very important that you download your credentials and keep them secure, so click on "Download Credentials".

AWS CLI

The second method for creating users and user groups is the AWS Command Line Interface (CLI). However, this only works if you already have the CLI installed and associated with an AWS account; if you just signed up for AWS or don't have the CLI installed, you will have to use the previous method.

Create the group.

$ aws iam create-group --group-name S3MyGroupName

Create the user.

$ aws iam create-user --user-name S3MyUserName

Add the user to a group.

$ aws iam add-user-to-group --user-name S3MyUserName --group-name S3MyGroupName
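
Note that, unlike the web console flow above, this group has no policy attached yet. You can attach the managed "AmazonS3FullAccess" policy (or a narrower policy of your choosing) from the CLI as well.

# Attach the managed S3 policy to the group
$ aws iam attach-group-policy --group-name S3MyGroupName --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess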

Get the user credentials.

$ aws iam create-access-key --user-name S3MyUserName

Save the credentials to a securely stored file.
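
For example, you could redirect the command's output straight into a file that only your user can read (the path here is just an example):

# Save the key material to a private file (example location)
$ aws iam create-access-key --user-name S3MyUserName > ~/.aws/S3MyUserName-keys.json
$ chmod 600 ~/.aws/S3MyUserName-keys.json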

Installing the AWS CLI

The AWS CLI works with both Python 2 and Python 3. The easiest way to install it is with pip. I'm going to walk through installing with Python 3 on macOS, but you can check out the GitHub repo for more instructions.

## Install python3 on macOS
$ brew install python3 # installs pip3 as well

## Install the CLI
$ sudo pip3 install awscli
## if using El Capitan, add the "--ignore-installed six" flag

Configure AWS CLI

You need to associate the CLI with your AWS account. You can retrieve the credentials from either the file you downloaded from the web console or from your create-access-key command.

$ aws configure
AWS Access Key ID []: ENTER_ACCESS_KEY_ID
AWS Secret Access Key []: ENTER_SECRET_ACCESS_KEY
Default region name []: us-east-1
Default output format []: leave blank or enter "json"

I set the default region to us-east-1. You can use whatever region you want.

Default vs Custom Profiles

When you run aws configure, you're actually configuring the default profile. You can use multiple profiles by creating a new profile in ~/.aws/config (and ~/.aws/credentials) and passing a "--profile profilename" option to your commands.
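
A quick illustration (the profile name here is just an example):

# Configure a second, named profile interactively
$ aws configure --profile toybox

# Pass the profile to any later command
$ aws s3 ls --profile toybox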

Creating a Bucket

I'm going to call my bucket "toy-chest". S3 bucket names have to be unique across all AWS users, so it may take some time to find a name you like that's also available. We use the make bucket (mb) command to create the bucket.

# aws s3 mb s3://name-of-bucket
$ aws s3 mb s3://toy-chest

Uploading Files

Let's create a Cloud directory with a bucket (toy-chest) subdirectory where we'll work with our files.

$ mkdir -p ~/Cloud/toy-chest
$ cd ~/Cloud/toy-chest

Copy

We'll start by copying a file to our S3 bucket.

$ touch test.txt
$ echo "Hello Darkness My Old Friend" > test.txt
$ aws s3 cp test.txt s3://toy-chest/

Listing

To review what files and directories exist in our bucket, we can use the ls command.

$ aws s3 ls s3://toy-chest/
> 2016-11-10 13:24:32 29 test.txt

If I had multiple buckets, I could view them by just running ls.

$ aws s3 ls
> 2016-11-10 12:48:39 toy-chest

And if I wanted to see all the files in all the directories on my bucket, I could pass a "--recursive" option.

$ aws s3 ls s3://toy-chest/ --recursive
> 2016-11-10 13:24:32 29 test.txt
> 2016-11-10 13:24:32 30 myfolder/anotherfile.txt

Dumping to stdout

Sometimes you want to view the contents of a file. While there is no efficient way to do this, it is possible to dump the contents to stdout.

$ aws s3 cp s3://toy-chest/test.txt -
> Hello Darkness My Old Friend
$ aws s3 cp --quiet s3://toy-chest/test.txt /dev/stdout
Hello Darkness My Old Friend

You are copying to stdout so understand that this will be more resource intensive than a simple cat.

If you're curious as to why we use the "--quiet" option, here's the output without it:

$ aws s3 cp s3://toy-chest/test.txt /dev/stdout
Hello Darkness My Old Friendwith 1 file(s) remaining
download: s3://toy-chest/test.txt to ../../../dev/stdout

Downloading Files

Similar to how we uploaded files, we can download them with cp.

$ aws s3 cp s3://toy-chest/test.txt test2.txt

Working With Directories

Moving one file at a time can be tedious. I created a new books directory and added a subdirectory full of ebook files called aws-administration.

$ mkdir -p books/aws-administration
$ cp ~/Downloads/AWSAdministration.* ./books/aws-administration/
$ cp ~/Downloads/eBookCode.md ./books/aws-administration/

I also want to add all my notes for that book into the same directory.

$ cp -r ~/Documents/notes/Aws ./books/aws-administration/

So how do I add all of these files to my bucket in one command?

$ aws s3 cp books s3://toy-chest/books --recursive

Because the S3 bucket is on the free tier and ebook files are quite large, it can take some time.

However, what if I accidentally added an unwanted book to my local books folder and don't want to move it out before running every cp command? I may want to keep it locally on my computer, but I definitely don't want to save it to my bucket. I'll create a junk directory inside my books folder and add a file in there.

$ mkdir ./books/junk 
$ touch books/junk/junk.txt
$ echo "No body wants this" > books/junk/junk.txt

But what if I also had another file in my junk folder that I’ve decided to upload to my bucket?

$ touch books/junk/notjunk.txt
$ echo "I may want this" > books/junk/notjunk.txt

So now I have a junk folder which contains files that I want as well as files that I do not want. Luckily, I can exclude files from my copy command.

$ aws s3 cp books s3://toy-chest/books --recursive --exclude "junk/*"

Unfortunately, this would exclude my notjunk.txt file as well. Luckily, the CLI allows me to override rules within the command: rules further to the right take precedence over rules to the left.

$ aws s3 cp books s3://toy-chest/books --recursive --exclude "junk/*" --include "junk/notjunk.txt"

So up until now, I've been copying and overwriting the files each time. However, what if I only wanted to sync the files? The CLI provides us with a sync command as well.

$ aws s3 sync books s3://toy-chest/books --exclude "junk/*" --include "junk/notjunk.txt"

This is much more appropriate than merely copying. Here's a description of the command:

Syncs directories and S3 prefixes. Recursively copies new and updated files from the source directory to the destination. Only creates folders in the destination if they contain one or more files.
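
If you want to preview what a sync would transfer before running it, the "--dryrun" option prints the operations without performing them:

# Show what sync would do without actually uploading anything
$ aws s3 sync books s3://toy-chest/books --dryrun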

Removing Files

I’ve made a terrible mistake and I’ve decided that I actually didn’t want to include either junk.txt or notjunk.txt.

$ rm ./books/junk/junk.txt
$ rm ./books/junk/notjunk.txt

However, if I were to sync now, the two files would still exist in my bucket. The CLI provides us with a "--delete" option which will delete files in the S3 bucket that are not present locally.

$ aws s3 sync books s3://toy-chest/books --delete

In addition, there is an rm command which will also allow you to delete files.

$ aws s3 rm s3://toy-chest/books/junk/junk.txt

A Note On Options

It's great to check the help pages for these commands, as many of them use similar options such as "--recursive", "--include", and "--exclude".
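
For example:

# Each s3 subcommand has its own help page
$ aws s3 cp help
$ aws s3 sync help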

Sharing Files

One cool thing you can do is generate an HTTPS URL that you can use to share S3 objects with your friends. The "--expires-in" option determines how many seconds the URL is valid for (the default is 3600).

$ aws s3 presign s3://toy-chest/test.txt --expires-in 60

Note: You can only create presigned urls of objects, not the actual bucket itself.
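
The command prints a long signed URL that anyone can use to fetch the object until it expires. The URL below is shortened and purely illustrative, not real output:

# Fetch the presigned object with any HTTP client (URL shortened for illustration)
$ curl "https://toy-chest.s3.amazonaws.com/test.txt?AWSAccessKeyId=...&Expires=...&Signature=..."
> Hello Darkness My Old Friend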

Finishing up

So let's solve my problem. I've gone ahead and added a few more of my books into my toy-chest bucket. Here is my final directory structure.

$ longcommand
| toy-chest
| |---books
| | |---aws-administration
| | | notes
| | | | ...
| | | ...
| | |---design-ml-python
| | | notes
| | | | ...
| | | ...
| | |---expert-python
| | | notes
| | | | ...
| | | ...
| | |---learning-php7-high-perf
| | | notes
| | | | ...
| | | ...
| | |---numpy-essential
| | | notes
| | | | ...
| | | ...

Note: I didn't have tree installed on my Mac and didn't feel like installing it, so I ran a very long command to get a nice tree view. You can find the command here.
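
If you'd rather not track that command down, a rough approximation built from the recursive listing looks something like this (a quick sketch, and it assumes object keys contain no spaces):

# Rough tree view from a recursive listing (sketch only)
$ aws s3 ls s3://toy-chest/ --recursive | awk '{print $4}' | sed -e 's#[^/]*/#|   #g' -e 's#|   \([^|]\)#|---\1#'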

$ aws s3 sync books s3://toy-chest/books --delete

Although it took a while, I’ve successfully synced my books folder and deleted my unwanted junk.

Mounting

Those hoping to replace Dropbox or OwnCloud with S3 may be disappointed. While it certainly is possible, it will be a less efficient and slower process than the alternatives. However, if you're interested, there are two notable options. First, there is ExpanDrive. From what I can tell, it is closed source and may require the purchase of a license (I don't install software unless I'm planning on using it, so take my understanding of ExpanDrive with a grain of salt). Second, there are a few open source tools which may help in your quest. The most promising seems to be the s3fs-fuse package, which only works on Unix-like systems. There is also goofys. Expect to put in a little effort if you want either of them to work.
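
As a rough illustration of the s3fs-fuse route (the paths below are examples, you'll need FUSE installed, and the credentials file holds your access key and secret separated by a colon):

# Store the access key and secret for s3fs (example path)
$ echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > ~/.passwd-s3fs
$ chmod 600 ~/.passwd-s3fs

# Mount the bucket at an example mount point
$ mkdir -p ~/mnt/toy-chest
$ s3fs toy-chest ~/mnt/toy-chest -o passwd_file=~/.passwd-s3fs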

Conclusion

There we have it. My ebooks are free to pile up in my virtual garage (right next to my virtual Lamborghini). Moving forward, I think it would be cool to build a GUI to manage the collection. Many frameworks implement file system APIs with S3 drivers, so I think I'll get to work on building a nice front-end SPA to handle my collection.

Follow me on Twitter? Maybe? @3lpsy

Feel free to comment down below or reach out to me on Twitter if you have any comments, corrections, or suggestions for how I can improve this post.
