
How to Increase the Usability of AWS S3

Bivás Biswas
May 19

Just like users, objects are first-class citizens in the AWS world. They’re given a name at birth, tags that identify the groups they belong to, different versions of themselves, and a life with different phases they can transition through over time!

In this article, we’ll talk about Naming, Tagging, and Versioning, and we’ll talk about Lifecycle Management in the next article.

It All Starts With a Unique Name

S3 uses object-based storage. As we discussed in The Fabulous Buckets Of AWS, object-based storage has no filesystem. You can upload your files from a Mac or from Windows; it doesn’t matter what format your data originates in, each file gets converted to an object and put inside a bucket.

Having no filesystem is great, but it takes away an important concept that we are all used to: the directory structure!

We’re used to organizing our files. Vacation pictures from Hawaii go in, say, D:/Pictures/Vacation/Hawaii, and taxes for 2019 in C:/Documents/Taxes/2019.

There is no way to get this file ‘system’ without a filesystem! Every object is thrown into the bucket at the top level. There isn’t even a real root directory, since that concept is also tied to a filesystem.

Since there is no filesystem, we won’t refer to the names as filenames anymore. Objects don’t have filenames; they have key-value pairs to identify them. The key identifies the object, and the value is the object itself.

Without a way to organize your files, things can get messy very quickly. How do we get around this problem?

S3 has a clever solution: make the directory structure part of the object’s name.

So, the names of all your vacation pictures now have ‘Pictures/Vacation/Hawaii/’ prepended to them. The key for sunset1.jpg becomes ‘Pictures/Vacation/Hawaii/sunset1.jpg’.

If the name of your Bucket is my-gallery and it was created in the California region, then the entire URL of your sunset1.jpg would be

http://s3.amazonaws.com/my-gallery/Pictures/Vacation/Hawaii/sunset1.jpg

There are different ways to represent the URL of an object. If you're interested in learning more about how the URL is formed, I talk about that in The Fabulous Buckets Of AWS.

The Pictures, Vacation, and Hawaii in the above example are called Prefixes and the ‘/’ is called the Delimiter.
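To make this concrete, here’s a minimal boto3 sketch, assuming the my-gallery bucket from earlier exists and your AWS credentials are configured. It uploads a picture under a prefixed key and then lists the bucket with a Prefix and a Delimiter to get the ‘folder’ view back:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-gallery"   # example bucket from above

# The whole "path" lives inside the key; S3 stores one flat object.
s3.put_object(
    Bucket=bucket,
    Key="Pictures/Vacation/Hawaii/sunset1.jpg",
    Body=open("sunset1.jpg", "rb"),   # local file assumed to exist
)

# Listing with a Prefix and a Delimiter gives back the 'folder' view:
# CommonPrefixes are the 'subfolders', Contents are the objects at this level.
resp = s3.list_objects_v2(
    Bucket=bucket,
    Prefix="Pictures/Vacation/",
    Delimiter="/",
)
for folder in resp.get("CommonPrefixes", []):
    print("folder:", folder["Prefix"])   # e.g. Pictures/Vacation/Hawaii/
for obj in resp.get("Contents", []):
    print("object:", obj["Key"])
```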

You can’t include the Delimiter with the Prefix.

When you create a ‘folder’ in the console, S3 adds the Delimiter to the Prefix automatically. This means you can’t create the whole ‘folder and subfolder’ path above in one shot from the console. You create ‘Pictures’ first, then go in there and create ‘Vacation’, and so on. S3 will then display it in the bucket as a folder-subfolder hierarchy. See the picture below (the name of my bucket is ‘bivas-us-west’).

Screenshot of my AWS S3 account showing how Prefixes are created in S3

Now, from the root of the bucket, it looks just like a normal drive on your computer: a collection of files and ‘folders’!

Notice that there is no ‘Move’ option for the _MG_8873.jpg file even though there appears to be a folder in the same directory. Well, now you know it’s not really a folder; it’s just a clever key for a file. You can’t move a file into another file.

To move the jpg into the Hawaii ‘folder’ from the console, you’d need to download the file to your hard drive, delete it from the bucket, navigate all the way into the Hawaii folder, and upload the file there. S3 will then create a new resource for it with a new Key.
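If you’re working with the API instead of the console, you can skip the download: a ‘move’ is really a server-side copy to the new key followed by a delete of the old one. A rough boto3 sketch, with the bucket and key names as placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-gallery"                    # placeholder bucket name
old_key = "_MG_8873.jpg"                 # object sitting at the bucket root
new_key = "Pictures/Vacation/Hawaii/_MG_8873.jpg"

# S3 has no rename: copy the object to the new key (a brand-new resource)...
s3.copy_object(
    Bucket=bucket,
    CopySource={"Bucket": bucket, "Key": old_key},
    Key=new_key,
)

# ...then delete the original key.
s3.delete_object(Bucket=bucket, Key=old_key)
```

Note that a single copy_object call handles objects up to 5 GB; anything bigger needs a multipart copy.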

Give your Tags a Relationship with Other AWS Services

When you tag an object, you’re adding information to the metadata of the object. Tags are represented by a key-value pair.

e.g. type=sunset

type is the key, and sunset is the value.

Tagging allows batch processing of files. The Prefixes we saw above are good for categorizing resources that share a path, but tagging ties in with other AWS services so you can batch process your data. We’ll see some use cases below.

Tags & Search

Say you want to find every sunset shot you’ve uploaded. Without tagging, you’d need to browse every file to check. With tags added, you can just search for the tag type=sunset.

You can add multiple tags, up to 10, to an object like so:

type=sunset
project=travel
classification=nature
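Here’s a small boto3 sketch, using the example bucket and key from earlier, that attaches those three tags to an object and reads them back. Note that put_object_tagging replaces the object’s entire tag set, so include every tag you want to keep:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-gallery"                          # example bucket from above
key = "Pictures/Vacation/Hawaii/sunset1.jpg"

# Attach the tags (this call replaces any existing tag set on the object).
s3.put_object_tagging(
    Bucket=bucket,
    Key=key,
    Tagging={
        "TagSet": [
            {"Key": "type", "Value": "sunset"},
            {"Key": "project", "Value": "travel"},
            {"Key": "classification", "Value": "nature"},
        ]
    },
)

# Read the tags back.
tags = s3.get_object_tagging(Bucket=bucket, Key=key)
for tag in tags["TagSet"]:
    print(f"{tag['Key']}={tag['Value']}")
```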

Tags & IAM

Besides search, tags are primarily used for access control. In AWS, every user gets IAM permissions that define what resources they can and cannot access. IAM is like the admin and guest accounts on your computer.

An IAM policy can be written so that it only grants access to resources carrying a certain tag. For example, if you’re a travel photographer who uploaded your pictures to S3 and you want your editor to go in and add or modify tags, you can create a user for your editor whose policy restricts access to resources with the project=travel tag only. That way they don’t get access to all the other data you may have stored on S3.
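As a sketch of what that could look like, here’s a hypothetical inline policy attached to an ‘editor’ user. The user name, policy name, and bucket are placeholders, and a real setup would also need list permissions; the point is the condition on the existing object tag:

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical policy: the editor may read and re-tag only objects
# that already carry the tag project=travel.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectTagging",
                "s3:PutObjectTagging",
            ],
            "Resource": "arn:aws:s3:::my-gallery/*",
            "Condition": {
                "StringEquals": {"s3:ExistingObjectTag/project": "travel"}
            },
        }
    ],
}

iam.put_user_policy(
    UserName="editor",                  # hypothetical IAM user
    PolicyName="travel-photos-only",    # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```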

Tags & Lifecycle Management

Like humans, objects in AWS can have a lifecycle. You can set rules on a bucket that start the clock ticking on an object’s age, measured from the day the object was created. If you know you rarely need an object after a certain amount of time, a rule can move it over to the S3 Standard-IA (Infrequent Access) tier, thereby saving you money, since that tier is cheaper to store data in than S3 Standard.

You can configure a whole batch of objects to be moved to a different tier by using their tags. Maybe every object, irrespective of its Prefix, that has a tag of year=2019 gets moved automatically to S3 Standard-IA when 2020 rolls in. And to make sure the move affects photos only and not other documents tagged with 2019, you can use a Prefix in combination with the tag: if all your photos are stored under the prefix “photos/”, you can add that to the filter when configuring the bucket’s lifecycle rule.
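A sketch of such a rule with boto3; the rule ID and the 90-day threshold are made-up values, and the Filter combines the “photos/” Prefix with the year=2019 tag exactly as described above:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: objects under photos/ that are tagged year=2019
# transition to Standard-IA 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-gallery",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-2019-photos",
                "Status": "Enabled",
                "Filter": {
                    "And": {
                        "Prefix": "photos/",
                        "Tags": [{"Key": "year", "Value": "2019"}],
                    }
                },
                "Transitions": [
                    {"Days": 90, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)
```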

AWS CloudWatch can report S3 request metrics filtered by tag, and tags can also be used to isolate groups of objects to run analytics on.

We Need History to Root Us in the Present — Versioning

After you start storing data on S3, you’ll soon find yourself updating that data. S3 can keep a history of those updates, which is known as versioning.

Unlike on your hard drive, where a new file with the same name overwrites the old one, in S3 the new file and the old file co-exist. You’ll always see and access the newest version, but if you want to go back, you can.

Did you know that you can version files on Google Drive? Your UI/UX artists who are not usually git or Perforce savvy can use Google Drive to upload the latest versions of their work, and Drive will create a version in the background to retain the old one.

By default, versioning is turned off; you have to enable it. Once you enable versioning, you start paying to store every version you keep, so it costs extra.

And once you enable versioning, you can’t go back to being unversioned, if that’s a word. However, you can suspend it.

So a bucket can be in one of three states:

Unversioned (the default)
Versioning-enabled
Versioning-suspended (no new versions are created, but the older ones stay unless you go and delete them)

The way S3 deletes a version is by marking it for deletion instead of nuking it from the cloud. When you do a DELETE request on the resource, the object stays, but S3 places a delete marker on top of it. A subsequent GET request on the same key returns an object-not-found error even though the object is still in S3. An admin can go in later, review the delete markers, and only then permanently delete the objects. This prevents just about anybody from deleting just about anything.

S3 also allows MFA (Multi-Factor Authentication) on delete requests, which adds an extra layer of security: you’ll need to verify that you have the particular device you registered with the account. For more information on how authentication works, you can read my article “I’m Not a Robot!”.
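Here’s a minimal boto3 sketch, again with the example bucket and key from earlier, that enables versioning, overwrites an object, and then deletes it. The old versions and the delete marker are all still there afterwards:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-gallery"                          # example bucket from above
key = "Pictures/Vacation/Hawaii/sunset1.jpg"

# Turn versioning on (this can only be suspended later, not fully undone).
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload the same key twice: the second PUT becomes the newest version
# instead of overwriting the first.
s3.put_object(Bucket=bucket, Key=key, Body=b"original edit")
s3.put_object(Bucket=bucket, Key=key, Body=b"re-touched edit")

# A plain DELETE removes nothing; it just adds a delete marker
# as the newest "version" of the key.
s3.delete_object(Bucket=bucket, Key=key)

# Both versions and the delete marker are still listed.
history = s3.list_object_versions(Bucket=bucket, Prefix=key)
for v in history.get("Versions", []):
    print("version:", v["VersionId"], "latest:", v["IsLatest"])
for m in history.get("DeleteMarkers", []):
    print("delete marker:", m["VersionId"])
```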

There is a problem with versioning. If a bucket contains resources that are frequently updated by the users of your app and you turn on versioning, then each one of those updates creates a new version (as it should). The problem is that you could end up with millions of versions in a short amount of time!

When this happens, S3 can start throttling requests to those objects and you’ll start getting HTTP 503 (Slow Down) errors. If you end up with millions of versions, you’ll need to contact AWS Support to figure it out.

Summary

In this article, we talked about Names, Tags, and Versions.

We saw how we can emulate a traditional directory structure on S3 by using Prefixes and Delimiters.

We learned about tagging an object and how, in combination with Prefixes, we can batch process objects in a way that is otherwise simply not possible.

We also talked about versions and the advantages of versioning as well as its pitfalls.

In my next article named “Life Of an Object on AWS” I’ll talk about Lifecycle management on S3 and how that can save you thousands of dollars.


I love AWS. I’m an AWS Certified Solutions Architect. If you have questions or comments, I’d love to hear them.
