The Comprehensive AWS S3 Study Guide

Michael Weeks
Absolute Zero
7 min read · Oct 23, 2019


AWS Simple Storage Service (S3) is Amazon’s object storage service, and you can deploy it at the click of a mouse. It’s the same storage that Amazon uses to power its own e-commerce business. The idea is that you can store and retrieve any amount of data, from anywhere on the web, at any time through the AWS Console.

I’ve put together this guide as a review for the AWS Certified Solutions Architect Associate (CSAA) certification. The professional certification will probably span more content and technicalities. I’ve compiled information from various courses, blogs, and videos here in a single place. I might’ve missed a few smaller pieces, but this will help you understand the bigger picture of Amazon S3.

As with most IT these days, it’s going to be highly available, durable, and highly scalable as well as easy to use. If you think about it, back in the day you actually had to go and provision storage by buying all the storage hardware (like a NAS, SAN, etc.) and setting it up. You had the headache of buying, maintaining, securing, updating, and running all of this complex machinery plus you were limited to that machinery’s capabilities.

Nowadays, at the click of a mouse, you can provision storage of any type, quantity, location, and more. This technological emergence has fueled the explosion of technology and drastically reduced the time it takes to support and deploy things like software as a service (SaaS).

Think of S3 as a safe place to store all of your files (object storage). Now AWS also has block storage available through a solution called Elastic Block Store. So what’s the difference between block storage and object storage?

Block Storage vs. Object Storage vs. File Storage

So what are the differences between these 3? File storage is just data organized and represented as a hierarchy of files in folders. Block storage is ‘blocks’ of data in organized, evenly sized volumes (think of Lego blocks). Object storage manages data and links that data to associated metadata (data about data).

So what is the AWS solution for each type of storage?

  • Block = Elastic Block Store (EBS)
  • Object = Simple Storage Service (S3)
  • File = Elastic File System (EFS)

You can find a great, simple comparison of these 3 services and when to use them here.

Object storage is the clear winner for storing unstructured data here. You have unlimited scaling with search features. As data expands and becomes more prevalent (like the universe), object storage will only get bigger.

S3 Overview

  • An object is a file and potentially any metadata (data about data describing the file)
  • Buckets store objects, and each object has a key, which is its unique identifier within the bucket
  • S3 isn’t suitable to install an operating system or database on because it’s object-based (you would want to use block storage for these)
Image from CloudAcademy
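
To make the bucket, key, and metadata idea concrete, here’s a minimal sketch in plain Python. It is not the S3 API: `ToyBucket` and `StoredObject` are hypothetical names made up for illustration, and everything lives in memory.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the file's bytes plus metadata describing it."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ToyBucket:
    """Hypothetical in-memory stand-in for a bucket (not the real S3 API)."""

    def __init__(self, name: str):
        self.name = name
        self._objects = {}  # key -> StoredObject

    def put(self, key: str, data: bytes, **metadata) -> None:
        # The key is the unique identifier inside the bucket.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def keys(self) -> list:
        return sorted(self._objects)


bucket = ToyBucket("my-example-bucket")
# The key only looks like a folder path; it is one flat string.
bucket.put("photos/2019/cat.jpg", b"\xff\xd8", content_type="image/jpeg")
bucket.put("notes.txt", b"hello", content_type="text/plain", author="me")

print(bucket.keys())
print(bucket.get("notes.txt").metadata)
```

Notice there is no folder object anywhere. The “path” is just part of the key, which is one reason object storage scales so easily.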

Get Buckets

  • S3 utilizes ‘buckets’ to store data. You can have up to 100 buckets per account and you can get that number upped to 1,000 if you submit a service limit increase.
  • The total data (volume) you can store as well as objects per bucket is unlimited
  • Individual object sizes are limited to between 0 bytes and 5 terabytes
  • However, the largest object size that can be uploaded in a single PUT operation is 5 gigabytes
  • Any objects larger than 100 megabytes should be uploaded using the Multipart Upload capability
  • Bucket names must be globally unique across all of AWS (think of them as website names)
  • After creating a bucket name, it cannot be changed
  • Regions cannot be changed either after creation of a bucket
  • Buckets support static websites (configure for website hosting)
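
The size limits above are easy to mix up on the exam, so here’s a rough sketch that turns them into a decision function. The function names and the 100 MB default part size are my own assumptions for illustration, and I’m assuming binary units (1 GB = 1024³ bytes).

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TB        # largest object S3 will store
MAX_SINGLE_PUT = 5 * GB         # largest object for a single PUT
MULTIPART_THRESHOLD = 100 * MB  # above this, multipart is recommended


def choose_upload_method(size_bytes: int) -> str:
    """Pick an upload approach from the limits listed above."""
    if size_bytes < 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 0 bytes and 5 TB")
    if size_bytes > MAX_SINGLE_PUT:
        return "multipart (required)"
    if size_bytes > MULTIPART_THRESHOLD:
        return "multipart (recommended)"
    return "single PUT"


def part_count(size_bytes: int, part_size: int = 100 * MB) -> int:
    """Number of parts for a fixed, hypothetical part size (ceiling division)."""
    return max(1, (size_bytes + part_size - 1) // part_size)


print(choose_upload_method(10 * MB))  # single PUT
print(choose_upload_method(1 * GB))   # multipart (recommended)
print(choose_upload_method(6 * GB))   # multipart (required)
print(part_count(1 * GB))             # 11
```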

Data Consistency (How it works)

  • Read after write consistency for PUTS of new objects
  • i.e., if you upload a new file to S3, you can read it right away
  • Eventual consistency for overwrite PUTS and DELETES
  • This means an update can take some time to show up if you’re modifying or deleting something that’s already in S3
  • One caveat: if you send a GET or HEAD request for a key before the object exists, the first read after the PUT is only eventually consistent
  • Successful uploads will generate an HTTP 200 code
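
If the difference between the two models feels abstract, this toy simulation might help. It is only a sketch of the behaviour described in the bullets above, not how S3 works internally. `ToyEventuallyConsistentStore` and `propagate()` are hypothetical names, and the “lag” here is triggered by hand, whereas in real life it would just be a short delay.

```python
_DELETED = object()  # sentinel for a delete that has not propagated yet


class ToyEventuallyConsistentStore:
    def __init__(self):
        self._visible = {}  # what a reader currently sees
        self._pending = {}  # overwrites and deletes still propagating

    def put(self, key, value):
        if key in self._visible or key in self._pending:
            self._pending[key] = value  # overwrite PUT: eventual
        else:
            self._visible[key] = value  # new object: readable right away

    def delete(self, key):
        self._pending[key] = _DELETED   # DELETE: eventual

    def get(self, key):
        return self._visible.get(key)

    def propagate(self):
        """Pretend enough time has passed for every change to land."""
        for key, value in self._pending.items():
            if value is _DELETED:
                self._visible.pop(key, None)
            else:
                self._visible[key] = value
        self._pending.clear()


store = ToyEventuallyConsistentStore()
store.put("report.csv", "v1")
first_read = store.get("report.csv")    # "v1": new object, read after write

store.put("report.csv", "v2")
stale_read = store.get("report.csv")    # still "v1": overwrite not visible yet
store.propagate()
fresh_read = store.get("report.csv")    # "v2"

store.delete("report.csv")
before_delete_lands = store.get("report.csv")  # still "v2"
store.propagate()
after_delete_lands = store.get("report.csv")   # None
```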

Storage Classes — What storage solution do you need?

Here it’s better to let the charts do the talking, but S3 storage is designed for 99.999999999% durability (11 9’s). Storage design really depends on your needs.

AWS Storage Classes
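
Eleven 9’s is hard to picture, so here’s a quick back-of-the-envelope calculation. I’m assuming the durability figure can be read as the chance of a single object surviving one year, which is how Amazon’s own FAQ example frames it. The function names are just mine.

```python
from fractions import Fraction

# 99.999999999% written as an exact fraction (eleven 9's)
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_probability = 1 - durability  # per object, per year


def expected_losses_per_year(object_count: int) -> Fraction:
    return object_count * annual_loss_probability


def years_per_lost_object(object_count: int) -> Fraction:
    return 1 / expected_losses_per_year(object_count)


print(float(expected_losses_per_year(10_000_000)))  # 0.0001
print(years_per_lost_object(10_000_000))            # 10000
```

In other words, with ten million objects stored you’d expect to lose one about every 10,000 years on average.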

You’re charged for S3 by the storage you use, requests made, storage management, data transfers, transfer acceleration, and even cross-region replication. You can find more info on Amazon’s S3 Storage pricing page.

S3 — RRS

S3 RRS (Reduced Redundancy Storage) fills a similar role to S3 One Zone-IA, but RRS is effectively deprecated and Amazon is pushing S3 One Zone-IA instead. RRS is still available, but Amazon doesn’t actively promote it. RRS stores data at a lower level of redundancy than standard S3 and doesn’t replicate objects as many times as standard S3.

Versioning

Versioning is exactly what it sounds like. It’s having multiple versions of an object in the same bucket. This feature helps you preserve objects, allows you to retrieve different versions of objects, and restore any version of an object over time. This feature provides cover for user errors and allows you to easily recover from something like an application failure.

To work, versioning must be enabled on the bucket (it’s disabled by default). AWS recommends versioning as a best practice so you can recover objects that have been overwritten or deleted.

When you PUT an object into a bucket with versioning, the old version is not overwritten. When you DELETE an object from a bucket with versioning, all versions remain and a delete marker is added.
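
Here’s a small sketch of that PUT and DELETE behaviour. It’s a toy model in plain Python, not the S3 API. `ToyVersionedBucket` is a hypothetical name, and the `v1`, `v2` version IDs are made up (real version IDs are opaque strings).

```python
import itertools

DELETE_MARKER = object()


class ToyVersionedBucket:
    def __init__(self):
        self._versions = {}  # key -> list of (version_id, value), oldest first
        self._ids = itertools.count(1)

    def put(self, key, value):
        # A PUT never overwrites; it stacks a new version on top.
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, value))
        return version_id

    def delete(self, key):
        # A plain DELETE keeps every version and adds a delete marker.
        return self.put(key, DELETE_MARKER)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top hides the object.
            if not versions or versions[-1][1] is DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, value in versions:
            if vid == version_id and value is not DELETE_MARKER:
                return value
        raise KeyError((key, version_id))


b = ToyVersionedBucket()
v1 = b.put("a.txt", "first")
v2 = b.put("a.txt", "second")
latest = b.get("a.txt")                    # "second"
older = b.get("a.txt", version_id=v1)      # "first" is still there

b.delete("a.txt")
recovered = b.get("a.txt", version_id=v2)  # "second", even after the delete
```

After the delete, a plain `b.get("a.txt")` raises `KeyError`, but both older versions are still retrievable by ID. That’s the safety net versioning gives you.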

Cross-Region Replication

Cross-Region Replication automatically and asynchronously copies objects across buckets in different AWS regions. Why is this feature important? It minimizes latency, enables compliance with more strict requirements, and it increases operational efficiency overall.

For Cross-Region Replication to work, you must have Versioning enabled (both source and destination buckets). In addition, the buckets must be in different AWS regions. If you’re having trouble using CRR, check that the owner of the source bucket has READ and READ_ACP permissions with the object ACL (access control list). S3 has to have permissions to replicate objects from one bucket to another.

Here’s where things get tricky: if you’re the owner of the source bucket but not the object inside of the bucket, the object owner has to grant you READ and READ_ACP permissions with the object ACL.

It’s important to know that Cross-Region Replication:

  • Will not replicate deletes
  • Will not retroactively copy objects that were already in the source bucket before replication was turned on
  • Current and previous versions apply with CRR
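
The prerequisites above (versioning on both buckets, different regions) are simple enough to express as a checklist. This is just a sketch: `BucketInfo` and `replication_blockers` are hypothetical helpers I made up, and nothing here calls AWS or checks ACL permissions.

```python
from dataclasses import dataclass


@dataclass
class BucketInfo:
    name: str
    region: str
    versioning_enabled: bool


def replication_blockers(source: BucketInfo, destination: BucketInfo) -> list:
    """Return the reasons CRR can't be set up yet (an empty list means good to go)."""
    problems = []
    if not source.versioning_enabled:
        problems.append(f"enable versioning on source bucket {source.name}")
    if not destination.versioning_enabled:
        problems.append(f"enable versioning on destination bucket {destination.name}")
    if source.region == destination.region:
        problems.append("source and destination must be in different regions")
    return problems


src = BucketInfo("my-source-bucket", "us-east-1", versioning_enabled=True)
dst = BucketInfo("my-replica-bucket", "us-east-1", versioning_enabled=False)
for problem in replication_blockers(src, dst):
    print(problem)
```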

Transfer Acceleration

Transfer Acceleration enables very fast, secure transfers of files over long distances (S3 bucket to client) by utilizing CloudFront’s edge locations. Remember, edge locations are just endpoints for AWS used for caching content (part of the CloudFront content delivery network).

When data arrives at the edge location, it’s routed to S3 over an optimized network path. The benefit is that clients send their data to a nearby edge location instead of all the way to the bucket’s region, which speeds up upload times.

AWS Transfer Acceleration
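
Once acceleration is enabled on a bucket, you use it by pointing requests at the bucket’s accelerate endpoint instead of the regular one. Here’s a minimal sketch of building that URL. The helper names are mine, the bucket and key are placeholders, and no request is actually sent.

```python
from urllib.parse import quote


def accelerate_endpoint(bucket_name: str) -> str:
    """Build the Transfer Acceleration endpoint for a bucket."""
    # Bucket names containing periods can't be used with Transfer Acceleration.
    if "." in bucket_name:
        raise ValueError("bucket names with periods can't use Transfer Acceleration")
    return f"https://{bucket_name}.s3-accelerate.amazonaws.com"


def accelerated_object_url(bucket_name: str, key: str) -> str:
    # quote() leaves "/" alone, so key prefixes stay readable.
    return f"{accelerate_endpoint(bucket_name)}/{quote(key)}"


print(accelerate_endpoint("my-example-bucket"))
print(accelerated_object_url("my-example-bucket", "videos/big file.mp4"))
```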

CloudFront

CloudFront is a content delivery network (CDN). It’s basically a distributed network of servers that delivers webpages and other content to users based on their geographic location, the origin of the content, and the content delivery server closest to them.

Some important tips for the exam include:

  • You can clear cached objects but you’ll be charged
  • Edge locations allow for read and write access
  • RTMP is used for media streaming or Adobe Flash media
  • Objects are cached for the life of the TTL (time to live) which is always in seconds
  • A web distribution is typically used for websites
  • An origin is the origin of all the files the CDN will distribute (like the root). This could be an S3 bucket, EC2 instance, ELB, Route53, MediaStore Container, MediaStore Channel.
Image from AWS CloudFront documentation
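
To see how TTL-based caching and clearing the cache fit together, here’s a toy edge cache. It’s a sketch of the general idea, not CloudFront’s implementation. `ToyEdgeCache`, the fake origin function, and the fake clock are all hypothetical and exist only so the example runs without waiting.

```python
import time


class ToyEdgeCache:
    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (body, expires_at)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: served from the edge
        body = self._fetch(path)  # miss or expired: go back to the origin
        self._entries[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path):
        # Clearing a cached object before its TTL runs out.
        self._entries.pop(path, None)


origin_hits = []


def origin(path):
    origin_hits.append(path)
    return f"content of {path}"


fake_now = [0.0]  # hand-cranked clock, in seconds
cache = ToyEdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("/index.html")  # miss: fetched from the origin
cache.get("/index.html")  # hit: served from cache
fake_now[0] = 61.0
cache.get("/index.html")  # TTL expired: fetched again
print(len(origin_hits))   # 2
```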
