AWS S3 in 10 minutes.

Deepak Sharma
Published in tech_vichaar
5 min read · Feb 2, 2020

AWS S3 is a storage service in the AWS ecosystem

You need a storage solution that provides nearly unlimited capacity with 99.99% availability, high scalability and 99.999999999% (eleven 9s) durability.
Your Answer: AWS S3
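
To put those percentages in perspective, here is a small back-of-the-envelope calculation in Python. It treats both figures as simple annual averages, which is an assumption made for illustration, and the function names are invented for this sketch.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: str) -> float:
    """Minutes per year a service may be unavailable at the given availability."""
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * MINUTES_PER_YEAR)


def expected_objects_lost_per_year(object_count: int, durability_percent: str) -> float:
    """Average number of objects lost per year at the given durability."""
    loss_rate = 1 - Fraction(durability_percent) / 100
    return float(loss_rate * object_count)


downtime = allowed_downtime_minutes("99.99")
losses = expected_objects_lost_per_year(10_000_000, "99.999999999")
print(f"99.99% availability allows about {downtime:.2f} minutes of downtime per year")
print(f"10 million objects at eleven 9s: about {losses:.4f} objects lost per year")
```

In other words, with ten million stored objects you would expect to lose a single object roughly once every 10,000 years.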

What is S3?

S3 stands for Simple Storage Service. It is a service in the Amazon Web Services ecosystem used primarily for storage needs. It offers simple, secure, highly durable object storage.

S3 enables users to store files and retrieve them at any time, from anywhere in the world, with low latency and high availability. Each file in S3 is called an Object and its path is called a Key. So, like Python, everything is an object here.
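
The bucket-plus-key addressing can be sketched in a few lines of Python. The helper names, the bucket `example-bucket` and the region are assumptions chosen for illustration; the URL layout is the virtual-hosted style that S3 uses for object URLs. This is a minimal sketch and does not handle keys containing `?` or `#`.

```python
from typing import Tuple
from urllib.parse import quote, urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path always starts with '/', which is not part of the key.
    return parsed.netloc, parsed.path[1:]


def object_url(bucket: str, key: str, region: str) -> str:
    """Build the virtual-hosted-style HTTPS URL for an object."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical bucket and key, used only as an example.
bucket, key = split_s3_uri("s3://example-bucket/photos/2020/bucket.png")
print(bucket)
print(key)
print(object_url(bucket, key, "us-east-1"))
```

Note that the "folders" in the key are only a naming convention: S3 stores a flat set of keys inside each bucket.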

S3 is not block storage, hence it can’t be used to install operating systems, databases or any other software.

Feature-full S3

S3 is object storage, and each object can range in size from 0 bytes to 5 TB. Since the storage capacity of S3 is unlimited, any number of 5 TB objects can be stored in it. The largest object that can be uploaded in a single PUT request is 5 GB. Larger objects require Multipart Upload, and AWS recommends it for anything above roughly 100 MB because parts can be uploaded in parallel and retried individually.
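
As a rough illustration of how a multipart upload is sized, the sketch below picks a part size and part count for a given object. It assumes the documented S3 limits (at most 10,000 parts, parts between 5 MiB and 5 GiB, objects up to 5 TiB) and uses binary units; `plan_parts` and the 64 MiB default are hypothetical choices for this example, not part of any AWS SDK.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MIN_PART_SIZE = 5 * MIB     # every part except the last must be at least this big
MAX_PART_SIZE = 5 * GIB     # a part cannot exceed the single-PUT limit
MAX_PARTS = 10_000          # limit on parts per multipart upload
MAX_OBJECT_SIZE = 5 * TIB


def plan_parts(object_size, preferred_part_size=64 * MIB):
    """Return (part_size, part_count) for uploading object_size bytes."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    if not MIN_PART_SIZE <= preferred_part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    # Grow the part size if the preferred one would need more than 10,000 parts.
    smallest_allowed = -(-object_size // MAX_PARTS)  # ceiling division
    part_size = max(preferred_part_size, smallest_allowed)
    part_count = -(-object_size // part_size)
    return part_size, part_count


print(plan_parts(1 * GIB))   # a 1 GiB object in 64 MiB parts
print(plan_parts(5 * TIB))   # the largest possible object
```

A 1 GiB object fits in 16 parts of 64 MiB, while a 5 TiB object forces the part size up to about 524 MiB so that the upload stays within 10,000 parts.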

S3 has a global namespace, which means every bucket name must be unique across all AWS accounts and Regions.

S3 Console

Here, the bucket demo.com can’t be created simply because a bucket with that name already exists somewhere in the world.
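
Global uniqueness can only be checked by S3 itself, but the basic naming rules can be checked locally before you even try. The sketch below covers a subset of those rules (length, allowed characters, no adjacent dots, no IP-address look-alikes); `bucket_name_problems` is a hypothetical helper written for this article.

```python
import re

_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name):
    """Return a list of rule violations; an empty list means the name looks valid."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be between 3 and 63 characters long")
    if not _ALLOWED.fullmatch(name):
        problems.append(
            "must use only lowercase letters, digits, dots and hyphens, "
            "and must start and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not be formatted like an IP address")
    return problems


for candidate in ["demo.com", "Demo_Bucket", "192.168.0.1"]:
    print(candidate, bucket_name_problems(candidate))
```

A name such as demo.com passes these local checks and is still rejected by S3 when someone else already owns it.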

Data Consistency Model

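Amazon S3 achieves high availability by replicating data across multiple servers within AWS data centers. It has two consistency models:

i) Read-after-write consistency for PUTs of new objects: a new object is available to read immediately after a successful write.

ii) Eventual consistency for all overwrite (PUT) and delete (DELETE) requests on existing objects.

Note that since December 2020 S3 provides strong read-after-write consistency for all PUT and DELETE requests, so the eventual-consistency behaviour described in this section applies to S3 as it worked before that change.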

Let’s understand this. Updates to a single key in S3 are atomic: a read returns either the old data or the new data, never a partial mix of the two. After an overwrite (PUT) or a delete (DELETE) of an existing object, the change takes some time to replicate to the other servers. Until that replication completes, a read request for the object may still return the old version.

The same applies to DELETE requests. If a delete-bucket request succeeds and we immediately run a list-buckets command from the command line (CLI), the deleted bucket may still appear in the output for a while. That delay is the time taken to apply the change across all replicas.
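
The stale-read behaviour described above can be imitated with a toy model. This is emphatically not how S3 is implemented: `ToyReplicatedStore` is a made-up class in which writes reach a primary copy first and the replicas only catch up when `replicate()` is called.

```python
_DELETED = object()  # marker for a delete waiting to be replicated


class ToyReplicatedStore:
    """A toy key-value store with one primary copy and lagging replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # changes not yet copied to the replicas

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def delete(self, key):
        self.primary.pop(key, None)
        self.pending.append((key, _DELETED))

    def read_from_replica(self, index, key):
        return self.replicas[index].get(key)

    def replicate(self):
        """Apply every pending change to every replica, in order."""
        for key, value in self.pending:
            for replica in self.replicas:
                if value is _DELETED:
                    replica.pop(key, None)
                else:
                    replica[key] = value
        self.pending.clear()


store = ToyReplicatedStore()
store.put("report.txt", "v1")
store.replicate()

store.put("report.txt", "v2")                         # overwrite
stale = store.read_from_replica(0, "report.txt")      # replica has not caught up
store.replicate()
fresh = store.read_from_replica(0, "report.txt")

store.delete("report.txt")
lingering = store.read_from_replica(1, "report.txt")  # still visible on a replica
store.replicate()
gone = store.read_from_replica(1, "report.txt")

print(stale, fresh, lingering, gone)  # v1 v2 v2 None
```

The read right after the overwrite still sees v1, and the read right after the delete still sees v2, which mirrors the deleted bucket lingering in a listing.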

Types of S3

AWS offers several S3 storage classes, and selecting the appropriate one can bring large cost savings and long-term gains.

  • S3 Standard: General-purpose storage for any type of data, typically used for frequently accessed data.
  • S3 Standard-Infrequent Access: For long-lived but infrequently accessed data.
  • S3 Intelligent-Tiering: Monitors how objects are accessed and moves them between a frequent-access tier and an infrequent-access tier. For example, if an object is not accessed for 30 consecutive days, it is moved to the infrequent-access tier.
  • S3 One Zone-Infrequent Access: For re-creatable, infrequently accessed data that still needs millisecond access. Data is stored in a single Availability Zone only.
  • S3 Glacier: For long-term backups and archives, with standard retrievals taking about 3 to 5 hours.
  • S3 Glacier Deep Archive: For long-term archives that are accessed once or twice a year. It is the cheapest class of all, but unlike the others, data retrieval may take up to 12 hours.
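
The bullets above can be turned into a small decision helper. The function below is a hypothetical sketch that simply encodes this article's descriptions; it is not AWS guidance. The returned strings are the storage-class identifiers that the S3 API uses.

```python
def suggest_storage_class(access, recreatable=False, restore_within_hours=12):
    """Map a rough access pattern to an S3 storage class identifier.

    access: 'frequent', 'infrequent', 'unknown' or 'archive'
    recreatable: True if the data can be regenerated after a loss
    restore_within_hours: how long an archive restore is allowed to take
    """
    if access == "frequent":
        return "STANDARD"
    if access == "unknown":
        return "INTELLIGENT_TIERING"
    if access == "infrequent":
        return "ONEZONE_IA" if recreatable else "STANDARD_IA"
    if access == "archive":
        # Deep Archive is cheapest, but a restore may take up to 12 hours.
        return "DEEP_ARCHIVE" if restore_within_hours >= 12 else "GLACIER"
    raise ValueError(f"unknown access pattern: {access!r}")


print(suggest_storage_class("frequent"))
print(suggest_storage_class("infrequent", recreatable=True))
print(suggest_storage_class("archive", restore_within_hours=5))
```

In practice the choice also depends on pricing, minimum storage durations and retrieval fees, which this sketch ignores.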

Security

By default, all buckets in S3 are private, which means their contents can’t be accessed by outsiders. Access to buckets can be controlled with:

i) Bucket Policies: Applied at the bucket level. If required, a policy can be generated with the AWS Policy Generator tool, and the generated JSON can be pasted into the bucket’s Bucket Policy section.

ii) Access Control Lists: Applied at the object level to control per-object permissions. For example, a public bucket can still hold a private object by using ACLs in conjunction with bucket policies.
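
For a feel of what such a policy document looks like, the sketch below builds one in Python and prints the JSON. The statement grants anonymous read access to a single object; the bucket name `example-bucket` and the helper `public_read_policy` are assumptions made up for this example.

```python
import json


def public_read_policy(bucket, key):
    """Return a bucket policy (as a dict) allowing anyone to read one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadForOneObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


# Hypothetical bucket name; bucket.png is the object used in this article.
policy = public_read_policy("example-bucket", "bucket.png")
print(json.dumps(policy, indent=2))
```

The printed JSON is what would be pasted into the Bucket Policy section. S3 only accepts a public policy like this if the bucket's public access blocking has been relaxed first.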

Requests made to an S3 bucket can be recorded with server access logging, which delivers the log files to a bucket of your choice.

Bucket ACL and Policy Section

Now suppose we wish to make one file in a private bucket publicly accessible while keeping the others private. We can do so by first changing the bucket settings to allow public access.

Turn OFF default Public Access Blocking at Bucket Settings

Then head back to the object that is meant to be publicly accessible and update its Access Control List to public.

Set Object Level Permissions

After this, our object `bucket.png` is publicly accessible, while the others stay as they were.

Encryption

For data protection, S3 offers multiple types of data encryption, which include:

  1. Encryption for data in transit (HTTPS/TLS)
  2. Encryption for data at rest:
  • Server-Side Encryption: AWS supports three kinds of server-side encryption.
    i) S3-Managed Keys (SSE-S3): Each object is encrypted with a unique key, and that key is itself encrypted with a master key that is rotated regularly.

    ii) AWS Key Management Service (SSE-KMS): KMS is a dedicated AWS service for managing encryption keys. Each object is encrypted with a unique data key, which is in turn protected by a key held in KMS that can be customer managed.

    iii) Customer-Provided Keys (SSE-C): Objects are encrypted by AWS using a key that the customer supplies with each request.
  • Client-Side Encryption: The user encrypts files before uploading them to S3.

So, to request server-side encryption during an upload, the user needs to pass one of the following request headers. For SSE-S3:
x-amz-server-side-encryption: AES256

For SSE-KMS:
x-amz-server-side-encryption: aws:kms
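
As a minimal sketch, the choice between the two header values can be wrapped in a helper. `sse_headers` is a hypothetical function written for this article, and the key ID shown is a placeholder. The optional `x-amz-server-side-encryption-aws-kms-key-id` header names a specific KMS key.

```python
def sse_headers(mode, kms_key_id=None):
    """Return the request headers that ask S3 to encrypt an uploaded object."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            # Without this header S3 falls back to the default KMS key.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    raise ValueError(f"unsupported mode: {mode!r}")


print(sse_headers("SSE-S3"))
print(sse_headers("SSE-KMS", kms_key_id="example-key-id"))  # placeholder key ID
```

These dictionaries would be merged into the headers of the PUT request that uploads the object.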

Additionally, bucket policies can be updated to enforce encryption on every upload.
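
One common way to do this is a policy with two Deny statements: one rejects uploads that carry no encryption header at all, and the other rejects uploads that carry the wrong value. The sketch below builds such a policy; `require_sse_policy` and the bucket name are assumptions for illustration.

```python
import json


def require_sse_policy(bucket, algorithm="AES256"):
    """Return a bucket policy that denies uploads lacking the expected SSE header."""
    resource = f"arn:aws:s3:::{bucket}/*"
    condition_key = "s3:x-amz-server-side-encryption"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                # True when the request has no encryption header at all.
                "Condition": {"Null": {condition_key: "true"}},
            },
            {
                "Sid": "DenyWrongEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": resource,
                "Condition": {"StringNotEquals": {condition_key: algorithm}},
            },
        ],
    }


sse_policy = require_sse_policy("example-bucket")  # hypothetical bucket name
print(json.dumps(sse_policy, indent=2))
```

Passing `algorithm="aws:kms"` would enforce SSE-KMS instead of SSE-S3.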

S3 Nits

Transfer Acceleration

Transfer Acceleration enables faster uploads to S3. Data from users is uploaded to the nearest edge location and then routed to S3 over Amazon’s optimized network.
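
Once acceleration is enabled on a bucket, clients send requests to a dedicated accelerate endpoint instead of the regular one. The helper below is a hypothetical sketch that builds that endpoint; it also rejects bucket names containing dots, since Transfer Acceleration does not support them.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration needs a bucket name without dots")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


print(accelerate_endpoint("example-bucket"))  # hypothetical bucket name
```

A bucket named like demo.com would therefore need a different name before it could use this feature.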

CloudFront

CloudFront can be used to deliver both static and dynamic content to users through a geographically distributed network of edge locations.

A Content Delivery Network (CDN) works on the concept of edge locations: objects are cached there up to a defined TTL (time to live), close to the users who request them, which minimizes access latency.
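
The TTL idea can be illustrated with a toy edge cache. This is a simplified model rather than CloudFront's actual behaviour: `EdgeCache` is a made-up class, the origin is a plain function, and the current time is passed in explicitly so that the example is deterministic.

```python
class EdgeCache:
    """A toy edge location that caches origin responses for ttl_seconds."""

    def __init__(self, fetch_from_origin, ttl_seconds):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._entries = {}  # path -> (content, expires_at)

    def get(self, path, now):
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "hit"
        # Missing or expired: go back to the origin and cache the result.
        content = self._fetch(path)
        self._entries[path] = (content, now + self._ttl)
        return content, "miss"


origin_requests = []


def origin(path):
    origin_requests.append(path)
    return f"contents of {path}"


edge = EdgeCache(origin, ttl_seconds=60)
results = [edge.get("/bucket.png", now=t)[1] for t in (0, 30, 61)]
print(results)               # ['miss', 'hit', 'miss']
print(len(origin_requests))  # 2
```

Only the first request and the one after the TTL expires reach the origin; the request in between is served from the edge.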

CloudFront can also use on-premises data centers as origins, so they get the benefits of a CDN as well.
CloudFront supports two types of content distribution:

a) Web distribution, for transfer of static or dynamic content.

b) RTMP distribution, for media streaming.

References:
https://aws.amazon.com/s3/faqs/
https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html

Deepak Sharma
tech_vichaar

Software Engineer @RedHat. Loves R&D, DevOps, and Engineering. Football and Chess are Love. https://finddeepak.com