Amazon S3 Fundamentals — AWS Solutions Architect Associate Course
Chapter 7: Amazon S3 Fundamentals for the AWS Solutions Architect Associate Certification
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Let’s dive into it!
- S3 Intro
- Object Storage Classes
- Security & Policies
- Versioning
- Encryption
- S3 Websites
- CORS
- S3 Lifecycle Rules
- S3 Event Notifications
- S3 Access Logs
- S3 Access Points
- S3 MFA-Delete
- S3 Requester Pays
- Athena
Remember that all the chapters from the course can be found in the following link:
Amazon S3 Introduction
In S3, we store objects in buckets, and each object can be up to 5 TB in size. We can think of objects as the files of a regular file system and of buckets as the directories. Bucket names are globally unique, and buckets are defined at the region level.
The main characteristics of the objects are:
- Key → The name that you assign to an object. You use the object key to retrieve the object.
- Value → The content that you are storing.
- Metadata → A set of name-value pairs with which you can store information regarding the object.
- Access control information → Controls access to the object. You can prevent specific users from accessing it.
- Version ID → Within a bucket, a key and version ID uniquely identify an object. It is generated by S3 when you add an object to a bucket.
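To make these characteristics concrete, here is a minimal boto3 sketch, assuming a bucket named "my-example-bucket" already exists and your AWS credentials are configured (the bucket name and key are hypothetical):

import boto3

s3 = boto3.client("s3")

# Upload an object: the key identifies it, the body is its value,
# and metadata travels with it as name-value pairs.
s3.put_object(
    Bucket="my-example-bucket",   # hypothetical bucket name
    Key="reports/2023/summary.txt",
    Body=b"Hello, S3!",
    Metadata={"author": "alice", "department": "finance"},
)

# Retrieve the object by its key.
response = s3.get_object(Bucket="my-example-bucket", Key="reports/2023/summary.txt")
print(response["Body"].read())    # the value
print(response["Metadata"])       # the metadata
print(response.get("VersionId"))  # set by S3 when versioning is enabled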
OBJECT STORAGE CLASSES
Amazon S3 offers a range of storage classes designed for different use cases. We must know all of them for the exam, as this is a common question. This relates to how objects are stored, so it is at the object level, not the bucket level. Types of classes:
1. General-Purpose Storage:
- S3 Standard General Purpose → It offers high-durability, high-availability, and high-performance object storage for frequently accessed data.
- S3 Standard-Infrequent Access (IA) → It is used for data that is accessed less frequently but requires rapid access when needed. Storing these objects is cheaper.
- S3 One Zone-Infrequent Access (IA) → Data is accessed less frequently but requires rapid access when needed, and it is stored in a single AZ instead of being replicated across at least three AZs. Ideal for customers who want a lower-cost option for infrequently accessed data and do not require the same level of availability as the previous options.
- S3 Intelligent-Tiering → Delivers automatic cost savings by moving objects between access tiers when access patterns change, so the user pays less without manual intervention.
2. Glacier: Low cost. Amazon S3 Glacier is a secure cloud storage service for data archiving and long-term backup. The main difference with S3 General Purpose storage is that restoring files takes some time (unless you use the new S3 Glacier Instant Retrieval class).
- S3 Glacier Instant Retrieval → It is designed for long-lived data that is accessed once per quarter and requires immediate access.
- S3 Glacier Flexible Retrieval → Objects have a minimum storage duration of 90 days. You should use it if your data is accessed 1–2 times per year and is retrieved asynchronously.
- S3 Glacier Deep Archive → Objects have a minimum storage duration of 180 days. It is used for long-lived archive data that is accessed less than once per year and is retrieved asynchronously.
Which class is the best? It depends on the files you want to store and how quickly you need to access them. There is no concrete answer, and one or the other will be better for each situation. This is a typical exam question; choose the class that best suits your needs.
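In practice, you pick the class per object at upload time through the StorageClass parameter. A minimal sketch with boto3 (the bucket name and key are hypothetical):

import boto3

s3 = boto3.client("s3")

# The storage class is chosen per object at upload time.
# Valid values include STANDARD, STANDARD_IA, ONEZONE_IA,
# INTELLIGENT_TIERING, GLACIER_IR, GLACIER and DEEP_ARCHIVE.
s3.put_object(
    Bucket="my-example-bucket",   # hypothetical bucket name
    Key="backups/2023-archive.zip",
    Body=b"archived data",
    StorageClass="DEEP_ARCHIVE",  # long-lived archive, rarely accessed
)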
SECURITY & POLICIES
As we saw in the introduction, a policy is an AWS object that defines permissions when associated with an identity or a resource. We have these types:
1. User-based → The ones that we already know. They are attached to an IAM user, group, or role. If this policy doesn’t allow it, the user might not be able to see the bucket or object.
2. Resource-based → JSON bucket policies. They apply to both buckets and objects in S3. We can use the AWS Policy Generator to create them. They have the following attributes:
- Effect → Allow/Deny
- Principal → Who can perform an action on the bucket/object.
- Action → What the user can do with the bucket/object.
- Resource → Object/bucket affected.
In this example, anyone can read the objects in the bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Principal": "*",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*"
    }
  ]
}
By default, S3 buckets block all public access to prevent company data leaks, but you can change that if necessary for your application.
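To attach a bucket policy programmatically, you pass the policy document as a JSON string. A minimal sketch with boto3, reusing the policy above (DOC-EXAMPLE-BUCKET is a placeholder name):

import boto3
import json

s3 = boto3.client("s3")

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",
        "Principal": "*",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:GetObjectVersion"],
        "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*",
    }],
}

# The policy document is passed as a JSON string.
s3.put_bucket_policy(
    Bucket="DOC-EXAMPLE-BUCKET",   # placeholder bucket name
    Policy=json.dumps(public_read_policy),
)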
VERSIONING
You can activate versioning at the bucket level for extra safety. Uploading an object with the same key will not overwrite it but will create a new version instead. It is good practice to enable it. If we want to see the versions in a bucket, we need to click “List Versions”; otherwise, we only see the latest one.
Some considerations:
- If you stop versioning on a bucket, the previous versions are kept.
- If you enable versioning when the bucket already contains objects, the existing objects will have version ID “null”.
- If we delete an object when versioning is enabled, it will not be deleted; instead, a Delete Marker is created that tells AWS not to show the object, so we can restore it if necessary. To restore the object, we delete the Delete Marker; to permanently delete it, we must delete its versions explicitly by version ID.
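Here is a minimal boto3 sketch of this flow, assuming a hypothetical bucket "my-example-bucket" and key "report.txt":

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"   # hypothetical bucket name

# Enable versioning on the bucket.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# A simple delete only adds a Delete Marker on top of the versions.
s3.delete_object(Bucket=bucket, Key="report.txt")

# Restore the object by removing the Delete Marker.
versions = s3.list_object_versions(Bucket=bucket, Prefix="report.txt")
for marker in versions.get("DeleteMarkers", []):
    if marker["IsLatest"]:
        s3.delete_object(Bucket=bucket, Key="report.txt", VersionId=marker["VersionId"])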
ENCRYPTION
Data protection refers to protecting data in transit (as it travels to and from Amazon S3) and at rest (while it is stored on disks in Amazon S3 data centers). We have the following ways to apply encryption:
1. Server-side encryption → Server-side encryption is the encryption of data at its destination by the application or service that receives it. Therefore, you send the object unencrypted, and the server encrypts it. Types:
- SSE-S3 (Server Side Encryption) → Encrypt S3 objects using keys managed by AWS. To request server-side encryption using the object creation REST APIs, provide the “x-amz-server-side-encryption” request header.
- SSE-KMS (Key Management Service) → AWS Key Management Service (AWS KMS) is a service that combines secure, highly available hardware and software to provide a key management system scaled for the cloud. You can use AWS KMS to encrypt your Amazon S3 objects. The main difference with SSE-S3 is that the encryption keys are managed in AWS KMS, which gives you more control over them (for example, key rotation and an audit trail of key usage).
- SSE-C → In this case, the customer provides the encryption keys. The encryption key you provide is part of your request, so AWS doesn’t store any key. HTTPS is mandatory. It is not available in the console; we must use the CLI, the SDKs, or the API. We can find more information at the following link.
- DSSE-KMS (NEW) → Dual-layer server-side encryption with AWS KMS applies two layers of encryption to objects when they are uploaded to Amazon S3.
2. Client-side encryption → Client-side encryption is the act of encrypting data before sending it, in this case, to S3. One way to do it is with the AWS Encryption SDK. The client also has to decrypt it.
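A minimal boto3 sketch of the two most common server-side options, assuming a hypothetical bucket and KMS key alias:

import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 encrypts the object with keys it manages itself.
s3.put_object(
    Bucket="my-example-bucket",      # hypothetical bucket name
    Key="data/sse-s3.txt",
    Body=b"encrypted at rest",
    ServerSideEncryption="AES256",
)

# SSE-KMS: the object is encrypted with a key managed in AWS KMS.
s3.put_object(
    Bucket="my-example-bucket",
    Key="data/sse-kms.txt",
    Body=b"encrypted with KMS",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-app-key",  # hypothetical KMS key alias
)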
S3 WEBSITES
We can host static websites in S3 and make them accessible online. If we get a 403 Access Denied error when accessing the site, it is because the bucket policy does not allow public reads, and we need to modify it. The steps are:
- Upload the static files to an S3 bucket.
- Activate “Static Website Hosting” in the bucket properties.
- Uncheck “Block all public Access”.
- Write a public access policy; the previous step alone is not enough.
- It will generate an endpoint that you can access to see your website.
You can read more about this process at the following link.
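The hosting and public-access steps can also be done programmatically. A minimal boto3 sketch, assuming a hypothetical bucket "my-website-bucket" that already contains index.html and error.html:

import boto3

s3 = boto3.client("s3")
bucket = "my-website-bucket"   # hypothetical bucket name

# Activate static website hosting on the bucket.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Stop blocking public access.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)
# You would then attach a public-read bucket policy like the one shown earlier.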
CROSS-ORIGIN RESOURCE SHARING (CORS)
If you access a website, you should not get additional data from third-party servers, as this can be malicious. But there can be exceptions if both website owners agree to cooperate. Cross-Origin Resource Sharing (CORS) regulates this cooperation. CORS is an HTTP-header-based mechanism that allows a server to indicate any origins (domain, scheme, or port) other than its own, from which a browser should permit the loading of resources.
Let’s see an example of how it works, taken from https://lenguajejs.com. Let’s call “domain.com” Domain A and “otherdomain.com” Domain B.
- First example → Domain A tries to make an AJAX request to itself. As it’s the Same Origin, it will work.
- Second Example → Domain A wants to request Domain B. If we don’t have CORS enabled in Domain B, it will fail.
- Third Example → Domain A wants to request Domain B, which has CORS enabled. In this case, it will work.
The same thing happens with S3 buckets. If a client makes a cross-origin request to our S3 bucket, we need to enable the correct CORS headers on the S3 bucket. To allow all origins, you can use the symbol ‘*’.
[
  {
    "AllowedHeaders": [
      "*"
    ],
    "AllowedMethods": [
      "PUT",
      "POST",
      "DELETE"
    ],
    "AllowedOrigins": [
      "http://www.example.com"
    ],
    "ExposeHeaders": []
  }
]
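The same rules can be applied with boto3, where the configuration wraps the rules in a CORSRules list. A minimal sketch with a hypothetical bucket name:

import boto3

s3 = boto3.client("s3")

# Apply the CORS rules shown above to the bucket.
s3.put_bucket_cors(
    Bucket="my-example-bucket",   # hypothetical bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["PUT", "POST", "DELETE"],
                "AllowedOrigins": ["http://www.example.com"],
                "ExposeHeaders": [],
            }
        ]
    },
)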
S3 LIFECYCLE RULES
You can move objects between Storage classes. This can be done manually or automatically with lifecycle rules. There are two types of actions:
- Transition actions → They define when objects move from one storage class to another. For example, move an object from S3 Standard to S3 Glacier after 90 days.
- Expiration actions → You set the objects to be deleted after a certain period.
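A minimal boto3 sketch combining both action types, using the 90-day Glacier transition from the example above (the bucket name, rule ID, and prefix are hypothetical):

import boto3

s3 = boto3.client("s3")

# One rule with a transition action (Standard -> Glacier after 90 days)
# and an expiration action (delete after 365 days).
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",   # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)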
S3 EVENT NOTIFICATIONS
You can use the Amazon S3 Event Notifications feature to receive notifications when certain events happen in your S3 bucket (object created, deleted, replicated, etc.). This can trigger other services (Lambda, SNS, and SQS). For example, when an object is created in a bucket, we could trigger a Lambda function (which we will study later) that processes the file and creates a thumbnail.
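A minimal boto3 sketch of the thumbnail example, assuming a hypothetical Lambda function ARN that has already granted S3 permission to invoke it:

import boto3

s3 = boto3.client("s3")

# Invoke a Lambda function whenever a new .jpg object is created.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",   # hypothetical bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                # hypothetical function ARN
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
                },
            }
        ]
    },
)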
S3 ACCESS LOGS
S3 access logging provides detailed records of the requests that are made to a bucket. By enabling these logs, we can save all the requests to a bucket in another bucket. Never store the logs in the same bucket as the app's bucket, as it will create an infinite loop: if a user puts something in the bucket, it will be logged in the same bucket, which creates another log, and so on ad infinitum.
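A minimal boto3 sketch, assuming two hypothetical buckets and that the target bucket already permits log delivery:

import boto3

s3 = boto3.client("s3")

# Deliver access logs for "my-app-bucket" into a separate logging bucket.
s3.put_bucket_logging(
    Bucket="my-app-bucket",   # hypothetical source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-logs-bucket",   # must NOT be the source bucket
            "TargetPrefix": "access-logs/",
        }
    },
)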
S3 ACCESS POINTS
With S3 Access Points, customers can create unique access control policies for each access point to easily control access to shared datasets. Each access point has its own security, so our users can connect through the access points to the part of the bucket they have access to. Each access point has:
- Its DNS name
- Access Point Policy
For example, you can create an access point for your S3 bucket that grants “Marketing users” access to the “Marketing” folder of this bucket.
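Access points are created through the S3 Control API. A minimal boto3 sketch for the Marketing example, with a hypothetical account ID, access point name, and bucket:

import boto3

s3control = boto3.client("s3control")

# Create an access point scoped to the bucket; its access point policy
# can then restrict access to the "Marketing" folder for marketing users.
s3control.create_access_point(
    AccountId="123456789012",     # hypothetical AWS account ID
    Name="marketing-ap",          # hypothetical access point name
    Bucket="my-example-bucket",   # hypothetical bucket name
)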
S3 MFA-DELETE
You can add another layer of security by configuring a bucket to enable MFA (multi-factor authentication) deletion. When you do this, the bucket owner must include two forms of authentication in any request to delete a version or change the versioning state of the bucket.
You cannot enable MFA Delete using the AWS Management Console. You must use the AWS Command Line Interface (AWS CLI) or the API.
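Through the API, this is the versioning call with an extra MFA argument. A minimal boto3 sketch, assuming the bucket owner's credentials and a hypothetical MFA device ARN:

import boto3

s3 = boto3.client("s3")   # must run with the bucket owner's credentials

# The MFA argument is the device serial number and the current
# one-time code, separated by a space.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",   # hypothetical bucket name
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)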
S3 REQUESTER PAYS (NEW)
With S3 Requester Pays enabled in a bucket, the requester instead of the bucket owner pays the cost of the request and the data download from the bucket. This can be useful when you want to share data but not incur charges associated with others accessing the data.
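A minimal boto3 sketch of both sides of the arrangement, with hypothetical bucket and object names; the requester must acknowledge the charges explicitly on each request:

import boto3

s3 = boto3.client("s3")

# Bucket owner: enable Requester Pays on the bucket.
s3.put_bucket_request_payment(
    Bucket="my-example-bucket",   # hypothetical bucket name
    RequestPaymentConfiguration={"Payer": "Requester"},
)

# Requester: acknowledge the charges when downloading.
obj = s3.get_object(
    Bucket="my-example-bucket",
    Key="shared/dataset.csv",     # hypothetical object key
    RequestPayer="requester",
)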
ATHENA
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, and it allows you to run SQL queries directly against files in S3. It’s used in Business Intelligence, Business Analytics, or reporting.
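A minimal boto3 sketch, assuming a table named "access_logs" has already been defined over S3 data in a hypothetical database, and that query results go to another hypothetical S3 location:

import boto3

athena = boto3.client("athena")

# Run a SQL query against a table defined over S3 data;
# the results are written to the given S3 output location.
response = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_analytics_db"},   # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(response["QueryExecutionId"])   # poll get_query_execution() for completion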
Thanks for Reading!
And that’s it for the S3 chapter. This is perhaps one of the longest chapters of the course. If you like my work and want to support me…
- The BEST way is to follow me on Medium here.
- Feel free to clap if this post is helpful for you! :)