Durability on Amazon S3

Vamshi Krishna Chetpelly
Published in Own Engineering · 4 min read · Feb 20, 2023

Deep Dive on Amazon S3

Have you ever wondered how Amazon S3 is so durable and secure that you never need to worry about losing your data? How do they manage to keep the data highly available despite storage device failure at their data centers? Well, the answers to these questions are covered in this post.

An essential requirement of any storage system is to retain your data: to store it without losing it. Keeping data in the cloud means keeping it on someone else’s computer, and it is the storage system’s responsibility to protect it from loss. At the lowest level, that system needs media devices such as hard drives to hold the data. But hard drives can fail, and a failed drive means lost data. Data redundancy, or duplication, is one way to prevent this: replicas of the data are stored on different devices, so if a hard drive fails we can read the data from one of its replicas. Still, there is a chance that every drive holding a replica fails, and we lose the data anyway.

Amazon’s solution for this is “Erasure Coding”, a well-established storage technique that S3 applies to achieve higher durability than plain replication.

Erasure Coding

Let us consider storing an object in S3. With erasure coding, the object is chopped into several chunks called “shards”, say 7 in our scenario. Then coding algorithms are run over those shards to generate additional “parity shards”, say 5, such that the original object can be reconstructed from any 7 of the 12 shards. Any combination of 7 works: all 7 data shards, or all 5 parity shards plus any 2 data shards, or anything in between.
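To make the idea concrete, here is a deliberately simplified sketch in Java (not S3’s actual algorithm): it splits data into 3 shards and adds a single XOR parity shard, which is enough to rebuild any one lost shard. S3’s real erasure codes generalize this so that several parity shards can cover several simultaneous failures, but the principle, recomputing the original from a subset of shards, is the same. All class names and shard counts here are illustrative.

import java.util.Arrays;

// Minimal illustration of erasure-coding-style redundancy (NOT S3's real code):
// k data shards plus ONE XOR parity shard, which can rebuild any single lost shard.
// Real Reed-Solomon-style codes generalize this to many parity shards.
public class XorParityDemo {

    // Split data into k equal shards (zero-padded) and append one parity shard.
    static byte[][] encode(byte[] data, int k) {
        int shardLen = (data.length + k - 1) / k;
        byte[][] shards = new byte[k + 1][shardLen];
        for (int i = 0; i < data.length; i++) {
            shards[i / shardLen][i % shardLen] = data[i];
        }
        // Parity shard = XOR of all data shards, byte by byte.
        for (int s = 0; s < k; s++) {
            for (int b = 0; b < shardLen; b++) {
                shards[k][b] ^= shards[s][b];
            }
        }
        return shards;
    }

    // Rebuild one missing shard by XOR-ing all surviving shards.
    static byte[] rebuild(byte[][] shards, int missing) {
        byte[] recovered = new byte[shards[0].length];
        for (int s = 0; s < shards.length; s++) {
            if (s == missing) continue;   // skip the "failed disk"
            for (int b = 0; b < recovered.length; b++) {
                recovered[b] ^= shards[s][b];
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        byte[] object = "hello, erasure coding".getBytes();
        byte[][] shards = encode(object, 3);      // 3 data shards + 1 parity shard

        byte[] lost = shards[1].clone();          // pretend shard 1's disk failed
        Arrays.fill(shards[1], (byte) 0);

        byte[] recovered = rebuild(shards, 1);    // recover it from the survivors
        System.out.println(Arrays.equals(lost, recovered));  // true
    }
}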

This gives fine-grained control over the level of redundancy, instead of simply replicating each object 3 or 4 times; for example, 7 data shards plus 5 parity shards cost roughly 12/7, or about 1.7×, the original storage, compared with 3× for triple replication, while tolerating the loss of any 5 shards. S3 performs this operation on the customer’s data objects and spreads the shards across different media devices, disk racks, data centers, and Availability Zones. On a GET, it reassembles the object from the shards. The benefit is that any individual customer’s data is spread across many devices and any individual disk holds only a small part of it, so the object can still be assembled even when disks fail, making the data highly durable.

Durable “Chain of Custody”

Consider a user uploading a file to an S3 bucket. It looks like a simple PUT, but in reality the data travels a long way from the user’s computer to Amazon’s data centers, over internet links and through a variety of intermediaries. Even with network-level integrity checks, some bits may flip. They may flip in memory on the user’s machine before transmission, or inside network devices along the way. In such cases we could end up storing an object we never meant to store, because by the time it reaches S3 something different may have arrived. So, how do we maintain integrity?

Checksums

To maintain integrity while the data travels through the Amazon network, checksums are used. On every PUT, when the object arrives at Amazon’s front door, a checksum is calculated (if the request didn’t already carry one), bound to the object, and travels with it from then on. At any later point we can therefore detect whether bits have flipped, whether the object has been altered, and whether durability has been weakened.
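As a rough sketch of what such a check involves (purely illustrative, not S3’s internal code), the snippet below computes a SHA-256 checksum when an object “arrives” and re-verifies it after the object is read back:

import java.security.MessageDigest;
import java.util.Base64;

// Conceptual sketch only: compute a SHA-256 checksum once when the object arrives,
// carry it alongside the object, and re-verify it later to detect flipped bits.
public class ChecksumSketch {

    static String sha256Base64(byte[] objectBytes) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return Base64.getEncoder().encodeToString(digest.digest(objectBytes));
    }

    public static void main(String[] args) throws Exception {
        byte[] object = "object payload".getBytes();
        String storedChecksum = sha256Base64(object);  // bound to the object at the "front door"

        // ... object travels through internal services, gets sharded, stored, read back ...

        byte[] readBack = object.clone();
        // readBack[0] ^= 1;  // uncomment to simulate a flipped bit

        boolean intact = sha256Base64(readBack).equals(storedChecksum);
        System.out.println(intact ? "checksum matches" : "object was altered");
    }
}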

This solves the problem only partially, i.e., from Amazon’s front door to the actual storage node. What about alterations that can happen before the data reaches the front door, i.e., on the user’s machine or in network devices?

Well, this is where the customer comes in. In early 2022, AWS added support for several new checksum algorithms. They are more performant and can rely on an HTTP trailer sent at the end of the PUT request body, so the checksum can be computed while the data streams. Conveniently, the AWS SDKs handle all of this for you.

import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;

...

// Configure the multipart upload to use the SHA-256 checksum algorithm.
CreateMultipartUploadRequest createMultipartUploadRequest = CreateMultipartUploadRequest.builder()
        .bucket(BUCKET)
        .key(FILE_NAME)
        .checksumAlgorithm(ChecksumAlgorithm.SHA256)
        .build();
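The same setting also works for a simple, single-request upload. Below is a minimal sketch, assuming the Java SDK v2 and placeholder BUCKET and FILE_NAME constants, that uploads a file with a SHA-256 checksum:

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.ChecksumAlgorithm;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import java.nio.file.Paths;

...

// Minimal sketch for a non-multipart upload; BUCKET and FILE_NAME are placeholders.
// The SDK computes the SHA-256 checksum of the payload and sends it with the request,
// so S3 can verify the object arrived exactly as it left the client.
PutObjectRequest putObjectRequest = PutObjectRequest.builder()
        .bucket(BUCKET)
        .key(FILE_NAME)
        .checksumAlgorithm(ChecksumAlgorithm.SHA256)
        .build();

try (S3Client s3 = S3Client.create()) {
    s3.putObject(putObjectRequest, RequestBody.fromFile(Paths.get(FILE_NAME)));
}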

You can use any of the following checksum algorithms:

  • CRC32
  • CRC32C
  • SHA-1
  • SHA-256

By using S3 checksums, which the SDKs make relatively easy to enable, the customer also joins the durable chain of custody and gains strong assurance that the object is never silently altered.
