Amazon Storage: S3, EBS and EFS

Here I list the characteristics of the different types of AWS (Amazon Web Services) storage.

S3: It can be used from anywhere in the world. S3 is an object store, so it does not have to be formatted before use. Static pages and front-end JavaScript can be served from S3, but dynamic pages based on PHP, Node.js, etc. are not supported. S3 does not offer locking in the way a file system locks a file while it is in use.

To improve S3 I/O performance, it has been recommended that we not use keys that all start with the same characters. (AWS raised its per-prefix request limits in 2018, so this now matters mainly at very high request rates.) There is a similar consideration in Cassandra, which automatically tries to protect I/O performance by preventing hot spots. Instead of placing data directly according to the primary key, Cassandra computes a hash (token) of the partition key part of the primary key and chooses the partition from that hash. This breaks up hot spots, yet related rows still stay together thanks to the clustering key part of the primary key. This is a good idea: even though some characters appear with high frequency in the English language, we can prevent hot spots by using a good hash and storing the items based on that hash. When we use a hash, there is no simple direct mapping from a name to where it is going to be stored. Just because the names of different objects start with the same letter, it does not follow that all such objects are clustered together on disk when they are placed according to hashes computed from their names.
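The effect of hashing can be sketched in a few lines of Python. This is only an illustration of the idea, not how S3 or Cassandra is actually implemented: the partition count, the two placement functions and the `hashed_key` helper are all hypothetical names made up for this example, and SHA-256 stands in for whatever hash a real system uses.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, chosen only for illustration


def partition_by_first_letter(key: str) -> int:
    """Naive placement: keys that share a first letter share a partition."""
    return ord(key[0]) % NUM_PARTITIONS


def partition_by_hash(key: str) -> int:
    """Hash-based placement: the partition depends on a digest of the whole key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def hashed_key(key: str) -> str:
    """Prepend a short hash so stored keys no longer share a common prefix."""
    prefix = hashlib.sha256(key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}-{key}"


# 28 keys that all start with the same letters, as daily log files often do.
keys = [f"log-2024-01-{day:02d}.txt" for day in range(1, 29)]

naive = Counter(partition_by_first_letter(k) for k in keys)
hashed = Counter(partition_by_hash(k) for k in keys)

print("by first letter:", dict(naive))   # every key lands in one partition
print("by hash:        ", dict(hashed))  # keys are spread over several partitions
print("example stored key:", hashed_key(keys[0]))
```

With placement by first letter, all 28 keys pile up in a single partition, which is exactly the hot spot we want to avoid. With placement by hash, the same keys are scattered, at the cost of losing any simple mapping from a name to its location.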

EBS: It is block storage. It cannot be used until it has been formatted with a file system of our choice and mounted on an EC2 instance. The reason is that we need an OS as an intermediary to access a file system, and in AWS the OS resides on EC2. A file system is a layer of abstraction on top of block storage; it manages information such as where a file is stored on the block storage. Normally only one EC2 instance can use an EBS volume at a particular time. Please note that an EBS volume in one availability zone cannot be attached from another availability zone; its data is reachable from there only if the attached instance exposes it over the network, for example through a shared file system.
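To make the "file system is a layer on top of block storage" point concrete, here is a toy sketch in Python. It is not how EBS or any real file system works internally; `BlockDevice`, `ToyFileSystem` and the 4-byte block size are hypothetical and exist only to show that the block layer knows nothing about files, while the file system keeps the table that says which blocks belong to which file.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is easy to follow


class BlockDevice:
    """Raw block storage: numbered fixed-size blocks, no notion of files."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, index: int, data: bytes) -> None:
        # Pad short data with zero bytes so every block has the same size.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, index: int) -> bytes:
        return self.blocks[index]


class ToyFileSystem:
    """Keeps the metadata that maps a file name to the blocks holding its bytes."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        self.free = list(range(len(device.blocks)))  # unused block indexes
        self.table = {}  # file name -> (list of block indexes, length in bytes)

    def write_file(self, name: str, data: bytes) -> None:
        # Overwriting a file first returns its old blocks to the free list.
        old_blocks, _ = self.table.pop(name, ([], 0))
        self.free.extend(old_blocks)
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("no space left on device")
        used = []
        for chunk in chunks:
            index = self.free.pop(0)
            self.device.write_block(index, chunk)
            used.append(index)
        self.table[name] = (used, len(data))

    def read_file(self, name: str) -> bytes:
        used, length = self.table[name]
        raw = b"".join(self.device.read_block(i) for i in used)
        return raw[:length]  # drop the padding of the last block


disk = BlockDevice(num_blocks=8)
fs = ToyFileSystem(disk)
fs.write_file("a.txt", b"hello world")
fs.write_file("b.txt", b"EBS")
print(fs.table)
print(fs.read_file("a.txt"))
```

Without the `table` kept by `ToyFileSystem`, the device is just eight anonymous blocks. That is the situation of an unformatted volume: the bytes can be stored, but nothing records where a given file lives.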

EFS: This type of storage can be accessed from any availability zone in a region and from multiple EC2 instances at the same time.

Instance store: This (several GB to 48 TB) is the storage that comes with EC2. It is suitable for caching, but its data is lost if the EC2 instance is stopped or terminated for any reason. If we want storage that persists after the virtual server has been stopped, we should use EBS, EFS or S3. Instance store is ephemeral storage.

We can build our own CDN-like service (normally used to serve static files from locations near the client) using CloudFront and S3. On a separate note, a caching server such as Varnish or Squid can be used to provide our own CDN.

DynamoDB versus S3: Both store data by key, but DynamoDB is a NoSQL key-value database rather than an object store. S3 is for large objects (an object can be on the order of terabytes), whereas DynamoDB is for small items (a single item is limited to 400 KB).

Further Reading:

[1] https://en.wikipedia.org/wiki/Block_suballocation

[2] AWS in Action. Chapter 6

[3] Quora — https://www.quora.com/What-is-the-difference-between-S3-and-Elastic-Block-Store-services-What-are-the-correct-use-cases-for-each-one
