Shared Storage Part 2 — Block Storage

Craig Yamato
FermiHDI
Mar 31, 2023

As FermiHDI works with hyperscale data, the topic of block storage comes up a lot. While the concept of block storage is simple, its use can be complex and varied. In this short post, I hope to give you an idea of what block storage is and how it is commonly used. Perhaps in a future post, we can dive into the nuances of how block storage is implemented on different media, but that's a bit deep for this quick post.

We first need to talk about shared storage and why it exists, which can be summed up easily. Most files (whatever name you give them) are only a few hundred kilobytes to a few megabytes, with even the largest reaching only a few gigabytes. Storage devices, on the other hand, often start in the multi-gigabyte range and extend into terabytes. This means that even a small storage device has the capacity to store many files. More importantly, it means you do not need a separate storage device for each of the thousands of files required just to turn on your computer.

Now here is why that is important to know. Files change size, are added, and are removed. This applies not just to user files like music, video, or documents, but also to application and operating-system files. Think of your last OS update and the number of file changes it included. In such a dynamic environment, how do you keep track of where all those files are and keep the system efficient?

At first glance, you might think to keep an index mapping every byte on the drive to the file it belongs to and that byte's position within the file. But each entry in such an index needs three values, so the index would be at least three times as large as the drive itself:

Byte 1: File ID
Byte 2: Byte in file
Byte 3: Byte on drive used
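To make that overhead concrete, here is a tiny Python sketch of the naive per-byte index. The drive size, file ID, and file contents are invented purely for illustration:

```python
# Naive per-byte index: one entry for every byte stored on the drive.
# Each entry tracks three values (file ID, byte in file, byte on drive),
# so the index is at least three times as large as the data it describes.

byte_index = {}  # drive_byte -> (file_id, byte_in_file)

def store_file(file_id, data, start):
    """Place a file's bytes on the drive sequentially from `start`."""
    for offset in range(len(data)):
        byte_index[start + offset] = (file_id, offset)

# Store an 11-byte "file" starting at drive byte 0.
store_file(file_id=1, data=b"hello world", start=0)

# Three values are tracked per stored byte: the dict key (drive byte)
# plus the two tuple fields.
values_tracked = len(byte_index) * 3
```

Even for this 11-byte file, the index holds 33 values, which is the scaling problem the post describes.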

There are actually two common ways we handle this. The first is to leave enough room between files and hope no file grows larger than that space. In fact, several programs, such as databases, do this deliberately by making files larger than they really are to keep all their bytes together. This works until the storage device runs out of space. At that point, even if some files have been removed, any new file has to fit perfectly into the space left behind by a deleted one, adding further complexity.
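As a rough illustration of that preallocation trick, here is a Python sketch that reserves space in a file up front. The size and filename are arbitrary examples; note that `truncate` may create a sparse file on many filesystems, so real databases often write zeros or use OS-specific calls such as `fallocate` to ensure the space is genuinely reserved:

```python
import os
import tempfile

# Reserve space up front, as databases often do, so later writes land
# in space the file already claims rather than wherever is free.
PREALLOC_SIZE = 1024 * 1024  # reserve 1 MiB (illustrative value)

path = os.path.join(tempfile.mkdtemp(), "data.db")
with open(path, "wb") as f:
    f.truncate(PREALLOC_SIZE)  # extend the file to its reserved size

# The file reports the reserved size even before any data is written.
print(os.path.getsize(path))  # 1048576
```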

The other way is to artificially divide the storage device into equal-sized blocks, for example, dividing an 8 GB flash drive into 15,625,000 512-byte blocks. If we then think of a file as just a series of bytes, we can break the file into blocks of the same size. Any file block can now be stored in any storage block on the device, which means we only need to track 1/512 as many addresses in our index.
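Here is a minimal Python sketch of that block-based scheme. The file ID, the size of the free-block pool, and the first-fit allocation policy are all simplified for illustration:

```python
BLOCK_SIZE = 512

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a file's bytes into fixed-size blocks (last may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

# One index entry per *block* instead of per byte, so roughly
# 1/512 as many entries as the naive byte-level index.
block_index = {}  # (file_id, file_block_no) -> drive_block_no

# A small pool of free blocks for the demo; an 8 GB flash drive
# would have 15,625,000 of them.
free_blocks = list(range(1000))

def store_file(file_id, data):
    """Scatter a file's blocks into whichever blocks are free."""
    for block_no, _block in enumerate(split_into_blocks(data)):
        drive_block = free_blocks.pop(0)  # any free block will do
        block_index[(file_id, block_no)] = drive_block

store_file(1, bytes(2048))  # a 2 KB file -> 4 blocks, 4 index entries
```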

Equally important, the blocks do not have to be sequential; any open block at any location on the drive can be used to store an equally sized or smaller piece of a file. The drawbacks of this method are the storage capacity still consumed by the index and the much slower random access required to read and write the blocks of a file scattered across the drive. The latter is why storage devices are often marketed by sequential read and write megabytes per second but provisioned by IOPS (Input/Output Operations Per Second). It is also part of the underlying speed principle behind storage solutions like RAID.
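To see why IOPS matter, here is a quick back-of-the-envelope calculation. The IOPS figure is an assumed example, not a number from any particular device:

```python
# Random access to small blocks caps effective throughput at
# IOPS x block size, which is often far below a drive's sequential
# MB/s rating -- hence drives are provisioned by IOPS.

BLOCK_SIZE = 512        # bytes per I/O operation
RANDOM_IOPS = 100_000   # assumed random 512-byte operations per second

random_throughput = RANDOM_IOPS * BLOCK_SIZE  # bytes per second
print(random_throughput / 1_000_000)  # 51.2 (MB/s) from 100k IOPS
```

A drive that sustains 100,000 random 512-byte operations per second still only moves about 51 MB/s, far below typical sequential ratings.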

In the next few posts, I hope to cover how this concept is applied to file systems, object stores, and operating systems.
