Shared Storage Part 4 — Object Stores

Craig Yamato
FermiHDI
Mar 31, 2023

A rose by any other name… The general impression most people have of Object Stores is that they are inherently different from File Storage Systems and have nothing to do with them. I often hear people say that the main differences are that Object Stores hold objects, which can be anything, and that they can hold exponentially more data than a file system. While Object Stores are fundamentally different from File Stores, it’s not for those reasons.

First, let’s talk about size, which is easy. Think of Google Drive or Dropbox for an idea of how large File Stores can get. A local File Store on a single computer is normally a few Terabytes, but NAS systems often grow into the multi-Petabyte range. As for the idea that Object Stores can store more types of data than a File Store, in reality both can store anything. In fact, the not-so-secret secret of Object Stores is that they typically use the local File System on each server in the cluster.

In a past post, I talked about how file constructs are just a way of organizing data at rest. Take, for example, the ubiquitous comma-separated value (CSV) file format. CSV files are used to store records, like temperature readings or when employees clock in and out. The values of the fields in a record are separated by commas, and each record is a row ending in a newline character. With this in mind, what is the difference between storing a CSV as a “File” or as an “Object”? Or even as a file stored as an object?
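To make the question concrete, here is a minimal Python sketch. The records, the file name, and the dictionary standing in for an object store bucket are all hypothetical; the only point is that the same CSV bytes come back whether they sit under a path or under a key.

```python
import csv
import io
import os
import tempfile

# Hypothetical records: (timestamp, sensor, temperature in Celsius).
records = [
    ("2023-03-31T09:00", "sensor-1", "21.5"),
    ("2023-03-31T09:05", "sensor-1", "21.7"),
]


def to_csv_bytes(rows):
    """Serialize rows as CSV: comma-separated fields, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue().encode("utf-8")


payload = to_csv_bytes(records)

# Stored as a "File": the bytes live under a path in a file system.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readings.csv")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        from_file = f.read()

# Stored as an "Object": the same bytes live under a key.
# A plain dict is an assumed stand-in for a bucket, not a real client.
bucket = {}
bucket["readings.csv"] = payload
from_object = bucket["readings.csv"]

same_bytes = from_file == from_object
```

Either way, what is stored is the same sequence of bytes; only the addressing (path versus key) differs.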

So how are File Stores different from Object Stores? File Stores are mostly confined to a single server, such as a NAS, even when accessed across the network. NAS systems can still grow into the Petabyte range with lots of redundancy, but that redundancy is fragmented, with the compute (the actual NAS server software) separated from the storage.

For compute, redundancy is usually just a basic HA config with dynamic access to the storage, like any other service running in something like K8s. For storage, it is handled either at the file level, with replication (think rsync) to a different drive array or even another server, or at the block level, with something like RAID or SnapMirror. My post on File Stores is here if you have not read it already.

Object Stores are all about an abstraction layer at the network level. Like a Network File Server, an Object Store has a piece of software that fronts the connection with the client and acts as a gateway. Where the Network File Server acts as a gateway to a file system that is “local” to the NAS, the Object Store acts as a gateway to other servers. As part of this function, the Object Store gateway breaks the file up into a fixed number of shards. An encoding process, a form of Forward Error Correction, is run across the shards to produce additional parity shards, much like parity in RAID. This process allows the File (sorry, Object) to be rebuilt from a subset of the shards, giving it a high level of resiliency.
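The shard-plus-parity idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular Object Store implements it: it uses a single XOR parity shard (the RAID 5 style of parity), so it survives the loss of exactly one shard, whereas production systems generally use stronger codes such as Reed-Solomon that tolerate several losses. The function names and the sample payload are hypothetical.

```python
from typing import List, Optional


def split_into_shards(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length shards, zero-padding the tail."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    return [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(shards: List[bytes]) -> bytes:
    """XOR all shards together to produce one parity shard."""
    parity = bytes(len(shards[0]))
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return parity


def rebuild_missing(shards: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Recover the single missing shard (marked None) from the parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) != 1:
        raise ValueError("single XOR parity can rebuild exactly one shard")
    recovered = parity
    for shard in shards:
        if shard is not None:
            recovered = xor_bytes(recovered, shard)
    rebuilt = list(shards)
    rebuilt[missing[0]] = recovered
    return rebuilt


obj = b"timestamp,temp\n09:00,21.5\n09:05,21.7\n"
shards = split_into_shards(obj, k=4)
parity = parity_of(shards)

# Simulate losing the server that held shard 2.
damaged: List[Optional[bytes]] = list(shards)
damaged[2] = None

restored = b"".join(rebuild_missing(damaged, parity))[:len(obj)]
```

Here `restored` equals the original `obj` even though one of the four data shards was lost. Adding more parity shards with a stronger code is what lets real systems lose several servers at once.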

This Erasure Coding, with shards distributed across multiple servers, is the defining difference between Object Stores and File Stores. Ironically, each shard is actually a file stored in each server’s “local” File Store, with the Object Store keeping a list of the shards and where it stored them. With today’s stack of abstractions, you will in many cases find an Object Store running storage nodes in a cloud (public or private), each node attached to a virtual block device backed by an NFS File Store, which in turn is backed by a Storage Area Network (SAN) running some sort of RAID/JBOD type of array.
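As a rough illustration of "each shard is a file, and the Object Store keeps a list of where it put them", here is a toy Python sketch. Everything in it is an assumption made for illustration: the `TinyObjectStore` class, the round-robin placement, and the temporary directories standing in for the local file systems of three storage servers. It leaves out parity entirely to keep the bookkeeping visible.

```python
import os
import tempfile
from typing import Dict, List, Tuple


class TinyObjectStore:
    """Toy gateway: shards an object and writes each shard as a plain file."""

    def __init__(self, node_dirs: List[str], shard_count: int = 4):
        self.node_dirs = list(node_dirs)  # one directory per "server"
        self.shard_count = shard_count
        # key -> (original size, ordered list of shard file paths)
        self.manifest: Dict[str, Tuple[int, List[str]]] = {}

    def put(self, key: str, data: bytes) -> None:
        # Assumes a flat key with no path separators.
        shard_len = max(1, -(-len(data) // self.shard_count))
        paths = []
        for i in range(self.shard_count):
            chunk = data[i * shard_len:(i + 1) * shard_len]
            node = self.node_dirs[i % len(self.node_dirs)]  # round-robin
            path = os.path.join(node, f"{key}.shard{i}")
            with open(path, "wb") as f:
                f.write(chunk)
            paths.append(path)
        self.manifest[key] = (len(data), paths)

    def get(self, key: str) -> bytes:
        size, paths = self.manifest[key]
        parts = []
        for path in paths:
            with open(path, "rb") as f:
                parts.append(f.read())
        return b"".join(parts)[:size]


data = b"timestamp,temp\n09:00,21.5\n09:05,21.7\n"

with tempfile.TemporaryDirectory() as root:
    nodes = []
    for n in range(3):
        node_dir = os.path.join(root, f"node{n}")
        os.mkdir(node_dir)
        nodes.append(node_dir)

    store = TinyObjectStore(nodes)
    store.put("readings.csv", data)
    round_trip = store.get("readings.csv")
    files_on_node0 = sorted(os.listdir(nodes[0]))
```

Listing a node's directory shows ordinary files, which is the point: the "object" only exists as an entry in the manifest that says which files on which servers to stitch back together.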

So the next time you hear someone talking about how it’s a choice between Object Stores and File Stores, or how one is better than the other, know that it’s really a question of how many layers of abstraction are applied, and at what cost and benefit. If you have any questions or comments, please feel free to email me at any time.
