System Design — Storage

Storage concepts and considerations in System Design

Larry | Peng Yang
Apr 15, 2020 · 14 min read
Photo by Joshua Sortino on Unsplash

Table of Contents

  1. Disk: RAID and Volume
  2. File Storage, Block Storage, and Object Storage
  3. Hadoop Distributed File System (HDFS)
  4. Storage comparisons
  5. Choose the right datastore
  6. Storage options in the Cloud

1. Disk — RAID and Volume

1.1 RAID

The standard RAID levels are a basic set of configurations that use striping, mirroring, or parity to build large, reliable data stores from multiple general-purpose hard disk drives (HDDs) or solid-state drives (SSDs). A RAID system consists of two or more drives working in parallel. The following figure shows the five main RAID levels.

RAID
  • RAID 0: striping. Data is split into blocks that are written across all the drives in the array. It offers no redundancy, so a single drive failure loses the array.
  • RAID 1: mirroring. At least two drives hold exactly the same data. If one drive fails, the others keep working.
  • RAID 10: mirroring combined with striping. It requires a minimum of four drives and combines the advantages of RAID 0 and RAID 1 in a single system. It provides redundancy by mirroring all data onto secondary drives, and it speeds up data transfers by striping across the mirrored sets. RAID 10 can therefore provide the speed of RAID 0 with the redundancy of RAID 1.
  • RAID 5: striping with parity. It requires at least three drives and stripes data across them like RAID 0, but it also distributes parity blocks across the drives. If a single drive fails, its data is reconstructed from the remaining data and the parity information stored on the other drives.
  • RAID 6: striping with double parity. RAID 6 is like RAID 5, but each stripe carries two independent parity blocks, distributed across the drives. It therefore requires at least four drives and can withstand two drives failing simultaneously.
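The parity idea behind RAID 5 can be illustrated with XOR. The following is a minimal sketch, not a real RAID implementation: it models a single stripe with three hypothetical data blocks and one parity block, and the helper name `xor_blocks` is an assumption made for illustration. A real array also rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe: three data blocks (one per drive) plus the parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second drive, then rebuild its block by XOR-ing
# the surviving data blocks with the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the survivors with the parity block yields exactly the missing block. RAID 6 needs a second, independent parity calculation to survive two failures, which this sketch does not cover.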

The following table is the comparison for different types of RAID.

RAID comparison
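The capacity and fault-tolerance trade-offs between the levels can also be worked out with simple arithmetic. The sketch below assumes identical drives and, for RAID 10, two-way mirrors; the function name `raid_summary` is hypothetical.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable capacity in TB, drive failures always survived)."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "10" and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")

    if level == "0":
        return drives * drive_tb, 0        # all space usable, no redundancy
    if level == "1":
        return drive_tb, drives - 1        # every drive holds a full copy
    if level == "5":
        return (drives - 1) * drive_tb, 1  # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_tb, 2  # two drives' worth of parity
    return drives // 2 * drive_tb, 1       # RAID 10: half the space is mirrors
```

For example, `raid_summary("5", 4, 2)` describes four 2 TB drives in RAID 5: 6 TB usable, with one drive failure tolerated. RAID 10 can survive more than one failure when the failed drives sit in different mirror pairs, but only one failure is guaranteed.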

1.2 Volume

Types of Volumes

Static Volume: A simple, easy-to-use volume that covers all available space on the disks and RAID array selected to create it. A static volume does not have a storage pool and therefore cannot support advanced storage features such as snapshots and Qtier (QNAP's auto-tiering feature).

Thin Volume: A thin volume must be created inside a storage pool. It allocates pool space only as data is written, so only the data actually stored in the volume counts against the pool, and free space in the volume takes up no pool space.

Thick Volume (Flexible): A thick volume allocates its total size at creation. No matter how much data it actually holds, its full size is always deducted from the pool. In return, that space is guaranteed to be available exclusively to this volume, even if other volumes use up all the remaining free space in the pool.
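The difference between thin and thick allocation comes down to when the pool is charged. Here is a minimal accounting sketch; the class names `StoragePool`, `ThickVolume`, and `ThinVolume` are hypothetical and do not correspond to any vendor API.

```python
class StoragePool:
    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.used_gb = 0

    def reserve(self, size_gb):
        if self.used_gb + size_gb > self.capacity_gb:
            raise RuntimeError("storage pool is full")
        self.used_gb += size_gb


class ThickVolume:
    def __init__(self, pool, size_gb):
        pool.reserve(size_gb)  # the full size is charged at creation
        self.size_gb = size_gb
        self.data_gb = 0

    def write(self, gb):
        if self.data_gb + gb > self.size_gb:
            raise RuntimeError("volume is full")
        self.data_gb += gb     # pool usage does not change


class ThinVolume:
    def __init__(self, pool, size_gb):
        self.pool = pool       # nothing is charged at creation
        self.size_gb = size_gb
        self.data_gb = 0

    def write(self, gb):
        if self.data_gb + gb > self.size_gb:
            raise RuntimeError("volume is full")
        self.pool.reserve(gb)  # pool space is charged only as data arrives
        self.data_gb += gb


pool = StoragePool(100)
thick = ThickVolume(pool, 40)  # pool.used_gb is now 40
thin = ThinVolume(pool, 40)    # pool.used_gb is still 40
thin.write(10)                 # pool.used_gb is now 50
```

Note that a thin volume's write can fail even when the volume itself has free space, because the pool may already be exhausted by other volumes. That is the guarantee a thick volume buys.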

2. File Storage, Block Storage, and Object Storage

Understanding the different types of storage is essential for choosing the right solution for your business. The main types in wide use today are File Storage, Block Storage, and Object Storage.

Different types of storage

2.1 File Storage

File Storage is the oldest and most widely used data storage system for direct-attached storage (DAS) and network-attached storage (NAS) systems.

File-based storage systems must scale out by adding more systems, rather than scale up by adding more capacity.

Summary: File storage is used for unstructured data and is commonly deployed in Network Attached Storage (NAS) systems. Linux clients typically access it over Network File System (NFS), and Windows clients over Server Message Block (SMB), of which Common Internet File System (CIFS) is an older dialect.

2.2 Block Storage

Block storage is often configured to decouple the data from the user’s environment and spread it across multiple environments that can better serve the data. And then, when data is requested, the underlying storage software reassembles the blocks of data from these environments and presents them back to the user.

Block Storage is usually deployed in a storage-area network (SAN) environment and must be tied to a functioning server.

The most common examples of Block Storage are SAN, iSCSI, and local disks.

Block storage is the most commonly used storage type for most applications. It can be either locally attached or network-attached, and it is typically formatted with a file system such as FAT32, NTFS, EXT3, or EXT4.

Summary: Data is stored in blocks of uniform size. Block storage is ideal for data that needs to be accessed and modified frequently because it provides low latency. However, it is expensive, complex, and less scalable than File Storage. It also has limited capability to handle metadata, so metadata has to be dealt with at the application or database level, which is one more thing for a developer or systems administrator to worry about.
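To make "blocks of uniform size" concrete, here is a minimal in-memory sketch of a block device. The `BlockDevice` class and the 512-byte block size are assumptions chosen for illustration; a real device is reached through the operating system's block layer.

```python
BLOCK_SIZE = 512  # bytes per block (an assumed, traditional sector size)

class BlockDevice:
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._data = bytearray(num_blocks * BLOCK_SIZE)

    def _offset(self, block_number):
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")
        return block_number * BLOCK_SIZE

    def write_block(self, block_number, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = self._offset(block_number)
        self._data[start:start + BLOCK_SIZE] = payload  # update in place

    def read_block(self, block_number):
        start = self._offset(block_number)
        return bytes(self._data[start:start + BLOCK_SIZE])


disk = BlockDevice(8)
disk.write_block(3, b"x" * BLOCK_SIZE)
```

The device knows nothing about files, names, or owners: a block is addressed only by its number. That is why any single block can be rewritten cheaply, and also why metadata has to live in a file system or application on top.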

2.3 Object Storage

Object storage volumes work as modular units: each is a self-contained repository that owns:
1. the data: images, videos, website backups
2. a unique identifier (UID) that allows the object to be found over a distributed system
3. the metadata that describes the data: authors of the file, permissions set on the files, date on which it was created. The metadata is entirely customizable

To retrieve the data, the storage operating system uses the metadata and identifiers, which distributes the load better and lets administrators apply policies that perform more robust searches.
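The three parts of an object (data, identifier, metadata) map naturally onto a small data structure. The sketch below is a hypothetical in-memory `ObjectStore`, not the API of any real service; it only shows how a flat namespace keyed by UID and searchable by metadata might look.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)

class ObjectStore:
    def __init__(self):
        self._objects = {}  # flat namespace: UID -> object

    def put(self, data, **metadata):
        uid = str(uuid.uuid4())  # unique identifier for the object
        self._objects[uid] = StoredObject(data, dict(metadata))
        return uid

    def get(self, uid):
        return self._objects[uid]

    def find(self, **criteria):
        """Return the UIDs of objects whose metadata matches every criterion."""
        return [
            uid for uid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
photo_id = store.put(b"fake image bytes", author="alice", content_type="image/jpeg")
backup_id = store.put(b"fake archive bytes", author="bob", content_type="application/zip")
```

There are no directories to traverse: `store.get(photo_id)` goes straight to the object, and `store.find(author="alice")` searches on the customizable metadata instead of a path.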

Object storage is accessed through a simple HTTP API, and client libraries exist for most languages. It is cost-efficient: you only pay for what you use. It also scales easily, which makes it a great choice for public cloud storage. It is well suited to static data, and its agility and flat structure mean it can scale to extremely large quantities of data. Objects carry enough information for an application to find the data quickly, and they are a good fit for unstructured data.

Object storage commonly uses erasure coding for data protection. Erasure coding is a type of algorithm that operates at the object level, spreading data and parity fragments across the nodes in a storage cluster. It provides a similar or better level of data redundancy with far less storage overhead than the three-way replication that is standard in HDFS (covered later).
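The overhead comparison is simple arithmetic. The sketch below assumes an erasure-coding layout of 6 data fragments plus 3 parity fragments, chosen purely as an example; the function names are hypothetical.

```python
def replication_profile(replicas):
    """Return (raw bytes stored per byte of data, copies that may be lost)."""
    return float(replicas), replicas - 1

def erasure_profile(data_fragments, parity_fragments):
    """Return (raw bytes stored per byte of data, fragments that may be lost)."""
    overhead = (data_fragments + parity_fragments) / data_fragments
    return overhead, parity_fragments


three_way = replication_profile(3)  # (3.0, 2)
six_plus_three = erasure_profile(6, 3)  # (1.5, 3)
```

Under these assumptions, three-way replication stores 3 bytes for every byte of data and survives the loss of 2 copies, while the 6+3 layout stores 1.5 bytes per byte and survives the loss of any 3 fragments. The price is extra computation to encode data and to rebuild lost fragments.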

Summary: Data is stored as objects with unique identifiers and metadata. This type of storage is generally less expensive, but objects can't be modified in place: you have to write the whole object at once. Object storage also doesn't work well with traditional databases, because writing objects is slow, and writing an app against an object storage API isn't as simple as using file storage.

3. Hadoop Distributed File System (HDFS)

HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance (HDFS requires Block Storage). The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.

HDFS
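As a rough illustration of how a file maps onto blocks and replicas, the sketch below computes the footprint of a single file. The defaults of 128 MB blocks and a replication factor of 3 are assumed here as commonly used HDFS settings, and the function name `hdfs_footprint` is hypothetical.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, size of the last block in MB, raw MB stored)."""
    if file_size_mb <= 0:
        raise ValueError("file size must be positive")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is full-sized except, possibly, the last one.
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    raw_storage_mb = file_size_mb * replication
    return num_blocks, last_block_mb, raw_storage_mb
```

Under these assumptions a 1000 MB file is split into 8 blocks, the last of which holds 104 MB, and it occupies 3000 MB of raw disk across the cluster.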

4. Storage Comparisons

4.1 SAN vs. NAS

NAS devices are accessible over a network using an Ethernet connection and file protocols like NFS (Network File System) or SMB/CIFS (Server Message Block/Common Internet File System). Often, they contain enterprise-grade NAS drives: hard drives built to withstand running all day, every day, and to provide better overall performance than their desktop counterparts.

Some small-business and most enterprise-grade NAS devices ship with RAID support. Typically, the more high-end the NAS system, the more RAID configuration options are available.

A SAN is block-based storage that uses a high-speed architecture to connect servers to their logical disk units, identified by logical unit numbers (LUNs). A LUN is a range of blocks provisioned from a pool of shared storage and presented to the server as a logical disk.

Both SAN and NAS are methods of managing storage centrally and sharing that storage with multiple hosts (servers). However, NAS is Ethernet-based, while SAN can use Ethernet and Fibre Channel. In addition, while SAN focuses on high performance and low latency, NAS focuses on ease of use, manageability, scalability, and lower total cost of ownership (TCO). Unlike SAN, NAS storage controllers partition the storage and then own the file system. Effectively this makes a NAS server look like a Windows or UNIX/Linux server to the server consuming the storage.

4.2 NAS vs. HDFS

  • HDFS distributes blocks across all the machines in a Hadoop cluster, whereas a NAS stores data on dedicated hardware.
  • HDFS is designed to work with the MapReduce framework, in which computation moves to the data instead of data moving to the computation. NAS is not suitable for MapReduce because it stores data separately from the computation.
  • HDFS runs on a cluster of commodity hardware, which is cost-effective, whereas a NAS is a high-end storage device that comes at a high cost.

4.3 Block Storage vs. Object Storage

5. Choose the right datastore

The general idea is to store metadata in a relational database or in a distributed NoSQL store such as Dynamo (key-value) or Cassandra (wide-column). Since many NoSQL data stores relax ACID guarantees in favor of scalability and performance, we need to implement the guarantees we rely on in the logic of our services if we choose NoSQL.

To store other contents such as photos, videos, texts, binaries, and messages, we have to choose the right storage based on our requirements.

Photo-sharing services like Instagram (this also applies to Twitter)

  • Store photos in distributed file storage like HDFS or in object storage like S3.
  • Store data about users, their uploaded photos, and the people they follow in an RDBMS. An RDBMS is difficult to scale, however, so we may instead do the following:
    1. Store this data in a distributed key-value store to enjoy the benefits offered by NoSQL. All the metadata related to photos can go into a table where the Key is the PhotoID and the Value is an object containing PhotoLocation, UserLocation, CreationTimestamp, etc.
    2. To store the relationships between users and photos and the list of people a user follows, use a wide-column datastore like Cassandra. For the UserPhoto table, the Key is the UserID and the Value is the list of PhotoIDs the user owns, stored in different columns. The UserFollow table follows a similar scheme.
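The two tables described above can be modeled with plain dictionaries to show the shape of the data. This is only an in-memory stand-in: the helper names, the `photo:` column prefix, and the sample values are hypothetical, and a real deployment would use a key-value store and a wide-column store.

```python
# Photo metadata table: Key = PhotoID, Value = metadata object.
photo_table = {}
# UserPhoto table: Key = UserID, Value = one column per photo the user owns.
user_photo_table = {}

def add_photo(photo_id, user_id, photo_location, user_location, created_at):
    photo_table[photo_id] = {
        "PhotoLocation": photo_location,
        "UserLocation": user_location,
        "CreationTimestamp": created_at,
    }
    row = user_photo_table.setdefault(user_id, {})
    row[f"photo:{photo_id}"] = photo_id  # a new column in the user's row

def photos_of(user_id):
    return list(user_photo_table.get(user_id, {}).values())


add_photo("p1", "u42", "photos/p1.jpg", "Tokyo", 1586908800)
add_photo("p2", "u42", "photos/p2.jpg", "Osaka", 1586995200)
```

Both lookups are single-key reads: `photo_table["p1"]` returns one photo's metadata, and `photos_of("u42")` returns every PhotoID in that user's row without a join.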

URL shortening service like TinyURL

  • We anticipate storing billions of rows and don't need relationships between objects, so a NoSQL store like DynamoDB (key-value), Cassandra (wide-column), or Riak (key-value) is a better choice.

File hosting service like Dropbox, Google Drive, OneDrive

  • The metadata database can be a relational database such as MySQL, or a NoSQL database service such as DynamoDB.
  • To store files, we can use Block storage in which files can be stored in small parts or chunks (say 4MB).
  • Object Storage is used by Dropbox to store files.
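Splitting a file into fixed-size chunks can be sketched in a few lines. The 4 MB chunk size follows the example above; tagging each chunk with a SHA-256 hash is an added assumption (a common way to identify and deduplicate chunks), and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB

def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Return a list of (sha256 hex digest, chunk bytes) pairs, in file order."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks

def reassemble(chunks):
    """Concatenate the chunks back into the original file contents."""
    return b"".join(chunk for _, chunk in chunks)


file_bytes = bytes(10 * 1024 * 1024)  # a 10 MB file of zero bytes
chunks = split_into_chunks(file_bytes)
```

The 10 MB file becomes two full 4 MB chunks and one 2 MB chunk. When a file changes, only the chunks whose hashes differ need to be uploaded again, which is the main appeal of chunked storage for a sync service.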

Instant messaging service like Facebook Messenger

  • To store messages, we need a database that supports a very high rate of small updates and can also fetch a range of records quickly. We cannot use an RDBMS like MySQL or a NoSQL store like MongoDB, because we cannot afford to read or write a database row every time a user receives or sends a message. A wide-column database like HBase meets these requirements easily: it can store multiple values against one key, in multiple columns.
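The access pattern described above (many small writes against one key, plus fast range reads) can be sketched with a sorted list per row. This is an in-memory illustration only, not HBase's API: the `MessageTable` class, the conversation ID used as the row key, and the integer timestamps are all assumptions.

```python
import bisect
from collections import defaultdict

class MessageTable:
    def __init__(self):
        # Row key -> cells kept sorted by timestamp: [(timestamp, text), ...]
        self._rows = defaultdict(list)

    def put(self, conversation_id, timestamp, text):
        bisect.insort(self._rows[conversation_id], (timestamp, text))

    def scan(self, conversation_id, start_ts, end_ts):
        """Return messages with start_ts <= timestamp < end_ts, oldest first."""
        cells = self._rows.get(conversation_id, [])
        lo = bisect.bisect_left(cells, (start_ts,))
        hi = bisect.bisect_left(cells, (end_ts,))
        return cells[lo:hi]


table = MessageTable()
table.put("alice:bob", 1, "hi")
table.put("alice:bob", 3, "lunch?")
table.put("alice:bob", 2, "hello")
recent = table.scan("alice:bob", 2, 4)
```

Because the cells for one key stay sorted, fetching a time range is a pair of binary searches followed by a contiguous slice, which mirrors why a wide-column layout suits chat history.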

Video sharing services like YouTube

  • Video metadata and user data: RDBMS
  • Thumbnails: Bigtable, as it combines multiple files into one block to store on the disk and is very efficient in reading a small amount of data.
  • Videos can be stored in a distributed file storage system like HDFS or GlusterFS.
  • Spotify uses object storage to store songs.

Real-time suggestion (auto-complete system) service

  • Use the Trie data structure. The storage can be an in-memory cache (Redis or Memcached), a database, or even a file.
  • Take a snapshot of the trie periodically and store it in a file. This will enable us to rebuild a trie if the server goes down.
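The two points above can be sketched together: a trie for prefix lookups, plus a snapshot written to a file so the trie can be rebuilt after a restart. JSON as the snapshot format, the `"$"` end-of-word marker, and the method names are assumptions made for this sketch; words that themselves contain `"$"` are not supported.

```python
import json
import os
import tempfile

class Trie:
    END = "$"  # marks that a complete word ends at this node

    def __init__(self, root=None):
        self.root = root if root is not None else {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self.END] = True

    def suggest(self, prefix):
        """Return all stored words that start with prefix, in sorted order."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        results = []

        def walk(current, path):
            for key in sorted(current):
                if key == self.END:
                    results.append(prefix + path)
                else:
                    walk(current[key], path + key)

        walk(node, "")
        return results

    def snapshot(self, path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.root, f)

    @classmethod
    def restore(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))


trie = Trie()
for word in ["car", "cat", "cart", "dog"]:
    trie.insert(word)

snapshot_path = os.path.join(tempfile.mkdtemp(), "trie.json")
trie.snapshot(snapshot_path)
rebuilt = Trie.restore(snapshot_path)  # what a restarted server would do
```

A production system would also rank suggestions, for example by query frequency, which this sketch leaves out.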

Web Crawler

  • Use an RDBMS to store the metadata associated with the pages.
  • Store the URLs of the frontier (the queue of pages still to be crawled) on disk.

Google Analytics (GA)-like system

  • Apache Kafka is used to build real-time streaming data pipelines that reliably move data between many independent systems or applications. It allows:
    1. Publishing and subscribing to streams of records
    2. Storing streams of records in a fault-tolerant, durable way
  • Apache Spark reads the ingested data directly from Kafka for stream processing and creates time-series Ignite RDDs (Resilient Distributed Datasets). Apache Ignite is a distributed, memory-centric database and caching platform that Apache Spark users rely on to achieve true in-memory performance.
  • Use Apache Cassandra (a wide-column NoSQL store modeled on Bigtable) to persist writes from Ignite. It has great write and read performance.

6. Storage options in the Cloud

File Storage

Block Storage

Object Storage

  • Azure Blob Storage: For users with large amounts of unstructured data to store in the cloud, Blob storage offers a cost-effective and scalable solution. Every blob is organized into a container with up to a 500 TB storage account capacity limit.
  • Google Cloud Storage buckets: Affordable object storage.


Larry | Peng Yang

Written by

Software Engineer in Tokyo. Aim to understand computer science very well. LinkedIn: https://www.linkedin.com/in/peng-larry-yang-9a794561/

Computer Science Fundamentals

Computer science fundamentals including system design, software development, web, security, database, OS, networking, etc
