Storage for a database includes the hardware, disk, and file layout. Designing storage for a database project can be exhaustive job as a lot of factors need to be taken into account.
The most important aspects to be considered before desigining storage for a database are -
•Volume of data — Initial volume of data, and the rate of growth of data
• Data Retention Policy — Amount of historic data that needs to be maintained
•Number of Concurrent Users — How many users will try to read/write data at the same time.
•Workload — Refers to the number of user requests and the ability of a database system to handle these requests.
In order to measure the above aspects, we use the following metrics -
While designing storage for a database, the following aspects need to be considered –
- Latency — is the time taken by a database to respond to a single Read/Write request. It is also called Response Time.
2. Throughput — is the volume of data that the database is able to retrieve(read)/write in a certain time period. This is typically measured in Megabytes or Gigabytes per second.
3. IOPS — Input Output Operations per second — This is the number of I/O transactions that the database is able to handle per second.
4. Concurrency — The ability of a database system to handle multiple requests from multiple users at the same time.
Let’s see the considerations in each of the elements in Schematic 1.0.
We start with Hardware Server (See Schematic 1.0)
Hardware Server Configurations
There are 3 types of hardware server configurations. Let’s see each of them in detail -
- Direct Access Storage:
The storage devices are connected directly to the server via cables and controllers. 2 servers can also share the same storage device (Shared DAS). Servers connect to the Storage device via cables, host bus adapters (HBA) using SCSI, Fiber Channel, or SATA. Internal disks within the same server enclosure can also be considered DAS storage. e.g., old Pentium Processors, USB Drives.
- Network Attached Storage (NAS)
The storage devices are connected via a standard ethernet connection. These are dedicated file storage systems. e.g. Shared Drives in Organizations that let users copy and paste files.
- Storage Area Network (SAN)
The storage devices are connected to the server via Internet Protocol or via fiber cables and are regulated by switches (SAN Switches). Two or more storage devices can be accumulated together and lumped as a storage block/storage pool. This block/pool can be connected to one or more servers. Unlike the DAS and NAS storage, the SAN provides multiple paths (more than 2) to and from the storage device to account of failures.
High-level Network Diagram of a SAN:
Understanding cost implications
The hardware server configurations were compared and their cost implications analyzed. Schematic 1.2 shows the cost implications associated with hardware server configurations.
I suggest the Microsoft resource https://docs.microsoft.com/en-us/sharepoint/administration/storage-and-sql-server-capacity-planning-and-configuration for further reading.
Let’s move on to disk layouts (See Schematic 1.0)
Hard Disk Drives store data magnetically, optical disk drives store data optically and ‘Solid State Devices’ store data through ICs (Integrated Circuits).
Solid State Devices are the most sought-after storage mechanisms due to the following advantages over HDDs and ODDs –
1.They have the highest throughput and IOPS compared to the other 2 disk drives.
2.They do not require mechanical components to read data unlike HDDs.
3.Read/Write operation is not affected by fragmentation.
4.Low Read/Write latency.
Latest designs involve a mix of HDDs and SDDs to create a tiered database architecture (RAID).
When multiple disks are combined to create a single logical disk (as we had seen in the case of SAN architecture), it is called RAID — Redundant Array of Independent Disks.
There 6 standard ways of configuring RAIDS. They are RAID 0, RAID 1 ..RAID 10. The schematic below explains each type of RAID -
Of the above, RAID 5 and RAID 10 are the recommended disk drive configurations for SQL Server. RAID 5 offers 2/3rd of the total disk capacity as usable space. RAID 10 offers 1/2 of the total disk capacity as usable space but offers better fault tolerance and R/W performance than RAID 5.
Concept of Striping
Striping refers to the act of distributed data across two or more disks at the bit level/byte level or block level.
Storage Allocation Approaches for SAN
•Approach 1 — Take a set of dedicated disks from the storage pool and allocate it for a single server for exclusive use. This ensures that every server can have some amount of space for its workload (i/o operations). This also reduces latency from mixed workloads. The disadvantage is that it defeats the purpose of an SAN which is shared space.
•Approach 2 — Make logical disk partitions in one physical disk location and allocate the logical partitions to separate servers. This also guarantees every server, space for its minimum workloads; and also ensures that when a sudden i/o request comes in, the respective server is able to make use of the shared space.
Let’s move on to File layouts (See Schematic 1.0)
A common file layout suggested by Microsoft for SQL Server is -
•Files are the physical structures that store information on disks.
•A filegroup is a logical construct that groups files based on their categories. e.g., Data files are stored in the primary file group.
The more advanced the layout of the database, the faster its performance. Starting with a well-laid out database design document always helps in building optimal storage for databases.