Cloud Computing

As Networks Get Faster, High-Speed Storage Becomes More Important Than Ever

Part 1 of Intel’s Cloud Service Providers Solutions Series

Intel Tech


Author(s): Jacek Wysoczynski, Intel Senior Manager of Product Planning and Andrew Ruffin, Intel Strategy & Business Development Manager

Intel Photo of Blue lit Data Center

The demands on Cloud Service Providers (CSPs) are increasing rapidly. They need to provide more storage and more compute power, all while keeping network performance up and reducing downtime. As CSPs upgrade their networks to 100 Gigabit Ethernet (GbE), protecting incoming data as it is written to bulk storage becomes more important than ever.

Currently, CSPs typically use nonvolatile DIMMs (NVDIMMs) or over-provisioned NAND solid state drives (SSDs) as write or power loss imminent (PLI) buffers. However, both of these hardware choices have limitations.

Intel Optane SSDs allow data center operators to gain the benefits of near-NVDIMM speed with the ease of upgrade and repairs of NAND SSDs, all in one system. And, as network speeds increase, Intel Optane SSDs improve the situation further, by helping to reduce overall operating costs and simplifying data center maintenance.

This blog is part of a two-part series about improving data center operations with Intel products. In this first part, we review the trade-offs data center operators have had to deal with in the past; in part 2, we will look at the Intel Optane SSD options that address these challenges.

NVDIMMs are a great technology, combining the speed of dynamic RAM (DRAM) with the non-volatile data protection of NAND flash. Using NVDIMMs, a server that loses power can save its in-flight data to the NAND flash on the NVDIMM, protecting that data from loss. Once power is restored to the server, the protected data can then be transferred to bulk storage. That said, which technologies should a data center actually use? Here are five questions to ask when choosing memory and storage solutions for a data center:

1. Is NVDIMM capacity inadequate for rising throughput?

The amount of data coming through network connections is increasing. 100 GbE network connections are becoming more and more common in data centers, and faster connections could arrive soon. As the amount of information coming into a server increases, the current capacity of NVDIMMs will no longer meet write buffer sizing needs. Adding NVDIMM capacity to a server is limited by multiple constraints, including the capacity of each memory module, the number of available NVDIMM slots in the server, and the cost of the NVDIMMs. For example, a 32GB NVDIMM can buffer only about 2.5 seconds of data streaming in over a 100GbE connection.
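The sizing arithmetic above is simple enough to sketch. A minimal, illustrative calculation (assuming the link runs at full line rate and all traffic lands in the buffer):

```python
# Hedged sketch: how long a PLI write buffer can absorb a saturated
# network link. The link speed and buffer size are illustrative.

GBE_LINK_GBPS = 100                  # 100 GbE link, gigabits per second
LINK_GB_PER_SEC = GBE_LINK_GBPS / 8  # = 12.5 gigabytes per second

def buffer_seconds(buffer_gb: float,
                   link_gb_per_sec: float = LINK_GB_PER_SEC) -> float:
    """Seconds of line-rate traffic a buffer of buffer_gb can absorb."""
    return buffer_gb / link_gb_per_sec

print(buffer_seconds(32))   # 32 GB buffer at 100 GbE -> 2.56 seconds
```

The same function shows why capacity pressure grows with link speed: doubling the link to 200 GbE halves the time the same 32GB buffer can cover.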

This data in the NVDIMM then needs to be de-staged to bulk storage. If the server's bulk storage cannot keep up with the NVDIMMs' de-staging, the system must reject new write operations. Those write requests must then be retried, which can cause network congestion, as new data keeps arriving while the earlier data is still being processed.
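The backpressure effect can be illustrated with a toy simulation (not any vendor's firmware; all rates and the one-second time step are assumptions): a fixed-size buffer fills at the network rate and drains at the bulk-storage rate, rejecting whatever does not fit.

```python
# Illustrative sketch of write-buffer backpressure. Rates in GB/s and
# the buffer size are made-up example figures.

def simulate(capacity_gb, inflow_gb_per_s, drain_gb_per_s, seconds):
    """Return total GB rejected when the buffer fills faster than it drains."""
    level, rejected = 0.0, 0.0
    for _ in range(seconds):
        level = max(0.0, level - drain_gb_per_s)  # de-stage to bulk storage
        free = capacity_gb - level
        accepted = min(inflow_gb_per_s, free)     # buffer takes what fits
        rejected += inflow_gb_per_s - accepted    # the rest must be retried
        level += accepted
    return rejected

# 32 GB buffer, 12.5 GB/s in (100 GbE at line rate), 8 GB/s de-staged, 60 s:
print(simulate(32, 12.5, 8, 60))   # -> 246.0 GB of writes rejected
```

Once the buffer saturates, the rejection rate settles at the difference between inflow and drain rate, which is exactly the congestion the paragraph above describes.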

Intel Photo of Data Center Network Cabling

2. Do NVDIMMs increase system downtime for upgrades and repairs?

NVDIMMs are much more difficult to replace in servers than devices such as SSDs. SSDs are hot-swappable; that is, they can be added to or removed from a computer or server without shutting it down. When attached to servers, SSDs are typically also front-serviceable: the bay holding the SSD can be accessed while the server is still in the rack. NVDIMMs, on the other hand, are neither hot-swappable nor front-serviceable. Adding new or replacing old NVDIMMs requires servers to be shut down and removed from the rack, increasing the time that servers are offline and unavailable for use. This added downtime and complexity for upgrades and repairs adds to the overall cost of maintaining data centers.

Intel Photo of Data Center Rack Review

3. Are NVDIMMs less cost effective than SSDs?

In addition to the higher cost of servicing NVDIMMs vs. SSDs, NVDIMMs themselves are more expensive than SSDs in terms of price per GB². An NVDIMM combines DRAM with a battery or supercapacitor to protect data in the event of a power loss, which increases the price further. And while a DIMM-to-SSD price comparison is not apples to apples, for data centers trying to control costs, using NVDIMMs for all PLI buffering increases the overall price of buying and maintaining servers.

4. What are the limitations associated with NAND SSDs for Cloud Storage?

SSDs have been in use for a long time. Data center NAND SSDs can act as an ideal write buffer between DRAM and bulk storage: they are typically much cheaper than NVDIMMs, and they are easier to upgrade and replace. Much like NVDIMMs, however, they have drawbacks when used in data centers.

The latency of NAND SSDs is much higher than that of RAM and NVDIMMs. Latency is the delay between when an action is requested and when it is completed, so lower latency is better when processing data. RAM, including NVDIMM, has significantly lower latency than SSDs: RAM latency is typically measured in nanoseconds, while SSD latency is measured in microseconds. The latency of the fastest NAND SSDs is around 1,000 times greater than that of RAM. When dealing with large amounts of incoming information, as in a data center, this high latency can prevent faster network links from being fully utilized.

SSDs, like most hardware, have a limited lifespan before repairs and/or replacements are necessary. With SSDs, the lifespan is determined by the number of writes each data block can handle before data can no longer be safely stored there. Endurance is commonly rated in Drive Writes Per Day (DWPD): the number of times the entire SSD can be rewritten each day over the rated lifespan of the drive. A NAND SSD might, for example, be rated for three DWPD over the course of a five-year period. In a busy data center, if daily writes exceed that rating, the SSD will need to be replaced sooner than its five-year lifespan.
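A back-of-envelope endurance estimate follows directly from the DWPD rating. In this sketch the drive size, rating, and workload are illustrative assumptions, not a specific product's specification:

```python
# Hedged DWPD lifespan estimate. A DWPD rating gives a daily write
# budget (capacity x DWPD); writing more than that budget shortens the
# expected lifespan proportionally.

def expected_lifetime_years(capacity_gb: float, rated_dwpd: float,
                            rated_years: float,
                            actual_gb_written_per_day: float) -> float:
    """Scale the rated lifespan by how hard the drive is actually written."""
    rated_gb_per_day = capacity_gb * rated_dwpd
    return rated_years * rated_gb_per_day / actual_gb_written_per_day

# Example: a 1,000 GB drive rated 3 DWPD for 5 years has a 3,000 GB/day
# budget. Writing 6,000 GB/day uses it up twice as fast:
print(expected_lifetime_years(1000, 3, 5, 6000))   # -> 2.5 years
```

This is a linear approximation; real endurance also depends on write amplification, which the overprovisioning discussion below touches on.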

RAM and NVDIMMs have a much longer lifespan in comparison, as data can be written and rewritten without significantly degrading their function. So, while RAM is most often replaced due to a need for more speed or because of hardware failure, SSDs may be switched out because data can no longer be written to them. These SSDs must then be replaced to keep the server operating.

5. How does NAND SSD Data Write Protection and Overprovisioning work?

NAND SSDs are divided into blocks and pages. Each page is typically 4KB, and each block typically contains 128 pages. Data is written one page at a time, but a page that already holds data cannot simply be overwritten; it must first be erased, and NAND can only erase entire blocks, not individual pages. Instead, updated data is written to empty pages, and the original pages are marked as invalid. Later, when space is needed, the remaining valid pages of a block are copied to a new block and the old block is erased.

This process, known as garbage collection, is one of the reasons NAND SSDs show higher latency and less consistent performance than RAM, as data can be moved multiple times within an SSD to accommodate writes and rewrites. More data must be moved to allow blocks to be erased and reused, and the SSD must reserve additional space just for these extra write operations. This reserve is commonly referred to as overprovisioning, and all NAND SSDs dedicate about 7% of their total storage to it¹. Users can increase overprovisioning, setting aside more of the SSD for write operations. Doing so increases the SSD's write speeds, since more space is available for incoming data, and it can also extend the SSD's lifespan by improving the effective DWPD, though at the cost of usable drive space.
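The page-and-block bookkeeping behind garbage collection can be sketched with a toy model. This is purely illustrative (4 pages per block instead of 128, and real flash translation layers are far more complex):

```python
# Toy model of NAND garbage collection: valid pages are copied to a
# fresh block, invalidated pages are dropped, and the old block is
# erased as a whole. Sizes here are illustrative.

class Block:
    def __init__(self, pages_per_block: int = 4):
        self.pages = [None] * pages_per_block   # None = erased/empty page
        self.valid = [False] * pages_per_block

def garbage_collect(old: Block) -> Block:
    """Copy only valid pages into a new block; erase the old block."""
    new = Block(len(old.pages))
    dst = 0
    for i, data in enumerate(old.pages):
        if old.valid[i]:                        # skip invalidated pages
            new.pages[dst] = data
            new.valid[dst] = True
            dst += 1
    old.pages = [None] * len(old.pages)         # whole-block erase
    old.valid = [False] * len(old.valid)
    return new

blk = Block()
blk.pages = ["a", "b_old", "c", "b_new"]        # "b" was rewritten, so its
blk.valid = [True, False, True, True]           # old copy is marked invalid
fresh = garbage_collect(blk)
print(fresh.pages)   # -> ['a', 'c', 'b_new', None]
```

Note that reclaiming the single invalid page required copying three valid pages — that extra movement is the write amplification the text describes.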

When more of a NAND SSD is set aside for overprovisioning, less space remains available for usable storage. For example, a 500GB SSD configured with 50% overprovisioning offers only 250GB of usable storage. If a system needs at least 500GB of storage, it would then require either two SSDs configured this way or one 1TB SSD. And while overprovisioned SSDs are faster than non-overprovisioned ones, they are still not as fast as NVDIMMs, and the lost storage space increases costs.
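The capacity trade-off reduces to one line of arithmetic. A minimal sketch (the percentages are the illustrative figures from the text, not recommendations):

```python
# Usable capacity left after reserving a fraction of a NAND SSD for
# overprovisioning. Raw size and OP fractions are example figures.

def usable_gb(raw_gb: float, op_fraction: float) -> float:
    """Capacity available for user data after overprovisioning."""
    return raw_gb * (1 - op_fraction)

print(usable_gb(500, 0.50))   # 500 GB drive, 50% OP -> 250.0 GB usable
print(usable_gb(500, 0.07))   # ~7% factory OP -> about 465 GB usable
```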

In the next part of this two-part Cloud Storage solution series, we will review Intel’s solutions for the Data Center and how these solutions address the issues raised in this blog.

Notices and Disclaimers

Source(s):

¹ Seagate, "Over Provisioning and Its Benefits," Seagate.com

² Marc Staimer, "NVDIMM vs. memory channel flash storage," Dragon Slayer Consulting

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
