Memory and Storage Hierarchy gaps happen; fixing them delivers performance you can feel

Frank Hady, Intel Fellow
Performance at Intel
5 min read · Oct 1, 2019

Remember what it felt like when you got your first solid state drive (SSD)? I do. Back in 2008, my laptop with a hard disk drive (HDD) felt unresponsive. I was tired of the way it lagged and nearly ground to a halt whenever the virus scanner started accessing the HDD. Then one day I installed an Intel® X25-M SSD. Instantly my laptop felt peppy and responsive. Gone was the lag. I didn’t even care (or notice) when the virus scanner ran.

Looking back, I was unhappy with my system because it had a big gap in the memory and storage hierarchy. The difference in latency between the DRAM and the HDD was humongous: ~80 nanoseconds versus ~3 milliseconds! I was wasting CPU cycles waiting for data. Then along came my ~50 µs NAND SSD to save the day. My immediate memory and storage hierarchy problems were fixed, and I felt it.

Fast forward to today, and we find ourselves again in need of a new technology in the hierarchy. NAND SSDs are great, but their latency is actually slightly longer than when they first arrived — a compromise to allow them to hold more data. And CPUs have continued to increase in performance. So, data on the NAND SSD is further away from the CPU than it used to be. This time, this is most acutely felt in the data center where data set sizes are increasing at an astounding rate of 2x every 3 years.

Last week I had the pleasure of presenting, along with a number of experts from Intel, on just this topic at an Intel-hosted memory and storage event in Seoul, South Korea, for global influencers. We showed why new gaps are appearing in the memory and storage hierarchy and how Intel® Optane™ technology and Intel® 3D NAND technology are filling these gaps to make the memory and storage hierarchy complete. Today, I’m focusing on the hierarchy and where we see performance and capacity gaps emerging as time passes.

The picture above shows the memory and storage hierarchy. Picture the CPU on top, with each layer of the hierarchy holding progressively “hotter” data (from bottom to top), making that data rapidly accessible to the processor. Following the 90/10 rule (10% of the data is accessed 90% of the time), we’d expect each layer to be 10x the capacity of the layer above, but only one tenth the performance. The system moves more frequently accessed data up the hierarchy and less frequently accessed data down.
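To make the effect of a missing layer concrete, here is a small back-of-envelope sketch. The numbers and the flat 90% per-layer hit rate are my own illustrative assumptions, not measurements: if each layer catches 90% of the accesses that reach it, expected latency is a hit-rate-weighted sum, and inserting a ~50 µs SSD between ~80 ns DRAM and a ~3 ms HDD cuts that expectation by roughly an order of magnitude.

```python
# Illustrative sketch (not Intel data): expected access latency of a
# memory/storage hierarchy under the 90/10 rule. Each layer serves 90%
# of the accesses that reach it; misses fall through to the next layer.
def effective_latency(layer_latencies_ns, hit_rate=0.9):
    """Expected latency when each layer catches `hit_rate` of remaining accesses."""
    expected = 0.0
    reach_prob = 1.0  # probability an access falls through to this layer
    for latency in layer_latencies_ns[:-1]:
        expected += reach_prob * hit_rate * latency
        reach_prob *= 1 - hit_rate
    expected += reach_prob * layer_latencies_ns[-1]  # last layer catches everything
    return expected

# Hypothetical 2008-style hierarchy: ~80 ns DRAM straight to a ~3 ms HDD.
print(effective_latency([80, 3_000_000]))        # HDD misses dominate
# Same hierarchy with a ~50 us NAND SSD filling the gap.
print(effective_latency([80, 50_000, 3_000_000]))
```

Even though the SSD layer only catches accesses that miss DRAM, it absorbs most of what would otherwise hit the HDD, which is why the upgrade was so easy to feel.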

So why the gaps in the picture? These gaps result from the mismatch between the growing need to compute on ever-larger amounts of data and the trends in the underlying memory technologies. Remember that data is increasing at 2x every 3 years — an incredible pace. But the capacity per die of DRAM is increasing at only 2x every 4 years. This mismatch means we can’t store as much data in DRAM, close to the processor in terms of latency, as we’d like. Therefore, we have a gap just under DRAM — a memory capacity gap.
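The widening of that gap follows directly from the two doubling rates above. A quick sketch of the arithmetic (the 12-year window is just a convenient example):

```python
# Growth rates from the article: data doubles every 3 years,
# DRAM capacity per die doubles every 4 years.
def growth(doubling_period_years, years):
    """Multiplicative growth after `years` at the given doubling period."""
    return 2 ** (years / doubling_period_years)

years = 12
data_growth = growth(3, years)   # 2^4 = 16x more data
dram_growth = growth(4, years)   # 2^3 = 8x more DRAM per die
print(data_growth / dram_growth) # the capacity gap widens 2x over 12 years
```

Because both curves are exponential, the ratio between them is exponential too: the gap doesn’t just persist, it compounds.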

Graph 1: Read Latency over Time

Source: Intel meta-analysis based on multiple ISSCC, IEDM, IMW papers

Moreover, we find that NAND, while it is increasing its capacity at a fast-enough rate (2x every 2 years), has a relatively constant latency over time. So as the CPU gets faster, data in NAND SSDs appears further away. This makes another gap between DRAM and NAND SSDs, a storage performance gap.

Graph 2: Write Bandwidth over Time

Source: Intel meta-analysis based on multiple ISSCC, IEDM, IMW papers

To further complicate matters, we find that memory technologies tend to increase in capacity more than they increase in throughput. This means that bandwidth per capacity decreases over time, increasing the time required to access big data sets at any particular layer. This reinforces the gaps.
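As a rough illustration of that effect, consider the time it takes to sweep an entire device, which is capacity divided by bandwidth. The starting point and the bandwidth doubling period below are assumptions for the sketch, not figures from the article; only the 2x-every-2-years capacity rate comes from the text.

```python
# Illustrative only: when capacity grows faster than bandwidth, the time
# to read a full device grows. Starting point (1 TB at 2 GB/s) and the
# 4-year bandwidth doubling period are assumptions for this sketch.
def scan_time_s(capacity_gb, bandwidth_gbps):
    """Seconds to read the whole device once at full bandwidth."""
    return capacity_gb / bandwidth_gbps

cap, bw = 1000.0, 2.0              # hypothetical device: 1 TB at 2 GB/s
for year in (0, 4, 8):
    c = cap * 2 ** (year / 2)      # capacity doubles every 2 years (article)
    b = bw * 2 ** (year / 4)       # bandwidth doubles every 4 years (assumed)
    print(year, scan_time_s(c, b)) # full-sweep time doubles every 4 years
```

Under these assumptions the full-device scan time doubles every four years, which is exactly the “bandwidth per capacity decreases over time” trend the graphs show.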

Filling these two gaps requires a new technology, one with a higher capacity at lower cost than DRAM, but it does not need to have DRAM levels of performance (latency and throughput). Since this new memory must also appear as storage, it also needs to be persistent across power cycles. This new memory, of course, is Intel® 3D XPoint™ Media placed into the system in both Intel® Optane™ SSDs and Intel® Optane™ DC Persistent Memory.

Intel Optane SSDs have latencies of ~10 µs, roughly one tenth that of a NAND SSD. Better yet, they return data quickly far more consistently than NAND SSDs do. This means Intel Optane SSDs fill part of the gap described, bringing more storage closer to the CPU.

Intel Optane DC Persistent Memory makes Intel Optane media accessible directly through load-store instructions, which don’t require OS intervention. It also allows single-cache-line accesses. Data can be accessed in hundreds of nanoseconds (~100 ns to ~340 ns, depending on whether the access hits the DRAM cache). So, as persistent memory, Intel Optane media fills the other gap in the hierarchy.
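To illustrate what the load-store programming model means in practice, here is a minimal Python sketch. Real persistent memory is mapped into the address space (e.g., via filesystem DAX) and made durable with CPU cache-flush instructions; the plain file mmap and flush below merely stand in for that model, so the file path and sizes are arbitrary for the example.

```python
import mmap
import os
import tempfile

# Sketch of the load/store programming model only: once the region is
# mapped, updates are plain memory stores with no read()/write() syscall
# per access, and an explicit flush makes them durable.
path = os.path.join(tempfile.mkdtemp(), "pmem.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)          # one page standing in for persistent memory

with open(path, "r+b") as f:
    view = mmap.mmap(f.fileno(), 4096)
    view[0:5] = b"hello"             # a store directly into the mapped region
    view.flush()                     # make the update durable
    view.close()

with open(path, "rb") as f:
    print(f.read(5))                 # the stored bytes survived the mapping
```

The key contrast with a block SSD is granularity and path length: here a five-byte update touches five bytes, while a storage stack would round the same update up to at least a full block and route it through the OS.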

We also showed that NAND, increasing its capacity at 2x every 2 years, is able to store some of the data currently stored in HDDs, moving that data closer to the processor. Intel 3D NAND is leading this capacity per silicon area charge.

What does this all mean? It means the data center that is currently struggling with too much data too far from the CPU, just like we struggled in 2008 on our laptop, is catching a big break. Our advances in memory technologies and the system technologies needed to make them useful are delivering a game changer a lot like the first SSDs. At Intel, we’re working to complete the hierarchy to help make data centers feel peppy again.

© Intel Corporation. Intel, the Intel logo and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
