Are you leaving performance on the table? — A deep dive on PCIe Gen 4 storage platform performance

Author: Allyn Malventano, Storage Technical Analyst

Intel Tech · 7 min read · Apr 9, 2021


As someone who has spent over a decade comparing the performance of competing SSDs, I’ve learned that my results are only as good as the instrument used to take those measurements. A modern computer system is far more complex than a multimeter or even an oscilloscope, but in this context it plays the same role. A multimeter strives to be as ‘ideal’ as possible, in that it should not influence the thing it is measuring. It’s not so simple when using a computer to evaluate pure SSD performance, but the principle remains: the instrument should influence the item under test as little as possible.

With the launch of the 11th Gen Intel® Core™ desktop processors (Rocket Lake-S), the time has come for me to re-evaluate my choice of storage testbed. I was previously limited in my choices for PCIe 4.0 support, but that is no longer the case.

Experiment design

While SSD testing is generally done with pre-fill and other prerequisites so as to bring the drive to a more long-term steady-state condition, there are other times when it is useful to evaluate products in a more ‘ideal’ state. One example is when confirming a product meets its datasheet specification. More apt for this exercise, the intent is to get consistent, high performance from the SSD while using it as a reference to highlight any deficiencies in the test systems themselves. Here’s the plan:

  • Intel: 11th Gen Intel® Core™ i9-11900K processor, ASUS ROG Maximus XIII Hero Z590 motherboard (BIOS 0704), (XMP Enabled — 3200, MCE Disabled)
  • AMD: AMD Ryzen 9 5950X processor, ASUS ROG Crosshair VIII Hero Wi-Fi X570 motherboard (BIOS 3401), (DOCP Enabled — 3200)
  • Memory: G.Skill TridentZ RGB DDR4-3200 F4-3200C14Q-32GTZR 32GB (4x8GB)
  • GPU: Nvidia GeForce RTX 3090 (Driver 461.72)
  • Cooler: Noctua NH-U14S
  • OS Version: Windows 10 20H2 19042.906
  • OS SSD: Intel® SSD 760p 512GB
  • Test SSD: Samsung 980 Pro 1TB (FW: 2B2QGXA7)
  • Tests performed week of 3/29/2021

CrystalDiskMark (CDM)

For the first test I will keep things as simple as possible, using one of the most common and easiest-to-run tests out there:

Since this test outputs results on widely ranging scales, I’ve normalized each result against the same result from the competing platform (AMD Ryzen 9 5950X results = 1.0x in this graph). Sequential performance is roughly equal here (within 1.5%), but we see some significant deltas in random performance. The results indicate that the 11th Gen Intel® Core™ i9-11900K system is faster at completing individual storage requests, starting with a 4% gain in random reads (on ~20K IOPS) and a 12% gain in random writes (on ~60K IOPS). NAND SSDs typically have higher random write performance, which tracks with those figures: test system latency has a greater impact as SSD performance increases, since the SSD’s own latency becomes a smaller portion of the total.
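The normalization used in this chart is straightforward to reproduce. As a minimal sketch (the numbers below are illustrative placeholders, not the measured results), dividing each platform's result by the competing platform's result puts mixed-scale tests on one axis:

```python
# Normalize benchmark results against a baseline platform so that tests
# reported on very different scales (MB/s vs. IOPS) can share one chart.
# All values here are placeholders for illustration, not measured data.

def normalize(results, baseline):
    """Return each result as a multiple of the baseline (baseline = 1.0x)."""
    return {test: value / baseline[test] for test, value in results.items()}

baseline = {"seq_read_MBps": 6900, "rnd_read_qd1_IOPS": 20000}
system_b = {"seq_read_MBps": 7000, "rnd_read_qd1_IOPS": 20800}

for test, ratio in sorted(normalize(system_b, baseline).items()):
    print(f"{test}: {ratio:.3f}x")
```

Anything above 1.0x is then a relative gain over the baseline platform, regardless of the test's native units.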

While QD1 (Queue Depth 1) is the most common case in ‘regular’ SSD usage, a storage-focused system also needs to push into power-user territory to evaluate the full spectrum of performance. The largest deltas here come from a single thread running at a higher queue depth (32 simultaneous requests), where the competing platform is leaving nearly 30% on the table. This peak single-threaded performance is more than just academic, since a core with a far higher peak will also have more overhead available for other tasks in more varied scenarios.

Iometer

Moving on to a more complex tool, we can turn more dials and expand a bit more on what we saw above:

This is a ‘QD Sweep’, showing how IOPS ramps up as more demand is placed on the SSD. Eventually this ramp hits a plateau, but these values are far below what this SSD can deliver, so the max limits here are caused by the test system’s single-threaded IO-handling capability and not the SSD. With this test we find the max delta occurs even earlier than previously indicated by CDM (at QD16).
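The shape of a QD sweep follows from Little's Law: sustained IOPS equals outstanding requests divided by mean completion latency, so throughput climbs with queue depth only while latency holds steady, and plateaus once the host's IO path (or the SSD) saturates. A small sketch of that relationship, with illustrative latency values rather than measured ones:

```python
# Little's Law applied to storage: IOPS = outstanding requests / mean latency.
# The 60-microsecond latency below is an illustrative figure, not a
# measurement from either test system.

def iops(queue_depth, mean_latency_s):
    """Throughput implied by Little's Law for a given queue depth."""
    return queue_depth / mean_latency_s

# While per-request latency holds steady, IOPS scales linearly with QD...
for qd in (1, 2, 4, 8):
    print(qd, round(iops(qd, 60e-6)))
# ...until the host's IO path or the SSD saturates, at which point latency
# grows with QD instead and the IOPS curve flattens into the plateau.
```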

For us to verify this SSD can meet its performance specification (and that these test systems can attain it), we will need to scale higher than a single thread:

Here I have built on the prior chart, adding additional threads and allowing the SSD to scale all the way up to its maximum performance rating. The expected behavior is a consistent climb followed by a transition from core saturation to SSD saturation (eventually the limit shifts to the SSD itself, at its rated IOPS). Once the SSD’s maximum saturation point has been reached, adding more threads is fruitless, and in some cases can even result in regressions. Looking at the results, the Intel platform maintains a lead across multiple thread counts and reaches this SSD’s specified performance maximum with fewer threads and a lower total QD than the competing platform. Another point relevant to ensuring the product meets its spec: even in this conservative test configuration (empty NTFS partition with an 8GB test file), the SSD in the competing system came close but never attained its rated maximum IOPS, regardless of how many cores were thrown at the problem.
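One way to find the saturation point described above is to walk the sweep and flag the first thread count where adding threads no longer buys a meaningful IOPS gain. A sketch using hypothetical sweep data (not the measured figures):

```python
# Find the thread count at which a sweep saturates: the first point where
# adding threads yields less than a 2% IOPS gain, or a regression.
# The sweep values below are hypothetical, for illustration only.

def saturation_point(sweep, threshold=0.02):
    """sweep: list of (threads, iops) pairs sorted by thread count."""
    for (t_prev, io_prev), (t_next, io_next) in zip(sweep, sweep[1:]):
        if (io_next - io_prev) / io_prev < threshold:
            return t_prev  # scaling has stalled; this thread count suffices
    return sweep[-1][0]

sweep = [(1, 520_000), (2, 790_000), (4, 990_000), (8, 1_000_000)]
print(saturation_point(sweep))  # → 4
```

A regression (negative delta) also trips the threshold, which matches the observation that piling on threads past saturation can actually hurt.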

I noted an oddity with the 8-thread run on the AMD Ryzen 9 5950X system, but another pass of the test showed the behavior to be repeatable:

Quality of Service

So far, we have looked at what I would consider low and medium levels of detail, but there are times where we need to dive even deeper into the data. Quality of Service (QoS) is a term commonly used when evaluating enterprise SSD performance, but it can also be a useful tool when narrowing down jitters or stutters in more typical client use cases. These stutters are hard to pin down with tests that only show an average result, especially if there are only a few transfers taking significantly longer than the rest. QoS helps translate performance variability into something more visually quantifiable, and we have a few ways to look at this data. Let us ease into it with a coarse level of detail:

Here we have stepped back to the most common client workload (QD1 random read), as that type of work is where latency jitter would impact real-world tasks. For this chart, lower latencies are better, but we also care about consistency, which is expressed across the varying “9’s”. Moving from left to right, we separate out a smaller and smaller proportion of the longest completion times and note the increasing latency. An ideal result here is a horizontal line, but no SSD is perfect. That said, the SSD is a constant in this exercise, so the variances we see highlight inefficiencies in the less performant test system. Let us zoom in a bit further:
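Computing the “9’s” from raw data is a simple percentile exercise: sort all per-request completion times and read off the values at the 99%, 99.9%, and 99.99% marks. A sketch using synthetic latencies (a real run would use completion timestamps captured by the IO tool):

```python
import random

# Report the "9's" latency percentiles from a set of per-request
# completion times. The samples below are synthetic (mostly ~60us reads
# with an exponential tail), purely for illustration.

def nines(latencies, levels=(0.99, 0.999, 0.9999)):
    """Map each percentile level to the latency at (or just above) it."""
    data = sorted(latencies)
    out = {}
    for p in levels:
        idx = min(len(data) - 1, int(p * len(data)))
        out[p] = data[idx]
    return out

random.seed(0)
samples = [60e-6 + random.expovariate(1 / 5e-6) for _ in range(100_000)]
for p, lat in nines(samples).items():
    print(f"{p:.4%}: {lat * 1e6:.1f} us")
```

The flatter this mapping stays as the levels add nines, the closer the system is to that ideal horizontal line.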

Above is a higher-resolution view, showing ‘between the 9’s’. Note that the axes have been flipped here, so an ideal result would now be a line as far to the left (faster) and as vertical (consistent) as possible. The two systems do trade blows a bit along the way, but deltas that close to each other would not be my primary focus when evaluating SSD consistency. When reviewing one of these charts, I look for an excessively ‘long tail’ (along the top of the chart) and the height at which that tail begins (a higher starting point means a greater percentage of requests took longer to complete). Large areas between the curves typically indicate a glaring issue. Given that we are comparing test systems rather than SSDs here, we find that one system produces a far less consistent result; in this case the latency deltas exceed 3.5x. Such a long tail coming from the test system itself would very likely mask any real SSD issues this test is meant to identify.
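The ‘between the 9’s’ view is just the empirical CDF of the same latency samples, with the vertical axis rescaled so each decade of nines gets equal space (y = −log10(1 − p)). A sketch of that transform, with hypothetical sample values:

```python
import math

# Rescale an empirical latency CDF so each "nine" (90%, 99%, 99.9%, ...)
# occupies equal vertical space: y = -log10(1 - p). The input latencies
# are hypothetical; a real chart would use measured completion times.

def nines_axis(latencies):
    """Return (latency, nines) points suitable for a between-the-9's plot."""
    data = sorted(latencies)
    n = len(data)
    points = []
    for i, lat in enumerate(data[:-1]):  # drop the final point (p = 1.0)
        p = (i + 1) / n
        points.append((lat, -math.log10(1.0 - p)))
    return points

curve = nines_axis([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# y = 1.0 corresponds to the 90th percentile, 2.0 to the 99th, and so on.
print(curve[-1])
```

On this axis, the ‘long tail’ shows up as points drifting far to the right near the top of the plot, which is exactly what makes it easy to spot.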

Wrapping up, the 11th Gen Intel® Core™ i9-11900K system realized clear advantages in both storage performance and consistency across storage tests of varying complexity and levels of detail. That makes it the default option for my own SSD performance testing, and the obvious choice for those seeking the best possible storage performance from their own system.

Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Performance results are based on testing as of March 29th, 2021 and may not reflect all publicly available updates. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
