Analytics Benchmark for Dual Samsung 970 EVOs in RAID 0

Mitch Swan
3 min readFeb 3, 2019

--

In my relentless pursuit of an ideal platform for Analytics work I needed to create a workstation with very high data throughput. In my experience of real world Machine Learning disk speeds, IOPS and quality RAM are the critical bottlenecks in processing.

There are some pretty extreme options out there for RAID setups these days (Intel VROC) but I also required a compact solution I can take onto client sites. The hardware in this setup:

  • Intel NUC NUC8i7HNK with 8th gen Intel i7–8705G
  • 2 x Samsung 970 EVO 500Gb NVMe SSD
  • 32Gb DDR4 2400Mhz RAM
  • Ubuntu 18.10 Desktop

METHODOLOGY

Much of the advertising on SSDs focuses on the Continuous Read speed — this is a bit of a vanity metric but we will capture it with the GNOME Disk Utility built into Ubuntu. This was set at 50Mb samples

The more useful metric is the number of random Input/Ouptut operations per second (IOPS). The following simulates a database use case of of 75/25% Read/Write with a 4k sample size (imagine table / index operations). To do this, I have used the FIO tool.

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

BENCHMARK FOR SINGLE SSD

The first set of tests were a baseline for single SSD performance with no RAID.

BENCHMARK FOR DUAL SSD IN SOFTWARE RAID0

So after a bit of an odyssey getting Ubuntu to play nice with RAID. I was not able to set up a bootable drive in Hardware RAID0 (it can be done for Win10 but was not suitable for a docker based analytics machine).

The good news is that software RAID with Ubuntu actually outperforms hardware in many cases.

To install you need to use the Server version of Ubuntu with the following config:

https://help.ubuntu.com/lts/serverguide/advanced-installation.html.en

After that to get the desktop run the following commands:

sudo apt-get update
sudo apt-get install ubuntu-desktop

And at last, the Benchmarks:

RESULTS AND CONCLUSION

Overall placing NVMe drives in RAID does not yield much extra performance. This is likely due to the fact each NVMe already consumes 4 PCIe channels worth of bandwidth and other bottlenecks in the hardware limit further gains in RAID0.

AWS Benchmarks are self reported here.

These are still pretty insane numbers compared to the kind of throughput you can get with an AWS instance but RAID0 does not provide sufficient benefit to be worth the effort.

My advice, buy larger single SSD and save the effort (and M2 port).

--

--

Mitch Swan

Deep down I am a techie, but I make my biggest impacts on the “Business Side of Data” through strategy and operating models for data.