Kubernetes Storage Performance Comparison v2 (2020 Updated)

Jakub Pavlík
Published in volterra.io
Sep 7, 2020 · 9 min read

In 2019 I published a blog: Kubernetes Storage Performance Comparison. My goal was to evaluate the most common storage solutions available for Kubernetes and perform basic performance testing. I had results for GlusterFS, CEPH, Portworx and OpenEBS (with the cStor backend). The blog has been popular and I received a lot of positive feedback. I’ve decided to come back with a few updates on progress in the storage community and their performance numbers, which I promised in my last blog. I extended my testing scope to include 2 more storage solutions:

  • OpenEBS MayaStor
  • Longhorn

Let’s start with the storage backend updates and their installation description, then we will go over the AKS testing cluster environment and present the updated performance results at the end.

Storage

Since January 2019, the CNCF storage landscape and solutions have changed. It has grown from 30 to 45 solutions under the storage banner, and governance expanded to cover public cloud integrations such as AWS EBS, Google Persistent Disk or Azure Disk storage. Some of the new solutions focus more on distributed filesystems or object storage, such as Alluxio. My original goal remains the same: to evaluate block storage options. Let’s revisit my original list.

GlusterFS Heketi had the second worst performance results, it has seen essentially no improvement, and it is mostly a dead project (Heketi as the REST orchestrator, not GlusterFS itself). If you look at the official GitHub, you can see that it has been placed into near-maintenance mode and there have been no updates in terms of cloud-native storage features.

Portworx remains among the top commercial storage solutions for Kubernetes according to the GigaOm 2020 report. However, from a performance point of view, no significant technology or architecture change was claimed in the release notes between versions 2.0 and 2.5.

The best open source storage, CEPH orchestrated via Rook, produced 2 new releases and added support for the new Ceph version called Octopus. Octopus brings several optimizations in caching mechanisms and uses more modern kernel interfaces (see the official page for more).

The only major architecture change happened in OpenEBS, where it introduced a new backend called MayaStor. This backend looks very promising.

I also received a lot of feedback from the community asking why I did not test Longhorn from Rancher. Therefore I decided to add it to my scope.

I evaluated Longhorn and OpenEBS MayaStor and compared their results with the previous results from Portworx, CEPH, GlusterFS and native Azure PVC. The following subsections introduce the storage solutions added to the existing test suite. They also describe the installation procedure and the advantages/disadvantages of each solution.

Longhorn

Longhorn is cloud-native distributed block storage for Kubernetes, developed by Rancher. It was designed primarily for microservices use cases. It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. Longhorn creates a Longhorn Engine on the node where the volume is attached, and it creates replicas on the nodes where the volume is replicated. Similar to other solutions, the entire control plane and data plane run on and are orchestrated by Kubernetes. It is fully open source. Interestingly, the OpenEBS Jiva backend is actually based on Longhorn, or at least it started as a fork of it. The main difference is that Longhorn uses the TCMU Linux driver while OpenEBS Jiva uses gotgt.

How to get it on AKS?

Installation to AKS is trivial:

1. Run a single command and it installs all components into the AKS cluster.
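At the time of Longhorn 1.x, the documented install was a single kubectl apply of the upstream manifest; the exact URL and version below are an assumption, so check the current Longhorn docs:

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
kubectl -n longhorn-system get pods   # wait until all Longhorn components are Running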

2. Mount /dev/sdc1 with an ext4 filesystem into /var/lib/longhorn, which is the default path for volume storage. It is better to mount the disk there before the Longhorn installation.
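Formatting and mounting the disk on each node is standard Linux administration; assuming the data disk really appears as /dev/sdc (device names vary per VM), the steps look roughly like this:

# create an ext4 filesystem on the data disk (destroys any existing data on it)
sudo mkfs.ext4 /dev/sdc1
# create the default Longhorn data path and mount the disk there
sudo mkdir -p /var/lib/longhorn
sudo mount /dev/sdc1 /var/lib/longhorn
# optionally persist the mount across reboots
echo '/dev/sdc1 /var/lib/longhorn ext4 defaults 0 2' | sudo tee -a /etc/fstab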

Screenshot from node disk configuration in Longhorn

3. The last step is to create a default storage class with a 3-replica definition.
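A minimal sketch of such a StorageClass, assuming the CSI provisioner name (driver.longhorn.io) and parameter names from the Longhorn 1.x documentation:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # make it the default class
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"        # keep 3 replicas of every volume across nodes
  staleReplicaTimeout: "2880"  # minutes before a failed replica is cleaned up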

Advantages

  • Open source
  • Cloud-native storage — it can run on HW clusters as well as public clouds.
  • Easy to deploy — it requires a single command and “it just works” out of the box.
  • Automatic volume backup/restore into S3

Disadvantages

  • It uses a mount point at /var/lib/longhorn backed by a standard filesystem (ext4 or xfs). Each volume is stored like a single disk file. It scales with the number of controller replicas, which can bring extra networking overhead. Similar to what I described for OpenEBS Jiva.
  • Mounting of volumes sometimes takes a long time (a few minutes) and shows errors from which it eventually recovers.

OpenEBS MayaStor

OpenEBS represents the concept of Container Attached Storage (CAS), where there is a single microservice-based storage controller and multiple microservice-based storage replicas. If you read my previous blog post from 2019, you know that I played with 2 backends: Jiva and cStor. I ended up using cStor and its performance results were really bad. However 1.5 years is a long time and the OpenEBS team introduced a new backend called MayaStor.

It’s a cloud-native declarative data plane written in Rust, which consists of 2 components:

  • A control plane built around the CSI concept and a data plane. The main difference compared to the previous backends is that it leverages NVMe over Fabrics (NVMe-oF), which promises much better IOPS and latency for storage-sensitive workloads.
  • Another advantage of this design is that the data plane runs completely outside the kernel, in host userspace, which removes the differences caused by the variety of kernels available in different Linux distributions. It simply does not depend on the kernel for access. I found a nice design explanation of MayaStor in this blog.

How to get it on AKS?

Installation on AKS is straightforward; I followed their quick start guide.

1. I had to configure 512 × 2MB huge pages on each node in my AKS cluster.

echo 512 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

However I decided to enforce them via the k8s DaemonSet below instead of SSHing into every instance.
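The original post embedded that manifest; what follows is only a minimal sketch of such a DaemonSet (not necessarily the author's exact one): a privileged pod on every node that writes the hugepage count and then stays alive. Note that the kubelet typically needs a restart before it advertises the new hugepages-2Mi resource.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hugepages-config
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: hugepages-config
  template:
    metadata:
      labels:
        app: hugepages-config
    spec:
      containers:
      - name: set-hugepages
        image: busybox
        securityContext:
          privileged: true   # needed to write into the host's /sys
        command: ["sh", "-c"]
        args:
        # reserve 512 x 2MB hugepages on the node, then keep the pod running
        - "echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages && while true; do sleep 3600; done"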

2. I had to label my storage node VMs.
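The quick start guide selects MayaStor nodes by label; assuming the label key used by the 0.x guide (openebs.io/engine=mayastor), this is a one-liner per node:

kubectl label node <aks-node-name> openebs.io/engine=mayastor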

3. Then I applied all the manifests specified in the MayaStor repository.

4. When everything is running, you can start creating storage pools for volume provisioning. In my case I created 3 storage pools with a single disk per node, as sketched below.
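A sketch of one such pool custom resource, based on the MayastorPool CRD shipped with the 0.x manifests (the apiVersion, node name and disk device here are illustrative and may differ in your release):

apiVersion: openebs.io/v1alpha1
kind: MayastorPool
metadata:
  name: pool-on-node-1
  namespace: mayastor
spec:
  node: aks-nodepool1-12345678-vmss000000   # AKS node that owns the disk (hypothetical name)
  disks: ["/dev/sdc"]                       # raw disk dedicated to the pool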

5. It is important to check the status of each storage pool before you proceed with the StorageClass definitions. The state must be online.
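Assuming the resource name from the same CRD, the check is a simple kubectl query:

kubectl -n mayastor get mayastorpools   # every pool should report an online state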

6. The last step in the process is the StorageClass definition, where I configured 3 replicas to have the same testing environment as for my previous storage solutions.
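A sketch of that StorageClass, assuming the provisioner name and the repl/protocol parameter keys used by early MayaStor releases (these have changed in later versions, so verify against the release you deploy):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-nvmf-3
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "3"          # three synchronous replicas, matching the other tested solutions
  protocol: "nvmf"   # expose volumes over NVMe-oF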

After I had finished these steps I was able to dynamically provision new volumes via a K8s PVC.
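For example, a PVC referencing the class sketched above (mayastor-nvmf-3 is the hypothetical name from that sketch):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-volume
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: mayastor-nvmf-3
  resources:
    requests:
      storage: 10Gi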

Advantages

  • Open source with great community support
  • Cloud-native storage — it can run on HW clusters as well as public clouds.
  • Usage of NVMe, which is designed for high parallelism and can have 64K queues, compared to SCSI which has only one queue.
  • It uses NVMe-oF as the transport, which can work over a variety of transports (nvmf, uring, pcie), and it is fully done in user space, the target as well as the initiator. Running in user space avoids a large number of system calls and the post Spectre/Meltdown mitigation costs, etc. It is also kernel independent, so there is no difference between types of Linux across cloud or physical environments.

Disadvantages

  • Early versions: OpenEBS MayaStor is at version 0.3, so it still has some limitations and stability issues. However, they are on the right track, and in a few months it could be a top choice for storage in K8s.
  • It requires support for 2MB hugepages on the Kubernetes nodes. However, compared to 1GB hugepages, these are available in almost all environments, physical or virtual.

Performance Results

IMPORTANT NOTE: The results from individual storage performance tests cannot be evaluated independently, but the measurements must be compared against each other. There are various ways to perform comparative tests and this is one of the simplest approaches.

For verification I used exactly the same lab: an Azure AKS 3-node cluster with a 1TB premium SSD managed disk attached to each instance. You can find the details in the previous blog.

To run the tests I decided to use the same load tester, Dbench. It is a K8s deployment manifest for a pod that runs FIO, the Flexible IO Tester, with 8 test cases. The tests are specified in the entry point of the Docker image (an illustrative FIO invocation is sketched after the list):

  • Random read/write bandwidth
  • Random read/write IOPS
  • Read/write latency
  • Sequential read/write
  • Mixed read/write IOPS
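For illustration, the mixed random read/write IOPS case boils down to an FIO invocation roughly like the one below; the block size, queue depth, mix ratio and runtime are illustrative values, not necessarily the exact Dbench defaults:

fio --name=rand-iops --filename=/data/fiotest --size=2G \
    --ioengine=libaio --direct=1 --rw=randrw --rwmixread=75 \
    --bs=4k --iodepth=64 --numjobs=4 --time_based --runtime=60 \
    --group_reporting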

At the start, I ran the Azure PVC tests to get a baseline for comparison with last year. The results were almost the same, therefore we can assume conditions remained unchanged and we would achieve the same numbers with the same storage versions. The updated full outputs from all the 2019 tests plus the new MayaStor and Longhorn tests are available at https://gist.github.com/pupapaik/76c5b7f124dbb69080840f01bf71f924

Random read/write bandwidth

The random read test showed that GlusterFS, Ceph and Portworx perform several times better on reads than the host path on the Azure local disk. OpenEBS and Longhorn perform almost twice as well as the local disk. The reason is read caching. Writes were fastest for OpenEBS, but Longhorn and GlusterFS also achieved almost the same value as the local disk.

Random read/write IOPS

Random IOPS showed the best results for Portworx and OpenEBS. OpenEBS this time achieved even better write IOPS than the native Azure PVC, which is almost technically impossible. Most probably it is related to the Azure storage load at the different times the test cases were run.

Read/write latency

The read latency winner remained the same as last time. Longhorn and OpenEBS had almost double the latency of Portworx. This is still not bad, since the native Azure PVC was slower than most of the other tested storages. Write latency, however, was better on OpenEBS and Longhorn. GlusterFS was still better than the other storages.

Sequential read/write

Sequential read/write tests showed results similar to the random tests, however Ceph was 2 times better on reads than GlusterFS. The write results were almost all on the same level, and OpenEBS and Longhorn achieved the same.

Mixed read/write IOPS

The last test case verified mixed read/write IOPS, where OpenEBS delivered almost twice the IOPS of Portworx or Longhorn on reads as well as writes.

Conclusion

This blog shows how significantly an open source project can change in a single year! As a demonstration, let’s take a look at the comparison of IOPS between OpenEBS cStor and OpenEBS MayaStor in exactly the same environment.

Mixed read/write IOPS comparison between OpenEBS cStor and MayaStor

Please take the results just as one of the criteria during your storage selection and do not make a final judgement based only on my blog data. To extend my summary from 2019, here is what we can conclude from the tests:

  • Portworx and OpenEBS are the fastest container storage for AKS.
  • OpenEBS seems to become one of the best open source container storage options with a robust design around NVMe.
  • Longhorn is definitely a valid option for simple block storage use cases and it is quite similar to OpenEBS Jiva backend.

Of course this is just one way to look at container storage selection. Scaling and stability are also interesting aspects. I will keep an eye on other evolving projects in the CNCF storage landscape and bring new interesting updates from performance and scaling testing.
