Using NVMe instances in Azure Kubernetes Service

Alessandro Vozza · Published in Cooking with Azure · Aug 28, 2020

Code available at https://github.com/ams0/aks-nvme-ssd-provisioner

NVM Express (NVMe) is a high-performance interface for non-volatile storage, and it has been available in Azure on the L-series virtual machines for more than a year. These disks offer outstanding latency and throughput (at a cost), and this article shows how to use them in the context of the Azure Kubernetes Service.
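If you first want to check which L-series sizes are available in your region, you can query the SKU list with the az CLI (the region and output format here are just examples):

az vm list-skus --location westeurope --size Standard_L --output table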

This solution is based on the upstream local volume static provisioner, which creates Kubernetes PersistentVolumes and presents them to the workloads that need them (through PersistentVolumeClaims). It runs as a DaemonSet on each node (filtered, in our case, by a nodeSelector matching a label on the nodes; see the sketch below). Local persistent volumes in Kubernetes are a particular class of volumes that represent directly attached local disks, for use in stateful workloads.
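For reference, the node filtering in the DaemonSet looks roughly like this. This is a sketch, not the full manifest from the repo; the image tag is an assumption based on the upstream project:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-volume-provisioner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: local-volume-provisioner
  template:
    metadata:
      labels:
        app: local-volume-provisioner
    spec:
      nodeSelector:
        kubernetes.azure.com/aks-local-ssd: "true"  # only schedule on the labeled NVMe nodes
      containers:
      - name: provisioner
        image: quay.io/external_storage/local-volume-provisioner:v2.3.4  # upstream image, tag assumed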

The prerequisite for using NVMe-enabled instances in a node pool is an AKS cluster that supports multiple node pools (any recently created cluster does). You can add a node pool with NVMe-enabled instances like this:

az aks nodepool add -g <resourcegroup> --cluster-name <clustername> -n nvme -s Standard_L8s_v2 --labels kubernetes.azure.com/aks-local-ssd=true -c 1

Note the label attached to the nodes. You can easily add the cluster autoscaler to the pool, and you can now even autoscale the node pool to and from zero nodes, paying for capacity only when you need it.
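For example, enabling the autoscaler on the pool with a minimum of zero nodes looks like this (a sketch; the max count is illustrative, check az aks nodepool update --help for the flags in your CLI version):

az aks nodepool update -g <resourcegroup> --cluster-name <clustername> -n nvme \
  --enable-cluster-autoscaler --min-count 0 --max-count 3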

Check that the node is ready with kubectl get nodes, filtering on the label we just applied:

$> kubectl get nodes -o wide -l kubernetes.azure.com/aks-local-ssd=true
NAME                           STATUS   ROLES   AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
aks-nvme-31968050-vmss000000   Ready    agent   9m34s   v1.18.6   10.240.0.6    <none>        Ubuntu 18.04.5 LTS   5.3.0-1034-azure   docker://3.0.10+azure

The only thing you need at this point is to apply the manifest from the repository, which first runs an initContainer to discover and format all the NVMe devices (in the presence of more than one device, it creates a RAID0 array with all the disks striped; you can check how many NVMe devices each VM size gets in the L-series documentation).
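In rough terms, the init logic does something like the sketch below. This is a simplification based on my reading of the repo (the real script lives there); device names and paths are illustrative, though the /pv-disks/<uuid> mount points do show up in the provisioner logs later:

# Discover the local NVMe block devices
DEVICES=$(ls /dev/nvme*n1)
COUNT=$(echo "$DEVICES" | wc -w)

if [ "$COUNT" -gt 1 ]; then
  # More than one disk: stripe them all into a single RAID0 array
  mdadm --create /dev/md0 --level=0 --raid-devices="$COUNT" $DEVICES
  TARGET=/dev/md0
else
  TARGET=$DEVICES
fi

mkfs.ext4 "$TARGET"                            # format the device (or array)
UUID=$(blkid -s UUID -o value "$TARGET")
mkdir -p "/pv-disks/$UUID"
mount "$TARGET" "/pv-disks/$UUID"              # mount where the provisioner will discover it

Apply the manifest: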

kubectl apply -f https://raw.githubusercontent.com/ams0/aks-nvme-ssd-provisioner/master/manifests/storage-local-static-provisioner.yaml
clusterrolebinding.rbac.authorization.k8s.io/local-storage-provisioner-pv-binding created
clusterrole.rbac.authorization.k8s.io/local-storage-provisioner-node-clusterrole created
clusterrolebinding.rbac.authorization.k8s.io/local-storage-provisioner-node-binding created
serviceaccount/local-storage-admin created
configmap/local-provisioner-config created
daemonset.apps/local-volume-provisioner created
storageclass.storage.k8s.io/local-storage created
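Among the objects created is the local-storage StorageClass. For local volumes there is no dynamic provisioner, and binding is delayed until a pod is actually scheduled; it looks roughly like this (a sketch consistent with the upstream local static provisioner, with the Delete reclaim policy we will see on the PV below):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner  # PVs are created by the provisioner pod, not dynamically
volumeBindingMode: WaitForFirstConsumer    # delay binding until a consuming pod is scheduled
reclaimPolicy: Delete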

Follow the logs for the provisioner:

kubectl logs -n kube-system -l app=local-volume-provisioner
I0828 12:52:52.391348       1 main.go:64] Ready to run...
I0828 12:52:52.391470 1 common.go:396] Creating client using in-cluster config
I0828 12:52:52.586717 1 main.go:85] Starting controller
I0828 12:52:52.586751 1 main.go:101] Starting metrics server at :8080
I0828 12:52:52.586816 1 controller.go:45] Initializing volume cache
I0828 12:52:52.787382 1 controller.go:108] Controller started
I0828 12:52:52.787788 1 discovery.go:331] Found new volume at host path "/pv-disks/7524ba51-0bbc-4270-a1b8-6e4e77d803e8" with capacity 1889168896000, creating Local PV "local-pv-d7361622", required volumeMode "Filesystem"
I0828 12:52:52.799894 1 discovery.go:366] Created PV "local-pv-d7361622" for volume at "/pv-disks/7524ba51-0bbc-4270-a1b8-6e4e77d803e8"
I0828 12:52:52.800086 1 cache.go:55] Added pv "local-pv-d7361622" to cache
I0828 12:52:52.831276 1 cache.go:64] Updated pv "local-pv-d7361622" to cache

And check the volume that has just been provisioned:

kubectl get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-d7361622   1759Gi     RWO            Delete           Available           local-storage            3m10s
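To actually consume the volume, create a PersistentVolumeClaim against the local-storage class and mount it in a pod. A minimal sketch (the names and requested size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nvme-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nvme-consumer
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data     # the NVMe-backed volume appears here
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: nvme-claim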

Bonus: performance test your new PersistentVolumes

Apply this Job template to performance test your disks:

kubectl apply -f https://raw.githubusercontent.com/ams0/aks-nvme-ssd-provisioner/master/bench.yaml
persistentvolumeclaim/dbench created
job.batch/dbench created

After the job completes, check the logs of the fio container:

kubectl logs -l job-name=dbench
...
==================
= Dbench Summary =
==================
Random Read/Write IOPS: 124k/109k. BW: 1576MiB/s / 1131MiB/s
Average Latency (usec) Read/Write: 121.06/36.67
Sequential Read/Write: 1645MiB/s / 1178MiB/s
Mixed Random Read/Write IOPS: 87.5k/29.2k

As you can see, the performance of the NVMe disks is really impressive: I obtained ~124k IOPS (both single disk and RAID0), ~1500 MiB/s (single disk) and >3200 MiB/s for two disks striped (Standard_L16s_v2), with latencies on the order of 100 µs.
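Dbench is essentially a wrapper around fio, so if you want to reproduce a single figure by hand, a roughly equivalent random-read test against the mounted volume would look like this (all parameters are illustrative, not the exact ones dbench uses):

fio --name=randread --filename=/data/test --size=10G --direct=1 \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=64 \
    --numjobs=4 --runtime=60 --time_based --group_reporting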
