NFS benchmarks of Amazon EFS and NetApp ONTAP Cloud

Arseny Chernov
FAUN — Developer Community 🐾
7 min read · Sep 25, 2017


Disclaimer: the statements expressed in this article are my own, based on my background in scale-out object and network-attached storage, and do not represent the thoughts, intentions, plans, or strategies of my employer. Nor do I claim this testing complete; rather, I invite more collaborators.

So, when you need a Network File System (NFS) export for your project while working in an enterprise, what do you do?

— I raise a ticket with the Storage Team so they carve one out for me!

Good on you! Probably you’re the kind of reader I need.

It’s good to have a Storage Team, right? Answer a few of their questions to determine workload characteristics, initial capacity, locking, growth trajectory, backup requirements, and so on.

And your mountpoint is there!

But… But what if you’re on AWS?

There are multiple choices, actually.

Some prefer a DIY approach, where EC2 instances act as generic NFS servers. A lot of best practices can be found in articles by CloudBees and Bitbucket, as both rely heavily on NFS in their environments.
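If you go the DIY route, a minimal sketch of such a setup on a RHEL-flavored EC2 instance might look like this; the export path and the VPC CIDR below are hypothetical:

# on the EC2 instance acting as the NFS server (RHEL/CentOS family)
sudo yum install -y nfs-utils
# export a directory to the (hypothetical) VPC CIDR
echo "/export/data 10.0.0.0/16(rw,sync,no_root_squash)" | sudo tee -a /etc/exports
sudo systemctl enable --now nfs-server
sudo exportfs -ra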

So far, I’ve only managed to get my head and hands around the following two:

  • AWS Elastic File System (EFS)
  • NetApp ONTAP Cloud for AWS appliance

In both cases, I’ve focused on magnetic disk use-cases. And, only NFS, because… NFS Client for Windows.

Invitation to Collaborate and Share Results

Do you plan to do a Proof of Concept of NFS in AWS?

Are you able to follow the proposed Vdbench methodology and publish your results? I’ll be happy to backlink to / update this article.

Let’s collaborate.

Paste your feedback / experiences / results to comments on this page.

Oh. Did I just mention Vdbench?

A few words about Vdbench

It’s a free, Oracle-supported command-line tool that generates I/O load. You can download it from Oracle’s website.

Back in my EMC Isilon days, Vdbench was one of the many tools we used; among the others were fio for Linux, fio.exe over Cygwin for Windows, and tools like IOMeter.

But, to establish a comparable baseline, I decided to use Vdbench.

And so hey, let’s use Vdbench, because… simple!

Preparation for AWS NFS Benchmarking

Configuration of Vdbench Client

  • m3.medium (1 x 2.5 GHz vCPU / 3.5GB vRAM)
  • EBS GP2
  • Red Hat Enterprise Linux 7.x with the latest nfs-utils
  • Vdbench test file size 15GB

Yes, I know what you’re thinking.

m3.medium is not “beefy” enough to pull through rigorous testing, but I picked a representative member of the “cattle” (not a “pet” at all): a down-to-earth small EC2 instance. Now imagine you’ve got hundreds of them (I didn’t!)

If you have a chance to experiment further, please submit your results!
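One more note on client preparation: Vdbench is a Java tool, so Java has to be present. A minimal sketch (the archive name is hypothetical; use whatever version you downloaded from Oracle):

sudo yum install -y java-1.8.0-openjdk nfs-utils unzip
unzip vdbench50407.zip -d ~/vdbench   # archive name hypothetical
cd ~/vdbench && ./vdbench -t          # built-in quick sanity-check run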

For NFSv4, create two mount points on the client in /etc/fstab, using the traditionally found transfer sizes of 32kB and 1024kB as hard mounts:

#for NFSv4, 32kB transfer size
rw,hard,sync,_netdev,lookupcache=pos,nfsvers=4,noatime,intr,rsize=32768,wsize=32768
#for NFSv4, 1024kB transfer size
rw,hard,sync,_netdev,lookupcache=pos,nfsvers=4,noatime,intr,rsize=1048576,wsize=1048576
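For completeness, the full /etc/fstab entries would look roughly like this; the server address and export path are hypothetical (for EFS, substitute your filesystem’s DNS name; for ONTAP, the data LIF and junction path):

# server address and export path below are hypothetical
nfs.example.internal:/export/bench  /mnt/32KB    nfs  rw,hard,sync,_netdev,lookupcache=pos,nfsvers=4,noatime,intr,rsize=32768,wsize=32768  0 0
nfs.example.internal:/export/bench  /mnt/1024KB  nfs  rw,hard,sync,_netdev,lookupcache=pos,nfsvers=4,noatime,intr,rsize=1048576,wsize=1048576  0 0

Then mount both and verify the negotiated transfer sizes:

sudo mkdir -p /mnt/32KB /mnt/1024KB
sudo mount /mnt/32KB && sudo mount /mnt/1024KB
nfsstat -m   # shows the effective rsize/wsize per mount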

For each of the mount points, what we’re interested in running with Vdbench is the following:

  • Sequential Read (100% read)
  • Sequential Write (0% read)
  • Random Read (100% read)
  • Random Write (0% read)

…each as a single-threaded (1) and a multi-threaded (20) Vdbench test. Here’s an example:

Single 1024KB Random Write Thread Parameter File
* File System Definition (FSD) Parameter Section
fsd=fsd1,anchor=/mnt/1024KB,depth=1,width=1,files=1,size=15360m
* File System Workload Definition (FWD) Parameter Section
fwd=fwd1,fsd=fsd1,operation=write,xfersize=1024k,fileio=sequential,fileselect=random,
stopafter=100,threads=1
* Run Definition (RD) Parameter Section
* (warmup value below is assumed; the original listing was truncated here)
rd=rd1,fwd=fwd*,iorate=max,seekpct=100,rdpct=0,fwdrate=max,format=yes,elapsed=30,warmup=5
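Assuming the parameter file above is saved as, say, randwrite-1024k-1thread.parm (file name hypothetical), a run looks like this, with -f pointing at the parameter file and -o at a directory for the reports:

./vdbench -f randwrite-1024k-1thread.parm -o /tmp/vdbench-out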

Configuration of NetApp ONTAP Cloud

From the NetApp ONTAP Cloud Marketplace Page:

…ONTAP Cloud offers you the power of ONTAP software with flexible performance for EBS (io1, gp2, st1, sc1)

Also, there’s a variety of EC2 instance types that can be deployed:

(source: ONTAP Cloud marketplace on AWS)

Let’s proceed to test the st1 version with a “reasonable” EC2 instance type:

  • ONTAP 9.2
  • m4.2xlarge (8 x 2.4GHz, 32GB vRAM)
  • EBS st1, 4TB total
  • One 20GB volume per NFSv4 export mount (filled up to 15GB by the test file)
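For reference, carving out each such export on the ONTAP side is roughly a one-liner in the ONTAP CLI; the SVM and aggregate names here are hypothetical, and your deployment will differ:

volume create -vserver svm_bench -volume bench_1024KB -aggregate aggr1 -size 20g -junction-path /bench_1024KB -security-style unix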

Configuration of Elastic File System

There are not too many things to configure, really.

Some things I’d like to note and share:

  • Its baseline throughput scales with the size of the filesystem and is subject to bursting. This reflects the nature of file-system usage in a massively multi-tenanted environment, so it’s fair. It uses a credit system: baseline throughput is 50 KiB/s per GiB of storage (50 MiB/s per TiB of storage); the rest is bursting:
(source: EFS performance page)

…so, when I was doing the benchmarks, I kept in mind that the assumed ~300GB file share lands north of the 12.5 MiB/s baseline-throughput tier (300 GiB × 50 KiB/s ≈ 14.6 MiB/s). My actual requirements were lower than that baseline.

  • In “General Purpose” mode, EFS can pull off ~6,000 IOPS for the entire filesystem, but at a guaranteed low latency (I got the 6,000-IOPS figure after talking to the AWS team). You need to observe the PercentIOLimit metric when conducting Vdbench runs; if it’s close to 100%, you need to switch to the “Max I/O” EFS type by re-creating the filesystem. There won’t be a cap, but latency may be unstable between threads and clients (see the CLI sketch after this list):
(sample CloudWatch dashboard to observe the benchmarked EFS’s PercentIOLimit)
  • Keep in mind that behind the scenes it’s a 2-Phase Commit (2PC) across 3 availability zones, for coherent consistency. It’s a durable and reliable scale-out storage system, but at that scale, don’t expect it to perform like a transactional All-Flash NAS system.
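One way to pull PercentIOLimit without building a dashboard is the AWS CLI; the filesystem ID and time window below are hypothetical:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name PercentIOLimit \
  --dimensions Name=FileSystemId,Value=fs-12345678 \
  --statistics Average --period 300 \
  --start-time 2017-09-25T00:00:00Z --end-time 2017-09-25T01:00:00Z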

Results of AWS EFS and NetApp NFS Benchmarks

Here’s the table for the test cases above.

For each test, there were multiple (6…10) runs of Vdbench, so each individual cell corresponds to an average.

Also worth noting: 20 threads seems a bit too much for Java on a 1-core, 3.5GB-vRAM instance, so some of the runs crashed and were discarded.

In no way do I claim these results final. To be precise: EFS was definitely bursting. It wasn’t loaded up to the assumed ~300GB; so when, at around 15GB, it showcases 80 MB/s and more, I simply refer back to the bursting table above.

The EFS tests were also done over several days, for a few hours a day at most, so the throughput credits were certainly replenished. For me, that was exactly representative.

For your workload, you need to do a more precise approximation or, better, re-do the testing.

At the end of the day, I really want you to do your own testing and publish more results!

Anyway, let’s take a look and speculate:

  • NetApp definitely wins the random-write tests hands-down, thanks to its journalling inside a beefy 32GB-vRAM EC2 instance. The 2PC of EFS is… what you expect it to be.
  • EFS definitely wins hands-down on reads, as it (presumably) aggregates gigantic backend RAM for its read-ahead cache, while NetApp only has 32GB.
  • Clearly, NetApp’s 8-core CPU suffers when 20 threads are doing random workloads: the EC2 instance has to run ONTAP/WAFL and whatever other daemons, and also serve responses. Hence the sub-second latencies.

Conclusions and Call for Actions

First and foremost…

Please do your own testing and share!

As you could have imagined, this is hardly an exhaustive set of benchmarks, and it rests on a very custom common denominator, i.e. the Vdbench client EC2 instance type.

What worked for me clearly may not work for you at all, so post your results and let’s backlink each other. Or comment on this article, and I’ll publish your results here.

Figure what you really need!

NetApp has a hands-down advantage: it’s NetApp!

It’s got it all… SMB protocol support, and SnapVault at the end of the day! So, block-level backups: incremental, full, daily, weekly, monthly, and so on. And then straight into S3.

On the other hand, have you seen the EFS Backup solution using Data Pipeline? Factor in the burst-throughput exhaustion, spinning EC2 instances up and down… A totally different experience from NetApp, mind you. Just reading about it makes me raise an eyebrow.

Cost-wise, they’re two different beasts: it’s software plus EC2 on the NetApp side, and pure capacity on the EFS side. I didn’t do an extensive comparison over a period (say, one year); that’s probably worth separate research.

But EFS has its own hands-down advantage: it’s AWS! Integrated with IAM, fully deployable by Infrastructure-as-Code means of CloudFormation or Terraform, with flexible, on-the-drip usage billed by capacity.
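For instance, a minimal sketch of creating a filesystem and a mount target from the AWS CLI (the creation token, subnet, and security-group IDs are hypothetical):

aws efs create-file-system --creation-token nfs-bench --performance-mode generalPurpose
aws efs create-mount-target --file-system-id fs-12345678 --subnet-id subnet-abcd1234 --security-groups sg-abcd1234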

Which one do you prefer, and why?

And again, — comments and opinions — more than welcome!

P.S. I also have to send huge kudos to Pugalenthi for his support throughout this testing.
