(Cover image caption: It can be tough to properly plan for capacity in a shared service.)

Shared filesystems in AWS, part 1: intro

Nate Aiman-Smith
Published in RunAsCloud
Jan 23, 2017 · 6 min read


Shared filesystems have been a part of my IT world for as long as I’ve been working in the field. In layman’s terms, a shared filesystem allows multiple machines (“clients”) to connect to the files and folders on a single machine (“server”). In its simplest form, instructions for sharing a folder via Windows can be written in a 2–3 page document, with the vast majority of that real estate occupied by screenshots. Similarly, instructions on how to share a folder via Unix’s NFS protocol fit in less than a page (assuming fewer screenshots). This makes sense; both of these file-sharing protocols have been around for so long that the setup and usage process has become pretty simple.

Benefits

Having a shared filesystem solves a lot of distributed-systems problems that don’t necessarily have easy alternative solutions:

  • Shared static content - imagine a site in which people upload images, do some basic manipulation like cropping, and then share the images. If you have multiple application servers with their own filesystems doing this work then you need to either replicate all those images to each server or have some way of knowing which server a given image sits on (and also hopefully some way of replicating that so that you don’t lose all those images if the server goes belly-up). A shared filesystem gets rid of this problem.
  • Shared application code - when each server has its own copy of the application code, the process to update that code needs to ensure that each server gets updated separately. You may even have a scenario in which a DB update and an application code update have to happen more or less simultaneously, which becomes very difficult when you’re dealing with multiple servers with multiple copies. A shared filesystem allows you to update that code in one place and have it immediately take effect everywhere.
  • Session state files - many application servers keep track of session state in a file, which might contain information such as who you are, when you logged in, when your session expires, and some kind of shared secret to ensure that your next request can be validated. If you store these files on a local filesystem, you need to ensure that each request for the same user session always goes to the same server (and if the server dies, that session dies with it). Storing these files on a shared filesystem allows the application farm to be resilient against a single server failure (a minimal sketch of this pattern follows this list).
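
To make the session-state case concrete, here’s a minimal Python sketch. The paths and field names are hypothetical (I’m assuming the shared filesystem is mounted at /mnt/shared on every application server), and a real framework would handle most of this for you; the point is simply that where the files live determines whether the farm survives a server failure.

```python
import json
import os
import time
import uuid

# Hypothetical mount point for the shared filesystem (e.g. an NFS export
# mounted at the same path on every application server).
SESSION_DIR = "/mnt/shared/sessions"


def create_session(user_id):
    """Write a session file that any server in the farm can read."""
    session_id = uuid.uuid4().hex
    session = {
        "user_id": user_id,
        "logged_in_at": time.time(),
        "expires_at": time.time() + 3600,  # 1-hour session
        "secret": uuid.uuid4().hex,        # shared secret used to validate later requests
    }
    os.makedirs(SESSION_DIR, exist_ok=True)
    with open(os.path.join(SESSION_DIR, session_id), "w") as f:
        json.dump(session, f)
    return session_id


def load_session(session_id):
    """Validate a request on any server, even if the server that
    created the session has since died."""
    path = os.path.join(SESSION_DIR, session_id)
    try:
        with open(path) as f:
            session = json.load(f)
    except FileNotFoundError:
        return None
    if session["expires_at"] < time.time():
        return None
    return session
```

Point SESSION_DIR at local disk and every request for a session has to land on the server that created it; point it at a shared mount and any server can pick up the next request.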

These are only a few of the problems that can be solved by having a shared filesystem — that’s why they’ve been a cornerstone of systems architecture for decades.

Drawbacks

So, why not use a shared filesystem? Here are the main problems:

  • Scalability — this design can work just fine with a small number of clients, but at some point you may end up needing to share the same data with a very large number of clients. When you hit the scaling wall on this setup you usually hit it hard, and then you’re scrambling to try to come up with a band-aid. Anyone who has worked in IT for 10 years or more has run into this, and it’s absolutely no fun.
  • Latency — local disk is always going to have lower latency than networked storage, and depending on the profile of your application, even apparently small per-operation latency increases can translate to a big performance hit (a rough worked example follows this list).
  • Single point of failure (SPOF) — even if you cluster the system (or, more likely, purchase a dual-head appliance that has at least two of everything), you’ve still got a single point of failure from an application architecture standpoint. If the shared filesystem has a problem, nothing works.
  • “Black box” — shared filesystems are like the automobile of IT; almost everyone knows how to use one, but very few people have any idea what to do when things go wrong, or how to tune one for a specific environment, or even what all those buttons on the dashboard do.
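
The latency point is easy to underestimate, so here’s a rough back-of-the-envelope illustration in Python. The numbers are assumptions for the sake of the example (roughly 0.1 ms per small file operation on local disk versus 1 ms over the network), not benchmarks from any particular system.

```python
# Rough illustration of how small per-operation latency differences add up.
# These numbers are assumptions for the example, not measured benchmarks.
local_latency_s = 0.0001   # ~0.1 ms per small file operation on local disk
shared_latency_s = 0.001   # ~1 ms per operation against a networked filesystem
ops_per_request = 2000     # e.g. an app that stats and reads many small files per request

print(f"local disk: {local_latency_s * ops_per_request:.1f}s per request")   # 0.2s
print(f"shared fs:  {shared_latency_s * ops_per_request:.1f}s per request")  # 2.0s
```

A latency difference you would never notice on a single file turns into nearly two extra seconds per request once thousands of operations are involved.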

Despite these drawbacks, many applications still use shared filesystems for their relative simplicity and usefulness from an architectural standpoint.

Enter AWS, and the paradigm shifts

Until recently, there was simply no easy and cost-effective way to have a highly-available shared filesystem in AWS. As an AWS Solutions Architect, I would often work with customers to refactor their infrastructures to not use NFS or CIFS. For the record, I don’t think the use of NFS or CIFS is good architectural design, and wherever possible your application should be refactored not to use it. That having been said, there are still situations in which it’s unavoidable.

Finally, in April 2015 AWS announced EFS (Elastic File System), a highly available, managed NFS filesystem that scales more or less without limit. Unfortunately for most of us, EFS didn’t become generally available until more than a year later, but at least we knew that help was (theoretically) on the way.

Much as a customer’s #1 option should be to avoid NFS whenever possible, the #2 option should be to use EFS. It’s simple, it’s cost-effective, it’s redundant across AZs, and it’s managed. Of course, it also has some potential issues:

  • It’s not available in all regions. Given the continued prevalence of shared filesystems in IT architectures, my educated guess is that AWS is working to deploy it everywhere; for example, it’s already available in Ohio (but not yet the UK or Canada). Keep in mind, though, that EFS has been GA in only three regions for more than six months now, so the rollout to the remaining regions will probably be slow.
  • Performance scales with the amount of data stored in the filesystem. If a small filesystem isn’t meeting your performance needs, there’s not a lot you can do about it (short of artificially inflating it with dummy data, which is a bad idea in general).
  • There is no managed backup option. For EBS volumes, AWS has always provided snapshot technology, which will create a block-level backup that gets stored in S3. EFS doesn’t have any backup functionality — if you want to back up data on EFS, you have to do it with backup software from a client.
  • It’s still relatively expensive. EBS GP volumes in us-east-1 cost $0.10 per GB/month. Double that for replicating across two AZs and you get $0.20 per GB/month. EFS costs $0.30 per GB/month. Granted, this doesn’t come into play until you’re talking about large amounts of data, but depending on your use case (for example, tier-3 storage that still needs to be accessed via NFS) it could be significant.
  • It’s NFS-only. Theoretically this can be used by Windows via the Windows NFS Client, but like everything else the devil is in the details, particularly in the context of user mapping.
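
Despite those caveats, the “managed” part is real. Here’s a minimal boto3 sketch of provisioning an EFS filesystem and a mount target; the subnet and security group IDs are placeholders for your own VPC resources, and in a real deployment you’d create one mount target per AZ and use a security group that allows NFS (TCP 2049) from your clients.

```python
import time

import boto3

efs = boto3.client("efs", region_name="us-east-1")

# Create the filesystem; CreationToken makes the call idempotent.
fs = efs.create_file_system(
    CreationToken="my-shared-fs",
    PerformanceMode="generalPurpose",
)
fs_id = fs["FileSystemId"]

# Wait for the filesystem to become available before adding mount targets.
while True:
    state = efs.describe_file_systems(FileSystemId=fs_id)["FileSystems"][0]["LifeCycleState"]
    if state == "available":
        break
    time.sleep(5)

# One mount target per AZ you want to mount from (placeholder IDs below).
efs.create_mount_target(
    FileSystemId=fs_id,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)

print(f"{fs_id} is ready; mount it from your instances with a standard NFSv4 mount.")
```

From the instances’ point of view it’s just another NFS mount; there’s no fileserver to patch, resize, or fail over.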

Besides refactoring and using EFS, there are other options:

  • Some marketplace vendors are working to fill this space. Notable among them are SoftNAS (a cross-AZ replicated fileserver) and Zadara, which provides SAN LUNs and NAS exports via Direct Connect.
  • There are multiple products that can use S3 as the backend but make it look like NFS or CIFS. For NFS, s3fs is the classic, but there are many similar products like riofs. For Windows, you can use CloudBerry Drive (not open source), and there are a few open-source attempts to do the same thing.
  • You can roll your own highly available fileserver. More on this in part 2.

If you want to build an application that uses a shared filesystem but don’t know the best way to do it, please contact me. Taking into account your deadlines, budget, and engineering resources, I can help you determine the best course of action and ultimately be successful in building your application in AWS.
