How the Filesystem Can Become a More Powerful Abstraction

Files are a fundamental building block of storage. Other storage systems like databases are usually implemented on top of the filesystem. If the filesystem is such an important abstraction, how can we make it more powerful and convenient for developers to use, while remaining backward compatible?

In this post, I’m not looking at replacing the core abstraction — a file is a sequence of bytes — with a different abstraction, like tables with rows and columns, key-value pairs, or something else. Whatever the merits and demerits of these alternatives, the fact that it’s a different abstraction breaks backward compatibility. Even if you did build and launch such a system, files won’t disappear for years, if not decades. Or your new abstraction may itself be implemented on top of the filesystem. So it’s worth looking at how to improve, rather than replace, the filesystem.

To begin with, the filesystem API can provide calls to insert a byte range into a file, not just append to it. Since most filesystems already store a file as multiple fragments on disk, insertion can be handled more efficiently in the filesystem than in application code, which has to copy the entire file [1]. Imagine having to copy a gigabyte of data to insert a megabyte into it. This makes no sense.

Filesystems can also support the opposite of inserting, which is deleting a byte range in the middle of a file.

And replacing a byte range with another, even if the size is different, which is a combination of insertion and deletion.
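To make the cost concrete, here is a sketch of what application code has to do today to insert, delete, or replace a byte range: rewrite everything from the edit point onward. (Linux’s fallocate() does offer FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE, but only at block-aligned offsets on certain filesystems, so apps can’t rely on them.) The function names here are my own, for illustration.

```python
def replace_range(path, offset, length, new_bytes):
    """Replace `length` bytes at `offset` with `new_bytes` by copying
    the tail of the file -- cost is O(file size), not O(edit size)."""
    with open(path, "r+b") as f:
        f.seek(offset + length)
        tail = f.read()          # everything after the replaced range
        f.seek(offset)
        f.write(new_bytes)
        f.write(tail)
        f.truncate()             # shrink the file if it got shorter

def insert_range(path, offset, new_bytes):
    # Insertion is replacement of a zero-length range.
    replace_range(path, offset, 0, new_bytes)

def delete_range(path, offset, length):
    # Deletion is replacement with an empty range.
    replace_range(path, offset, length, b"")
```

A filesystem-level version could instead remap extents, touching only metadata for large files.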

The filesystem should also become more efficient at handling tiny files, so that it can substitute for a database. Storing many tiny files shouldn’t take much more space than encoding the same data as SQL, JSON or some other format. And it should be no slower to access or modify.

For example, if you’re storing a million names and phone numbers, you should be able to store each person’s phone number in a separate file, with the person’s name as the file name. This shouldn’t take much more space than the same information in JSON or an SQL database. Accessing, adding, deleting or modifying an entry should not be much slower than with a database.
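The phone book layout above is trivial to express as code; what stops anyone from shipping it is the per-file overhead. A sketch, with a made-up class name, of how natural the file-per-entry model would be if the filesystem handled tiny files well:

```python
from pathlib import Path

class FilePhoneBook:
    """One file per person: the file name is the person's name,
    the file contents are the phone number. On today's filesystems
    a 12-byte number can occupy a whole block plus an inode, which
    is what a tiny-file-friendly filesystem would need to fix."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def set(self, name, number):
        (self.root / name).write_text(number)

    def get(self, name):
        return (self.root / name).read_text()

    def delete(self, name):
        (self.root / name).unlink()
```

Every entry is independently readable, writable and deletable with no serialisation format, no parser, and no database engine in the app.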

Other features the filesystem could provide to simplify app code are already supported in some filesystems. But only if they’re universally supported can app developers rely on them.

One is checksumming data, to guard against corruption. The checksums should be cryptographically secure: not MD5, for example [2]. If the filesystem checksums data, file formats will have less reason to.

The filesystem should also transparently compress data. This matters in a world where even a thousand-dollar MacBook Air ships with a 128 GB SSD. App developers shouldn’t have to reinvent compression for each app, and users shouldn’t run out of space when that’s easily preventable with 20-year-old technology [3].
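“Transparent” here means the app sees plain bytes while the storage layer sees compressed ones. A sketch of that boundary using zlib, standing in for whatever codec a real filesystem would choose:

```python
import zlib

def write_compressed(path, data: bytes):
    # The caller hands over plain bytes; only the compressed
    # form touches the disk.
    with open(path, "wb") as f:
        f.write(zlib.compress(data))

def read_compressed(path) -> bytes:
    # The caller gets plain bytes back, never seeing the codec.
    with open(path, "rb") as f:
        return zlib.decompress(f.read())
```

A filesystem would do this under every read() and write(), so no app code changes at all.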

Then there’s a search API, so that apps can easily have their data indexed and searched, not just by end-users but also by the app itself. An app like Lightroom that lets you filter your library by file type could run a Spotlight query for this, instead of reimplementing what the OS provides.

Then come document packages, which are a way for applications to store multiple related files in a way that appears to the user as a single file. This lets apps skip the complexity and overhead of serialising multiple items into a single byte stream [4].
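A sketch of the idea: the “document” is just a directory of plainly named parts, each read and written independently. The part names here are illustrative, not any platform’s convention.

```python
import json
from pathlib import Path

def save_document(root, text: str, metadata: dict):
    # One logical document, stored as a directory of parts
    # instead of one serialised byte stream.
    pkg = Path(root)
    pkg.mkdir(parents=True, exist_ok=True)
    (pkg / "body.txt").write_text(text)
    (pkg / "metadata.json").write_text(json.dumps(metadata))

def load_document(root):
    pkg = Path(root)
    return (
        (pkg / "body.txt").read_text(),
        json.loads((pkg / "metadata.json").read_text()),
    )
```

Updating one part rewrites one small file, not the whole document; what’s missing today is universal UI support for treating the directory as a single file.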

The filesystem could also support revision history of files and folders, not just in the UI like Time Machine, but also via an API for apps to use. Then, a notes app like Simplenote that lets users see revision history of notes can just invoke the OS API, rather than reimplementing it.
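This is what every such app has to hand-roll today. A sketch of the workaround, with a naming scheme invented for illustration: each save squirrels away the previous version in a sidecar directory.

```python
from pathlib import Path

def save_with_history(path, text: str):
    # Before overwriting, copy the current version into a
    # sidecar ".revisions" directory (an app-level convention
    # made up here, not an OS feature).
    path = Path(path)
    revs = path.parent / (path.name + ".revisions")
    revs.mkdir(exist_ok=True)
    if path.exists():
        n = len(list(revs.iterdir()))
        (revs / f"{n:06d}").write_bytes(path.read_bytes())
    path.write_text(text)

def revisions(path):
    # Oldest first; the live file itself is not included.
    path = Path(path)
    revs = path.parent / (path.name + ".revisions")
    return [p.read_text() for p in sorted(revs.iterdir())] if revs.exists() else []
```

A filesystem-level API would replace all of this with a query: give me version n of this file.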

The filesystem should also support constant-time, copy-on-write copies of directories, so that apps don’t need to implement such a feature themselves, say by factoring shared files out into a common place.
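A sketch of the app-level workaround: “copy” a tree by hard-linking every file, so unchanged data is shared. Note the limitation that motivates real copy-on-write (cp --reflink on Btrfs/XFS, clonefile() on APFS): hard links share future modifications too, so the app must write a new file and rename it rather than edit in place.

```python
import os
from pathlib import Path

def snapshot(src, dst):
    """Cheap 'copy' of a directory tree: recreate the directories,
    hard-link the files. No file data is duplicated."""
    src, dst = Path(src), Path(dst)
    for dirpath, dirnames, filenames in os.walk(src):
        rel = Path(dirpath).relative_to(src)
        (dst / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            os.link(Path(dirpath) / name, dst / rel / name)
```

With filesystem-level copy-on-write clones, none of this bookkeeping would live in app code.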

These are all ways in which the filesystem can provide a higher level of abstraction to app developers, so that they don’t have to reinvent the wheel, and do so inefficiently.

[1] Even if the filesystem implementation ends up copying data, it can probably do so in a more optimised way than application code. Maybe in C rather than managed code, for example. Or copying in whichever unit is fastest: bytes, or 16-, 32- or 64-bit words. Or using an API like sendfile. In general, since the filesystem code will be used by millions of apps, we can spend much more time optimising it.

[2] Which means the next version of the filesystem should be able to switch to a better checksum if the present one is found insecure. An upgrade path should be designed from day one.

[3] Domain-specific compression algorithms, like those for audio and video, still matter. The filesystem should try compressing the first kilobyte or so of each file and, if it doesn’t compress, give up. That way, app developers won’t have to worry about a second compression pass causing expansion or slowing writes down.
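The probe described here is a few lines. A sketch, with zlib standing in for the filesystem’s codec; the 1 KB sample size comes from the footnote, while the 90% threshold is an arbitrary choice for illustration:

```python
import zlib

def worth_compressing(data: bytes, probe=1024, threshold=0.9) -> bool:
    """Compress the first `probe` bytes; if they don't shrink below
    `threshold` of their original size, assume the file is already
    compressed (JPEG, MP4, ...) and store it uncompressed."""
    sample = data[:probe]
    if not sample:
        return False
    return len(zlib.compress(sample)) < threshold * len(sample)
```

Text sails through the probe; already-compressed media fails it immediately, so the filesystem never wastes a full pass on it.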

[4] Forks and extended attributes accomplish the same goal.