How To Ruin A Perfectly Good Container

I am not aiming at a general audience. I assume you have some notion of what security is and how it is provided (to the extent it is), in the context of computers generally and operating systems specifically. Also, I’m only going to talk about Unix-like systems because that’s pretty much the world I am obliged to live in, and certainly the world in which I am trying to solve problems.

Historically, the main approach to security has been ACLs — that is, lists of permissions on objects. In Unix the ACLs apply to files, or things that look like files (since, of course, everything is a file in Unix), and they are very simple: read, write and execute for user, group and “everyone else”.
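
For concreteness, here is a tiny C sketch (the file path is purely illustrative) of just how little those nine bits express: reading them back and flipping them is essentially all the control on offer.

```c
/* Sketch: the entire Unix "ACL" for a file is nine bits (plus a few
 * extras such as setuid).  The path name is illustrative only. */
#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("/home/alice/notes.txt", &st) != 0) {
        perror("stat");
        return 1;
    }
    printf("owner: r=%d w=%d x=%d\n",
           !!(st.st_mode & S_IRUSR), !!(st.st_mode & S_IWUSR), !!(st.st_mode & S_IXUSR));
    printf("group: r=%d w=%d x=%d\n",
           !!(st.st_mode & S_IRGRP), !!(st.st_mode & S_IWGRP), !!(st.st_mode & S_IXGRP));
    printf("other: r=%d w=%d x=%d\n",
           !!(st.st_mode & S_IROTH), !!(st.st_mode & S_IWOTH), !!(st.st_mode & S_IXOTH));

    /* "Access control" amounts to flipping those bits: here, owner
     * read/write, group read, nothing for everyone else. */
    if (chmod("/home/alice/notes.txt", S_IRUSR | S_IWUSR | S_IRGRP) != 0)
        perror("chmod");
    return 0;
}
```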

So what is wrong with this picture? The core problem is that it works for the era it was designed in: the 60s. That meant mainframes shared by various professional users of computers, who mostly wrote their own software, or at least understood pretty well what the software they used did. The goal back then was to protect users’ files from other users (and share them when that was appropriate), and to protect printers and things like that from users trying to use them without going through the appropriate system (which likely did some accounting, for example).

The modern world doesn’t look like this at all. All the files on a typical computer belong to a non-expert user (for simplicity I am ignoring shared devices — this doesn’t really undermine the argument, as I hope you will see). Indeed, the whole computer typically belongs to a single user. Printers do not need accounting and similarly belong to the same user. The enemy is the software that is running on the machine. Users no longer have a good understanding of the software they run. Software is enormously complex and uses all sorts of resources, many distributed over multiple systems, to accomplish its tasks. And frequently those tasks are only superficially in service of the user.

In short, the old threat model was untrusted tenants, trusted software, unit of protection is files and devices. The new threat model is trusted tenants, untrusted software, unit of protection is individual data items.

Yet we are still trying to bend the security systems of the 60s to fit this new world. It doesn’t work very well, because the controls we need to exercise are no longer at the levels of files or devices. They are now really at the level of services and data.

ACLs in Unix have nothing to say about services and data. Even if they did, they would have to be very different kinds of ACLs from the ones we are used to.

So what do we do about this? We make containers. We say “well, there’s just too much going on on the whole computer to reason about what this software can do, so let’s isolate it from the computer and just give it the stuff it needs in a container”.

This is the start of a great plan.

But what happens next? Next we say “oh, but now none of my software works anymore — this container is too contained, libc doesn’t work”. But this is easily fixed! Let’s emulate the existing system. Now everything works again, and we didn’t have to rewrite anything. Genius!

Let me tell you something that will surprise you: this doesn’t solve your problem. Why not? Well, obviously, because now you’re back to the same system that you were trying to fix in the first place. Sure, you’ve arranged for it to definitely not have access to a whole bunch of stuff it shouldn’t have access to, but it turns out libc and friends need an awful lot of stuff you might not think they need. Look at this example for using a chroot jail to confine a user to their home directory, for instance. And in any case, confining access to files doesn’t really help: ultimately pretty much everything wants network access, which in POSIX you either have or don’t have. Or access to more abstract things like address books, photos and so forth (yes, these are files, but you try organising users and groups to give useful controls over multiple applications’ use of them — and, of course, using file-level controls to grant access to a subset of addresses, say, is even more painful). And once it has that stuff, it’s game over and the container did nothing for you.
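
To make the chroot point concrete, here is a minimal C sketch of that kind of confinement (it assumes root privileges, and the paths are purely illustrative). Note how little it does: getting libc, a shell, device nodes, resolver configuration and so on into the jail is all extra work done elsewhere, and every piece you add widens what the jailed code can reach.

```c
/* Minimal chroot confinement sketch (must run as root; paths and the
 * uid/gid are illustrative).  Populating the jail with libc, /dev
 * nodes, resolver config etc. is all done elsewhere, and each piece
 * added widens what the jailed code can touch. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char *jail = "/home/alice";     /* illustrative */

    if (chroot(jail) != 0) { perror("chroot"); return 1; }
    if (chdir("/") != 0)   { perror("chdir");  return 1; }

    /* Drop root *after* the chroot, or the jail is trivially escapable. */
    if (setgid(1000) != 0 || setuid(1000) != 0) {
        perror("drop privileges");
        return 1;
    }

    /* This exec fails unless a shell and its libraries exist inside the jail. */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
}
```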

And this happens at every level of containers. Virtual machines — great idea, let’s confine things by making it look like they have their very own computer. Oh noes! Linux doesn’t run anymore — but we can fix that: simples, just emulate every single piece of hardware in the original computer. What could possibly go wrong with that?

Enclaves — the tightest and bestest containers we know how to make. And what’s everyone’s first step? Yep, emulate POSIX.

WebAssembly — a portable virtual machine that starts with a great premise: no ability to do anything outside the VM by default. But obviously we’ve got to fix that. And so we have WASI. Not quite POSIX but really not far off. OK, it has incorporated some good ideas on how to make POSIX less bad (and since I worked on some of those ideas, I guess I should like it a little), but we have a clean slate. Why continue to beat this dead horse?
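
For flavour, the ordinary C program below compiles with the wasi-sdk and runs under a WASI runtime essentially unchanged (the invocation with wasmtime’s directory pre-opens is an assumption about one particular runtime). That is precisely how close to POSIX we have ended up.

```c
/* An ordinary C program.  Built with the wasi-sdk it runs under a WASI
 * runtime more or less unchanged, which is the point: the sandbox ends
 * up re-exposing a POSIX-shaped world.  (Running it with something like
 * `wasmtime --dir=. prog.wasm` is an assumption about one runtime.) */
#include <stdio.h>

int main(void) {
    FILE *f = fopen("notes.txt", "r");    /* only works inside a pre-opened directory */
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    char buf[256];
    while (fgets(buf, sizeof buf, f) != NULL)
        fputs(buf, stdout);
    fclose(f);
    return 0;
}
```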

THIS HAS GOT TO STOP.

But what should we do instead? As I said, POSIX works for the era it was designed in, when the primary concerns were files and devices and the enemy was the users. In the current era, the primary concerns are around data and the enemy is the software. Given that framing, it seems obvious: APIs and control over them should be at the level of the data they handle. Software should not be given raw access to the files containing databases, but rather access to APIs that understand what the data is and how it is found, consumed and manipulated. “ACLs” should be defined in terms of these APIs, not the low-level objects their providers consume.
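
As a purely hypothetical sketch (every name below is invented for illustration, not an existing API): instead of handing an application open() on whatever file happens to hold the address book, the container would expose something like the following, and controls would be written against these calls.

```c
/* Hypothetical data-level API, invented for illustration: the
 * application never sees the database file, only operations that are
 * meaningful in terms of the data itself. */

typedef struct contact {
    const char *name;
    const char *email;
} contact;

/* Look up contacts matching a (user-visible) query.  The service behind
 * this call decides which entries this application may see at all. */
int contacts_query(const char *query, contact *results, int max_results);

/* Request that one specific contact be shared with one specific
 * service; succeeds only if policy allows it. */
int contacts_share(const contact *c, const char *destination_service);
```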

Another way to think about this is that we need human-centred APIs. That is, APIs at a level that is meaningful to those who need to exercise control over them — namely, mostly, end users. We also need ways to express how those APIs may be used. I would hesitate to call these ACLs, a term which implies a very simplistic kind of control; I think we need policies that govern the use of these APIs, and those have to be quite flexible — they may talk about who can do what, but they might also say what kind of data can be consumed by the API (and hence the service behind it) or what can be done in the further processing of that data by the service.
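
What such a policy language should look like is an open question. Purely as an invented illustration (none of this is an existing format), it might attach rules like these to the hypothetical contacts API above rather than to any file:

```c
/* Invented illustration of a policy over the hypothetical contacts API
 * above: who may call what, on which class of data, and what may happen
 * to it downstream.  None of this is an existing format. */

typedef enum { DATA_NAMES_ONLY, DATA_FULL_CONTACT } data_class;

typedef struct api_policy {
    const char *principal;            /* which application or service      */
    const char *api;                  /* which call it may make            */
    data_class  allowed_data;         /* what kind of data it may receive  */
    int         may_send_off_device;  /* further-processing constraint     */
} api_policy;

static const api_policy example_policy[] = {
    { "mail-client",    "contacts_query", DATA_FULL_CONTACT, 0 },
    { "greeting-cards", "contacts_query", DATA_NAMES_ONLY,   0 },
    { "backup-service", "contacts_share", DATA_FULL_CONTACT, 1 },
};
```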

Containers should offer only these high-level APIs to the contained applications and services — and the principle of least authority should be applied: only the APIs needed to do the job should be available at all.

We already see this happening to some extent on mobile — iOS and Android talk about access to contacts, to photos, to location, not to files or devices. But they’re still pretty silent on services provided over the network — translations, image recognition, map tiles and so forth.

Of course, the downside of this proposal is that instead of a quite thin API (I used to have a copy of POSIX on paper, and “thin” is not really the way to describe it — if you dropped it on your foot, it would probably break it — but at least the subset ACLs apply to is fairly thin) you end up with an API for every kind of data or service, plus an ability to express policies for the APIs.

So how is this going to work? It seems clear that expressing policies in a way that is rigorous enough to make them verifiable and enforceable is going to be too technical to expose end users to directly. This suggests that there is going to be a role for expert review of policies. It also suggests that delegation to experts needs to be a first-class aspect of the system.

Bottom line is: I don’t know how exactly we’re going to make this work, but it’s time we started thinking about it, because it’s the only way we’re going to fix our security problems. Certainly what we should not be doing is continuing to develop software that relies on an unsecurable substrate.

Acknowledgements: thanks to Maritza Johnson, Michael K. Sanders, Tiziano Santoro and Triona Butler for valuable feedback.