In defense of unikernels
It is no secret that I’ve been hanging around the unikernels street gang for quite some time, dabbling in things like OSv and Rump kernels, giving talks at conferences and helping evangelize this piece of technology as a very useful building block for cloud-native applications and distributed services. Or, as a good friend of mine, Bryan Cantrill, would have put it: I’ve proven to be an incurable case of unikernel chain smoker. In fact, Bryan was one of the first notable system designers to come up with a very thoughtful analysis of the tradeoffs that the unikernel approach needs to grapple with. I have to say that his analysis is superb in identifying potential downsides to using unikernels for deploying arbitrary applications. At the same time, if all we have to care about is next-generation cloud-native (12-factor) applications managed by a modern PaaS system, the majority of his objections fade away. In fact, this is the biggest use case that I see for unikernels: a hugely valuable optimization technique for a PaaS layer, allowing it to create much more effective units of deployment (or droplets in Cloud Foundry speak). In other words, if you think unikernels are meant to be run on bare metal as a replacement for a single-node OS, then you clearly deserve a “Cantrill’s Therapy”. That much we can all agree on.
Now, since that first exchange with Bryan, I’ve come to realize that this change of perspective, from a single-node operating system to a massively distributed PaaS as the fundamental application execution environment that needs to be reasoned about as a whole, has to be postulated upfront for the rest of the discussion to make sense. So here it is. Postulated!
Another minor point is that outside of memory-managed code execution runtimes (JVM, Go, etc.) the usefulness of unikernels starts to decline rapidly. And while this does not, strictly speaking, change the rest of my argument, I just wanted it on the record (since a managed runtime does provide an additional bit of guarantee, even if only in a belt-and-suspenders kind of sense).
Randy is not shy about disclosing his biggest beef right from the get-go:
Here’s the punch line: With unikernels, your application is running in the same space as the kernel (ring 0). Any buffer overflow could mean giving remote access to the system, and that’s probably a bad day for everyone.
Of course, nobody is disagreeing that a remote exploit of any piece of application functionality hardly qualifies as an awesome day. That said, the right question to ask when comparing unikernel-based application deployment with a traditional one is whether that day gets any worse with unikernels.
So here’s the punchline: it doesn’t. It really doesn’t get any worse, in the sense that the security implications you have to think about remain the same with or without unikernels in the equation.
Before I can convince you that unikernels don’t make things worse, let’s recap some of the constraints that they, and our assumed application architecture, guarantee us:
- Applications run on the PaaS as a series of networked micro-services orchestrated by the platform. The PaaS “statically” links user-level code with the unikernel (and other libraries) into an immutable image (droplet) that ends up being the unit of execution. All the plumbing (network I/O, log and metrics collection, etc.) is completely outside of either user-level code or the unikernel.
- Fundamentally, unikernels do not multiplex between different, isolated processes running with different privileges. The very notion of a ‘local user’ is meaningless in unikernel land. Whatever virtualized hardware the hypervisor (of any kind) exposes to the unikernel can be accessed without any restrictions by any thread running within a unikernel-based image.
- There’s no persistent mutable state available inside of a unikernel image. An application may see something that looks like a filesystem or a block device, but those will be either immutable or transient. In other words, nobody in their right mind will run an NFS client inside of a unikernel image.
- The way an application deals with state in such an environment is exclusively via network I/O.
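The constraints above can be sketched as a toy model (the class and names below are purely illustrative, not any real unikernel or PaaS API): the droplet is an immutable unit of execution, any filesystem it exposes is read-only, and attempts at persistent local mutation simply have nowhere to go.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Droplet:
    """Toy model of a unikernel droplet: user code "statically" linked
    with the unikernel into an immutable unit of execution."""
    user_code: bytes
    unikernel: bytes
    # Content baked in at build time: looks like a filesystem to the
    # app, but it is immutable (a tuple of (path, bytes) pairs).
    image_fs: tuple = ()

    def read(self, path: str):
        # Reading from the baked-in, immutable "filesystem" is fine...
        return dict(self.image_fs).get(path)

    def write(self, path: str, data: bytes):
        # ...but there is no persistent mutable local state; real state
        # has to flow out over network I/O instead.
        raise PermissionError("no mutable local state; use network I/O")

d = Droplet(user_code=b"app", unikernel=b"osv",
            image_fs=(("/etc/config", b"settings"),))
assert d.read("/etc/config") == b"settings"
```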
So with that in mind, imagine yourself as an attacker who has just successfully perpetrated a remote exploit. What does it really get you? Since there are no other “processes” with higher privileges, you can’t really hijack anything but the microservice itself. Since there’s no local state of any significance, you can’t really snoop on any data but the data that is specific to the microservice itself. And, of course, since there’s no local persistence of any kind, you can’t even proliferate beyond the single instance of the microservice that you’ve gotten access to. There’s no denying that you do get access to the full memory image. That gives you access to any secrets that this particular instance of the microservice had access to. However, this is no different from what happens to a microservice deployed in a traditional way: once you’re in, that memory image is yours (even though other processes running on the system are still protected by the OS). In other words, even though you may see yourself as having full, uncontrolled access to the hardware (ring 0, remember?), you’re severely sandboxed and can’t do much damage. The situation is no better or worse compared to a traditional process-based microservice deployment.
Literally the only extra capability that exploiting a unikernel-based microservice may give you over exploiting a microservice deployed as a process is the ability to send and receive arbitrary traffic. Is it a big deal? Is Randy right after all? I actually don’t think so, and here’s why:
Let’s tackle send first. It is true that you can attempt to construct arbitrary packets, possibly as low as Layer 2 of the OSI model. You can try to use this ability either to flood the network or as a jump-off point for attempting further remote exploits in the system. In other words, you can construct either garbage traffic or traffic that looks legitimate (given the role that the exploited microservice is expected to play in the overall system) but is poisonous. And while both of these are legitimate concerns, they are not new concerns in a multi-tenant distributed environment with potentially adversarial tenants. The former concern is why things like Amazon Security Groups exist, and why they are enforced outside of the firewall built into the kernel running inside guest VMs. Unikernels or not, you have to have that type of distributed firewall, and once you do, unikernels bring no additional danger. The traffic that looks legitimate but is poisonous to the application has to, of course, be dealt with at the application level itself. But once again, this is no different from a process-based (as opposed to unikernel-based) microservice being hijacked. It is a concern that distributed systems engineers have to solve for (and they try to!).
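That kind of topology-driven, distributed firewall can be sketched as a simple allow-list check enforced outside the guest. This is a toy model under my own assumptions; the service names and rule shape are made up and are not Amazon’s actual Security Group semantics:

```python
# Toy distributed firewall: the PaaS knows the application topology and
# only permits flows that the topology declares. Crucially, enforcement
# happens outside the guest, so a hijacked unikernel (ring 0 and all)
# cannot change or bypass these rules.
ALLOWED_FLOWS = {
    # (source service, destination service, destination port)
    ("web", "api", 8080),
    ("api", "db", 5432),
}

def permit(src_service: str, dst_service: str, dst_port: int) -> bool:
    """Drop anything the application topology does not explicitly allow."""
    return (src_service, dst_service, dst_port) in ALLOWED_FLOWS

# A compromised "api" instance can still reach its declared backend...
assert permit("api", "db", 5432)
# ...but its arbitrary crafted packets to anything else are dropped.
assert not permit("api", "web", 22)
```

The point of the sketch is the placement of the check, not its sophistication: because the allow-list lives in the platform rather than in the guest kernel, the attacker’s ring-0 access buys them nothing against it.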
Receive is actually a much simpler case. Because unikernels don’t need to multiplex between different processes listening on the same interfaces, whatever traffic reaches our virtualized NIC is guaranteed to have been sent to our microservice. That guarantee of routing is supported by the PaaS’s knowledge of the application topology and enforced by a virtual firewall that, as we agreed, has to be there no matter what.
So there you have it! The day doesn’t really get any worse than the day on which an exploit of a traditionally deployed microservice happens. Provided, of course, that we take a distributed, system-level infosec view and architect our overall platform accordingly.
And as a parting thought: I agree with Randy and Bryan that unikernels are not a panacea. Nothing is. But they are a very useful building block that doesn’t need any additional FUD. If you really want to fight something that is way overhyped, you know where to find Linux containers.
P.S. The reverse applies to us, unikernel fanboys. Randy and Bryan are right: we really have to dial down the rhetoric and the claims about what a traditional OS kernel cannot do. If a few presentations on unikernels I’ve seen lately made me (a fanboy!) cringe, we really need to pay attention to how we construct our ‘pro’ argument.