Running WebAssembly on the Kernel

This is the story of our journey running Wasmer on the Linux kernel

Heyang Zhou
May 16, 2019 · 5 min read

We have been obsessed with making the Wasmer WebAssembly runtime faster: first by minimizing compilation time with caching, then by adding multiple compiler tiers to the runtime.

As time progressed, we started asking ourselves… What is the fundamental cause of VM-based programs being slower than native ones? Is there any way we can solve it for specific use cases?

In this article we will go over what we did to make Wasmer (optionally) run in the kernel… achieving a speedup of over 10% for a WebAssembly TCP echo server compared to native code! 🎉

Background

“The Second OS”

Many languages and runtimes, including WebAssembly (WASI implementations) and JavaScript (Node.js and browsers), have been trying to build another sandboxed “OS” on top of the real operating system. The second layer, however, incurs a significant overhead in performance.

VM running in Ring 3

As shown above, in the traditional architecture, OS service requests (system calls) from a VM-based program have to go through two boundaries, before reaching the kernel.

Neither of those two boundaries is cheap to cross. While a normal function call takes less than 5 nanoseconds, a system call originating from a program in the VM can cost hundreds of nanoseconds.
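The gap is easy to observe from userspace. Below is a rough microbenchmark sketch (ours, not from the Wasmer codebase) that compares a trivial function call against a `write(2)` system call to `/dev/null`; the absolute numbers depend on your machine and kernel mitigations, but the order-of-magnitude difference is the point.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::time::Instant;

#[inline(never)]
fn plain_call(x: u64) -> u64 {
    // A cheap pure computation standing in for "a normal function call".
    x.wrapping_mul(2654435761).wrapping_add(1)
}

/// Returns rough averages: (ns per function call, ns per syscall).
fn bench() -> std::io::Result<(u128, u128)> {
    const N: u64 = 200_000;

    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..N {
        acc = plain_call(acc ^ i);
    }
    // Keep `acc` observable so the loop is not optimized away.
    std::hint::black_box(acc);
    let fn_ns = start.elapsed().as_nanos() / u128::from(N);

    // Rust `File` writes are unbuffered: each write_all is one write(2) syscall.
    let mut devnull = OpenOptions::new().write(true).open("/dev/null")?;
    let start = Instant::now();
    for _ in 0..N {
        devnull.write_all(&[0u8])?;
    }
    let sys_ns = start.elapsed().as_nanos() / u128::from(N);

    Ok((fn_ns, sys_ns))
}

fn main() -> std::io::Result<()> {
    let (fn_ns, sys_ns) = bench()?;
    println!("function call: ~{} ns, syscall: ~{} ns", fn_ns, sys_ns);
    Ok(())
}
```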

The Successor to Cervus

I wrote Cervus — another WebAssembly “usermode” subsystem running in the Linux kernel — about a year ago. Back then WASI didn’t exist and neither did any “production-ready” non-Web WebAssembly runtimes, but the Cervus project proved that the idea was possible and had great potential.

Now, the WASM ecosystem is growing and the Wasmer runtime is in a very good place, so it’s time to build a complete in-kernel WASM runtime for real applications.

Why run WebAssembly in the kernel?

Mainly for performance and flexibility.

Since WASM is a virtual ISA protected by a Virtual Machine, we do not need to rely on external hardware and software checks to ensure safety.

Running WASM in the kernel avoids most of the overhead introduced by those checks, e.g. system call overhead (context switches) and copy_{from,to}_user, thereby improving performance.

VM running in Ring 0

Also, having low-level control means that we can implement a lot of features that were heavy or impossible in userspace, e.g. virtual memory tricks, direct hardware access, and handling of intensive kernel events (like network packet filtering).

Security

Running user code in kernel mode is always a dangerous thing.

Although we use many techniques to protect against different kinds of malicious code and attacks, we advise running only trusted binaries through this module in the short term, until we have fully reviewed the runtime’s codebase for security.

Here are some known security risks and what we did to fix them:

  • Stack overflow: emit bounds-checking code from the codegen backend
  • Out-of-bounds memory access: allocate a 6GB virtual address space for each WASM task so that out-of-bounds loads/stores cannot even be represented
  • Lack of signal-based forceful termination: set the NX bit on WASM code pages when a fatal signal arrives
  • Floating-point register state not preserved: explicitly save FP state on preemption with kernel_fpu_{begin,end} and a preempt_notifier
  • Red Zone not supported in kernel: avoid using the Red Zone in the codegen backend
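The arithmetic behind the 6GB reservation can be sketched as follows. A wasm32 pointer is a 32-bit index added to the linear-memory base, plus a constant offset from the instruction; assuming that constant offset is capped below 2 GiB (our assumption — the article does not state the cap), every representable effective address lands inside the reserved region, so page protection catches bad accesses without per-access bounds checks.

```rust
// Hedged sketch of why a 6 GiB reservation covers every representable
// wasm32 access. The 2 GiB offset cap is our assumption for illustration.
const GIB: u64 = 1 << 30;
const MAX_INDEX: u64 = (1u64 << 32) - 1; // wasm32 indices are u32
const MAX_STATIC_OFFSET: u64 = 2 * GIB - 1; // assumed cap on the constant offset

/// Worst-case byte offset from the linear-memory base that any
/// load/store can address.
fn max_effective_address() -> u64 {
    MAX_INDEX + MAX_STATIC_OFFSET
}

fn main() {
    let reservation = 6 * GIB;
    // Every representable access falls inside the reserved region.
    assert!(max_effective_address() < reservation);
    println!(
        "worst-case access at base+{} bytes fits in a {}-byte reservation",
        max_effective_address(),
        reservation
    );
}
```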

Examples and benchmark

We have created two examples: echo-server and http-server (found in the examples directory of the Wasmer main repo).

When executed with the singlepass backend (unoptimized direct x86-64 code generation) and benchmarked locally using tcpkali/wrk, echo-server is ~10% faster (25210 Mbps vs. 22820 Mbps) than its native equivalent in userspace, and http-server is ~6% faster (53293 req/s vs. 50083 req/s).
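The quoted percentages follow directly from the raw numbers, as this small check shows:

```rust
// Recompute the reported speedups from the raw benchmark figures.
fn speedup_percent(kernel: f64, native: f64) -> f64 {
    (kernel / native - 1.0) * 100.0
}

fn main() {
    let echo = speedup_percent(25210.0, 22820.0); // Mbps, tcpkali
    let http = speedup_percent(53293.0, 50083.0); // req/s, wrk
    println!("echo-server: +{:.1}%, http-server: +{:.1}%", echo, http);
}
```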

Even higher performance is expected once the two optimizing Wasmer backends (Cranelift and LLVM) are updated to support generating code for the kernel.

Those two examples use WASI (for file abstraction and printing to console) and the asynchronous networking extension (via the kernel-net crate).
Take a look at them to learn how to do high-performance networking in kernel-wasm.
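For orientation, this is roughly what the native userspace baseline that echo-server is benchmarked against looks like — a minimal blocking echo server written against std::net. This is our illustrative sketch, not the repo's code; the in-kernel examples instead use WASI plus the asynchronous kernel-net API, which we don't reproduce here.

```rust
// A minimal blocking TCP echo server: one thread per connection,
// echoing bytes back until the peer closes the stream.
use std::io::{Read, Write};
use std::net::TcpListener;
use std::thread;

fn serve(listener: TcpListener) -> std::io::Result<()> {
    for stream in listener.incoming() {
        let mut stream = stream?;
        thread::spawn(move || {
            let mut buf = [0u8; 4096];
            // Echo until EOF (read returns 0) or an I/O error.
            while let Ok(n) = stream.read(&mut buf) {
                if n == 0 || stream.write_all(&buf[..n]).is_err() {
                    break;
                }
            }
        });
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9000")?;
    serve(listener)
}
```

The one-thread-per-connection model is the simplest possible baseline; the benchmark numbers above are not sensitive to this choice for a plain echo workload.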

How to run it

Before running Wasmer on the kernel, ensure that:

  • Your system is running Linux kernel 4.15 or higher.
  • Your kernel has preemption enabled. Attempting to run WASM user code without kernel preemption will freeze your system.
  • Kernel headers are installed and the building environment is properly set up.

First, clone the repo:

git clone https://github.com/wasmerio/kernel-wasm.git

Then run make in the root directory and, optionally, in the networking and wasi subdirectories:

make
cd networking && make
cd ../wasi && make
cd ..

Load the modules into the kernel:

sudo insmod kernel-wasm.ko
sudo insmod wasi/kwasm-wasi.ko
sudo insmod networking/kwasm-networking.ko

Make sure you are running the latest version of Wasmer (0.4.2):

wasmer self-update

Then run Wasmer with the kernel loader and the singlepass backend:

sudo wasmer run --backend singlepass --loader kernel the_file.wasm
Cowsay running on the Kernel!

Hope you enjoyed reading this article!

While running WebAssembly in the kernel is certainly dangerous and not the recommended approach for most use cases, it allowed us to experiment and learn first-hand.

If you love WebAssembly as much as we do, please contact syrus@wasmer.io… your next job can be just an email away!

Wasmer

Universal WebAssembly runtime