Why fix Kubernetes and Systemd?


Looking at Systemd

There is a lifetime of arguments both for and against systemd, and I wouldn't be foolish enough to try to pack all of them into this article. The point I am trying to make here is that, for the most part, systemd is great. That said, a few pain points are worth calling out:

  • Monolithic architecture.
  • High barrier to entry: proprietary tooling and interfaces.
  • It assumes there is a human user in userspace (e.g., exposing D-Bus over SSH/TCP).
  • Bespoke, esoteric mechanics in the toolchain. There is a lot to learn, including bespoke client D-Bus stacks.
  • Its D-Bus IPC mechanisms aren't built for multi-tenant interaction.
  • Some of the assumptions systemd makes about its environment break down inside a container (e.g., pid namespaces and running as pid 1).

Service Duplication

Now, a problem I do feel safe calling out is where Kubernetes started duplicating the functionality of systemd. Kubernetes also approaches scheduling, logging, security controls, rudimentary node management, and process management, only at a cluster level. The outcome is yet another fragmented system where the true control plane is unclear. In my opinion Kubernetes has re-created many of the same semantics as systemd, only at the distributed/cluster level. This is problematic both for operators and for engineers building on top of the system.

Venn diagram of Kubernetes (kubelet) and systemd.

The Kubelet

Kubernetes has a node daemon known as the kubelet that runs on each node in a cluster. The kubelet is the agent that manages a single node within a broader cluster context.

So what schedules the kubelet and keeps it running?

That would be systemd.

So what schedules the node services that the kubelet depends on?

That would be systemd.

Diagram of a simple Kubernetes deployment with Systemd.

Simplicity

The node should be simple. Managing node services should be simple. We should manage node services the same way we manage cluster services.

Sidecars

While there isn't necessarily anything intrinsically wrong with running sidecars themselves, I do wonder if the uptick in sidecar usage is a remnant of an anti-pattern. Do we really need to inject logic alongside an application in order to accomplish some lower-level basics such as authentication, service discovery, and proxying/routing? Or do we just need better controls for managing node services from the control plane?

Sidecars should be Node Controllers

Aurae calls out a simple standard library specifically for each node in a cluster. Each node is reimagined to be autonomous and designed to keep working through a connectivity outage at the edge. Each node gets a database and can be managed independently of the state of the cluster.
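
To make that concrete, here is a minimal sketch of what a node-local datastore could look like. The choice of SQLite via the rusqlite crate, the schema, and the path /var/lib/aurae/node.db are assumptions for illustration only, not necessarily what Aurae will ship.

```rust
// A minimal sketch of a node-local datastore. SQLite via `rusqlite` and the
// path below are illustrative assumptions, not Aurae's actual storage layer.
use rusqlite::Connection;

fn open_node_store() -> rusqlite::Result<Connection> {
    // One small database per node, usable even when the cluster is unreachable.
    let conn = Connection::open("/var/lib/aurae/node.db")?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS services (
            name   TEXT PRIMARY KEY,
            spec   TEXT NOT NULL,
            status TEXT NOT NULL
        )",
        [],
    )?;
    Ok(conn)
}
```

The point is that the node can record and reconcile its own service state even while the cluster is unreachable.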

Diagram showing options of various workloads running with Aurae.

Networking

By simplifying the node mechanics and bringing container runtime and virtual machine management into scope, we can also knock out a few other heavy hitters that have been plaguing the Kubernetes networking ecosystem for some time.

  • IPv6 by default.
  • We can support network devices as the primary interface between a guest and the world.
  • We can support multiple network devices for each guest.
  • We can bring NAT traversal, proxy semantics, and routing into scope for the core runtime.
  • We can bring service discovery to the node level.
  • We can bring authentication and authorization to the socket level between services.

Security

I promised myself I wasn’t going to put the word “Security” in a box and say that was going to be enough. I want to explain how this system will potentially be safer.

More Possibilities

Imagining a “cluster aware” or “API centric” process scheduling mechanism in a cloud environment is exciting.

Pausing/Debugging with ptrace(2)

For example, systemd integrates well with other low levels of the stack, such as ptrace(2). Having a cloud-centric process manager like Aurae means we could explore paradigms such as pausing and stepping through processes at runtime with ptrace(2).
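
As a rough sketch of the primitive involved, the snippet below (assuming the nix crate, 0.23 or newer, and sufficient privileges) attaches to an arbitrary pid, leaving it stopped, and then detaches so it resumes. How Aurae would expose this over its API is an open question; the function and pid here are illustrative.

```rust
// A minimal sketch, assuming the `nix` crate: pause an arbitrary process
// with ptrace(2), then let it resume. Requires appropriate privileges.
use nix::sys::ptrace;
use nix::sys::wait::waitpid;
use nix::unistd::Pid;

fn pause_and_resume(raw_pid: i32) -> nix::Result<()> {
    let pid = Pid::from_raw(raw_pid);

    // PTRACE_ATTACH stops the target; waitpid observes the stop.
    ptrace::attach(pid)?;
    waitpid(pid, None)?;

    // ... inspect memory/registers here, or single-step with ptrace::step ...

    // Detach without delivering a signal; the process continues running.
    ptrace::detach(pid, None)?;
    Ok(())
}
```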

Cleaner Integrations with eBPF

We can explore eBPF features at the host level and at the namespace/virtualization level. All of this could be exposed at the cluster level and managed with native authn and authz policies in the enterprise.

Kernel Hot Swapping with kexec(8)

Even mechanisms like Linux’s kexec(8) and the ability to hot-swap a kernel on a machine could be exposed to a control plane such as Kubernetes.
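
As a hedged sketch of what that could look like from a node agent, the snippet below shells out to kexec(8); the kernel, initrd, and cmdline values are placeholders, and this requires root and kexec-tools on the host.

```rust
// Hypothetical sketch: staging and executing a new kernel with kexec(8).
// Paths and the cmdline are illustrative; requires root and kexec-tools.
use std::process::Command;

fn hot_swap_kernel(kernel: &str, initrd: &str, cmdline: &str) -> std::io::Result<()> {
    // Stage the new kernel image and initrd.
    let staged = Command::new("kexec")
        .arg("-l")
        .arg(kernel)
        .arg(format!("--initrd={initrd}"))
        .arg(format!("--append={cmdline}"))
        .status()?;
    if !staged.success() {
        return Err(std::io::Error::new(std::io::ErrorKind::Other, "kexec -l failed"));
    }
    // Boot directly into the staged kernel, skipping firmware and bootloader.
    Command::new("kexec").arg("-e").status()?;
    Ok(())
}
```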

SSH Tunnels/Services

SSH tunnels are a reliable, safe, and effective way of managing one-off network connections between nodes. The ability to manage these tunnels via a metadata service to enable point-to-point traffic is yet another feature that could be exposed to higher order control plane mechanisms like Kubernetes.
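
As an illustration only, a node agent could manage such a tunnel by shelling out to ssh(1); the forwarding target, port, and remote host below are hypothetical.

```rust
// A hedged sketch: open a point-to-point SSH tunnel by shelling out to ssh(1).
// `remote` is an SSH destination (e.g. "user@node-b") and `target` is a
// "host:port" to forward to; both are illustrative.
use std::process::{Child, Command};

fn open_tunnel(remote: &str, local_port: u16, target: &str) -> std::io::Result<Child> {
    Command::new("ssh")
        .arg("-N") // no remote command, tunnel only
        .arg("-o")
        .arg("ExitOnForwardFailure=yes")
        .arg("-L")
        .arg(format!("{local_port}:{target}"))
        .arg(remote)
        .spawn() // keep the Child handle so the tunnel can be torn down later
}
```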

More than just Kubernetes

Having a set of node-level features exposed over gRPC would potentially enable more than just a Kubernetes control plane.

Written in Rust

So we started coding out the mechanics of these systems, and we decided to write Aurae in Rust. I believe that the node systems will be close enough to the kernel that having a memory-safe language like Rust will make Aurae as extensible as it needs to be to win the Node.

Auraed (The Daemon)

Auraed is the main runtime daemon and gRPC server. The intention is for this to replace pid 1 on modern Linux systems, and ultimately replace systemd once and for all.
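
One concrete consequence of sitting at pid 1 is that auraed inherits init's responsibilities, such as reaping orphaned children so zombies don't accumulate. The sketch below (assuming the nix crate) shows that one responsibility in isolation; it is not auraed's actual implementation.

```rust
// A minimal sketch of one pid-1 duty: reap any exited children that have
// been re-parented to us. Typically called when SIGCHLD is received.
use nix::sys::wait::{waitpid, WaitPidFlag, WaitStatus};
use nix::unistd::Pid;

fn reap_zombies() {
    loop {
        // Pid -1 means "any child"; WNOHANG means don't block if none exited.
        match waitpid(Pid::from_raw(-1), Some(WaitPidFlag::WNOHANG)) {
            Ok(WaitStatus::Exited(pid, code)) => {
                eprintln!("reaped child {pid} (exit code {code})");
            }
            Ok(WaitStatus::StillAlive) | Err(_) => break,
            Ok(_) => continue, // signaled, stopped, etc.
        }
    }
}
```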

Aurae (The Library)

Aurae is a Turing-complete scripting language, resembling TypeScript, that executes against the daemon. The interpreter is written in Rust and leverages the gRPC Rust client generated from the shared protobuf spec. We plan on building an LSP, syntax highlighting, and more as the project grows.

Client Libraries (gRPC)

The client libraries are auto-generated and will be supported as the project grows. For now, the only client-specific logic, such as the convention for where TLS material is stored, lives in the Aurae repository itself.
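
As a sketch of what a client might do under such a convention, the snippet below builds a mutually authenticated gRPC channel with tonic. The file paths and endpoint address are assumptions for illustration; the real convention lives in the Aurae repository.

```rust
// A hedged sketch of client-side mTLS with tonic. Paths and the endpoint
// are illustrative only.
use tonic::transport::{Certificate, Channel, ClientTlsConfig, Identity};

async fn connect() -> Result<Channel, Box<dyn std::error::Error>> {
    let ca = std::fs::read("/etc/aurae/ca.crt")?;
    let cert = std::fs::read("/etc/aurae/client.crt")?;
    let key = std::fs::read("/etc/aurae/client.key")?;

    // Verify the server against our CA and present a client identity.
    let tls = ClientTlsConfig::new()
        .ca_certificate(Certificate::from_pem(ca))
        .identity(Identity::from_pem(cert, key));

    let channel = Channel::from_static("https://localhost:8080")
        .tls_config(tls)?
        .connect()
        .await?;
    Ok(channel)
}
```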

Kubernetes Shim

At some point we will need to build a kubelet/Kubernetes shim, which will be the first step in bringing Aurae to a Kubernetes cluster. We will likely follow the work in the virtual kubelet project. Eventually all of the functionality that Aurae encapsulates will be exposed over the gRPC API such that either the Kubernetes control plane or a simplified control plane can sit on top.

Aurae Scope

Aurae is liable to turn into yet another monolith with esoteric controls, just like systemd. It is also liable to turn into another junk drawer, like Linux system calls and Linux capabilities. I want to approach the scope cautiously.

  • Authentication and authorization using SPIFFE/SPIRE identity mechanisms down to the Aurae socket.
  • Certificate management.
  • Virtual machines (with metadata APIs) leveraging Firecracker/QEMU.
  • Lightweight container runtime (simplified cgroup and namespace execution similar to runc, podman, or the Firecracker jailer; see the sketch after this list).
  • Host-level execution using systemd-style security controls (seccomp filters, system call filtering).
  • Network device management.
  • Block device management.
  • stdout/stderr bus management (pipes) (logging).
  • Node filesystem management (configuration files).
  • Secrets management.
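
To ground the container runtime item above, here is a minimal sketch of the raw primitives a simplified runtime builds on: placing the current process in a fresh cgroup (v2) and unsharing a few namespaces. The cgroup path is illustrative, this requires root, and it is nowhere near a complete runtime.

```rust
// A minimal sketch of container-runtime building blocks: a cgroup v2 child
// group plus a handful of unshared namespaces. Illustrative only.
use nix::sched::{unshare, CloneFlags};
use std::fs;

fn isolate() -> Result<(), Box<dyn std::error::Error>> {
    // Create a child cgroup and move ourselves into it (path is illustrative).
    let cgroup = "/sys/fs/cgroup/aurae-demo";
    fs::create_dir_all(cgroup)?;
    fs::write(format!("{cgroup}/cgroup.procs"), std::process::id().to_string())?;

    // New mount, UTS, and PID namespaces. Note: a new pid namespace only
    // applies to children forked after this call, not to this process itself.
    unshare(CloneFlags::CLONE_NEWNS | CloneFlags::CLONE_NEWUTS | CloneFlags::CLONE_NEWPID)?;
    Ok(())
}
```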

Getting Involved

The project is small; however, it is free and open. We haven't established formal project governance yet, but I am sure we will get there in time, especially as folks show interest in the work. For now, the best way to get involved is to follow the project on GitHub and read the community docs.

References

Prior art and inspiration for my work include the 9p protocol from plan9, as well as my previous work on COSI as a cloud alternative to POSIX. Perhaps the most influential inspiration has been roughly a decade of my life spent managing systemd and the kubelet at scale.
