How to Throw a Tantrum in One Blog Post
This post has been updated to reflect feedback from the NetworkManager team that it has maintained a stable ABI/API for years now. I’ve also opened up comments to non-friends. And now I’m moving this to Medium from Facebook.
Another couple updates: Gentoo is a major distribution, and the graduation date is about experience, not age.
The systemd team has recently patched a local denial of service vulnerability affecting the notification socket, which daemons use to report their lifecycle and health information. Some people have used this as an opportunity to throw a fresh tantrum about systemd.
What makes it a tantrum? It’s a tantrum when you use a minor security issue as justification to rant about everything remotely related to systemd and insist on radical changes (throwing out systemd) to address what are mostly fixable quibbles — at least the quibbles that were based on facts or good judgment in the first place.
Not only is the current security issue among the lowest risk classes by being local-only and denial-of-service (versus information disclosure or privilege escalation), but most of Ayer’s claims are either wrong or misleading. Let’s dissect this post, starting with the sensational headline. Then, we’ll look at how Ayer is missing the point.
How to Crash Systemd in One Tweet
Twitter is a network service, implying a remote vulnerability of some sort. The only substance to “in one tweet” is that the attack is under 140 characters in most forms. It’s otherwise just clickbait.
After running this command, PID 1 is hung in the pause system call. You can no longer start and stop daemons. inetd-style services no longer accept connections. You cannot cleanly reboot the system. The system feels generally unstable (e.g. ssh and su hang for 30 seconds since systemd is now integrated with the login system). All of this can be caused by a command that's short enough to fit in a Tweet.
Edit (2016-09-28 21:34): Some people can only reproduce if they wrap the command in a while true loop. Yay non-determinism!
The bug is remarkably banal. The above systemd-notify command sends a zero-length message to the world-accessible UNIX domain socket located at /run/systemd/notify. PID 1 receives the message and fails an assertion that the message length is greater than zero. Despite the banality, the bug is serious, as it allows any local user to trivially perform a denial-of-service attack against a critical system component.
Most of these claims are accurate and substantive, but “feels generally unstable” is, of course, subjective. There are some services that attempt to use systemd but will time out in 30 seconds (by default) if it is unavailable. These facilities are degrading gracefully, which is exactly what should happen.
The immediate question raised by this bug is what kind of quality assurance process would allow such a simple bug to exist for over two years (it was introduced in systemd 209). Isn't the empty string an obvious test case? One would hope that PID 1, the most important userspace process, would have better quality assurance than this.
The systemd project applies both unit testing and static/dynamic analysis to systemd. We’ve done this for years; I ran the first Coverity scans myself. Testing inputs of empty strings, excessively large data structures, and other invalid permutations is the realm of fuzz testing, which is a recent project even for the Linux kernel. Despite Linux being used for critical systems for decades, fuzz testing only began as side-projects “in beta” in 2007 and more earnestly in 2013. It’s clearly a valuable technique, but implying that comprehensive testing of invalid inputs is “obvious” is misleading about the state of major projects.
Unfortunately, it seems that crashes of PID 1 are not unusual, as a quick glance through the systemd commit log reveals commit messages such as:
coredump: turn off coredump collection only when PID 1 crashes, not when journald crashes
coredump: make sure to handle crashes of PID 1 and journald special
coredump: turn off coredump collection entirely after journald or PID 1 crashed
Engineering crumple zones on cars isn’t evidence that they handle poorly. Likewise, improving coredump handling for systemd isn’t evidence that it crashes frequently. Considering that systemd runs by default on every major distribution (with the exceptions of Slackware and Gentoo, the latter of which still makes systemd an option), even hardware failures will cause some quantity of coredumps. So, even if we’re perfect, we still have to handle them properly.
My company runs hundreds of servers with systemd, and coredumps of it haven’t been an issue despite years of heavy production use. Nor are Fedora’s crash-reporting systems frequent recipients of data for systemd PID 1. Nor is the Red Hat support team here at the systemd conference concerned about widespread crashes as the primary escalation from RHEL users. Nor is PID 1 stability what Facebook’s infrastructure team raised in their presentation yesterday about running systemd at scale.
Did you know Cisco is running systemd on all of their XE devices now? The issue they highlighted yesterday at this week’s conference relates to propagation of service restarts when a particular service fails. That’s where their mind is at, not PID 1 crashing.
The issues of heavy users tend to be more subtle: scaling well for many units, adding options to support new use cases, and supporting good integrations with other utilities for logging and orchestration. I presented on my company’s challenges in these areas at last year’s conference.
It always amazes me how little overlap there is between posts like Ayer’s and the issues real users actually experience.
Systemd's problems run far deeper than this one bug. Systemd is defective by design. Writing bug-free software is extremely difficult. Even good programmers would inevitably introduce bugs into a project of the scale and complexity of systemd. However, good programmers recognize the difficulty of writing bug-free software and understand the importance of designing software in a way that minimizes the likelihood of bugs or at least reduces their impact. The systemd developers understand none of this, opting to cram an enormous amount of unnecessary complexity into PID 1, which runs as root and is written in a memory-unsafe language.
These accusations are true for every major production kernel (Windows, Linux, and BSD) and every alternative to systemd (in the sense that they’re almost all written in C and run many of their operations as root). Whether systemd has an “enormous amount” in PID 1 is, of course, subjective, but Ayer spends the rest of the post muddling various scopes:
- Things actually in PID 1. I think the denial-of-service vulnerability and the umask criticism are the only things actually in PID 1.
- Things that interact regularly with PID 1 over a bus or socket but are isolated by process and privileges. This includes a handful of things, like logind and the journal.
- Things that literally only share the git repository with the rest of systemd. That is, they don’t run in PID 1 nor are they even expected to be installed, available, or enabled by the core parts of systemd. Most of Ayer’s criticisms are about components in this category.
Some degree of complexity is to be expected, as systemd provides a number of useful and compelling features (although they did not invent them; they were just the first to aggressively market them).
The systemd project has always been open that the service-management design is based off of Apple’s launchd and Solaris.
Whether or not systemd has made the right trade-off between features and complexity is a matter of debate. What is not debatable is that systemd's complexity does not belong in PID 1. As Rich Felker explained, the only job of PID 1 is to execute the real init system and reap zombies.
Felker’s post is riddled with misleading suggestions and outright errors, like saying systemd either requires “reboot to upgrade” for itself (which isn’t the case because of daemon-reexec) or creates such a requirement for other parts of the system (which is different from systemd supporting restart-for-update features at the request of distributions). I’m not the first to point out Felker’s errors, but he still hasn’t corrected them. Ayer’s unqualified reference to Felker’s discredited post doesn’t make me trust his judgment on this topic much.
Furthermore, the real init system, even when running as a non-PID 1 process, should be structured in a modular way such that a failure in one of the riskier components does not bring down the more critical components. For instance, a failure in the daemon management code should not prevent the system from being cleanly rebooted.
Daemon management (via orderly shutdown of each one) is the main thing that differentiates a clean from a dirty reboot. It’s inaccurate to suggest a clean reboot is possible if daemon management has failed.
In particular, any code that accepts messages from untrustworthy sources like systemd-notify should run in a dedicated process as an unprivileged user. The unprivileged process parses and validates messages before passing them along to the privileged process.
This is a reasonable request, and I’ve been pushing for more modularity around parsing for a while.
At the same time, Ayer doesn’t acknowledge the severe issues around the ambient authority required for an isolated service that can mark any daemon as failed. In practice, such authority creates another vector for a denial-of-service on PID 1, just like the issue we’re talking about today. Linux lacks a proper capability-based security framework to mitigate those vulnerabilities, and it’s not systemd’s place to invent one. (Even then, some systemd folks are working on one outside systemd.)
Let’s say systemd took an alternate approach by receiving the message itself but invoking another process to parse it. Because the issue was an assertion that the message was more than zero length, it’s still likely such an assertion would have occurred before handing over the string for further parsing. And then we’d still be here today with Ayer complaining about PID 1. (I’m not saying it wouldn’t reduce the attack surface overall, just that we’d probably still have this particular vulnerability.)
This is called privilege separation and has been a best practice in security-aware software for over a decade. Systemd, by contrast, does text parsing on messages from untrusted sources, in C, running as root in PID 1.
It’s a stretch to use the label “parsing” for what is mostly a string comparison against a fixed number of possibilities. At most, PID 1 converts the part of the string after the equals sign to a number. We’re not exactly talking YAML or XML here. Passing more advanced data structures to systemd (that it wouldn’t have to “parse”) comes with its own vulnerabilities, as we’ve seen for some kernel syscalls.
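To make the scale of that “parsing” concrete, here is a minimal sketch in C of handling one notification datagram as newline-separated KEY=VALUE pairs. It is not systemd’s actual code: the function name handle_notify, the struct notify_state, and the choice to return -EINVAL for an empty message are assumptions made for illustration. READY=1 and MAINPID= are real keys from the sd_notify protocol; everything else is simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of what one datagram told us. */
struct notify_state {
    int ready;       /* set to 1 when "READY=1" is seen */
    long main_pid;   /* set from "MAINPID=<n>", left alone otherwise */
};

/* Handle one datagram of len bytes (no terminating NUL required).
 * Returns 0 on success, -EINVAL for empty or malformed input. */
int handle_notify(const char *buf, size_t len, struct notify_state *st)
{
    assert(st != NULL);            /* caller bug, not sender-controlled */
    if (buf == NULL || len == 0)
        return -EINVAL;            /* untrusted input: fail softly */

    const char *p = buf;
    const char *end = buf + len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        size_t n = nl ? (size_t)(nl - p) : (size_t)(end - p);

        if (n == 7 && memcmp(p, "READY=1", 7) == 0) {
            st->ready = 1;
        } else if (n > 8 && memcmp(p, "MAINPID=", 8) == 0) {
            /* Copy the value out so strtol sees a NUL-terminated string. */
            char num[32];
            size_t digits = n - 8;
            if (digits >= sizeof num)
                return -EINVAL;
            memcpy(num, p + 8, digits);
            num[digits] = '\0';

            char *stop;
            errno = 0;
            long v = strtol(num, &stop, 10);
            if (errno != 0 || *stop != '\0' || v <= 0)
                return -EINVAL;
            st->main_pid = v;
        }
        /* Unknown keys are ignored in this sketch. */
        p = nl ? nl + 1 : end;
    }
    return 0;
}
```

The only assertion here guards a caller mistake (a null output pointer). An empty or malformed datagram, which the sender controls, produces an error return instead of aborting the process.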
If you think systemd doesn't need privilege separation because it only parses messages from local users, keep in mind that in the Internet era, local attacks tend to acquire remote vectors.
No one in systemd is claiming that local attack vectors aren’t important, but they’re usually less severe.
Consider Shellshock, or the presentation at this year's systemd conference which is titled "Talking to systemd from a Web Browser."
That presentation is from the Cockpit project, which isn’t part of systemd and doesn’t even share its main developers. It’s as much “remote access to systemd” as using SSH and then invoking systemctl.
Systemd's "we don't make mistakes" attitude towards security can be seen in other places, such as this code from the main() function of PID 1:
/* Disable the umask logic */
if (getpid() == 1)
This is a fair point, but it also isn’t a fundamental design element worthy of calls to abandon systemd. At most, it would justify a call to fork systemd and reverse the umask default.
There’s also a complex juggling act between Linux’s process-centric umask model and the threading that systemd uses; it’s not as simple as using a 777 umask and making explicit changes everywhere a file gets opened. But, I don’t think Ayer’s ever implemented an init daemon or another major project with low-level, production Linux code, so he’s probably not familiar with the issues there.
Go and Rust are compelling, safe languages for writing the type of systems software that has traditionally been written in C.
Go is not a compelling language for writing something at the PID 1 level because systemd has to support embedded devices and early boot. Rust has serious questions around its long-term sustainability, given how Mozilla both presides over it yet barely uses it. Higher-level orchestration systems like Kubernetes and CoreOS do use Go effectively, but they don’t attempt to run on small, embedded hardware.
Ayer graduated in 2012, and his largest non-academic project is SSLMate. It often takes a bit of experience to understand why you might use something like C over a language like Rust. And, having never worked on init, PID 1, or anything similar in even those few years since leaving academia, I don’t think he’s yet in a position to say what the most compelling language and framework choices are for low-level system programming.
Update: Many people think this is an attack on the correctness of Ayer’s claims, which would be an ad hominem attack; that is not the intent of the preceding paragraph. My point is that safety isn’t the only reason you pick a language, and Ayer should not be saying something akin to “you need to pick a safe language, end of discussion.” If he were more experienced, especially in init systems, I could possibly buy that he’s considered the necessary pros and cons, and I’d find his behavior less egregious. Again, this is not about his claims’ correctness but about his premature and unjustified shutdown of any alternative views. He should be more open to his opponents having an argument he hadn’t considered. Inexperience doesn’t make someone wrong, but it does mean they should show openness to being wrong, which he’s shut down.
He hasn’t even demonstrated sensible choices for his own projects; his website’s CMS is a custom one written in C++. So, he’s not using “safe languages” for even his public network services. It’s not like he’s trying to have other people run his CMS, so it’s apples and oranges, but it does seem a little hypocritical.
It’s also strange for Ayer to suggest, with his inexperience, some very modern languages systemd should use while referencing Felker’s post, which calls us to look to experience for tried-and-true designs — and provides its own example in C.
Ayer’s work shows lots of promise — he’s clearly a talented security and software engineer — but there are a lot of realities in picking languages that don’t come down to being academically the best.
Systemd is far more than an init system: it is becoming a secondary operating system kernel,
Calling it a “second kernel” implies one process space, at least in the monolithic world of Linux…
providing a log server, a device manager, a container manager, a login manager, a DHCP client, a DNS resolver, and an NTP client.
…but all of these (with the exception of the device manager, udev) run as less-privileged, separate processes when they’re enabled — and most aren’t enabled by default. What they do share is a git repository with systemd’s PID 1 implementation. Is this guilt by git association? BSD manages its kernel, init, and other core utilities all as one project, but that doesn’t say anything about security.
These services are largely interdependent and provide non-standard interfaces for other applications to use.
Seemingly aware that most of what he’s mentioned above is actually isolated (both through processes and privileges), Ayer goes for the weasel word “interdependent” and the term “non-standard.” But, in all of the mentioned cases, systemd:
- Also supports the standard interface if there is one. In the case of the DNS resolver, for example, the normal blocking API will still go to systemd’s resolver (assuming someone turns it on). It’s just that almost everyone agrees that blocking DNS lookups are awful.
- Provides a “non-standard” interface because no standard exists for, say, asynchronous DNS resolution. Ayer is misleading here because the term “non-standard” suggests there was a standard to use in the first place.
Through all this, he doesn’t provide any concrete reasons why the APIs systemd has created are wrong or insufficient or prevent an alternative implementation. He just says that they’re non-standard — despite the absence of any applicable standard.
This makes any one component of systemd hard to replace, which will prevent more secure alternatives from gaining adoption in the future.
Systemd delivers these “non-standard” interfaces over a bus or socket in a way that opens up alternative implementation more than before, where there was usually just a library or an in-project implementation, like with GNOME sessions. There’s a reason systems like GNOME have chosen to build on systemd’s session APIs, and it’s not because there was a good, existing, modular choice.
Consider systemd's DNS resolver.
This is one of systemd’s newest projects, isn’t used by any major distribution yet, doesn’t ship as enabled by default, and is still getting hardened for security (e.g. the recent addition of DNSSEC). Implementations like Unbound are great, but they’re not good libraries. Distributions have also been unsuccessful in integrating more secure DNS resolution via clever daemon configuration.
An offhand remark from Poettering about the code being “pretty complete” doesn’t change the fact that the project still doesn’t think the resolver is hardened enough to ship it as enabled by default. In fact, the project is still asking what gaps we need to close before we’re willing to take that step. And, like most of systemd’s bundled services and utilities, it’s modular and has no effect on the security of other components when disabled. This is more guilt by git association.
Although systemd doesn't force you to use systemd-resolved, it exposes a non-standard interface over DBUS which they encourage applications to use instead of the standard DNS protocol over port 53.
“Encourage applications to use” is a really weird interpretation of a documented interface to a component that isn’t enabled by default. MySQL ships a low-level, non-SQL socket for data access that’s documented and disabled by default, and I don’t think anyone would suggest they encourage applications to use it.
Ayer Misses the Point
In some ways, Ayer makes bigger mistakes with his omissions, particularly for all the ways systemd has improved service hardening. I think a lot of the gap comes from the academic perspective on there being no major distinction between what’s possible and what’s easy. A lot of what systemd delivers isn’t “new” in the sense of systemd inventing it or even being the first to make it possible on Linux systems. What systemd does is make best practices in daemon operation easy, which makes a remarkable difference in the real world. This is not an accomplishment to gloss over.
For example, did anyone seriously do privilege separation for low-port binding before systemd? I don’t mean just inetd. I mean support for multiprocess, persistent servers. With systemd’s socket activation, it’s not only possible to avoid starting everything from the Apache HTTP server to Samba as root. It’s now easy. We don’t have to trust every low-port daemon to properly drop starting privileges any more. Distributions have started shipping multiple network services using this model, which they weren’t doing before.
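From the daemon’s side, the socket-activation protocol is small. The sketch below assumes the documented environment-variable convention (LISTEN_PID, LISTEN_FDS, inherited descriptors starting at 3) and re-implements it by hand for illustration. The function names are hypothetical, the 1024 cap is an arbitrary sanity bound, and a real daemon would more likely call sd_listen_fds() from libsystemd.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FDS_START 3  /* first inherited descriptor under the protocol */

/* Given the two variable values and our own PID, return how many listening
 * sockets were passed to us: 0 if none were, -1 if the values are malformed. */
int parse_listen_fds(const char *pid_s, const char *fds_s, pid_t self)
{
    char *stop;

    if (pid_s == NULL || fds_s == NULL)
        return 0;                   /* not socket-activated */
    if (*pid_s == '\0' || *fds_s == '\0')
        return -1;

    long pid = strtol(pid_s, &stop, 10);
    if (*stop != '\0')
        return -1;
    if (pid != (long)self)
        return 0;                   /* variables were meant for another process */

    long n = strtol(fds_s, &stop, 10);
    if (*stop != '\0' || n < 0 || n > 1024)
        return -1;
    return (int)n;
}

/* Descriptor number of the index-th inherited socket. */
int listen_fd_at(int count, int index)
{
    assert(index >= 0 && index < count);  /* caller bug, not external input */
    return LISTEN_FDS_START + index;
}

/* Convenience wrapper that reads the real environment. */
int inherited_listen_fds(void)
{
    return parse_listen_fds(getenv("LISTEN_PID"), getenv("LISTEN_FDS"), getpid());
}
```

Because the service manager binds the low port and hands over an already-listening socket, the daemon can start as an unprivileged user and never needs root to call bind() itself.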
Services built with systemd are also finally choosing to drop unnecessary capabilities, namespace themselves (beyond just chroot), and consider thresholds where resource consumption may be out of control. People could do these things before systemd, but did anyone — I mean, outside VMs or the current generation of containers that postdate systemd — actually do so?
And how many daemon authors implemented proper mandatory access control (e.g. selinux, Smack, etc.) when, absent the journal and other tools, they would have to write rules and labeling requirements for many parts of the file system?
Getting down to gritty administration realities, how many system administrators were dropping the root privileges at all in their ad hoc daemon setups? Ayer acknowledges that what systemd has replaced was generally awful, but much of systemd’s actual PID 1 complexity comes from putting powerful sandboxing tools directly in the hands of daemon authors and system administrators. Even my company’s mitigation response (which I won’t publish here) for the notify socket vulnerability was made possible by adding a configuration line to a few units.
Many of the services systemd helps to harden don’t just have local attack vectors that might become remote ones. Many of them are network services, and they’re not just a foot in the door for other attacks but the keepers of data attackers want. Ayer is fighting one of the most powerful tools we have to harden the front lines against the real attacks we see every day.
Finally, touching on modularity, systemd has implemented networkd, which provides network configuration via structured files and D-Bus. This has the potential to replace the lower levels of multiple systems that implement their own configuration application layer, bringing us into a nicely decoupled design where one project (systemd) owns the configuration API, application, and persistence and another (NetworkManager) owns the user-facing experience — which is really hard to get right just on its own. This promises to unify Red Hat-style environment scripts, interactive server administration, and the desktop Linux experience.
If Ayer cares as much about modular design and replaceable components as he claims, then he should be cheering on at least some of systemd’s work in making security more usable and major subsystems more modular. Instead, he’s cherry-picking what he considers negatives (some of which, I admit, are legitimate criticisms) and then calling for a complete replacement of systemd. This is what turns a critique into a tantrum.