Ayer vs. systemd, Part 4

Update: Added a section on importance of it all.
Update 2: Added concrete reasons why implementing init in Go or Rust isn’t practical.

This is in response to Ayer’s latest post.

Systemd maintainer David Strauss has published a response to my blog post about systemd. The first part of his post is replete with ad hominem fallacies, strawmen, and factual errors. […] This doesn’t deserve a response beyond what I’ve called out on Twitter.

There is no free pass of that sort, especially without even enumerating what is supposedly a strawman or factually wrong. At least I have some idea what Ayer probably thinks the ad hominem is.

My point about Ayer’s experience is that he’s in no position to shut down the discussion of which language is best and insist on his own criteria. The veracity of the factor he’s offered (safety) doesn’t automatically make it the most important, and Ayer’s inexperience means it would be especially irresponsible to assume he just knows what the most important factor is without more justification. The only ad hominem here is me not giving him a free pass or the assumption that he knows what he’s doing, because he hasn’t shown that through his work. No one gets to have their arguments accepted uncritically; that’s just especially important if you’re not experienced in an area.

Getting back to the actual point, safe languages are great, but only insofar as they get the intended job done. My preferred way to pick a language is first by asking, “What will meet my use case?” followed by “What is the finest language in that first list?” Ayer has effectively jumped to the second without adequate consideration of the first. (I do applaud anyone who finds great languages and makes them available for more use cases; it’s just far too much to expect that of a single project.) And, assuming he at least partly agrees with the methodology above, he should probably know that Go and Rust were precluded from being on the first list by barely existing at the time of systemd’s initial implementation.

Moreover, neither Go nor Rust supports the ability to fork().

So, what are we left with? Having one of them exec() a daemon or supervisor directly, and then handling the actual work of double forking and dropping privileges in C or another low-level language. It’s kicking the can down the road (a theme that happens again for another of Ayer’s suggestions) because the privilege-manipulation code still ends up being in C (or a handful of other languages Ayer wouldn’t like) that has to start as root and either parse complex data structures to figure out what to do or parse the daemon’s configuration itself.
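To make concrete what that leftover privileged step has to do, here is a minimal sketch of a double fork followed by a privilege drop. It is written in Python only for brevity and is not how systemd or any particular supervisor implements it; `drop_privileges` and `spawn_daemon` are hypothetical names, and the uid/gid are whatever the caller supplies.

```python
import os


def drop_privileges(uid, gid):
    """Drop from root to an unprivileged user (hypothetical helper)."""
    os.setgroups([])  # clear supplementary groups while still root
    os.setgid(gid)    # group first: after setuid() we could no longer change it
    os.setuid(uid)


def spawn_daemon(argv, uid=None, gid=None):
    """Double-fork, optionally drop privileges, exec argv; return the daemon's PID."""
    read_end, write_end = os.pipe()
    first = os.fork()
    if first == 0:
        # Intermediate child: start a new session, fork again, report, exit.
        try:
            os.close(read_end)
            os.setsid()
            second = os.fork()
            if second == 0:
                # Grandchild: this is the process that becomes the daemon.
                try:
                    if uid is not None and gid is not None:
                        drop_privileges(uid, gid)
                    os.execvp(argv[0], argv)
                finally:
                    os._exit(127)  # reached only if the drop or the exec failed
            os.write(write_end, str(second).encode())
        finally:
            os._exit(0)
    # Original process: learn the daemon's PID and reap the intermediate child.
    os.close(write_end)
    data = os.read(read_end, 32)
    os.close(read_end)
    os.waitpid(first, 0)
    return int(data)
```

Even in this stripped-down form (no stdio redirection, no error reporting back to the caller), everything between the first fork() and the exec() runs with the starting privileges, which is the code the argument above says still has to live somewhere.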

Ayer didn’t even begin to consider these implementation requirements before claiming that his preferred languages would be better choices. He just resorts to a pithy quote about “safe” languages being the future.

he betrays an ignorance of how the very project which he works on uses threads and umasks

Just to ensure I wasn’t missing an opportunity for improvement, I talked with Poettering about options for moving to the restrictive umask this week, and he explained that his primary concern continues to be locking and threading, which I think is justified. No matter what Ayer insists, systemd does launch threads as part of managing parallelism, and those places are some of the same places umask manipulation occurs. Because of the umask scopes on Linux (process-level), manipulating a umask when you are running multiple threads requires locking around both the umask change and the file opening to avoid race conditions.
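The locking requirement can be sketched as follows. This is an illustration in Python under my own assumptions, not systemd code; `_umask_lock` and `create_private_file` are hypothetical names.

```python
import os
import threading

# One process-wide lock, because the umask itself is process-wide.
_umask_lock = threading.Lock()


def create_private_file(path, mask=0o077):
    """Create path under a temporarily restrictive umask and return its fd."""
    with _umask_lock:
        previous = os.umask(mask)  # changes the umask for every thread
        try:
            # The open must happen inside the lock; otherwise another thread
            # could change the umask between our umask() and our open().
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        finally:
            os.umask(previous)     # restore before releasing the lock
    return fd
```

The lock only helps if every thread that creates files or touches the umask takes it; a thread that opens a file without the lock can still observe the temporary value.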

Right now, there are zero POSIX-style locks in systemd PID 1. Just as Ayer is so concerned with the correctness of memory management, we’re concerned with lock management. Explicit lock management is tricky and can easily introduce denial-of-service issues, the very sort of vulnerability that started this debate.

However, since sandboxing a whole application cannot protect one part of the application from a compromise of a different part, it is ineffective at securing benign-but-insecure software, which is the problem faced on servers.

The boundary of an “application” is arbitrary; the different scopes of applications make that clear. For example, I can run Apache with mod_php, or I can use nginx with PHP-FPM. The second is two “applications” but handles the same tasks. It doesn’t even have to be two projects to use this technique; systemd itself has multiple daemons operating under multiple separate users.

You can start Apache as a non-root user provided someone else binds to ports 443 and 80.

Absent an implementation like systemd’s socket activation, this just kicks the can down the road to something like a proxy that still operates (and is vulnerable) at the protocol level, which does not remove the class of vulnerabilities affecting things like MySQL right now.

Socket activation doesn’t involve systemd processing the traffic, which makes a remote vulnerability:

  1. Unable to affect PID 1 because PID 1 never reads from the socket or connects to clients, at least in the Accept=false case. (systemd does read/write to the sockets in less-common Accept=true mode, but it treats the data as opaque bytes, which is still at far more distance than most proxies, even ones like haproxy.)
  2. Only able to affect the daemon, which never has the opportunity to execute as root for any duration. This is the key to preventing a MySQL-style arbitrary root execution vulnerability.
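For readers who haven’t seen the daemon’s side of this, here is a minimal Python sketch of how an activated service picks up its pre-bound sockets. It follows the documented `LISTEN_PID`/`LISTEN_FDS` environment convention, with inherited descriptors starting at 3; the function names are hypothetical, and a real daemon would more likely call `sd_listen_fds()` from libsystemd.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the convention


def inherited_fds(environ=os.environ, pid=None):
    """Return the descriptor numbers passed to this process, if any."""
    pid = os.getpid() if pid is None else pid
    # LISTEN_PID must name this process, or the variables were meant for
    # some other process and merely inherited.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def inherited_sockets(environ=os.environ, pid=None):
    """Wrap each inherited descriptor in a socket object (already bound)."""
    return [socket.socket(fileno=fd) for fd in inherited_fds(environ, pid)]
```

The daemon never calls bind() on the low port itself, so it has no reason to start as root; it just accepts connections on what it was handed.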

Or, maybe I’m misreading Ayer’s post and he does mean something designed like systemd’s socket activation. If that’s the case, then I’d like to know about any implementations other than launchd and systemd.

Privilege separation has been used effectively by OpenSSH, Postfix, qmail, Dovecot, and over a dozen daemons in OpenBSD.

It’s interesting that Ayer mentions qmail, because it basically uses its own implementation of socket activation to isolate the low port listening from the daemon implementation. It’s a good example of “possible, but so much effort that few daemons do it” which has now turned into a fairly widespread configuration (at least on Fedora) with systemd.

OpenSSH ships at least on Fedora using systemd’s socket activation. For the others, I’m not sure what they do, but if it’s anything less than one thing reading the socket configuration plus binding it in a way isolated from reading/loading the main configurations and modules/libraries for the daemon, it’s still vulnerable to this week’s flavor of MySQL vulnerability.

While systemd’s PID 1 is big, it doesn’t offer the ability to dynamically load modules or libraries (via configuration or at runtime), preventing it from becoming a victim of the same sort of attack as MySQL.

Having configured Samba and professionally administered Windows networks

Is this a joke, or is Ayer actually appealing to experience after he said my concern about his experience for init implementations “doesn’t deserve a response”?

The reason why Samba needs privilege is not because it binds to privileged ports, but because, as a file server, it needs the ability to assume the identity of any user so it can read and write that user’s files.

No, that’s only one reason why one component would need such access. This line of argumentation is particularly strange given how critical Ayer is of how much systemd has packed into PID 1 and his advocacy of separation-of-privilege. There’s no reason Samba should get a pass for running most things as root, and the very first thing I would not run as root would be the protocol handler (which, as Ayer points out for even systemd’s minimal parsing, happens in an “unsafe” language).

In any case, it does not help to have more reasons (low port binding) that require parts or all of Samba to run/start as root. With socket activation and the right permissions, no part of Samba would have to start or run as root with the sole exception of a file reader/writer (or even just file owner adjuster), and even that is assuming Samba persists files to a POSIX-style backend, which isn’t strictly necessary for providing Windows-compatible file shares.

Samba doesn’t matter, anyway. I picked it because it binds to a well-known low port and mostly doesn’t need to run as root. You can literally pick almost any other daemon running on a well-known port under 1024 and apply the same argument.

Even under systemd’s most restrictive sandboxing, an attacker who gains remote code execution in Apache would be able to read your entire website, alter responses to your visitors, steal your HTTPS private keys, and gain access to your database and any API consumed by your webapps. For most people, this would be the worst possible compromise, and systemd can do nothing to stop it.

Ayer seems to have a limited imagination for the “worst possible” attack, as the one above is still categories below the top NIST ones. Perhaps this is why he thinks the systemd vulnerability was so severe, despite NIST’s criteria scoring putting it at an overall 3.5 out of 10.

To truly hit severe NIST territory, a vulnerability must become a gateway to more systemic compromise or system integrity loss. The sandboxing supported by systemd prevents the sort of attack Ayer mentioned above from escalating privileges to root, which can lead to those deeper compromises. This isn’t theoretical territory; without early sandboxing, you could face a remote root escalation via a MySQL vulnerability from this week.

But, it’s not just about data leaks from the initial, compromised stack. Escalating to root, finding additional credentials sitting on the box (beyond those used by the web application), and then attacking other systems is how several of the email dumps from WikiLeaks happened, which is the sort of multi-stage attack NIST includes in their most severe categories.

Those web applications didn’t access those email systems over “APIs consumed by the webapps”; privilege escalations on the hosts enabled more general network and credential access beyond what the Apache user would have had, even with arbitrary code execution. Were the admins sloppy? Sure, but admins make mistakes, and defense-in-depth is what prevents catastrophe. When admins make mistakes, reuse passwords, or forget to segment networks properly, you’re a lot worse off once an attacker gets root, not just arbitrary unprivileged execution.

Here’s a more likely scenario: your MySQL box uses Percona XtraBackup to generate snapshots and upload them to S3. But, the admin setting up the backups didn’t ensure the S3 key on the machine is write-only (or maybe assumed being able to read backups was necessary for restores or restore-testing). Now, the root exploit has put even your backups in danger.

Privilege escalations are also an issue for “farm” deployments like database-as-a-service or Heroku. You have to be in a specific sort of market to want tools like that, but there are efficiency gains from not using VMs for each, and the result of any hosted instance gaining root with code execution would be dire.

but in today’s world of single-purpose VMs and containers, that protection is increasingly irrelevant.

The Rocket container system is built on systemd, and others, like Docker, employ the same kernel facilities as systemd (but directly). Docker also integrates with systemd components like the journal.

For protecting the “data attackers want,” systemd is far from a “powerful” tool.

It certainly has provided me with tangible mitigations to vulnerabilities, including both the aforementioned MySQL vulnerability (via early privilege dropping) and the systemd one Ayer found (via namespacing). Again, what makes it powerful isn’t its originality but that it puts these mitigations at my fingertips, which is important when people like Ayer skip responsible disclosure and admins have to act quickly before package (or even upstream) updates are available.

Systemd’s own documentation says “Usage of this API is generally recommended to clients.”

Ayer deeply misunderstands the docs here. That recommendation is in the context of “If you’ve chosen to use systemd’s resolver, this first API is recommended over the other two.” All the reasons it’s recommended are distinctions versus the other two ways of accessing systemd’s resolver (which both work but either block or don’t support the full feature set).

This is the equivalent of recommending InnoDB over MyISAM, and Ayer presenting it like we’re recommending MySQL. Or like we said a scoop of ice cream is better for you than a shake, and Ayer comes back saying we recommend eating ice cream (at least that conclusion wouldn’t be wrong, despite the flawed logic to get there).

Moreover, this recommendation occurs within the bullet point for that first API method, not in the general resolver docs. It’s completely reasonable, even necessary, for the systemd project to make recommendations on best practices for systems before those systems are used by default (which the resolver continues to not be for /etc/resolv.conf or anything similar, even by systemd’s default build).

The recommendations mentioned don’t even make sense versus other resolvers because they don’t highlight the substantive distinctions between the systemd resolver and more mature ones, only the distinctions between the three APIs supported for the systemd one. DNSSEC validation information isn’t a distinction of systemd’s resolver (it’s supported by plenty of other resolvers); it’s something you get by using the recommended API for systemd’s resolver but won’t if you resort to the traditional method.

I’m not sure Ayer is stooping to cable news-style sound bites taken out of context, but it’s getting pretty close. He clearly has an ax to grind: he wouldn’t take my word that we don’t generally recommend or enable the resolver yet (backed up by a link to the session at the conference that literally discussed the topic of when we’d feel okay recommending it) and then used a tortured interpretation of the documentation to suggest I was lying. That’s intellectually dishonest.

And while systemd doesn’t preclude alternative implementations, systemd’s specifications are not developed through a vendor-neutral process like the IETF, so there is no guarantee that other implementers would have an equal seat at the table.

This critique has some substance, though I’d counter that several of the APIs are co-developed with their consumers (like sessions for GNOME), so they’re not being created just unilaterally by systemd. Ayer also doesn’t specify what’s wrong with the designs, so it’s essentially an ad hominem against systemd or its developers. That is, he doesn’t like the APIs because of who’s creating and implementing them.

The importance of it all?

At this point — multiple posts back and forth — I’m still not sure why Ayer sees this as such an important issue, given his view on the “worst possible compromise” being the point of data loss:

Let’s start now, and stop following systemd down the primrose path.

Yes, the attack he described earlier as a “worst possible compromise” may not be preventable by systemd, but it’s also unclear how any of the weaknesses he sees in systemd could have contributed, either. Ayer’s “worst possible compromise” has already happened by the time the attacker is positioned to target systemd.

And, at that point, attacking systemd should be irrelevant, given how he sees systems increasingly deployed in single-purpose configurations:

Systemd’s sandboxing would prevent the attacker from gaining access to the rest of your system (absent a vulnerability in the kernel or systemd), but in today’s world of single-purpose VMs and containers, that protection is increasingly irrelevant. The attacker probably only wants your database anyways.

No unique remote vulnerabilities from PID 1 itself

I certainly don’t buy his original argument (which he hasn’t repeated in his second post) that tools like Cockpit should change the view of systemd. Just like SSH or Remote Desktop in Windows, Cockpit is the tool providing the remote, privileged access; systemd is just on the server listening locally. Blaming systemd for a Cockpit-based compromise is like blaming the “dnf” utility for removing the web server packages after someone gets root over SSH. (Cockpit is neither developed by the systemd team nor installed by default in any distributions.)

Privilege escalation opportunities via PID 1 are generally past Ayer’s “worst possible compromise” point

In terms of how systemd could be a vector for privilege escalation, the local access that would be required to mount such an attack would generally be past the point of getting “your database” a.k.a. the “worst possible compromise.”

The example command Ayer used for crashing PID 1 could have just as easily been a fork bomb — a still reliable way to take down a box — or, worse, a command to harvest credentials, putting us right back in Ayer’s “worst possible compromise” territory. (A web application can generally read its own database configuration.)

It’s trivial, anyway, to use namespaces to prevent a networked service like nginx from seeing the necessary sockets to interact with systemd or D-Bus, making it implausible as a privilege-escalation vector via the C-based string handling Ayer hates so much. With such namespacing, world-facing daemons wouldn’t be able to talk to systemd, let alone force it to maliciously parse something. (And, yes, I realize namespaces aren’t primarily a security tool, but it’s a third layer on top of systemd operating properly and the web application not being compromised in the first place.)

Even for the worst case bug around umask, you would already have needed an attack with arbitrary local file access to make use of a bug there, which is well into the “worst possible compromise” territory of being able to read the database off the server or obtain the database credentials from the web application.

Stepping briefly away from security, Ayer still hasn’t provided evidence that systemd is affecting the general (not under attack) reliability of servers. His only evidence there was the sloppy attempt to suggest improved coredump handling is evidence that coredumps happen frequently. This means, at least empirically, systemd’s language choice remains a security concern more than a reliability one.

Local means kernel syscalls, too

Even if you have namespaced away systemd’s interfaces from a daemon, there will still be syscall interfaces available to an attacker who’s gotten past the front door, and syscalls have historically provided more opportunities for privilege escalations than systemd’s PID 1 has, even in the last two years. This isn’t to say systemd (or the kernel) shouldn’t fix local issues, but there’s just a lot more surface area than just PID 1 once someone has local code execution.

Hardening systemd with a language like Go would reduce the amount of root-or-higher C code performing parsing and other work on data from unprivileged processes by less than 5%. Simply using systemd’s directives to whitelist necessary syscalls (the filtering itself actually happens in the kernel) would remove more root-or-higher C code from facing malicious input than removing all of systemd.

So the big question: what would a PID 1 meeting Ayer’s criteria supposedly prevent or fix by his own standards of what matters in an attack? Why is Ayer spending time critiquing systemd when the attack vectors emerging from the weaknesses become available well past his “worst possible compromise” point?

Either systemd isn’t a substantial vector to the attacks that Ayer worries about, or he’s worrying about the wrong attacks.