A story of broken systemd and dbus

George Shuklin
Published in OpsOps
2 min read · Jan 26, 2022

This is a story from the trenches. Fortunately, it wasn’t an important server, so I took a lazy approach to the problem.

There was an old server running Debian Sid. It had survived at least three Debian releases (yes, it’s a very old server) and was, generally, in a messy state. It dates from the old days of ‘manual system administration’, and I more or less treat it as a pet for no particular reason.

So, once again, it was upgrade time, and my ssh connection hung in the middle. After reconnecting, I found a hung dpkg and killed it and its parent, but the second attempt to run the upgrade ran into trouble. All systemd-related operations started to crawl, and the log was littered with messages like this:

Setting up nfs-common (1:1.3.4-6) ...
Failed to reload daemon: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Failed to get unit file state for nfs-utils.service: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Failed to reload daemon: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)
Failed to get unit file state for resolvconf-pull-resolved.path: Failed to activate service 'org.freedesktop.systemd1': timed out (service_start_timeout=25000ms)

Every attempt to do anything with systemd via systemctl (like stopping or killing services) failed.

Even an attempt to restart systemd itself (via systemctl daemon-reexec) failed, or, to be precise, hung.

The source of the problem was something broken in dbus (systemctl uses dbus to communicate with systemd).

I killed dbus. That made systemctl more responsive: it now returned failures on all requests instead of hanging. dpkg was able to continue somehow.
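The useful signal here was the difference between a call that hangs and one that fails quickly. A minimal Python sketch of telling the two apart, assuming `busctl` and `systemctl` are on PATH; the helper name `probe`, the 5-second deadline and the two probe commands are illustrative choices, not what the author ran.

```python
import subprocess
from typing import Sequence


# Hypothetical helper: run a command with a hard deadline and classify the
# result, so a hung D-Bus call is reported as "timeout" instead of blocking
# for the full 25-second service_start_timeout seen in the log.
def probe(cmd: Sequence[str], timeout: float = 5.0) -> str:
    try:
        result = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # run() kills the child when the deadline passes
        )
    except FileNotFoundError:
        return "missing"  # the tool is not installed
    except subprocess.TimeoutExpired:
        return "timeout"  # the call hung
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    checks = {
        # Ask the bus daemon itself for its ID (assumed probe command).
        "dbus-daemon": ["busctl", "call", "org.freedesktop.DBus",
                        "/org/freedesktop/DBus", "org.freedesktop.DBus", "GetId"],
        # Ask systemd for a manager property (assumed probe command).
        "systemd": ["systemctl", "show", "--property=Version"],
    }
    for name, cmd in checks.items():
        print(f"{name}: {probe(cmd)}")
```

In these terms, `timeout` matches the hanging behaviour before dbus was killed, and `failed` matches the quick failures afterwards.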

dpkg is amazingly stubborn and tenacious, so after a few repeats of apt install --fix-broken I got my system down to just a few broken packages (and hordes of non-restarted units).

I decided to reboot, and, boy, a modern reboot is complicated.

# reboot
Failed to connect to bus: Connection refused
Failed to open initctl fifo: No such device or address
Failed to talk to init daemon.

Every admin knows how to do ‘sudo reboot’:

# sync
# sync
# echo b >/proc/sysrq-trigger
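For reference, a minimal Python sketch of an emergency reboot through the same sysrq interface. It is a variant, not what the author ran: instead of `sync; sync; b` it uses the common `s` (emergency sync), `u` (remount all filesystems read-only), `b` (reboot immediately, without syncing or unmounting) sequence. The function name `sysrq_reboot` and the `pause` value are assumptions for illustration. With the default path it reboots the machine at once, so only call it as root when that is what you want.

```python
import os
import time


# Hypothetical helper, not the author's exact commands: the classic s-u-b
# sysrq sequence. Each command character gets its own open()/write(),
# because the kernel normally acts only on the first character of each
# write to /proc/sysrq-trigger.
def sysrq_reboot(trigger: str = "/proc/sysrq-trigger", pause: float = 2.0) -> None:
    os.sync()  # ordinary userspace sync, like `sync` in the shell
    for command in ("s", "u"):
        with open(trigger, "w") as f:
            f.write(command)  # 's' = emergency sync, 'u' = remount read-only
        time.sleep(pause)  # both are queued in the kernel and finish asynchronously
    with open(trigger, "w") as f:
        f.write("b")  # reboot immediately, without syncing or unmounting
```

Passing a scratch file as `trigger` (and `pause=0`) is a safe way to dry-run the sequence without touching the kernel.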

I was able to log back into the server. The rest was simple:

# apt install --fix-broken

This time every package finished configuration with no issues. There was a new initramfs at the end, so one more reboot followed (a plain reboot without any tricks), and my vivisected pet was back in a proper state.

Cheers!

I work at Servers.com, most of my stories are about Ansible, Ceph, Python, Openstack and Linux. My hobby is Rust.