forkstat’s Pretty Cool

Over the last few days, I’ve been working automating my team’s server building process and speeding up launch and provision times. Just when I thought I was nearly done with the process, I ran into a couple of gnarly issues.

First, I started getting emails every time one of the new servers would boot:

Even worse, it was going to my coworkers, who didn’t hesitate to let me know:

Identities hidden to protect the guilty.

This was at the pre-provisioning stage, so the email made sense, but I had no clue what was running sudo at this stage. On top of this, I discovered puppet was being run on startup, even after disabling puppet directly as well as disabling it with upstart. I needed some other things to happen in the provisioning process before the puppet run could take place, so this was no good.

I double checked our user data script (one way of passing initialization commands to cloud init on boot) — nope nothing running sudo there. These shenanigans had to be running somewhere else.

I wondered what syslog might have to say:

sSMTP[1400]: Sent mail for foo@bar.com (221 2.0.0 Bye) uid=0 username=root outbytes=440

Super useful.

How was I going to figure this out? I sat and stared at my monitor for a bit, but that didn’t work. After googling around, I came across forkstat. forkstat uses netlink connector, a special socket API provided by the kernel to get information on processes launching and exiting, among other things (a cool minimal python example that uses this directly is here). Because it doesn’t use anything fancy like ptrace, it’s overhead is low enough that I decided to give it a shot.

I created the following /etc/init/forkstat.conf file, baked it into the server image, and launched a new server:

start on startup
task
exec forkstat

After startup, I checked the newly created forkstat logs and found a couple of juicy tidbits:

11:17:13 fork  1447 child           /bin/sh /etc/init.d/rc 2
11:17:13 exec 1447 /bin/sh /etc/rc2.d/S21puppet start
11:17:13 fork 1468 child /bin/sh /etc/rc2.d/S21puppet start
11:17:13 exec 1468 start-stop-daemon --start --quiet --pidfile /var/run/puppet/agent.pid --startas /usr/bin/puppet -- agent

Ah! So there was an rc.d script launching puppet. One google search later and update-rc.d -f puppet remove did the trick.

Then, I found this:

11:17:13 fork  1419 child           ruby /usr/bin/foo -c /etc/foo.conf
11:17:13 exec 1419 sudo /usr/local/sbin/foo-script
11:17:13 exec 1419 sudo /usr/local/sbin/foo-script
11:17:13 fork 1419 parent sudo /usr/local/sbin/foo-script
11:17:13 fork 1424 child sudo /usr/local/sbin/foo-script
11:17:13 fork 1424 parent sudo /usr/local/sbin/foo-script
11:17:13 fork 1425 child sudo /usr/local/sbin/foo-script
11:17:13 exit 1424 0 0.000 sudo /usr/local/sbin/foo-script
11:17:13 fork 1425 parent sudo /usr/local/sbin/foo-script
11:17:13 fork 1426 child sudo /usr/local/sbin/foo-script
11:17:13 exec 1426 sendmail -t

Ah, our foo script was launching the sudo subprocess, and ultimately calling sendmail. I realized I didn’t actually want that script to be running on boot, since the machine wasn’t provisioned yet, so I disabled it launching on startup.

So yeah, forkstat is a pretty cool, simple investigative tool. I have a feeling I’ll be using it again in the future.