In which ptrace is weird (episode 1 of N)

Mark Williamson
Time Travel Debugging
May 3, 2022

ptrace is the API that Unix-like systems typically provide for process tracing. Its most common users are debuggers, e.g. GDB and UDB. It is also used where one process needs to monitor or record another, e.g. strace, LiveRecorder or even User Mode Linux.

It’s a fairly old and complex API, with some well-known issues (it took me a long time before I stopped being afraid of the man page — even now, I couldn’t claim to understand it fully).

The Impossible Thing

My work gives me lots of opportunities to dig into weird corner cases of this API, which can sometimes be useful and sometimes frustrating. Most recently, we were investigating a surprising case of an -ENOSYS return from a clone system call. This happened very rarely in a debugged process when we were transitioning from an ordinary debug session to a Time Travel Debugging session.

As per standard debugging practice, we carefully thought of every possible cause and then ruled them all out. Good news, the bug was Impossible. Sadly, nobody had told the bug this and it insisted on recurring.

Thanks to a colleague crafting a reproducer, we recently got to the bottom of this. The answer lies in the intersection of two powerful ptrace behaviours: tracing clone events and tracing syscalls.

Threads under ptrace

When debugging threaded code, it’s usual to set the PTRACE_O_TRACECLONE option. This ensures that:

  1. Newly-created threads are also traced (surprisingly, it’s not actually required to debug all the threads in a process — in fact, that has to be specifically requested).
  2. The tracer (i.e. the debugger) gets notified when new threads start.

When this event is reported to the tracer it also supplies the ID of the new thread so debugger bookkeeping can be updated. Here’s GDB notifying the user of a new thread’s arrival:

[New Thread 0x7ffff7db1640 (LWP 2331113)]

Naturally, since we support multi-threaded code, we set this option.
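
As a rough illustration of what setting this option involves, here is a minimal C sketch for Linux. The function names (enable_clone_tracing, is_clone_event, new_thread_id) are hypothetical and not taken from any particular debugger; the ptrace requests and the wait-status encoding are the ones described in ptrace(2). It assumes the thread is already traced and currently stopped.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Ask the kernel to trace threads created by `tid` and to report each
 * creation as a PTRACE_EVENT_CLONE stop. Returns -1 on failure. */
long enable_clone_tracing(pid_t tid)
{
    return ptrace(PTRACE_SETOPTIONS, tid, NULL,
                  (void *)(long)PTRACE_O_TRACECLONE);
}

/* True if a waitpid() status describes a PTRACE_EVENT_CLONE stop.
 * The event code sits in the bits above the stop signal (SIGTRAP). */
int is_clone_event(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8));
}

/* After a clone event on `tid`, fetch the ID of the new thread.
 * Returns -1 on failure. */
pid_t new_thread_id(pid_t tid)
{
    unsigned long msg = 0;

    if (ptrace(PTRACE_GETEVENTMSG, tid, NULL, &msg) == -1)
        return -1;
    return (pid_t)msg;
}
```

A tracer would typically call enable_clone_tracing once after attaching, test each waitpid status with is_clone_event and, on a match, call new_thread_id to update its thread list.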

Sounds all good, right?

The Root Cause

So, how does this relate to our -ENOSYS issue? The bug comes down to not understanding a quirk that’s hinted at in the ptrace(2) man page:

PTRACE_EVENT_CLONE 
Stop before return from clone(2).

The key word here is “before”: when we receive our notification of the clone, the calling thread hasn’t really exited the system call. There’s a term for this elsewhere in the documentation: syscall-enter-stop.

The syscall-enter-stop state (and its friend, syscall-exit-stop) is documented as a behaviour of PTRACE_SYSCALL, which provides “run to the next system call” behaviour (very useful if you’re strace).

One quirk noted in the man page is that some platforms show special register values specifically during syscall-enter-stop. On x86, the man page gives the example of rax holding -ENOSYS.

Aha!

A bit of investigation and, yes, it looks like PTRACE_EVENT_CLONE actually leaves the traced thread in syscall-enter-stop, even if you weren’t using PTRACE_SYSCALL.
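
To make the symptom concrete, the check below is a minimal sketch of how a tracer might spot the placeholder value on x86-64 Linux. Both function names are hypothetical. Note that a system call that genuinely failed with ENOSYS leaves the same value in rax, so this is a heuristic for investigation rather than a reliable test.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

#if defined(__x86_64__)
/* x86-64 only: does the return-value register hold -ENOSYS?
 * This matches the syscall-entry placeholder, but also a real
 * -ENOSYS result, so treat a match as a hint only. */
int rax_is_enosys(const struct user_regs_struct *regs)
{
    return (long long)regs->rax == -ENOSYS;
}

/* Read the registers of a stopped, traced thread and apply the check.
 * Returns 1 or 0, or -1 if the registers could not be read. */
int thread_shows_enosys(pid_t tid)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, tid, NULL, &regs) == -1)
        return -1;
    return rax_is_enosys(&regs);
}
#endif
```

Calling thread_shows_enosys on a thread sitting in its PTRACE_EVENT_CLONE stop is one way to observe the intermediate value described above.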

A prototype of the obvious solution (use PTRACE_SYSCALL to get the thread back out of the syscall) worked, making the syscall’s true result value visible instead of the intermediate value that confused us.

With some additional refinement, this provided the correct fix for the bug.
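
The general shape of that prototype can be sketched in C as follows. This is only an illustration under stated assumptions, not Undo’s actual fix: the names is_syscall_stop and finish_clone_syscall are hypothetical, and the refinements mentioned above (for example, coping with the thread dying or stopping for another reason) are left out.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* True if `status` is a plain syscall-stop rather than a PTRACE_EVENT stop.
 * `sysgood` says whether the tracer set PTRACE_O_TRACESYSGOOD; without it,
 * a syscall-stop cannot be told apart from a stop caused by a real SIGTRAP. */
int is_syscall_stop(int status, int sysgood)
{
    int expected = sysgood ? (SIGTRAP | 0x80) : SIGTRAP;

    return WIFSTOPPED(status) && (status >> 8) == expected;
}

/* Call while `tid` sits in its PTRACE_EVENT_CLONE stop. Resumes the thread
 * with PTRACE_SYSCALL and waits for the next stop, expected to be the
 * syscall-exit-stop at which the real clone result is visible.
 * Returns 0 on success, -1 otherwise. */
int finish_clone_syscall(pid_t tid, int sysgood)
{
    int status;

    if (ptrace(PTRACE_SYSCALL, tid, NULL, NULL) == -1)
        return -1;
    if (waitpid(tid, &status, __WALL) == -1)
        return -1;
    /* A real tracer would handle other kinds of stop, or the thread's
     * exit, here; this sketch just reports failure. */
    return is_syscall_stop(status, sysgood) ? 0 : -1;
}
```

Once finish_clone_syscall returns 0, reading the thread’s registers should show the system call’s true result rather than the intermediate value.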

Why doesn’t everyone have this problem?

This behaviour is, thankfully, the kind of thing only debugger developers really need to worry about.

For most debugger uses, even this behaviour isn’t really a problem — as soon as the traced process is resumed, the kernel will ensure the correct register values are reinstated. The -ENOSYS is never visible to the process being traced.

So, for a “normal” debugger, this problem is mostly invisible. For Undo’s tech, we have to capture the initial process state and hand it over to our record / replay machinery. In a particular race condition, we could hoover up the intermediate -ENOSYS value as part of our starting state and later restore it into the traced process.

At that point, we’ve effectively told the kernel that we really want this value there, removing its ability to magic it away for us. The fix avoids this by having the kernel do its magic immediately, before the rest of our code gets a chance to inspect the register state.

The fabric of reality is safe … until next time.

Mark Williamson
Time Travel Debugging

CTO and bad movie specialist at undo.io - working on our LiveRecorder time travel debugger.