Calling functions in the past

Published in Time Travel Debugging · 8 min read · Sep 13, 2022

Debuggers can call functions in your programs directly and tell you what they returned — how does this interact with time travel debugging?

Calling functions from a debugger

In this article I’m going to concentrate on GDB as an example debugger, combined with C / C++ code. Similar principles could be applied to other debuggers and languages.

GDB can call functions in the debugged process (which it calls the inferior). This is most straightforwardly accomplished through:

print my_function()

This prints the value returned by my_function().

The effect is to cause my_function() to immediately be called, within the context of the inferior process, and its return value is then fetched and displayed within GDB. The function can perform arbitrary operations and may have side-effects on the state of the running process.

To provide this functionality, the debugger must:

Save the initial register state of the inferior.
Alter the initial register state (and potentially stack contents) of the inferior so that it will enter my_function() when continued, then trap back to the debugger when it returns. If the function receives arguments then these need to be poked into registers / the stack.
Continue the inferior until my_function() returns.
Retrieve the return value of the function.
Restore the original state of the inferior’s registers / stack so that execution will be able to continue as if the function call had not happened. Changes to global state will still be visible.

Example

Let’s try this with a simple program that increments a statically-allocated counter from inside a loop and prints out the pre-update value each time:

#include <stdio.h>

/* Advance counter by specified amount and return previous value. */
static int advance_counter(int adv)
{
    static int counter = 0;
    int ret = counter;
    counter += adv;
    return ret;
}

int main(void)
{
    for (int i = 0; i < 5; i++)
    {
        printf("%d => %d\n", i, advance_counter(1));
    }
    return 0;
}

Building and running this program gives us this:

$> gcc -g counter.c  
$> ./a.out  
0 => 0 
1 => 1 
2 => 2 
3 => 3 
4 => 4

OK, so far this is straightforward. Now we’ll complicate things!

We start the program within GDB and step through a couple of loop iterations:

$> gdb ./a.out  
[...]
(gdb) start 
Temporary breakpoint 1 at 0x401154: file counter.c, line 14. 
Starting program: /home/mwilliamson/blog_posts/a.out  
 
Temporary breakpoint 1, main () at counter.c:14 
14          for (int i = 0; i < 5; i++) 
(gdb) next
16              printf("%d => %d\n", i, advance_counter(1)); 
(gdb) next
0 => 0 
14          for (int i = 0; i < 5; i++) 
(gdb) next
16              printf("%d => %d\n", i, advance_counter(1)); 
(gdb) next
1 => 1 
14          for (int i = 0; i < 5; i++) 
(gdb)

We can see the program is producing the same sequence of output lines as it did when run outside the debugger. We’ll do the call manually to find out what the next returned counter would be:

(gdb) print advance_counter(1) 
$1 = 2

Just as we expected! We’ve been able to call a function ourselves, as if the program had called it, then see the result. What happens if we do that again?

(gdb) print advance_counter(1) 
$2 = 3

We’ve retrieved the next value — our calls are updating the global state of the program, just as they did during normal execution. What, then, happens when we run the program to completion?

(gdb) c 
Continuing. 
2 => 4 
3 => 5 
4 => 6 
[Inferior 1 (process 75329) exited normally]

We continued the program until it completed. The output produced was similar to our original run, except that the remaining rows of output showed a counter value offset by two, corresponding to our pair of calls to advance_counter(1).

Fancy tricks

Inferior functions can be called in all sorts of situations. For example:

Call internal utility functions to dump internal state from the program at key points in execution without modifying the program’s source.
Call internal functions from a breakpoint condition, using their return value to determine whether to stop (see also our article on conditional breakpoints).
Call internal consistency checks from debugger scripts in order to compare / validate program-internal state during a debug session.

Inferior calls and time travel debugging

Time travel debugging allows a developer to rewind a process — potentially at instruction granularity — to any previous state of its execution.

This gives the developer immense power: they can find interesting points in history and then use all the familiar debugger functionality to diagnose the issue there.

As we described previously in the write-up of our parallel search optimisation, time travel debugging relies on the ability to re-run a program in exactly the same way, every time.

This is because most time travel debuggers (ours included) re-execute program code to reconstruct previous program states. Point-in-time snapshots are used as a “jumping off point” to bound the amount of re-execution required for any time travel.

Figure: The Undo Engine maintains snapshot processes at intervals throughout execution history. The diagram shows a single supervisor process within the Undo Engine controlling multiple snapshot processes, situated at different points along a timeline of execution history.

But we’ve just seen that inferior calls can change the program’s state, which in turn could affect re-execution behaviour — how can we make these available in the presence of time travel debugging?

We need to somehow sandbox the code while we’re calling inferior functions — and then revert their side-effects before we continue our replay. The mechanism for this, in the Undo Engine, is called volatile mode.

Volatile mode — sandboxing inferior function calls

When time travel debugging under UDB (which is LiveRecorder’s debugger, building on the functionality of GDB), the Undo Engine can inspect the state of a recorded program at any point in its execution by replaying history to reconstruct the exact state (memory, registers, etc) at that point.

This gives us a state that we could execute inferior calls against and produce the answer that we would have seen, if that function had been called at that time. We are already debugging this process (internally, we do this using the ptrace API), so it seems like we could just let GDB issue its standard commands to build and execute an inferior call.

As usual with software, the answer is not that simple.

We will later need the precise state in this process image to continue our replay of history. As we’ve seen above, inferior calls can modify process state — doing so could either mean that further replay is altered (unacceptable) or that we have to discard this process image and rebuild it again (inefficient).

Volatile mode is the mechanism the Undo Engine uses to address this problem. Conceptually it’s pretty simple — we can cause the replay process to fork() a new process (which we will also control using ptrace). This process contains the same state (shared with copy-on-write semantics) so changes to it will not harm our replay process.

Figure: While in volatile mode, debug operations are directed to a forked placeholder process, instead of the snapshot we were previously debugging. The diagram shows a single supervisor process in the Undo Engine controlling a snapshot and also a “volatile child” that has been forked from it; debug operations are redirected from the original snapshot to the volatile child.

With this setup, we can allow GDB’s inferior call mechanism to proceed as normal and produce an answer for the user.

Catching inferior calls

One question remains: how do we get into and out of volatile mode? We need to activate it just before letting GDB do an inferior call, then clean it up afterwards so we can go back to normal time travel behaviour.

The answer is in GDB’s Python event API — the events.inferior_call event registry can invoke a Python callback before and after an inferior call occurs. In concept, we simply hook these notifications to switch modes and GDB is able to otherwise proceed as normal.

In practice we do use these hooks, but volatile mode is activated more lazily. This avoids the mode switch in some cases and so minimises performance overhead.

Putting it all together

Having seen how this works in principle, let’s see the actual effect on our example:

$> udb ./a.out  
[...]
 
Reading symbols from a.out... 
not running> run 
Starting program: /home/mwilliamson/blog_posts/a.out  
Invalid cast. 
warning: Probes-based dynamic linker interface failed. 
Reverting to original interface. 
0 => 0 
1 => 1 
2 => 2 
3 => 3 
4 => 4 
 
Program received signal SIGSTOP, Stopped (signal). 
 
The program has exited, but is still being debugged. 
You can use UDB reverse commands to go backwards; see "help udb" for details. 
 
__GI__exit (status=status@entry=0) at ../sysdeps/unix/sysv/linux/_exit.c:30 
Downloading source file /usr/src/debug/glibc-2.33-21.fc34.x86_64/posix/../sysdeps/unix/sysv/linux/_exit.c... 
30            INLINE_SYSCALL (exit_group, 1, status);

We’ve allowed the program to run to completion here because we can easily rewind to inspect its state after the fact. We’ll wind back to the start of main() so we can repeat our earlier inferior calls:

end 11,428> tbreak main 
Temporary breakpoint 1 at 0x401154: file counter.c, line 14. 
end 11,428> reverse-continue 
Continuing. 
 
Temporary breakpoint 1, main () at counter.c:14 
14          for (int i = 0; i < 5; i++) 
93% 10,685>

OK, now we’re ready to step through the loop and see what happens:

93% 10,685> next
16              printf("%d => %d\n", i, advance_counter(1)); 
93% 10,686> next
0 => 0 
14          for (int i = 0; i < 5; i++) 
96% 11,024> next
16              printf("%d => %d\n", i, advance_counter(1)); 
96% 11,025> next
1 => 1 
14          for (int i = 0; i < 5; i++) 
97% 11,109>

So far, this matches what we saw above. Now let’s try calling advance_counter(1):

97% 11,109> print advance_counter(1) 
$1 = 2

Again, this matches the behaviour we saw with GDB. Let’s try it again:

97% 11,109> print advance_counter(1) 
$2 = 2

Now this is different. Our calls to advance_counter() are not building up side-effects — this is because the memory changes from each call are being caught and cleaned up by the Undo Engine’s volatile mode.

Even if we specify a huge increment, the rest of our debug session will unfold the way we originally recorded:

# Under plain GDB, this would increment all subsequent values by 1000
97% 11,109> print advance_counter(1000) 
$3 = 2
97% 11,109> continue
Continuing. 
2 => 2 
3 => 3 
4 => 4

Inferior calls can thus be used to ask “what if?” questions at replay time without disrupting the recorded history.

Gotchas

Due to the implementation of volatile mode, there are a few things to watch out for:

Volatile mode is always used for inferior calls — this means inferior calls in UDB will not have memory / register side-effects, even whilst recording the process for the first time.
Functions that rely on operating system resources (e.g. file descriptors other than stdin / stdout / stderr) will probably fail, as these resources do not exist at replay time.
Functions that explicitly change external system state may still do so. For example, an inferior function that invokes remove(pathname) will attempt to remove that path on the local replay system; this is outside the sandboxing capabilities of volatile mode.
The system will not remain in volatile mode between inferior calls, so it is not possible to have separate commands use inferior calls that rely on each other (e.g. to allocate an object using one inferior call then pass it to a subsequent one).

Conclusions

Inferior function calls are very powerful in GDB and can be extremely useful during time travel debugging.

Look out for the gotchas above and make the most of the power available.