Using Strace to perform fault injection in system calls

buffer0x7cd
3 min read · Jan 20, 2020

Gracefully handling failures in system calls is one of the most important parts of writing low-level system code. While many Linux system calls are well defined and have failure conditions that are easy to reproduce, some still lack an easy way to mimic a failure.

For example, let’s say your application uses the fork system call to create a new process and do some processing in the child (note: on modern Linux systems, fork is a wrapper around the clone system call). You have written an error handler for the case where fork fails, but the question remains: how are you going to test this code path?

Let’s take the code snippet below as an example.

A simple wrapper around fork()

As we can see in the above code snippet, we check whether the call to fork fails and do some post-processing if it does (albeit a very simple one: the program prints the fork error message and then exits).

Lines 4 to 7 of that snippet are executed only if the call to fork fails. One way to trigger this is to run a program that creates so many processes that any further call to fork starts failing, but this approach is destructive and expensive to set up. Another alternative is to use the Linux kernel's PID cgroup controller and put a very low limit on the number of processes (you can get more details about the PID cgroup here). The issue with this approach is that it gives us no way to specify things like “make every 4th call to fork fail”. Also, many system calls, such as connect and wait, can’t be controlled with cgroups at all.

One very easy-to-use tool for these kinds of requirements is Strace. While Strace is best known as a tool for inspecting the system calls performed by a process, it can also inject errors into system calls inside a target process.

Taking the below code snippet as an example, let’s see how Strace can be used to make a system call fail inside a target process.
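As a rough sketch of such a program (again a reconstruction, not the original listing), the code below forks a child that prints a message and exits, while the parent waits for it. The function name run_fork_example and the use of waitpid are assumptions made for this illustration; the messages are modeled on the sample output shown below, and the line numbers mentioned in the text refer to the original listing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wrapper around fork(): print the reason and exit if it fails. */
static pid_t Fork(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("Error, while calling fork");
        exit(EXIT_FAILURE);
    }
    return pid;
}

/*
 * Fork a child that prints a message and exits, then reap it in the
 * parent. Returns 0 on success and 1 if waiting for the child fails.
 * (run_fork_example is a hypothetical name used for this sketch.)
 */
int run_fork_example(void)
{
    pid_t pid = Fork();

    if (pid == 0) {
        /* Child: print a message and exit immediately. */
        printf("Running in child process with PID: %d\n", (int)getpid());
        exit(EXIT_SUCCESS);
    }

    /* Parent: wait so the exited child is removed from the process table. */
    printf("Waiting for the child to terminate in Parent with PID %d\n",
           (int)getpid());

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("Error, while waiting for the child");
        return 1;
    }

    printf("Child with PID %d exited\n", (int)pid);
    return 0;
}

/*
 * To build this as a standalone program, add:
 *     int main(void) { return run_fork_example(); }
 */
```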

A simple program that uses fork to launch a child process.

In lines 35 to 39, we create a child process, which prints an informational message and then exits immediately.

On line 42, we call wait on the child process so that the exited child can be cleared from the kernel’s process table.

Let’s check a normal execution of the above program.

> gcc -o fork-example fork-example.c
> ./fork-example
Waiting for the child to terminate in Parent with PID 2745
Running in child process with PID: 2746
Child with PID 2746 exited

Looks good. Now let’s test our Fork function and make sure it handles errors from the fork system call properly by making the call to fork fail.

> strace -e trace=clone -e fault=clone:error=EAGAIN ./fork-example
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb29819d790) = -1 EAGAIN (Resource temporarily unavailable) (INJECTED)
Error, while calling fork
: Resource temporarily unavailable
+++ exited with 1 +++

Here we can see that the call to fork failed in the target process and our error-handling code was executed. Now let’s break down the above Strace command to understand how the fault injection works.

  • -e trace=clone: this instructs Strace to trace only the clone system call. (Note: on a modern Linux system, fork is only a glibc wrapper function that uses the clone system call to create new processes. This will be covered in another blog, but you can read more about the working of fork and clone here.)
  • -e fault=clone:error=EAGAIN: this instructs Strace to inject an error into every invocation of the clone system call and set the error returned to EAGAIN. The definition of EAGAIN from the clone(2) man page:
EAGAIN Too many processes are already running; see fork(2).

Another useful option when performing fault injection is the when directive, which makes only selected invocations of a system call fail. For example:

> strace -e trace=wait4 -e fault=wait4:when=4:error=EINVAL TARGET_PROCESS

This will make only the 4th call to wait4 fail inside TARGET_PROCESS. To make every 4th call fail instead, use when=4+4 (the first+step form of the directive).

While in most of the examples above the process is launched through Strace, the same technique works for any long-running process that Strace can attach to (with -p PID), such as Nginx, Python, or a JVM.
