A way to minimize errors and make your C code easier to read — Multi-threading

Markus Gothe
HiMinds
11 min read · Sep 17, 2020

In the previous article, I mentioned reentrant functions, which led us in medias res into multi-threading issues and the related concept of multi-thread safe functions. This article will expand on the concept of multi-thread safe functions and take a more in-depth look at POSIX threads and common pitfalls. Unless stated explicitly, I will use the word ‘thread’ below as equivalent to an instance of a thread function.

As with the previous article, this is targeting an experienced audience. Having at least 3 to 5 years of professional (or equivalent) experience and fundamental knowledge of multi-threading concepts is expected from the reader. Keep on reading!

Multi-threading bugs are usually very subtle compared to other bugs and might only be noticeable under specific timing or memory constraints. Hence it’s essential to use reentrant and multi-thread safe functions inside threads. Neither of these two properties implies the other, so it’s best if a function has both, and having one of them is always better than none.

Writing correct multi-threaded code is magnitudes more difficult than writing valid single-threaded code. Don’t use multi-threading as a golden hammer! If you need one thread for driving events, consider using alarm() and a signal handler that re-triggers itself. Just remember to make the signal handler block and unblock the alarm signal, so the code is also guaranteed to be synchronous. It’s a traditional, uncomplicated approach to specific problems. Below is a trivial example of how to implement this construct.

#include <signal.h>
#include <unistd.h>

/* Background "thread" */
void AlarmFunction(int signo) {
    signal(signo, SIG_IGN);        /* Block further alarms while we work */
    … /* Critical code */
    signal(signo, AlarmFunction);  /* Unblock: re-install ourselves */
    alarm(5); /* Re-trigger ourselves in 5 seconds */
}

/* Foreground "thread" */
int main(int argc, char *argv[]) {
    signal(SIGALRM, AlarmFunction);
    alarm(1); /* Trigger the signal handler */
    for (;;)
        pause(); /* Keep the process alive, waiting for signals */
    return 0;
}

Threads, on the other hand, are better when you need to scale up instead of using fork()/execve(), or when you need to implement more complex logic. It is, however, certain that one will run into issues sooner or later, even when following good practices.

I’ve seen horrible issues arise from cut-and-pasting non-reentrant code (and code that’s not multi-thread safe) into functions running in threads. Don’t do that; bugs will sooner or later surface, perhaps not for you but for a third party.

One typical example of these issues is using inet_ntoa() inside a thread. The GNU C library variant is multi-thread safe but not reentrant. The variants in the uClibc and musl C libraries are strictly conforming to the POSIX standard and are not multi-thread safe at all. uClibc provides a reentrant variant called inet_ntoa_r(), but for musl, this is not the case! There is, however, inet_ntop(), which is designed to be reentrant and handles both IPv4 and IPv6 addresses. Better to use it when writing code, even if it’s IPv4-only!

Pthreads — Debugging

Debugging multi-threaded code with classic printouts might work, but most of the time it doesn’t. Especially not if the locking mechanisms are part of the problem.

Personally, I would use a debugger like GDB, or a dynamic analysis tool like Helgrind, especially the latter, to find out whether any data races or deadlocks are happening between multiple thread instances. To use it, make sure Valgrind is installed and then invoke Valgrind with the correct option: ‘valgrind --tool=helgrind’. You can also try using DRD: ‘valgrind --tool=drd’. Both tools are very similar, but DRD uses fewer resources and is not as advanced as Helgrind. It does, however, support detached threads, which Helgrind does not. Learn to use both tools and use the one you find appropriate.

Also, note that most linters are not multi-thread-aware, and hence relying on static code analysis alone will not help to find these kinds of bugs. The best way to debug multi-threaded code is to do it dynamically with a multi-threading capable debugger.

If using traditional printouts, don’t forget that you probably want to place any static variable used in the printouts in the ‘Thread Local Storage’. I will provide an example of how to do this after diving into the basics of the thread stack.

Pthreads — The virtual stack

The exciting world of pthreads, where too many threads might eat up all the virtual memory in your embedded system! How come, one might ask?

Threads use a special stack, usually allocated on the heap by the standard C library, and its default size is left to the implementation. Both uClibc and the GNU C library allocate megabytes per thread stack by default; with ten threads, an application will easily occupy 40–50 MB of virtual memory. The only sane C library I’ve seen in this regard is musl, which allocates 128 kilobytes per thread for the stack. Luckily, this behaviour can be controlled through the pthreads API, and you should do so in any multi-threaded program you write with pthreads.

A simple example contributed by Oracle on how to implement this:

#include <pthread.h>
#include <limits.h>

pthread_attr_t tattr;
pthread_t tid;
int ret;
size_t size = PTHREAD_STACK_MIN + 0x4000;

/* initialized with default attributes */
ret = pthread_attr_init(&tattr);

/* setting the size of the stack also */
ret = pthread_attr_setstacksize(&tattr, size);

/* only size specified in tattr */
ret = pthread_create(&tid, &tattr, start_routine, arg);

Nota bene: You should call pthread_attr_destroy() after pthread_create() and, of course, check the return values.

If someone, however, didn’t read my previous articles and used the alloca() function call in multi-threaded code, the result when applying this memory optimization will be unpredictable at worst, and at best it will yield a segmentation fault. It’s just another reason not to use alloca().

Just remember that it is the virtual memory that is affected. Since the behaviour depends on how the OS deals with the mapping between virtual and physical memory, the size of the thread stack might not be an issue until the code is used on another hardware configuration, or on the same hardware with a different OS or different kernel settings.

Pthreads — TLS

Each thread also holds a TLS (Thread Local Storage); think of it as a virtual stack per thread for global and static data. Like the thread stack, the TLS is implementation-dependent and usually allocated on the heap. The two are very similar, but they have different address spaces and are implemented differently under the hood, serving the same purpose but for different data.

The most common use for the TLS is to avoid unnecessary locking complexity when using global or static variables that don’t need to, or shouldn’t, be shared between the threads; e.g. the ‘errno’ variable is usually allocated on the TLS and is shared between functions within the same thread. The TLS can be used to guarantee that a function or library is thread-safe. It’s also great for fixing buggy code without modifying the behaviour of the code.

To place data on the TLS, there is the convenient ‘__thread’-specifier in GCC. During compile-time, GCC will create some extra ELF attributes, which require support from the ELF loader and the standard C library. One can of course use the more portable but more complex pthread_setspecific()/pthread_getspecific() function pair in tandem with the pthread_key_create()/pthread_key_delete() function pair, especially if coding for portability, e.g. for the QNX operating system. The following example will, however, use the former and simpler method. Let’s look at how to use it to fix a broken instance-specific counter that uses a static variable (which we don’t want to be a local variable); it is super simple, as we can see below.

void *thread(void *arg)
{
    // static unsigned int thread_counter; /* Will have concurrency issues */
    static __thread unsigned int thread_counter;
    while (1) {
        thread_counter++;
        ...
    }
    return NULL;
}

Pthreads — Cancellation and cancellation points

Threads should be used in such a way that there is no need to cancel or kill them explicitly. This seems very easy in theory but proves to be more difficult in practice, partly because of the nature of threads.

It’s not inherently a bad thing to do, but it has some peculiar adverse effects that one must consider. If we kill a process, the resources will be cleaned up (by the OS, if no clean-up handler is installed) and the program will exit. This is not the case when working with threads: the resources might be cleaned up at what’s called a “cancellation point”, usually at specific function calls defined in the POSIX standard, which are known to be at risk of concurrency issues themselves. Remember also that the thread must be set to use deferred cancellation for this to hold. If the thread is instead set to asynchronous cancellation, that’s even riskier, since the thread will be cancelled instantly; pthread_setcanceltype() can be used to switch between the two types, deferred cancellation being the default.

There is also the risk that the thread is in a critical code path and that cancelling it will bring instability to the rest of the program, unlike when killing the whole program. So if you want to stop a thread from running, you had better send a signal to it and handle that signal with some extra care.

So why does the POSIX standard provide the pthread_cancel() function call? Well, to be honest, when the API was developed in the ’90s, it was created so that existing code using different OS-specific APIs could be ported, and there wasn’t really much multi-threading at all back in those days. So my qualified guess is that they focused on a unified API based on existing functionality rather than creating a clean API.

However, when Google created the Bionic C library, they realized that forced thread cancellation would be an issue and deliberately left it out, leaving it to the programmer to solve in another way if needed. One way to achieve the same behaviour in a slightly better way is to use pthread_kill() to send a signal and have a signal handler set a flag, which in turn makes the thread stop execution after making sure resources are given back to the system and that the program is in a stable state.

If you, for some reason, need to cancel a thread, there is at least one thing you can do to prevent havoc. By using the pthread_cleanup_push()/pthread_cleanup_pop() functions, we can tell the thread to clean up after itself when cancelled by calling a clean-up handler. This way, we can make sure that critical resources are given back to the system.

Since the functions above use classic push/pop semantics, we can have more than one clean-up handler per thread. If we have two instances of two different thread functions, one instance might use one or two of the clean-up handlers while the other uses all of them. The push/pop semantics of the function pair above allow us to be innovative when cleaning up after ourselves. Think and grow rich!

Pthreads — Concurrency and locking

Try using as little global and static data as possible and if needed, use the thread stack or the TLS. Avoid sharing data between threads. Don’t define static data in your threads, unless you use the ‘__thread’-specifier mentioned above.

Sometimes the approaches above are not sufficient, especially when working with code from a third party which we cannot rewrite due to time and cost constraints. Then we need to ensure the shared data is accessed by one thread at a time. If not, we might get unexpected behaviours when reading back the data. If accessing memory with pointers, there can be segmentation faults; however, these are more likely when allocating/de-allocating on the heap from asynchronous signal handlers (yes, Virginia, people are doing that, and the mileage may vary). The list of potential pitfalls involves virtually anything that might go wrong.

However, when using locking mechanisms (mutexes and semaphores), there might be issues with deadlocks. Usually, they are easy to identify, but the more complex the functionality sharing the same lock is, the more problems there will be. Use locks as much as needed, but ensure that the code they surround is as small as possible. A typical mistake is taking a lock and then returning from the function in a corner case without unlocking it. Usually (unless you know where this might be happening), the most convenient way to debug this is with one of the debuggers described above, especially if the code-base is large. The best way to avoid it as the code-base expands is to do it correctly from the beginning. In the example below, we will use mutexes and an integer to solve this.

pthread_mutex_t our_lock = PTHREAD_MUTEX_INITIALIZER;
int break_out = 0;

pthread_mutex_lock(&our_lock);
… /* Code that requires locking */
switch (error_code) {
case 1: // Non-critical error
    … /* More code that requires locking */
    break;
default: // Critical errors
    break_out = error_code;
    // return error_code; /* Will cause a deadlock */
    break;
}
pthread_mutex_unlock(&our_lock);

if (break_out != 0)
    return break_out;

Simple, isn’t it? By always taking this approach to implementing locking, we can guarantee that the lock is always unlocked. Let’s now apply the principle of keeping the code under the lock as small as possible to the example.

pthread_mutex_t our_lock = PTHREAD_MUTEX_INITIALIZER;
int break_out = 0;

pthread_mutex_lock(&our_lock);
… /* Code that requires locking */
pthread_mutex_unlock(&our_lock);

switch (error_code) {
case 1: // Non-critical error
    pthread_mutex_lock(&our_lock);
    … /* More code that requires locking */
    pthread_mutex_unlock(&our_lock);
    break;
default: // Critical errors
    break_out = error_code;
    // return error_code; /* Would cause a deadlock, but not anymore */
    break;
}

if (break_out != 0)
    return break_out;

One can easily be under the impression that we don’t need to use the ‘break_out’ variable here, but that’s just a false sense of security. If other people will be using the code or if we need to maintain the code and expand it, we better keep the variable as-is to illustrate that there might be a potential deadlock issue. Try to make a habit of avoiding your own and others’ future mistakes. Hope for the best and prepare for the worst!

Summary and moving forward

What I decided to write in this article is just a cherry-pick of issues originating from multi-threaded code. Writing correct multi-threaded code is, as said, difficult even for the experienced programmer.

We shouldn’t use multi-threading as a “golden hammer”, like many novice programmers tend to do with the programming languages and concepts they know. We’ve all been there. It’s an anti-pattern. We should be more flexible than that and know when to use it and when not to use it to solve a problem. And when solving that problem, we should reduce the possibility of errors and code for readability.

Always avoid using multi-threading for problems that could be solved by other means. If you need one thread for driving events, then you should use the alarm() function together with a synchronous signal handler (since the alarm signal is per se asynchronous, this is a misnomer). Don’t over-complicate things and if you really need to over-complicate, then try to keep it as simple as possible. I stress the importance of this because it is best for all of us.

The main ideas from the article can be summarized as follows:

  • Always think twice when designing a multi-threaded program, shoe-horning multi-threading into an existing single-thread code-base will create lots of issues in the long run as will trying to fix broken multi-threaded code.
  • Don’t reuse code from single-threaded programs in threads unless you are sure the code is thread-safe.
  • Don’t forget to use the ‘__thread’-specifier for thread-specific static or global data.
  • Make sure to implement locking mechanisms correctly to avoid race conditions and deadlocks.
  • Avoid forced thread cancellation! There are no safe cancellation points.

In the next article, I will try to give a brief overview of unusual C constructs and an advanced introduction to security and secure programming.
