Learning C++ Thread Support Library Through System Calls

C++11 thread = native thread + language specifications

Published in The Startup, Jan 18, 2021

In an application, we sometimes need faster processing. One of the many ways to achieve it is to do several things at the same time, which is called parallelism. This can be done by creating multiple processes or multiple threads; either one can be thought of as an execution unit that the operating system scheduler assigns to a hardware processor. Even if a system has only a limited number of processors, we can still create more threads or processes than processors, because many things in an application are IO-bound. IO operations are slow, and the thread that issued one has to wait until it finishes. Instead of sitting idle during that wait, the CPU can run another thread.

Both processes and threads run independently. But multiple threads in a process share the same memory space, whereas separate processes each have their own address space and do not share memory by default.

Threads are often described as lightweight processes.

Like a process, each thread runs independently of other threads. Each thread has its own stack and flow of control.
In a process, all threads share the same global memory, heap and code segment. That's why sharing data between threads is easy, while sharing data between two processes takes extra work.
Process creation is expensive. When a process is created, various attributes like page tables and file descriptor tables need to be duplicated. All threads belonging to a process share page tables and file descriptors. So thread creation is cheap and fast (often cited as roughly 10 times faster on a typical Linux system).

Every C/C++ program has at least one thread, called the main thread. Additional threads can then be created to branch out into multiple flows of control.

C++11 provides the std::thread class to create threads. std::thread is a thin wrapper around the platform threading library (pthreads on Linux, Win32 threads on Windows) plus some standard behavior guaranteed by the C++ standard.

Let’s create our first thread in C++.

#include <iostream>
#include <thread>
#include <chrono>

void f(int n)
{
  for (int i = 0; i < 5; ++i) {
    std::cout << "Thread 1 executing\n";
    ++n;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}

int main()
{
  int n = 0;
  std::thread t(f, n + 1);
  t.join();
  return 0;
}

Output

Thread 1 executing
Thread 1 executing
Thread 1 executing
Thread 1 executing
Thread 1 executing

I took this example from https://en.cppreference.com/w/cpp/thread/thread/thread.

Here t is a thread object, constructed from a function and an argument to pass to that function. The new thread starts executing as soon as it is created. We then wait with join() until the thread has finished, and exit the program.

On Linux, std::thread is a wrapper around the pthread library, and pthread is a C library that creates threads by making system calls. So we can trace all of those calls with the strace utility.

strace -f ./main

In this article, I will discuss a few of the system calls that make thread creation possible.

clone()

The clone() system call creates a new process. On success, it returns the child's process ID. Another system call, fork(), is also used to create a process. Both create processes, but clone() lets us create lightweight processes by choosing what the child shares with its parent.

With fork(), the child continues from the point of the call. With the clone() library wrapper, the child instead starts by calling the function passed in the argument list.
The cloned child process terminates either when that function returns or when the child process makes a call to exit().

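In the strace output, we can see a clone() system call (with glibc 2.34 and later, the trace may show clone3() instead, which carries the same information in a struct). The first argument, as strace prints it, is the address of the child's stack.

The 2nd argument is a bitwise OR of flags.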

CLONE_VM:- the calling process and the child process run in the same address space. Memory writes made by either one are visible to the other, and so is any memory mapping or unmapping done with mmap() or munmap().

CLONE_FS:- the calling process and the child process share the same file system information: umask, current directory and root directory. If the child changes any of this information, the change is visible to the parent.

CLONE_FILES:- the calling process and the child process share the same file descriptor table. This means that file descriptor allocation or deallocation (open(), close(), dup(), pipe(), socket(), and so on) in either process is visible in the other process.

CLONE_SIGHAND:- the calling process and the child process share the same table of signal handlers. If either process changes a signal disposition by using sigaction() or signal(), the change is visible to the other process.

CLONE_THREAD:- the child process is placed in the same thread group as the calling process. All processes in a thread group share the same process ID. Each thread in a thread group has a unique thread ID. From now on, I will say thread ID instead of child process ID.

CLONE_SETTLS:- tls (4th argument) is the new TLS (Thread Local Storage) descriptor.

CLONE_PARENT_SETTID:- store the child thread ID at the location parent_tid (3rd argument). Although clone() also returns the child thread ID, writing it to a memory location is more reliable: the store completes before clone() returns control to user space, whereas the return value can only be assigned after the call returns. The child thread may exit, and the handler for its termination signal may run, before that assignment happens.

CLONE_CHILD_CLEARTID:- clear the child thread ID at the location child_tidptr (5th argument) in child memory when the child exits, and do a wakeup on the futex at that address. This flag allows the implementation of thread join.

The other flags are not so important. CLONE_VM, CLONE_FS, CLONE_FILES, and CLONE_SIGHAND make thread creation lightweight. CLONE_THREAD, CLONE_SETTLS, and CLONE_PARENT_SETTID provide thread-related functionality. CLONE_CHILD_CLEARTID makes joining a thread more robust.

set_robust_list()

The next system call is set_robust_list(). It is called by the child thread (ID 429227).

The purpose of the robust futex list is to ensure that if a thread accidentally fails to unlock a futex before terminating, another thread that is waiting on that futex is notified that the former owner has died. A thread informs the kernel of the location of its robust futex list with the set_robust_list() system call, which takes the head of the list as an argument.

futex()

The third system call is futex().

This system call provides a method for waiting until a certain condition becomes true. Here the operation is FUTEX_WAIT, which checks that the futex word at address uaddr (the 1st argument) still contains the expected value val (the 3rd argument) and, if so, sleeps until a FUTEX_WAKE operation is performed on that futex word. In this case, the main thread sleeps for as long as the value at memory address 0x7f417b62d9d0 is still 429227, the child's thread ID. This is how t.join() waits for the child to finish.

exit()

After the thread finishes execution, it exits. Recall that the thread was created with the CLONE_CHILD_CLEARTID flag. This flag asks the kernel to clear the thread ID stored at the child_tidptr address (in this case 0x7fcbcbdf39d0) and do a FUTEX_WAKE on the futex at that address. Once the child thread has exited, the main thread sleeping on that futex is woken up.

Thanks for reading!

References:-

clone(2) - Linux man page (linux.die.net)

futex(2) - Linux manual page (man7.org)