Exploring the Practical Use of Threads with Starbucks

oceanO
16 min read · Sep 15, 2023


Threads aren’t just theoretical constructs — they have tangible benefits that can be witnessed in real-world applications. For instance, if you’ve ever used a word processor, you’ve benefited from multi-threading. Ever noticed how you can type continuously while the software checks for spelling mistakes? This is multi-threading in action.

Why Use Threads?

  • Performance Boost: By dividing tasks among multiple threads, you can often make an application noticeably faster. Any serious application is multithreaded; in the real world, performance is crucial.

“Slow”, in the real world, means “not working”.

  • Resource Sharing: Threads within a process share the same address space (unlike processes), allowing for efficient data access and exchange. IPC is not that difficult for threads; here the tricky part is synchronization! We will see.
Process address space

Dive Into a Real-life Scenario

Imagine you are running a Starbucks coffee shop (your program). A customer (a task) walks in and orders a latte. If you were a single-threaded coffee shop, you’d have to make the latte, hand it to the customer, and then take the next order.

Now imagine if you had multiple baristas (threads) working simultaneously. While one makes the latte, the other could be taking another order, preparing a pastry, or processing a payment. This parallel processing ensures customers are served faster, enhancing their experience.

“A thread is a micro-process inside the process itself, a worker in an assembly line”

Henry Ford understood threads.

How to Implement this in C?

Setting up Pthreads

  • Including the necessary header: Before you start, make sure to include the Pthreads header in your code. P stands for POSIX
#include <pthread.h>
  • Compiling: When compiling your C code, make sure to link against the pthread library with -lpthread. For instance, with the GCC compiler, your command might look something like this:
gcc your_source_file.c -o output_file -lpthread  

/* macOS uses the POSIX threads (pthreads) library as part of its libc,
so you don't need to explicitly link against another library to use pthread
functions. This is why your program compiles and runs correctly on macOS
even without -lpthread. However, if you were on a different platform
(like certain Linux distributions), you might need to include -lpthread
to link correctly. */

Creating a Thread

  • Threads run functions (e.g. make_latte() or serve_customer()). So, before creating a thread, define the function it will execute. If a thread is just a little program,

“we can consider the given function as the thread’s main function, its entry point!”

void *make_latte(void *arg)
{
    // Your function logic here
    return (NULL);
}
  • To create a thread:
pthread_t my_thread;  
int status = pthread_create(&my_thread, NULL, make_latte, NULL);

When working with threads in programming, each thread gets a unique identification. This identification is crucial when it comes to managing threads.

  • pthread_t represents the ID of a thread in UNIX-like systems.
  • At its core, it’s typically an unsigned long (for many systems).
#include <stdio.h>
#include <pthread.h>

void *print_thread_id(void *tid)
{
    printf("Thread ID: %lu\n", (unsigned long)*(pthread_t *)tid);
    return NULL;
}

int main(void)
{
    pthread_t thread1, thread2;

    pthread_create(&thread1, NULL, print_thread_id, &thread1);
    pthread_create(&thread2, NULL, print_thread_id, &thread2);

    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    return 0;
}

This code creates two threads, and each thread prints its ID.

Functions like pthread_create and pthread_join use this ID for thread management.

Inside a thread, you can obtain its ID using the pthread_self function.
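
As a quick illustration of pthread_self (a minimal sketch of mine, not from the original article; who_am_i is just an illustrative name), here a thread prints its own ID instead of receiving it from main:

#include <stdio.h>
#include <pthread.h>

// Each worker asks for its own badge with pthread_self().
void *who_am_i(void *arg)
{
    (void)arg;
    // The cast is for printing only; pthread_t is formally opaque (see the caveat below).
    printf("My own ID: %lu\n", (unsigned long)pthread_self());
    return (NULL);
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, who_am_i, NULL);
    pthread_join(t, NULL);
    return (0);
}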

Important Caveats

Even though it may often appear as an unsigned long, pthread_t should be treated as an opaque data type. It means that while you can interact with it, you shouldn't make assumptions about its inner workings.

Why? Because its actual type might vary across implementations. It could be an unsigned long, an int, or even a structure.
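
Because pthread_t is opaque, the portable way to compare two thread IDs is pthread_equal(), not ==. A tiny hedged sketch (my own, not the article's):

#include <stdio.h>
#include <pthread.h>

int main(void)
{
    pthread_t me = pthread_self();

    // pthread_equal() returns non-zero when the two IDs name the same thread.
    if (pthread_equal(me, pthread_self()))
        printf("Same thread, as expected.\n");
    return (0);
}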

pthread_join(), the wait() function for threads

A good piece of code is worth 1000 words.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *make_coffee(void *arg)
{
    printf("Making coffee...\n");
    sleep(2);
    printf("Coffee ready!\n");
    return (NULL);
}

void *make_pastry(void *arg)
{
    printf("Baking pastry...\n");
    sleep(3);
    printf("Pastry ready!\n");
    return (NULL);
}

int main(void)
{
    pthread_t coffee_thread;
    pthread_t pastry_thread;

    pthread_create(&coffee_thread, NULL, make_coffee, NULL);
    pthread_create(&pastry_thread, NULL, make_pastry, NULL);

    // Before handing the order to the customer, I need to WAIT for both threads
    pthread_join(coffee_thread, NULL);
    pthread_join(pastry_thread, NULL);

    printf("\n\n\tThx for coming to Starbucks!\n"
        "\there's the ☕️ 'n 🥐\n\n\n");
    return (0);
}

Now, if you’ve dealt with processes, think of pthread_join as the thread counterpart of the wait() function.

pthread_join makes the calling thread wait for the given thread to finish before proceeding.

pthread_join(my_thread, NULL);
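
By the way, that second argument of pthread_join is not just decoration: it can collect the thread’s return value, much like wait() can collect an exit status. A minimal hedged sketch (latte_price and the returned pointer are my illustration, not from the article):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

// The thread hands back a heap-allocated result through its void * return value.
void *latte_price(void *arg)
{
    int *price = malloc(sizeof(*price));

    (void)arg;
    if (price)
        *price = 5;
    return (price);
}

int main(void)
{
    pthread_t t;
    void *result;

    pthread_create(&t, NULL, latte_price, NULL);
    pthread_join(t, &result);   // blocks until the thread ends, then grabs its return value
    if (result)
    {
        printf("Latte costs %d\n", *(int *)result);
        free(result);
    }
    return (0);
}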

Key Takeaways:

  • Threads are like workers in a coffee shop (process), serving multiple customers (tasks) simultaneously.
  • They allow for faster and smoother (if handled properly) execution of programs. Think of a bar with only one barman doing everything: getting your cappuccino would be a nightmare.
  • Always join your threads before the main thread exits to ensure proper termination. We want the cook to serve the customer, but first he has to wait for the waiter’s order!

Processes vs Threads

At a cursory glance, they seem quite similar. Both allow for code execution in some manner, but are they truly interchangeable?

NO.

..and understanding their distinct characteristics is crucial.

What is a Process?

A process is an independent, self-contained unit of execution consisting of its own address space, code, data, and system resources.

Processes have their own dedicated memory, which means they don’t share their variables. Each process runs in its own memory sandbox.

Try to guess the value of x in both the parent and the child.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int x = 42;

int main(void)
{
    pid_t pid = fork(); // Creates a new process

    // child
    if (0 == pid)
    {
        x++;
        printf("Child process: Value of x = %d\n", x);
    }
    // parent
    else
    {
        wait(NULL); // Wait for the child to finish
        printf("Parent process: Value of x = %d\n", x);
    }
    return (0);
}

now..how about Threads?

They share the same memory space as the parent process, which means they share variables. This shared memory concept is where threads differ significantly from processes.

Multiple threads inside a process can execute different parts of the program.

For instance:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

int x = 42;

void *thread_task(void *arg)
{
    ++x;
    printf("Thread: Value of x = %d\n", x);
    return NULL;
}

int main(void)
{
    pthread_t thread1, thread2;

    pthread_create(&thread1, NULL, thread_task, NULL);
    pthread_create(&thread2, NULL, thread_task, NULL);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    return 0;
}

In the above code, both threads modify the shared variable x. Since they share the same memory, any change made by one thread is visible to every thread in the process. Run it: x ends up at 44, not 43. You will also notice that the output order varies from run to run.

The two previous programs demonstrate one of the many important differences between processes and threads.

Distinguishing Features

Memory:

  • Processes have separate memory spaces.
  • Threads share the same memory space as the parent process.

Mental hook: Process is the Starbucks building, thread is a worker inside the Starbucks Cafe.

Process IDs:

  • Every process has a unique Process ID (PID).
  • All threads within a process share the same PID.

Mental hook: every Starbucks cafe has its own address (PID). All the workers live inside the same Starbucks (so they all share the same PID), but each has a personal ID retrieved with pthread_self(), which we can think of as their badge.

Every thread has a different BADGE; they all work in the same process (same PID).
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

// This function represents the routine (task) our workers (threads) will execute.
void *worker_routine(void *arg)
{
    // All workers (threads) inside the factory (process) will print the same address (PID)
    printf("Worker ID->%lu: My factory's address (PID) is %d\n",
        (unsigned long)pthread_self(), getpid());
    return (NULL);
}

int main(void)
{
    pthread_t worker1;
    pthread_t worker2;

    // Let's create two workers (threads) in the same factory (process).
    // Hiring (creating) the first worker.
    if (pthread_create(&worker1, NULL, worker_routine, NULL) != 0)
        return (1);
    // Hiring (creating) the second worker.
    if (pthread_create(&worker2, NULL, worker_routine, NULL) != 0)
        return (2);
    // Wait for both workers to finish their shift (tasks).
    pthread_join(worker1, NULL);
    pthread_join(worker2, NULL);
    return (0);
}

Creation & Termination:

  • Processes require more overhead during creation and termination (fork and wait), making them heavier.
  • Threads are lightweight, making their creation and termination faster in comparison.

Mental hook: fork is like opening a whole new Starbucks cafe; creating a thread is like hiring a new worker. A rough timing sketch follows below.
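
If you want to feel this weight difference yourself, here is a rough, hedged benchmark sketch (mine, not the article's; absolute numbers vary a lot by OS and hardware) that times N fork()/wait() pairs against N pthread_create()/pthread_join() pairs:

#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/wait.h>
#include <pthread.h>

#define N 200

// A worker that does nothing, so we only measure creation and teardown cost.
void *noop(void *arg) { (void)arg; return (NULL); }

static double seconds_since(struct timespec start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return ((now.tv_sec - start.tv_sec) + (now.tv_nsec - start.tv_nsec) / 1e9);
}

int main(void)
{
    struct timespec start;
    pthread_t t;
    pid_t pid;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < N; i++)
    {
        pid = fork();
        if (pid == 0)
            _exit(0);                    // child leaves immediately
        waitpid(pid, NULL, 0);
    }
    printf("%d fork/wait pairs:           %.4f s\n", N, seconds_since(start));

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < N; i++)
    {
        pthread_create(&t, NULL, noop, NULL);
        pthread_join(t, NULL);
    }
    printf("%d create/join thread pairs:  %.4f s\n", N, seconds_since(start));
    return (0);
}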

Communication:

  • Inter-Process Communication (IPC) is complex due to separate memory spaces.
  • Threads can easily communicate with each other since they share memory.

Mental hook: processes are like different Starbucks cafes scattered around the world. All the locations communicate, but each has its own responsibilities and local resources. Threads are like the team within one specific Starbucks; they collaborate closely, share resources, and aim for a common goal. The Starbucks cafe is a process, filled with thread workers. Once upon a time there was only one Starbucks cafe… that forked many, many times, making someone a crazy billionaire. (A small pipe sketch follows below.)
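
To make the contrast concrete, here is a hedged sketch of my own (not from the article) showing the extra ceremony two processes need just to pass one number from child to parent through a pipe, something two threads would do by simply writing a shared variable:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    int drinks = 0;
    pid_t pid;

    if (pipe(fd) == -1)              // the shared "phone line" must be set up explicitly
        return (1);
    pid = fork();
    if (pid == 0)
    {
        int served = 43;

        close(fd[0]);
        write(fd[1], &served, sizeof(served));   // child sends its count
        close(fd[1]);
        _exit(0);
    }
    close(fd[1]);
    read(fd[0], &drinks, sizeof(drinks));        // parent receives it
    close(fd[0]);
    wait(NULL);
    printf("Parent received: %d drinks\n", drinks);
    return (0);
}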

Considerations:

While threads offer speed and shared-memory advantages, they come with the challenge of potential data corruption if not handled properly. Indeed, the division of labor is tricky even in the real world.

One must be cautious about simultaneous operations, especially when multiple threads are accessing and modifying the same memory space. Mutual exclusion (Mutexes), semaphores, and other synchronization techniques are essential to avoid “race conditions.”

Understanding Race Conditions

Race conditions are among the most elusive bugs in systems that rely on concurrency.

Imagine that in our Starbucks there’s a shared counter (or “ledger”) that keeps track of the total drinks served, and our greedy boss wants every single drink recorded to avoid money leaks. In the real world, like in code, multiple waiters are taking and serving drink orders simultaneously.

Each waiter follows these steps:

  • Check the register to see how many drinks have been served so far.
  • Make a mental note of that number (save it in its brain_CPU register).
  • Serve the drinks to the client
  • Update the register by adding one or more to the previous number

Dropping the analogy: this is the real code that updates a variable.

One “simple” operation translates into several assembly instructions.
.section __TEXT,__text,regular,pure_instructions  # This declares a section named __TEXT. This is where executable instructions are stored.
.build_version macos, 10, 15 sdk_version 10, 15, 6 # Specifies the version of macOS the code was built for and the SDK version used.

.globl _main # This makes the '_main' symbol global, which means it can be accessed outside this file.
.p2align 4, 0x90 # Aligns the next instruction to a boundary. In this case, it aligns to a 16-byte boundary and fills with 0x90 (NOPs).

_main: # This labels the start of the main function.
.cfi_startproc # This is a directive for the debugger. It marks the beginning of a function.

# Prologue of the function, used to set up the stack frame.
pushq %rbp # Push the value of rbp register onto the stack. It saves the old base pointer.
.cfi_def_cfa_offset 16 # Directives for debugger, defines the current CFA (Canonical Frame Address) offset.
.cfi_offset %rbp, -16 # Tells the debugger that the value of rbp is saved at offset -16 from the CFA.
movq %rsp, %rbp # Set rbp to the current stack pointer value. This establishes a new base pointer for this function.
.cfi_def_cfa_register %rbp # Tells the debugger that rbp is now the base register.

# Actual code of the function begins.
movl $0, -4(%rbp) # Move the immediate value 0 into the memory location 4 bytes before where rbp points.
movl $42, -8(%rbp) # Move the immediate value 42 into the memory location 8 bytes before where rbp points.

# 🚨 this is the code
movl -8(%rbp), %eax # Load the value from 8 bytes before rbp (which is 42) into the eax register.
addl $1, %eax # Add 1 to the value in eax. Now, eax will contain 43.
movl %eax, -8(%rbp) # Store the value in eax (43) into the memory location 8 bytes before rbp.
# 🚨 this is the code

movl -8(%rbp), %eax # Load the value from 8 bytes before rbp (which is now 43) into the eax register.

# Epilogue of the function, used to restore the previous stack frame.
popq %rbp # Pop the top of the stack into rbp, restoring the old base pointer.
retq # Return from the function.

.cfi_endproc # This is a directive for the debugger. It marks the end of a function.

So.. behind a simple “++ledger;” there are several assembly instructions, and those are the real thing to take care of. The scheduler can switch to another thread in between these instructions!
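
If you want to reproduce a listing like the one above yourself, here is a hedged sketch: put the increment in a tiny file (ledger.c is just an illustrative name) and ask the compiler for assembly instead of an executable with cc -S.

/* ledger.c : compile with  cc -S -O0 ledger.c  to get ledger.s */
int main(void)
{
    int ledger = 42;

    ++ledger;        /* the LOAD / ADD / STORE trio hides behind this line */
    return (ledger);
}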

🤖 The 🚨-marked assembly instructions above are the machine-level equivalent of the waiter’s steps:

  • Check the register to see how many drinks have been served so far.
  • Make a mental note of that number (save it in his brain_CPU register).
  • Serve a drink.
  • Update the register by adding one to the previous number.

This can create… 🥁🥁🥁… RACE CONDITIONS. What..?

The thing is that waiter_1 can perform these operations:

  • Check the register to see how many drinks have been served so far.
  • Make a mental note of that number (save it in his brain_CPU register).

…and at that exact same moment, waiter_2 does the same:

  • Check the register to see how many drinks have been served so far.
  • Make a mental note of that number (save it in his brain_CPU register).

They will both save the same number in their mental notes! When the two waiters come back, they will each write old value + drinks served.

Let’s pretend they both read 42 and each served 1 drink. They will both update the register to 43.

One drink is lost, and money leaks for our boss due to a race condition.

This gets messy, and exponentially messier, as the number of waiters (threads) or servings increases.

Let’s represent this in code:

#include <stdio.h>
#include <pthread.h>

#define RACE_CONDITION 100000

// 0 by default
int g_drinks_served;

void *serve_drink(void *arg)
{
    int i;

    i = 0;
    while (i++ < RACE_CONDITION)
    {
        // These are 3 machine instructions, LOAD-ADD-STORE,
        // as you saw in the assembly; our waiter is slow
        g_drinks_served++;

        /*
        The scheduler can switch threads whenever it wants:
        1) LOAD  <-
        2) ADD   <-
        3) STORE <-
        The RACE_CONDITION value is proportional to the probability of it happening.
        Try to run the code with different values.
        */
    }
    return (NULL);
}

int main(void)
{
    pthread_t waiter1;
    pthread_t waiter2;

    pthread_create(&waiter1, NULL, serve_drink, NULL);
    pthread_create(&waiter2, NULL, serve_drink, NULL);
    pthread_join(waiter1, NULL);
    pthread_join(waiter2, NULL);
    printf("Total drinks served: %d\n", g_drinks_served);
    return (0);
}
COMPILATION to see the race: cc -fsanitize=thread test.c

In an ideal world, if each waiter serves 100,000 drinks, the total should be 200,000. But because of race conditions, you’ll often find the number to be less.

TL;DR

If both waiters check the register at the same time, they might both

  • see the same number
  • serve a drink
  • both update the register to the same “next” value (i.e., they both saw 10 drinks served, served a drink, then both set the register to 11, even though 2 drinks were actually served).

The Mutex Solution

A mutex, short for MUTual EXclusion, acts like a lock 🔒. It ensures that only one thread (waiter) can access the critical section (our register, the code that leads to the race condition) at a time.

Flipping a coin or throwing a die produces a “mutually exclusive” outcome. It can be exclusively heads or tails, not both… until we talk quantum mechanics, where it can be 😂.

#include <stdio.h>
#include <pthread.h>

#define RACE_CONDITION 100000

int g_drinks_served;
// Global mutex for synchronizing access to g_drinks_served
// This is a struct; think of a 🔒 that can be closed or open.
pthread_mutex_t mutex;

void *serve_drink(void *arg)
{
    int i;

    i = 0;
    while (i++ < RACE_CONDITION)
    {
        // Locking the mutex before updating the global variable
        // 🔒 closing the lock
        // until this waiter has done his stuff, nobody else writes here
        pthread_mutex_lock(&mutex);
        /*
        🚨 critical section 🚨
        "scheduler, just let me do my job here"
        1) LOAD  <-
        2) ADD   <-
        3) STORE <-
        */
        g_drinks_served++;

        // 🔓 opening
        pthread_mutex_unlock(&mutex);
    }
    return (NULL);
}

int main(void)
{
    pthread_t waiter1;
    pthread_t waiter2;

    // Initialize the mutex
    // setting its state to open 🔓, for example
    if (pthread_mutex_init(&mutex, NULL) != 0)
    {
        printf("Mutex initialization failed!\n");
        return 1;
    }
    pthread_create(&waiter1, NULL, serve_drink, NULL);
    pthread_create(&waiter2, NULL, serve_drink, NULL);

    pthread_join(waiter1, NULL);
    pthread_join(waiter2, NULL);
    // Destroy the mutex after its use
    // it is a struct to clean up, no LEAKS
    pthread_mutex_destroy(&mutex);
    printf("Total drinks served: %d\n", g_drinks_served);
    return (0);
}
cc -fsanitize=thread test.c

This program creates two threads that increment a counter (our drink register). Thanks to the mutex, we can ensure that at any time, only one waiter is incrementing the counter, thus avoiding a race condition.

Easy Peasy Analogy:

Imagine the coffee shop owner decided to put the register g_drinks_served in a little room to avoid the race-condition mess.

When a waiter wants to enter, he has to grab the key, if available (pthread_mutex_lock(&mutex)). If the room is already occupied (locked), there’s no key available, and he has to wait outside until the previous waiter comes out and places the key back in its designated spot.

Upon entering, the waiter swiftly closes the door behind him, making sure nobody else can come in, and puts the key in his pocket (indeed, mutexes allow ownership!). Then he quickly serves the drinks, goes back into the room, and adds them to the register (g_drinks_served++). After he’s done, he unlocks and opens the door, stepping out and placing the key back for the next waiter (pthread_mutex_unlock(&mutex)).

  • pthread_mutex_lock: the waiter locks the room and does his stuff.
  • pthread_mutex_unlock: the waiter exits and puts the key back for the next waiter.

A Caveat!

While mutexes solve race conditions, they come at a cost: performance. Locking and unlocking operations take time; just try running the code. So while they’re beneficial, your coffee service will definitely slow down.

What is Busy Waiting?

A rudimentary lock implementation might simply busy-wait (spin) for simplicity. Busy-waiting is like a waiter continuously scrolling his Insta feed for notifications while standing outside the drinks-register room. Such a waste of time, right? You are paying him for what?!

In real-world code, you want to avoid busy-waiting because it wastes CPU cycles. Instead, threads that can’t acquire the mutex (enter the drinks-register room) are put to sleep (those minutes won’t be paid) and woken up when the mutex becomes available. This usually involves more advanced, OS-specific mechanisms.

Writing a correct and efficient mutex from scratch requires a deep understanding of concurrency, the specific hardware, and the OS’s capabilities.
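
To make “busy-waiting” concrete, here is a hedged sketch of a spinning lock built on C11 atomics (my own illustration, assuming a C11 compiler; not something to ship, since a real mutex puts the waiter to sleep instead of letting him spin):

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#define ROUNDS 100000

atomic_flag door = ATOMIC_FLAG_INIT;   // the "key": set = taken, clear = free
int g_drinks_served;

void *serve_drink(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ROUNDS; i++)
    {
        // Busy-wait: keep "checking the feed" until the flag is free. Wastes CPU.
        while (atomic_flag_test_and_set(&door))
            ;
        g_drinks_served++;             // critical section
        atomic_flag_clear(&door);      // give the key back
    }
    return (NULL);
}

int main(void)
{
    pthread_t w1;
    pthread_t w2;

    pthread_create(&w1, NULL, serve_drink, NULL);
    pthread_create(&w2, NULL, serve_drink, NULL);
    pthread_join(w1, NULL);
    pthread_join(w2, NULL);
    printf("Total drinks served: %d\n", g_drinks_served);
    return (0);
}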

Does Mutex make the Critical Section Atomic?

Using a mutex ensures that the entire critical section is executed by one thread at a time, making it appear atomic from the perspective of other threads. It ensures that the operations in the critical section, taken together, are not interrupted by other threads.

However, it’s essential to understand that while the operations inside the critical section appear atomic to other threads, they are not executed in a single, uninterruptible step at the machine instruction level. Instead, the mutex ensures mutual exclusion, which gives the appearance of atomicity at a higher level.

In conclusion, while a mutex doesn’t make the operations inside the critical section atomic in the strictest sense (at the machine instruction level), it ensures that they are executed in a way that appears atomic to other threads, thereby preventing concurrent access and ensuring data consistency.
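
As a side note (my addition, not the article’s), when the critical section really is just a single counter update, C11 also offers operations that are atomic at the machine-instruction level. A hedged sketch of the same drinks counter using atomic_fetch_add instead of a mutex:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

#define ROUNDS 100000

// An atomic counter: each increment is indivisible at the hardware level.
atomic_int g_drinks_served;

void *serve_drink(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ROUNDS; i++)
        atomic_fetch_add(&g_drinks_served, 1);   // no mutex needed for a lone counter
    return (NULL);
}

int main(void)
{
    pthread_t w1;
    pthread_t w2;

    pthread_create(&w1, NULL, serve_drink, NULL);
    pthread_create(&w2, NULL, serve_drink, NULL);
    pthread_join(w1, NULL);
    pthread_join(w2, NULL);
    printf("Total drinks served: %d\n", atomic_load(&g_drinks_served));
    return (0);
}

This only covers a single operation, though; as soon as the critical section touches more than one piece of state, you are back to a mutex.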

TL;DR

Mutex (Mutual Exclusion):

  • Ensures that only one thread (waiter) can access a critical section (drinks register room) at a time.
  • Has two states: locked and unlocked.
  • Provides ownership: Only the thread that locked the mutex can unlock it (the waiter keeps the key while doing his job); see the sketch below.
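
That ownership rule can even be enforced at runtime. A hedged sketch (my example, assuming an implementation that supports the POSIX error-checking mutex type): with PTHREAD_MUTEX_ERRORCHECK, a thread that tries to return a key it never took gets an error instead of silently corrupting the lock.

#include <stdio.h>
#include <string.h>
#include <pthread.h>

pthread_mutex_t mutex;

// This thread tries to return a key it never took.
void *sneaky_waiter(void *arg)
{
    int err;

    (void)arg;
    err = pthread_mutex_unlock(&mutex);   // we never locked it in this thread
    if (err != 0)
        printf("Unlock refused: %s\n", strerror(err));
    return (NULL);
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&mutex);           // main holds the key
    pthread_create(&t, NULL, sneaky_waiter, NULL);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return (0);
}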

Simplifying: Using Static Initializers

The Traditional Approach

Traditionally, when working with the PTHREAD API, initializing objects like a mutex involves a few steps:

  1. Declare the mutex.
  2. Call the initialization function.
  3. Finally, when done, you’d invoke the destruction function.

For instance, initializing a mutex would typically look something like this:

#include <pthread.h>

int main(void)
{
    pthread_mutex_t my_mutex;

    pthread_mutex_init(&my_mutex, NULL);
    // ... other code using the mutex
    pthread_mutex_destroy(&my_mutex);
    return 0;
}

A Quicker Way: Static Initializers

Back to Starbucks. You know that while you can customize your drink with numerous options, sometimes you just want a standard black coffee — quick and straightforward. Similarly, for cases when you don’t need any special attributes or customization for your PTHREAD objects, you can use what are called “static initializers.”

“Just give me a black static coffee!”

A static initializer is akin to ordering that default black coffee. It’s a macro that provides default values for the object, letting you bypass the explicit initialization call.

Let’s see how we can use this in our mutex example:

#include <pthread.h>

int main(void)
{
    pthread_mutex_t my_mutex = PTHREAD_MUTEX_INITIALIZER;

    // ... other code using the mutex
    // While it's good practice, you don't necessarily have to call destroy in this case
    return 0;
}

The Advantages

  • Simplicity: No need to call the initialization function.
  • Speed: It’s faster because it’s just a direct assignment.
  • Fewer lines of code: The code is more concise and easier to read.

The Caveats

However, like that standard Starbucks coffee, it may not suit all tastes:

Not Suitable for Arrays: Say you wanted an array of mutexes. You can’t use a static initializer for the entire array.

#include <pthread.h>
#include <stdio.h>

#define ARRAY_SIZE 5

int main(void)
{
    // 🚨 WRONG approach for initializing an array of mutexes
    pthread_mutex_t mutexes[ARRAY_SIZE] = PTHREAD_MUTEX_INITIALIZER;

    // 🚨 this is also wrong
    // PTHREAD_MUTEX_INITIALIZER is for static initialization at compile time.
    // For dynamic initialization at run time, use pthread_mutex_init() instead.
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        mutexes[i] = PTHREAD_MUTEX_INITIALIZER;
    }
    return 0;
}

Static Initializer Misuse: The line with PTHREAD_MUTEX_INITIALIZER is trying to initialize the entire array of mutexes with a single static initializer. But this isn't how it works. The static initializer is meant for individual mutexes, not arrays.

Here’s the correct way to do it.

#include <pthread.h>

#define THREAD_NUM 10

int main(void)
{
    pthread_mutex_t mutexes[THREAD_NUM];

    for (int i = 0; i < THREAD_NUM; i++)
    {
        pthread_mutex_init(&mutexes[i], NULL);
    }

    // ... your code ...

    // Optionally, destroy the mutexes once done
    for (int i = 0; i < THREAD_NUM; i++)
    {
        pthread_mutex_destroy(&mutexes[i]);
    }

    return 0;
}
  • We declare an array of mutexes called mutexes.
  • We then use a for loop to initialize each individual mutex in the array using pthread_mutex_init.
  • After the mutexes have been used and are no longer needed, another for loop is used to destroy each mutex.

Now, think of this like ordering a Starbucks coffee. If you have five friends, you can’t order one cup and expect it to magically duplicate into five separate coffees. Each friend needs their own order. Similarly, each mutex in an array requires individual initialization.

Remember: In coding (and coffee), the details matter!

Concluding Thoughts

While static initializers provide a simplified way of initializing PTHREAD objects, it’s essential to know when to use them. Think of them as tools in your toolkit, perfect for certain situations but not one-size-fits-all. And just like that standard black coffee, sometimes simplicity is all you need to get the job done.
