Process vs. thread: which one can handle tasks better?

Published in Reverse Engineering, Dec 26, 2019

Photo by Jaime Dantas — Minas Gerais — Brazil

It’s no wonder that so many programmers and engineers struggle when it comes to deciding whether to use threads or processes in their applications. For this reason, I’ve decided to write a post going through the main differences and advantages of each of these software features. To make the theory less boring, I’ll use a real-world example where we’ll be able to see how threads and processes work up close.

There are two types of concurrent programming in computing: multithreading and multiprocessing.

At any given moment, a thread (or a single-threaded process) runs on one core of a multicore processor. Common computer processors nowadays have anywhere from 2 to hundreds of cores. Your personal computer probably has 4 cores, and if you’re trying to develop software that takes full advantage of each of those cores, you’ll need to understand how multithreading and multiprocessing work.

Processes

Each process can have multiple threads. We usually use processes when we need to perform an extremely heavy task. The reason is that creating a process carries considerable overhead, which gives its counterpart, the thread, a significant runtime advantage in some cases.

Another point to be made is that communication among processes is relatively costly, since by default they don’t share memory and must exchange data through files, pipes, or other inter-process mechanisms.
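To make the memory separation concrete, here is a minimal POSIX-only sketch. The function name parentValueAfterChildWrite is hypothetical and chosen only for illustration. The child changes its own copy of a variable, and the parent's copy stays untouched.

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Returns the parent's copy of `value` after a child process has modified
// its own copy. Hypothetical helper, POSIX only.
int parentValueAfterChildWrite() {
    int value = 1;
    pid_t pid = fork();
    if (pid < 0) {
        return -1;             // fork failed
    }
    if (pid == 0) {
        value = 42;            // changes only the child's private copy
        _exit(0);              // leave the child without running parent code
    }
    int status = 0;
    waitpid(pid, &status, 0);  // wait until the child has terminated
    return value;              // the parent still sees 1
}
```

Calling parentValueAfterChildWrite() in the parent returns 1, which is why the process-based solution later in this post passes its results through files.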

Threads

Just off the shelf, a thread can do anything a process can do. The main difference here is memory sharing. Threads are usually used for performing small tasks.

When it comes to memory sharing, a thread shares the same memory with its creator process. In other words, threads can read and write the same data structures and variables as their creators.

Communication among threads is also easy to achieve since they share the same address space when they are within the same process.

However, not everything about threads is perfect. Whenever you’re dealing with threads, you’ll need to make sure they don’t step on each other when accessing shared data, which can lead to all sorts of problems, such as race conditions.

To avoid issues like that, locks (mutexes) should be used whenever threads access shared data.
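As a rough illustration of locking, the sketch below increments one shared counter from several threads and guards each update with a mutex. It uses std::thread and std::mutex from the C++ standard library rather than the pthread calls used later in this post, and the name countWithLock is hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads, guarding every update
// with a mutex. Hypothetical helper for illustration.
long countWithLock(int numThreads, int incrementsPerThread) {
    long counter = 0;
    std::mutex counterMutex;

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&counter, &counterMutex, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                // Only one thread at a time may execute this update.
                std::lock_guard<std::mutex> guard(counterMutex);
                ++counter;
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();  // wait for every worker before reading the result
    }
    return counter;
}
```

With the lock in place, countWithLock(4, 10000) returns 40000. Without it, concurrent increments can overwrite each other and the total may come out lower.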

Application of processes and threads

As I said at the beginning of this post, I intend to solve a quite interesting problem using threads and processes. The problem in question consists of implementing a digital low-pass filter. To begin with, let’s familiarize ourselves with some terms before diving into the problem itself.

filter: a filter is a device (hardware) or process (software) that removes some unwanted components or features from a signal.

low-pass filter: a low-pass filter (LPF) is a filter that passes signals with a frequency lower than a selected cutoff frequency and attenuates signals with frequencies higher than the cutoff frequency.

Now that you know what an LPF is, let me explain the problem we’ll solve.

The graph below shows 1000 samples of a given signal. Our goal is to remove as much noise as possible from this signal so the final result is the original signal with almost no noise at all. To get this data, you should download the file input.dat from my GitHub:

github.com/jaimedantas/low-pass-filter-parallel

If you look carefully, you’ll notice that this signal is a sine function.

In digital systems, we use equations to process digital signals. In our case, the LPF equation we’ll use is the arithmetic mean defined below:

This formula is not hard to understand. It computes the mean over a window of samples, where N is the number of samples in the window and y(n) is the filtered signal. We need to filter this signal for the following values: N=3, N=6, N=10, and N=20 samples.
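One common way to write an arithmetic-mean filter like this is shown below. It assumes a causal window (the current sample plus the N-1 previous ones), and the symbol x(n) for the noisy input sample is an assumption introduced here. The author's exact formula may differ in how the window is positioned.

```latex
y(n) = \frac{1}{N} \sum_{k=0}^{N-1} x(n-k)
```

For N=3, for example, each output sample would be y(n) = (x(n) + x(n-1) + x(n-2)) / 3.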

The file “input.dat” contains all the data necessary for this problem. The first column is the time, the second one is the original signal, and the last column is the noisy signal, which is the one we’ll use in our problem.
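A possible way to load the noisy column is sketched below. It assumes the three columns are separated by whitespace, which the post does not state, and the name readNoisyColumn is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines of the form "time original noisy" and returns the third
// column. Lines that do not parse as three numbers are skipped.
// Assumes whitespace-separated columns (not confirmed by the source).
std::vector<float> readNoisyColumn(std::istream& in) {
    std::vector<float> noisy;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        float time = 0.0f;
        float original = 0.0f;
        float noisySample = 0.0f;
        if (fields >> time >> original >> noisySample) {
            noisy.push_back(noisySample);
        }
    }
    return noisy;
}
```

To read the real file, you could pass an std::ifstream opened on “input.dat” (from the <fstream> header) to this function.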

Solution using processes

We’ll now solve this problem using processes. We’ll create 4 processes, each responsible for filtering with one value of N. Each of these processes will store its results in a file called “saida[N].txt”. The output, as well as the input, will be handled as vectors as shown below:

std::vector<float> sinal_entrada;
std::vector<float> saida;

Note that by the time all child processes have finished processing, we’ll have a total of 4 files containing all the data we’ll need to reconstruct the original signal.

First, let’s see what our processing function looks like:

Notice that we write the filtered signal only after each processing cycle finishes, and the length of a cycle depends on the value of N. I also decided to use the push_back() function to append elements to the end of the vector saida.
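A minimal sketch of such a processing function follows. It is not the author's listing. It assumes a causal window and, for the first N-1 samples, averages only the samples available so far. The name movingAverage is hypothetical, while sinal_entrada and saida follow the vectors declared earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Moving-average low-pass filter. saida[i] is the mean of the last n
// samples ending at index i; near the start, where fewer than n samples
// exist, the mean is taken over what is available (an assumption).
std::vector<float> movingAverage(const std::vector<float>& sinal_entrada,
                                 std::size_t n) {
    assert(n > 0);
    std::vector<float> saida;
    saida.reserve(sinal_entrada.size());
    for (std::size_t i = 0; i < sinal_entrada.size(); ++i) {
        // First index of the window that ends at i.
        std::size_t start = (i + 1 >= n) ? i + 1 - n : 0;
        float sum = 0.0f;
        for (std::size_t k = start; k <= i; ++k) {
            sum += sinal_entrada[k];
        }
        saida.push_back(sum / static_cast<float>(i - start + 1));
    }
    return saida;
}
```

For the input {3, 6, 9, 12} and n=3, this returns {3, 4.5, 6, 9}.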

When it comes to writing the output, it’s pretty straightforward. You can jump to the source code to see how it’s done.

Now, we need to create 4 child processes, where each one will receive a different value of N. We will begin by reading the file “input.dat” and storing it in a vector. After that, we need to create and initialize all 4 child processes as shown below.

As you can see, we’re using pid_t to represent the PID of each process.

Notice that we need to collect all child processes once they have finished. In order to do so, we use the wait() function in a loop at the end of the parent process. This approach makes the parent wait for all of its children to finish processing.
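The create-then-wait pattern described here can be sketched as follows. This is a simplified POSIX-only illustration, not the author's code. The name runChildren is hypothetical, and the child only signals success through its exit status where the real program would run the filter and write its file.

```cpp
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Forks one child per window size, then waits for all of them.
// Returns how many children exited normally with status 0.
int runChildren(const std::vector<int>& windowSizes) {
    int started = 0;
    for (int n : windowSizes) {
        pid_t pid = fork();
        if (pid < 0) {
            break;                 // fork failed; stop creating children
        }
        if (pid == 0) {
            // Child: the filtering for this value of n would run here.
            _exit(n > 0 ? 0 : 1);  // exit so the child never continues the loop
        }
        ++started;                 // parent: count the children it created
    }

    int succeeded = 0;
    for (int i = 0; i < started; ++i) {
        int status = 0;
        // wait() blocks until any one child terminates.
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

With the four window sizes used in this post, runChildren({3, 6, 10, 20}) returns 4 once every child has been collected.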

Notice that each child process inherits the same attributes from its parent process. It is also important to mention that all child processes can run in parallel. Ideally, each core will handle one process, but in reality, the processes may run on a single core or even serially if they have a short lifetime. The image below shows the order in which each child process is initialized and finished.

The parent process will finish once it has confirmed that all of its child processes have terminated.

Finally, we need to measure the runtime of our program. For that, I'm using the high_resolution_clock from the chrono library.

auto t2 = std::chrono::high_resolution_clock::now();
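For completeness, a small timing helper along these lines might look like the sketch below. The name elapsedMs is hypothetical. It takes one reading before the work and one after, then converts the difference to milliseconds.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    auto t1 = std::chrono::high_resolution_clock::now();
    work();
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> elapsed = t2 - t1;
    return elapsed.count();
}
```

On some standard libraries high_resolution_clock is an alias for system_clock, so std::chrono::steady_clock is a reasonable alternative when a monotonic clock is required.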

I ran this code 5 times on an Intel Core i5 processor with 4 cores @ 2.5 GHz.

The average runtime for our application based on processes was 22.95 ms.

I then took the output data and, with the help of Excel, plotted all the filtered signals.

The red graph is the best result we came up with. It’s clear from the graph that, for the values tested, the higher the value of N, the better our reconstruction will be. Of course, there is a limit to that, which won’t be discussed here. If you’re interested in finding out more about digital signal processing, take a look at the Nyquist–Shannon sampling theorem.

Solution using threads

Now that we’ve solved our problem using processes, it’s time to do it using threads instead.

To create our solution based on threads, we’ll keep the same code structure as before, where the function responsible for filtering will be called doFiltering(). Each of our threads will receive a value of N with which it’ll perform the filtering. This time, to make our lives easier, we’ll create a single file with the output. Because threads share main memory, we’ll simply create a vector for each output and combine them at the end of the processing.

We’ll use the pthread_t type from the pthread library.

Our doFiltering() function will receive a pointer to its input N. Notice that we’ll pass the memory address of our variable to each thread.

When it comes to initializing each thread, we’ll use pthread_create as follows:

pthread_create(&tid_2, NULL, doFiltering2, (void *)N);

After each secondary thread has finished executing, the main thread will combine the outputs and write the result to a file named “saida_TOTAL.txt”. The execution below shows each step done by our program.
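The overall flow (create the threads, pass each one a pointer to its own data, join them, then collect the outputs) can be sketched as below. This is an illustration under assumptions, not the author's program. The FilterJob struct and the names doFilteringJob and filterWithThreads are hypothetical, the filter is the causal moving average assumed earlier, and window sizes are expected to be greater than zero.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Per-thread job: shared read-only input, window size, private output.
struct FilterJob {
    const std::vector<float>* input;
    std::size_t n;
    std::vector<float> output;
};

// Thread entry point; pthread hands the job over as a void pointer.
void* doFilteringJob(void* arg) {
    FilterJob* job = static_cast<FilterJob*>(arg);
    const std::vector<float>& in = *job->input;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::size_t start = (i + 1 >= job->n) ? i + 1 - job->n : 0;
        float sum = 0.0f;
        for (std::size_t k = start; k <= i; ++k) sum += in[k];
        job->output.push_back(sum / static_cast<float>(i - start + 1));
    }
    return nullptr;
}

// Starts one thread per window size, joins them all, and returns the
// outputs in the same order as `windowSizes`.
std::vector<std::vector<float>> filterWithThreads(
        const std::vector<float>& input,
        const std::vector<std::size_t>& windowSizes) {
    // Build every job first so the addresses passed to threads stay valid.
    std::vector<FilterJob> jobs;
    for (std::size_t n : windowSizes) jobs.push_back(FilterJob{&input, n, {}});

    std::vector<pthread_t> tids(jobs.size());
    std::vector<bool> started(jobs.size(), false);
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        started[i] = pthread_create(&tids[i], NULL, doFilteringJob, &jobs[i]) == 0;
    }
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (started[i]) pthread_join(tids[i], NULL);  // wait for each thread
    }

    std::vector<std::vector<float>> results;
    for (FilterJob& job : jobs) results.push_back(job.output);
    return results;
}
```

Because each thread writes only to its own output vector and only reads the shared input, no lock is needed in this sketch. The main thread touches the outputs only after pthread_join has returned.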

The same timing method used for the process-based program was also applied here. The runtime of our thread solution was only 5.70 ms.

Conclusion

We saw that concurrent solutions can be implemented using both threads and processes. In our implementations, we noticed that the one using processes is way slower than the one based on threads.

This roughly fourfold difference (22.95 ms versus 5.70 ms) is due to the overhead that comes with processes.

I hope this post was helpful to you somehow, and if you have any questions at all, don’t hesitate to reach me by email.

jaimedantas.com