Rust is a language aimed among other things at improving the story around concurrency — the borrow checker will prevent the most egregious data races from occurring in the first place.
That is the often hailed “Fearless” approach to concurrency. But, as I have written before, that gets you only half of the way. You, the programmer, still need to ensure that the concurrent business logic of your code is correct.
Data races are only half of the story. The other consists essentially of structuring concurrent logic around the ability to have one thread of execution “wait for something”, while having another thread of execution “signal for something”.
In this article we’ll gradually build up a system through five distinct stages — exploring common sources of non-determinism and methods to ensure robustness to those along the way.
We’ll cover using channels — to achieve a kind of “coarse grained” parallelism, mostly matching the concept of task parallelism. And also the use of condvar and locks around shared data — tools giving us were appropriate a more “fine-grained” type of parallelism, corresponding to data parallelism.
We’ll end-up with a model system consisting of one “main component” sending work to — and awaiting results from — a parallel “worker component”. The worker will fork-off work to a thread-pool, and make a cache available to the pool — to avoid having them perform the same work multiple times.
Full code examples can be found at https://github.com/gterzian/rust_five_easy_pieces.
See also notes on how to improve the reliability of the threading at https://github.com/gterzian/rust_five_easy_pieces/pull/1
Some relevant links to docs:
The Std condvar: https://doc.rust-lang.org/stable/std/sync/struct.Condvar.html
The channel from
What are we looking at?
We have the “main thread” of the test acting as what we’ll refer to as the “main component”, and the spawned thread represent the “parallel worker component”.
Both the “main” and the “parallel” component run what we’ll call an “event-loop” — another way of saying that they run the following steps in a loop.
- Block on receiving on a channel.
- Wake-up when a message is received.
- Handle the incoming message.
- Go back to blocking on the channel.
The “parallel” component runs through this until the
WorkMsg::Exit has been received, and the main component does this until the
ResultMsg::Exited has been received.
The actual “work” is nothing — requests contains a number, and corresponding results contain that same number. This is all we need because we care about the concurrency aspect of the workflow, not the actual work being performed.
The workflow consists of having the main component send two requests for work, followed by a request to exit. The test makes the following assertions with regards to the results received:
- They should come in the same order as they were sent, and
- the exit confirmation should come as last.
The fact that the test passes tells us that the concurrent aspect of the logic is robust to the parallelism involved — the results are not in some random order.
This determinism is simply a result of the guarantees given by channels — messages sent from a single thread are received in the order they were sent — combined with the fact that both side of the workflow run a “handle one message at a time” event-loop.
As we shall see in the next piece, this determinism breaks down once we introduce more threads.
We have now introduced a thread-pool, which the “parallel” component uses to fork off work to workers whenever it receives a
WorkMsg::Work request. The parallel component still immediately exits once the
WorkMsg::Exit message has been received.
The test now fails, because:
- We cannot know in which order the results will be received,
- We cannot know if the parallel component will exit after or before some last piece of work has been performed by the thread pool.
This is a result of having introduced another layer of parallelism — the parallel component now owns a thread-pool, and uses it to fork work in parallel to itself.
With regards to the work being performed on the thread-pool, the non-determinism of the ordering is probably a good thing — after all we want to parallelize those operations. On the other hand, we’d like for the parallel component to only exit when all the work on the thread pool has been finished.
In the latest iteration, four changes have been introduced:
- The parallel component owns a boolean flag indicating whether a request to exit has been received.
- The parallel component also owns a counter of ongoing work.
- Workers on the thread-pool have access to a new channel, which they use to signal to the parallel component when they have completed a unit of work.
- The parallel component now selects over two channels — one for messages from the main component, and one for messages from workers on the thread-pool.
We are now able to add an assertion back into our tests — that the the parallel component will only exit when all units of work have been completed, and the exit requests has been received.
Note that the
select!itself is non-deterministic, but the logic with the flags is written so that this doesn’t matter, which gives a good example that there will always be parts of your system that are non-deterministic, hence the need to “choose your determinisn”, and ensure your logic is robust to certain non-deterministic aspects of your system.
Another good example is the need to wait on a condvar(as we’ll see later) inside a
whileloop: spurious wake-ups are non-deterministic, hence the need for the loop to make the logic robust to it.
Next, we will introduce a piece of shared data to the workers on our thread-pool — a cache of “work” — and we will see how this brings about a new form of non-determinism.
Workers on the pool now have a “cache of work” at their disposal, and in their results they will indicate whether the work was in the cache or not. We also send an additional request for work — for the same number as one that was already sent — so that the caching logic can be tested.
However, the tests are now failing — sometimes, the result for the second equivalent work request is found in the cache, and sometimes it is not.
The reason for this is that workers race on the cache. While one worker is “performing work” related to a given cache key, the other worker can already check the cache for the same key, find nothing, and start “performing work” as well.
One potential solution to this problem could be to simply “perform work” inside the critical section on the cache — ensuring no other thread can check the cache and find it empty in the meantime — however this would mean workers could only perform work serially.
A more effective solution is to introduce yet another piece of shared data, and this time pair it with a condvar for signalling. This is what we’ll do in the fifth — and final — example.
The addition in this round is another piece of shared data — a “cache state” that tells us on a per-key basis whether the cache is ready to be read from. If the cache is not ready for the given key, the worker thread blocks on the condvar until it is signaled on.
This means that if a worker is performing work related to a given key, another worker about to do related work will find the relevant cache state to be “not ready” — and wait for it to become “ready” — while other workers doing unrelated work can do it in parallel.
As a side note, this also shows that “parallelism” is usually about structuring your algorithm so that work is performed without needing to enter a critical section on shared-data. The time to enter a critical section is when some result needs to be “published” via the shared-data, after the result has already been computed, or before performing work, to obtain some necessary initial data.
These examples are based on real-world code found in Servo’s net component, in particular how the
CoreResourceManager spawns parallel fetch workers on a thread-pool, and how those parallel fetches share an HTTP cache and a corresponding cache state. Please take a look.
And, as always, I refer to the best article online on the practical use of concurrency: https://www.chromium.org/developers/lock-and-condition-variable