Rust concurrency: five easy pieces.
Rust is a language aimed among other things at improving the story around concurrency.
And indeed, the borrow checker will prevent the most egregious data races from occurring in the first place. That is the often hailed “Fearless” approach to concurrency.
However, as I have written before, that gets you only half of the way.
You, the programmer, still need to ensure that the concurrent business logic of your code is robust to the non-determinism that parallel execution entails.
Data races are only half of the story; the other half consists essentially of structuring concurrent logic around the ability to have one thread of execution “wait for something”, while having another thread of execution “signal for something”.
In this article we’ll gradually build up a system, through five distinct stages, exploring common sources of non-determinism and methods to ensure robustness to those.
We’ll cover both the use of channels, to achieve a kind of “coarse-grained” parallelism mostly matching the concept of task parallelism, and the use of condvars and locks around shared data, giving us, where appropriate, a more “fine-grained” type of parallelism corresponding mostly to data parallelism.
We’ll end up with a model system consisting of one “main component”, sending work to, and awaiting results from, a parallel “work component”, which itself will fork off work to a thread-pool, and make a cache available to workers on the pool to avoid having them perform the same work multiple times.
Full code examples can be found at https://github.com/gterzian/rust_five_easy_pieces.
See also notes on how to improve the reliability of the threading at https://github.com/gterzian/rust_five_easy_pieces/pull/1
Some relevant links to docs:
The std Condvar: https://doc.rust-lang.org/stable/std/sync/struct.Condvar.html
The channel from crossbeam: see the crossbeam-channel crate on docs.rs
What are we looking at?
Let’s say we have the “main thread” of the test acting as what we’ll refer to as the “main component”, and then we spawn a thread for the “parallel component”.
Both the “main” and the “parallel” component run what we’ll call an “event-loop”, which is another way of saying that they run based on a loop which:
- blocks on receiving on a channel,
- wakes up when a message is received,
- handles the incoming message, and
- goes back to blocking on the channel.
The “parallel” component does this until the WorkMsg::Exit has been received, and the main component does this until the ResultMsg::Exited has been received.
The actual “work” is essentially nothing. A request contains a number, and the result will contain that same number. And this is all we need, since we care about the concurrency aspects of the workflow, not the actual work being performed.
The workflow consists of having the main component send two requests for work, followed by a request to exit. The test then makes certain assertions with regards to the results received:
- They should come in the same order as they were sent, and
- the exit confirmation should come last.
The fact that the test passes tells us that the concurrent aspect of the logic is robust to the parallelism involved. We’re not getting results in some random order, instead we can make assertions about the order of things.
This determinism is simply a result of the guarantees given by channels: sends from a single thread are received in the order they were sent, combined with the fact that both sides of the workflow run a “one message at a time” event-loop.
As we shall see in the next piece, this determinism breaks down once we introduce more threads.
In this second piece, we introduce a thread-pool, which the “parallel” component will use to fork off work to workers when it receives a request for work. On the other hand, the parallel component will still immediately exit once the WorkMsg::Exit message is received.
To make the test pass, we now need to remove all previous assertions, since we:
- cannot know in which order the results will be received, and
- cannot know whether the parallel component will exit before or after the work has been performed.
What happened is that we introduced another layer of parallelism: whereas previously the two event-loops would communicate directly, the parallel component now also owns a thread-pool, and uses it to fork work in parallel to itself.
With regards to the work being performed on the thread-pool, the non-determinism of the ordering is probably a good thing, after all we want to parallelize those operations, hence we give up the determinism with regards to the order in which they will complete.
On the other hand, having the “parallel” component exit while workers are still potentially doing work is probably sub-optimal. So in the next piece, we’ll look at a way to ensure deterministically that this component will only exit when two conditions have been met:
- It has received the WorkMsg::Exit message, and
- all work performed by workers has finished.
In this third piece, four things have changed:
- The parallel component owns a boolean flag indicating whether a request to exit has been received.
- The parallel component also owns a counter of ongoing work.
- Workers on the thread-pool have access to a new channel, which they use to signal to the parallel component when they have completed a unit of work.
- The parallel component now selects over two channels: the one for messages from the main component, and the one for messages from workers on the thread-pool.
With these changes, we are able to add some deterministic assertions back into our tests: we can be certain that the parallel component will only exit when all units of work have been completed and the exit request has been received.
Note that the select! itself is non-deterministic, but the logic with the flags is written so that this doesn’t matter. This is a good example of the fact that there will always be parts of your system that are non-deterministic, hence the need to “choose your determinism” and ensure your logic is robust to certain non-deterministic aspects of your system.
Another good example is the need to wait on a condvar (as we’ll see later) inside a while loop: spurious wake-ups are non-deterministic, hence the need for the loop to make the logic robust to them.
Next, we will introduce a piece of shared data to the workers on our thread-pool: a cache of “work”, and we will see how this brings about a new form of non-determinism.
This time, workers on the pool have a “cache of work” at their disposal, and in their result they will indicate whether the work was in the cache or not.
We also send an additional request for work, for the same number as one that was already sent, so that the “cache logic” can be tested.
To our dismay, we cannot make any deterministic assertions about the behavior of this cache.
Sometimes, the result for the second equivalent work request is found in the cache, and sometimes it is not.
The reason for this is that workers race on the cache. While one worker is “performing work” related to a given cache key, the other worker can already check the cache for the same key, find nothing, and start “performing work” as well.
One potential solution to this problem could be to simply “perform work” inside the critical section on the cache, which would ensure no other thread can check the cache and find it empty in the meantime.
However, there is one problem with that: it would mean workers could only perform work serially, which is another way of saying that we might as well remove the worker concept altogether and go back to performing work on the “main thread” (the event-loop) of the “parallel” component, as in the first example.
A more effective solution is to introduce yet another piece of shared data, and this time pair it with a condvar for signalling, which is what we’ll do in the fifth, and final, example.
The addition in this round is a Condvar, and another piece of shared data: a “cache state” that tells us, on a per-key basis, whether the cache is ready to be read from.
If the cache is not ready for the given key, the worker thread blocks on the condvar until it is signaled.
This makes our cache logic robust to the non-determinism inherent in the parallelism of the worker threads: a worker either finds the cache “ready” for a given key, or not. If not, it waits for it to become “ready”.
This means that if a worker is “performing work” related to a given key, other workers doing unrelated work (for other keys) can simply do their work in parallel, while a worker doing related work will find the relevant cache state to be “not ready”, and simply wait for it to become “ready”.
We avoid the loss of parallelism that would be caused by making the critical section on the cache include “performing work”, while still retaining the deterministic outcome in case a worker is already “performing related work”.
As a side note, this also shows that “parallelism” is usually about structuring your algorithm so that work is performed without needing to enter a critical section on shared data. The time to enter a critical section is when a result needs to be “published” via the shared data, after it has already been computed, or before performing work, to obtain some necessary initial data.
And that’s it, thanks for reading, and I hope this helps!
These examples are based on real-world code found in Servo’s net component, in particular how the CoreResourceManager spawns parallel fetch workers on a thread-pool, and how those parallel fetches share an HTTP cache and a corresponding cache state. Please take a look.
And, as always, I refer to the best article online on practical concurrency: https://www.chromium.org/developers/lock-and-condition-variable