Rust concurrency patterns: Still communicating by moving senders (like it’s 2018)
Since I last wrote about this topic, only about a year ago:
Doing concurrency in ‘share by communicating’ style has been popularized by the Go community. It’s a valuable approach…medium.com
Select, as used with the standard-library channel, has been deprecated. So it’s a good time to revisit some of the concepts in that article, this time in the context of using crossbeam channels, and instead of using a made-up example, let’s dig into some real “production” code, as found in Servo.
Let’s continue our exploration of specific concurrency patterns that are useful in Rust, with more detail on how to structure your program than is provided by the mostly lower-level concept of “fearless concurrency”…
A core idea is left unchanged
Share your sender with other threads/components, and keep your receiver to yourself.
A simple heuristic for how to use channels. But what does it mean?
The point of channels, in my opinion, is how they allow you to structure isolated concurrent components. And one way to structure an isolated concurrent component is by moving clones of a Sender to other components, while keeping the corresponding Receiver to yourself. That component then runs its logic sequentially, by running an event-loop driven by incoming messages on that Receiver. More complicated patterns can be expressed using select, and more than one Receiver. Those multiple receivers should still remain “hidden” in that one isolated component.
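The basic shape of the pattern can be sketched in a few lines. This is a minimal, dependency-free sketch using std::sync::mpsc (Servo uses crossbeam channels, whose API is similar); the message type and function names are illustrative, not from Servo:

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical message type, for illustration only.
enum Msg {
    Work(u32),
}

// Clone the Sender into each "component" (a thread here),
// keep the single Receiver, and drain it sequentially.
fn fan_in_sum(n_workers: u32) -> u32 {
    let (tx, rx) = channel();
    let mut handles = Vec::new();
    for id in 0..n_workers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(Msg::Work(id)).unwrap();
        }));
    }
    // Drop the original sender: the receiver's iterator ends
    // once every clone has been dropped.
    drop(tx);
    for handle in handles {
        handle.join().unwrap();
    }
    // The "event-loop": sequential iteration over incoming messages.
    let mut total = 0;
    for Msg::Work(n) in rx {
        total += n;
    }
    total
}

fn main() {
    println!("sum of worker ids: {}", fan_in_sum(3));
}
```

Note how the receiving side is plain sequential iteration: all the multi-threading is on the sending side.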
That’s why I was such a fan of the original std::sync::mpsc, because it pretty much enforced that approach by only implementing Clone for the Sender, and by making neither the sender nor the receiver Sync.
What about things like “fair” work queues, where each “worker” could own a clone of the receiver? Since I haven’t used channels in that way, I can only speculate, and wonder why “messaging” is the right pattern for such a setup, when instead using something like a rayon thread-pool, perhaps with some shared data between workers, might be more to the point?
On the other hand, crossbeam channels, combined with native threads, are really perfect to model “isolated concurrent components running their internal logic as an event-loop”. Or, since that’s a bit of a mouthful, “non-event-loops”, with the “non” standing for “it has nothing to do with async I/O”.
An example is introduced
Its point: to monitor other independent components running their own “non-event-loop”. How does it do it? You’ve guessed it: it makes clones of a Sender available to each component that it wants to monitor. Those senders are used to send messages indicating either the start of an activity, or the start of “waiting”, and the monitor can infer that components are hanging when they are not “waiting” and it hasn’t heard from them in a while.
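That inference can be sketched concretely. This is a hypothetical, simplified model of the idea (the message and type names are not Servo’s): a component is suspected of hanging if its last message reported activity, not waiting, and too much time has passed since:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical monitoring protocol, for illustration only.
enum MonitorMsg {
    Activity(&'static str),
    Waiting(&'static str),
}

struct Status {
    waiting: bool,
    last_seen: Instant,
}

struct Monitor {
    components: HashMap<&'static str, Status>,
}

impl Monitor {
    // Each incoming message updates the component's last-known state.
    fn handle(&mut self, msg: MonitorMsg, now: Instant) {
        match msg {
            MonitorMsg::Activity(id) => {
                self.components.insert(id, Status { waiting: false, last_seen: now });
            }
            MonitorMsg::Waiting(id) => {
                self.components.insert(id, Status { waiting: true, last_seen: now });
            }
        }
    }

    // A component is considered hung if it is not "waiting"
    // and we haven't heard from it within `timeout`.
    fn hung(&self, id: &str, now: Instant, timeout: Duration) -> bool {
        match self.components.get(id) {
            Some(status) => !status.waiting && now.duration_since(status.last_seen) > timeout,
            None => false,
        }
    }
}

fn main() {
    let mut monitor = Monitor { components: HashMap::new() };
    let start = Instant::now();
    monitor.handle(MonitorMsg::Activity("script-thread"), start);
    let later = start + Duration::from_secs(5);
    println!("hung: {}", monitor.hung("script-thread", later, Duration::from_secs(1)));
}
```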
One could also say the monitor is “tailing the logs” of concurrent components, with the messages in the channel representing a “stream” of logs, as if we dealt with a distributed system.
A motive is questioned
Why use messaging for this implementation?
First of all, due to the multi-process nature of Servo, one part of the deal, that of reporting hangs back to a central hub named the “constellation”, happens over an ipc channel. That part simply cannot be done with shared-state concurrency, since it crosses process boundaries. So if that is done over a channel, why not do the rest as well?
Not a very strong argument, granted…
Secondly, and more to the point, while the “monitoring” part could have been done with shared-state (since a monitor always lives in the same process as the components it monitors), that would have meant potentially blocking the work of the monitored components. When the monitor wants to inspect “logs”, and a monitored component wants to append a “log”, one of them is going to wait for the lock to become available. Ideally we’d want the monitor to be the one that blocks, but it might as well be the monitored component. Monitoring would then potentially become another source of blocking for the component doing actual work, and there are already plenty of those as it is…
By using an unbounded channel, monitored components can always do a non-blocking send when they want to “append a log” for the monitor to eventually process.
A non-event-loop is encountered
This piece of code represents the internal logic of the monitor. Simply put, it blocks until a message is received, then further drains the channel, handling all available messages, and finally performs a “hang monitor checkpoint”. Oh, and if nothing comes up, it wakes itself up using the very handy recv_timeout.
It’s a simple list of steps, occurring at each “turn” of the loop, and therein lies its power. A glorified for-loop really, and not even worth mentioning in a blog, if it weren’t for our instinct to often make things more complicated than they need to be…
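One “turn” of such a loop can be sketched as follows. This is a minimal sketch using std::sync::mpsc to stay dependency-free (crossbeam’s Receiver offers the same recv_timeout/try_recv shape); the Event type and the function are illustrative, not Servo’s code:

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, TryRecvError};
use std::time::Duration;

// Hypothetical event type, for illustration.
enum Event {
    Ping,
}

// One "turn" of the non-event-loop: block (with a timeout) for the first
// message, drain whatever else is queued without blocking, then run the
// checkpoint. Returns the number of messages handled this turn.
fn run_one_turn(rx: &Receiver<Event>, timeout: Duration) -> usize {
    let mut handled = 0;
    match rx.recv_timeout(timeout) {
        Ok(_event) => handled += 1, // handle the first message
        Err(RecvTimeoutError::Timeout) => {} // nothing came up: wake up anyway
        Err(RecvTimeoutError::Disconnected) => return handled,
    }
    // Drain all other available messages, non-blocking.
    loop {
        match rx.try_recv() {
            Ok(_event) => handled += 1,
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    // Finally, perform the "hang monitor checkpoint" (elided here).
    handled
}

fn main() {
    let (tx, rx) = channel();
    tx.send(Event::Ping).unwrap();
    tx.send(Event::Ping).unwrap();
    println!("handled {} messages this turn", run_one_turn(&rx, Duration::from_millis(50)));
}
```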
How does this thing “run”, you ask? In the following way, using an old-fashioned “native thread” (I’m a bit of a Luddite, you see).
And why does this return a HangMonitorRegister, not simply a sender to a channel? That is the topic of another blog post.
The important thing for now is that this “hang monitor” is basically a while loop, running inside a thread.
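In sketch form, spinning up such a component is just a named native thread owning the loop (the thread name and messages here are placeholders, not Servo’s):

```rust
use std::sync::mpsc::channel;
use std::thread;

// The whole "hang monitor" is just a loop running inside an
// old-fashioned named native thread.
fn run_monitor() -> usize {
    let (tx, rx) = channel::<&'static str>();
    let handle = thread::Builder::new()
        .name("HangMonitor".to_string())
        .spawn(move || {
            let mut handled = 0;
            // The loop ends when the last sender is dropped.
            for _msg in rx {
                handled += 1;
            }
            handled
        })
        .expect("failed to spawn monitor thread");
    tx.send("notify_activity").unwrap();
    tx.send("notify_wait").unwrap();
    drop(tx);
    handle.join().unwrap()
}

fn main() {
    println!("monitor handled {} messages", run_monitor());
}
```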
Finally, how do the monitored components interact with it? Here is an example:
The monitor sender is not directly visible (the other post explains why), and monitored components simply call methods on a trait object, like notify_wait just before starting to “wait” on an event in their own “non-event-loop”, and notify_activity (wrapped inside another convenience function above), to notify the start of event-handling.
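The shape of that API can be sketched like this: the Sender is tucked away behind a trait object, and monitored components only ever see the two notify methods. This is a hypothetical sketch loosely modeled on the description above, not Servo’s actual trait:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical protocol, for illustration.
enum MonitorMsg {
    Activity,
    Waiting,
}

// Monitored components talk to this trait object,
// never to the Sender directly.
trait MonitorApi {
    fn notify_activity(&self);
    fn notify_wait(&self);
}

struct MonitorHandle {
    sender: Sender<MonitorMsg>,
}

impl MonitorApi for MonitorHandle {
    fn notify_activity(&self) {
        let _ = self.sender.send(MonitorMsg::Activity);
    }
    fn notify_wait(&self) {
        let _ = self.sender.send(MonitorMsg::Waiting);
    }
}

// The monitor keeps the Receiver; components get the boxed handle.
fn make_monitor() -> (Box<dyn MonitorApi>, Receiver<MonitorMsg>) {
    let (tx, rx) = channel();
    (Box::new(MonitorHandle { sender: tx }), rx)
}

fn main() {
    let (api, rx) = make_monitor();
    api.notify_activity();
    api.notify_wait();
    drop(api); // drops the hidden Sender, ending the stream
    println!("monitor received {} messages", rx.iter().count());
}
```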
The benefits of a non-event-loop are clarified
So far I’ve just gone on about the “simplicity” of this approach, and you might be wondering if I am perhaps just full of empty statements.
In what way is this approach exactly simpler than say, the equivalent using shared state?
A single, and simple, tool, the channel, gives us actually quite a lot of functionality:
- We get notified when there is work to do, or after a certain timeout, replacing something like a Condvar and its wait_timeout.
- We get access to what is effectively “shared-state”, the contents of the channel, but in a non-blocking way: the equivalent of a Mutex around a queue, without the locking.
But all of this is offered to us in a single tool, with a much easier API, and undoubtedly, a more performant implementation. What we’re left with is a very basic form of iteration over incoming data, with the entire concurrency part of it pretty much entirely hidden from view.
We just start a thread, select on a channel, and run our “non-event-loop” as if it was just another form of iteration happening inside a single thread (because it effectively is, as far as the API is concerned).
Most importantly, while this “non-event-loop” runs inside a complicated concurrent system, we can simply ignore this complicated concurrency that surrounds us. The logic of the monitor consists of a simple form of iteration in a single thread. We don’t care about the “when” or “ordering” of sending of messages, we only care about the messages as they come in, one at a time, on the receiver.
So the real benefit of a “non-event-loop” is that you’re not really writing concurrent code anymore. Instead, you connect concurrent components together via async communication protocols, with those components running in different threads, processes, or maybe even different machines! In all cases, you structure those “communication protocols” in such ways as to allow you to not care about the ordering or timing of sends, and you isolate business logic per component inside independent threads of execution.
In other words, and now I’m finally getting to it: You structure a concurrent system consisting of isolated components and the communication that binds them, but you write sequential business logic. The best of both worlds.
A critical note is elaborated
What about this whole “Receiver shouldn’t be Clone” business I mentioned earlier?
Let’s take a look at the “state” of the monitor: do you see that collection of MonitoredComponent? Right now, there is no concurrency involved in manipulating that piece of state.
Ok, so let’s say I want to “boost” the capacity of this monitoring system, and have more than one monitor per group of monitored components.
Could I have each monitor own a clone of the receiver, and then have them “share” the incoming messages and the resulting workload?
Totally possible, yet the problem is that I would lose the “non-event-loop” benefit, and be right back to “hard concurrency”. Why? Because the “state” of the monitor would have to be shared among monitor workers.
An RwLock around a HashMap, you think?
Well, I actually haven’t been totally accurate in my description of the monitor’s state, because it has evolved since I wrote that gist above!
Do you see how much bigger the state has become in six months? Is it still just a question of adding an RwLock somewhere? Also, you can count on that state growing further.
That’s the benefit of a “non-event-loop”: no matter how complicated the local state gets, it’s still just local to a single thread, and is mutated as a result of sequential steps, with no multi-threading involved!
The point is not about whether a Receiver should implement Clone; the point is that if you never clone or share your receivers outside of a single thread, it will enable you to build such a “non-event-loop”, whose benefit is essentially that the multi-threading is entirely hidden from view, and that there can never be any races in how local state is manipulated, since it all happens in a single thread. There can only be failures in designing proper protocols, or in properly implementing sequential logic (which is, granted, hard enough as it is).
A straw man is exposed
How about the fact that sends still happen from the various other threads with which a clone of the Sender has been shared? Doesn’t that imply all sorts of potential race conditions, as many different threads “race” to send messages?
No, because the magic of the “non-event-loop” is that you only care about the messages as they are received by the Receiver. Essentially, you need to structure your event-loop so that the order of “sends” doesn’t matter. This flows naturally from keeping all relevant state local to the thread running the event-loop itself.
That example from MIT, which reads “Unfortunately, message passing doesn’t eliminate the possibility of race conditions”, could use the addition “unless you structure your messaging with non-event-loops”. You don’t structure your logic at the point of sending messages, as is done in that “example of a racy ATM”; you structure it from the vantage point of receiving messages. There is no potential for race conditions there, since it all happens in a single thread; there is only “yet to be implemented logic and protocols” to ensure the ATM behaves as expected. If you can come up with the right “protocol” for your ATM-to-bank-server communication, it can be implemented without races using a non-event-loop.
Finally, if the sending thread really must wait for a reply, you can add a Sender to the message, and block on the reply on the corresponding Receiver. This comes with the additional benefit of making “blocking logic” painfully obvious wherever it is found in your code (compared with locks, where an endless ability seems to exist to convince yourself a lock won’t block other threads because “there won’t be any contention”).
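The reply-channel pattern looks like this in sketch form, again with std::sync::mpsc for a self-contained example (the Request shape is hypothetical):

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Hypothetical request shape: the message carries its own reply channel.
enum Request {
    Double { input: u32, reply: Sender<u32> },
}

// A tiny "server" component: one thread, one receiver, sequential loop.
fn spawn_server() -> Sender<Request> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for request in rx {
            match request {
                Request::Double { input, reply } => {
                    let _ = reply.send(input * 2);
                }
            }
        }
    });
    tx
}

// On the caller's side, the blocking is painfully obvious: recv().
fn ask(server: &Sender<Request>, input: u32) -> u32 {
    let (reply_tx, reply_rx) = channel();
    server
        .send(Request::Double { input, reply: reply_tx })
        .unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let server = spawn_server();
    println!("21 doubled is {}", ask(&server, 21));
}
```

The blocking recv() sits right at the call site, which is exactly the visibility argument made above.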
A potential misunderstanding is clarified
Am I essentially saying that the only way to build sane concurrent systems is by using channels, and having each “component” use only a single thread, itself owning a single receiver? Did I just write off any other forms of doing concurrency?
No, actually: a single component running a “non-event-loop” can internally still get more complicated, and own other threads or concurrency constructs (like a thread-pool or task executor, or even other event-loops).
For example: the monitor has one “expensive” operation: suspend_and_sample_thread, which consists of, well, suspending the thread, sampling its registers, and turning those into something readable. That can take quite a bit of time.
Your single “monitor” could own, say, a rayon thread-pool, and perform at least the final “symbolication” part of suspend_and_sample_thread operations in parallel (there is very little you can do while another thread is suspended, but once you’ve copied the registers, you could resume the thread and do the symbolication in parallel to the “non-event-loop”). Or maybe you could even do it on another machine (link to a Servo issue if you’re interested in taking that one on)?
One “step” of the “non-event-loop” could start one such operation in parallel on a thread-pool. No big deal, and that would still leave you with a nice “non-event-loop” that doesn’t in itself involve any multi-threading in the manipulation of its internal state.
As I said earlier, a single “non-event-loop”, could, for what it’s worth, even run an entire sub-system of “non-event-loops”…
Don’t parallelize the “non-event-loop” itself; instead, have it own something allowing it to launch tasks “in parallel” to itself. Results, if needed, can be enqueued back on the event-loop (say, on a different receiver, with clones of senders shared with the “parallel task”, and the receiver included in the main select of the loop).
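That “results enqueued back on the loop” idea can be sketched as follows. With crossbeam you would give the results their own receiver and add it to the loop’s select; to keep this sketch dependency-free with std::sync::mpsc, the results are funneled back on the main channel as just another message variant, which has the same effect. Names are illustrative:

```rust
use std::sync::mpsc::channel;
use std::thread;

// One receiver drives the loop; "parallel tasks" get a clone of the
// sender and enqueue results back as just another message variant.
enum Event {
    DoExpensiveWork(u32),
    WorkDone(u32),
}

fn run_loop(n_jobs: u32) -> Vec<u32> {
    if n_jobs == 0 {
        return Vec::new();
    }
    let (tx, rx) = channel();
    for i in 0..n_jobs {
        tx.send(Event::DoExpensiveWork(i)).unwrap();
    }
    let mut results = Vec::new();
    let mut pending = n_jobs;
    for event in rx.iter() {
        match event {
            Event::DoExpensiveWork(i) => {
                // Launch the expensive step "in parallel" to the loop...
                let tx = tx.clone();
                thread::spawn(move || {
                    // ...and send the result back to the same loop.
                    let _ = tx.send(Event::WorkDone(i * i));
                });
            }
            Event::WorkDone(result) => {
                results.push(result);
                pending -= 1;
                if pending == 0 {
                    break; // all results are in
                }
            }
        }
    }
    results.sort();
    results
}

fn main() {
    println!("squares: {:?}", run_loop(4));
}
```

The loop itself never touches its state from more than one thread; the parallel tasks only ever talk to it by sending messages.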
Such “in-parallel” steps to an event-loop are actually a well-established pattern in the implementation of the Web, for example in how “fetch” is integrated with the event-loop of a web-page, and a similar pattern is found in NodeJS.
So, my point is not that you should structure everything using single threads communicating via channels; what I mean is that the “non-event-loop”, and the thread of execution that runs it, forms the basic granular unit at the scale of the system as a whole. When you “zoom in” on one such component, you may see other concurrency concepts, like a thread-pool used to spawn parallel computations from steps of the “non-event-loop”, or perhaps even other “sub-non-event-loops”.
Fractal symmetry anyone?
A point is driven home
The point of this catchphrase:
Share your sender with other threads/components, and keep your receiver to yourself.
is that it allows you to build such “non-event-loops”, and the point of these is that they turn what would otherwise be complicated multi-threaded code into a sequential list of steps.
And that example of a “non-event-loop” I showed earlier, well, in six months, it has also grown quite a bit:
It has become much more complicated than when I first wrote about it. Yet, it’s still just a sequential list of steps, running one after the other, in a single thread.
A final word, or only the beginning?
Rust as a language doesn’t really have an opinion on how to do concurrency or parallelism. https://doc.rust-lang.org/nomicon/concurrency.html
And therefore, it’s up to us programmers to experiment with, and over time come up with, opinions “encoded” inside patterns on “how to do concurrency”. Also, it doesn’t hurt to learn from decades of organic growth of such “patterns” in other communities, like HTML, or Mac.
Keep it real…
And by the way
If you’re interested, there is work left to do:
- Resolving symbols for anything but the smallest profiles is a long process. It would be great to be able to extract…github.com
- Follow up on #16740 Code in components/background_hang_monitor/sampler.rs (yet to be merged from #21673) The code…github.com
- Right now when there is more than one content process then each process will report its own set of sampled threads, and…github.com
- SpiderMonkey exposes an API for integrating with the gecko profiler in…github.com
- The android NDK does not include libunwind. We probably need to create an android-specific unwind-sys crate that…github.com
Or, how I learned to stop worrying and love suspending, profiling, and resuming threads…medium.com