Rust concurrency patterns: regretless concurrency?

Rust offers the promise of “fearless concurrency”, and delivers on it through memory safety. This safety doesn’t guarantee code that is easy to maintain, or bug free.

If one is not “fearful” of complexity, concurrency can easily become a story of regrets. Can we get a “regret-less” kind of concurrency?

There is one particular approach to concurrency that I find “only gets better with time”, and I’d like to describe it in some details in this blog.

A sensible default: the event-loop

An event-loop is basically a “loop” which will, you’ve guessed it, handle “events”, preferably one at a time. You might have heard of the “Web event-loop”, and since your also know that Javascript code running on it is single-threaded, you might also be wondering why I bring it up here.

The reason I bring it up here is because an event-loop might be your best “default” for modelling concurrency, and the Web is actually a prime example.

Looking at this screenshot, I count about 200 threads in Web-related processes. What are those threads doing? Ever wondered how something like fetch(url) just works, and without blocking your Javascript code? How come that adding an <img src="friday_night.jpeg"> to an HTML page actually loads an image?

And yet, despite this bacchanal of threads going on under the hood, we still refer to Javascript as “single-threaded”, and that is correct. You will, almost never be able to detect the concurrency of the Web platform from Javascript itself.

How is this done? With the help of event-loops(give or take one per tab in your browser).

The Javascript on a page runs one “task” at a time. Where do those tasks come from? Some of them originate from a previously running task on the same event-loop. For example when a page first loads, a task is enqueued to fire certain events in a later iteration of the event-loop. In this case, running a task results in a another task being enqueued on the same event-loop. A form of “single-threaded concurrency”, one might say.

Other tasks are enqueued from other components running “in-parallel”. When you do a fetch(url), the Javascript offloads this request to a dedicated “networking component”. That component is running in a different thread, or different process, hence your Javascript code doesn’t block while the response from the network is awaited and handled.

When (part-of)the response from the network comes in, what happens? The networking component will enqueue a task back on the Javascript event-loop where the request originated.

The beauty of is that, while the Web event-loop is fairly convoluted, the concept of an event-loop itself is very simple, and you can use it to model your concurrent Rust code, starting simple and making it as complicated as your problem requires.

The native thread: a simple building block.

To go about building our own event-loops in Rust, we need a simple way to represent a “running component”. The thread, a.k.a “native thread”, is a good bet.

Threads sometimes get a bad rap for being “heavy-weight”, with people writing that they are running millions of a certain flavor of “lightweight threads” on a single machine, whereas this particular feat of engineering could never be performed with “native” threads.

When I read for example “Why you can have millions of Goroutines but only thousands of Java Threads”, I wonder, “who said that threads should be the basic unit of computation?”. If you think of a thread as a “running component” that can handle many “units of computations”, with the “unit” being a event/message, there is no need to spawn millions of them to handle, say, millions of events. Instead you’ll spawn about one thread per “logical component”, and let them handle as many events as sensible.

I can only speak for myself but I think that my brain can only handle about 25 to 50 logical components, and even on a good day, that is stretching it. Your OS will have no problems handling hundreds, even thousands, of threads. One could therefore say that the number of threads is less bounded by the capabilities of the OS to context switch between them, but rather by your brain’s capability to context switch on the logic represented by them.

Finally, and here I’m jumping ahead, when I write that a component can be “represented” as a thread, I don’t mean that it absolutely must consist of only a single thread. If needed, one such component could own a thread-pool, or a “lightweight task executor” to perform internal computations(the emphasis being put on “internal”).

Channels: your source of events

If threads represent a “component”, how can we represent “events”, and “loop” over incoming ones? Enter: channels.

You can think of a channel as a parallel queue of events, with component using those to enqueue events, in a form of message-passing, on each others event-loops. Note that “parallel” is only in the eye of the beholder here. From the perspective of one component’s event-loop, the others are running in parallel. Yet each component is actually running a single-threaded, sequential, event-loop.

Time for an example…

Let’s take a look at a real-word concurrent component running it’s own “event-loop”: the background-hang-monitor in Servo.

As you can see from the snippet above, we have a BackgroundHangMonitorWorker “running until it drops” in a native thread.

We can also see that it makes available to the world a HangMonitorRegister, which appears to be a wrapper around a channel sender.

It also seems that another channel sender, called constellation_chan is being shared with the worker.

So far this follows the mantra of “communicate by sharing your sender”…

First, a word on what that component is supposed to be doing: its purpose is to monitor other components for “hangs”, and whenever one such “hangs” is noticed, alert another component, called the “constellation”. More details on it can be read in a previous article.

How do we “monitor” other components? Should we share some state and keep an eye on it in a loop? Nope, components will simply send messages to the monitor, and the monitor can infer the activity/hanging of component from the messages it receives.

The above is essentially the “event-loop” of the monitor. A few features stand out: we’re select ‘ing not only over our receiver, but also over acrossbeam_channel::::after, introducing a useful concept of sending ourselves a message if nothing comes in from monitored components. It’s useful because as you can imagine, we can’t expect hanging components to send us a message telling us to check up on them when they are hanging…

Also, we add a little while loop to drain the channel at each iteration. The thinking behind that is: since we end-up doing some fairly heavyweight sampling of hanging components in some cases, the draining of the channel at each iteration hopefully reduces spurious hangs that could emerge from a “one message, one checkpoint, per iteration” model.

How do things look like from the perspective of a “monitored” component?

Equivalent of

As you can see, this monitored component is also running an event-loop.

The difference is that before blocking on the select , the component will send a message to the hang monitor, notifying it that it will start “waiting” for an event to come in(meaning that idleness shouldn’t be interpreted as a “hang”). When an event comes in, the component will notify the start of a new event handling by sending another message(meaning that if the monitor doesn’t hear back from the component for a while, it is apparently hanging on handling that last event).

A note on distributed systems

Some of you might have thought that the “background hang monitor” described above appears to be “tailing the logs” of monitored components, with the messages sent by components representing the logs.

If you look at the use cases for something like Kafka, you’ll see something fairly close to “message passing” across event-loops as well.

The point is that the event-loop is a simple, yet versatile abstraction, that turns out to be useful in a variety of settings besides when using multiple threads, such as when using multiple processes, and even when doing stuff over the network.

What if it’s more complicated?

What if your component needs to run some complicated parallel algorithm in response to an event(requiring “data parallelism”, as opposed to the “task parallelism” described earlier in the form of parallel event-loops)? How can you model that? As I hinted at earlier, modelling your component as a thread running an event-loop doesn’t mean that it must consist of only a single thread. Your component, while running inside a thread, could still own, or at least have access to, other constructs, such as a thread-pool, on which to run you complicated parallel algorithm.

Your event-loop could, in response to an event, “spawn” all sorts of internal computations. Those could block the event-loop, meaning that no events are handled while the computation is in progress, or it could happen “in-parallel” of it, meaning that event will keep being handled(and perhaps more parallel stuff being spawned). One “event-loop” could, for what it’s worth, even run an entire separate sub-system of event-loops...

Once again, I couldn’t put it better than the HTML living standard:

As an example, the monitored component referred to above is in fact the “layout” component in Servo, and in the context of running its event-loop, there is one event called Msg::Reflow, which actually involves running a complicated parallel algorithm using a rayon thread-pool.

Another example, the “networking component” in Servo, as part of handling a Fetch event, will not only spawn additional threads, it will also use a tokio runtime to do the actual networking.

Does everything have to be an event-loop?

Absolutely not. The most important quality of a component is probably isolation, and running an event-loop using channels is just an easy way to achieve that.

And sometimes, your goal is really to share data among threads(but that’s probably best left as an implementation detail hidden inside a component).

For example, the “networking component” in Servo, while the “main” event-loop of the component is indeed handling event sequentially, these events can spawn several instances of the “fetch” algorithm running in parallel of each other(without blocking the main “network” event-loop). The http-cache used in “fetch” will thus have to be shared among these parallel fetches. While the cache could be modeled as an event-loop, it is probably overkill in this case. So the cache is basically a RwLock<HashMap<CacheKey, Vec<CachedResource>>shared among parallel instances of the fetch algorithm.

Note that the parallel fetches mentioned here are still clearly isolated within the “network” component.

When the “script” component, the one running the web event-loop, is sending a “fetch” message to the network component, that message becomes an “event” to be handled on the “network” event-loop, resulting in a parallel fetch being spawned, which will result in a “task” running on a Tokio runtime handling the networking, and the “effect” of the incoming data from the network will only propagate to script when the appropriate event is enqueued back on the script event-loop, via a message sent by “network”(for the gory details, see an earlier article: “Anatomy of a Fetch”).

Still following? The point is, both “script” and “network” run their own event-loop, and communicate by enqueuing events on each other’s loop. What “handling an event” entails is not the business of other components, and while it can involve a plethora of things, my advice would be to keep those things hidden within a component, as opposed to making them part of the interface to a component.

The End, or the beginning…

Further reading:

  1. “Threads without Locks”, by Russ Cox.