Programming Servo: A ‘script’ event-loop
We have structured the entire Servo browser engine as a series of threads that communicate over channels, avoiding unsafe explicitly shared global memory(…) (Experience Report: Developing the Servo Web Browser Engine using Rust https://arxiv.org/abs/1505.07383)
What is so great about channels vs shared mutable state? Channels, and the loops receiving messages on them, combine multi-threading with iteration. Iteration is by nature sequential, making it easier to reason about the behavior of your concurrent system.
A Multi-threaded ‘event-loop’ in Servo
In Servo, concurrent logic is achieved by combining event-loops with multi-threading and multi-processing. What does that mean?
An event-loop is simply a loop, running in a single thread, that polls sources of events at each iteration and handles them sequentially. We’re all familiar with iteration, so it’s a very intuitive way to deal with concurrent logic.
By the way, nobody tells you what those ‘events’ should be. When we think of ‘an event-loop’ we mostly think of ‘async-io’ type-of-stuff. However, an ‘event’ can be a lot of things, including a message received on a channel from another thread (or, as we shall see, from the same thread).
The essence of an event-loop is that it will poll something, or a few different things, and handle ‘events’ coming from these sources, at each iteration. Using that technique, your event-loop can become the driving force of a complicated multi-threaded system.
Most importantly, while the system is concurrent, the actual iteration of the event-loop is sequential. So at each iteration, we can finely control the order of each ‘step’ that we take.
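To make this a bit more concrete, here is a minimal sketch of such a loop using only the standard library’s channels. It is not Servo code: the “Event” type, its variants and “run_event_loop” are names invented for the illustration. One thread sends events concurrently, the loop’s own thread sends events to itself while handling others, and everything is handled one event at a time.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical event type, invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    // Sent by another thread.
    FromWorker(u32),
    // Sent by the event-loop's own thread, while handling another event.
    FromSelf(u32),
    // Tells the loop to stop.
    Exit,
}

// Runs the loop until `Exit` and returns the events in the order handled.
fn run_event_loop() -> Vec<Event> {
    let (sender, receiver) = channel();

    // A worker thread produces events concurrently with the loop.
    let worker_sender = sender.clone();
    let worker = thread::spawn(move || {
        for i in 0..3 {
            worker_sender.send(Event::FromWorker(i)).unwrap();
        }
    });

    let mut handled = Vec::new();
    loop {
        // Block until the next event is ready, whichever thread sent it.
        let event = receiver.recv().unwrap();
        handled.push(event.clone());
        match event {
            // Handling a worker event queues a follow-up from this thread.
            Event::FromWorker(i) => sender.send(Event::FromSelf(i)).unwrap(),
            // The follow-up to the last worker event asks the loop to exit.
            Event::FromSelf(2) => sender.send(Event::Exit).unwrap(),
            Event::FromSelf(_) => {}
            Event::Exit => break,
        }
    }
    worker.join().unwrap();
    handled
}

fn main() {
    println!("{:?}", run_event_loop());
}
```

The exact interleaving of worker events and self-sent events can vary from run to run, but the handling itself is always sequential: one event is fully handled before the next one is received.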
This might still read as a little bit too abstract, so an example is in order...
A ‘Script-thread’ in Servo.
Basically, think of a script thread as one, or several, tab(s) running in your browser.
The phrase “It’s in charge of processing events”, used to describe the script-thread, refers to page events like clicks and loads that happen on the page, and to running the handlers that might have been attached to them. In other words, the “web/JS event-loop”.
A script-thread will also coordinate with various threads to manage the life-cycle of the web page, from loading, rendering, to unloading.
In other words, a script-thread combines handling events (clicks and so on) that originate from the same thread (the JS code) with other multi-threaded logic.
Sounds complicated? Yes, it is. However, Servo has been designed to minimize the complications of that situation. How? By having the entire thing go through the same event-loop, which turns a complicated concurrent situation into just another loop with a sequential type of logic at each iteration.
Basically, once the script-thread has ‘started’, it will loop and continuously call “fn handle_msgs(&self) -> bool” until ‘false’ is returned.
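As a rough sketch of that driving loop: only the name and signature of “handle_msgs” come from the text above. The “ScriptThread” struct shown here, its fields and the “start” method are hypothetical stand-ins, and the body of “handle_msgs” is a placeholder that simply counts down.

```rust
use std::cell::Cell;

// Hypothetical stand-in for a script-thread; not Servo's actual struct.
struct ScriptThread {
    // `handle_msgs` takes `&self`, so state changes use interior mutability.
    remaining_messages: Cell<u32>,
    iterations_run: Cell<u32>,
}

impl ScriptThread {
    fn new(messages: u32) -> Self {
        ScriptThread {
            remaining_messages: Cell::new(messages),
            iterations_run: Cell::new(0),
        }
    }

    // One iteration of the event-loop. Returns `false` when the loop
    // should stop. Placeholder logic: pretend to handle one message per
    // iteration, and stop once there are none left.
    fn handle_msgs(&self) -> bool {
        self.iterations_run.set(self.iterations_run.get() + 1);
        let remaining = self.remaining_messages.get();
        if remaining == 0 {
            return false;
        }
        self.remaining_messages.set(remaining - 1);
        true
    }

    // The driving loop: keep iterating until `handle_msgs` returns `false`.
    fn start(&self) {
        while self.handle_msgs() {}
    }
}

fn main() {
    let script_thread = ScriptThread::new(3);
    script_thread.start();
    println!("iterations: {}", script_thread.iterations_run.get());
}
```

All of the interesting logic lives inside one iteration, which is what makes the ordering of its steps easy to control.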
Anatomy of an iteration of the event-loop
What goes on inside “handle_msgs”? The ‘script event-loop’, one could say. Essentially, the function will receive messages via channels from various threads and processes, and also run a single-threaded event-loop of tasks originating from the JS code running in one of the web pages managed by this ‘event-loop/script-thread’.
This being the web, the behavior of this event-loop is largely specified as part of the ‘living standard’, yet Servo adds some idiosyncratic parts related to its own implementation.
Let’s see if we can describe fairly precisely what this loop does at each iteration:
1. Steal resize events. Here, we grab all “resize” events from all the web documents we are managing, and handle these before anything else. These events originate in the very same thread in which they are handled: an example of ‘single-threaded concurrency’.
2. Select at least one event from various channels. Here, we use the “Select” capability of Rust’s standard library to receive one message, the first one that is ready, from a list of ‘ports’, which are the receiving ends of channels.
3. It is absolutely worth appreciating the fact that one of these ‘ports’ is the ‘script_port’, which is the receiver of a channel whose sender will be passed to a window upon load inside this very same script-thread, and used as a task source for various HTML/DOM related events. The ‘script_port’, combined with the various “task sources”, is a “channel-turned-thread-local-task-queue”, through which the “web/JS” event-loop is implemented. The ‘script_port’ is a receiver, and the various task sources are just clones of the sender to that receiver. Most task sources end up being implementations of the TaskSource trait, and are just wrappers around such senders, for example the “DOMManipulationTaskSource” (the TaskSource concept warrants an entire post in itself, which will follow in the near future).
4. Inside an inner loop: first, handle the ‘event’ collected under 2, but only if it relates to the rendering of a web page; otherwise, add it to the list of ‘sequential’ events to be handled later; finally, try to receive the next ‘event’. When there are no more events to receive, break. In other words, do the rendering-related stuff first, and collect the other events for later processing.
5. Handle all the ‘sequential’ events collected in the previous step.
6. If, as part of 5, you handle a message telling you to exit, set the exit flag to ‘true’, but continue with step 7.
7. For each event handled under 5, also perform a microtask checkpoint, which basically means checking for and handling any ‘microtask’ that might have been added to the queue as part of handling the messages under 5 (a microtask is another example of single-threaded concurrency, and it is worth noting that upon each ‘microtask checkpoint’ the entire microtask queue is drained, as opposed to just one task being run).
8. If the exit flag was set to ‘true’ in step 6, return ‘false’, stopping the event-loop of this script-thread.
9. Otherwise, for each web document managed by this script-thread, maybe queue its completion (which will trigger things like “load” events).
10. For each web document, “issue batched reflows”.
11. Finally, return ‘true’, which means that the event-loop will start another iteration at 1 again…
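The iteration described above can be condensed into a heavily simplified sketch. Everything in it is an assumption made for illustration: the “Msg” enum, the “ScriptLoop” struct and its fields, and the string log are invented, and this “handle_msgs” takes “&mut self” for brevity. Instead of selecting over several ports, it blocks on a single channel and then drains it with “try_recv”, which keeps the sketch within the stable standard library.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver};

// Hypothetical message type; Servo's real messages are far richer.
#[derive(Debug)]
enum Msg {
    Rendering(u32),
    Other(u32),
    Exit,
}

struct ScriptLoop {
    port: Receiver<Msg>,
    pending_resizes: u32,
    microtasks: VecDeque<String>,
    log: Vec<String>,
}

impl ScriptLoop {
    fn handle_msgs(&mut self) -> bool {
        // Step 1: handle resize events before anything else.
        let resizes = std::mem::take(&mut self.pending_resizes);
        for _ in 0..resizes {
            self.log.push("resize".to_string());
        }
        // Step 2: block until at least one message is ready.
        let mut next = match self.port.recv() {
            Ok(msg) => Some(msg),
            Err(_) => return false, // every sender is gone
        };
        // Step 4: rendering messages now, everything else later.
        let mut sequential = Vec::new();
        while let Some(msg) = next {
            match msg {
                Msg::Rendering(id) => self.log.push(format!("rendering {}", id)),
                other => sequential.push(other),
            }
            next = self.port.try_recv().ok();
        }
        // Steps 5 to 7: handle the sequential messages, note an exit
        // request, and drain the whole microtask queue after each message.
        let mut exit = false;
        for msg in sequential {
            match msg {
                Msg::Exit => exit = true,
                Msg::Other(id) => {
                    self.log.push(format!("other {}", id));
                    self.microtasks.push_back(format!("microtask for {}", id));
                }
                Msg::Rendering(_) => unreachable!("handled in the inner loop"),
            }
            while let Some(task) = self.microtasks.pop_front() {
                self.log.push(task);
            }
        }
        // Step 8: stop the event-loop if asked to exit.
        if exit {
            return false;
        }
        // Steps 9 and 10, reduced to a single placeholder.
        self.log.push("reflow".to_string());
        // Step 11: ask for another iteration.
        true
    }
}

// Queues `msgs`, runs one iteration, and returns its result and log.
fn run_iteration(msgs: Vec<Msg>) -> (bool, Vec<String>) {
    let (sender, port) = channel();
    for msg in msgs {
        sender.send(msg).unwrap();
    }
    let mut script_loop = ScriptLoop {
        port,
        pending_resizes: 1,
        microtasks: VecDeque::new(),
        log: Vec::new(),
    };
    let keep_going = script_loop.handle_msgs();
    (keep_going, script_loop.log)
}

fn main() {
    let msgs = vec![Msg::Other(1), Msg::Rendering(2), Msg::Other(3)];
    println!("{:?}", run_iteration(msgs));
}
```

Even though “Rendering(2)” is queued after “Other(1)”, it is handled first, and each “Other” message is immediately followed by its microtask.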
So, this must read as something pretty complicated, doesn’t it?
And now, appreciate the fact that as part of this iteration of the event-loop we’ve coordinated concurrent work across different threads (four, if I’m counting right, which are all logically distinct threads, not just part of a pool), and have interleaved this with running ‘web/JS event-loop’ related tasks (and microtasks), resulting in managing the lifetime of one or several web pages.
Does it seem helpful that, while work is ongoing concurrently, the ‘effect’ of this work on the web page(s) handled by the event-loop is ‘effected’ through what is essentially a sequential list of TODOs? You can, for example, know for sure that ‘rendering’ related messages will be handled before other messages as part of each iteration, while ‘resize’ events will be handled before anything else.
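The “channel-turned-thread-local-task-queue” idea behind those ‘web/JS event-loop’ tasks can be sketched as follows. This is an assumption-laden illustration, not Servo’s API: the “Task” alias, the “queue” method and the shape of the trait are invented here, and only the names “TaskSource”, “DOMManipulationTaskSource” and ‘script_port’ are borrowed from the description above. Both the sender and the receiver live in the same thread, and a task can queue further tasks while it runs.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// A task is a boxed closure; here it returns a label so the order in
// which tasks ran can be observed. (Invented for this sketch.)
type Task = Box<dyn FnOnce() -> String>;

// Hypothetical trait, loosely named after Servo's TaskSource.
trait TaskSource {
    fn queue(&self, task: Task);
}

// A task source is just a wrapper around a clone of the sender.
struct DOMManipulationTaskSource(Sender<Task>);

impl TaskSource for DOMManipulationTaskSource {
    fn queue(&self, task: Task) {
        // If the receiver is gone, the task is silently dropped.
        let _ = self.0.send(task);
    }
}

// Queues a few tasks and runs them, all in the current thread.
fn run_tasks() -> Vec<String> {
    let (sender, script_port): (Sender<Task>, Receiver<Task>) = channel();
    let source = DOMManipulationTaskSource(sender.clone());
    let nested_source = DOMManipulationTaskSource(sender);

    // The first task queues another task while it is running.
    source.queue(Box::new(move || {
        nested_source.queue(Box::new(|| "load handler".to_string()));
        "click handler".to_string()
    }));
    source.queue(Box::new(|| "timer callback".to_string()));

    let mut log = Vec::new();
    // Drain with `try_recv` so that the sketch terminates; a real
    // event-loop would block waiting for the next message instead.
    while let Ok(task) = script_port.try_recv() {
        log.push(task());
    }
    log
}

fn main() {
    println!("{:?}", run_tasks());
}
```

The task queued from inside the first task runs after the tasks that were already waiting, which is the usual first-in, first-out behavior of a task queue.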
Now, try to imagine what the above would have looked like if we had tried to synchronize all this work using shared mutable state only…
But beware: while the centralized and sequential nature of such an event-loop makes it easy for you to reason about your concurrent system, it also makes it easier for attackers to guess what is going on in that system…