Programming Servo: integrating streams
Streams is one of the Living Standards™ being developed by the WHATWG, introduced as:
Large swathes of the web platform are built on streaming data: that is, data that is created, processed, and consumed in an incremental fashion, without ever reading all of it into memory. The Streams Standard provides a common set of APIs for creating and interfacing with such streaming data, embodied in readable streams, writable streams, and transform streams.
I recommend taking a look at it, since “streams” are a popular concept in Rust-land these days, and I think you’ll find that the streams defined in that standard are in some ways similar to, and in others very different from, how streams are used in async Rust, so it’s a refreshing read (and I’m not just referring to the JS API being different, but also to the definition of the internals, in other words the “streaming model” as a whole, which is relevant across languages).
In the meantime, this article will focus on how one part of the standard, ReadableStream, was integrated in Servo.
I write “integrated”, not “implemented”, because the streams API is actually implemented by SpiderMonkey, which Servo can then “integrate” via a set of APIs (for a more general look at how SpiderMonkey is integrated in Servo, see this previous article).
Full code at https://github.com/servo/servo/pull/25873
An architectural overview first
So let’s try to understand what “integrating streams” would mean for Servo. We know already that the streams API is offered to JS code running in the context of the Web. So how is this integration different from other Web APIs?
Well, in a way it’s not so different. One difference, noted already, is that it’s SpiderMonkey that actually implements the streams themselves, while for many other Web APIs it’s Servo that has to implement everything and then “make it available to the JS code running in SpiderMonkey”. Here it’s rather SpiderMonkey giving us an API to plug Rust code into the streams as used by JS.
Another difference with many, but by no means all, Web APIs is that integrating this one will actually involve making changes to quite a few components of Servo, and restructuring how those components communicate.
In other words, to properly do this, many parts of Servo will have to adopt a kind of “streaming” model of moving data around, which might not have been the case previously.
We can see this in more detail by looking at one half of a “fetch” workflow happening between script and net: transmitting the body of an HTTP request.
With pushes and pulls: transmitting a request body
When some JS code initiates a fetch, this creates what is essentially an HTTP request, and the body of the request can be provided by the JS code in the form of various data-containing objects, and also, you’ve guessed it, an actual ReadableStream.
Actually, each type of body will end up having a ReadableStream associated with it: when another kind of object is passed as a body, a stream will be “extracted” from it.
So, once this “request” is received by the net component, it will start a fetch worker that will run the appropriate algorithms as defined in the Fetch Standard.
One of these algorithms is the so-called “transmit the body of a request” one, which involves reading data from the request body and sending it over the network.
At that point net will have to communicate back to script and initiate “reading from the stream”, one chunk at a time, in a kind of “pull-based” flow where, each time a chunk has been successfully transmitted over the network, the next chunk is requested.
So that’s an obvious integration point for streams, and it will be very different from what came before it.
Before: body as a vector.
So prior to our little “make-over” of the workflow, the body of a request was essentially a Vec<u8>, as can be seen below:
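In essence, the relevant field changed roughly like this (a sketch of the request type in net_traits; other fields elided, names approximate):

```rust
// A sketch of the body field on the request type in net_traits.
pub struct Request {
    // Before: the whole body, buffered up-front as a vector of bytes.
    // pub body: Option<Vec<u8>>,

    // After: a handle that lets `net` pull the body from `script`, chunk by chunk.
    pub body: Option<RequestBody>,
    // ...
}
```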
So how was that body transmitted over the network? Simply, the vector was passed along to a Hyper request builder.
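Conceptually, that old flow amounted to something like this (a simplification, not the actual Servo code):

```rust
// Hand the entire buffered body to Hyper in one go.
let request = hyper::Request::builder()
    .method(method)
    .uri(url)
    .body(hyper::Body::from(body_vec))?;
```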
After: body as a… RequestBody!
As you can also see from the above screenshot, this Vec<u8> was replaced with a RequestBody.
So what on earth is that then? See below:
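In rough outline, it is a handle that lets net drive the transmission of the body by talking back to script, along these lines (a sketch; the exact fields and variants differ a bit in the real code):

```rust
// A sketch of the new body representation, and of the messages used to drive it.
pub struct RequestBody {
    /// The sender over which `net` communicates with `script` about this body.
    chan: IpcSender<BodyChunkRequest>,
    /// Where the body comes from (an in-memory object, or a JS-backed stream).
    source: BodySource,
    /// The total number of bytes, if known up-front.
    total_bytes: Option<usize>,
}

/// The messages `net` sends to `script` to drive the transmission of the body.
pub enum BodyChunkRequest {
    /// "Connect" the two routes: here is the sender on which to send chunks back.
    Connect(IpcSender<Vec<u8>>),
    /// Please read, and send, the next chunk.
    Chunk,
    /// The transmission is over; tear things down.
    Done,
}
```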
Where has the data gone? And why does it contain a “sender”? Should that not instead be a “receiver”? Well, remember it’s supposed to be a pull-based streaming flow, so we want net to request chunks, not just receive them.
I’m not quite sure how to keep this description simple, but I’ll try.
Essentially, when script starts a fetch, an IPC route will be set up using a given receiver of BodyChunkRequest.
This looks like:
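In sketch form (simplified; some method names are assumed, and error handling is elided):

```rust
// Script-side: set up a route handling the BodyChunkRequest messages coming from net.
let (chunk_request_sender, chunk_request_receiver) = ipc::channel().unwrap();

ROUTER.add_route(
    chunk_request_receiver.to_opaque(),
    Box::new(move |message| match message.to().unwrap() {
        // Net sends us the sender on which chunks should be sent back.
        BodyChunkRequest::Connect(bytes_sender) => body_handler.start_reading(bytes_sender),
        // Net asks for the next chunk: read one from the stream, and send it over.
        BodyChunkRequest::Chunk => body_handler.transmit_body_chunk(),
        // Net signals the transmission is over: clean up.
        BodyChunkRequest::Done => body_handler.stop_reading(),
    }),
);
```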
The corresponding sender will end up in the RequestBody, which will be sent over IPC to net.
Then, when the time comes for net to start “transmitting the body” over the network, it will set up its own route, using a newly created channel, and then send the corresponding sender back to script, in order to “connect” the two routes. It will then also request the first chunk of data.
This requires two screenshots to show, where we first set up the channel and send the messages (using the sender in the RequestBody):
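Roughly (a sketch; variable and field names approximate):

```rust
// Net-side: create the channel on which script will send us the body's chunks,
// "connect" it to the route in script, and request the first chunk.
let (bytes_sender, bytes_receiver) = ipc::channel().unwrap();

// The sender that travelled over IPC inside the RequestBody.
let script_chan = request_body.chan.clone();

// Give script the sender on which to send chunks back...
let _ = script_chan.send(BodyChunkRequest::Connect(bytes_sender));

// ...and immediately ask for the first chunk.
let _ = script_chan.send(BodyChunkRequest::Chunk);
```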
Secondly, we set up the route at:
So what’s happening here is that each time we receive a chunk over the route, we:
- Clone a bunch of stuff,
- Spawn an async task, in order to
- Send the chunk on a bounded(1) channel, and when that is done
- Request the next chunk by sending an IPC message back to script.
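In sketch form, that route looks something like this (simplified; `executor` stands in for a handle to the Tokio runtime used by net, and error handling is elided):

```rust
use futures::SinkExt;

// The bounded(1) channel that will feed the Hyper body (see below).
let (chunk_sender, chunk_receiver) = futures::channel::mpsc::channel::<Vec<u8>>(1);

// Net-side: the route receiving the body's chunks from script.
ROUTER.add_route(
    bytes_receiver.to_opaque(),
    Box::new(move |message| {
        let bytes: Vec<u8> = message.to().unwrap();

        // Clone what the async task will need.
        let script_chan = script_chan.clone();
        let mut chunk_sender = chunk_sender.clone();

        // Spawn an async task, in order to...
        executor.spawn(async move {
            // ...send the chunk on the bounded(1) channel, and when that is done...
            let _ = chunk_sender.send(bytes).await;

            // ...request the next chunk by sending an IPC message back to script.
            let _ = script_chan.send(BodyChunkRequest::Chunk);
        });
    }),
);
```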
Finally, we plug this route into the body of the HTTP request, by passing the receiver end of the bounded(1) channel to Hyper, like so:
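Along the lines of (a sketch; in practice this goes through Servo’s HTTP machinery rather than a bare request builder):

```rust
use futures::StreamExt;

// The receiving end of the bounded(1) channel becomes the streaming body
// handed to Hyper.
let body = hyper::Body::wrap_stream(
    chunk_receiver.map(Ok::<Vec<u8>, std::io::Error>),
);

let request = hyper::Request::builder()
    .method(method)
    .uri(url)
    .body(body)?;
```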
What I personally think is interesting is that previously in Servo we would always use only one IPC route, as a kind of “cross-process callback”, where you’d send a request to another process and then handle the response by way of the route.
This time, we’re setting up a second route in the other process, as a way to create what is effectively a pull-based data shuffling workflow.
And how does reading from the ReadableStream actually work? Let’s take a quick look at that next…
I promise I’ll read it by next week…
Let’s go back to the Route that was set up in script at the beginning of a fetch:
And this time, let’s dig a bit further into transmit_body_chunk. In order to do this, let’s first take a closer look at this body_handler, which looks like:
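Roughly (a sketch; some fields elided and names approximate):

```rust
// The handler living on the IPC router thread in script, driving the reading of the stream.
pub struct TransmitBodyConnectHandler {
    /// The stream to read chunks from. It is held as a Trusted because this struct
    /// lives off the event-loop thread that actually owns the JS object.
    stream: Trusted<ReadableStream>,
    /// Used to queue tasks back onto the event-loop where the stream can be used.
    task_source: NetworkingTaskSource,
    /// The sender provided by the route in net, on which chunks are sent back.
    bytes_sender: Option<IpcSender<Vec<u8>>>,
    /// The sender used to tell net when the body is done, or has errored.
    control_sender: IpcSender<BodyChunkRequest>,
}
```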
So here we can see the ReadableStream, which is actually a Servo-provided wrapper around the actual JS object implemented by SpiderMonkey, and we can see the IpcSender<Vec<u8>>, which is the sender that the route in net provides, so that this handler can send chunks back as they are read from the stream.
Remember that this TransmitBodyConnectHandler lives on the IPC router thread, so while it owns a Trusted<ReadableStream>, it cannot use the stream directly on that thread (since the stream is a JS object tied to the event-loop where it is used, and that event-loop runs on another thread).
So instead of using it directly, it can queue a task back on that event-loop, where the stream will be usable, and a chunk can then be read from it.
I’ve discussed the use of Trusted, and the IPC router, in more detail in a previous article.
The first part of this looks like:
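Roughly (a sketch; the real code uses Servo’s task! macro and a task canceller, and the task name here is made up):

```rust
// Still on the IPC router thread: queue a task back onto the event-loop
// that owns the stream.
let trusted_stream = self.stream.clone();

let _ = self.task_source.queue_with_canceller(
    task!(read_request_body_chunk: move || {
        // We are now running as a task on the event-loop thread, so the Trusted
        // can be "rooted" back into a usable DOM object.
        let stream = trusted_stream.root();

        // The "Rust native" equivalent of calling read() on a reader in JS:
        // this returns a promise that will resolve with the next chunk.
        let promise = stream.read_a_chunk();

        // The second part, below, attaches a native handler to this promise.
    }),
    &self.canceller,
);
```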
As you can see, the task is queued, and the closure will later execute as a task on the corresponding event-loop. The root call then “transforms” the Trusted into the corresponding object owned by the thread on which the event-loop is running, and this object can then be manipulated.
We then finally see the call to read_a_chunk, which returns a “promise”.
The second part shows what is done with this promise:
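Roughly (a sketch; the rejection handler and the exact signatures are elided, see the PR for the details):

```rust
// A Rust object that will be called back with the chunk when the promise resolves.
let promise_handler = Box::new(TransmitBodyPromiseHandler {
    bytes_sender: bytes_sender.clone(),
    stream: stream.clone(),
    control_sender: control_sender.clone(),
});

// The "native" equivalent of calling .then(...) on the promise from JS.
let global = stream.global();
let handler = PromiseNativeHandler::new(&global, Some(promise_handler), None);
promise.append_native_handler(&handler);
```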
First of all, it should be noted that this “promise” is in fact a Rust object that corresponds to the actual promise that would be available to the running JavaScript. And we should also note that the call to read_a_chunk on the stream is basically the “Rust native” equivalent of a JS call to read on a reader for the stream.
So in other words, we are using what is effectively a JS API from Rust, partly as a convenience but also as a necessity, since the stream we are reading from could be a user-provided stream, in which case the corresponding JS code would have to execute in order to read the chunk (the other case would be that we’re reading from a stream backed by a Rust object, but we won’t go into that for now).
What is then done with the TransmitBodyPromiseHandler is another of these “native JS calls”, where we use a Rust object as the “callback” for when the promise resolves (which is again something you could do in JS).
The reason is, again, that we want to get the data to which the promise will resolve, so that we can later send it over IPC to net.
I’ll go easy on the screenshots, so instead of posting it here you could take a look over at https://github.com/servo/servo/pull/25873/files#diff-ae0ff1fd98b06dfb13baf427cdffc28aR230
The end result is essentially that when the promise resolves, we take the data out and send it to net over IPC.
This will then execute the route that you saw earlier, where net feeds the chunk into the async stream of the Hyper body and then requests the next chunk.
This new chunk request will then restart the “reading of a chunk” workflow you’ve just looked at, over in script, and this will go on until the stream is “done”, at which point the mechanisms on both sides will be torn down.
That’s it
So there we are. I’ve skipped a whole bunch, since we haven’t looked into how the Rust code actually interfaces with the SpiderMonkey API for reading chunks, and I also haven’t gone over how you can provide a Rust object to be the “source” of a JS stream. And I think this is enough for the day (also, I’ve noticed that, weirdly, the shorter the article, the more people actually read it, go figure).
You could still take a look at the implementation of these concepts at https://github.com/servo/servo/blob/f3c70f1d7b425d290aeae6ff6e31140b6336f8aa/components/script/dom/readablestream.rs
My takeaway from this is that you can build some pretty useful streaming workflows, which go a bit beyond mere message-passing, as they can be used to model both “pull-based” workflows (like the one we looked at today, where net would “request a chunk” from script) and the more traditional “push-based” flow, which I think matches message-passing in general.
And those workflows can cross process boundaries, consist of both “native” Rust APIs and code running in a VM, involve multiple threads, and compose with async tasks at the “edge” of the system, where the actual networking happens.
Finally, below are some follow-up issues in Servo: