Adventures in Async
I was very recently the holder of three opinions:
- Rust is magic fairy dust that can fix all my problems
- Async is magic fairy dust that can fix all my problems
- GraphQL is magic fairy dust that can fix all my problems
Unsurprisingly, all of these appear to be wrong. This post is about the first two.
In the journey to get to product/market fit with Dark, I’m taking a slightly different strategy than we took before, which I’m calling “Hard Things First”. Repeatedly with Dark we spent significant time coming up with hacks to work around previous hacks which themselves were working around previous hacks. That made for a codebase where people could only contribute in well-defined ways, as going outside the box was to invite madness. Or more specifically, it meant needing to have the entire history of the codebase, company and roadmap in your head to understand why the code is the way it is.
So instead, I’m looking at doing the Hard Things First. Dark has a few problems in its server-side implementation, and those need to be fixed. We long speculated that we’d be able to hack them for now and do a Big Rewrite Later, with the copious resources that we’d have after the Series A. But alas, that is not how things are going, so I now need to figure out some of these Hard Things, and the first question is whether to stay in OCaml or switch to something else.
Staying in OCaml
If we stay in OCaml, I need to figure out how to solve a number of key problems, the biggest being how to not hang the server when making a slow HTTPClient::get request. The Dark web server is currently synchronous, so long or slow requests, at sufficient volume, can cause operational issues for us. Solving that is something we aggressively put off, but it is the first of the Hard Things to address. And that's where async comes in.
Quick async recap
As a quick introduction, the word async describes a way of allowing servers to respond to many more requests by taking advantage of non-blocking IO. In the old days, servers used one thread to respond to each request. Now, servers commonly use an async implementation where a single thread can handle many requests.
They do so by handling the requests in a simple loop. Each request is added to a queue when it comes in, and the request at the front of the queue is run by the thread. If the handler does some IO (perhaps talking to the DB or to an HTTP API), the thread will stop working on that request and move on to the next request in the queue. When the IO finishes, the paused request is re-added to the queue, where the thread will come for it shortly.
This is sort of like context switching between threads, but with much lower overhead, as you can switch between items in the queue much more cheaply than switching between threads. A single thread can handle dozens or hundreds of requests at once (so long as they are mostly IO-bound), so you can process many more requests than with simple multi-threading (and of course, you can still be multi-threaded in a multi-core situation, running an async server on each thread).
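To make the loop above concrete, here is a minimal sketch in plain OCaml (standard library only). It is a toy under stated assumptions: the step type, run_loop, and the "ticks" that stand in for IO latency are all hypothetical names invented for illustration, and no real IO happens. A real server would rely on Lwt or Async and the operating system's non-blocking IO instead.

```ocaml
(* Toy single-threaded event loop. Hypothetical names throughout; IO is
   simulated with a tick counter rather than real non-blocking sockets. *)

(* Running a request either finishes it, or starts some "IO" and hands back
   the rest of the work as a continuation. *)
type step =
  | Done of string                    (* the response body *)
  | Wait_io of int * (unit -> step)   (* ticks until the IO completes, then resume *)

let run_loop (requests : (string * (unit -> step)) list) : string list =
  let ready = Queue.create () in
  let waiting = ref [] in             (* requests paused on IO *)
  let finished = ref [] in
  List.iter (fun r -> Queue.add r ready) requests;
  while not (Queue.is_empty ready) || List.length !waiting > 0 do
    (* Run the request at the front of the queue until it finishes or blocks. *)
    (match Queue.take_opt ready with
     | Some (name, k) ->
       (match k () with
        | Done body -> finished := (name ^ ": " ^ body) :: !finished
        | Wait_io (ticks, k') -> waiting := !waiting @ [ (ticks, name, k') ])
     | None -> ());
    (* One tick passes: requests whose IO has completed rejoin the queue. *)
    let done_io, pending = List.partition (fun (t, _, _) -> t <= 1) !waiting in
    List.iter (fun (_, name, k) -> Queue.add (name, k) ready) done_io;
    waiting := List.map (fun (t, name, k) -> (t - 1, name, k)) pending
  done;
  List.rev !finished

(* One slow request and two fast ones, all handled by the same thread. *)
let slow () = Wait_io (3, fun () -> Done "slow result")
let fast () = Done "fast result"

let results = run_loop [ ("a", slow); ("b", fast); ("c", fast) ]
let () = List.iter print_endline results
```

In this sketch the slow request "a" is started first but finishes last: while it waits on its simulated IO, the thread serves "b" and "c" instead of sitting idle.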
If you enjoyed that, you’ll probably enjoy and learn a lot from this post.
Back to OCaml
In OCaml, there are two competing async implementations, Lwt and (the aptly named) Async. The programming model isn’t terrible since OCaml 4.08, which added binding operators such as let*. Used with a promise library, let* is roughly akin to the await keyword in JS, Python and Rust. So you take your synchronous code:
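(* other_sync_function : unit -> int is assumed to be defined elsewhere *)
let sync_function () : int =
  let x = other_sync_function () in
  x + 2
and rewrite it asynchronously:
(* let* comes from Lwt.Syntax; other_async_function : unit -> int Lwt.t is assumed *)
open Lwt.Syntax
let async_function () : int Lwt.t =
  let* x = other_async_function () in
  Lwt.return (x + 2)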
Lwt.t here is a promise, and it's the same concept as a Promise in JS, a Future in Rust, and Async<...> in F#.
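For anyone wondering what let* actually does, here is a small sketch using only the standard library. The promise type below is a made-up stand-in that is always already resolved (it is not Lwt's real type), and fetch_number is a hypothetical helper. The point is only that let* is an ordinary operator you define yourself, and that it is shorthand for calling bind with a callback.

```ocaml
(* A toy "promise" that is always already resolved. Hypothetical: a real
   Lwt.t can also be pending or failed. *)
type 'a promise = Resolved of 'a

let return (x : 'a) : 'a promise = Resolved x

(* bind: run [f] on the promise's value once it is available. *)
let bind (Resolved x : 'a promise) (f : 'a -> 'b promise) : 'b promise = f x

(* Binding operators (OCaml 4.08+):
   [let* x = p in e] is sugar for [( let* ) p (fun x -> e)]. *)
let ( let* ) = bind

(* Hypothetical helper standing in for a real async call. *)
let fetch_number () : int promise = return 40

(* Written with let*, this reads like the synchronous version... *)
let with_let_star () : int promise =
  let* x = fetch_number () in
  return (x + 2)

(* ...and this is the same function written with an explicit callback. *)
let with_bind () : int promise =
  bind (fetch_number ()) (fun x -> return (x + 2))
```

Both functions produce the same resolved promise; the let* form just keeps the code flat instead of nesting a callback for every asynchronous step.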
The difference between the Lwt and Async libraries seems to be that Lwt is greedy (it will keep going on the same request if it can, that is, if the promise resolves fast enough) and Async is not (it will spread the work around if it can). So, roughly speaking, Lwt prioritizes latency and Async prioritizes fairness and preventing requests from being starved.
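Here is a toy illustration of that greedy-versus-fair distinction, again using only the standard library. This is not how either library is implemented: bind_greedy, bind_fair, and the jobs queue are hypothetical names, and the sketch only models the case where the awaited value is already available.

```ocaml
(* Toy model only: neither Lwt nor Async is implemented like this. *)
let log : string list ref = ref []
let say (s : string) : unit = log := s :: !log

(* The queue of work waiting for the single thread. *)
let jobs : (unit -> unit) Queue.t = Queue.create ()

(* "Greedy": the value is already available, so keep going on this request. *)
let bind_greedy (x : 'a) (f : 'a -> unit) : unit = f x

(* "Fair": even though the value is available, queue the continuation so
   other requests get a turn first. *)
let bind_fair (x : 'a) (f : 'a -> unit) : unit =
  Queue.add (fun () -> f x) jobs

(* A request with two steps separated by a bind. *)
let request bind name () =
  say (name ^ " step 1");
  bind () (fun () -> say (name ^ " step 2"))

(* Run requests A and B to completion and return the order of events. *)
let run bind : string list =
  log := [];
  Queue.add (request bind "A") jobs;
  Queue.add (request bind "B") jobs;
  while not (Queue.is_empty jobs) do (Queue.take jobs) () done;
  List.rev !log

let greedy_order = run bind_greedy
let fair_order = run bind_fair
```

In this model the greedy version finishes request A entirely before touching B (better latency for A), while the fair version interleaves the two (B is never starved by A).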
Other OCaml improvements
Of course, the lack of async isn’t the only problem in the Dark codebase. There are many things in our codebase that make it attractive to look at redoing the server-side implementation in another language.
However, having looked through the backend codebase recently, I think most of them can be solved by refactoring and using some of the tools that have come along in the last few years. In particular, Caqti solves some of the needs that are not solved by using Postgresql-ocaml directly (such as connection pooling). Sihl has come along to make it much nicer to write web services in OCaml. httpaf and H2 look to solve some of the core performance problems in Cohttp. And OCaml-graphql-server provides a nice GraphQL server in OCaml.
The other thing that needs work if we stay in OCaml is the tooling, but I think a lot of the problem comes from how we’ve mashed everything together, and a little bit of performance optimization in our build scripts might solve a lot of problems.
Rust and F#
I spent some time rewriting the core of Dark in Rust and F#. Not the full thing, but just enough to connect a server to a Dark interpreter and calculate Fizzbuzz. The goal is to get a feel for the ecosystem, programming model, and of course the performance for asynchronous workloads like people build on Dark. Early learnings indicate that magical fairy dust does not exist.
The first learning, and I’ll probably write some more about this later, is that coding in Rust is hard. Like, much harder than I expected. And using async in Rust is exponentially harder than writing synchronous Rust, to the point that I’m not even sure that it’s a good idea for anyone to do it (again, more on this in a future post). I had not realized how much the garbage collector does for you in managed languages, and how much complexity managing your own memory adds in an asynchronous server.
By contrast, rewriting the core of Dark in F# was very simple, and adapting it to use async was also extremely straightforward. This comes from someone who doesn’t know anything about the dotnet ecosystem, but who does of course know OCaml, which is about 95% the same as F#.
The code is at https://github.com/darklang/sync-async-benchmarks, which I hope to turn into benchmarks. I hope they’ll help inform my decision about which way to move forward. I’ve been learning a lot from the experiments with Rust and F#, and I’ve been using the experience to start working on a roadmap/spec for the next version of Dark, which I’ll talk about soon.
As you can imagine, I’m eager to stop experimenting and make some decisions, and I’m looking forward to getting back to moving the Dark language and implementation forward.