Loose Coupling in Asynchronous Systems

When the Windows team designed the WinRT API (first released as part of Windows 8), they made a large commitment to asynchrony (in the form of asynchronous interfaces). The goal was to clearly communicate that an API might block or take an unbounded amount of time to complete, typically because of some form of IO: file, network or inter-process communication. This contrasts sharply with Win32, which provides no such consistency; that lack is what has led some to describe the Win32 API as a set of landmines. Nothing in the basic signature of a Win32 API indicates whether it will return a result in a predictable amount of time or take an unbounded amount of time to complete. In fact, the predictability of an API’s performance often changed from release to release as new features were added to Windows, e.g. support for mirroring parts of the file system on a remote server.

Asynchronous programming is challenging, and the API design changes in Windows 8 would make asynchrony more common. For both reasons, language assistance was added in the form of the async and await keywords and associated runtime facilities. Among other things, these allow an asynchronous algorithm to be written in a form that looks like a standard sequential program. Under the covers, the compiler and runtime generate the machinery that transparently captures the state of the routine and then resumes the computation when the asynchronous request completes.
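The post does not tie this to a particular language, so here is a minimal sketch of the pseudo-sequential style using Python’s asyncio as a stand-in. The helpers `check_cache`, `resolve_host` and `fetch_bytes` are hypothetical placeholders that only yield to the event loop. The point is that the routine reads top to bottom, while each `await` is a suspension point whose local state the runtime captures.

```python
import asyncio
from typing import Optional


# Hypothetical stand-ins for IO-bound steps; each yields to the event loop once.
async def check_cache(url: str) -> Optional[bytes]:
    await asyncio.sleep(0)
    return None  # simulate a cache miss


async def resolve_host(host: str) -> str:
    await asyncio.sleep(0)
    return "203.0.113.7"  # documentation-range address


async def fetch_bytes(address: str, url: str) -> bytes:
    await asyncio.sleep(0)
    return b"<html></html>"


async def fetch_resource(url: str) -> bytes:
    # Reads like sequential code, but each await is a point where this
    # coroutine is suspended and its local variables are kept alive for later.
    cached = await check_cache(url)
    if cached is not None:
        return cached
    host = url.split("/")[2]  # "https://example.com/x" -> "example.com"
    address = await resolve_host(host)
    return await fetch_bytes(address, url)


print(asyncio.run(fetch_resource("https://example.com/index.html")))  # b'<html></html>'
```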

Some of these features ease the basic syntactic work of interacting with asynchronous APIs, much as “smart pointers” ease managing AddRef/Release in COM programming. However, I’ve always considered the overall approach of trying to hide asynchrony to be often dangerous and mostly ill-advised. A few underlying reasons show up over and over again in actual usage.

  • Lack of isolation leads to problems. Although these techniques are often demonstrated on an isolated computation, they are frequently used in situations where the program needs to publish status and incremental progress from the computation to other parts of the program, for example to let the user know that progress is being made or to show partial results in the UI. Conversely, you may want the asynchronous processing to be affected by changes in state that occur outside the computation. For example, the user has taken some action that changes what the program was trying to accomplish, and the computation might need to be restarted or canceled. The pseudo-sequential computation needs to be ready for its world state to have changed significantly whenever execution resumes, even though little in the literal code of the pseudo-sequential routine makes this obvious. In fact, the code is specifically designed so that it is not obvious. So it is easy to make mistakes, and code that is easy to write is also easy to misunderstand. This is very similar to the challenges that unexpected reentrancy introduces in Windows applications, where innocuous-looking calls cause surprising changes in global state.
  • Isolation leads to problems. Long-lived state is hidden inside the routine. Although the routine is written as if the code executes sequentially, in fact the intermediate state of the computation is really long-lived program state. In almost all cases, it is good practice to make this long-lived state explicit and explicitly managed so it is in some way accessible to the rest of the program. Simply put, the program should know what the heck it’s doing.
  • These async operations are often expensive, in latency or in other resources. That typically creates a desire to share or otherwise optimize them, and writing them as independent, isolated routines tends to make sharing and optimization more difficult.
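The “world state may have changed when execution resumes” hazard from the first point above can be shown in a few lines. This is a minimal Python asyncio sketch. `AppState`, `slow_search` and the two update routines are hypothetical, and sleeps stand in for network latency. The naive version lets a slow, stale search overwrite a newer one; the checked version re-examines shared state after its `await`.

```python
import asyncio
from typing import List


class AppState:
    """Hypothetical state shared with the rest of the program (e.g. the UI)."""

    def __init__(self) -> None:
        self.query = ""
        self.results: List[str] = []


async def slow_search(query: str, delay: float) -> List[str]:
    await asyncio.sleep(delay)  # stand-in for network latency
    return [f"{query}-result"]


async def naive_update(state: AppState, query: str, delay: float) -> None:
    hits = await slow_search(query, delay)
    # Reads as if nothing happened during the await, but the user may have
    # changed the query meanwhile, so stale results can overwrite fresh ones.
    state.results = hits


async def checked_update(state: AppState, query: str, delay: float) -> None:
    hits = await slow_search(query, delay)
    # Re-check the world after resuming and drop results that are now stale.
    if state.query == query:
        state.results = hits


async def demo(update) -> List[str]:
    state = AppState()
    state.query = "cat"
    first = asyncio.create_task(update(state, "cat", 0.05))   # slow search
    await asyncio.sleep(0)                                     # let it start
    state.query = "dog"                                        # user types a new query
    second = asyncio.create_task(update(state, "dog", 0.01))  # faster search
    await asyncio.gather(first, second)
    return state.results


print(asyncio.run(demo(naive_update)))    # ['cat-result'] (stale)
print(asyncio.run(demo(checked_update)))  # ['dog-result']
```

The fix is a single comparison, but nothing in the shape of `naive_update` hints that one is needed. That is the argument being made.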

More generally, introducing asynchrony often forces you to deal explicitly with overall resource management and with flow control through the asynchronous processing. Representing the computation in pseudo-sequential form addresses neither.
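One way to make sharing, resource management and flow control explicit is to route every fetch through an object that owns the in-flight work. The following is a minimal Python asyncio sketch under that assumption. `FetchManager` is hypothetical, and `asyncio.sleep` stands in for real IO. Concurrent requests for the same URL share one task, and a semaphore bounds how many fetches run at once.

```python
import asyncio
from typing import Dict, List, Tuple


class FetchManager:
    """Hypothetical manager that owns every in-flight fetch as explicit state."""

    def __init__(self, max_concurrent: int) -> None:
        # Long-lived state the rest of the program can inspect.
        self._in_flight: Dict[str, "asyncio.Task[str]"] = {}
        # Flow control: bound how many fetches touch the network at once.
        self._limit = asyncio.Semaphore(max_concurrent)
        self.network_calls = 0

    async def _do_fetch(self, url: str) -> str:
        async with self._limit:
            self.network_calls += 1
            await asyncio.sleep(0.01)  # stand-in for real network IO
            return f"body of {url}"

    def fetch(self, url: str) -> "asyncio.Task[str]":
        # Callers asking for a URL that is already in flight share the
        # existing task instead of issuing a second network request.
        task = self._in_flight.get(url)
        if task is None:
            task = asyncio.create_task(self._do_fetch(url))
            self._in_flight[url] = task
            task.add_done_callback(lambda _t: self._in_flight.pop(url, None))
        return task


async def main() -> Tuple[List[str], int]:
    # Create the manager inside the running event loop.
    manager = FetchManager(max_concurrent=2)
    urls = ["a.css", "a.css", "b.png", "c.js", "a.css"]
    bodies = await asyncio.gather(*(manager.fetch(u) for u in urls))
    return list(bodies), manager.network_calls


bodies, calls = asyncio.run(main())
print(calls)  # 3: one network call per distinct URL
```

Because the in-flight table is ordinary program state, other parts of the program could inspect it, cancel entries, or reprioritize them. Each fetch written as its own isolated coroutine would not allow that.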

This post about the networking infrastructure inside the Google Chrome browser provides a great concrete example of the motivation behind these arguments. A key part of the browser infrastructure involves asynchronously fetching resources (HTML, CSS, images). This involves a number of asynchronous steps: check the local resource cache, resolve the domain name to an address, establish a TCP connection, optionally negotiate an SSL connection, make the resource request and then process the response.

Inside Chrome, each of these stages is broken apart and can be optimized independently. For example, the browser may speculatively launch a DNS resolution request when the user hovers over a link, reducing the end-to-end perceived latency if the user does end up clicking on it. Or it may reorder the priorities of a set of resource requests once the connection has been established, based on user actions that have occurred in the meantime, e.g. the user has switched to a different tab in the browser. In general, the approach is to treat each asynchronous completion event as an action to be processed based on the current state of the application. This is not so different from the challenges of building an event-driven user interface. Writing the code so that the application can “look around” when an event completes makes it more straightforward to introduce additional intelligence, and it makes the long-lived intermediate state of the computation explicit. If each request were written as a tightly bound set of continuations, it would be much more difficult to extend in this way.
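The event-driven style described above can be sketched as explicit request records plus handlers that run when each completion arrives. This is a hypothetical Python illustration of the idea. It is not Chrome’s actual design or code, and `Loader`, `Request` and the handler names are invented for the example. Each handler consults the current state: a hover can trigger a DNS prefetch, and a ready connection is handed to whichever waiting request matters most right now.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional


class Stage(Enum):
    RESOLVING = auto()
    WAITING_FOR_CONNECTION = auto()
    REQUESTING = auto()


@dataclass
class Request:
    url: str
    host: str
    tab: int
    stage: Stage = Stage.RESOLVING


@dataclass
class Loader:
    """Hypothetical loader: every completion is an event handled against current state."""

    active_tab: int = 0
    dns_cache: Dict[str, str] = field(default_factory=dict)  # host -> address
    requests: List[Request] = field(default_factory=list)

    def add_request(self, url: str, host: str, tab: int) -> Request:
        req = Request(url, host, tab)
        if host in self.dns_cache:  # a speculative lookup already paid off
            req.stage = Stage.WAITING_FOR_CONNECTION
        self.requests.append(req)
        return req

    def on_hover(self, host: str) -> Optional[str]:
        # Speculative DNS prefetch: return a host to resolve before any request exists.
        return None if host in self.dns_cache else host

    def on_dns_resolved(self, host: str, address: str) -> None:
        self.dns_cache[host] = address
        for req in self.requests:
            if req.host == host and req.stage is Stage.RESOLVING:
                req.stage = Stage.WAITING_FOR_CONNECTION

    def on_tab_switched(self, tab: int) -> None:
        self.active_tab = tab

    def on_connection_ready(self, host: str) -> Optional[str]:
        # "Look around": pick the next request using the state as it is now,
        # preferring requests from the tab the user is currently looking at.
        waiting = [r for r in self.requests
                   if r.host == host and r.stage is Stage.WAITING_FOR_CONNECTION]
        if not waiting:
            return None
        chosen = min(waiting, key=lambda r: r.tab != self.active_tab)
        chosen.stage = Stage.REQUESTING
        return chosen.url


loader = Loader(active_tab=1)
loader.add_request("https://example.com/a.css", "example.com", tab=1)
loader.add_request("https://example.com/b.png", "example.com", tab=2)
loader.on_dns_resolved("example.com", "203.0.113.7")
loader.on_tab_switched(2)  # the user switches tabs while the connection is set up
print(loader.on_connection_ready("example.com"))  # https://example.com/b.png
```

The tab switch arrives between request creation and connection setup. The priority change therefore falls out of reading `active_tab` at completion time, with no need to thread cancellation through a chain of continuations.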

This may not be true for every asynchronous computation you need to execute, but it is certainly true for the key elements of a program’s architecture. In many ways, the trade-offs here resemble the choice between investing in a robust loose coupling between model and view and building an easier-to-write but harder-to-optimize tight coupling. The loose coupling requires more initial infrastructure, but the investment almost always pays dividends for any long-lived, complex application. In the examples described in the Chrome post, optimizations like pre-warming DNS lookups, reusing TCP connections and re-prioritizing requests were straightforward to implement. Each of these stages was cleanly isolated, and the state being maintained was clearly structured and explicit. This flexibility is a consequence of the loose coupling built into the approach. It was not an accident of the architecture; it was core to the intended design.

Many asynchronous computations end up looking like a sequence of macro-state transitions interspersed with micro-state transitions. For example, processing a response might require multiple asynchronous reads before enough data is available to process, while the macro state remains “Reading Response”. These micro transitions are easier to manage gracefully once you’ve identified the major independent, loosely coupled stages, along with the valuable resources and state that each of them manages.
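The macro/micro distinction can be made concrete with a small reader that stays in one macro state while individual reads complete. This is a minimal Python sketch. The framing it parses (a decimal length, a newline, then the body) is a hypothetical protocol chosen for brevity, and `ResponseReader` is an invented name. Each completed read is a micro transition that only updates buffered bytes. The macro transition happens once the whole body is present.

```python
from enum import Enum, auto
from typing import Optional


class Macro(Enum):
    READING_RESPONSE = auto()
    RESPONSE_READY = auto()


class ResponseReader:
    """Hypothetical reader for a decimal length, a newline, then the body."""

    def __init__(self) -> None:
        self.stage = Macro.READING_RESPONSE   # macro state
        self._buffer = b""                    # micro state: bytes received so far
        self._expected: Optional[int] = None  # micro state: body length once known
        self.body: Optional[bytes] = None

    def on_read_completed(self, chunk: bytes) -> Macro:
        # Each completed read is a micro transition inside READING_RESPONSE.
        if self.stage is not Macro.READING_RESPONSE:
            raise RuntimeError("response already complete")
        self._buffer += chunk
        if self._expected is None:
            header, sep, rest = self._buffer.partition(b"\n")
            if not sep:
                return self.stage  # length line not complete yet
            self._expected = int(header)
            self._buffer = rest
        if len(self._buffer) >= self._expected:
            # Enough data has arrived: take the macro transition.
            self.body = self._buffer[: self._expected]
            self.stage = Macro.RESPONSE_READY
        return self.stage


reader = ResponseReader()
for chunk in [b"1", b"1\nhello ", b"world"]:
    print(reader.on_read_completed(chunk).name)
print(reader.body)  # b'hello world'
```

The loop prints `READING_RESPONSE` twice and then `RESPONSE_READY`. The rest of the program only needs to care about the macro stage, while the buffering details stay local to the reader.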