Async Jersey + Kotlin Coroutines

Dan Murphy
ClassPass Engineering
5 min read · May 17, 2022

Jersey is a popular framework for building RESTful web services on the JVM. At ClassPass, we make heavy use of Dropwizard, a web framework built on top of Jersey. With Jersey, requests are processed synchronously by default: each request is handled by a single thread, and that thread remains blocked until the request is complete. Endpoints in Jersey apps are defined using JAX-RS resource classes. For example, here is how we would define the endpoint GET /classes:
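
A minimal sketch of such a resource, with Classes as a hypothetical response type standing in for the real payload:

```kotlin
import javax.ws.rs.GET
import javax.ws.rs.Path
import javax.ws.rs.Produces
import javax.ws.rs.core.MediaType

// Hypothetical response type for illustration.
data class Classes(val items: List<String> = emptyList())

@Path("/classes")
@Produces(MediaType.APPLICATION_JSON)
class ClassesResource {

    @GET
    fun getClasses(): Classes {
        // The I/O container thread that entered this method stays blocked
        // here until the response is fully built.
        return Classes()
    }
}
```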

This is problematic in apps with resources that take a long time to execute, since it can quickly exhaust the I/O container thread pool that Jersey uses for processing requests. That creates a throughput bottleneck affecting every resource in the application. Fortunately, Jersey provides an API for asynchronous processing. With the asynchronous API, an I/O container thread that initiates a long-running request can be returned to the I/O container pool immediately to process new requests.

Here is how we would implement the GET /classes endpoint with the asynchronous API:
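
A sketch of the asynchronous version, using the same hypothetical Classes type:

```kotlin
import javax.ws.rs.GET
import javax.ws.rs.Path
import javax.ws.rs.container.AsyncResponse
import javax.ws.rs.container.Suspended
import kotlin.concurrent.thread

@Path("/classes")
class ClassesResource {

    @GET
    fun getClasses(@Suspended asyncResponse: AsyncResponse) {
        // Hand the work off to a new thread and return right away, so the
        // I/O container thread goes back to its pool.
        thread {
            val classes = Classes() // stand-in for the real lookup
            // Resume the suspended request; Jersey completes it and closes
            // the connection on an I/O container thread.
            asyncResponse.resume(classes)
        }
    }
}
```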

In this implementation, we hand the work of building our Classes response off to a new thread and exit the getClasses resource method. The request connection is now suspended, and the I/O container thread that entered getClasses is released back to the I/O container thread pool, where it’s available to process new requests. When the new thread finishes building the Classes response, calling asyncResponse.resume(classes) resumes the suspended request on an I/O container thread, which completes the request and closes the connection.

Avoiding Connection Leaks

One of the drawbacks of the asynchronous API is that we have to be careful to avoid connection leaks. For example, if the Classes constructor call throws, we never call resume and the client connection will not be closed. One way to avoid this is to configure the asyncResponse to time out after a specified amount of time, which can be done by implementing a TimeoutHandler. While this works great as a catch-all, it can be problematic for error signaling, since it results in the server responding with a 503 status code and no other context. So, in addition to normal error handling, it is generally a good idea to catch any Throwable and resume with it.
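
A sketch combining a timeout catch-all with a try/catch that resumes on any Throwable (the timeout value is illustrative):

```kotlin
import java.util.concurrent.TimeUnit
import javax.ws.rs.GET
import javax.ws.rs.Path
import javax.ws.rs.container.AsyncResponse
import javax.ws.rs.container.Suspended
import javax.ws.rs.core.Response
import kotlin.concurrent.thread

@Path("/classes")
class ClassesResource {

    @GET
    fun getClasses(@Suspended asyncResponse: AsyncResponse) {
        // Catch-all: if nothing resumes the response within 30 seconds,
        // answer with a 503 instead of leaking the connection.
        asyncResponse.setTimeout(30, TimeUnit.SECONDS)
        asyncResponse.setTimeoutHandler { response ->
            response.resume(Response.status(Response.Status.SERVICE_UNAVAILABLE).build())
        }

        thread {
            try {
                asyncResponse.resume(Classes())
            } catch (t: Throwable) {
                // Resuming with the Throwable lets Jersey's exception mappers
                // turn it into a proper error response and close the connection.
                asyncResponse.resume(t)
            }
        }
    }
}
```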

Integrating with Kotlin Coroutines

Kotlin coroutines offer concurrency with a high-level API and constructs that provide a safe way to manage concurrent operations. One major benefit of coroutines is that they are lightweight. In particular, coroutines are not bound to a single thread, allowing for non-blocking asynchronous operations. Used in conjunction with the Jersey asynchronous API, we can avoid issues related to thread contention both at the Jersey I/O thread pool level and in the thread resources used to execute our application logic (which in some cases might just expose bottlenecks somewhere else, but it’s progress either way!).

Suppose we have an endpoint with logic that can benefit from concurrency, and we want to implement that logic with coroutines. It is common to use runBlocking to create a bridge into coroutine-land, but that doesn’t play nicely with the Jersey asynchronous API, since runBlocking will block the request processing thread and prevent it from being returned to the I/O container. We can instead construct a new CoroutineScope, which lets us kick off a coroutine in a “fire and forget” fashion so the request processing thread can be returned to the I/O container while the coroutine is executing. Consider the following example, where a resource method getClass concurrently fetches class details and a list of attendees, then combines them into a single response object.
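
A sketch of that approach; ClassClient, AttendeeClient, and the response types here are hypothetical stand-ins for the real dependencies:

```kotlin
import javax.ws.rs.GET
import javax.ws.rs.Path
import javax.ws.rs.PathParam
import javax.ws.rs.container.AsyncResponse
import javax.ws.rs.container.Suspended
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Hypothetical types and clients for illustration.
data class ClassDetails(val id: String, val name: String)
data class Attendee(val name: String)
data class ClassResponse(val details: ClassDetails, val attendees: List<Attendee>)

interface ClassClient { suspend fun getDetails(classId: String): ClassDetails }
interface AttendeeClient { suspend fun getAttendees(classId: String): List<Attendee> }

@Path("/classes/{classId}")
class ClassResource(
    private val classClient: ClassClient,
    private val attendeeClient: AttendeeClient
) {

    @GET
    fun getClass(
        @PathParam("classId") classId: String,
        @Suspended asyncResponse: AsyncResponse
    ) {
        // Fire and forget: the coroutine runs in its own scope while the
        // I/O container thread returns to the pool.
        CoroutineScope(SupervisorJob() + Dispatchers.IO).launch {
            try {
                val response = coroutineScope {
                    val details = async { classClient.getDetails(classId) }
                    val attendees = async { attendeeClient.getAttendees(classId) }
                    ClassResponse(details.await(), attendees.await())
                }
                asyncResponse.resume(response)
            } catch (t: Throwable) {
                // Resume with the failure so the connection isn't leaked.
                asyncResponse.resume(t)
            }
        }
    }
}
```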

While this works well enough, the implementation leaves something to be desired in terms of ergonomics. Ideally we would be able to implement our logic with coroutines, but avoid the extra ceremony involved in creating a new CoroutineScope along with adding a try/catch to ensure we resume on all exceptions. To make for a smooth integration and help avoid some of the pitfalls of the Jersey asynchronous API, we built an extension function on AsyncResponse that accepts a function block parameter for the resource logic.

With submitToCoroutine, we can implement our getClass resource as follows:
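
Using the same hypothetical clients, types, and imports as the previous example, plus the submitToCoroutine sketch shown at the end of this post, the resource method reduces to:

```kotlin
@GET
fun getClass(
    @PathParam("classId") classId: String,
    @Suspended asyncResponse: AsyncResponse
) {
    asyncResponse.submitToCoroutine {
        val details = async { classClient.getDetails(classId) }
        val attendees = async { attendeeClient.getAttendees(classId) }
        ClassResponse(details.await(), attendees.await())
    }
}
```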

Challenges

One of the challenges we encountered relates to the use of MDC for providing context in logs. MDC is essentially a ThreadLocal map that’s used to add context to logs without explicitly needing to pass values around. In our case, we had a system for including a transaction id in logs, where the transaction id was stored in the MDC context. This system allows us to search logs by transaction id and find everything written for a single transaction. Since coroutines can resume on different threads, a plain ThreadLocal value doesn’t follow them. Fortunately, there is a package that provides integrations between coroutines and slf4j, which includes an MDCContext coroutine context element that lets us bring the MDC context into our coroutine.
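
That integration lives in the kotlinx-coroutines-slf4j module. A small sketch of how MDCContext carries an MDC value into a coroutine (the transactionId key and value are illustrative):

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.slf4j.MDCContext
import org.slf4j.LoggerFactory
import org.slf4j.MDC

private val logger = LoggerFactory.getLogger("example")

fun main() {
    // Set on the current thread, e.g. by a request filter in our case.
    MDC.put("transactionId", "abc-123")

    runBlocking {
        // MDCContext captures the current MDC map and restores it on every
        // thread the coroutine resumes on, so the transaction id still
        // appears in logs written inside the coroutine.
        launch(Dispatchers.IO + MDCContext()) {
            logger.info("fetching classes") // logged with transactionId=abc-123
        }
    }
}
```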

Another challenge was related to instrumentation with New Relic. At ClassPass, we use New Relic for APM, and take advantage of some of its transaction tracing features to get visibility into endpoint performance (e.g. see what I/O operations are happening and how much time is spent on them). For each web transaction, a New Relic transaction is created and tied to the thread that is handling the web transaction. When we pass work to a coroutine, the work is run on a separate thread, and the New Relic transaction is unable to keep track of this work. To get around this, we took advantage of an API offered by New Relic for instrumenting asynchronous activity. In particular, Token objects can be generated from New Relic transactions, which can be passed between threads to link asynchronous work to the original transaction. Here is a simple example of using a token to link asynchronous work to a transaction:
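
A sketch of the pattern with plain threads (the method names here are illustrative):

```kotlin
import com.newrelic.api.agent.NewRelic
import com.newrelic.api.agent.Token
import com.newrelic.api.agent.Trace
import kotlin.concurrent.thread

class ClassesService {

    @Trace(dispatcher = true)
    fun handleRequest() {
        // Create a token on the thread that owns the New Relic transaction.
        val token: Token = NewRelic.getAgent().transaction.token

        // Pass the token to the thread doing the asynchronous work.
        thread { doAsyncWork(token) }
    }

    @Trace(async = true)
    fun doAsyncWork(token: Token) {
        // Link this thread's work back to the original transaction, then
        // expire the token since it's no longer needed.
        token.linkAndExpire()
        // ... long-running work is now traced as part of that transaction
    }
}
```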

Since coroutines aren’t bound to threads, we have to ensure the New Relic transaction is linked whenever a coroutine resumes on a new thread. We solved this by building a dispatcher that links the transaction when resuming a coroutine:
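
One possible shape for such a dispatcher (not necessarily our exact implementation) wraps a delegate dispatcher and links the token on whichever thread actually runs the dispatched work:

```kotlin
import com.newrelic.api.agent.Token
import kotlin.coroutines.CoroutineContext
import kotlinx.coroutines.CoroutineDispatcher

class NewRelicLinkingDispatcher(
    private val delegate: CoroutineDispatcher,
    private val token: Token
) : CoroutineDispatcher() {

    override fun dispatch(context: CoroutineContext, block: Runnable) {
        delegate.dispatch(context, Runnable {
            // Runs on the delegate's worker thread every time the coroutine
            // is dispatched, so resumed work stays tied to the transaction.
            token.link()
            block.run()
        })
    }
}
```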

Threading all of this together, the submitToCoroutine extension function ended up looking something like this:
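
A sketch pulling the pieces together; NewRelicLinkingDispatcher is the dispatcher sketched above, and the exact shape of our internal version differs in the details:

```kotlin
import com.newrelic.api.agent.NewRelic
import javax.ws.rs.container.AsyncResponse
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.slf4j.MDCContext

fun <T> AsyncResponse.submitToCoroutine(block: suspend CoroutineScope.() -> T) {
    // Grab a token on the request thread, where the New Relic transaction
    // and the MDC context live.
    val token = NewRelic.getAgent().transaction.token
    val dispatcher = NewRelicLinkingDispatcher(Dispatchers.IO, token)

    // Fire and forget: the request thread returns to the I/O container pool
    // while the coroutine runs.
    CoroutineScope(SupervisorJob() + dispatcher + MDCContext()).launch {
        try {
            // coroutineScope confines failures of any child coroutines started
            // in block so they surface here as a thrown exception.
            resume(coroutineScope { block() })
        } catch (t: Throwable) {
            // Always resume, even on failure, to avoid leaking the connection.
            resume(t)
        } finally {
            token.expire()
        }
    }
}
```

With this in place, resource methods only need to supply the coroutine block, as in the getClass example earlier.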

Outcomes

Using the Jersey asynchronous API along with coroutines has allowed us to make our applications more resilient to large traffic spikes and to isolate the impact of higher-latency endpoints.
