Bye-bye Callback! Welcome Coroutines!

Ismail Afiff
Traveloka Engineering Blog
6 min read · Jun 23, 2020
Figure 1. Don’t block, keep moving

Editor’s Note: Today, we hear from Ismail Afiff, a Principal Software Engineer with the Universal Search team, on the pain points he discovered when using the asynchronous, non-blocking Callback pattern to achieve concurrency, and on how Coroutines address those shortcomings with the same features and performance while expressing asynchronicity more safely, clearly, concisely, and adaptably.

The Universal Search team is responsible for delivering solutions that allow users to fulfill their travel and lifestyle needs through smart query understanding and recognition.

The Challenge of Writing Async Code

Asynchronous, non-blocking functions are one of the staples of Backend Engineering at Traveloka. In fact, many performance-intensive parts of Traveloka, including the commonly used REST client, rely heavily on async patterns.

At Traveloka, async or non-blocking code is generally expressed via a Callback: a Continuation function passed into the calling function as an argument. Although callbacks have solved the high-performance challenge of serving millions of customers, our codebase becomes progressively more difficult to understand, more error-prone, and less flexible as it grows 10x.

Therefore, in this article, I will discuss how we use non-blocking functions and asynchronicity, and how we improved the way we express them for our use cases to solve these pain points.

A Typical Problem

Let’s take a Customer-Review Translation process as an example. It can be simplified as follows:

  1. Get a single review.
  2. Translate the review into multiple languages. The translation function is essentially a simple REST call to a translation server, and the calls for the different target languages can run concurrently. Since the network is prone to failure, add a time limit to each call and retry up to three times.
  3. Finally, if the translation tasks for all languages are successful, combine the results and save them. If any translation task fails, end the process and log the exception. (A partially translated review is of no use.)

Seems like a common problem, right?

Expressing Non-Blocking Function in Traveloka

Translating Function

The distinguishing feature of the translate function is its non-blocking signature, as opposed to that of a regular, blocking function. The reason is that the translate function calls a REST endpoint behind the scenes, so it performs better when done in a non-blocking way.

Code sample 1. Translate function signature and Callback Interface

The standard way of writing non-blocking code at Traveloka is to provide a Callback as the last parameter of a function. The Callback itself is an interface serving as the Continuation function that will be invoked after the function being called (translate) has finished executing. The function being called continues the process by invoking either onComplete if it succeeded or onException if there was an error.
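As a minimal sketch of the shape described above, written in Kotlin: only the method names onComplete and onException come from the text. The generic type parameter, the translate parameters, and the fake synchronous "response" are assumptions made for illustration; a real client would invoke the callback later, from an I/O thread.

```kotlin
// Continuation passed as the last argument of a non-blocking function.
// Only the two method names are taken from the article; the rest is assumed.
interface Callback<T> {
    fun onComplete(result: T)
    fun onException(error: Throwable)
}

// Hypothetical stand-in for the REST-backed translate call.
// It answers immediately instead of performing real network I/O.
fun translate(text: String, language: String, callback: Callback<String>) {
    val translated = try {
        require(text.isNotBlank()) { "nothing to translate" }
        "[$language] $text"
    } catch (e: Exception) {
        // Every failure has to be routed to the callback by hand;
        // otherwise the caller is never resumed.
        callback.onException(e)
        return
    }
    callback.onComplete(translated)
}
```

Note that the function returns nothing: the result only ever reaches the caller through one of the two callback methods.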

Aggregating the Continuation Functions

Recall that the process of combining the translation results will continue only after the text has been translated to all languages. Therefore, the Callback function should expect to accept not a single, but all translation results. However, since the translate function only accepts a single result callback, how could we aggregate all translation results?

As expected from a medium-to-large company, Traveloka tackled this challenge by developing various callback composition libraries that turn single-result callbacks into an aggregated callback whose result is a generic collection such as a List or a Map.

Code sample 2. Various Callback composition Libraries
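A composition helper of this kind might look roughly like the sketch below. It is a hypothetical illustration, not Traveloka's library: the class name is borrowed from the MapCallbackComposition mentioned later in this article, while callbackFor, the constructor parameters, and the internals are assumptions. It hands out one single-result callback per key and fires the aggregated callback once, either with the full map or with the first exception.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicInteger

interface Callback<T> {
    fun onComplete(result: T)
    fun onException(error: Throwable)
}

// Hypothetical all-or-nothing aggregator (assumes a non-empty key set).
class MapCallbackComposition<K : Any, V : Any>(
    keys: Set<K>,
    private val aggregated: Callback<Map<K, V>>
) {
    private val results = ConcurrentHashMap<K, V>()
    private val remaining = AtomicInteger(keys.size)
    private val failed = AtomicBoolean(false)

    // Returns the single-result callback to pass to one non-blocking call.
    fun callbackFor(key: K): Callback<V> = object : Callback<V> {
        override fun onComplete(result: V) {
            results[key] = result
            // Fire the aggregated callback only when the last result arrives.
            if (remaining.decrementAndGet() == 0 && !failed.get()) {
                aggregated.onComplete(results.toMap())
            }
        }

        override fun onException(error: Throwable) {
            // The first failure wins; results collected so far are discarded.
            if (failed.compareAndSet(false, true)) aggregated.onException(error)
        }
    }
}
```

If any of the per-key callbacks is never invoked, the aggregated callback never fires either, which is exactly the hanging-callback risk discussed further down.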

Combining Together

Code sample 3. Overall Translation implementation using Callback

For simplification, several notable requirements such as Timeout and Retry have been omitted, left as an exercise for the reader :).

Nevertheless, the code shows several characteristics that make asynchronous, non-blocking code difficult to understand and error-prone, namely:

  1. Bottom-up way of writing:
    The callback function is created before the translate function is called, so the code is written backwards. In contrast, regular blocking code is written sequentially, step by step, in the order in which it runs.
  2. Non-idiomatic error propagation through the onException function:
    The issue with this technique is that any uncaught exception in the function being called (translate) will not invoke the callback (continuation function), leaving the callback unresolved, or hanging. This leads to undefined behavior in the callback aggregation library.
    Therefore, this technique requires meticulous attention to catching every possible exception in the function being called, so that any exception always ends up invoking onException instead of leaving the callback unresolved.
  3. Inflexible concurrency composition due to over-dependence on libraries:
    The MapCallbackComposition library is inflexible since it only produces binary translation results: either all successful or an exception. There is no way to obtain the successful translation results when only some of the tasks fail. A single exception fails the whole aggregated result.
    Different libraries are required for different behavior.
  4. Callback hell:
    Also referred to as nested callbacks. It is one of the most prevalent problems when callbacks are executed sequentially.
    In the actual translation task (omitted in this problem), the original review language must be obtained first (through another non-blocking callback function) before translating. Because of this sequential flow, the callbacks end up nested. Adding a retry also leads to callback hell, since the retry runs sequentially after the first onException callback.
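The nesting described in the last point can be sketched as follows. Everything here is hypothetical scaffolding: detectLanguage, the translate signature, and the attempts counter (which makes the first call fail so that the retry path runs) are assumptions for illustration only.

```kotlin
interface Callback<T> {
    fun onComplete(result: T)
    fun onException(error: Throwable)
}

// Hypothetical non-blocking call that reports the review's source language.
fun detectLanguage(text: String, callback: Callback<String>) =
    callback.onComplete("en")

var attempts = 0

// Hypothetical translate call that fails on its first attempt.
fun translate(text: String, from: String, to: String, callback: Callback<String>) {
    if (++attempts == 1) callback.onException(RuntimeException("timeout"))
    else callback.onComplete("[$from->$to] $text")
}

fun translateReview(review: String, to: String, done: Callback<String>) {
    detectLanguage(review, object : Callback<String> {
        override fun onComplete(result: String) {
            val from = result
            // Second sequential step: one more level of nesting.
            translate(review, from, to, object : Callback<String> {
                override fun onComplete(result: String) = done.onComplete(result)

                override fun onException(error: Throwable) {
                    // A single retry already needs a third level; retrying
                    // again would mean nesting yet another callback here.
                    translate(review, from, to, done)
                }
            })
        }

        override fun onException(error: Throwable) = done.onException(error)
    })
}
```

Each sequential step or retry pushes the remaining logic one level deeper, and the final result handler (done) has to be threaded through every level.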

Tackling the Challenge and Solving the Pain Points

There are various ways of expressing non-blocking functions and concurrency using language-level features such as Future, Reactive, or async-await. Nevertheless, based on my experience and experiments, the coroutine, or suspending function, is the one that solves the pain points most elegantly, because it keeps a regular function signature.

Translating Function

Nowadays, I write translate as a suspending function, which suspends a coroutine instead of blocking a thread, for performance reasons. As with the original translate function, it internally uses a non-blocking REST client, which may itself be written with a callback or as a more idiomatic suspending function.

Code sample 4. Translate Function expressed as suspending function

The elegance of this approach is that the signature looks like that of a regular blocking function: it simply returns a String instead of taking a callback as the last parameter.

Furthermore, unlike JavaScript’s async function, a suspending function does not introduce concurrency by itself; it is up to the caller to decide whether and how to compose calls concurrently.
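For the case where the underlying client is still callback-based, one possible bridge is the standard library's suspendCoroutine. The sketch below is an assumption about how such a wrapper could look: translateAsync and the Callback interface are hypothetical stand-ins, and a production version would more likely use suspendCancellableCoroutine from kotlinx.coroutines so that cancellation reaches the HTTP call.

```kotlin
import kotlin.coroutines.resume
import kotlin.coroutines.resumeWithException
import kotlin.coroutines.suspendCoroutine

interface Callback<T> {
    fun onComplete(result: T)
    fun onException(error: Throwable)
}

// Hypothetical callback-based REST client call (answers immediately here).
fun translateAsync(text: String, language: String, callback: Callback<String>) {
    callback.onComplete("[$language] $text")
}

// Suspending wrapper: same inputs, but the result is a plain return value.
suspend fun translate(text: String, language: String): String =
    suspendCoroutine { continuation ->
        translateAsync(text, language, object : Callback<String> {
            // Resume the suspended coroutine with the value or the failure.
            override fun onComplete(result: String) = continuation.resume(result)
            override fun onException(error: Throwable) =
                continuation.resumeWithException(error)
        })
    }
```

A caller inside any coroutine can then write `val translated = translate(review, "id")` and handle failures with an ordinary try-catch.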

As you will see, these characteristics make adding and composing concurrency not only intuitive, but also adaptable and less prone to error.

Retry and Timeout

Code sample 5. Generic RetryWithTimeout Higher-Order-Function using Suspending function (modified from Elizarov’s)

Creating a generic retry function that accepts a lambda parameter (block) is simple and easy to understand, because idiomatic constructs such as for-loop and try-catch can be used just as in regular code. For a deeper explanation, please refer to Kotlin Coroutines, a deeper look.
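A minimal sketch of such a higher-order function, assuming kotlinx.coroutines is on the classpath: the parameter names, the default values, the fixed backoff, and the choice to retry only on timeouts and IOException are all assumptions made for illustration, not the listing from this article.

```kotlin
import java.io.IOException
import kotlinx.coroutines.TimeoutCancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.withTimeout

// Runs block up to `times` times, giving each attempt its own time limit.
suspend fun <T> retryWithTimeout(
    times: Int = 3,
    timeoutMillis: Long = 1_000,
    backoffMillis: Long = 100,
    block: suspend () -> T
): T {
    for (attempt in 1 until times) {
        try {
            return withTimeout(timeoutMillis) { block() }
        } catch (e: TimeoutCancellationException) {
            // This attempt exceeded its time limit; try again.
        } catch (e: IOException) {
            // Assumed to be a transient network failure; try again.
        }
        delay(backoffMillis) // suspends the coroutine, does not block a thread
    }
    // Last attempt: any failure now propagates to the caller.
    return withTimeout(timeoutMillis) { block() }
}
```

With a suspending translate function in scope, a call site would read `retryWithTimeout { translate(review, language) }`, with no retry-specific callback involved.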

Wrapping it up

Code sample 6. Overall Translation implementation using Coroutines

Let’s go through what the assignment-style function calls and the try-catch in the code above give us:

  1. Easier code comprehension:
    It is much easier now to comprehend the code’s overall sequential flow as opposed to the inverted flow of using callback.
  2. More adaptive concurrency and less library dependency:
    Because of the regular function signature, concurrency is written and composed with simple, explicit, and adaptable constructs (delay, retry, timeout, async, for-each, try-catch, or coroutineScope). Furthermore, the translate function results can now be collected and transformed in a concise, functional way with generic collection methods such as map, filter, or reduce.
    This approach is easier to adapt and customize than using callback aggregation libraries that are made for specific concurrency composition.
    In code sample 6 above, the try-catch encloses all of the translation tasks, so that any single translation exception is caught immediately and the remaining translation tasks are cancelled (similar to the MapCallbackComposition behavior). If the partially successful translation results are needed (rather than a simple all-success-or-exception outcome), modifying the try-catch scope is all it takes, as opposed to switching to a different callback composition library.
  3. Error handling is safer by using idiomatic try-catch:
    Try-catch is safer than calling the onException callback, since try-catch guarantees that any exception thrown within the block gets caught, whereas forgetting to call onException (a common mistake) results in undefined behavior at runtime because of the unresolved callback.
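The points above can be illustrated with the following sketch, assuming kotlinx.coroutines. The translate stub (which fails for the made-up language code "xx"), the function names translateAll and translateAvailable, and the use of println for logging are assumptions; retry and timeout are left out to keep the sketch short.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.delay

// Hypothetical stand-in for the REST-backed suspending translate function.
suspend fun translate(text: String, language: String): String {
    delay(10)
    if (language == "xx") throw IllegalStateException("unsupported: $language")
    return "[$language] $text"
}

// All-or-nothing: the try-catch encloses every translation task.
suspend fun translateAll(review: String, languages: List<String>): Map<String, String>? =
    try {
        coroutineScope {
            languages
                .map { language -> async { language to translate(review, language) } }
                .awaitAll() // the first failure cancels the sibling tasks
                .toMap()
        }
    } catch (e: CancellationException) {
        throw e // never swallow cancellation of the caller
    } catch (e: Exception) {
        println("translation failed: ${e.message}")
        null
    }

// Partial results: the same code with the try-catch moved inside each task.
suspend fun translateAvailable(review: String, languages: List<String>): Map<String, String> =
    coroutineScope {
        languages
            .map { language ->
                async {
                    try {
                        language to translate(review, language)
                    } catch (e: CancellationException) {
                        throw e
                    } catch (e: Exception) {
                        null // drop only the failed language
                    }
                }
            }
            .awaitAll()
            .filterNotNull()
            .toMap()
    }
```

The two variants differ only in where the try-catch sits, which is the kind of change that would require a different composition library in the callback version.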

In conclusion, writing non-blocking asynchronous code using Callbacks causes the issues described above and should therefore be avoided. Nowadays, Coroutines are the preferred way of expressing asynchronicity, since the resulting code is more concise, easier to grasp, more adaptable, and safer, which solves the pain points of using Callbacks.

Special thanks to the Backend Infra team, especially Fajrin Azis and Bobby Priambodo of the Kotlin group for the fruitful discussions in making asynchrony more developer friendly.

I am really fortunate to undertake this fascinating and open-ended concurrency challenge with my Universal Search team at Traveloka, one of the largest online travel companies in Southeast Asia. If you’re a software engineer interested in developing state-of-the-art solutions to help millions of users find their next adventure, have a look at the opportunities on Traveloka’s careers page!

References

  1. https://en.wikipedia.org/wiki/Callback_(computer_programming)
  2. https://medium.com/@quyetvv/async-flow-from-callback-hell-to-promise-to-async-await-2da3ecfff997
  3. https://medium.com/@elizarov/blocking-threads-suspending-coroutines-d33e11bf4761
  4. https://medium.com/@elizarov/kotlin-coroutines-a-deeper-look-180536305c3f
  5. https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/
  6. https://ktor.io/clients/index.html#calls-requests-and-responses
  7. https://blog.jetbrains.com/kotlin/2018/10/kotlin-1-3/

Ismail Afiff

Software Engineer, working on Information Retrieval. Passionate about Photography and Classical Guitar.