Request tracing in microservices architecture — A Kleisli tale

Louis Forite
Mar 18, 2020 · 7 min read
Image for post
Image for post

Debugging microservices is a tedious task. A lot of components interact with each others and when a bug pops up identifying its root cause is complex. It gets even messier at scale.

It is difficult, but not impossible. Here, we will see how we can leverage a functional programming data type, namely Kleisli, to achieve such result.

The problem

Let’s consider the following architecture composed of four services: A, B, C and D.

A basic microservices setup

In this setup, service A calls service B, service B calls service C and D then aggregates the result to do some business processing and returns the result to A. So far so good.

Now consider the following scenario:

Image for post
Image for post

There is a problem in production. You have to debug it, so the first thing you do is checking the logs.

Service A logs

Line number 5 indicates there is a problem related to the service B http call. So you decide to check the logs of service B.

Service B logs

Line number 6 indicates there is a problem to the service D http call. So you have to check the logs of service D.

Service D logs

Line number 6, we have finally found the bug. It turns out there was a connection issue with the database. Sure we are now on our path to fixing the issue, but we took the long and cumbersome road. We had to go through the logs of three different services (that we might not even own) to try and correlate log traces manually (checking that the timestamps match and drawing conclusions that this line is correlated to our initial problem).

By the way: did you notice? The timestamps in the logs don’t even match. The time in the service B is off by 2 hours, which makes it even harder to find the bug.

This example is also mercifully simple, we are only managing four microservices here. Imagine what would happen with a lot more.

We need to avoid this much interpretation-heavy and error-prone exploration, but how?

The solution

A simple solution would be to add a correlation id to the logs, one that is propagated to all the subsequent calls. It would look like this:

Service A logs with correlation id

Right next to the timestamp, a generated request id has been added.

Service B logs with correlation id

With this technique, it is easy to grep all the logs matching an id and track down the source of the bug. We can easily see that the correlation id 1EgBpH1p matches with the one from the service A logs.

Service D logs with correlation id

Neat, but how do we implement such a solution? Let’s explore the tools we have at our disposal.

JVM Tools

The JVM ecosystem is pretty rich when it comes to tooling. Some tools exist to instrument code and to able to correlate log traces.

Generally speaking, those tools require a lot of specific knowledge. It means you need to invest time (sometimes a lot) in learning something that might not be reusable. It is also very library / framework specific.

Pros

  • works out of the box if the framework and the libs fit

Cons

  • side effects (it might cause OOM exceptions)
  • Hard to setup
  • framework specific
  • specific knowledge

Thread MDCs

Thread MDCs (Mapped Diagnostic Context) are thread-local HashMaps and they store some context associated with the current thread. As long as a task is running on the same thread, you can access this context.

Thread MDC usage example

It is a good solution but it falters when you have asynchronous computations. There, the correlation id will not get propagated to the next thread (it won’t work with Future, IO, Task…). To make it work you would have to copy the context which is an expensive and tricky thing to do.

Pros

  • simple to setup
  • code changes are only located where the correlation id is needed

Cons

  • does not work in a multithreaded context or needs tricky adaptation
  • not test friendly

Argument passing

A more straightforward solution would be to just pass the correlation id in each method call such as:

Argument passing example

Pros

  • straightforward to implement
  • test friendly
  • work in a multithreaded environment

Cons

  • spreads everywhere as all the methods will end up needing it
  • adds a lot of noise in the method declaration and calling site

Implicit argument passing

Implicit argument passing example

Pros

  • straightforward to implement
  • work in a multithreaded environment
  • method calling do not require to explicitly pass the parameter

Cons

  • still noisy
  • spreads everywhere as all the methods will end up needing it

Kleisli

Heinrich Kleisli was a swiss mathematician. He has been very influential in the field of category theory and you can find more about him on wikipedia. I am not really well versed into category theory so I am not going to discuss much of the topic here.

What matters is that Kleisli’s work has been translated into computer science and we can find the following data structure in the Cats / Scalaz libraries:

Taken from Cats / ScalaZ

Let’s break it down:

  • F[_] is the Monad / Applicative / Functor of your choice such as Future or Option or IO .
  • A is what I call a “context” (or dependency)
  • B is the type you will get within your F

To keep it simple, a Kleisli is a data structure that encapsulates a function that given an A results in a F[B] . To get an F[B] you have to provide an A. So very naively explained, we introduce a dependency between the computation F[B] and A. Example:

if F[_]is an IO and A is a CorrelationId and and B is a String then we get

Kleisli[Option, CorrelationId, String](run: CorrelationId => IO[String])

To execute the run function, we have to provide a CorrelationId (the “context”) and we will get an IO[String] in return.

Kleisli properties

The cool thing about Kleisli is that it keeps the properties of your F[_] . It means:

  • If F[_] is a Functor then Kleisli[F[_], _, _] is also a Functor
  • If F[_] is an Applicative then Kleisli[F[_], _, _] is also an Applicative
  • If F[_] is a Monad then Kleisli[F[_], _, _] is also a Monad

Concretely it means that you will have access to map , ap (or product / zip), flatMap functions from your Kleisli[F[_], _, _] construct depending on the properties of your F[_] just as you would do with Future , Option and IO .

Pretty cool huh ?

Kleisli operations

There are three main operations with Kleisli: create a Kleisli, run a Kleisli, ask out of a Kleisliits context.

Creating a Kleisli
Running a Kleisli
Asking the “context” out of a Kleisli

That is a lot of information and you might be wondering how it is helping us track our log errors. So let’s get back on track.

Solving the correlation id problem with Kleisli

As we mentioned earlier we can see Kleisli as a way to introduce a dependency between a value and a computation. Another way to consider Kleisli is to view it as a way to contextualise a computation. For our example, we can consider that CorrelationId is the “context” in which an http request runs into. Which means that to get the result of a request, a CorrelationId is required. So all our methods will have the form of Kleisli[IO, CorrelationId, _] since everything runs “in the context of” a request.

Putting it all together it would look like this:

A logger using Kleisli
Services using Kleisli

Since IO is a Monad, we can still map / flatMap on a Kleisli[IO, _, _] . If you look closely, this version is very close from the one with implicit parameters. The body of the createA method is similar !

Retrieving or generating the correlation id
Flowing the correlation id to other services

Pros

  • test friendly (just provide an a context and get a regular IO
  • work in a multithreaded environment

Cons

  • still verbose

The shorte(r/est) version of Kleisli

We’ve solved most our problems but we’re still writing a lot of code. To cut on repetition, you can make an alias out of it so it removes the noise all around your application.

If you want to be able to use all the Kleisli methods with the shorter name RIO you have to alias them once such as:

Alias Kleisli operations

I personally like the RIO name because it is short and is still meaningful. RIO stands for Rich IO (implying it comes with some context).

Replacing Kleisli[_, _, _] by RIO[_]

You ca find some examples here:

Looking at the future

Expressing a dependency between a computation and a context is a very common thing to do. So common that Dotty (eventually Scala 3) has a built-in construct called Context Function Type to address this need. It introduces a new operator ?=>. Here is a snippet of what you you will be able to do in the future :

I expect this feature to gain a lot of traction for all the reasons we mentioned in this article. Is RIO dead ? Not quite:

Leveraging the new Dotty contextual function

Find more about it here: https://dotty.epfl.ch/docs/reference/contextual/context-functions.html

Summary

Kleisli is a powerful construct to express a dependency between a context and a computation. Since it preserves the properties of your F it is the perfect building block for your application. In this article we covered the log correlation issue but you might extend this example to add more “context” to your computations such as the current “user” executing the query… In the future, I expect the new Dotty feature “contextual function” to gain a lot of traction and to be a replacement such use case.

iAdvize Engineering

Publications from the iAdvize engineering team :)

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store