Debugging basics in Project Reactor

Benedikt Jerat
Digital Frontiers — Das Blog
Feb 11, 2022 · 8 min read

Stack traces are undoubtedly a very useful instrument for tracing the processing flow in the event of an error. Error messages, the invocation hierarchy, root causes: not necessarily in a very compact format, but they provide valuable information for us developers. Sooner or later, however, you will notice that stack traces become much less useful in reactive applications. Stack traces of 200 or more lines are not uncommon. But more importantly, the stack trace often does not contain a single reference to our self-written methods, even though these methods have actually been called.

Hard to believe?

Then let me walk you through an example that demonstrates this behaviour, explains why this is the case, and shows what we can ultimately do to regain useful information about the invocation order.

But before we get started, the code examples can be found on our GitHub repository as well: https://github.com/dxfrontiers/coroutines-vs-reactive

The individual solutions are implemented as separate branches in the repository. Just have a look!

Stack traces straight from hell

In our example application, we have a typical three-layer architecture with a web layer, a service layer and a persistence layer.

I adjusted the call to the persistence, so that saving the entity fails deep in the persistence layer, producing a wonderfully realistic stack trace.

Imperative variant

Let’s first find out what the stack trace looks like, using a conventional blocking implementation with Spring MVC and Spring Data JPA.

When receiving a request to add a new character, we delegate this call to the CharacterService, which then persists the entity if it does not exist yet.
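
Sketched in Java, the blocking variant might look roughly like this. It is a minimal sketch based on the description above, not the exact code from the repository; the Character entity and its getters are assumed.

    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.stereotype.Service;
    import org.springframework.web.bind.annotation.*;

    interface CharacterRepository extends JpaRepository<Character, Long> { }

    @RestController
    @RequestMapping("/characters")
    class CharacterController {

        private final CharacterService characterService;

        CharacterController(CharacterService characterService) {
            this.characterService = characterService;
        }

        // delegate the incoming request to the service layer
        @PostMapping
        Character addCharacter(@RequestBody Character character) {
            return characterService.addCharacter(character);
        }
    }

    @Service
    class CharacterService {

        private final CharacterRepository characterRepository;

        CharacterService(CharacterRepository characterRepository) {
            this.characterRepository = characterRepository;
        }

        // persist the character only if it does not exist yet;
        // the save() call fails because the referenced house id does not exist
        Character addCharacter(Character character) {
            return characterRepository.findById(character.getId())
                    .orElseGet(() -> characterRepository.save(character));
        }
    }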

The call to the CharacterRepository.save method fails because the referenced house id does not exist, giving us a nice exception:

It may take a bit of scrolling, but sooner or later we stumble over our method calls. I have shortened the stack trace to the relevant lines, as even the imperative variant produces a stack trace of a good 100 lines in this example. Of course, the error message already contains enough information for us to detect the problem. But we cannot always rely on that.

What about the reactive implementation?

Reactive variant

Here, we use Spring WebFlux for the web part and Spring Data R2DBC for the persistence, making our application fully reactive down to the persistence layer.

The procedure is roughly the same. The controller delegates the call to the service layer:
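
(A minimal sketch in Java of how the reactive controller and service might look; the names, the exact operator chain, and the CharacterNotFoundException handling are assumptions based on the descriptions in this post, not the code from the repository.)

    import org.springframework.data.repository.reactive.ReactiveCrudRepository;
    import org.springframework.http.ResponseEntity;
    import org.springframework.stereotype.Service;
    import org.springframework.web.bind.annotation.*;
    import reactor.core.publisher.Mono;

    interface CharacterRepository extends ReactiveCrudRepository<Character, Long> { }

    @RestController
    @RequestMapping("/characters")
    class CharacterController {

        private final CharacterService characterService;

        CharacterController(CharacterService characterService) {
            this.characterService = characterService;
        }

        @PostMapping
        Mono<ResponseEntity<Character>> addCharacter(@RequestBody Character character) {
            return characterService.addCharacter(character)
                    .map(ResponseEntity::ok)
                    // only a CharacterNotFoundException is handled here
                    .onErrorResume(CharacterNotFoundException.class,
                            e -> Mono.just(ResponseEntity.notFound().build()));
        }
    }

    @Service
    class CharacterService {

        private final CharacterRepository characterRepository;

        CharacterService(CharacterRepository characterRepository) {
            this.characterRepository = characterRepository;
        }

        // persist the character only if it does not exist yet;
        // the save() call fails at runtime because the referenced house id does not exist
        Mono<Character> addCharacter(Character character) {
            return characterRepository.findById(character.getId())
                    .switchIfEmpty(Mono.defer(() -> characterRepository.save(character)));
        }
    }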

The service then tries to persist the entity and fails while doing so. The generated stack trace is a whopping 166 lines long. I have tried to filter out the relevant elements, but that does not really help much either:

We see a FluxSwitchIfEmpty, a MonoFlatMap and a FluxOnErrorResume at the top, okay… but these are just some operators somewhere in the reactive chain. In a very small code base, we might now be able to narrow the error down to a specific method call. In a larger code base, this is likely to be impossible.

And more importantly, what we do not find in the entire stack trace is any of our method calls that could lead us to the failed call to the save method. They simply evaporated into the void.

But why is that the case? Why are the stack traces in reactive chains so bloated and hard to analyze? To answer this question, let me introduce you to the three phases of the reactive pipeline:

  • Assembly phase
  • Subscription phase
  • Execution phase

Reactive programming strictly separates the declaration and the execution of our code. When we write down our reactive pipeline, we define which steps to execute, not when to execute them. This is called the assembly phase. We chain the individual calls, and at the end we are left with a publisher (Mono or Flux) to which we can subscribe. As long as we don’t do that, nothing will happen; you can try that out by omitting the “subscribe” at the end of the reactive chain. Only by subscribing to the reactive pipeline is an execution context provided, and schedulers take care of the next steps behind the scenes.

The subscription is then propagated backwards from the publisher up to the source, and thus opposite to the direction in which we implemented the chain. During this period we are in the subscription phase, and the reactive pipeline is evaluated.

Once we have reached the source, it is signaled that we are now ready to process data. With this we launch the execution phase. Data now flows downstream through the pipeline from the source to the subscriber(s).
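
To make the three phases more tangible, here is a small, self-contained Java sketch (not taken from the example application):

    import reactor.core.publisher.Flux;

    public class PhasesDemo {

        public static void main(String[] args) {
            // assembly phase: we only declare what should happen, nothing is executed yet
            Flux<Integer> pipeline = Flux.just(1, 2, 3)
                    .map(i -> i * 10)
                    .doOnNext(i -> System.out.println("execution: " + i));

            // subscription phase: subscribe() propagates the subscription upstream to the source
            // execution phase: data flows downstream and the doOnNext side effect is executed
            pipeline.subscribe();
        }
    }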

And this is where the problem arises. Stack traces are a representation of the runtime state. In asynchronous programming, this does not necessarily correspond to the order in which we originally wrote the code. This piece of information is solely available at assembly-time. Fortunately, the Reactor framework gives us some options to compensate for this mismatch. In the following sections, I will demonstrate these in short examples.

Printf Debugging

The most effective debugging tool is still careful thought, coupled with judiciously placed print statements.

Back to the basics! Brian Kernighan wrote this in his paper “UNIX for Beginners” in 1979. The following technique is probably the closest to what many know as “printf debugging”.

The Reactor Framework offers two basic variants for this:

  • log()
  • doOnError()

log()

By adding log() to the reactive pipeline, all occurring reactive signals are logged. We can add a category parameter as an (ideally unique) label for the call, so that we can search for this label in the logger output:
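
(A sketch, building on the controller sketched above; the category string is an arbitrary label of our choosing.)

    @PostMapping
    Mono<ResponseEntity<Character>> addCharacter(@RequestBody Character character) {
        return characterService.addCharacter(character)
                // logs all reactive signals of this chain under the given category
                .log("CharacterController.addCharacter")
                .map(ResponseEntity::ok);
    }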

This will not change much about the stack trace. However, we receive additional log messages related to the logged reactive signals:

The first three lines are new. They show us that the reactive chain has sent a total of three signals: the initial subscription (onSubscribe), a request to start data processing (request), and an error termination signal (onError).

Not really helpful on its own though. But at least we know which of our methods was called and which signals the framework sent. And the solution adds almost no performance overhead compared to some of the approaches presented later.

doOnError()

This method can be added to the reactive pipeline at specific points where we expect calls to potentially fail. In our case, this is likely to be the call to the service layer:
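
(A sketch; it assumes an SLF4J logger named log and getters on the Character entity.)

    @PostMapping
    Mono<ResponseEntity<Character>> addCharacter(@RequestBody Character character) {
        return characterService.addCharacter(character)
                // side effect only: log the request parameters when an error is signaled
                .doOnError(e -> log.error("Adding character {} of house {} failed",
                        character.getName(), character.getHouseId(), e))
                .map(ResponseEntity::ok);
    }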

Unlike operators such as onErrorResume, doOnError does not handle the exception; it just adds a side effect that is executed whenever an error is signaled:

Here, we gain some additional information about the given parameters to the REST call, but the possibilities are also relatively limited.

Hello operator, please give me number nine.

Up to this point, we have worked exclusively with operators that add additional information to the log output, but leave the stack trace untouched. What we actually want is the information we had at assembly-time to bridge the gap between execution and assembly phase.

onOperatorDebug()

This option allows us to activate the so-called “debug mode” by calling Hooks.onOperatorDebug(). Preferably, this is done once during application initialization, as early as possible. In a Spring Boot application, this can be accomplished by calling it in the main method, even before bootstrapping the Spring application:
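
(A sketch; the application class name is an assumption.)

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import reactor.core.publisher.Hooks;

    @SpringBootApplication
    public class Application {

        public static void main(String[] args) {
            // enable the Reactor debug mode before the Spring context is bootstrapped
            Hooks.onOperatorDebug();
            SpringApplication.run(Application.class, args);
        }
    }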

This will activate the Reactor debug mode for the whole application. It enables a stack recorder that captures each and every interaction with the reactive operators. Additionally, it recovers an in-order stack trace by also capturing the declaration stack when an operator is instantiated. This is immense.

If we add the operator hook, we will receive the following additional output besides the original stack trace and error message:

As we can see, the entire execution flow has been reconstructed. We see the call to the persistence in line 9 and the call to our service in line 10. The operator switchIfEmpty in line 11 was passed, and we did not catch and handle the exception in onErrorResume in line 12, because we only handle the CharacterNotFoundException there. Therefore, the exception is propagated upwards until it is handled by Spring WebFlux directly in the ExceptionHandlingWebHandler.

The whole thing is actually starting to look like a useful stack trace again. However, despite being the Holy Hand Grenade in our debugging arsenal, the performance impact of this operator should not be underestimated. It requires capturing the stack trace on every operator call, which adds up very fast. In production, you should therefore opt for one of the more lightweight alternatives presented below.

checkpoint()

In principle, onOperatorDebug() and checkpoint() are the same thing, except that checkpoint() must be explicitly added to the pipelines of interest. For example, if we expect the addCharacter method to fail sometimes, we might want to add a checkpoint at the end, so that any exceptions triggered upstream will be reflected in the assembly trace:
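
(A sketch, again based on the controller from above; the description string is an arbitrary text of our choosing.)

    @PostMapping
    Mono<ResponseEntity<Character>> addCharacter(@RequestBody Character character) {
        return characterService.addCharacter(character)
                .map(ResponseEntity::ok)
                // adds this description to the assembly trace of any error signaled upstream
                .checkpoint("CharacterController.addCharacter");
    }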

A call to the checkpoint operator has been added at the end of the reactive chain. This will add the given description to the assembly trace, potentially helping the dev team to find out where the error occurred:

Actually, we see four checkpoints in the logs here. The reason for this is that Spring WebFlux and Spring Data R2DBC already add checkpoints at different levels of processing to enrich the error message.

The advantage of checkpoint() over onOperatorDebug is the almost non-existent performance overhead. However, we have to add the checkpoints to the reactive chain ourselves and the information gain is also quite limited.

You hear that? That is the sound of inevitability!

Let’s move on to the final tool for debugging in this blog post.

Did I say Holy Hand Grenade earlier? The ReactorDebugAgent is at least on par, just without the performance overhead of capturing stack traces on each operator call. For this purpose, the ReactorDebugAgent uses call-site bytecode instrumentation, enriching the bytecode with additional call-site information.

For example, the following code snippet:
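
(An illustrative snippet, not the exact code from the example application.)

    Flux.just(1, 2, 3)
        .map(i -> i + 10);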

turns into:
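
(A sketch of the instrumented form, following the transformation outlined in the reactor-tools documentation; the call-site strings are placeholders for what the agent generates.)

    Flux<Integer> flux = Flux.just(1, 2, 3);
    flux = Hooks.addCallSiteInfo(flux, "Flux.just\n at com.example.DemoService.process(DemoService.java:21)");
    flux = flux.map(i -> i + 10);
    flux = Hooks.addCallSiteInfo(flux, "Flux.map\n at com.example.DemoService.process(DemoService.java:22)");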

Since capturing stack traces is completely omitted here, this approach is relatively lightweight. All you need to do is add the following dependency to your build:
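
The agent is shipped in the reactor-tools artifact. Shown here for Maven (the version should be managed by the Spring Boot dependency management via the reactor-bom; for Gradle the coordinates are the same):

    <dependency>
        <groupId>io.projectreactor</groupId>
        <artifactId>reactor-tools</artifactId>
    </dependency>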

And again initialize the ReactorDebugAgent as early as possible, for example in the main method:
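
(A sketch, analogous to the Hooks example above; the application class name is an assumption.)

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import reactor.tools.agent.ReactorDebugAgent;

    @SpringBootApplication
    public class Application {

        public static void main(String[] args) {
            // instrument the reactive operators before the Spring context is bootstrapped
            ReactorDebugAgent.init();
            SpringApplication.run(Application.class, args);
        }
    }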

The agent can also be added as a Java agent via the VM parameters or integrated into the build as a transformation plugin step. Have a look at the official documentation for the details!

Now that we added the ReactorDebugAgent, we will receive the following output for our example:

As before with Hooks.onOperatorDebug, we can follow the execution flow very well, except this time we see even more intermediate steps. Just check it out yourselves!

I hope I was able to give you a rough overview of the various options that make debugging in the reactive world more pleasant again.

The operators log and doOnError are useful for logging specifically selected execution points. They cannot reconstruct the data flow, but rather enrich the logs by printing invocation parameters, intermediate data transformations, or other valuable information. ReactorDebugAgent, operator debug mode via Hooks.onOperatorDebug and checkpoints actually enrich the stack trace itself with varying degrees of information. The ReactorDebugAgent and the checkpoint operator are certainly suitable to be used without hesitation, even in a production environment. Hooks.onOperatorDebug should only be used during the active development phase due to the increased resource consumption for capturing the declaration stack and all the interactions with the reactive operators.

Ultimately, none of these methods are useless, even if some are more beneficial than others. However, it is always important to know what options you have. Eventually, in one form or another, you will use each of them in the course of time in your reactive application.

Thanks for reading! Feel free to comment or message me if you have questions or suggestions. You might be interested in the other posts published in the Digital Frontiers blog, announced on our Twitter account.
