Making Word Crack Mix 2: don’t halt, don’t catch fire

Reporting and handling errors in a reactive application

Rodrigo Rearden
etermax technology

--

Select some data from an event source to a more fitting type, take only events Where some conditions are met, Zip this with that, Select again… and then SelectMany a final request from a service.

It fails miserably

The error propagates out of the SelectMany and everything fails in cascade: many subscriptions are lost, the application behaves erratically and becomes unresponsive, and the error logs are cryptic and hard to follow.

If you have ever written a single SelectMany in UniRx (or flatMap in any Rx library outside .NET) to request anything from a remote service (or from anything else that may actually fail), this has surely happened to you, just as it happened to us.

This article explains how we managed error propagation to prevent the app from breaking.

Our architecture setup

To recap, last time we showed that our domain is written with a reactive interface: it receives actions as Observers, and emits events as Observables. We used the actual gameplay and board as examples; now we will use our shop’s domain as the starting point. In Word Crack Mix 2 we can buy coins as an In App Purchase, and trade them for power ups, so our interface will reflect those operations as actions, and the changes performed by those operations will be reported as events.

Word Crack Mix 2 Shop’s Core Domain

One immediate consequence of working this way is that subscriptions between actions or events and their respective presenters should almost always be up, so that no events are lost.

Typical bindings in our application are declared in a context, like so:

Bindings between our Shop’s Core Domain and the presentation layer
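
To make the shape of this setup concrete, here is a minimal sketch of a reactive shop interface and a binding context. It is not the original listing: IShop, EchoShop, ShopPresenter, ShopContext and every member name are hypothetical, and the only assumption is that UniRx is available.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

// Hypothetical shop interface: actions are observers, events are observables.
public interface IShop
{
    IObserver<string> TradePowerUp { get; }     // action: trade coins for a power up
    IObservable<string> PowerUpTraded { get; }  // event: a power up was obtained
}

// Stand-in shop used only for this sketch: every trade succeeds immediately.
public class EchoShop : IShop
{
    readonly Subject<string> trades = new Subject<string>();
    public IObserver<string> TradePowerUp { get { return trades; } }
    public IObservable<string> PowerUpTraded { get { return trades; } }
}

// Hypothetical presenter: emits user intents and renders events.
public class ShopPresenter
{
    public readonly Subject<string> TradeClicked = new Subject<string>();
    public readonly List<string> Shown = new List<string>();
    public void ShowPowerUp(string powerUp) { Shown.Add(powerUp); }
}

public static class ShopContext
{
    // Declares every binding in one place; they are meant to stay up
    // for as long as the shop is in use.
    public static IDisposable Bind(IShop shop, ShopPresenter presenter)
    {
        return new CompositeDisposable(
            presenter.TradeClicked.Subscribe(powerUp => shop.TradePowerUp.OnNext(powerUp)),
            shop.PowerUpTraded.Subscribe(powerUp => presenter.ShowPowerUp(powerUp)));
    }
}
```

If any of these subscriptions were cancelled, the presenter would silently stop receiving events, which is why the bindings need to survive failures.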

Error behavior in Rx

In Rx, errors are typed as Exceptions and propagate downstream from the point where they are emitted to the subscriptions. Every operator or subscription that an error reaches is cancelled, and the whole pipeline is trashed.
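
The behavior described above can be reproduced in a few lines. This is a minimal sketch assuming UniRx; the class name, the subject and the fake request are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

public static class ErrorTerminatesPipeline
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var requests = new Subject<int>();

        requests
            // Fake request: fails for id 2, succeeds otherwise.
            .SelectMany(id => id == 2
                ? Observable.Throw<string>(new InvalidOperationException("request failed"))
                : Observable.Return("item " + id))
            .Subscribe(
                item => log.Add("received " + item),
                error => log.Add("error: " + error.Message));

        requests.OnNext(1); // reaches the subscriber
        requests.OnNext(2); // the error escapes SelectMany and ends the subscription
        requests.OnNext(3); // nobody is listening anymore
        return log;
    }
}
```

The third value is lost even though the request for it would have succeeded: once the error passed through, the subscription was gone.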

In our setup, this means that any error propagating from the observable events in our Shop to the methods in our presenters is actually a bug, and it has broken the core internally, cancelling every operator and subscription it went through.

So:

  • Our architecture demands that subscriptions are always up.
  • Rx unsubscribes each and every operator reached by any error.

This implies that errors must be either caught as soon as possible, or some kind of re-subscription policy must be used somewhere, which will be discussed further on in this text. In the next section, we show our particular approach to the issue by separating error reporting from error handling, and how and where to do both.

Reporting errors

How we built a primitive to report failures

Since errors must not reach the subscriptions, we can start by using UniRx’s DoOnError and CatchIgnore:

Our first iteration on error reporting

DoOnError lets us run a side effect when the pipeline fails. CatchIgnore then swallows the error, so the subscriber never receives it (although the sequence still completes at that point). DoOnError and CatchIgnore do not exist in most other Rx libraries, but both can be easily implemented using Catch. We may cover this in another article.
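
As an illustration of that combination, here is a minimal sketch assuming UniRx. The failing source and the log are hypothetical stand-ins; this is not the original listing.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

public static class FirstIteration
{
    public static List<string> Run()
    {
        var log = new List<string>();

        // Hypothetical request that always fails.
        Observable.Throw<string>(new TimeoutException("shop unreachable"))
            .DoOnError(error => log.Add("reported: " + error.Message)) // side effect only
            .CatchIgnore()                                             // swallow the error
            .Subscribe(
                item => log.Add("received " + item),
                error => log.Add("subscriber saw: " + error.Message),  // never reached here
                () => log.Add("completed"));

        return log;
    }
}
```

The subscriber sees a completion instead of an error, which is the "completing it anyway" part of the description.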

This solution is a bit clumsy and could provide better semantics. With this in mind, we refactor out an extension method:

The pipeline just finishes with an Empty

This is better, but we want to separate reporting from handling, so instead of a callback, we want to use an error bus. For this purpose, ReportErrors will receive an Observer instead. The motivation for this approach will become clearer later, in the section about handling errors.

Subscription example to a failing pipeline

First iteration of an abstract reporting primitive
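
One possible shape for such a primitive, as a minimal sketch assuming UniRx: the method is built on Catch, forwards the error to the bus, and finishes the pipeline with an Empty. The class name and the demo values are hypothetical, and the original implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

public static class ErrorReporting
{
    // Forwards any error to the bus and finishes the source with an empty sequence.
    public static IObservable<T> ReportErrors<T>(
        this IObservable<T> source, IObserver<Exception> errorBus)
    {
        return source.Catch<T, Exception>(error =>
        {
            errorBus.OnNext(error);
            return Observable.Empty<T>();
        });
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(error => log.Add("bus: " + error.Message));

        // Hypothetical failing pipeline: the subscriber gets no value and no error.
        Observable.Throw<int>(new InvalidOperationException("purchase rejected"))
            .ReportErrors(errorBus)
            .Subscribe(coins => log.Add("coins: " + coins));

        return log;
    }
}
```

Because the bus is just an observer, the reporting side needs to know nothing about who will eventually handle the error.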

With this basic setup we have now separated error reporting from error handling, and can now deal with where to report our errors.

Where to report failures

As said before, either errors are caught ASAP, or a re-subscription is used to rebuild the trashed pipeline. For Word Crack Mix 2 we chose the first solution, which suggests that errors must be reported in our actions just before the entry points to the service layers.

One important note is that it is convenient to have errors reported in actions, since they are almost always part of the use case that we implement and part of the language that our domain logic defines. This implies, of course, that errors must be considered in unit tests. Keeping in mind that our architecture demands subscriptions to be always up, those same tests should also expect successful events that can occur after an error was reported.

First, let us go back to the shop example and implement an action:

Implementation of a TradePowerUp action

Trade actions enter our domain through TradePowerUp. A repository checks if there are enough coins for the operation, then a request is sent to a shopService and after that the response is mapped to two different events.

Since the failing point in this action is the operation with the shopService, an error is reported, should it occur.

Report errors just after a request that may fail

The errorBus observer should be injected in the constructor as a dependency; in our game, all domains report to the same error bus.
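
The following minimal sketch, assuming UniRx, puts these pieces together. It is not the original action: the Shop class, the Func standing in for the shopService, and the single PowerUpTraded event are hypothetical simplifications, and the repository check for coins is left out. ReportErrors is repeated here only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

public static class Reporting
{
    public static IObservable<T> ReportErrors<T>(
        this IObservable<T> source, IObserver<Exception> errorBus)
    {
        return source.Catch<T, Exception>(error =>
        {
            errorBus.OnNext(error);
            return Observable.Empty<T>();
        });
    }
}

public class Shop
{
    readonly Subject<string> tradePowerUp = new Subject<string>();
    readonly Subject<string> powerUpTraded = new Subject<string>();

    public IObserver<string> TradePowerUp { get { return tradePowerUp; } }
    public IObservable<string> PowerUpTraded { get { return powerUpTraded; } }

    // The error bus is injected, as is the (hypothetical) service call.
    public Shop(Func<string, IObservable<string>> shopService, IObserver<Exception> errorBus)
    {
        tradePowerUp
            // Report inside SelectMany, right after the request that may fail,
            // so the error never reaches the outer pipeline.
            .SelectMany(powerUp => shopService(powerUp).ReportErrors(errorBus))
            .Subscribe(traded => powerUpTraded.OnNext(traded));
    }
}

public static class ShopDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(error => log.Add("error: " + error.Message));

        // Fake service: fails for one specific power up.
        var shop = new Shop(
            powerUp => powerUp == "bomb"
                ? Observable.Throw<string>(new TimeoutException("service unavailable"))
                : Observable.Return(powerUp),
            errorBus);
        shop.PowerUpTraded.Subscribe(powerUp => log.Add("traded: " + powerUp));

        shop.TradePowerUp.OnNext("hint");
        shop.TradePowerUp.OnNext("bomb"); // reported, pipeline survives
        shop.TradePowerUp.OnNext("hint"); // still works after the failure
        return log;
    }
}
```

The placement matters: only the inner request sequence is replaced by an Empty, so the action keeps accepting trades after a failure. A unit test for the action can assert exactly that sequence of events.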

Handling errors

How and where

Each error emitted through the bus will have its own type. We can use that to differentiate them and present the user with the correct information. To achieve this with rather simple semantics, we propose the following in the context that binds presenters with the actions and events:

Handling errors coming from the error bus

Handle “catches” an error that matches its type parameter and removes it from the stream; the action passed as a parameter handles it. Every other, unhandled error reaches the final Subscribe and is handled by the subscribed action. Handle<E> is rather simple to implement:

Implementation of an error handling primitive

The type parameter is the error type we want to match: errors matching E are forwarded to the errorHandler action, and the rest continue through the pipeline.
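
A minimal sketch of such a primitive, assuming UniRx and using Do and Where; the original implementation may be built differently. The class name and the demo errors are hypothetical, and standard .NET exception types stand in for domain errors.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

public static class ErrorHandling
{
    // Errors of type E (or a subclass) go to errorHandler; everything else flows on.
    public static IObservable<Exception> Handle<E>(
        this IObservable<Exception> errors, Action<E> errorHandler)
        where E : Exception
    {
        return errors
            .Do(error =>
            {
                var matched = error as E;
                if (matched != null) errorHandler(matched);
            })
            .Where(error => !(error is E));
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();

        errorBus
            .Handle<TimeoutException>(error => log.Add("retry later: " + error.Message))
            .Subscribe(error => log.Add("unexpected: " + error.Message)); // default case

        errorBus.OnNext(new TimeoutException("slow network"));
        errorBus.OnNext(new InvalidOperationException("bad state"));
        return log;
    }
}
```

Since each Handle returns the remaining errors, several calls can be chained, one per error type, with the final Subscribe acting as the default case.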

A final touch: errors as part of the language

So far so good, but the watchful eye may have noticed something weird in how we report errors: we’re basically forwarding any error raised in the service layers to a bus that talks with presenters. This is not ok.

The service layer defines the language (the interfaces) used to communicate between the domain (our business logic) and the infrastructure (persistent memory, native calls, REST services). So an error defined in these interfaces, or even an error defined in the infrastructure, should not be raised up to the presenters, since they should communicate with the use cases in a completely different language.

For example, when trading an item, our Shop’s presenter should not care whether an error coming from the infrastructure is an HTTP error, a connection error, or any other. The error should just be TradeFailed.

Mapping errors to the domain’s language

The callback maps an error coming from the service layer to an error defined in the domain. If a more granular discrimination between errors is needed, it should happen in the callback. The new implementation for ReportErrors just maps the error:

Include error mapping in the ReportErrors method
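
A minimal sketch of a mapping version, assuming UniRx. The toDomainError parameter name, the TradeFailed constructor and the demo values are hypothetical; only the TradeFailed name comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using UniRx;

// Domain error; its shape here is an assumption made for the sketch.
public class TradeFailed : Exception
{
    public TradeFailed(Exception cause) : base("trade failed", cause) { }
}

public static class MappedReporting
{
    // Maps the service-layer error to a domain error before reporting it.
    public static IObservable<T> ReportErrors<T>(
        this IObservable<T> source,
        IObserver<Exception> errorBus,
        Func<Exception, Exception> toDomainError)
    {
        return source.Catch<T, Exception>(error =>
        {
            errorBus.OnNext(toDomainError(error));
            return Observable.Empty<T>();
        });
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(error => log.Add(error.GetType().Name));

        // Hypothetical infrastructure failure surfacing from a trade request.
        Observable.Throw<string>(new TimeoutException("http timeout"))
            .ReportErrors(errorBus, error => new TradeFailed(error))
            .Subscribe(powerUp => log.Add("traded: " + powerUp));

        return log;
    }
}
```

Keeping the original exception as the inner exception preserves the infrastructure details for logging, while presenters only ever see the domain type.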

Wrapping up

In order to work with errors in our reactive application, we first separate reporting from handling.

We report all errors to an error bus, which is done in the actions immediately after making a request to a service. If the request fails, it won’t break the whole Rx pipeline, and everything will be ok for the next call. An error from the service layer can also be mapped to an error defined in the domain.

We handle errors arriving from the error bus in the presentation layer. There, we can choose the presentation strategy by the type of the error, and any unmatched errors are handled by a default case defined in the subscription.

That is all for now!
