Making Word Crack Mix 2: don’t halt, don’t catch fire
Reporting and handling errors in a reactive application
Select some data from an event source into a more fitting type, take only the events Where some conditions are met, Zip this with that, Select again… and then SelectMany a final request from a service.
It fails miserably.
The error propagates out of the SelectMany and everything fails in a cascade: a lot of subscriptions are lost, the application behaves awkwardly and becomes unresponsive, and the error logs are cryptic and hard to follow.
If you have ever written a single SelectMany in UniRx (or flatMap in an Rx library outside .NET) to request anything from a remote service (or to do anything else that may actually fail), this has surely happened to you, as it happened to us.
This article explains how we managed error propagation to prevent the app from breaking.
Our architecture setup
To recap, last time we showed that our domain is written with a reactive interface: it receives actions as Observers and emits events as Observables. We used the actual gameplay and board as examples; now we will use our shop’s domain as the starting point. In Word Crack Mix 2 coins can be bought as an in-app purchase and traded for power-ups, so our interface reflects those operations as actions, and the changes they perform are reported as events.
An immediate consequence of working this way is that the subscriptions between actions or events and their respective presenters should almost always stay up, so that no events are lost.
Typical bindings in our application are declared in a context, like so:
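As a rough illustration only, here is a minimal sketch of such a context. Shop, ShopPresenter, ShopContext, BuyCoins, CoinsChanged and PurchaseClicks are hypothetical names, not the game's actual classes. To keep the sketch runnable without UniRx, it uses the BCL's IObservable<T>/IObserver<T> plus tiny stand-ins (Subject, AnonymousObserver, NoopDisposable) for the library types. The domain exposes an action as an observer and an event as an observable, and the context merely subscribes one side to the other.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx types, so the sketch compiles without it ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext;
    public AnonymousObserver(Action<T> onNext) { this.onNext = onNext; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => throw error; // an error reaching a binding is a bug
    public void OnCompleted() { }
}

// Hot observable that is also an observer; unsubscription is omitted for brevity.
sealed class Subject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();
    public IDisposable Subscribe(IObserver<T> o) { observers.Add(o); return NoopDisposable.Instance; }
    public void OnNext(T value) { foreach (var o in observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception e) { foreach (var o in observers.ToArray()) o.OnError(e); observers.Clear(); }
    public void OnCompleted() { foreach (var o in observers.ToArray()) o.OnCompleted(); observers.Clear(); }
}

// --- Hypothetical domain: actions are observers, events are observables ---
sealed class Shop
{
    private readonly Subject<int> buyCoins = new Subject<int>();
    private readonly Subject<int> coinsChanged = new Subject<int>();
    private int coins;

    public Shop()
    {
        // Each purchase updates the balance and reports the new value as an event.
        buyCoins.Subscribe(new AnonymousObserver<int>(amount =>
        {
            coins += amount;
            coinsChanged.OnNext(coins);
        }));
    }

    public IObserver<int> BuyCoins => buyCoins;           // action
    public IObservable<int> CoinsChanged => coinsChanged; // event
}

// --- Hypothetical presenter ---
sealed class ShopPresenter
{
    public Subject<int> PurchaseClicks { get; } = new Subject<int>();
    public List<string> Shown { get; } = new List<string>();
    public void ShowCoins(int amount) => Shown.Add("Coins: " + amount);
}

// The context only wires presenter output to actions, and events to presenter input.
static class ShopContext
{
    public static void Bind(Shop shop, ShopPresenter presenter)
    {
        presenter.PurchaseClicks.Subscribe(shop.BuyCoins);
        shop.CoinsChanged.Subscribe(new AnonymousObserver<int>(presenter.ShowCoins));
    }
}
```

Pushing 100 and then 50 through PurchaseClicks should leave the presenter showing "Coins: 100" and then "Coins: 150". Nothing in this context deals with errors, which is why an error that reaches one of these long-lived subscriptions does so much damage.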
Error behavior in Rx
In Rx, errors are typed as Exceptions and travel from the point where they are emitted, through every downstream operator, to the subscriptions. Every operator and subscription an error reaches is terminated, and the whole pipeline is trashed.
In our setup, this means that any error propagating from the observable events in our Shop to the methods in our presenters is actually a bug: it has broken the core internally, cancelling every operator and subscription it went through.
So:
- Our architecture demands that subscriptions are always up.
- Rx unsubscribes each and every operator reached by any error.
This implies that errors must either be caught as soon as possible, or some kind of re-subscription policy must be applied somewhere; we come back to this choice later on. In the next section, we show our particular approach to the issue: separating error reporting from error handling, and deciding how and where to do each.
Reporting errors
How we built a primitive to report failures
Since errors must not reach the subscriptions, we can start by using UniRx’s DoOnError and CatchIgnore:
DoOnError lets us do something when the pipe fails; then CatchIgnore ignores all errors, preventing the subscription from being cancelled by an error (although the sequence still completes). DoOnError and CatchIgnore are not available under these names in every Rx library, but both can easily be implemented using Catch. We may cover this in another article.
This solution is a bit clumsy, and its intent is not obvious at the call site. With this in mind, we extract an extension method:
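A minimal sketch of that extension method, assuming a callback signature ReportErrors(Action<Exception>). It also builds DoOnError and CatchIgnore on top of Catch, as suggested above. These operators are simplified stand-ins named after the UniRx ones, not UniRx's own code, and they run on the BCL's IObservable<T>/IObserver<T> so that no library is needed.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx types, so the sketch compiles without it ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext; private readonly Action<Exception> onError; private readonly Action onCompleted;
    public AnonymousObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    { this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => onError(error);
    public void OnCompleted() => onCompleted();
}

sealed class AnonymousObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public AnonymousObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
}

static class ErrorOperators
{
    public static IObservable<T> Throw<T>(Exception error) =>
        new AnonymousObservable<T>(o => { o.OnError(error); return NoopDisposable.Instance; });

    public static IObservable<T> Empty<T>() =>
        new AnonymousObservable<T>(o => { o.OnCompleted(); return NoopDisposable.Instance; });

    // Catch: on an error of type TException, continue with the handler's observable.
    // Simplified: the handler's subscription is not tracked for disposal.
    public static IObservable<T> Catch<T, TException>(
        this IObservable<T> source, Func<TException, IObservable<T>> handler) where TException : Exception =>
        new AnonymousObservable<T>(o => source.Subscribe(new AnonymousObserver<T>(
            o.OnNext,
            error =>
            {
                if (error is TException typed) handler(typed).Subscribe(o);
                else o.OnError(error);
            },
            o.OnCompleted)));

    // DoOnError: run a side effect, then re-emit the same error downstream.
    public static IObservable<T> DoOnError<T>(this IObservable<T> source, Action<Exception> onError) =>
        source.Catch<T, Exception>(error => { onError(error); return Throw<T>(error); });

    // CatchIgnore: replace any error with completion.
    public static IObservable<T> CatchIgnore<T>(this IObservable<T> source) =>
        source.Catch<T, Exception>(_ => Empty<T>());

    // The extracted extension method: report through a callback, then swallow the error.
    public static IObservable<T> ReportErrors<T>(this IObservable<T> source, Action<Exception> report) =>
        source.DoOnError(report).CatchIgnore();
}

static class Demo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        // Stand-in for a service request that emits one value and then fails.
        var request = new AnonymousObservable<int>(o =>
        {
            o.OnNext(1);
            o.OnError(new TimeoutException("service timed out"));
            return NoopDisposable.Instance;
        });
        request
            .ReportErrors(error => log.Add("reported: " + error.Message))
            .Subscribe(new AnonymousObserver<int>(
                value => log.Add("value: " + value),
                error => log.Add("error reached the subscriber"),
                () => log.Add("completed")));
        return log;
    }
}
```

Run() should log "value: 1", "reported: service timed out" and "completed": the subscriber never sees the error, but the sequence still completes. That is why ReportErrors is meant to wrap an individual request rather than a long-lived event stream.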
This is better, but we want to separate reporting from handling, so instead of a callback we want to use an error bus. For this purpose, ReportErrors receives an Observer instead. The motivation for this approach will become clearer later, in the section about handling errors.
With this basic setup we have separated error reporting from error handling, and can now deal with where to report our errors.
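A sketch of the observer-based variant, again on the BCL interfaces with simplified stand-ins for the UniRx types. For brevity, DoOnError(errorBus.OnNext).CatchIgnore() is folded into a single operator here. The detail this sketch relies on is that the error enters the bus through OnNext rather than OnError, so the bus itself never terminates and can keep collecting errors from every request.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx types, so the sketch compiles without it ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext; private readonly Action<Exception> onError; private readonly Action onCompleted;
    public AnonymousObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    { this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => onError(error);
    public void OnCompleted() => onCompleted();
}

sealed class AnonymousObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public AnonymousObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
}

sealed class Subject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();
    public IDisposable Subscribe(IObserver<T> o) { observers.Add(o); return NoopDisposable.Instance; }
    public void OnNext(T value) { foreach (var o in observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception e) { foreach (var o in observers.ToArray()) o.OnError(e); observers.Clear(); }
    public void OnCompleted() { foreach (var o in observers.ToArray()) o.OnCompleted(); observers.Clear(); }
}

static class ErrorReporting
{
    // Behaves like source.DoOnError(errorBus.OnNext).CatchIgnore(), folded into one operator.
    // The error reaches the bus as a value (OnNext), so the bus itself never terminates.
    public static IObservable<T> ReportErrors<T>(this IObservable<T> source, IObserver<Exception> errorBus) =>
        new AnonymousObservable<T>(o => source.Subscribe(new AnonymousObserver<T>(
            o.OnNext,
            error => { errorBus.OnNext(error); o.OnCompleted(); },
            o.OnCompleted)));
}

static class Demo
{
    // Stand-in for a service request that always fails.
    static IObservable<string> FailingRequest(string message) =>
        new AnonymousObservable<string>(o => { o.OnError(new Exception(message)); return NoopDisposable.Instance; });

    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(new AnonymousObserver<Exception>(
            e => log.Add("bus: " + e.Message), e => log.Add("bus failed"), () => log.Add("bus completed")));

        foreach (var message in new[] { "request 1 failed", "request 2 failed" })
        {
            FailingRequest(message)
                .ReportErrors(errorBus)
                .Subscribe(new AnonymousObserver<string>(
                    _ => { }, e => log.Add("error reached the subscriber"), () => log.Add("completed")));
        }
        return log;
    }
}
```

Both failing requests end up on the same bus, each reporting stream completes, and the bus subscriber never receives OnError or OnCompleted.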
Where to report failures
As said before, errors are either caught ASAP, or a re-subscription is used to rebuild the trashed pipeline. For Word Crack Mix 2 we chose the first option, which means that errors must be reported in our actions, just before the entry points to the service layers.
It is convenient to have errors reported in actions, since errors are almost always part of the use case we implement and part of the language our domain logic defines. This implies, of course, that errors must be covered by unit tests. And since our architecture demands that subscriptions are always up, those same tests should also expect successful events after an error has been reported.
Let us go back to the shop example and implement an action:
Trade actions enter our domain through TradePowerUp. A repository checks whether there are enough coins for the operation; then a request is sent to a shopService, and the response is mapped to two different events.
Since the operation with the shopService is the point where this action can fail, that is where the error is reported, should one occur.
The errorBus observer should be injected in the constructor as a dependency; in our game, all domains report to the same error bus.
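A sketch of how TradePowerUp could be wired, under several assumptions: the repository is reduced to a hypothetical hasEnoughCoins predicate, the service layer to a hypothetical shopService function returning IObservable<TradeResponse>, and TradeResponse, CoinsChanged and PowerUpsChanged are made-up names. Where, SelectMany and ReportErrors are simplified stand-ins for the UniRx operators, so the block runs on the BCL alone.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx plumbing (simplified: nothing is really disposed) ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}
sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext; private readonly Action<Exception> onError; private readonly Action onCompleted;
    public AnonymousObserver(Action<T> onNext) : this(onNext, e => throw e, () => { }) { }
    public AnonymousObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    { this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => onError(error);
    public void OnCompleted() => onCompleted();
}
sealed class AnonymousObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public AnonymousObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
}
sealed class Subject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();
    public IDisposable Subscribe(IObserver<T> o) { observers.Add(o); return NoopDisposable.Instance; }
    public void OnNext(T value) { foreach (var o in observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception e) { foreach (var o in observers.ToArray()) o.OnError(e); observers.Clear(); }
    public void OnCompleted() { foreach (var o in observers.ToArray()) o.OnCompleted(); observers.Clear(); }
}
static class Observables
{
    public static IObservable<T> Return<T>(T value) =>
        new AnonymousObservable<T>(o => { o.OnNext(value); o.OnCompleted(); return NoopDisposable.Instance; });
    public static IObservable<T> Throw<T>(Exception error) =>
        new AnonymousObservable<T>(o => { o.OnError(error); return NoopDisposable.Instance; });
    public static IObservable<T> Where<T>(this IObservable<T> source, Func<T, bool> predicate) =>
        new AnonymousObservable<T>(o => source.Subscribe(new AnonymousObserver<T>(
            v => { if (predicate(v)) o.OnNext(v); }, o.OnError, o.OnCompleted)));
    // Simplified SelectMany: inner completions are ignored, inner subscriptions are not tracked.
    public static IObservable<R> SelectMany<T, R>(this IObservable<T> source, Func<T, IObservable<R>> selector) =>
        new AnonymousObservable<R>(o => source.Subscribe(new AnonymousObserver<T>(
            v => selector(v).Subscribe(new AnonymousObserver<R>(o.OnNext, o.OnError, () => { })),
            o.OnError, o.OnCompleted)));
    // Reports the error to the bus, then completes instead of failing downstream.
    public static IObservable<T> ReportErrors<T>(this IObservable<T> source, IObserver<Exception> errorBus) =>
        new AnonymousObservable<T>(o => source.Subscribe(new AnonymousObserver<T>(
            o.OnNext, e => { errorBus.OnNext(e); o.OnCompleted(); }, o.OnCompleted)));
}

// --- Hypothetical shop domain ---
sealed class TradeResponse { public int Coins; public int PowerUps; }
sealed class Shop
{
    private readonly Subject<string> tradePowerUp = new Subject<string>();
    private readonly Subject<int> coinsChanged = new Subject<int>();
    private readonly Subject<int> powerUpsChanged = new Subject<int>();
    // hasEnoughCoins stands in for the repository, shopService for the service layer.
    public Shop(Func<string, bool> hasEnoughCoins, Func<string, IObservable<TradeResponse>> shopService,
                IObserver<Exception> errorBus)
    {
        tradePowerUp
            .Where(hasEnoughCoins)
            // Report on the inner request only, so this outer subscription stays up.
            .SelectMany(powerUp => shopService(powerUp).ReportErrors(errorBus))
            .Subscribe(new AnonymousObserver<TradeResponse>(response =>
            {
                coinsChanged.OnNext(response.Coins);       // first event
                powerUpsChanged.OnNext(response.PowerUps); // second event
            }));
    }
    public IObserver<string> TradePowerUp => tradePowerUp;
    public IObservable<int> CoinsChanged => coinsChanged;
    public IObservable<int> PowerUpsChanged => powerUpsChanged;
}
static class Demo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(new AnonymousObserver<Exception>(e => log.Add("error: " + e.Message)));
        var calls = 0;
        // Fake service: the first call times out, later calls succeed.
        Func<string, IObservable<TradeResponse>> shopService = powerUp => ++calls == 1
            ? Observables.Throw<TradeResponse>(new TimeoutException("timeout"))
            : Observables.Return(new TradeResponse { Coins = 90, PowerUps = 1 });
        var shop = new Shop(_ => true, shopService, errorBus);
        shop.CoinsChanged.Subscribe(new AnonymousObserver<int>(c => log.Add("coins: " + c)));
        shop.TradePowerUp.OnNext("hammer"); // fails and is reported on the bus
        shop.TradePowerUp.OnNext("hammer"); // the action still works afterwards
        return log;
    }
}
```

In Run(), the first trade only produces "error: timeout" on the bus, and the second trade still yields "coins: 90". That second event is the kind of success after a failure that the unit tests described earlier should expect. ReportErrors sits inside SelectMany on purpose: applied to the outer stream instead, the first failure would complete TradePowerUp for good.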
Handling errors
How and where
Each error emitted through the bus has its own type. We can use that to tell errors apart and present the user with the right information. To achieve this with fairly simple semantics, we propose the following in the context that binds presenters with actions and events:
Handle “catches” an error that matches its type parameter, so that it does not travel any further down the pipeline, and its action parameter then handles it. Every other, unhandled error reaches the Subscribe part and is handled by the subscribed action. Handle<E> is rather simple to implement:
The type parameter is the error type we want to match: errors matching E are forwarded to the errorHandler action, and the rest continue through the pipeline.
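A minimal sketch of Handle<E> and of its use in a binding context. TradeFailed is the domain error named later in this article; PurchaseFailed and the dialog strings are hypothetical. Handle is written against IObservable<Exception>, assuming the bus carries plain Exception values, and the observable plumbing is a simplified, library-free stand-in for UniRx.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx types, so the sketch compiles without it ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext; private readonly Action<Exception> onError; private readonly Action onCompleted;
    public AnonymousObserver(Action<T> onNext) : this(onNext, e => throw e, () => { }) { }
    public AnonymousObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    { this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => onError(error);
    public void OnCompleted() => onCompleted();
}

sealed class AnonymousObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public AnonymousObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
}

sealed class Subject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();
    public IDisposable Subscribe(IObserver<T> o) { observers.Add(o); return NoopDisposable.Instance; }
    public void OnNext(T value) { foreach (var o in observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception e) { foreach (var o in observers.ToArray()) o.OnError(e); observers.Clear(); }
    public void OnCompleted() { foreach (var o in observers.ToArray()) o.OnCompleted(); observers.Clear(); }
}

// --- Domain errors (PurchaseFailed is hypothetical) ---
sealed class TradeFailed : Exception { }
sealed class PurchaseFailed : Exception { }

static class ErrorHandling
{
    // Errors matching E are passed to errorHandler and go no further;
    // every other error continues through the pipeline.
    public static IObservable<Exception> Handle<E>(this IObservable<Exception> errors, Action<E> errorHandler)
        where E : Exception =>
        new AnonymousObservable<Exception>(o => errors.Subscribe(new AnonymousObserver<Exception>(
            error =>
            {
                if (error is E matched) errorHandler(matched);
                else o.OnNext(error);
            },
            o.OnError,
            o.OnCompleted)));
}

static class Demo
{
    public static List<string> Run()
    {
        var shown = new List<string>();
        var errorBus = new Subject<Exception>();

        // Binding context: specific handlers first, a default case in Subscribe.
        errorBus
            .Handle<TradeFailed>(e => shown.Add("trade failed dialog"))
            .Handle<PurchaseFailed>(e => shown.Add("purchase failed dialog"))
            .Subscribe(new AnonymousObserver<Exception>(e => shown.Add("generic error: " + e.GetType().Name)));

        errorBus.OnNext(new TradeFailed());
        errorBus.OnNext(new PurchaseFailed());
        errorBus.OnNext(new TimeoutException());
        return shown;
    }
}
```

Each error is consumed by the first Handle whose type fits; the TimeoutException matches neither, so it falls through to the default case in Subscribe. Order therefore matters: a Handle for a base type placed first would also swallow its subclasses.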
A final touch: errors as part of the language
So far so good, but the watchful eye may have noticed something odd about how we report errors: we are basically forwarding any error raised in the service layers to a bus that talks to presenters. This is not OK.
The service layer defines the language (the interfaces) used to communicate between the domain (our business logic) and the infrastructure (persistent memory, native calls, REST services). So an error defined in these interfaces, or even one defined in the infrastructure, should not be raised up to the presenters, since presenters communicate with the use cases in a completely different language.
For example, when trading an item, our Shop’s presenter should not care whether an error coming from the infrastructure is an HTTP error, a connection error or anything else; the error should just be TradeFailed.
The callback maps an error coming from the service layer to an error defined in the domain; if finer-grained discrimination between errors is needed, it should happen in that callback. The new implementation of ReportErrors just maps the error before reporting it:
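A sketch of the mapping variant, assuming a hypothetical toDomainError parameter of type Func<Exception, Exception> and a TradeFailed exception that keeps the infrastructure error as its InnerException. The plumbing types are simplified, library-free stand-ins for UniRx's.

```csharp
using System;
using System.Collections.Generic;

// --- Minimal stand-ins for UniRx types, so the sketch compiles without it ---
sealed class NoopDisposable : IDisposable
{
    public static readonly NoopDisposable Instance = new NoopDisposable();
    public void Dispose() { }
}

sealed class AnonymousObserver<T> : IObserver<T>
{
    private readonly Action<T> onNext; private readonly Action<Exception> onError; private readonly Action onCompleted;
    public AnonymousObserver(Action<T> onNext) : this(onNext, e => throw e, () => { }) { }
    public AnonymousObserver(Action<T> onNext, Action<Exception> onError, Action onCompleted)
    { this.onNext = onNext; this.onError = onError; this.onCompleted = onCompleted; }
    public void OnNext(T value) => onNext(value);
    public void OnError(Exception error) => onError(error);
    public void OnCompleted() => onCompleted();
}

sealed class AnonymousObservable<T> : IObservable<T>
{
    private readonly Func<IObserver<T>, IDisposable> subscribe;
    public AnonymousObservable(Func<IObserver<T>, IDisposable> subscribe) { this.subscribe = subscribe; }
    public IDisposable Subscribe(IObserver<T> observer) => subscribe(observer);
}

sealed class Subject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> observers = new List<IObserver<T>>();
    public IDisposable Subscribe(IObserver<T> o) { observers.Add(o); return NoopDisposable.Instance; }
    public void OnNext(T value) { foreach (var o in observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception e) { foreach (var o in observers.ToArray()) o.OnError(e); observers.Clear(); }
    public void OnCompleted() { foreach (var o in observers.ToArray()) o.OnCompleted(); observers.Clear(); }
}

// Hypothetical domain error that keeps the infrastructure error as its cause.
sealed class TradeFailed : Exception
{
    public TradeFailed(Exception cause) : base("Trade failed", cause) { }
}

static class ErrorReporting
{
    public static IObservable<T> Throw<T>(Exception error) =>
        new AnonymousObservable<T>(o => { o.OnError(error); return NoopDisposable.Instance; });

    // Reports to the error bus and completes, but first maps the error into the domain's language.
    public static IObservable<T> ReportErrors<T>(this IObservable<T> source, IObserver<Exception> errorBus,
        Func<Exception, Exception> toDomainError) =>
        new AnonymousObservable<T>(o => source.Subscribe(new AnonymousObserver<T>(
            o.OnNext,
            error => { errorBus.OnNext(toDomainError(error)); o.OnCompleted(); },
            o.OnCompleted)));
}

static class Demo
{
    public static List<Exception> Run()
    {
        var reported = new List<Exception>();
        var errorBus = new Subject<Exception>();
        errorBus.Subscribe(new AnonymousObserver<Exception>(reported.Add));

        // An infrastructure failure, as a shop service might raise it.
        ErrorReporting.Throw<int>(new TimeoutException("HTTP request timed out"))
            .ReportErrors(errorBus, error => new TradeFailed(error))
            .Subscribe(new AnonymousObserver<int>(_ => { }));
        return reported;
    }
}
```

The bus, and therefore the presenters, only ever see TradeFailed. Keeping the original error as InnerException is a design choice of this sketch: the infrastructure detail stays available for logging, which helps with the cryptic logs mentioned at the start. Finer distinctions, such as mapping timeouts to a different domain error, would go in the lambda passed as toDomainError.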
Wrapping up
In order to work with errors in our reactive application, we first separate reporting from handling.
We report all errors to an error bus, which is done in the actions, right where a request is made to a service. If the request fails, it won’t break the whole Rx pipeline, and everything will be OK for the next call. Along the way, an error from the service layer can be mapped to an error defined in the domain.
We handle errors arising from the error bus in the presentation layer. There, we choose the presentation strategy by matching the type of the error, and any unmatched errors are handled by a default case defined in the subscription.
That is all for now!