Most applications have multiple internal and external dependencies, and their readiness and liveness statuses are important for service quality and SLAs.
When Akka actors are used, the ActorSystem is also one of these dependencies, and its lifecycle may need to be checked like the other dependencies.
This article shows how to define an ActorSystem health check that tracks its health status at runtime by using a separate, isolated ActorSystem. Let’s have a look at a sample implementation as follows:
1. JDK v1.8
2. Scala v2.13
3. Akka v2.5.23
1- We can define 2 Actor Systems.
app-actor-system: is used for the service’s computation layer.
app-health-actor-system: is a dedicated ActorSystem that tracks all service dependency health checks (e.g. the service itself and its dependencies such as Database, Zookeeper, other ActorSystem(s) etc.):
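As a minimal sketch of the two systems described above (assuming Akka classic actors; the holder object name ActorSystems is hypothetical and not taken from the sample project), they could be created like this:

```scala
import akka.actor.ActorSystem

// Hypothetical holder object; only the two system names come from the article.
object ActorSystems {
  // Hosts the actors of the service's computation layer.
  val appActorSystem: ActorSystem = ActorSystem("app-actor-system")

  // Isolated system that only runs health checks, so it keeps working
  // even if app-actor-system is shut down.
  val appHealthActorSystem: ActorSystem = ActorSystem("app-health-actor-system")
}
```

Keeping the health checks on their own ActorSystem means a fatal error on app-actor-system does not take the checker down with it.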
HeartbeatActor running on app-actor-system can be defined. This actor is used to decide whether app-actor-system is alive or not:
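A heartbeat actor of this kind might look like the sketch below. HeartbeatResponse is the reply named in this article; the request message name Heartbeat and the props helper are assumptions made for illustration.

```scala
import akka.actor.{Actor, Props}

object HeartbeatActor {
  case object Heartbeat          // assumed name of the request message
  case object HeartbeatResponse  // reply expected by the health check

  def props: Props = Props(new HeartbeatActor)
}

class HeartbeatActor extends Actor {
  import HeartbeatActor._

  // Reply immediately: an answer means the hosting ActorSystem still processes messages.
  override def receive: Receive = {
    case Heartbeat => sender() ! HeartbeatResponse
  }
}
```

The actor does no work of its own, so a missing reply points at the ActorSystem rather than at the actor's logic.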
StackOverflowErrorActor running on app-actor-system can be defined. This actor is used to simulate a fatal-error case on app-actor-system. It waits for a TriggerStackOverflowError message and then throws a StackOverflowError.
All expected statuses of app-actor-system are covered: UP, UNKNOWN and DOWN.
The health check determines the app-actor-system status by sending a heartbeat message to HeartbeatActor and expecting a HeartbeatResponse within the timeout (5 secs). If the number of consecutive failures reaches FailureThreshold (3 times), the app-actor-system status is set to DOWN; otherwise it is UP or UNKNOWN, depending on the latest heartbeat results.
FailureThreshold can also be defined as a property, so it can be managed through a property file when needed.
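The threshold logic can be sketched without Akka as a small state tracker. The names ActorSystemStatus and HealthStatusTracker are hypothetical, and the rule that a failure below the threshold yields UNKNOWN is an assumption about how the statuses map to heartbeat results, not something taken from the sample project.

```scala
sealed trait ActorSystemStatus
object ActorSystemStatus {
  case object UP extends ActorSystemStatus
  case object UNKNOWN extends ActorSystemStatus
  case object DOWN extends ActorSystemStatus
}

// Hypothetical tracker: counts consecutive failed heartbeats.
final class HealthStatusTracker(failureThreshold: Int = 3) {
  import ActorSystemStatus._

  private var consecutiveFailures = 0
  private var current: ActorSystemStatus = UNKNOWN // nothing observed yet

  def status: ActorSystemStatus = synchronized(current)

  // Record one heartbeat result and return the updated status.
  def onHeartbeat(succeeded: Boolean): ActorSystemStatus = synchronized {
    if (succeeded) {
      consecutiveFailures = 0
      current = UP
    } else {
      consecutiveFailures += 1
      // Assumption: below the threshold the status is UNKNOWN, at or above it DOWN.
      current = if (consecutiveFailures >= failureThreshold) DOWN else UNKNOWN
    }
    current
  }
}
```

In an Akka-based checker, the succeeded flag would come from whether a HeartbeatResponse arrived within the 5 second timeout, and failureThreshold could be read from a property instead of being hard-coded.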
ActorSystemHealthApp is defined to run the sample application. It initializes ActorSystemHealthChecker and schedules the following runnables.
ActorSystemCheckRunnable: runs every 10 secs and logs the app-actor-system status.
ErrorOnActorSystemRunnable: is scheduled once and is used to trigger the simulated fatal error on app-actor-system.
In this example, a standalone application is used. ActorSystem health statuses can also be exposed through
For example: what happens when a fatal error (e.g. StackOverflowError) occurs on an actor?
If a fatal error occurs on an actor, its ActorSystem is shut down by default, and all actors managed by this ActorSystem are stopped as well. In this case:
a- As the first step, a Root-Cause-Analysis (RCA) of the fatal error needs to be done and a proper fix should be applied.
b- Akka supports the akka.jvm-exit-on-fatal-error property, which can be set to on or off depending on the case. By default, the JVM exits when a fatal error occurs. This default behaviour is preferable for most use cases, but there may still be some (rare) cases where it is not a good fit. For example: after the first service JVM exits, if the same operation is applied to the next service instance, that instance may also go down, and the failure can cascade to all other nodes. To avoid this kind of case and keep the problem scope limited (especially in a multi-tenant environment and when the request routing policy is consistent hashing), the jvm-exit-on-fatal-error property can be set to off. In this case, the ActorSystem is shut down but the JVM does not exit immediately. After catching the DOWN status of the ActorSystem, the required actions can be applied.
The akka.jvm-exit-on-fatal-error property can be set to off in order to simulate the DOWN status and to avoid spreading the problem (e.g. fatal errors such as OutOfMemoryError) to the other service instances.
It can be set in the configuration (akka.jvm-exit-on-fatal-error = off), or programmatically as follows:
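One way to do this, sketched here with the Typesafe Config API that Akka uses, is to parse the single override and fall back to the regular configuration for everything else. The object name AppActorSystemFactory is hypothetical.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical factory; only the property name comes from the article.
object AppActorSystemFactory {
  // Override one setting and keep application.conf / reference.conf for the rest.
  val config: Config = ConfigFactory
    .parseString("akka.jvm-exit-on-fatal-error = off")
    .withFallback(ConfigFactory.load())

  // With this config, a fatal error shuts down the ActorSystem but does not exit the JVM.
  def create(): ActorSystem = ActorSystem("app-actor-system", config)
}
```

Only app-actor-system needs this override in the scenario above, since it is the system on which the fatal error is simulated.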
8- Please find the application trace as follows:
9- Please see the complete example as follows: here
In this example project, we implemented a sample Akka ActorSystem health check. Akka also supports readiness and liveness checks through akka-management, and its documentation is worth a look.