Quarkus and the MicroProfile Fault Tolerance API

Samuel Catalano
Published in The Fresh Writes
7 min read · Jan 25, 2024

Using Quarkus and the MicroProfile Fault Tolerance API together, you can enhance the resilience of your microservices by incorporating features like circuit breakers, timeouts, retries, and fallbacks. This article walks through a small example that uses all four.

We require two different applications for the testing process. The first application will act as a server and simulate an API that takes a long time to return the data, which in our case is just a simple string. The second application will act as a client that consumes this API and implements fault tolerance.

But first, what is Fault Tolerance?

Imagine you’re building a tower out of building blocks. If one block gets knocked over, the whole tower might collapse. Fault tolerance is like adding extra support or backup blocks so that even if one or a few blocks fall, the tower can still stand. In the world of technology and systems, fault tolerance means designing things in a way that if something goes wrong (like a part of a computer system fails), the overall system can keep working without completely breaking down. It’s like having a safety net to prevent the whole system from crashing when something unexpected happens.

Some techniques can be applied to prevent this from happening, such as Redundancy, Load Balancing, and Checkpoints, among others. One of these techniques is called Circuit Breaker.

What is a Circuit Breaker?

Imagine you have a bunch of appliances connected to an electrical circuit in your home. A circuit breaker is like a safety switch that automatically turns off the electricity if something goes wrong, like if there’s too much electrical current flowing through the circuit. It’s there to prevent electrical fires and protect your appliances.

So, if there’s a problem, the circuit breaker “breaks” the circuit by stopping the flow of electricity, just like turning off a light switch. This helps keep you and your home safe from electrical hazards. After fixing the issue, you can flip the circuit breaker back on to restore power. It’s a safety feature in electrical systems!

Now that we know the concepts, let’s code

Let’s kick things off with our server application, which we’ll playfully refer to as “slow-api-example”:

mvn io.quarkus.platform:quarkus-maven-plugin:3.6.7:create \
-DprojectGroupId=com.slow.api \
-DprojectArtifactId=slow-api-example \
-Dextensions='resteasy-reactive,rest-client-reactive-jackson'

In this application, we simply require a Resource class that contains a basic String for our testing data.

package com.slow.api;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

@Path("/api/slow")
public class SlowResource {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() throws InterruptedException {
        Thread.sleep(3000L); // simulates a delay of 3s
        return "Hi there, I'm a slow API \n";
    }
}

For testing purposes, we will change the port of this application to run on 8081 since we will need both applications running and do not want a port conflict. In your application.properties:

quarkus.http.port=8081

To start our application, simply run the command in the terminal:

mvn compile quarkus:dev

Having accomplished that, let’s proceed with our client application, affectionately referred to as “client-fault-tolerance”:

mvn io.quarkus.platform:quarkus-maven-plugin:3.6.7:create \
-DprojectGroupId=com.client.fault.tolerance \
-DprojectArtifactId=client-fault-tolerance \
-Dextensions='resteasy-reactive,rest-client-reactive-jackson,smallrye-fault-tolerance'

This application needs a RestClient, which holds the URI of our server and the method that retrieves the data. It also needs a Resource class that exposes this API call through the client's own endpoint.

RestClient class:

package com.client.fault.tolerance.client;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

import org.eclipse.microprofile.faulttolerance.CircuitBreaker;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;
import org.eclipse.microprofile.faulttolerance.exceptions.TimeoutException;
import org.eclipse.microprofile.rest.client.inject.RegisterRestClient;

import java.time.temporal.ChronoUnit;

@RegisterRestClient(baseUri = "http://localhost:8081/api/slow")
public interface ApiToleranceClient {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    // fail any call that takes longer than 2s
    @Timeout(unit = ChronoUnit.SECONDS, value = 2)
    // on failure, retry up to 2 more times, waiting 1s between attempts
    @Retry(delayUnit = ChronoUnit.SECONDS, maxRetries = 2, delay = 1)
    // if the call still fails, return the result of defaultFallback()
    @Fallback(fallbackMethod = "defaultFallback")
    // open the circuit when 75% of the last 4 requests failed
    @CircuitBreaker(delayUnit = ChronoUnit.SECONDS, requestVolumeThreshold = 4,
            failureRatio = .75, delay = 3,
            successThreshold = 2
    )
    String getDataFromSlowAPI();

    default String defaultFallback() {
        return "It seems that the server is down! We need to define a better response to the final user \n";
    }

}

Resource class:

package com.client.fault.tolerance.resource;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

import com.client.fault.tolerance.client.ApiToleranceClient;

import org.eclipse.microprofile.rest.client.inject.RestClient;

@Path("/api")
public class ApiToleranceResource {

    @RestClient
    @Inject
    ApiToleranceClient apiToleranceClient;

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String getDataFromSlowAPI() {
        return apiToleranceClient.getDataFromSlowAPI();
    }
}

Understanding our code

Let’s go further into the annotations present in the RestClient class as well as their values to understand the concepts we learned before.

@Timeout(unit = ChronoUnit.SECONDS, value = 2)

Do you remember that earlier in our server application, we defined a call to Thread.sleep(3000L) to simulate a delay of 3s? So, to simulate our fault tolerance, we are saying that the maximum time our request can wait is 2s.

This means that after 2s the call fails with a TimeoutException:


org.eclipse.microprofile.faulttolerance.exceptions.TimeoutException: com.client.fault.tolerance.client.ApiToleranceClient$$CDIWrapper#getDataFromSlowAPI timed out
at io.smallrye.faulttolerance.core.timeout.Timeout.timeoutException(Timeout.java:91)
at io.smallrye.faulttolerance.core.timeout.Timeout.doApply(Timeout.java:78)
at io.smallrye.faulttolerance.core.timeout.Timeout.apply(Timeout.java:30)
at io.smallrye.faulttolerance.FaultToleranceInterceptor.syncFlow(FaultToleranceInterceptor.java:255)

Good stuff! Despite the error, this is a good sign that our timeout test is working as expected.
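To make the idea concrete outside of Quarkus, here is a minimal plain-Java sketch of what a timeout does: run the call on another thread and stop waiting once the limit passes. It is only an illustration, not how SmallRye implements @Timeout. The names TimeoutSketch, callWithTimeout, slowCall, and CallTimedOutException are hypothetical, and the durations are scaled down tenfold (300 ms against a 200 ms limit) so it runs quickly.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeoutSketch {

    // Hypothetical unchecked exception, standing in for the one the real API throws.
    static class CallTimedOutException extends RuntimeException {
        CallTimedOutException(String message) {
            super(message);
        }
    }

    // Runs the task on another thread and waits at most timeoutMillis for its result.
    static String callWithTimeout(Supplier<String> task, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // we no longer care about the result
            throw new CallTimedOutException("timed out after " + timeoutMillis + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Stand-in for the slow endpoint: sleeps, then answers.
    static String slowCall(long sleepMillis) {
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Hi there, I'm a slow API";
    }

    public static void main(String[] args) {
        // Scaled down tenfold: a 300 ms "server" against a 200 ms limit.
        try {
            callWithTimeout(() -> slowCall(300), 200);
        } catch (CallTimedOutException e) {
            System.out.println(e.getMessage());
        }
        // A fast answer gets through untouched.
        System.out.println(callWithTimeout(() -> slowCall(10), 200));
    }
}
```

The caller gets control back at the limit even though the slow task is still sleeping in the background, which is the same trade-off the annotation makes for us.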

@Retry(delayUnit = ChronoUnit.SECONDS, maxRetries = 2, delay = 1)

Well, a timeout occurred, correct? But this could have been a momentary unavailability of our server, which could have been caused, for example, by the deployment of a new version or something similar. Doesn’t it seem like we should try a few more times before claiming the server is down?

That’s exactly what the @Retry annotation does: maxRetries indicates how many times our client will retry the call to the server API before throwing an exception (so with maxRetries = 2 there are at most 3 attempts in total), and delay indicates the wait between these attempts.
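As a rough illustration of those two attributes, the sketch below implements a retry loop in plain Java. It is not the SmallRye implementation: the names RetrySketch, callWithRetry, and worstCaseMillis are hypothetical, and it ignores the random jitter that @Retry adds to the delay by default. The worstCaseMillis helper works out how long a caller may wait when every attempt runs into the timeout.

```java
import java.util.function.Supplier;

class RetrySketch {

    // Calls the task once, then retries up to maxRetries more times on failure,
    // sleeping delayMillis between attempts. Rethrows the last failure.
    static <T> T callWithRetry(Supplier<T> task, int maxRetries, long delayMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxRetries) {
                    sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Worst case before giving up: every attempt runs into the timeout.
    static long worstCaseMillis(int maxRetries, long timeoutMillis, long delayMillis) {
        return (maxRetries + 1) * timeoutMillis + maxRetries * delayMillis;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            callWithRetry(() -> {
                calls[0]++;
                throw new IllegalStateException("simulated timeout");
            }, 2, 10);
        } catch (IllegalStateException e) {
            System.out.println("attempts: " + calls[0]); // attempts: 3
        }
        // Values from the article: 3 attempts of 2s plus 2 pauses of 1s.
        System.out.println(worstCaseMillis(2, 2000, 1000)); // 8000
    }
}
```

With the values used in this article, a caller can wait roughly 8 seconds before the retries are exhausted, so it is worth keeping the timeout and the retry count small.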

@Fallback(fallbackMethod = "defaultFallback")

OK, we made the attempts and the server is still taking too long to respond, which is why we keep receiving the TimeoutException. I suppose you agree that it would be horrible to return this exception to the customer, correct? Instead, we could return a friendlier response or call an alternative API or workflow. That’s the function of the @Fallback annotation.

There are a few ways to use @Fallback, but that is a discussion for a dedicated article. The simplest is to configure a fallbackMethod, which names a method that will be called when we get an error, in our case the TimeoutException.

An important point is that the fallback method must have the same parameters and return type as the method it guards, which in our case means no parameters and a String return type.

Create a new method with any name you like and set it as the value. Example: fallbackMethod = "defaultFallback":

default String defaultFallback() {
    return "It seems that the server is down! We need to define a better response to the final user \n";
}

After we get the error, this method is called and returns a friendly String instead of an exception stack trace.
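Stripped of the annotation, a fallback is little more than a try/catch around the call. The plain-Java sketch below shows that shape; FallbackSketch and callWithFallback are hypothetical names, and the real annotation also supports options this sketch leaves out.

```java
import java.util.function.Supplier;

class FallbackSketch {

    // Returns the primary result, or the fallback result if the primary call fails.
    static String callWithFallback(Supplier<String> primary, Supplier<String> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }

    // Same shape as the guarded call: no parameters, returns a String.
    static String defaultFallback() {
        return "It seems that the server is down!";
    }

    public static void main(String[] args) {
        String answer = callWithFallback(() -> {
            throw new IllegalStateException("simulated timeout");
        }, FallbackSketch::defaultFallback);
        System.out.println(answer); // It seems that the server is down!
    }
}
```

Because the fallback takes the place of the original return value, its parameters and return type have to line up with the guarded method.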

@CircuitBreaker(delayUnit = ChronoUnit.SECONDS, 
requestVolumeThreshold = 4,
failureRatio = .75,
delay = 3,
successThreshold = 2
)

Last but not least, our CircuitBreaker. Do you remember the idea is to interrupt the flow of work, like turning off a light switch until we fix the problem? Great, so how do we do that?

The attribute requestVolumeThreshold indicates the number of most recent requests (a rolling window) that we will analyze in order to be sure that we have a problem. In our example, we set 4 as the value, which means that the circuit breaker looks at the last 4 requests before affirming that we have a problem.

The attribute failureRatio indicates the proportion of failures within that window that we will consider enough to affirm that we are indeed having a problem. We set .75 as the value, which means that if 75% of the requests (3 of 4) fail, the circuit opens.

The attribute delay indicates how long the circuit stays open before it lets trial requests through again (the half-open state). With delay = 3 and delayUnit = ChronoUnit.SECONDS, calls are rejected for 3 seconds.

Finally, the attribute successThreshold indicates how many consecutive successful trial requests we need to confirm that the problem has been resolved and we can turn on the light switch again (close our open circuit).
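The four attributes are easier to follow as a small state machine. The sketch below is a simplified, single-threaded plain-Java model, not SmallRye's implementation: the class CircuitBreakerSketch, the CircuitOpenException type, the attempt helper, and the injected clock are all hypothetical, and treating a ratio equal to failureRatio as enough to open the circuit is an assumption chosen to match the "3 of 4" example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    // Hypothetical exception for calls rejected while the circuit is open.
    static class CircuitOpenException extends RuntimeException { }

    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final long delayMillis;
    private final int successThreshold;
    private final LongSupplier clock; // injected so the example needs no real waiting
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed request
    private State state = State.CLOSED;
    private long openedAt;
    private int trialSuccesses;

    CircuitBreakerSketch(int requestVolumeThreshold, double failureRatio,
                         long delayMillis, int successThreshold, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
        this.delayMillis = delayMillis;
        this.successThreshold = successThreshold;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> task) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < delayMillis) {
                throw new CircuitOpenException(); // still inside the delay: reject
            }
            state = State.HALF_OPEN; // delay elapsed: let trial requests through
            trialSuccesses = 0;
        }
        try {
            T result = task.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            throw e;
        }
    }

    private void record(boolean failed) {
        if (state == State.HALF_OPEN) {
            if (failed) {
                open(); // a failed trial reopens the circuit
            } else if (++trialSuccesses >= successThreshold) {
                state = State.CLOSED; // enough successful trials: close it again
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst(); // keep only the most recent requests
        }
        long failures = window.stream().filter(f -> f).count();
        if (window.size() == requestVolumeThreshold
                && (double) failures / requestVolumeThreshold >= failureRatio) {
            open();
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }

    // Runs one request and reports what happened to it.
    static String attempt(CircuitBreakerSketch breaker, boolean shouldFail) {
        try {
            return breaker.call(() -> {
                if (shouldFail) throw new IllegalStateException("simulated timeout");
                return "ok";
            });
        } catch (CircuitOpenException e) {
            return "rejected";
        } catch (RuntimeException e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        long[] now = {0};
        CircuitBreakerSketch breaker = new CircuitBreakerSketch(4, 0.75, 3000, 2, () -> now[0]);
        System.out.println(attempt(breaker, false)); // ok
        for (int i = 0; i < 3; i++) System.out.println(attempt(breaker, true)); // failed
        System.out.println(attempt(breaker, false)); // rejected: 3 of 4 failed
        now[0] = 3000; // pretend 3 seconds have passed
        System.out.println(attempt(breaker, false)); // ok (first trial)
        System.out.println(attempt(breaker, false)); // ok (second trial)
        System.out.println(breaker.state()); // CLOSED
    }
}
```

Note that while the circuit is open the server is not called at all, which is what gives it time to recover.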

How to test everything?

Once you have created both applications, run them using the mvn compile quarkus:dev command. You can either use an API platform like Postman or Insomnia to hit the endpoint http://localhost:8080/api with a GET request, or run a cURL loop in the terminal:

while true
do curl localhost:8080/api
sleep 2.0
done

To effectively see the differences, change the values of Thread.sleep(3000L) and the other properties discussed above.
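If you would rather not edit the annotations and recompile for every experiment, the MicroProfile Fault Tolerance specification also lets you override annotation attributes through configuration. The fragment below is a sketch for the client's application.properties, assuming the overrides are picked up for REST client interface methods in the same way as for ordinary beans. The two values are only examples.

```properties
# application.properties of client-fault-tolerance
# Key format: <fully qualified class>/<method>/<annotation>/<attribute>
# Raise the timeout to 4 (the unit stays SECONDS, as set on the annotation)
com.client.fault.tolerance.client.ApiToleranceClient/getDataFromSlowAPI/Timeout/value=4
# Retry only once instead of twice
com.client.fault.tolerance.client.ApiToleranceClient/getDataFromSlowAPI/Retry/maxRetries=1
```

With a 4-second timeout the 3-second server should answer in time, so the fallback message should no longer appear.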

Conclusion

This article demonstrates how to use Quarkus and the MicroProfile Fault Tolerance API to enhance the resilience of your microservices. By incorporating features like circuit breakers, timeouts, retries, and fallbacks, you can ensure that your system continues to function even when individual components experience failures.

It walks step by step through building a server application with a simulated delay and a client application equipped with fault tolerance mechanisms, and explains each annotation used in the client code and its impact on the behaviour of the system.

Key takeaways:

  • Fault tolerance is crucial for building reliable and resilient microservices.
  • The MicroProfile Fault Tolerance API offers various tools like timeouts, retries, fallbacks, and circuit breakers to implement fault tolerance strategies.
  • Quarkus facilitates easy integration of the MicroProfile Fault Tolerance API in your applications.

By understanding and implementing these concepts, you can build robust microservices that can withstand potential failures and deliver a consistent user experience.

The full source code of our examples here is over on GitHub.

Happy coding ;)

Samuel is a Software Engineer from Brazil with main interests in Java, Spring Boot, Quarkus, Microservices, Docker, Databases, Kubernetes, and Clean Code