Reactive programming with Spring Boot and Webflux

Rareş Popa
8 min read · Jul 23, 2019


A decade ago we had monolithic apps that ran on tens of servers, took seconds to respond, went offline for hours of maintenance and needed huge amounts of storage. But today we live in a fast-paced world, one where we build apps using microservices that embrace distributed systems, deployed on everything from mobile devices to cloud-based clusters.

This new kind of app should be highly scalable, use resources efficiently and have fast response times. And by the way, users expect millisecond responses and 100% uptime. Kewl.

Let’s take a look at the way we did things.

The traditional API handles concurrent requests with the thread-per-request model, which is limited by the thread pool size. The default for a Tomcat server is 200 connections. Each thread eats up its fair share of memory, so as the thread pool grows more memory is used, which ultimately leads to performance issues. We don’t like that.

The most common strategy for handling this problem is spinning up more instances of the app, aka horizontal scaling. But creating new instances of the whole app also means deploying parts of the app we won’t use, and if the app is deployed in the cloud, it means we are burning money. Well, I don’t know about you, but I like money.

Traditional REST APIs are blocking and synchronous. Let’s consider an HTTP request to the server in which we ask for a list of users. When the server takes a request, a new servlet thread gets assigned to it, does some processing and makes a db call. Meanwhile the thread that was allocated to that request remains blocked until the db returns a response. Uggh! From a scalability point of view, at any point in time we can serve 200 requests concurrently. The 201st request has to wait until a thread is freed and can take it up.
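To make the problem concrete, here is a minimal sketch in plain Java (a hypothetical findAllUsers() with a sleep stands in for the blocking db call, and the pool is shrunk from Tomcat’s 200 threads down to 2):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BlockingServerSketch {

    // Hypothetical "db call": the calling thread is parked here, doing nothing useful.
    static List<String> findAllUsers() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.of("alice", "bob");
    }

    public static void main(String[] args) throws InterruptedException {
        // A tiny stand-in for Tomcat's pool: 2 threads instead of 200.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // 3 concurrent "requests": the third must wait for a thread to be freed.
        for (int i = 1; i <= 3; i++) {
            int request = i;
            pool.submit(() -> {
                List<String> users = findAllUsers();
                System.out.println("request " + request + " got " + users);
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With 2 threads and 3 requests, the third request sits in the executor’s queue exactly the way the 201st request would on a default Tomcat.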

We have to do better than that.

What if we had threads processing the request and, instead of waiting on the response of the db, we asked to be notified once it is done? Meanwhile the freed-up thread can serve the next request. Once the db has the data it can make the callback, and the same thread or any other free thread can send the data back to the user. That sounds way better. This mechanism can be used to avoid servlet threads going into a wait state.

This mechanism is called an event loop. Whenever there is a request we do some basic processing and immediately delegate any IO operation to the DB driver as an event. The driver in turn fires an event, whenever it is ready, signalling that it has the data. Any thread from the thread pool can take the data and send the response back. This leads to more efficient use of the CPU and more concurrent requests.
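A minimal sketch of the callback idea, using the JDK’s CompletableFuture as a stand-in for an async DB driver (findAllUsersAsync is a hypothetical name, and the scheduled delay simulates the db doing its work):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CallbackSketch {

    // Stands in for an async DB driver: completes the future later,
    // without blocking the thread that issued the query.
    static CompletableFuture<List<String>> findAllUsersAsync(ScheduledExecutorService io) {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        io.schedule(() -> result.complete(List.of("alice", "bob")),
                100, TimeUnit.MILLISECONDS);
        return result;
    }

    public static void main(String[] args) {
        ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();

        // Register a callback instead of waiting: the current thread is free
        // to pick up the next request immediately.
        CompletableFuture<List<String>> pending = findAllUsersAsync(io)
                .whenComplete((users, error) -> System.out.println("callback got " + users));

        System.out.println("request thread is free, serving the next request...");
        pending.join(); // demo only: keep the JVM alive until the callback fires
        io.shutdown();
    }
}
```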

But we still have some problems to solve, like the size of the response. What if the data we requested is more than we can handle? That could crash the front end app.

If only there was a way…did someone say reactive programming?

Reactive programming is about non-blocking apps that are async and event-driven and require a small number of threads to scale. It also holds the key to the latter problem: it provides the concept of backpressure, a mechanism to ensure that producers don’t overwhelm consumers. For example, in a pipeline of reactive components that goes from the data source up to the HTTP socket, when the client can’t handle the volume of data, the data repo slows down or stops until the client catches up.

So reactive programming is a new programming paradigm, which is async and non-blocking. The data flows as an event/message-driven stream. The thread is not blocked, and as soon as the data becomes available it is sent through the stream to the client. Once all the data is retrieved, on the happy path we get an onComplete() event, and if an exception is thrown along the way we get an onError() event. We also need to know about the onNext() event, which notifies the observer that a new element is present in the sequence.

The world we live in needs standards, to keep us away from chaos. For that reason we have the Reactive Streams specification, which provides a standard for async stream processing with non-blocking backpressure. The specification needs to be implemented by a reactive library.

The API is made up of 4 core interfaces: Publisher, Subscriber, Subscription and Processor. Let’s see their definitions.

1. Publisher — a publisher is the data source (database/API call/etc.) of a potentially unbounded number of sequenced elements, and it publishes them according to the demands of the subscriber. It has only one method:

public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}

2. Subscriber — aka the consumer. The purpose of this interface is to establish that it is the responsibility of the subscriber to decide when and how many elements it is able and willing to receive. It has 4 abstract methods:

public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}

3. Subscription — represents a unique relationship between the Subscriber and the Publisher. It comes with 2 methods:

public interface Subscription {
    public void request(long n);
    public void cancel();
}

4. Processor — as the name implies, it represents a processing stage, which sits between Subscriber and Publisher and must obey the contracts of both.

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {
}

Let’s understand how all of this works with the following diagram:

The subscriber calls the subscribe() method of the Publisher interface, passing the subscriber instance as an argument. The publisher confirms the subscription by sending a Subscription to the consumer. Now that the subscriber has the Subscription instance, it can request elements by invoking the request() method. If it doesn’t bound the size of the response, then by default the publisher returns all of the data, using onNext() to send the items until it runs out of them.
By specifying the size of the response, we can see a simple example of backpressure, in which the consumer controls the size of the response. Once all the data is delivered, the publisher sends an onComplete() event.
The subscriber also has the option of cancelling the data flow, by calling the cancel() method of the Subscription interface. It can be called any time after the Subscription instance is received from the Publisher; the subscription is then canceled and the publisher won’t emit any more events.
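Since JDK 9 the same four interfaces ship in the standard library as java.util.concurrent.Flow, so we can sketch the whole subscribe/request/onNext/onComplete dance without any extra dependency (the element values here are made up for the demo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowDemo {

    // Publishes 1..count and collects what the subscriber receives,
    // requesting a single element at a time (backpressure).
    static List<Integer> collectWithBackpressure(int count) throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        // SubmissionPublisher is the JDK's built-in Publisher implementation.
        SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>();
        publisher.subscribe(new Flow.Subscriber<Integer>() {
            private Flow.Subscription subscription;

            @Override public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                s.request(1);            // ask for one element, not the whole stream
            }
            @Override public void onNext(Integer item) {
                received.add(item);
                subscription.request(1); // ready for the next one
            }
            @Override public void onError(Throwable t) { done.countDown(); }
            @Override public void onComplete() { done.countDown(); }
        });

        for (int i = 1; i <= count; i++) {
            publisher.submit(i);
        }
        publisher.close();               // triggers onComplete after the last item
        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("received " + collectWithBackpressure(3));
    }
}
```

Calling subscription.cancel() inside onNext() instead of request(1) would stop the flow early, which is the cancellation path described above.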

Project Reactor provides implementations of the aforementioned interfaces and is a reactive library for building non-blocking apps on the JVM. If we check the documentation we find out that it is made up of several modules.

In the reactor-core module we find Flux and Mono, which are Reactive Streams Publisher implementations. The main difference between them is the number of elements emitted: Flux emits 0 to N elements, while Mono emits at most one element. They both complete successfully or with an error.

https://projectreactor.io/docs/core/release/api/reactor/core/publisher/doc-files/marbles/flux.svg

The marbles at the top represent the data emitted by the data source, the operator in the middle represents the processing of the data, and the marbles at the bottom represent the data that will be sent to the subscriber.

Mono is used when you want just one item from the data source; think of retrieving a user by id.

https://projectreactor.io/docs/core/release/api/reactor/core/publisher/doc-files/marbles/mono.svg
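As a quick sketch of both types (this assumes the reactor-core dependency is on the classpath; the user names are made up):

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class ReactorSketch {

    public static void main(String[] args) {
        // Flux: 0..N elements, transformed by an operator before reaching the subscriber
        Flux.just("alice", "bob", "carol")
            .map(String::toUpperCase)
            .subscribe(
                name -> System.out.println("onNext: " + name),
                error -> System.err.println("onError: " + error),
                () -> System.out.println("onComplete"));

        // Mono: at most one element, e.g. a single user looked up by id
        Mono.just("alice")
            .subscribe(name -> System.out.println("found user " + name));
    }
}
```

The three arguments to subscribe() map directly onto the onNext(), onError() and onComplete() events from the previous section.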

Now in order to build reactive web apps we need Spring WebFlux, which is part of Spring 5 and provides reactive programming support for web apps. Spring WebFlux internally uses Project Reactor and its publisher implementations, Flux and Mono. The framework supports two programming models:

1. Annotation based

2. Functional routing and handling
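A minimal sketch of both models side by side (this assumes the spring-boot-starter-webflux dependency; the endpoints and data are made up):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.reactive.function.server.RouterFunction;
import org.springframework.web.reactive.function.server.ServerResponse;
import reactor.core.publisher.Flux;

import static org.springframework.web.reactive.function.server.RequestPredicates.GET;
import static org.springframework.web.reactive.function.server.RouterFunctions.route;
import static org.springframework.web.reactive.function.server.ServerResponse.ok;

// 1. Annotation based: the familiar controller style, but returning a Flux
@RestController
class UserController {

    @GetMapping("/users")
    Flux<String> users() {
        return Flux.just("alice", "bob"); // a reactive repository call would go here
    }
}

// 2. Functional routing and handling: routes declared as a RouterFunction bean
@Configuration
class UserRoutes {

    @Bean
    RouterFunction<ServerResponse> routes() {
        return route(GET("/fn/users"),
                request -> ok().body(Flux.just("alice", "bob"), String.class));
    }
}
```

In both cases nothing is written to the socket until a subscriber (the framework, on behalf of the HTTP client) subscribes to the returned Flux.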

To have a full-fledged async, non-blocking app we also need a server that rises to the challenge. Luckily, if we are using Spring WebFlux, Spring Boot automatically configures Reactor Netty as the default server, which is an async event-driven network app framework for rapid development of maintainable high-performance protocol servers & clients. Again we find the keywords: async and event-driven.

When a request is made to the Netty server, it responds immediately with a Future, in this way not blocking the client. The client is free to do whatever it wants meanwhile. Also a big bonus is that Netty handles a large number of connections. An event for Netty could be a client making a request. A response from Netty is also considered an event.

The connection between the client and the server is established through a channel. The Netty event loop is looking for events. It has an internal event queue in which it stores the events, with the event loop taking the first in (FIFO) until it processes all of them. The ChannelHandler looks for the right endpoint to call, searching in classes annotated with @RestController or in functional web endpoints. The Netty event loop associates each request with a channel, and in this way it knows which channel it needs to send the response to.
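The FIFO event queue at the heart of this can be sketched in a few lines of plain Java (a toy model, not Netty’s actual implementation: each "request" just becomes a task in the queue):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoopSketch {

    // Runs a single-threaded event loop over a FIFO queue and returns
    // the order in which the "requests" were handled.
    static List<String> handleAll(List<String> requests) throws InterruptedException {
        BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(requests.size());

        // The event loop: one thread taking events first-in, first-out.
        Thread loop = new Thread(() -> {
            try {
                while (true) {
                    events.take().run();
                }
            } catch (InterruptedException e) {
                // shutdown signal
            }
        });
        loop.start();

        // Each incoming request on a channel becomes an event in the queue.
        for (String request : requests) {
            events.put(() -> {
                handled.add("handled " + request);
                done.countDown();
            });
        }

        done.await();
        loop.interrupt();
        loop.join();
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handleAll(List.of("request on channel 1", "request on channel 2")));
    }
}
```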

Aaand I think that will be all for today. In the next post I’m gonna try to put all of this together by building reactive microservices using Spring WebFlux and Spring Cloud.

Sources:

https://projectreactor.io/
