Processing streaming data with Spring WebFlux

Tl;dr How to consume streaming data with Spring WebFlux and the Reactive Stack

Background: Traditional web applications are considered to be “blocking” where each request is handled by a thread and this thread blocks until it is able to fully process the request. For example, a request to fetch a large amount of data from a database over the network might take a few seconds and until this operation completes, the thread will be blocked from servicing other requests. Hence, the solution used to support high volume applications has been to horizontally scale the applications to ensure that there are enough threads to handle the traffic.

Reactive Programming

Reactive Programming deals with writing code to handle requests in a non-blocking asynchronous manner. (Nonblocking I/O). This leads to better utilization of a system’s resources as a large number of requests can now be processed by a small number of threads. Node.js is built using this principle where a single thread handles and processes requests in an Event Loop.

Each request can be safely handed off to a backend system which will invoke the callback provided by the caller when it has completed the operation.

The Reactive Manifesto (see link at the end of this article) describes in more detail the expectations of a Reactive System.

Note: Reactive Programming in Java and the various libraries in use can make for a very long set of articles. Please see the resources section at the end of this article to learn more.

At a high level, it answers a few questions:

  • How can we process a large number of concurrent requests and leverage threads that would otherwise be idle waiting for a request to complete? While this may not necessarily improve the latencies seen by our users, it solves for better resource utilization and scaling of our applications.
  • How can a consumer receiving data from a producer signal back to the producer that it is getting overwhelmed? This is called BackPressure. If you consider a pipeline made up of reactive components (clients, services, database libraries etc.) each component could be consuming data at a different rate and any component that is receiving more data than it can handle can signal the producing application to slow down.

Going Reactive:

  • In Java 5, Futures were created to return a “promise” upon invocation of an operation. The client would query this Future to see if the operation had completed and to get the result of the operation
  • CompletableFutures were introduced in Java 8 to add more features such as request chaining whereby after performing operation A, the next operation B could be invoked with the result of A.
  • RxJava (Java 6) is a library created by Netflix to handle processing streams of data and supports managing backpressure where a client application (subscriber) can control the amount of data that is sent by the producer
  • With Spring 5, Reactive Streams are used to communicate between the various async components. This is achieved by using Reactor, which is an internal implementation of the Reactive Streams specification. As it heavily leverages Java 8, new applications can use this approach to gain the most benefits. Reactive Streams have also been incorporated into Java 9 in the java.util.concurrent.flow package.
  • The Spring Webflux module in Spring 5 is used to support reactive web applications over HTTP. It also supports WebSockets. This article deals with how we can use Spring WebFlux to create a simple web application that will consume a stream of events.

The Building Blocks:

In Reactor, there are two main building blocks that are used for communication between consumers and producers.

  1. Mono<T>: It represents a Stream with either a zero or one value [0..1]
  2. Flux<T>: It represents a Stream with 0 to many (possibly infinite) values [0..n]

A sample usage would look like the one below where the findPetById method returns a Mono if there is a pet matching the input id. The second method findAllPets returns a Flux or stream of all the pets in the system.

@GetMapping(value = "/pets/{id}")
Mono<Pet> findPetById(@PathVariable String id){
return petRepository.findPetById(id);
}
@GetMapping(value = "/pets")
Flux<Pet> findAllPets(){
return petRepository.findAllPets();
}

Demo

Here’s an hypothetical use case:

  • Let’s say we have an application that fetches adoptable pet data from animal shelters across the country (Petfinder.com in this case) and persists it into a MongoDB database
  • Our Spring WebFlux application will consume this data in real time as it comes into the system. In this case, the data is pushed to the browser as it is persisted in the Mongo Database
Note: More realistic applications deal with streaming data such as Stock quotes, Tweets, IoT readings etc.

In Figure 1 below, the animation shows information about the various German Shepherd dogs that are available for adoption as it comes into the system or in other words as this information is being streamed to the browser.

Figure 1: Proof of concept showing streaming adoptable pets data sent to a browser

Let’s see how this works. In Figure 2 below, we see the same components that we would normally see in a traditional Spring Boot application.

  1. We have a controller (PetController) that is invoked by the browser to load a stream of pets (Adoptable German Shepherds from the zip code 07078 in this case)
  2. The controller talks to a PetService implementation that in turn talks to a PetRepository. The PetRepository extends from a ReactiveMongoRepository that adds the reactive behavior to the Mongo Database
  3. The data in the Mongo database is persisted as a Capped Collection (a collection with a finite number of entries and with a finite size)
  4. The data is pushed from the database to the server and ultimately to the client as a Flux<Pet> stream over a connection that stays open even after the client has received all the data. Behind the scenes, the Mongo Tailable cursor is used to fetch this data.
Figure 2: The Spring WebFlux components
Note: While Figure 2 shows the PetService and PetServiceImpl components, I have used the PetController to directly talk to the PetRepository in my proof of concept.

The code for fetching the stream of pets is in the PetController as shown in Figure 3 below. To simulate real time data, I am sending each record with a delay of 2.5 secs. Note the stream type of TEXT_EVENT_STREAM_VALUE which indicates to the browser that the data is being sent as Server Sent Events.

You can also create a sample record in Mongo and it should show up within a few seconds.

Figure 3: Streaming data to the client using a Tailable Cursor

Finally, the data received by the browser is handled by the pets.html file using a Thymeleaf template, where an EventSource is created to listen to the stream of events that are sent by the server

<script>
var evtSource = new EventSource("http://localhost:8080/pets");
evtSource.onmessage = function (event) {
var data = JSON.parse(event.data);
var table = document.getElementById("pets-table");
var row = table.insertRow(1);
var cell1 = row.insertCell(0);
var cell2 = row.insertCell(1);
var cell3 = row.insertCell(2);
var cell4 = row.insertCell(3);


var email = data.email;
var shelterAddress = '<p>' + data.address1 + '<p>' + data.city + '<p> ' + data.state + '</p>'

cell1.innerHTML = '<td> <img class="img-thumbnail" src="' + data.thumbnailImage + '" width="150" height="100"/> </td>';
cell2.innerHTML = '<td>' + data.name + '</td>';
cell3.innerHTML = '<td>' + email + '</td>';
cell4.innerHTML = '<td>' + shelterAddress + '</td>';

}
</script>

Conclusion:

  • We saw how we can use Spring WebFlux to consume a stream of events. While this example showed the Controller based approach to consuming these events, we could have also used a Functional Programming based approach to process the events from the stream.
  • Reactive Programming may not be the right solution for every web application. Use with caution as there will be a ramp up time to understand the intricacies of this programming model as well as the challenges of debugging issues in a fully async system.
  • Also important to note is that official reactive JDBC libraries are not yet available. Therefore any interaction through JDBC with a traditional RDBMS would still be a blocking operation at this time.

Resources:

  1. Github https://github.com/nmallya/springwebflux
  2. Spring WebFlux documentation: https://docs.spring.io/spring/docs/5.0.0.BUILD-SNAPSHOT/spring-framework-reference/html/web-reactive.html
  3. The Reactive Manifesto: https://www.reactivemanifesto.org/
  4. Mongo Capped Collections: https://docs.mongodb.com/manual/core/capped-collections/