Do you need Reactive Programming for your Project?

Mary Reni · Published in Javarevisited · Mar 29, 2021

A simple explanation for Java Engineers with a case study

With new technologies and frameworks entering the software engineering stack every single day, it can feel like a curse for developers to jump on every technology that creates a buzz. But unless your project demands it, there is no need to adopt it.

Photo by Jacek Dylag on Unsplash

Take a good look at the image above.

What is happening there?

  • Tap water is flowing
  • A glass tumbler is used to catch and store the water stream
  • The glass tumbler is full, and water overflows
  • The person reacts to the overflow by closing the tap, stopping the water stream

Now imagine a Software system with a heavy load of incoming data. Let us imagine we are developing a Data Analytics platform with real-time data.

First, we are going to need a service to receive and transform the tracking data from the User systems, so that the Data Analytics people can use the data for their Analytics work.

Our simple system is going to look like this:

  • A microservice to receive the tracking data
  • A simple in-memory queue for adding the tracking data
  • Another consumer service that picks up the data, aggregates and transforms it, and then inserts it into the database
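The three parts above can be sketched with JDK classes only. This is a minimal illustration, not the article's actual code: the `TrackingData` record, the queue capacity, and the one-insert-per-message consumer are assumptions made for the sketch.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// A minimal sketch of the naive pipeline: a bounded in-memory queue and a
// consumer that performs one "insert" per message.
public class NaivePipeline {
    record TrackingData(String page, long timestamp) {}

    // Runs the pipeline for a fixed number of clicks and returns how many
    // events the consumer persisted.
    static int run(int clicks) {
        try {
            BlockingQueue<TrackingData> queue = new ArrayBlockingQueue<>(100);
            List<TrackingData> database = new CopyOnWriteArrayList<>(); // stand-in for a real DB

            Thread consumer = new Thread(() -> {
                try {
                    while (true) {
                        TrackingData event = queue.take(); // blocks until data arrives
                        database.add(event);               // one "insert" per click
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();    // shutdown signal
                }
            });
            consumer.start();

            // The receiving microservice pushes one message per click.
            for (int i = 0; i < clicks; i++) {
                queue.put(new TrackingData("/product/" + i, System.currentTimeMillis()));
            }

            while (!queue.isEmpty()) Thread.sleep(10); // wait for the consumer to drain
            consumer.interrupt();
            consumer.join();
            return database.size();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("persisted=" + run(5));
    }
}
```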

Let us take an e-commerce website for the end-user system. For simplicity’s sake, let us assume we are tracking only one website and our client-side script does not aggregate user clicks. We are going to receive one incoming message for every click on the e-commerce website.

Everything is going well. The website is doing moderately well, so we are receiving a decent amount of data and our systems are doing great.

Now assume the e-commerce website owner decides on a massive marketing push, and is going to run Black Friday deals as well. He expects millions of user clicks per minute.

Let us think about how our services could be impacted:

  • Our consumer service, which performs one database insert per click, could massively underperform
  • The queue size could hit its limit

How to prepare our systems to manage the increased load?

We bring in the mighty Kafka for the latter. As long as the size per message is within a few kilobytes, Kafka is the king.

How about our consumer service? It is going to need to perform really well.

What happens when we have more data than we can handle? Do we drop them? Do we buffer them? Do we handle them differently?

With our current consumer service, there is no way to decide any of the above. It simply has to keep up.

Let us first think about how we can improve the consumer service. We are still working with only one client (our e-commerce website).

  • Make the service horizontally scalable: add several instances to do the processing in parallel. We can achieve this by partitioning by key with Kafka — for example, one service instance per product area of the website.
  • Instead of making one DB call per message, make bulk inserts.
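The bulk-insert idea can be sketched as follows. The `bulkInsert` method here is a stand-in for a real bulk write (with JDBC, that would be `addBatch`/`executeBatch`); the batch size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// A hypothetical batching consumer: instead of one DB call per message,
// drain up to BATCH_SIZE messages and persist them in a single bulk insert.
public class BatchingConsumer {
    static final int BATCH_SIZE = 500;

    // Stand-in for a real bulk insert; returns the number of rows written.
    static int bulkInsert(List<String> batch) {
        return batch.size();
    }

    // Drains the queue in batches; returns {total rows written, DB round trips}.
    static int[] drainAndPersist(BlockingQueue<String> queue) {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        int written = 0, roundTrips = 0;
        while (queue.drainTo(batch, BATCH_SIZE) > 0) { // pull up to BATCH_SIZE at once
            written += bulkInsert(batch);              // one DB round trip per batch
            roundTrips++;
            batch.clear();
        }
        return new int[] {written, roundTrips};
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1200; i++) queue.add("click-" + i);
        int[] result = drainAndPersist(queue);
        // 1200 messages need only 3 round trips instead of 1200
        System.out.println("rows=" + result[0] + " roundTrips=" + result[1]);
    }
}
```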

We can also improve our client-side script to aggregate user clicks (time-based or count-based) and send them in batches instead of sending every click.
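The client script itself would be JavaScript, but the count-based batching idea can be sketched in Java. All names here (`ClickAggregator`, the batch size, the `sender` callback standing in for the HTTP call) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Count-based click aggregation: instead of sending every click, flush a
// batch once it reaches maxBatch (a timer could trigger the time-based variant).
public class ClickAggregator {
    private final int maxBatch;
    private final Consumer<List<String>> sender; // stand-in for the HTTP call
    private final List<String> pending = new ArrayList<>();

    ClickAggregator(int maxBatch, Consumer<List<String>> sender) {
        this.maxBatch = maxBatch;
        this.sender = sender;
    }

    void onClick(String event) {
        pending.add(event);
        if (pending.size() >= maxBatch) flush(); // count-based trigger
    }

    void flush() { // would also be called from a timer in the time-based variant
        if (!pending.isEmpty()) {
            sender.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        ClickAggregator agg = new ClickAggregator(3,
                batch -> System.out.println("sending " + batch.size() + " clicks"));
        for (int i = 0; i < 7; i++) agg.onClick("click-" + i);
        agg.flush(); // send the remainder
    }
}
```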

Even after doing the above (many more improvements are possible, but they are not our focus here), what if we still get bursts of incoming data that the service cannot handle?

This is when the reactive model comes in.

With an event-driven system like ours, even after optimizing our service, sometimes we get more than what we can handle.

If the consumer can communicate that it is overwhelmed, the producer can adjust its data flow — just like our tap-water scenario.

Let us take Project Reactor for our case study. Project Reactor is one of the best-documented frameworks out there.

Our system would now look like this:

In our example, let us imagine we have written a class that consumes from the Kafka topic and creates a data flow out of it; we can call this class TrackingEventPublisher.

Below is a simple test class that simulates our data-flow pipeline. I have created two simple data classes, TrackingData and TransformedData, to showcase how the data transformation happens in this pipeline.

Project Reactor provides Flux to produce a sequence of items. For learning purposes, I am just creating a sequence of two items from a simple list.

Simple reactive processing pipeline
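A pipeline along those lines might look like the sketch below. This is a hedged reconstruction, not the article's actual code: the fields of `TrackingData` and `TransformedData`, the enrichment step, and the buffer size are assumptions, and `reactor-core` must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import reactor.core.publisher.BufferOverflowStrategy;
import reactor.core.publisher.Flux;

// Sketch of the reactive pipeline: publisher -> transformation -> subscriber,
// with the subscriber acting as the persistence layer.
public class TrackingPipelineTest {
    record TrackingData(String page, long clicks) {}
    record TransformedData(String page, long clicks, String region) {}

    static List<TransformedData> run() {
        List<TransformedData> persisted = new ArrayList<>(); // stand-in for the DB

        // A sequence of two items from a simple list, as in the article.
        Flux.fromIterable(List.of(
                new TrackingData("/home", 10),
                new TrackingData("/checkout", 3)))
            // Buffer up to 1000 items under backpressure, drop anything beyond.
            .onBackpressureBuffer(1000, dropped -> {}, BufferOverflowStrategy.DROP_LATEST)
            // Transformation step (the "EU" enrichment is made up for the sketch).
            .map(t -> new TransformedData(t.page(), t.clicks(), "EU"))
            // The subscriber is the persistence layer in our split design.
            .subscribe(persisted::add);

        return persisted;
    }

    public static void main(String[] args) {
        run().forEach(td -> System.out.println("persisting " + td));
    }
}
```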

In the above code, we split the data persistence layer into a subscriber, since we expect DB persistence to be the bottleneck in our case.

On the publisher side, data is received, transformed, and pushed to the subscriber. In our case there are only a couple of transformation steps, but in real-world pipelines there would be multiple DB calls or REST calls in the data flow. We would need to find the bottleneck and manage it accordingly.

When the subscriber is slower than the publisher — i.e., when database persistence is slower than the transformation step — data accumulates on the publisher side.

With reactive programming, we can handle this and tell the publisher how to behave in this situation. This is called backpressure.

In our case, the Reactor publisher will not push data to the subscriber until the subscriber requests it. If the subscriber is busy processing and has not requested new data, the publisher is hitting the backpressure signal.

In our example, I have used the onBackpressureBuffer operator to buffer the data stream until it holds 1,000 items; anything beyond that is dropped.

For example, if the subscriber can write only 10,000 data objects into the database per second, but the publisher is trying to push 11,000, the publisher hits the backpressure signal (the subscriber is now lagging by 1,000). The publisher then buffers the data instead of pushing it downstream. This is called backpressure handling.

There are a number of strategies to handle backpressure like this.
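Some of those strategies are built into Flux as operators. The operator names below are real; which one fits depends on whether dropping tracking events is acceptable for the analytics use case (again, `reactor-core` is assumed to be on the classpath).

```java
import reactor.core.publisher.Flux;

// A few of Reactor's built-in backpressure strategies, shown side by side.
public class BackpressureStrategies {
    public static void main(String[] args) {
        Flux<Integer> source = Flux.range(1, 1_000_000);

        source.onBackpressureBuffer();   // buffer everything (unbounded; risks memory)
        source.onBackpressureDrop(d -> System.out.println("dropped " + d)); // discard overflow
        source.onBackpressureLatest();   // keep only the most recent item
        source.onBackpressureError();    // fail fast with an overflow error
    }
}
```

Each call returns a new Flux with that strategy attached, so they compose with the rest of the pipeline like any other operator.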

Backpressure handling is one advantage of reactive programming; others include non-blocking handling of data requests, and so on.

Every reactive framework provides a number of operators, as Project Reactor does for Flux, to transform the data and make the processing robust and resilient.

With reactive programming, we can transform and handle the data better, but we still need to manage the services well: make them scalable, improve the DB queries, optimize them, and so on.

Reactive programming is not a panacea for every problem we face in our software systems.

Continuous monitoring, refactoring and improvements are what make our systems healthy in the long run.
