Reactive programming: principles, standards, implementation in Java

Ruslan Yaniuk
11 min read · Jan 6, 2022


Preface

In this article, I will go from the basics of the reactive paradigm up to its implementation.

What is reactive programming?

Reactive programming is a declarative programming paradigm built around the automatic propagation of data changes. Declarative here means that we don’t imperatively describe control flow or produce side effects; in the real world, however, the reactive paradigm is usually fused with object-oriented or functional programming. To enable the automatic propagation of changes, an expression is used as a dynamic data source that produces data over time. Here is an example in pseudocode:

1. sideLength = 5;
2. square = sideLength * sideLength;
3. print(square) // printed 25
4. sideLength = 10;
5. print(square) // printed 100

This behavior is possible because at line number 2 we bind “square” to the expression “sideLength * sideLength” as a dynamic data source, rather than to the constant value that the expression produces at the moment of execution. So each time we read the “square” variable, our application evaluates “sideLength * sideLength” and returns the result as the variable’s value.
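In plain Java there is no built-in binding of a variable to an expression, but the pseudocode above can be approximated with a `java.util.function.IntSupplier`: reading “square” means re-evaluating the expression. This is only an illustrative sketch of the idea, not a reactive framework:

```java
import java.util.function.IntSupplier;

public class ReactiveSquare {
    // Mutable source value.
    static int sideLength = 5;

    // "square" is bound to an expression, not to a value:
    // the expression is re-evaluated on every read.
    static final IntSupplier square = () -> sideLength * sideLength;

    public static void main(String[] args) {
        System.out.println(square.getAsInt()); // 25
        sideLength = 10;
        System.out.println(square.getAsInt()); // 100
    }
}
```

Note that this is a pull-style evaluation: nothing happens when `sideLength` changes; the new value is only observed when `square` is read.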

Data streams

A data stream is data that becomes available over time in chunks, usually in a sequential manner. For instance, during a phone call the speaker’s voice is encoded and transmitted sequentially over the transmission channel, so the recipient receives the signal byte by byte, even though technically this may be implemented with protocols, packets, and buffers on top. We can introduce data streams into reactive programming by treating an expression like “sideLength * sideLength” as a data stream: it is also a dynamic source that can produce new values over time, and we control the stream’s output by changing the “sideLength” variable.

Specifics of automatic data change propagation

To make things work in reactive programming, we need to decide how to connect the variable “square” with the data stream “sideLength * sideLength” so that the variable is always in a consistent state within the application. There are three well-known approaches:

  • Pull (polling): a consumer (the variable) queries its source (e.g. data stream) for a new value by itself with some time interval.
  • Push: as opposed to the pull approach the data source pushes available values into its consumers without any work from the consumer’s side.
  • Push-Pull: a hybrid of the preceding two approaches which provides a more sophisticated mechanism where the data source pushes a lightweight notification to the consumer letting it know that the data is available, so then the consumer can decide how and when to load this data.

The core concept of these approaches is well described by the Observer design pattern from the GoF book.
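The push approach maps directly onto the GoF Observer pattern: the subject keeps a list of observers and notifies each of them when its state changes. A minimal sketch in Java (class and method names are illustrative), applied to the running “square” example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Push-style propagation: the subject notifies every registered
// observer whenever its state changes; observers do no polling.
public class SideLengthSubject {
    private final List<IntConsumer> observers = new ArrayList<>();

    public void addObserver(IntConsumer observer) {
        observers.add(observer);
    }

    public void setSideLength(int value) {
        // Push the new value to all registered consumers.
        observers.forEach(o -> o.accept(value));
    }

    public static void main(String[] args) {
        SideLengthSubject subject = new SideLengthSubject();
        // The observer recomputes "square" on every change.
        subject.addObserver(v -> System.out.println("square = " + v * v));
        subject.setSideLength(5);  // square = 25
        subject.setSideLength(10); // square = 100
    }
}
```

Compare this with the pull-based `IntSupplier` sketch earlier: here the consumer is updated eagerly on every change, instead of re-evaluating lazily on every read.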

As reactive programming evolved, the ReactiveX project emerged to provide a standard API for reactive data processing by extending the ideas of the Observer pattern. Along with documentation for these extensions, it defines the contract for the Observable and Observer entities, so the pattern eventually looks like this:

The Observable type adds two missing semantics to the Gang of Four’s Observer pattern, to match those that are available in the Iterable type:

  1. the ability for the producer to signal to the consumer that there is no more data available (a for-each loop on an Iterable completes and returns normally in such a case; an Observable calls its observer’s onCompleted method)
  2. the ability for the producer to signal to the consumer that an error has occurred (an Iterable throws an exception if an error takes place during iteration; an Observable calls its observer’s onError method)

With these additions, ReactiveX harmonizes the Iterable and Observable types. The only difference between them is the direction in which the data flows. This is very important because now any operation you can perform on an Iterable, you can also perform on an Observable.

An Observable communicates with its observers with the following notifications:

  • OnNext — conveys an item that is emitted by the Observable to the observer
  • OnCompleted — indicates that the Observable has been completed successfully and that it will be emitting no further items
  • OnError — indicates that the Observable has terminated with a specified error condition and that it will be emitting no further items
  • OnSubscribe (optional) — indicates that the Observable is ready to accept Request notifications from the observer (see Backpressure below)

An observer communicates with its Observable using the following notifications:

  • Subscribe — indicates that the observer is ready to receive notifications from the Observable
  • Unsubscribe — indicates that the observer no longer wants to receive notifications from the Observable
  • Request (optional) — indicates that the observer wants no more than a particular number of additional OnNext notifications from the Observable (see Backpressure below)

So the simplified example of interactions between the observable and the observer could be depicted as follows
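A hand-rolled sketch of that interaction, reduced to the three notifications `onNext`, `onCompleted`, and `onError`, could look like the following. The names mirror the ReactiveX contract, but this is not the real RxJava API; it is a minimal illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// The three notifications an Observable sends to its observer.
interface Observer<T> {
    void onNext(T item);           // an item was emitted
    void onCompleted();            // no further items will be emitted
    void onError(Throwable error); // the sequence terminated with an error
}

class Observable<T> {
    private final Consumer<Observer<T>> source;

    Observable(Consumer<Observer<T>> source) {
        this.source = source;
    }

    // Emission starts on subscription ("cold" semantics, see below).
    void subscribe(Observer<T> observer) {
        try {
            source.accept(observer);
        } catch (Exception e) {
            observer.onError(e);
        }
    }
}

public class ObservableDemo {
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        Observable<Integer> numbers = new Observable<>(obs -> {
            for (int i = 1; i <= 3; i++) obs.onNext(i);
            obs.onCompleted();
        });
        numbers.subscribe(new Observer<Integer>() {
            public void onNext(Integer item) { log.add("onNext:" + item); }
            public void onCompleted()        { log.add("onCompleted"); }
            public void onError(Throwable e) { log.add("onError:" + e); }
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The `onCompleted` and `onError` calls are exactly the two semantics the quote above says ReactiveX adds on top of the GoF Observer pattern.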

ReactiveX defines two types of Observables: “hot” and “cold” Observables.

When does an Observable begin emitting its sequence of items? It depends on the Observable. A “hot” Observable may begin emitting items as soon as it is created, and so any observer who later subscribes to that Observable may start observing the sequence somewhere in the middle. A “cold” Observable, on the other hand, waits until an observer subscribes to it before it begins to emit items, and so such an observer is guaranteed to see the whole sequence from the beginning.

In some implementations of ReactiveX, there is also something called a “Connectable” Observable. Such an Observable does not begin emitting items until its Connect method is called, whether or not any observers have subscribed to it.
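The difference between cold and hot sources can be sketched without any framework. In the cold case each subscriber triggers its own emission from the start; in the hot case the source emits on its own, and a late subscriber joins mid-sequence. Class names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// "Cold": emission starts per subscription, so every
// subscriber sees the whole sequence from the beginning.
class ColdSource {
    static void subscribe(IntConsumer onNext) {
        for (int i = 1; i <= 3; i++) onNext.accept(i);
    }
}

// "Hot": the source emits on its own schedule; subscribers
// only see the items emitted after they joined.
class HotSource {
    private final List<IntConsumer> subscribers = new ArrayList<>();

    void subscribe(IntConsumer onNext) { subscribers.add(onNext); }

    void emit(int item) { subscribers.forEach(s -> s.accept(item)); }
}

public class HotColdDemo {
    public static void main(String[] args) {
        ColdSource.subscribe(i -> System.out.println("cold got " + i)); // 1, 2, 3

        HotSource hot = new HotSource();
        hot.emit(1); // nobody subscribed yet -- this item is lost
        hot.subscribe(i -> System.out.println("hot got " + i));
        hot.emit(2); // seen
        hot.emit(3); // seen
    }
}
```

A “connectable” Observable behaves like the hot source above, except that emission is deferred until an explicit `connect()` call rather than starting on creation.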

In other documents and other contexts, what we are calling an “observer” is sometimes called a “subscriber,” “watcher,” or “reactor.” This model in general is often referred to as the “reactor pattern”.

The motivation for the reactive approach

Before we move on to the standards and implementation, we need to answer the most interesting question: what benefits does the reactive approach give us? The Reactive Manifesto answers this by defining reactive systems through the following characteristics/benefits:

  • Responsive: The system responds in a timely manner if at all possible. Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service. This consistent behavior, in turn, simplifies error handling, builds end-user confidence, and encourages further interaction.
  • Resilient: The system stays responsive in the face of failure. This applies not only to highly-available, mission-critical systems — any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation, and delegation. Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component and high availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.
  • Elastic: The system stays responsive under varying workloads. Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that have no contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as Reactive, scaling algorithms by providing relevant live performance measures. They achieve elasticity in a cost-effective way on commodity hardware and software platforms.
  • Message Driven: Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensures loose coupling, isolation, and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message-passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.

Two mechanisms behind these characteristics deserve a closer look: back-pressure and non-blocking communication. Judging by a variety of comparison benchmarks, they not only solve problems more elegantly but can also make the reactive approach a money-saving one.

Back-pressure

When one component is struggling to keep up, the system as a whole needs to respond in a sensible way. It is unacceptable for the component under stress to fail catastrophically or to drop messages in an uncontrolled fashion. Since it can’t cope and it can’t fail it should communicate the fact that it is under stress to upstream components and so get them to reduce the load. This back-pressure is an important feedback mechanism that allows systems to gracefully respond to load rather than collapse under it. The back-pressure may bubble all the way up to the user, at which point responsiveness may degrade, but this mechanism will ensure that the system is resilient under load, and will provide information that may allow the system itself to apply other resources to help distribute the load, see Elasticity.

Backpressure is optional; not all ReactiveX implementations include backpressure, and in those that do, not all Observables or operators honor backpressure. An Observable may implement backpressure if it detects that its observer implements Request notifications and understands OnSubscribe notifications.

If an Observable implements backpressure and its observer employs backpressure, the Observable will not begin to emit items to the observer immediately upon subscription. Instead, it will issue an OnSubscribe notification to the observer.

At any time after it receives an OnSubscribe notification, an observer may issue a Request notification to the Observable it has subscribed to. This notification requests a particular number of items. The Observable responds to such a Request by emitting no more items to the observer than the number of items the observer requests. However the Observable may, in addition, issue an OnCompleted or OnError notification, and it may even issue such a notification before the observer requests any items at all.

An Observable that does not implement back-pressure should respond to a Request notification from an observer by issuing an OnError notification that indicates that backpressure is not supported.

Requests are cumulative. For example, if an observer issues three Request notifications to an Observable, for 3, 5, and 10 items respectively, that Observable may emit as many as 18 items to the observer, no matter when those Request notifications arrived relative to when the Observable emitted items in response.

If the Observable produces more items than the Observer requests, it is up to the Observable whether it will discard the excess items, store them to emit at a later time or use some other strategy to deal with the overflow.
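Since Java 9, this request-driven protocol is available in the standard library as `java.util.concurrent.Flow`, whose `Subscription.request(n)` plays the role of the Request notification described above. In the sketch below the subscriber paces the publisher by requesting exactly one item at a time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureDemo {
    // The subscriber controls the flow: it signals demand for
    // one item at a time via Subscription.request(1).
    public static List<Integer> consumeOneByOne() throws InterruptedException {
        List<Integer> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;
                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // initial demand: exactly one item
                }
                public void onNext(Integer item) {
                    received.add(item);
                    subscription.request(1); // demand the next item
                }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete()         { done.countDown(); }
            });
            for (int i = 1; i <= 5; i++) publisher.submit(i);
        } // close() delivers onComplete after all submitted items are consumed
        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeOneByOne());
    }
}
```

Requesting a larger `n` (or `Long.MAX_VALUE` to effectively disable back-pressure) would let the publisher run ahead of the consumer, and overflow items would accumulate in `SubmissionPublisher`’s internal buffer.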

Non-Blocking

In concurrent programming, an algorithm is considered non-blocking if threads competing for a resource do not have their execution indefinitely postponed by mutual exclusion protecting that resource. In practice, this usually manifests as an API that grants access to the resource if it is available, and otherwise returns immediately, informing the caller either that the resource is not currently available or that the operation has been initiated but not yet completed. A non-blocking API to a resource allows the caller to do other work rather than be blocked waiting for the resource to become available. This may be complemented by allowing the client of the resource to register for a notification when the resource becomes available or the operation completes.

The most trivial example is client-server communication over HTTP. Because HTTP follows a request-response model, the client must wait for a response after sending a request. In the blocking approach, the thread writes a request, waits until the response arrives, and then processes it. In the non-blocking approach, the thread writes a request, subscribes for a notification, and continues execution, so the thread can be reused, for instance, to make another HTTP request. When the response arrives, the thread is notified and can decide when to process it.
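The non-blocking style can be sketched with `CompletableFuture`, without involving a real network call. Here `fetchAsync` is a hypothetical stand-in for an HTTP request; the point is that the calling thread registers a continuation and moves on instead of waiting:

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingDemo {
    // Simulated remote call (a hypothetical stand-in for an HTTP request).
    static CompletableFuture<String> fetchAsync(String url) {
        return CompletableFuture.supplyAsync(() -> "response from " + url);
    }

    public static String demo() {
        CompletableFuture<String> pending = fetchAsync("service-a")
                // This callback runs only when the "response" arrives,
                // on a pool thread, not on the caller's thread.
                .thenApply(String::toUpperCase);

        // The calling thread is free to do other work here,
        // e.g. issue another request, instead of blocking.

        return pending.join(); // join() only for the demo; real code would stay async
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In real code the final `join()` would itself be replaced by another callback (e.g. `thenAccept`), so that no thread ever blocks on the response.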

You may be wondering how the “response: Observer” entity knows when the response data is available so that it can start emitting it. It depends on the implementation. For instance, in Java the “java.nio” package provides the Selector and Channel classes to support non-blocking IO.

But the most common approach is the event loop. A basic application incorporating this approach can be depicted as follows:
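Stripped of IO, an event loop is just a queue of events drained sequentially by one dedicated thread. The following is a deliberately minimal sketch (real event loops such as Netty’s also multiplex IO channels via a selector):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// A minimal single-threaded event loop: handlers are queued as
// events and executed sequentially by one dedicated thread.
public class EventLoop {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;
    private final Thread loop = new Thread(() -> {
        // Keep draining until shutdown is requested AND the queue is empty.
        while (running || !queue.isEmpty()) {
            try {
                Runnable event = queue.poll(10, TimeUnit.MILLISECONDS);
                if (event != null) event.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }, "event-loop");

    public EventLoop() { loop.start(); }

    // Any thread (e.g. a server thread noticing arrived data)
    // can publish an event; the loop thread processes it.
    public void publish(Runnable event) { queue.add(event); }

    public void shutdown() throws InterruptedException {
        running = false;
        loop.join();
    }

    public static List<String> demo() throws InterruptedException {
        EventLoop loop = new EventLoop();
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        // Events are processed in FIFO order by the single loop thread.
        loop.publish(() -> log.add("read request"));
        loop.publish(() -> log.add("process data"));
        loop.publish(() -> log.add("write response"));
        loop.shutdown();
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Because a single thread runs every handler, handlers must never block; blocking work has to be offloaded, which is exactly why non-blocking IO and event loops go together.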

In the Java world, one well-known example is Netty, an asynchronous network application framework that uses the event loop to organize reactive calls. Its event loop implementation allows registering non-blocking IO channels: each event/task that requires IO operations registers a channel and a handler, so that when, for example, data arrives, a server thread (not one of the event loop threads) publishes an event into the event queue, and further processing is done by the event loop. The event loop can be run by a single thread or by multiple threads; in Netty, the default number of event loop threads is twice the number of available CPU cores.

The ReactiveX API

We’ve already seen the core specification of the Observer and Observable entities, but ReactiveX also provides documentation for the following components:

  • Operators — functions for transforming and combining Observables
  • Single — a simplified version of an Observable that emits exactly one item or an error
  • Subject — a sort of bridge or proxy, available in some implementations of ReactiveX, that acts both as an Observer and as an Observable
  • Scheduler — support for running Observables and Observers concurrently; mainly used with operators

The ReactiveX project also accompanies its explanations of the components with “marble diagrams”. Here is how marble diagrams represent Observables and transformations of Observables:

Such diagrams are widespread around the Project Reactor documentation as well.

Reactive programming in Java

There are several projects that provide reactive programming support for Java applications; the most notable are RxJava (the JVM implementation of ReactiveX) and Project Reactor.

Conclusion

The reactive approach is definitely worth taking a look at if the system depends heavily on IO operations, but it may require a deep dive into the topic to build a complete understanding of its principles and implementation.
