EXPEDIA GROUP TECHNOLOGY — DATA

Fully Reactive Stream Processing with Apache Kafka and Project Reactor

What it means to write fully reactive code in asynchronous streaming processes

Sage Pierce
Expedia Group Technology


“Streams” of snow flowing down a mountainside
Picture by Harrison Candlin on Pexels

During my time at Vrbo™ (part of Expedia Group™), I’ve spent more than a few brain cycles pondering stream processing. I’ve enjoyed my journey of learning how to write reliable, observable, and performant message-driven asynchronous processes. Part of this saga even involved developing and open sourcing Expedia Group’s own reactive streaming framework, Rhapsody.

Nowadays, I enjoy any opportunity to implement or tweak reactive streaming processes. I actually prefer developing these asynchronous processes over typical request-response oriented paths, e.g. REST or gRPC. Typically, the input for stream processes is already sitting around just waiting to be consumed (e.g., on a Kafka topic) rather than needing to be generated (e.g., through an HTTP endpoint). That means an immediate turnaround between writing initial streaming code and being able to verify correctness and load handling against real data.

In my opinion, fully reactive processing techniques are nuanced to get right, but after a few iterations they become a comfortable way to implement today’s applications that are responsive, elastic, and resilient. My goal is to provide further encouragement for such techniques and to show what they look like in practice.


Starting with an example

Let’s start with an example of what I consider to be fully reactive code, and then explain what makes it fully reactive. This example comes from a project I’m currently working on within Vrbo and uses components from Rhapsody.

This example is the definition of a reactive stream that subscribes to Kafka Record values, extracts data from those values (Function<T, Optional<CacheableData>> dataExtractor), and caches the data:
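
In rough shape, the pipeline looks something like the sketch below. The factory, cache client, and id-extraction names here (kafkaValueFluxFactory, ReactiveCache, dataIdOf, GROUP_MODULUS) are illustrative placeholders rather than the exact Rhapsody or driver API; only the Reactor operators are the real thing, and the rest of this article refers back to them: the hidden boundary inside the source Flux, the explicit publishOn boundary, resubscription on error, and the groupBy “lever”.

    import java.util.Optional;
    import java.util.function.Function;

    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;

    public class CacheableDataStream<T> {

        private static final int GROUP_MODULUS = 16; // number of processing "rails" (illustrative)
        private static final int PREFETCH = 256;     // back-pressure buffer at the async boundary

        private final Flux<T> kafkaValues;            // e.g. built by a KafkaValueFluxFactory
        private final Function<T, Optional<CacheableData>> dataExtractor;
        private final ReactiveCache cache;            // non-blocking cache/database client

        public CacheableDataStream(Flux<T> kafkaValues,
                                   Function<T, Optional<CacheableData>> dataExtractor,
                                   ReactiveCache cache) {
            this.kafkaValues = kafkaValues;
            this.dataExtractor = dataExtractor;
            this.cache = cache;
        }

        public void start() {
            kafkaValues
                // Group by data id so related values stay in order on the same rail
                .groupBy(value -> Math.floorMod(dataIdOf(value).hashCode(), GROUP_MODULUS))
                .flatMap(group -> group
                    // Explicit asynchronous boundary: hand values off to bounded-elastic threads,
                    // buffering at most PREFETCH unprocessed elements (back-pressure)
                    .publishOn(Schedulers.boundedElastic(), PREFETCH)
                    // Extract cacheable data, skipping values that yield nothing
                    .concatMap(value -> Mono.justOrEmpty(dataExtractor.apply(value))
                        // Non-blocking write to the cache
                        .flatMap(cache::put)))
                // Resubscribe if the upstream terminates with an error (Rhapsody's
                // ResubscribingTransformer plays this role in the real pipeline)
                .retry()
                .subscribe();
        }

        private String dataIdOf(T value) {
            return String.valueOf(value); // placeholder id extraction
        }

        public interface ReactiveCache {
            Mono<Void> put(CacheableData data);
        }

        public interface CacheableData { }
    }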

Note that the example code delegates to other resources that are themselves fully reactive; otherwise, this example would not be fully reactive.


What is “fully reactive”?

I like to rely on the Reactive Manifesto to define what “reactive” means. This manifesto talks in terms of overall systems, but also mentions that such systems are composed of smaller components (i.e., the code we write) that are themselves reactive.

A diagram showing reactive concepts, which are described in the following sections
Reactive traits and relationships to one another

I’ll add some of my own commentary on the four core characteristics used to judge a process’s “reactiveness”: being message driven, elastic, resilient, and responsive (maintainability and extensibility are, in my opinion, natural side effects of writing the clean code that scores well on those core characteristics).

In my experience, these characteristics are subjectively measured on a spectrum; the overall reactivity of a process or system therefore also exists on a spectrum, such that any given implementation of a system may be judged as more, less, or equally reactive compared to another. To me, “fully reactive” means each of these characteristics is either maxed out or arbitrarily configurable in a way that enables efficient use of available resources (CPU, memory, network, etc.).

Message driven

One might think that being message driven is simply true or false based on whether or not a process is consuming/reacting to messages from something like a pub-sub or event bus (such as Kafka). In reality, whether or not the entirety of an async process is message driven is a bit more complicated.

Reactive systems rely on asynchronous message-passing to establish a boundary between components … [enabling] load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary… Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.

The key phrases to qualify being message driven are “asynchronous message-passing”, “back-pressure”, and “non-blocking communication”. In order for any process to be fully message driven, these aspects need to be applicable to all components of that process. Those components need to be tolerant of asynchronous boundaries. Those asynchronous boundaries need to support back-pressure (fast producer, slow consumer scenarios). Communication with external resources, like databases or APIs, should be non-blocking.
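
To make “non-blocking communication” concrete: the difference shows up directly in how an external lookup is exposed. The repository interface below is a stand-in for illustration, not any specific driver’s API (CacheableData is the type from the sketch above):

    import reactor.core.publisher.Mono;

    interface DataRepository {
        // Blocking: the calling thread is parked until the database answers
        CacheableData findByIdBlocking(String id);

        // Non-blocking: returns immediately; the result is delivered to subscribers
        // later, so no thread sits idle waiting on the network
        Mono<CacheableData> findById(String id);
    }

Only the second form composes into a back-pressured pipeline without tying up a thread per in-flight request.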

Looking back at the example code, there are both hidden and explicit asynchronous boundaries. KafkaValueFluxFactory hides one boundary between a Thread that interacts with the underlying KafkaConsumer (initializing, polling, offset committing, etc.) and a Thread that pushes polled Records through the pipeline. There’s an explicit asynchronous boundary at the publishOn (with a back-pressure buffer of size 256) between the Thread pushing/transforming Records through the pipeline and the Threads that initiate the caching process. The caching process itself has more asynchronous boundaries as it communicates with a database/cache, all in a non-blocking manner.

Elastic

Elasticity in a reactive process may be the hardest to explicitly show in code.

Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs.

Outside of automatic application scaling (e.g. via instance count, memory, or CPU, all of which are good examples of elasticity in a reactive system), elastic resourcing in a reactive process is not usually straightforward to spot. It usually comes from lazy resource allocation and efficient resource reuse. Fortunately, this type of resource usage typically comes for free when using non-blocking libraries. These libraries inherently have to use some sort of resource pooling for things like threads (synchronous protocols, like HTTP/1), connections (databases), and channels (HTTP/2, gRPC). Typically, these resources are elastic up to some configurable cap.

In our example, usage of the bounded elastic scheduler in publishOn provides elasticity: threads are reused when few groups have data to publish (low load), and more threads come into use when there is a lot of data being processed (high load). More elasticity is hidden in the caching process, which relies on non-blocking database communication. In the case of the project I am currently working on, we use Mongo’s Reactive Streams Driver, which delegates to the lower-level Async Driver under the hood and implements elastic connection/channel pooling.
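
As a concrete illustration of “elastic up to some configurable cap”, Reactor’s bounded elastic scheduler can be created with explicit limits; the cap values below are arbitrary and chosen only for illustration:

    import reactor.core.scheduler.Scheduler;
    import reactor.core.scheduler.Schedulers;

    final class CacheSchedulerConfig {
        // Threads are created lazily as load ramps up, reused while warm, evicted when
        // idle, and never exceed the configured cap; excess tasks queue up instead
        static final Scheduler CACHE_SCHEDULER = Schedulers.newBoundedElastic(
                32,       // maximum number of worker threads
                10_000,   // maximum number of tasks queued while all threads are busy
                "cache"); // thread name prefix
    }

The pipeline’s publishOn would then reference CACHE_SCHEDULER instead of the shared Schedulers.boundedElastic().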

Resilient

When do you think about errors? Do you plan for them when you’re initially writing the code, or only after they happen and cause an incident?

The system stays responsive in the face of failure

Errors should be treated with just as high a priority as data. In other words, fully reactive processes plan for and gracefully handle errors in ways that don’t get you paged at 3am. Errors will happen, but if you can make recovery from errors as automatic and graceful as handling data, the process will stay responsive while whatever condition caused the error in the first place is remedied.

The main contributor to resiliency in the sample code is the ResubscribingTransformer which handles all upstream errors (like failures to poll, commit offsets, deserialize Records, etc.) by letting the current subscription die and subsequently resubscribing.
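
In plain Reactor terms, the effect is roughly the following. This is an approximation of what ResubscribingTransformer provides, not its actual implementation:

    import java.time.Duration;

    import reactor.core.publisher.Flux;
    import reactor.util.retry.Retry;

    final class Resubscription {
        // Let a failed subscription die, then resubscribe with backoff so a transient
        // poll/commit/deserialization failure interrupts the stream only briefly
        static <T> Flux<T> withResubscription(Flux<T> upstream) {
            return upstream
                .doOnError(error -> System.err.println("Upstream failed, resubscribing: " + error))
                .retryWhen(Retry.backoff(Long.MAX_VALUE, Duration.ofSeconds(1)));
        }
    }

In the example, this kind of wrapping is what keeps the stream alive through poll, commit, and deserialization failures.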

Responsive

What’s the latency between a message being available for consumption and a process fully reacting to it? Is it well-known or unbounded? If it’s known, is it within SLOs?

The system responds in a timely manner if at all possible… Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service.

Any given input into a reactive process should produce an observable response in some known and bounded amount of time. In other words, it should be a real-time process. In an ideal world, for any given message ready to be consumed, a subscribed reactive process would have immediately available resources to handle that message. Practically, however, we know this can’t always be the case, for many reasons, including:

  1. Limited resources — Network, CPU, and memory limitations put an upper bound on how many resources are available to process messages
  2. Need for in-order processing — Many (if not most) practical message handling processes need to handle related messages in a certain order. Backups of related messages can therefore degrade responsiveness
  3. Temporary degradation — Messaging infrastructures may become temporarily unavailable; event storms happen

Ignoring temporary degradation (which should be handled via resiliency) and unavoidable in-order processing overhead, a fully reactive process should have proverbial “levers” in place to tweak apparent responsiveness. Going back to the example code, responsiveness can be tweaked via the “grouping” applied to messages (via groupBy). A higher number of groups (controlled by the grouping modulus) makes more processing rails available to handle messages, and vice versa. Meanwhile, related messages (messages with the same “data id”) are guaranteed to be processed in the same order they are received.
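
Concretely, that lever is the combination of the grouping key and the concurrency passed to flatMap: a higher modulus means more concurrent rails, while per-group ordering keeps related messages in the order they were received. A small sketch, with the same placeholder names as before:

    import java.util.function.Function;

    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    final class GroupingLever {
        // Same data id => same rail (ordering preserved); higher modulus => more rails
        static <T> Flux<Void> processInGroups(Flux<T> values,
                                              Function<T, String> dataId,
                                              Function<T, Mono<Void>> handle,
                                              int groupModulus) {
            return values
                .groupBy(value -> Math.floorMod(dataId.apply(value).hashCode(), groupModulus))
                .flatMap(group -> group.concatMap(handle), groupModulus);
        }
    }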


Benefits of fully reactive code

So reactive code and reactive systems exhibit the above characteristics, but what’s in it for you? In my opinion, quite a bit:

  • Super scalable — With no CPU cycles spent blocked waiting on I/O, much more “stuff” can be accomplished per instance of an application
  • Interchangeable — Especially if you’re using the Reactive Streams API, reactive components can be interchanged among various infrastructures (Kafka, RabbitMQ, AWS Kinesis, etc.); these components can be applied to reactive request-response based processes (e.g. gRPC) just as well as to streaming processes
  • Get paged less — With resiliency built in and errors treated with the same priority as data, reactive code is more likely to gracefully handle failure scenarios and/or provide clear messaging about the errors encountered

Conclusion

Reactive programming is becoming more and more prevalent, and I believe that’s a good thing for performance and efficiency. I hope this article provided you some insight into what fully reactive stream processing looks like, as well as some encouragement to integrate reactive characteristics into your projects.

Learn more about technology at Expedia Group

Sage Pierce
Expedia Group Technology

Software Engineer with a penchant for refactoring all of the things