Expedia Group’s Rhapsody Is Now Open Source
An asynchronous event processing library that builds on the Reactive Streams specification
For a significant portion of my tenure at Expedia Group™, asynchronous event processing in Java-based applications has been a personal focal point. Much of my experience with asynchronous processing, or “streaming”, has involved heterogeneous infrastructures like Apache Kafka®, RabbitMQ, and database change data capture (CDC). Before I dug into this realm of engineering challenges, there appeared to be no prevalent way of linking disparate messaging endpoints together, little consistency among the growing abundance of asynchronous code, inconsistent observability (if any) around asynchronous processes, and a general lack of confidence in the reliability of asynchronous event processing applications.
In various tech blogs internal to Vrbo™ (part of Expedia Group), I documented my journey into asynchronous event processing based on Reactive Streams. Born of this journey was an internal common Reactive Streams library used by dozens of applications throughout the Vrbo ecosystem. Daily, these applications handle tens of millions of events, ranging anywhere from <1 transaction per second (TPS) to >1000 TPS. The goal of this library has been to provide a framework for reliable, scalable, and observable asynchronous event processing, all based on an API so deliberately abstract that it inherently lends itself to long term evolution and immediate flexibility.
I am now happy to announce that this internal library has been repackaged and open sourced under Expedia Group’s Open Source Software Foundation as Rhapsody. In addition to this announcement, I’ll use this post to point out some of the highlights of Rhapsody.
Kafka, RabbitMQ, Amazon SQS, Amazon Kinesis, Apache Pulsar… and on, and on, and on. The number of asynchronous eventing platforms is ever-growing, and so too is the number of frameworks built around them. Typically, developers are tasked with writing some process against one of these infrastructures that consumes messages, applies some sort of transformation and/or enrichment, and produces the result back to the same infrastructure.
But what happens when the source and destination infrastructures are heterogeneous? Depending on the infrastructures in play, it’s hit-or-miss whether there’s an out-of-the-box solution for processing from one infrastructure to another. And if there is such a solution, it may require some significant tradeoffs.
For example, imagine that a company is transitioning its events from RabbitMQ to Kafka. You might choose to leverage Kafka Connect to forward and/or process all messages from RabbitMQ queues on to Kafka topics. Certainly, this approach would work, but it has some drawbacks:
- Only works if using Kafka
- Resulting code will (in all likelihood) be Kafka-specific
- Requires a Kafka Connect application and possibly two “hops” (RabbitMQ → Kafka, Kafka → Kafka), depending on implementation choice: if the transformation is a Kafka Connect transform, it has to be included in the Kafka Connect application; otherwise, replicated messages would be processed between Kafka topics by a Kafka Streams application
Experience has shown that solutions like this become brittle over time if (among other possibilities) the destination changes from Kafka, if the processing becomes anything different from a one-to-one transformation, or, when considering high message throughput, if the transformation requires communication with external I/O-bound resources.
Clean coding principles suggest the best way to code similar functionalities (like message consumption and production) in the face of incongruent APIs (like RabbitMQ Messages vs. Kafka Records) is via abstraction. Fortunately, the folks at Pivotal, Netflix, and Lightbend/Typesafe have been working for years on just such an abstraction, called Reactive Streams. Pivotal and Netflix offer their own Reactive Streams implementations in Project Reactor and ReactiveX, respectively.
Leveraging Reactive Streams (with Project Reactor as the composition library) allows Rhapsody developers to write the “meat” of data processing pipelines agnostic of the infrastructures in play. For example, check out this runnable sample of message processing from RabbitMQ to Kafka. The “transformation” logic in this sample, though small (String::toUpperCase), depends only on the value types being processed, and not on the source or destination of that data. In fact, a sample showing the inverse of this process is structurally identical, even though the source and destination have been switched.
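The infrastructure-agnostic shape of such a transform can be sketched with the Reactive Streams interfaces that ship in the JDK’s java.util.concurrent.Flow. This is a minimal illustration, not Rhapsody’s actual API; the class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

class AgnosticTransform {

    // The "meat" of the pipeline: depends only on the value type (String),
    // never on whether values arrived from RabbitMQ, Kafka, or anywhere else
    static final Function<String, String> TRANSFORM = String::toUpperCase;

    // Drives a list of values through a Reactive Streams publisher/subscriber
    // pair, standing in for an infrastructure-specific source and sink
    static List<String> runPipeline(List<String> source) {
        List<String> results = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;
                public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
                public void onNext(String item) { results.add(TRANSFORM.apply(item)); subscription.request(1); }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete() { done.countDown(); }
            });
            source.forEach(publisher::submit);
        } // closing the publisher signals onComplete downstream
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return results;
    }
}
```

Swapping RabbitMQ for Kafka (or vice versa) would change only how the publisher and subscriber are wired, never the transform itself.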
We’ll also note that those samples demonstrate the high level of customization and configurability available to users of Rhapsody. The sample code starts with nearly the bare minimum configuration that any application connecting to RabbitMQ and Kafka would need. The remaining code shows the declarative, fluent nature of code written against a rich, reactive API like Flux from Project Reactor.
At-least-once processing guarantee
On the surface, at-least-once processing seems like a pretty simple mandate. Indeed, most platforms I’ve seen inherently implement at-least-once processing, and in the case of Kafka, may promote exactly-once. In my experience, at-least-once is a table stakes requirement that satisfies a vast majority of event processing use cases.
With Reactive Streams, guaranteeing at-least-once processing is a devil-in-the-details challenge. Reactive Streams, and an implementation like Project Reactor, only governs the exchange of data across asynchronous boundaries, and lacks any semantics or conventions for communicating logical processing completion through those exchanges. The asynchronous boundaries, parallelism, and decoupling of data processing from error emission, all inherent to the definition of Reactive Streams, obscure the answer to the following question:
Has this data been fully processed without causing an error?
In attempting to provide a platform-agnostic solution to answering this question, Rhapsody introduces the idea of acknowledgeability. The idea of being acknowledgeable is straightforward: An original data item provides callbacks to be propagated with it and its downstream transformations/aggregations, and is considered to be “in flight” until one of those callbacks is executed. Those callbacks abstract away the infrastructural specifics of what should (or should not) be done when a data item is either fully and successfully processed, or if its processing is exceptionally terminated.
To make this work, acknowledgement must be idempotent and must carry no restriction on the order in which acknowledgements are executed (i.e., no coupling between emission order and execution order). In addition, a correct acknowledgeable pipeline must guarantee that every emitted acknowledgeable is eventually acknowledged; it must not lose or “drop” data items.
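As a rough sketch of the idea (with hypothetical names; Rhapsody’s actual API differs), an acknowledgeable value can be modeled as a wrapper that carries its callback through transformations and guards it with an idempotency flag:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

// Illustrative sketch of an acknowledgeable value; not Rhapsody's actual API
class Acknowledgeable<T> {
    private final T value;
    private final Runnable onAck;
    private final AtomicBoolean done; // shared by derived items so the chain acks at most once

    Acknowledgeable(T value, Runnable onAck) {
        this(value, onAck, new AtomicBoolean(false));
    }

    private Acknowledgeable(T value, Runnable onAck, AtomicBoolean done) {
        this.value = value;
        this.onAck = onAck;
        this.done = done;
    }

    T get() { return value; }

    // Idempotent: executing acknowledgement more than once has no further effect
    void acknowledge() {
        if (done.compareAndSet(false, true)) {
            onAck.run();
        }
    }

    // Downstream transformations carry the original callback (and flag) with them
    <R> Acknowledgeable<R> map(Function<? super T, ? extends R> mapper) {
        return new Acknowledgeable<>(mapper.apply(value), onAck, done);
    }
}
```

Because the callback and flag are shared across `map`, acknowledging either the original item or any of its downstream transformations executes the callback exactly once, in whatever order processing happens to complete.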
Implementing this behavior is easier for certain infrastructures than others. With RabbitMQ, for instance, acknowledgeability is easily implemented due to the support of single ack/nack. With other infrastructures, like Kafka, we have to be careful not to commit up to a record’s offset until it and all the records that came before it (in the same topic-partition) have been acknowledged.
The following animation represents how Rhapsody applies the concept of acknowledgeability to the processing of Kafka records. Data is processed in parallel, which results in out-of-order completion of data items. In this case, Rhapsody (by default) implements an “acknowledgement queue” to ensure we don’t commit the offset of any record until that record, and every record before it, has finished processing.
In practice, acknowledgeability allows Rhapsody users to parallelize processing of records in a way that is decoupled from the physical partitioning of Kafka records, while still maintaining an at-least-once processing guarantee. This can allow for parallelization higher than that of the physical partitioning of a Kafka topic, while still offering logical in-order execution if a logical grouping of data can be applied (represented in the previous animation by color grouping).
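The queueing behavior described above can be sketched for a single topic-partition as follows. This is an illustrative model (assuming long offsets and a -1 sentinel for “nothing committable yet”), not Rhapsody’s internal implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative acknowledgement queue for one topic-partition
class AcknowledgementQueue {
    private static final class Entry {
        final long offset;
        boolean acknowledged;
        Entry(long offset) { this.offset = offset; }
    }

    private final Deque<Entry> inFlight = new ArrayDeque<>();
    private long committableOffset = -1; // highest offset safe to commit so far

    // Called as each record is emitted into the pipeline, in offset order;
    // returns the acknowledgement callback to propagate with the record
    synchronized Runnable add(long offset) {
        Entry entry = new Entry(offset);
        inFlight.addLast(entry);
        return () -> acknowledge(entry);
    }

    private synchronized void acknowledge(Entry entry) {
        entry.acknowledged = true;
        // Advance only through the contiguous acknowledged prefix; a record
        // acknowledged out of order waits until its predecessors finish
        while (!inFlight.isEmpty() && inFlight.peekFirst().acknowledged) {
            committableOffset = inFlight.removeFirst().offset;
        }
    }

    synchronized long committableOffset() { return committableOffset; }
}
```

Acknowledging offset 1 before offset 0 advances nothing; once offset 0 is acknowledged, the committable offset jumps past both. This is what lets processing parallelism exceed partition count without breaking the at-least-once guarantee.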
What is this stream doing?
Meaningful observability into distributed event-driven systems is plainly invaluable. If there isn’t at least some form of adequate logging, figuring out what any asynchronous process is doing or has done may be exceedingly difficult.
Rhapsody aims to provide visibility into infinite asynchronous Reactive Streams processes by integrating with popular APIs for logging, metrics, and tracing. These APIs include SLF4J, Micrometer, and OpenTracing.
Rhapsody’s integration with OpenTracing combined with acknowledgeability enables a whole new level of observability into what an asynchronous process is doing. Check out this runnable sample to see how message production and consumption can be traced, as well as how each stage of a pipeline is traced and timed. For the visually inclined, a run of these traces can be visualized via Haystack:
The road to open sourcing Rhapsody has been a long and challenging journey of learning. I am both excited and anxious to contribute it back to the open source community, as it shows what’s possible for those familiar with Reactive Streams, and for the applications built on it.
If you’re interested in learning more about Rhapsody, want to contribute, have recommendations, or have questions, leave a comment and check out the GitHub Repository! As of today, Rhapsody has integrations with Kafka and RabbitMQ, with plans to investigate and add integrations for Amazon Kinesis and Amazon Simple Queue Service (SQS); these would be great contributions for anyone looking to jump in!