Nifty tool-chain for CQRS application development with read-model projection
Event-driven applications are steadily gaining popularity across industries, especially where large legacy systems provide the core domain of a business. Such systems often cannot be replaced, but still need to interact with modern architectures. A well-known pattern for achieving this is Command Query Responsibility Segregation (CQRS). Because this pattern adds extra complexity due to its asynchronous behavior, in a recent project here at ELCA we were looking for a good way to support local development, testing, and debugging through a nifty tool-chain.
We selected Spring Cloud Stream for the event processing, Kafka as the streaming platform, and MongoDB for storing the read models. In this post, I outline some interesting implementation details based on a condensed scenario, focusing on the read-model projection part of CQRS.
A closer look at the topology
Condensed from the real project, there is one Kafka topic that receives a change event as soon as a person record is modified in the core system; the event itself is generated through change data capture. Because the core system stores one or more addresses separately, and there is no guarantee that a person has at least one address created at the same time, we need to join the persons with their addresses, creating a root aggregate for further read-model projections. Also worth mentioning: each event is expected to contain not only the state change but also the full history of previous changes (perhaps not exactly by the book).
For those who are interested, the join method may look very similar to the one below:
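The project-specific code is not reproduced here, so the following is a hedged, framework-free sketch of the join semantics only: the domain types (`Person`, `Address`, `PersonAggregate`) and field names are illustrative assumptions, not the real Avro-generated classes. In the actual topology, the equivalent merge logic would be passed as the `ValueJoiner` of a `KTable` left join.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative domain types; the real Avro-generated classes look different.
record Person(String id, String name) {}
record Address(String personId, String street, String city) {}
record PersonAggregate(Person person, List<Address> addresses) {}

public class PersonAddressJoiner {

    // Left-join semantics: every person appears in the result,
    // with an empty address list when no address event has arrived yet.
    public static List<PersonAggregate> join(List<Person> persons, List<Address> addresses) {
        Map<String, List<Address>> addressesByPerson = addresses.stream()
                .collect(Collectors.groupingBy(Address::personId));
        return persons.stream()
                .map(p -> new PersonAggregate(p,
                        addressesByPerson.getOrDefault(p.id(), List.of())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(new Person("1", "Ada"), new Person("2", "Max"));
        List<Address> addresses = List.of(new Address("1", "Main St 1", "Lausanne"));
        join(persons, addresses).forEach(System.out::println);
    }
}
```

The left-join shape matters here: since an address may arrive after (or never for) its person, an inner join would silently drop persons without addresses from the read model.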
As mentioned, every event also contains the history, so we can optimize by configuring the involved Kafka topics as compacted (no event sourcing needed). This enables switching from KTables to KStreams in our join method, with an in-memory conversion to internal KTables, so that Spring Cloud Stream no longer has to create physical helper topics and duplicate data unnecessarily.
To boost performance through lightweight scaling, and to hide the per-worker-thread latency of message schema validation against the registry and of writing the models into MongoDB, the co-partitioning feature is used so that multiple threads process the person-address joins in parallel. Here is a short look at the Spring settings configuration:
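The original configuration is not reproduced here, so below is a sketch of what an `application.yml` for the Kafka Streams binder could look like. The function and topic names (`joinPersonsWithAddresses`, `person-events`, etc.) are assumptions; `num.stream.threads` is the Kafka Streams setting that controls how many worker threads process the (co-partitioned) topics.

```yaml
spring:
  cloud:
    function:
      definition: joinPersonsWithAddresses
    stream:
      bindings:
        joinPersonsWithAddresses-in-0:
          destination: person-events        # assumed topic name
        joinPersonsWithAddresses-in-1:
          destination: address-events       # co-partitioned with person-events
        joinPersonsWithAddresses-out-0:
          destination: person-aggregates
      kafka:
        streams:
          binder:
            configuration:
              num.stream.threads: 4         # one thread per partition, for example
```

Note that co-partitioning is a precondition, not a switch: both input topics must have the same partition count and a compatible key so that matching person and address events land on the same stream thread.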
The tool-chain
For the impatient 😃, an example declaration of the tool-chain can be found here: docker-compose.yml
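The linked file is the authoritative version; as an abridged sketch, such a compose declaration could wire up the broker, a local schema registry, and the two inspection UIs roughly like this (image names and versions are assumptions):

```yaml
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:7.5.0
    depends_on: [kafka]
    ports: ["8081:8081"]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
  kafdrop:
    image: obsidiandynamics/kafdrop
    depends_on: [kafka]
    ports: ["9000:9000"]
    environment:
      KAFKA_BROKERCONNECT: kafka:9092
  mongodb:
    image: mongo
    ports: ["27017:27017"]
  mongo-express:
    image: mongo-express
    depends_on: [mongodb]
    ports: ["8082:8081"]
    environment:
      ME_CONFIG_MONGODB_SERVER: mongodb
```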
Coming back to the illustrative architecture view from above: as the change data capture part heavily depends on a large legacy system and is considered too expensive to set up locally, a custom message producer tool has been implemented. Using the same local Avro schema registry, this tool integrates the Avro client library and allows producing new events locally, as well as flexibly testing changes to event schemas that should not yet be published to the real registry. The Avro schemas themselves are curated in a separate code repository; for local development, a separate Maven plugin is used to upload the schema files into the local registry.
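The post does not name the plugin, but one option for such uploads is Confluent's Schema Registry Maven plugin, whose `register` goal pushes `.avsc` files to a registry. Subject names, file paths, and the version below are assumptions for illustration:

```xml
<plugin>
  <groupId>io.confluent</groupId>
  <artifactId>kafka-schema-registry-maven-plugin</artifactId>
  <version>7.5.0</version>
  <configuration>
    <schemaRegistryUrls>
      <param>http://localhost:8081</param>
    </schemaRegistryUrls>
    <subjects>
      <!-- subject name -> schema file; names are hypothetical -->
      <person-events-value>src/main/avro/Person.avsc</person-events-value>
    </subjects>
  </configuration>
</plugin>
```

Running `mvn schema-registry:register` against the local registry then makes the schemas available to both the producer tool and the streaming application.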
For easy exploration of the Kafka topics without having to query them via the CLI, Kafdrop has been added to the tool-chain. This tool also provides useful insights into topic partitions, offsets, events, and much more.
To verify the read models being projected and stored in MongoDB, the Mongo Express tool is used. It offers interesting document and query statistics, along with a wide range of other features that make it very easy to analyze and debug the stored information.
Another small but useful detail concerns the creation of the initial topics in Kafka: once the local infrastructure is up and running, a simple text file containing the needed topic configurations can be used to declare these topics as compacted:
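The exact format of that file is project-specific; one plausible layout (topic names here are assumptions) is one topic per line followed by the `kafka-topics` flags to apply to it:

```text
person-events --partitions 4 --config cleanup.policy=compact
address-events --partitions 4 --config cleanup.policy=compact
person-aggregates --partitions 4 --config cleanup.policy=compact
```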
And last but not least, the Makefile specifying the helper command for topic creation:
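Again a sketch rather than the project's actual file: it assumes a `topics.txt` with one topic definition per line (name first, then flags), a compose service called `kafka`, and the `kafka-topics` CLI being available inside that container.

```makefile
KAFKA_CONTAINER = kafka
BOOTSTRAP       = localhost:9092

.PHONY: create-topics
create-topics:
	while read -r topic args; do \
		docker compose exec $(KAFKA_CONTAINER) kafka-topics \
			--bootstrap-server $(BOOTSTRAP) --create --if-not-exists \
			--topic $$topic $$args; \
	done < topics.txt
```

The `--if-not-exists` flag keeps the target idempotent, so `make create-topics` can be re-run safely after every `docker compose up`.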
Conclusion
Even though this blog post contains project-specific aspects and adaptations, it may still provide a good starting point for your own streaming application development. Regarding testing and debugging, however, unit and integration tests are often not enough: due to the complexity of asynchronous messaging, a supporting tool-chain is essential. Furthermore, such a tool-chain could also be deployed into one of the available container orchestration platforms, like OpenShift, which is widely used at ELCA.
For those looking for a working demonstrator, and who do not mind too much whether it is Java or Kotlin, please have a look at my read-model-projections example.