CQRS in the Cloud with Spring Boot.
I have wanted to write this article for a long time, as I find uses for this pattern in almost every project I have worked on (although I'm not always able to use it). But first, some background.
CQS
The ancestor of CQRS is Command Query Separation, a concept first popularized by Bertrand Meyer in 1986 (so oooold). At its core, it states that a method can either:
- return something but not modify the state of the system (a query),
- or return void, null, or nothing and modify the state (a command).
It gave us side-effect-free queries, a concept best described as: “Asking a question should not change the answer”. Because such queries change nothing, they are also idempotent. CQS is not an absolute law; it is more of a suggestion to follow when you can (Martin Fowler).
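To make the two kinds of methods concrete, here is a minimal sketch in plain Java. The `Account` class and its method names are made up for illustration; they are not taken from any project mentioned in this article.

```java
// Hypothetical example class, used only to illustrate CQS.
final class Account {
    private long balanceInCents = 0;

    // Query: returns a value and leaves the state untouched.
    long balanceInCents() {
        return balanceInCents;
    }

    // Command: changes the state and returns nothing.
    void deposit(long amountInCents) {
        if (amountInCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balanceInCents += amountInCents;
    }
}

class CqsDemo {
    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(5000);
        // Asking twice gives the same answer, because the query has no side effects.
        System.out.println(account.balanceInCents());
        System.out.println(account.balanceInCents());
    }
}
```

A method like `long depositAndGetBalance(long amount)` would break the rule, because it both changes the state and answers a question.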
CQRS
CQRS (Command Query Responsibility Segregation), a concept coined by Greg Young, goes a step further: methods are replaced with objects. The code that modifies the data lives in one place and the code that reads the data lives in another. I come from the Java world, so in my case this usually results in two different projects.
That's the picture that most of us start with: the good old classical three-layer architecture. A frontend that exchanges information with the application (business layer) and a relational database that stores the data.
This is very simple, and simplicity should always be something we aim for (KISS: keep it simple, stupid, a rule for all developers, especially juniors). For a long time this picture might be what our application looks like.
As long as the code is maintained properly and with good design in mind, we can avoid a lot of the accidental complexity that would be introduced if we started to mess around. But there comes a point in time when the shape of the data from the point of view of a query is no longer the same as its shape from the point of view of an update. In other words, our screens start to look very different from our tables. As soon as someone proposes “Let's create a view …”, you know it has happened.
At that point you start creating view after view, usually splitting your business logic between Java and SQL. It happened to me. This is a good point to consider slicing your app in two: one part handling commands, the other dealing with queries, that is, the presentation of data to the user. If you do that, you implement CQRS.
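The split can be sketched roughly like this. Every name below (`PlaceOrder`, `OrderCommandHandler`, `OrderQueryService`, `OrderRow`, `OrderStore`) is hypothetical, and a `HashMap` stands in for the database so the sketch runs on its own; in a real project the two sides would usually be separate modules.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Stand-in for the database; an assumption made to keep the sketch runnable.
final class OrderStore {
    final Map<String, Long> amountInCentsByOrderId = new HashMap<>();
}

// A command is an object describing an intent to change state.
final class PlaceOrder {
    final String orderId;
    final long amountInCents;

    PlaceOrder(String orderId, long amountInCents) {
        this.orderId = orderId;
        this.amountInCents = amountInCents;
    }
}

// Command side: business rules live here and nothing is returned.
final class OrderCommandHandler {
    private final OrderStore store;

    OrderCommandHandler(OrderStore store) {
        this.store = store;
    }

    void handle(PlaceOrder command) {
        if (command.amountInCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (store.amountInCentsByOrderId.containsKey(command.orderId)) {
            throw new IllegalStateException("order already exists");
        }
        store.amountInCentsByOrderId.put(command.orderId, command.amountInCents);
    }
}

// DTO shaped like the screen, not like the table.
final class OrderRow {
    final String orderId;
    final String displayAmount;

    OrderRow(String orderId, String displayAmount) {
        this.orderId = orderId;
        this.displayAmount = displayAmount;
    }
}

// Query side: no rules and no state changes, just fetch into a DTO.
final class OrderQueryService {
    private final OrderStore store;

    OrderQueryService(OrderStore store) {
        this.store = store;
    }

    Optional<OrderRow> findById(String orderId) {
        Long cents = store.amountInCentsByOrderId.get(orderId);
        if (cents == null) {
            return Optional.empty();
        }
        String display = String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
        return Optional.of(new OrderRow(orderId, display));
    }
}

class CommandQuerySplitDemo {
    public static void main(String[] args) {
        OrderStore store = new OrderStore();
        new OrderCommandHandler(store).handle(new PlaceOrder("o-1", 1999));
        OrderRow row = new OrderQueryService(store).findById("o-1").get();
        System.out.println(row.orderId + " " + row.displayAmount);
    }
}
```

Here both sides still share one store, which is the simplest form of the pattern. Separate read and write stores are a later, optional step.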
There are several benefits of this approach:
- Things on the command side can get very complicated. You can tackle that complexity in the heart of software (shout out to all fans of Eric Evans :)) by using things like bounded contexts, aggregates, and other concepts from Domain-Driven Design.
- Queries are very simple (as an added plus, a new member of the team can start there; even if they screw up, it is no big deal). You can forget about transactions, ORM, and Hibernate sessions. You fetch an object into a DTO and push it to the screen.
- Because queries can have a much more relaxed consistency model (they can be eventually consistent), we can cache the results, so there is a performance boost. Not to mention that when you look at the traffic, queries are much more abundant. Sometimes there are 1000 queries for one command. For shops like Amazon the number is probably much higher.
- You can use different types of databases (careful with that one, use it only if it reduces complexity). Screens and queries are usually tightly coupled. You might reduce complexity by moving to some NoSQL flavor and denormalizing the data to fit the problem. Queries are usually what you need to scale, often by a significant factor. And we know that RDBMSs are not great at horizontal scaling (because of ACID :)), while things like MongoDB or Redis can be.
- It is a good fit for distributed systems. Queries run under relaxed consistency, so they can be called asynchronously with much less thinking (usually you don't have to worry about transactions, state changes, or error handling). It is also worth mentioning that MongoDB, Couchbase, Cassandra, and Redis come with reactive drivers, while MySQL and Oracle DB do not, although they are thinking about it.
- It maps well onto REST methods (there is none of the cognitive friction that occurs when commands and queries are exposed via a single SOAP endpoint). We have our GETs for the reads and POST, PUT, PATCH, and DELETE for the writes.
Watch out for:
- Eventual consistency. For some business models it can be quite a concern.
- Events. They are not strictly necessary, but it is good practice to update your relational DB within a transaction and just emit an event that signals that the read model has to change.
- A lot of accidental complexity that might not be needed. This is very hard to judge. All I can advise is to take a breath before you go the CQRS route: ask a question on Stack Overflow, take a walk around the office, and ask for advice. Speaking from experience, there are a lot of monsters in the bushes and you can't see them until they start to attack you. If your problem space is simple and your DB model maps well to the UI, don't mess around: use the CRUD approach and save your strength for something else.
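The first two points can be seen in a tiny sketch. It assumes an in-memory queue in place of a real broker and uses made-up names (`InvoicePaid`, `InvoiceWriteSide`, `InvoiceReadSide`). In a real system the write and the emit would be tied to one database transaction, which this sketch does not attempt to show.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical event emitted by the write side.
final class InvoicePaid {
    final String invoiceId;

    InvoicePaid(String invoiceId) {
        this.invoiceId = invoiceId;
    }
}

final class InvoiceWriteSide {
    // Write model: a map standing in for the relational database.
    private final Map<String, Boolean> paidByInvoiceId = new HashMap<>();
    private final Queue<InvoicePaid> outbox;

    InvoiceWriteSide(Queue<InvoicePaid> outbox) {
        this.outbox = outbox;
    }

    // Updates the write model, then signals that the read model has to change.
    void markPaid(String invoiceId) {
        paidByInvoiceId.put(invoiceId, true);
        outbox.add(new InvoicePaid(invoiceId));
    }
}

final class InvoiceReadSide {
    // Read model: already in the shape the screen wants.
    private final Map<String, String> statusByInvoiceId = new HashMap<>();

    void on(InvoicePaid event) {
        statusByInvoiceId.put(event.invoiceId, "PAID");
    }

    String statusOf(String invoiceId) {
        return statusByInvoiceId.getOrDefault(invoiceId, "UNKNOWN");
    }
}

class EventualConsistencyDemo {
    public static void main(String[] args) {
        Queue<InvoicePaid> broker = new ArrayDeque<>(); // stand-in for a message broker
        InvoiceWriteSide writeSide = new InvoiceWriteSide(broker);
        InvoiceReadSide readSide = new InvoiceReadSide();

        writeSide.markPaid("inv-1");
        // The event has not been delivered yet, so the read model is stale.
        System.out.println(readSide.statusOf("inv-1"));

        InvoicePaid event;
        while ((event = broker.poll()) != null) {
            readSide.on(event);
        }
        // After delivery the read model has caught up.
        System.out.println(readSide.statusOf("inv-1"));
    }
}
```

The gap between the two printed lines is the window of eventual consistency; whether the business can live with that window is the question to ask before going down this road.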
What the cloud has to do with CQRS
Well, nothing really. You implement it the same way you would on premises. The cloud gives you more flexibility on the infrastructure side, but other than that not much else. I mention this only because everyone wants to do microservices in the cloud these days.
This is not that complicated. Greg Young himself said that this is the dumbest pattern ever. It is simple, but it is not easy. Eventual consistency and asynchronous execution are huge complexity boosters, and we should always hesitate before playing that card.
Things to consider
OK, despite my warnings you have decided that moving your architecture from CRUD to CQRS will save the world. What are the main decisions that you have to make?
- How will we query our read model? I was on a project with a quite complex search screen. After some analysis we decided to use a relational database and a table that looked exactly like that screen. With indexes created on the appropriate columns it worked very well. For another problem (in the same project) we used Hazelcast, because we had to serve a hard-to-compute object to a particular user, so a distributed hash map was the best solution. The point of CQRS is to reduce accidental complexity, and one of the ways you can do that is to use the right tool for the job (think polyglot persistence; man, I love buzzwords).
- What can I remove from my write model? As a project grows, business data starts to mix with technicalities: a bunch of flags indicating that a record has been processed or exported show up next to the invoice amounts. This all has to go. If you can't see yourself having this clean write model in the future, don't do CQRS. At least don't lie to yourself that it will benefit the project. All it will do is add another tick mark to your CV, and more complexity to the project.
- Should I try applying DDD (Domain-Driven Design) to the command part of the system? The answer is yes. After you are done, every column in the database has to have a business meaning. And tables have to become aggregates: business entities that are understood the same way by the business person, the programmer, and the domain expert (if one exists). The database becomes a single source of truth for the organization, and the code becomes a description of how your company makes money. I will write more about DDD, if for no other reason than to solidify my own knowledge.
- Should I build a monolith or a distributed system? The answer: it depends. First, I'm not talking about microservices here, just either two modules on one machine copied a bunch of times, or two modules on two different machines copied a bunch of times. I generally like to start with two modules in one bundle; it is easier to refactor that way. In the long run it depends on scaling needs, consistency needs, cost of infrastructure, desire to make life complicated, etc. :).
- How will I synchronize my read model? The answer: it depends on the consistency needs of the problem. On some models the requirement for consistency was so strong that we updated the read model in the same transaction as the write model. In some other cases we ran a batch process at night that pulled the data for some daily reports. However, most cases (finger-in-the-air estimate: 90%) should be covered by events. Here my advice is to always use an external message broker, even in a monolith. It decouples the read and the write models more strongly than application-level events (Spring events, Go channels, etc.).
- And the last one. I will phrase it in the form of a statement: no fucking dependency between the command and the query modules (minus some extreme cases of super strong consistency). If you have an event passed from one module to another, do not, I repeat, do not create a library containing the DTO as a dependency of both modules. They will change at different speeds, and a build of one should not trigger a rebuild of the other. The ancient Go proverb says: “A little copying is better than a little dependency.” Stick to it. Serialize the object to a string in JSON format and then parse it on the other side. And while we are on the subject: one queue for one message type. No switch statements or any other conditional logic to differentiate between messages; this is a bad, bad pattern (it violates the open/closed principle).
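The last point is easier to see in code. The sketch below is only an illustration under loud assumptions: the class names are invented, the JSON is hand-rolled for a flat event with no special characters (a real project would use a JSON library), and an in-memory queue of strings stands in for a broker queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lives in the command module. The query module never sees this class.
final class OrderPlaced {
    final String orderId;
    final long amountInCents;

    OrderPlaced(String orderId, long amountInCents) {
        this.orderId = orderId;
        this.amountInCents = amountInCents;
    }

    // Hand-rolled JSON; assumes orderId contains no quotes or backslashes.
    String toJson() {
        return "{\"orderId\":\"" + orderId + "\",\"amountInCents\":" + amountInCents + "}";
    }
}

// Lives in the query module: its own copy, free to change at its own speed.
final class OrderPlacedMessage {
    private static final Pattern ORDER_ID =
            Pattern.compile("\"orderId\"\\s*:\\s*\"([^\"]*)\"");
    private static final Pattern AMOUNT =
            Pattern.compile("\"amountInCents\"\\s*:\\s*(\\d+)");

    final String orderId;
    final long amountInCents;

    private OrderPlacedMessage(String orderId, long amountInCents) {
        this.orderId = orderId;
        this.amountInCents = amountInCents;
    }

    // Reads only the fields the query side cares about and ignores the rest.
    static OrderPlacedMessage fromJson(String json) {
        Matcher id = ORDER_ID.matcher(json);
        Matcher amount = AMOUNT.matcher(json);
        if (!id.find() || !amount.find()) {
            throw new IllegalArgumentException("not an OrderPlaced message: " + json);
        }
        return new OrderPlacedMessage(id.group(1), Long.parseLong(amount.group(1)));
    }
}

class CopyNotDependencyDemo {
    public static void main(String[] args) {
        // One queue per message type: this one carries only OrderPlaced events,
        // so the consumer needs no switch statement to find out what it received.
        Queue<String> orderPlacedQueue = new ArrayDeque<>();

        orderPlacedQueue.add(new OrderPlaced("o-1", 1999).toJson());

        OrderPlacedMessage received = OrderPlacedMessage.fromJson(orderPlacedQueue.remove());
        System.out.println(received.orderId + " " + received.amountInCents);
    }
}
```

The only contract between the two modules is the JSON string on the queue. If the command side adds a field, the query side keeps working until it decides it needs that field.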
Example
I created an example to show more or less how I like to attack this problem. The example is in Java, using Spring Boot. The command side writes to an in-memory H2 database and, using ActiveMQ (also in memory), passes the events that trigger the writes to an in-memory MongoDB (this is how I like to work, as everything is on my machine). The cool part is that the command side of things is blocking and transactional, while the read side is fully non-blocking. The reactive API implemented in Spring 5 gives us this ability, and we should take advantage of it as often as we can, without over-engineering. I'm using Gradle, so just import it into your IDE and start having fun. I used Java 9, so add the “--add-modules java.xml.bind” flag if you are using 9 yourself. Take a look at the tests. I don't have the 100% test coverage that I like to have, because I wanted to finish this article before my vacation, but the tests do show how I test the reactive parts, the endpoints, and the business rules that come out of my domain.
Some Additional Reading
Blog post from the inventor himself (Greg Young):
Blog post from Martin Fowler:
Blog post from Udi Dahan: