Be careful when building your CQRS read model using domain events

Adrian Chlebosz
the-stepstone-group-tech-blog
7 min read · Jan 19, 2023

Microservice architecture has gained significant popularity in the past few years. Despite all its benefits, it unfortunately comes with previously unknown issues as well. One of them is the problem of gathering all the information necessary to serve the front-end part of the application. The most intuitive solution, the one that almost immediately appears in developers’ minds, is the API composition pattern. Although appealing, it’s not ideal. Especially in systems intended to be highly available and fast, calling multiple services just to serve one view might not be acceptable. We need a different way to support queries. Let’s give CQRS a try.

CQRS

The CQRS pattern, as described by Chris Richardson, is an alternative to the API composition pattern. Instead of gathering data from multiple sources at request-handling time, we can build a specialized read model that stores all the required pieces of information ahead of time. This read model takes the form of a separate database used only by the edge service and populated with, in fact, duplicated data. At a high level, the edge service subscribes to domain events published by other services and maintains its own query database, which is the only store the UI-facing queries ever touch.

However, CQRS is not a silver bullet for the problem of queries in systems built with a microservices architecture. The read model needs to be populated — that’s the easy part. On top of that, the entire solution needs to take into account changes in the read model’s schema (e.g. when one more field needs to be displayed in the UI) — that’s the difficult part.

Building the read model

Imagine two domain events, each with a straightforward structure consisting of only two fields, and suppose we need only one piece of information from each of them. The edge service is responsible for extracting that piece and appending it to the correct object in the database. It can find this object by the id present in both domain events.

Once we have consumed all the waiting domain events, the first version of the read model is ready, and we can use it for serving data to the UI. Please notice that we have just made a silent assumption: all the domain events will be accessible. That doesn’t have to be true, for reasons I will describe in the following section.
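
To make this concrete, here is a minimal sketch of such a projection. The event names (SomethingHappened, SomethingElseHappened), their fields, and the in-memory map standing in for the read database are all assumptions made for illustration; in a real service the events would arrive from a message broker and land in a proper datastore.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One row of the read model, assembled from two different domain events.
record ReadModelRow(String id, String fromFirstEvent, String fromSecondEvent) {}

class ReadModelProjector {

    // Stands in for the edge service's read database.
    private final Map<String, ReadModelRow> readModel = new ConcurrentHashMap<>();

    // Hypothetical event: { "id": ..., "x": ... }; we only need x.
    void onSomethingHappened(String id, String x) {
        readModel.compute(id, (key, row) -> row == null
            ? new ReadModelRow(id, x, null)
            : new ReadModelRow(id, x, row.fromSecondEvent()));
    }

    // Hypothetical event: { "id": ..., "y": ... }; we only need y.
    void onSomethingElseHappened(String id, String y) {
        readModel.compute(id, (key, row) -> row == null
            ? new ReadModelRow(id, null, y)
            : new ReadModelRow(id, row.fromFirstEvent(), y));
    }
}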

Possible problems with the read model

Backfilling — no data accessible

With time the product evolved, and it turned out that there was a need to serve more data to the UI. It’s a scenario that cannot be ruled out. Let’s say that “more data” means one additional field, which was added to the SomethingHappened event quite some time ago.

From the engineering point of view, the solution to this problem seems relatively easy. There’s a need to:

  1. Consume all the events of that particular type one more time.
  2. Locate the correct object in the database using the id present in the domain event.
  3. Update the stored object (all three steps are sketched after this list).
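
Assuming the events are still retained, the sketch below shows what this might look like with the Apache Kafka Java client; the topic name, group id, and applyToReadModel are made up for illustration:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReadModelBackfill {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "read-model-backfill"); // fresh group id, no stored offsets
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("something-happened")); // hypothetical topic
            while (consumer.assignment().isEmpty()) {
                consumer.poll(Duration.ofMillis(100)); // wait for partition assignment
            }
            // Step 1: rewind to the oldest event the broker still retains.
            consumer.seekToBeginning(consumer.assignment());

            while (true) {
                var records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) break; // crude end-of-backfill heuristic
                for (ConsumerRecord<String, String> record : records) {
                    // Steps 2 and 3: locate the object by id and update it.
                    applyToReadModel(record.value());
                }
            }
        }
    }

    static void applyToReadModel(String eventJson) { /* parse, find by id, update */ }
}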

Unfortunately, there’s a high risk that such an approach will turn out to be naive. That’s because at least some of the events might already have been deleted by the message broker. If we use Apache Kafka, the default retention period is 7 days. In the case of AWS Kinesis Data Streams, it’s only 24 hours. Unless we anticipated such a scenario in advance, our hands are now tied. There’s no clean and safe way out. So, how can we resolve this new problem?

  • Of course, we can fall back to API composition, but that goes hand in hand with another assumption: the service publishing events of type SomethingHappened must expose a read API for the entity we are interested in. That’s not guaranteed.
  • One last hope is that the domain events were archived in persistent storage and are accessible to services like ours. If not, then we are lost and will be forced to implement dirty workarounds.

Building and backfilling — a lot of format changes happened

Let’s say that, both for building and for backfilling our read model, all domain events are there, ready to be consumed. But we are late to the party, and in the meantime the domain event producer has introduced plenty of breaking changes to the event’s schema.

From the consumer’s perspective, there are two problems now.

  • If we want to store the information carried by the x field, then we need to somehow find this piece of data for the older events. That’s essentially the same problem as described above.
  • We will need to implement a custom deserializer (or several; see the sketch below) to make sure we can deserialize all the messages in the stream. That’s not only additional work but also a candidate for a challenging and dirty task, since the events were produced with incompatible schemas. It’s also worth noticing that a custom deserializer is not a one-shot activity: we’ll need to maintain it, and most probably extend it, because producers will keep changing the schemas they produce messages with.
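
A minimal sketch of such a version-tolerant deserializer, using Jackson and assuming just two hypothetical schema versions of SomethingHappened (an older one without the x field and a newer one with it):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Internal representation the read model is built from; x stays null for old events.
record SomethingHappened(String id, String x) {}

class VersionTolerantDeserializer {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    SomethingHappened deserialize(byte[] payload) throws java.io.IOException {
        JsonNode node = MAPPER.readTree(payload);
        String id = node.get("id").asText();
        // Older schema versions did not carry x; tolerate its absence
        // instead of failing the whole stream. Every new breaking change
        // on the producer side means another branch here.
        String x = node.hasNonNull("x") ? node.get("x").asText() : null;
        return new SomethingHappened(id, x);
    }
}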

Building and backfilling — event ordering

Let’s consider the case when you need to use multiple domain events to build your read model.

  • CompanyCreated

    {
      "id": "68d2a8d8-eea1-44ea-bbd3-1533f223b0f4",
      "taxOfficeId": "de6a9b5a-e74c-4145-bfd0-71e3e6ee7689",
      "name": "Easy Invoicing",
      "size": 100
    }

  • TaxOfficeClerkChanged

    {
      "id": "de6a9b5a-e74c-4145-bfd0-71e3e6ee7689",
      "name": "John Doe"
    }

Our invoicing system requires us to display the company dashboard consisting of the company’s name, its size, and the name of the tax office clerk we report to.

{
  "companyId": "68d2a8d8-eea1-44ea-bbd3-1533f223b0f4",
  "taxOfficeId": "de6a9b5a-e74c-4145-bfd0-71e3e6ee7689",
  "companyName": "Easy Invoicing",
  "companySize": 100,
  "taxOfficeClerkName": "John Doe"
}

In such a case, we need to make sure the CompanyCreated event has already been consumed by the time we apply the corresponding TaxOfficeClerkChanged event, yet the broker gives us no such ordering guarantee across streams. We can solve this problem by saving all TaxOfficeClerkChanged events in a separate temporary database and then looking up the name of the clerk there when we receive the CompanyCreated event. However, we need to notice that this way we significantly increase data duplication, possibly creating new problems that we’ll need to address in the future. For example, if the TaxOfficeClerkChanged event contains personally identifiable information and we saved it, then we’ll need to remember that this is another place to consider when talking about GDPR compliance.
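
A minimal sketch of that approach, with plain maps standing in for both the read database and the temporary store (the names here are assumptions; real code would use durable storage):

import java.util.HashMap;
import java.util.Map;

record CompanyDashboard(String companyId, String taxOfficeId, String companyName,
                        int companySize, String taxOfficeClerkName) {}

class CompanyDashboardProjector {

    // Stands in for the read database.
    private final Map<String, CompanyDashboard> dashboards = new HashMap<>();
    // Temporary store: latest clerk name per tax office id, kept around in case
    // TaxOfficeClerkChanged arrives before the matching CompanyCreated.
    private final Map<String, String> clerkNamesByTaxOffice = new HashMap<>();

    void onCompanyCreated(String id, String taxOfficeId, String name, int size) {
        String clerkName = clerkNamesByTaxOffice.get(taxOfficeId); // null if not seen yet
        dashboards.put(id, new CompanyDashboard(id, taxOfficeId, name, size, clerkName));
    }

    void onTaxOfficeClerkChanged(String taxOfficeId, String clerkName) {
        clerkNamesByTaxOffice.put(taxOfficeId, clerkName);
        // Also refresh dashboards that already reference this tax office.
        dashboards.replaceAll((companyId, d) -> d.taxOfficeId().equals(taxOfficeId)
            ? new CompanyDashboard(d.companyId(), d.taxOfficeId(), d.companyName(),
                                   d.companySize(), clerkName)
            : d);
    }
}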

Domain knowledge leaking

Besides purely technical issues described earlier, there’s also one, that won’t kick us immediately. Let’s consider the following scenario:

  1. A producer emits an event of type InvoicePaid. Inside the event, there’s a piece of information about the amount of money paid.
  2. We listen to this event and, based on the amount paid, we deduct money from our customer’s account in the invoicing system (this handler is sketched after the list).
  3. On the producer’s side, a change happens. The product would like to introduce a completely new way of paying: in installments. After all, the invoice is paid either way, so the InvoicePaid event should still be triggered; from the producer’s perspective there’s no need to care who paid (the bank or the customer). The difference is that now we shouldn’t deduct the full amount from the user’s account; it should rather be subtracted monthly.
  4. We didn’t know about this change, so nothing changed on our side. The effect is that on release day, when users try out the new feature, they see the new contract with the bank, but also the full amount deducted from their accounts. That’s a serious issue; we wouldn’t like our customers to doubt, even for a moment, whether we’re the right people to manage their money.
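
To make the coupling concrete, here’s a sketch of what our handler from step 2 might have looked like (all names are made up); notice that the business rule that breaks lives only in our heads, not in the event itself:

// Hypothetical consumer-side handler, written long before installments existed.
class InvoicePaidHandler {

    private final AccountService accounts;

    InvoicePaidHandler(AccountService accounts) {
        this.accounts = accounts;
    }

    void onInvoicePaid(String customerId, long amountInCents) {
        // Silent assumption: "paid" always means the customer's own money
        // left their account just now. Once the producer adds installment
        // payments, this line deducts the full amount on day one.
        accounts.deduct(customerId, amountInCents);
    }
}

interface AccountService {
    void deduct(String customerId, long amountInCents);
}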

The example above shows that even if we overcome all the previously described issues, there might still be one waiting around the corner. There is, however, a significant difference between them. During feature testing, we have a high chance of catching the previous ones, because something simply won’t work. In the case of domain knowledge leaking, we won’t catch it however hard we try, for a very simple reason: at the moment of testing, everything works fine…

Alternative

As we’ve seen, building a read model based on domain events can be a long and error-prone road. That doesn’t mean, however, that there is no way to build read models safely. Instead of using domain events, we can utilize entity events. Along with other interesting concepts, they are described in the article introducing Data Mesh Architecture. In short, entity events carry a complete snapshot of the changed entity. In a mature system, every change might trigger both domain events and entity events. The former are used to orchestrate the business process; the latter are used for analytical purposes as well as for building read models in different parts of the organization.
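
For comparison with the TaxOfficeClerkChanged domain event from earlier, an entity event for the same change (its shape is assumed here purely for illustration) would carry the whole current state of the tax office, not just the changed field:

{
  "id": "de6a9b5a-e74c-4145-bfd0-71e3e6ee7689",
  "clerkName": "John Doe",
  "name": "...",
  "address": "..."
}

With such a snapshot at hand, a consumer can rebuild or backfill its read model at any time from the latest event alone, without replaying the full history of deltas.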

Conclusion

To wrap up, it’s of course not guaranteed that you will run into all the problems described above when you start building your domain event-based read model. Some of them might appear, and some might not. However, the points I raised are not imaginary: they happened in the real world, and the key goal of this article is to make you aware of them. Thank you for reading to the very end, and happy experimenting!

Read more about the technologies we use or take an inside look at our organisation & processes. Interested in working at The Stepstone Group? Check out our careers page.
