Introduction to CQRS

The essential concepts that every developer should know

Amanda Bennett
Microservice Geeks

--

A Simple Definition

The Command and Query Responsibility Segregation (CQRS) pattern separates read and write operations for a data store. Reads and writes may take entirely different paths through the application and may be applied to different data stores. CQRS typically relies on asynchronous replication to progressively apply writes to the read view, so that changes to the application state instigated by the writer are eventually observed by the reader.
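As a minimal sketch of this separation (all names here are hypothetical, and plain dicts stand in for the two data stores), the read and write paths might look like this:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stores: shapes and names are illustrative, not prescriptive.
write_store: dict[str, dict] = {}   # write side (system of record)
read_store: dict[str, dict] = {}    # read side (what clients query)

@dataclass
class UpdateLocation:
    """A command: expresses the intent to change application state."""
    driver_id: str
    lat: float
    lon: float

def handle_command(cmd: UpdateLocation) -> None:
    """Write path: commands are applied to the write-side store only."""
    write_store[cmd.driver_id] = {"lat": cmd.lat, "lon": cmd.lon}

def query_location(driver_id: str) -> dict | None:
    """Read path: queries are answered from the read-side store only."""
    return read_store.get(driver_id)

def replicate() -> None:
    """Stand-in for the asynchronous replication that syncs the sides."""
    read_store.update(write_store)

handle_command(UpdateLocation("d1", -33.86, 151.21))
print(query_location("d1"))   # None until replication runs
replicate()
print(query_location("d1"))   # {'lat': -33.86, 'lon': 151.21}
```

Note how the reader observes the write only after replication runs: this is the eventual observation the definition describes.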

A Brief History

For many decades, systems were built around the notion of a monolithic database, or several very large databases, in the best case. In those days, the load profile was assumed to be a constant, with some growth dialled in, and systems were typically designed with ample headroom in mind. Database Administrators talked big and walked tall. Growth, when it happened, was summarily dealt with through the addition of resources — faster processors, more memory, speedier disks, wider I/O buses, expanded network bandwidth, etc. These systems weren’t designed to scale in quite the same way that we are accustomed to seeing now — vertical scalability was the norm. And in all fairness, Moore’s law made this design philosophy tractable.

With the dawn of the age of information, data-centric systems evolved to accommodate elastic load profiles and operational uncertainty (read: proverbial organic matter hitting the fan). Throwing more hardware at the problem was no longer considered a sustainable design philosophy. Moreover, it was ridiculed.

Old traditions fell to new thinking. Microservices and Event-Driven Architecture (EDA) started playing a key role in partitioning the problem space and decoupling systems—leading to improved operational agility, resilience and scalability. A relatively new kid on the block — CQRS — played well with both of these paradigms, providing a more concrete pattern for partitioning the problem space — in the read vs write plane.

How CQRS Works

Consider, for a moment, a run-of-the-mill rideshare service. Traditionally, the client apps would query the main database for drivers and their locations. At the same time, drivers would send commands to the main database to update their locations in real-time. Queries would cross-reference driver and user locations and return the results to the client apps and drivers accordingly. These sorts of queries can put an enormous strain on the database. Worse, they must execute in near real-time; otherwise, the user experience degrades noticeably.

In a traditional, N-Tier application model, queries and commands are handled in much the same way. By and large, both reads and writes traverse the same service logic (via controllers, models, etc.) and end up poking the same database, as illustrated below.

Traditional (N-Tier) application model

This makes the database a giant I/O bottleneck. This is particularly problematic when dealing with relational databases, as the latter are not generally known for their phenomenal scalability. Credit where it’s due: relational databases are excellent for maintaining data integrity, isolating transactions and applying updates atomically, but they are inherently limited in other ways. They are not “growth friendly”, to put it mildly.
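As a hypothetical Python sketch of the traditional single-path model (a dict stands in for the shared database, a class for the service layer), note how both operations end up poking the same structure:

```python
# A hypothetical single-path (N-Tier) service: reads and writes share one
# service class and one table, standing in for the shared database.
drivers_table = {}

class DriverService:
    def update_location(self, driver_id, lat, lon):
        # Write path: commands mutate the shared table...
        drivers_table[driver_id] = (lat, lon)

    def nearby_drivers(self, lat, lon, radius=1.0):
        # ...and the read path scans the very same table, contending
        # with the writes for the same I/O.
        return [d for d, (dlat, dlon) in drivers_table.items()
                if abs(dlat - lat) <= radius and abs(dlon - lon) <= radius]

svc = DriverService()
svc.update_location("d1", -33.86, 151.21)
print(svc.nearby_drivers(-33.9, 151.2))   # ['d1']
```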

CQRS can help alleviate the stress of querying an operational data store, by enforcing the segregation of queries and commands. Instead of queries and commands operating on the same set of tables, the data retrieval and manipulation routines are split, taking divergent paths through the application. This all but eliminates read-write contention, taking enormous stress off the database.

The following summarises how segregation is achieved:

  • For each update to the write-side database, the writer publishes an equivalent event onto an event streaming platform like Apache Kafka or Amazon Kinesis, or even a traditional message queue such as RabbitMQ or ActiveMQ.
  • The resulting event stream may optionally be conditioned with stream processing frameworks like Apache Flink or Apache Spark, transforming the data into a format that is optimised for subsequent querying.
  • A persistent view (the read projection) is progressively built from the stream; this is where data is queried from and displayed to the end-users.
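The steps above can be sketched with an in-process queue standing in for the streaming platform; the stores, names, and the crude one-degree location binning are all illustrative assumptions, not a prescribed schema:

```python
import queue

events = queue.Queue()   # in-process stand-in for Kafka/Kinesis/RabbitMQ
write_db = {}            # write-side database
read_view = {}           # read projection, shaped for querying

def write(driver_id, lat, lon):
    """Step 1: apply the update, then publish a corresponding event."""
    write_db[driver_id] = (lat, lon)
    events.put({"driver_id": driver_id, "lat": lat, "lon": lon})

def project():
    """Steps 2 and 3: condition the stream and build the read projection.
    Here 'conditioning' reshapes each event into a coarse location cell
    (one-degree bins, truncated) so nearby-driver queries become lookups."""
    while not events.empty():
        e = events.get()
        cell = (int(e["lat"]), int(e["lon"]))
        read_view.setdefault(cell, set()).add(e["driver_id"])

write("d1", -33.86, 151.21)
write("d2", -33.4, 151.7)
project()
print(read_view)   # both drivers land in the (-33, 151) cell
```

In a real deployment, `project()` would be a long-running consumer (or a Flink/Spark job) rather than a function called inline.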

The segregation model is illustrated below. The write path is highlighted in red. There is an asynchronous aspect to processing writes — shown in orange. The read path is marked in green.

An application model utilising command-query segregation

The simple example above uses multiple databases to separate the read and write persistence concerns; however, we could have just as easily used a single database with multiple tables, albeit to lesser effect. The use of separate databases is more common; it aids scalability and allows us to pick the optimum persistence technology for the types of data queries and manipulations that we envisage.

CQRS can be used effectively in any architecture that relies on fast-moving data with vast query volumes, be it user-generated data (transactions, social media feeds, clickstreams) or machine-generated data (metrics, logs, and so forth). CQRS allows a system to better evolve over time and prevents update commands from causing merge conflicts.

When To Use CQRS?

  1. Scenarios where the volume of read operations is significantly higher than write operations.
  2. Where one team of developers can focus on the write side and another team can focus on the read side. There is nothing limiting CQRS to a single read-side: multiple independent read projections may coexist, and therefore, multiple teams may be assigned to the task.
  3. Cases where the access patterns for writing vary significantly from those for reading. For example, transactional writes vs reporting and analytics on the read side.

Benefits of CQRS

  1. Scalability. CQRS lets us scale reads independently of writes.
  2. Security. The segregation principle can be applied to information security as well. There is no need for a reader to mutate the read-side state; therefore, the security permissions can be tightened accordingly. This helps enforce the Principle of Least Privilege (PoLP).
  3. Availability. If the write side goes down, the system will be unable to perform updates; however, users will still be able to see their data. On the other hand, if the read side goes belly-up, the system can fall back to querying the write-side database. (Although this fallback is rarely implemented in practice.)
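The availability fallback described in the last point might be sketched as follows; the exception type and the two stores are hypothetical stand-ins:

```python
read_view = {}                       # read projection (may be unavailable)
write_db = {"d1": (-33.86, 151.21)}  # write-side store of record
read_side_up = True

class ReadSideUnavailable(Exception):
    pass

def read_projection(driver_id):
    if not read_side_up:
        raise ReadSideUnavailable
    return read_view.get(driver_id)

def query(driver_id):
    """Prefer the read view; fall back to the write-side database when
    the read side is down (a fallback rarely built in practice)."""
    try:
        return read_projection(driver_id)
    except ReadSideUnavailable:
        return write_db.get(driver_id)

read_side_up = False
print(query("d1"))   # (-33.86, 151.21), served from the write side
```

The catch, of course, is that routing reads back to the write-side database reintroduces the very contention CQRS was meant to remove, which is one reason the fallback is seldom implemented.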

Challenges of CQRS

  1. Complexity. There are more moving parts with CQRS. We have a write side, a read side, and typically an event broker or message queue in the middle. We also tend to adopt multiple persistence stacks: often a relational database on the write side is complemented by a NoSQL product on the read side.
  2. Consistency. Due to the asynchrony of CQRS, reads inherently lag behind writes, meaning that read-your-own-writes consistency is forfeited.

Conclusion

CQRS partitions the internals of your application into a read and a write side, allowing you to design and optimise the two paths independently. Implementing CQRS in your application can maximise its performance, scalability and security, with some trade-offs in data consistency and architectural complexity.

Was this article useful to you? We’d love to hear your feedback! Hit us up on Twitter and tune in to this blog for more great news, insights and resources from the exciting world of microservices.

--

Amanda Bennett
Microservice Geeks

The world’s second-most boring Software Engineer who lives in a shoe with her two dogs.