CQRS: Data Sync Using Pub/Sub

Prabhakaryedavally
AirAsia MOVE Tech Blog
5 min read · Nov 2, 2022


What is CQRS?

The Command and Query Responsibility Segregation (CQRS) pattern separates reads and writes into different models, using commands to update data, and queries to read data.

When the read and write sides use separate data stores, CQRS relies on asynchronous replication to apply writes incrementally to the read view, so that state changes made by the writer are eventually observed by the reader.

  • Commands should be task-based rather than data-centric (“Book a flight ticket”, not “set OrderStatus to BookingConfirmed”).
  • Commands may be placed on a queue for asynchronous processing, rather than being processed synchronously.
  • Queries read the data from the read-side database.
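
The distinction between a task-based command and a read-side query can be sketched as follows. This is a minimal illustration in Python, not the actual AirAsia implementation: `BookFlightTicket`, `GetBookingsForUser`, `handle_command`, and `handle_query` are hypothetical names, and plain dictionaries stand in for the two databases.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for the write-side and read-side stores.
write_db: dict[str, dict] = {}
read_db: dict[str, list[dict]] = {}


@dataclass(frozen=True)
class BookFlightTicket:
    """A task-based command: it names the user's intent, not a field update."""
    order_id: str
    user_id: str
    flight_no: str


@dataclass(frozen=True)
class GetBookingsForUser:
    """A query: it returns data and never changes state."""
    user_id: str


def handle_command(cmd: BookFlightTicket) -> None:
    # Validation and business rules belong on the write side.
    if cmd.order_id in write_db:
        raise ValueError(f"order {cmd.order_id} already exists")
    write_db[cmd.order_id] = {
        "user_id": cmd.user_id,
        "flight_no": cmd.flight_no,
        "status": "BookingConfirmed",
    }


def handle_query(query: GetBookingsForUser) -> list[dict]:
    # Queries touch only the read-side store.
    return list(read_db.get(query.user_id, []))
```

Note that `handle_command` returns nothing and `handle_query` changes nothing. In this sketch nothing copies data from `write_db` to `read_db`; that is the job of the replication step.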

Why Do We Need CQRS?

CQRS provides separation of concerns and allows us to have simpler models focused on different use cases. In a complex application, this can avoid a single model that becomes unmanageable because it carries both complicated validation logic for writes and many data transfer objects (DTOs) for reads.

Each model can be optimized for either reads or writes, and the read model can map closely to the DTOs it serves. The separation of concerns also ensures that another service interacting with the query side has read-only access and cannot update data.

The other reason CQRS is often used is scalability. When the read and write sides run as separate microservices, each can be scaled independently to provide high performance. This is especially relevant when the volume of reads differs greatly from the volume of writes.

When To Use CQRS?

  1. Scenarios where the volume of read operations is significantly higher than the volume of write operations.
  2. Where one service can focus on the write side and another service can focus on the read side. Nothing limits CQRS to a single read side; multiple independent read projections may coexist, depending on the use case.
  3. Cases where the access patterns for writing vary significantly from those for reading. For example, transactional writes vs reporting and analytics on the read side.

Use Case

When a user of the AirAsia Super App opens the My Bookings page, the AirAsia client apps query the main database to fetch the user’s booking information (upcoming and past orders). At the same time, the lines of business (LOBs) send commands to the main database to update the status of the user’s orders.

The database queries cross-check the final order booking status and respond to the client apps and LOBs accordingly. Queries of this kind can put a huge load on the database. Worse, they must execute in near real time; otherwise, the user experience degrades noticeably.

The following summarises how segregation is achieved:

  • For each update to the write-side database, the writer publishes an equivalent order update event to a GCP Pub/Sub topic.
  • Pub/Sub pushes the order update events to the Unified Order History Service.
  • The Unified Order History Service validates each event, transforms the data into a format optimized for subsequent querying, and writes it to the read-side database.
  • In this way the service progressively builds a persistent view (the read projection), from which data is queried and displayed to end users.
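
The steps above can be sketched end to end in a few lines. This is a simplified, in-memory illustration rather than the production code: a `deque` stands in for the Pub/Sub topic, dictionaries stand in for the two databases, and every name (`update_order`, `project`, `drain_topic`, `my_bookings`) and the event fields are assumptions made for the example.

```python
import json
from collections import deque

# Hypothetical stand-ins: a deque plays the role of the Pub/Sub topic and
# plain dicts play the roles of the write-side and read-side databases.
topic: deque = deque()
write_db: dict[str, dict] = {}
read_db: dict[str, dict[str, dict]] = {}  # user_id -> order_id -> view row


def update_order(order_id: str, user_id: str, status: str) -> None:
    """Write side: persist the change, then publish an equivalent event."""
    write_db[order_id] = {"user_id": user_id, "status": status}
    event = {"order_id": order_id, "user_id": user_id, "status": status}
    topic.append(json.dumps(event).encode("utf-8"))


def project(message: bytes) -> None:
    """Read side: validate, transform, and upsert into the read projection."""
    event = json.loads(message)
    missing = {"order_id", "user_id", "status"} - set(event)
    if missing:
        raise ValueError(f"invalid event, missing fields: {sorted(missing)}")
    # Rows are keyed by user so that a "My Bookings" lookup is a single read.
    row = {"order_id": event["order_id"], "status": event["status"]}
    read_db.setdefault(event["user_id"], {})[event["order_id"]] = row


def drain_topic() -> None:
    """Deliver every pending message to the projector (simulates push delivery)."""
    while topic:
        project(topic.popleft())


def my_bookings(user_id: str) -> list[dict]:
    """Query path: served entirely from the read-side projection."""
    return list(read_db.get(user_id, {}).values())
```

Because `project` upserts by `order_id`, a later status event for the same order overwrites the earlier row instead of adding a second one, which keeps the projection to one row per order.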

The segregation model is shown below. The write path is highlighted in red, the asynchronous part of write processing is shown in orange, and the read path is marked in blue.

[Figure: An application model utilizing command-query segregation]

The simple example above uses multiple databases to separate the read and write persistence concerns. We could just as easily have used a single database with multiple tables, though to lesser effect.

Separate databases are the more common choice: they support scalability and allow us to pick the most suitable persistence technology for the kinds of queries and data manipulation we expect.

CQRS can be used effectively in architectures that rely on fast-moving data with very high query volumes, such as transactions, social media feeds, or streams of tweets. CQRS also allows a system to evolve more easily over time and prevents update commands from causing merge conflicts.

Advantages of CQRS

  1. Scalability. CQRS lets us scale reads independently of writes.
  2. Security. The segregation principle can be applied to information security as well. A reader never needs to mutate the read-side state, so its security permissions can be tightened accordingly.
  3. Availability. If the write side goes down, the system will be unable to perform updates, but users will still be able to see their data. Conversely, if the read side goes down, the system can fall back to querying the write-side database.
  4. Independent scaling. CQRS allows the read and write workloads to scale independently, and may result in fewer lock contentions.
  5. Optimized data schemas. The read side can use a schema that is optimized for queries, while the write side uses a schema that is optimized for updates.
  6. Security. It’s easier to ensure that only the right domain entities are performing writes on the data.
  7. Separation of concerns. Segregating the read and write sides can result in models that are more maintainable and flexible. Most of the complex business logic goes into the write model. The read model can be relatively simple.
  8. Simpler queries. By storing a materialized view in the read database, the application can avoid complex joins when querying.
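
The availability point can be illustrated with a small fallback wrapper. This is only a sketch under stated assumptions: `StoreUnavailable`, `fetch_bookings`, and the two store callables are hypothetical names, and a real service would also need timeouts and a way to detect that the read side has recovered.

```python
from typing import Callable

# A store is modelled as any callable that maps a user id to booking rows.
Store = Callable[[str], list]


class StoreUnavailable(Exception):
    """Raised by a store that cannot serve requests (hypothetical)."""


def fetch_bookings(user_id: str, read_store: Store, write_store: Store) -> list:
    """Prefer the read projection; fall back to the write-side database."""
    try:
        return read_store(user_id)
    except StoreUnavailable:
        # The fallback puts query load on the write side, so it is meant
        # as a temporary measure while the read side is down.
        return write_store(user_id)
```

The fallback trades the load isolation that CQRS normally provides for continued availability, which is why it is best treated as a degraded mode rather than a normal code path.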

Pros:

  • Separating write activity from read activity allows you to use the best database technology for each task, for example, a Cloud SQL database for writing and a NoSQL database for reading.
  • Read activity tends to be more frequent than write activity, so you can reduce response latency by placing read data sources in strategic regions.
  • Provides more efficient scaling of storage capacity based on real-world usage.

Cons:

  • Complexity. There are more moving parts with CQRS. We have a write-side, a read-side, and typically an event broker or message queue in the middle. We also tend to adopt multiple persistence stacks: often a relational database on the write side is complemented by a NoSQL product on the read side.
  • Consistency. Because the read side is updated asynchronously, reads lag behind writes, so read-your-own-writes consistency is no longer guaranteed.
  • Cost. The CQRS pattern requires more database technologies, which carries an inherent cost, either in hardware or, if a cloud provider is used, in utilization expense.
  • More points of failure. Running more databases means more things can fail, so companies need comprehensive monitoring and fail-safe mechanisms in place to operate reliably.
  • Expertise. The pattern requires expertise in a variety of database technologies.
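
The consistency trade-off is easiest to see in a toy model. The class below is a hypothetical illustration, not part of any library: `write` updates the write store immediately, while the read view only catches up when `replicate_one` runs, which stands in for asynchronous delivery through the message queue.

```python
from collections import deque
from typing import Optional


class LaggingReadModel:
    """Hypothetical write store with an asynchronously updated read view."""

    def __init__(self) -> None:
        self.write_store: dict = {}
        self.read_view: dict = {}
        self._pending: deque = deque()

    def write(self, order_id: str, status: str) -> None:
        # The write is visible on the write side at once,
        # but the read view is not touched yet.
        self.write_store[order_id] = status
        self._pending.append((order_id, status))

    def replicate_one(self) -> bool:
        """Apply the oldest pending change to the read view; False if none."""
        if not self._pending:
            return False
        order_id, status = self._pending.popleft()
        self.read_view[order_id] = status
        return True

    def read(self, order_id: str) -> Optional[str]:
        # Reads are served from the read view only.
        return self.read_view.get(order_id)
```

A caller that writes an order and reads it back straight away gets `None` from `read` until `replicate_one` has applied the change. That window is the loss of read-your-own-writes consistency described above.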

Conclusion

CQRS divides your application into a read side and a write side, allowing you to design and optimize the two paths independently. Implementing CQRS can improve your application's performance, scalability, and security, at the cost of added complexity and some trade-offs in data consistency.
