Are Autonomous Services and CQRS More Expensive?

John Gilbert
Sep 20, 2019


A common misconception I encounter regarding autonomous services and the event-first approach is that they are more expensive than a traditional microservices approach. It is easy to understand why this might seem so. After all, each autonomous service maintains its own copy of the data it needs from upstream services. This redundancy would seem more expensive than maintaining a single source of truth for each domain. But in reality, the traditional relational approach contains far more data redundancy than meets the eye, and far less flexibility.

  • First, a database must maintain a transaction log of all mutations
  • Next, the tables (the part we all think of) contain the latest state of the data
  • Then, every index is another copy of the data
  • On top of that are materialized views, which are yet another copy of the data, plus any indexes they require
  • Storage must be allocated ahead of time with enough headroom to support growth, which results in costly underutilization and a risk of shortages
  • We also need to run multiple write replicas of the database across availability zones
  • Plus we need some number of read replicas to support the current volume of consumers, and the number of consumers will continue to grow
  • Eventually the consumers resort to maintaining their own in-memory caches when the latency of the centralized data proves too high
  • And finally, all of this should be duplicated in multiple regions, which are often passive, adding yet more underutilization cost

All this traditional redundancy adds up quickly! It adds up even faster if every team maintains its own database cluster.

But the operational complexity also adds up quickly, which tends to drive teams towards a shared resources model. This ultimately results in a massive database cluster that is shared across microservices, each with their own schema instead of their own database. Every new microservice increases the competition for these scarce resources, driving performance down and costs up as additional capacity is added.

Worse yet, the risk of a costly catastrophic failure bringing down the entire system goes through the roof, because there are no bulkheads between the services. And ultimately the velocity of each team is greatly reduced due to the increased cross-team collaboration that is needed for every single database change. Sigh…

In contrast, the lean and transparent data redundancy of autonomous services is actually more economical than the bulky and opaque data redundancy of the traditional approach. Our objective is to turn the database inside out and ultimately turn the cloud into the database, such that we spread the load out across seemingly limitless cloud resources. Here is how this unfolds:

  • Each autonomous service produces events as its state changes
  • The event streams and data lake act as the system-wide transaction log
  • Downstream autonomous services consume the events of interest and store the necessary data in tables that are tuned to their specific needs
  • The tables are effectively indexes and materialized views all in one
  • Less data is stored, because only the fields of interest are retained
  • Read capacity is allocated and owned by each service, so there is no competition with other services for resources
  • The tables act as a cache that is always up to date with the latest events
  • The cloud-native databases are fully managed, with high availability across zones out of the box, at no additional charge or effort
  • Storage grows dynamically and you pay only for what is used, so there is no underutilization
  • TTLs can proactively expire old data
  • And multi-regional, multi-master, active-active support is often a simple configuration change, so again there is no underutilization of passive regional storage
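The flow above can be sketched with plain in-memory stand-ins. Everything here is hypothetical for illustration (the stream array, the `order-placed` event shape, and the `ordersByCustomer` table would in practice be something like a Kinesis stream and a DynamoDB table), but it shows the essentials: an upstream service publishes events as its state changes, and a downstream service materializes only the fields it cares about, with a TTL attribute for expiry.

```typescript
// Hypothetical event shape; real systems would carry more envelope metadata.
interface DomainEvent {
  type: string;
  timestamp: number;
  payload: Record<string, unknown>;
}

// The event stream acts as the system-wide transaction log (in-memory stand-in).
const stream: DomainEvent[] = [];

// Upstream service: publishes an event whenever its state changes.
function publishOrderPlaced(
  orderId: string, customer: string, total: number, internalNotes: string,
): void {
  stream.push({
    type: "order-placed",
    timestamp: Date.now(),
    payload: { orderId, customer, total, internalNotes },
  });
}

// Downstream service's lean materialized view: index and view in one,
// tuned to its one access path (orders by customer).
interface CustomerOrderView {
  orderId: string;
  total: number;
  ttl: number; // expiry epoch millis, like a DynamoDB TTL attribute
}

const ordersByCustomer = new Map<string, CustomerOrderView[]>();

// Downstream service: consumes events of interest and retains only
// the fields it needs; internalNotes is never copied downstream.
function project(event: DomainEvent, ttlMs: number): void {
  if (event.type !== "order-placed") return; // ignore other event types
  const { orderId, customer, total } = event.payload as {
    orderId: string; customer: string; total: number;
  };
  const views = ordersByCustomer.get(customer) ?? [];
  views.push({ orderId, total, ttl: event.timestamp + ttlMs });
  ordersByCustomer.set(customer, views);
}

// Replay the stream to build (or rebuild) the materialized view.
publishOrderPlaced("o-1", "acme", 100, "gift wrap");
stream.forEach((e) => project(e, 30 * 24 * 3600 * 1000));
```

Because the view is derived entirely from the stream, it can be dropped and rebuilt by replaying events, which is what makes this cache "always up to date" rather than something that must be invalidated.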

Most importantly, and the whole point of autonomous services and the event-first approach, the risk of a system-wide failure is significantly reduced, because there are no shared resources, thanks to the bulkheads provided by the asynchronous event streams and these redundant data caches.

This is also referred to as lean data, because each service replicates exactly what it needs, instead of the database-wide replication that is generally required by traditional approaches. Lean data also enables self-sufficient, full-stack teams to be lean and iterate rapidly, because they are not competing with other teams to deploy their database changes.

Ultimately, the lean data approach of autonomous services reduces cost, reduces risk, and increases flexibility.

For more thoughts on serverless and cloud-native, check out the other posts in this series and my books: Software Architecture Patterns for Serverless Systems, Cloud Native Development Patterns and Best Practices, and JavaScript Cloud Native Development Cookbook.


John Gilbert

Author, CTO, Full-Stack Cloud-Native Architect, Serverless-First Advocate