Great article with a lot of good tips for people getting their hands dirty.
It’d be great to see a follow-up article about how the nature of your functions and operations changes as you begin to add functionality across the same slice of your entity data model.
My experience was that having your four CRUD functions pointing at the same DynamoDB table worked great until we hit more complex query requirements such as full-text search. DynamoDB excels at some query patterns but not others, so we had to not only store the data in DynamoDB (the source of truth) but also build a search index (e.g. Algolia). That gave us the right data store for search-style reads while keeping the CRUD-capable DynamoDB table for the rest of the operations, including the simpler ‘Read’ types.
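In practice that split can be as simple as a small repository that routes reads by query type. This is just a sketch: `table` and `search_index` here are hypothetical stand-ins for something like boto3’s DynamoDB Table resource and the Algolia SDK, not their real APIs.

```python
# Sketch: route key lookups to the source of truth, free-text
# search to the secondary index. Both clients are hypothetical
# stand-ins (DynamoDB-style table, Algolia-style index).

class UserRepository:
    def __init__(self, table, search_index):
        self.table = table          # source of truth (DynamoDB-style)
        self.search = search_index  # secondary search index (Algolia-style)

    def get(self, user_id):
        # Simple 'Read' by primary key: the kind of query DynamoDB handles well.
        return self.table.get_item(user_id)

    def find(self, text):
        # Free-text search: delegated to the search index.
        return self.search.query(text)
```

The write path still goes only to DynamoDB; the search index is populated out-of-band, which is where the streams discussion below comes in.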
On top of that, we also sent the same data out to Redshift for analytics, because those kinds of relational read requirements just weren’t possible within the runtime data storage model.
This could be done using DynamoDB Streams, which gave you two nice properties:
- The data goes into the DynamoDB table immediately so you know it’s consistent in your source of truth database.
- The update of your search index happens out-of-band and doesn’t slow your initial request down.
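A rough sketch of what that out-of-band consumer looks like: a Lambda-style handler replaying stream records against the search index. The `index` client is again a hypothetical stand-in for something like the Algolia SDK; the event shape follows the documented DynamoDB stream record format, assuming a stream view type that includes `NewImage` and a simple string `id` key.

```python
# Sketch of a DynamoDB Streams consumer that keeps a secondary
# search index in sync out-of-band. `index` is a hypothetical
# client with save/delete methods (e.g. wrapping the Algolia SDK).

def sync_to_search_index(event, index):
    """Replay DynamoDB stream records against a search index."""
    for record in event.get("Records", []):
        key = record["dynamodb"]["Keys"]["id"]["S"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is only present if the stream view type includes it.
            image = record["dynamodb"]["NewImage"]
            # Flatten DynamoDB attribute values ({"S": "Ada"} -> "Ada").
            doc = {k: list(v.values())[0] for k, v in image.items()}
            index.save(key, doc)
        elif record["eventName"] == "REMOVE":
            index.delete(key)
```

Because this runs after the write to the table has already succeeded, a failure here is exactly the consistency gap described in the cons below.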
The cons of that approach:
- Things are happening out-of-band so you lose your strong consistency on secondary data sources.
- You need to be careful with the service limits of DynamoDB Streams
It’s possible to outgrow the service limits of DynamoDB Streams (which we did), at which point you need to move to more expensive services such as Amazon Kinesis. Those kinds of services tend to introduce more hands-on scaling management (so much for serverless?).
This is just one aspect of CQRS to consider. The real issue became keeping the data consistent and handling all the error cases, plus the operational complexity of managing all these data stores (schema migrations etc.): making sure you update every copy of your data at the right time, and monitoring that it actually happens (it doesn’t always).
In my opinion, this is where serverless really gets tricky for the masses. Serverless is deceptively easy to get into, because evangelists appear to solve the ‘tricky’ issues for you in blog posts (environments, secrets, DynamoDB instead of an RDBMS), but they rarely dig deeper into the issues that arise once you have real business problems to solve.
TL;DR: It’s easy to use serverless to build a simple signup form, but it gets tricky when you have to manage a complex, highly scalable application. The operational complexity grows exponentially.