On Serverless and Data Rigidity
(This post started as a tweet thread looking at data and why NoSQL is a better fit for serverless applications than an RDBMS, and grew into a blog post.)
One of the questions I constantly get asked is why I am so against using an RDBMS with serverless. I’ve tried to explain the idea of flows of data (see Serverless Best Practices and On Data Lakes and Data Flows), but it doesn’t quite seem to hit the mark.
The interesting thing, though, is where I get the most agreement: from experienced CTOs of sizable companies and from those who have built highly scalable systems.
If you’ve ever had to scale a system that is tightly coupled to an RDBMS, you will know the pain of it. Eventually you have to decouple, make judicious use of caches, and so on. This is because code is easy to change, while data schemas in an RDBMS are not.
Yes, I know there are ways of updating an RDBMS schema (migrations), but over time these become more complex and clumsy. If you ever need to do a major remodel of the schema, it can seriously impact your application through downtime and major code changes.
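To make that concrete, here is a minimal sketch using SQLite (the table and column names are hypothetical, purely for illustration): an additive migration is one line, but a remodel means rebuilding tables, copying data, and changing every piece of code that touched the old shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# Early migrations are easy: purely additive.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# A remodel (splitting `name` out into a profiles table) needs a rebuild:
# create the new shape, copy every row, and change every piece of code
# that queried users.name at the same time.
conn.execute("CREATE TABLE profiles (user_id INTEGER, name TEXT)")
conn.execute("INSERT INTO profiles (user_id, name) SELECT id, name FROM users")
# (In a production RDBMS you would also drop/rename columns and manage
# locks and downtime while doing this.)

row = conn.execute("SELECT name FROM profiles WHERE user_id = 1").fetchone()
print(row[0])  # -> Ada
```

The second migration is where the rigidity shows: the data move and the code change have to land together.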
I have started to call this property of how complex it is to change the data schema of an application “Data Rigidity”.
A Data Rigidity of 0 = no rigidity; a perfectly flexible application.
A Data Rigidity of 1 = total rigidity; a completely inflexible application.
Why not Data Flexibility? Well, it’s relatively easy to spot when something has become inflexible, and relatively difficult to identify when something is flexible. The goal is to identify and remove rigidity.
The issue of data rigidity arises from not assessing the impact of data change over the lifetime of an application — and make no mistake, the majority of applications I’ve ever worked with increase their data rigidity over time.
What about code, though? If you couple your code to your data structures, that causes real problems too.
A lot of people have used (and still use) an ORM (Object-Relational Mapping) + RDBMS, which tightly couples code to the database. I think this is great for prototyping, but it leaves you with both data rigidity and code inflexibility.
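A minimal sketch of that coupling (hypothetical names, no real ORM library): an ORM-style class mirrors the table one-to-one, so a schema change forces a code change and vice versa, whereas a small mapping function absorbs a schema rename in one place.

```python
from dataclasses import dataclass

# ORM-style: the class is a one-to-one mirror of the table.
# Rename the `name` column and this class, plus every caller that
# reads order.name, must change in lockstep with the migration.
@dataclass
class Order:
    id: int
    name: str
    total_cents: int

# Loosely coupled alternative: callers depend on one mapping function,
# so a schema rename is absorbed here instead of rippling everywhere.
def order_from_row(row: dict) -> Order:
    return Order(
        id=row["id"],
        # Tolerate both the old and the (hypothetical) renamed column.
        name=row.get("name") or row.get("customer_name", ""),
        total_cents=row["total_cents"],
    )

o = order_from_row({"id": 1, "customer_name": "Ada", "total_cents": 500})
print(o.name)  # -> Ada
```

This is only a sketch of the shape of the problem, not a recommendation for any particular data-access library.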
Often the ORM + RDBMS approach goes along with a monolithic approach to an application (even behind a microservice/REST API). This tightly couples code to data and leads to [code + data] rigidity.
Unfortunately, if you tightly couple your code to your data, and your data is highly rigid, you end up with a major problem. As with a lot of things, this may not be immediately obvious; time is the key factor.
So, what about NoSQL? How does that help here? Well, if you simply recreate your relational model in a NoSQL DB (of which there are many sorts), you are misusing your tech. If your aim is to reduce your data rigidity to as close to zero as possible, then swapping one technology for another without changing your approach is probably the worst way to go about it.
Each NoSQL technology exists for a specific set of use cases. So how does NoSQL decrease data rigidity? One size does not fit all, so just saying “use X technology” is unhelpful. If someone tells you to use MongoDB for everything (for example), that’s not good advice.
Data rigidity is still linked to code, though. Data that simply sits in a database untouched is doing exactly what it’s supposed to do and cannot be described as rigid; rigidity only appears when the code needs the data to change.
So, what is the link between code and data? Initially the code and the data model are approximately in sync. But code changes over time, and shifting business priorities tend to drive code updates rather than data remodels. Unless you do a full rewrite, the gap grows and data rigidity increases.
If you utilise an RDBMS your data rigidity is almost certainly high — although not always so. And if you use a NoSQL solution but still model your data as (say) a lot of related tables with a monolithic codebase, you probably have a similar data rigidity…
If you separate out into microservices, or even smaller serverless functions, then you will likely decouple your data from your codebase far more, so long as your data is separated across those elements.
Here is a completely unscientific set of graphs that illustrate what I’m trying to say (yes, they are the same graph with a different axis… that’s the point):
This decoupling of your codebase from your data is what decreases your data rigidity. If the same data is shared across multiple services/functions, then there is a data coupling, and you will see greater rigidity.
Why does this matter? Because data rigidity and code rigidity matter for feature velocity.
Spending time making your codebase more flexible to accommodate a rigid data structure takes time. It decreases feature velocity and increases the opportunity for bugs to be created.
The fact is that this rigidity only becomes a problem over time. All solutions start off relatively flexible and then get more rigid. This is the element that a lot of developers fail to take into account, mainly because they don’t need to during the initial build of a project.
So, why do I dislike RDBMS with serverless solutions? Partly it’s that you need to know how they scale, but it’s also the inherent data rigidity of the technology. It makes applications harder to develop over time.
NoSQL solutions are built for specific use cases. Understand them and use them. There’s a reason why Amazon built DynamoDB. Look it up, read about it, understand it, and then deploy it in relevant ways for your application.
Know your technologies, and build your applications using those technologies in the right way. When you build with low data rigidity, you create the conditions for higher feature velocities with more flexible applications.
Use an RDBMS for relational data, but recognise that, to retain application flexibility, other technologies are often more appropriate. An RDBMS just isn’t a default requirement any more.
So don’t create a data schema prior to writing code, as many still do. That way lies data rigidity.
If you do use an RDBMS (and you can!), at least have a very good, robust migration process, and decouple your code from the schema by avoiding ORMs.