The Argument Against Schemaless

The lure of schemaless. Every developer feels it at one time or another. Annoyed by migrations or some other restriction imposed by an evil RDBMS, many developers, myself included, have gone looking for a solution. Schemaless looks like the answer to all the RDBMS woes. A program can put whatever data it likes (as JSON) into the database and then get it back out. What could possibly go wrong? Turns out, a lot.

Schemaless Saves Development Time

At first it feels like a schemaless database saves development time. It does in fact save time in getting off the ground, and it has a great use case in prototyping. The problems start to show over a full development cycle, when the software ends up released to production. Schemaless has traded up-front time for testing time later. Without any promises from the database about what data may come out, the client now has to tolerate data it does not expect, and more testing must be done. How else would the developer know whether an unexpected piece of data got into the database? What about a column name change?
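
To make that cost concrete, here is a minimal sketch of the kind of defensive reading code a client tends to accumulate. The documents and field names (name, full_name, email) are invented for illustration and stand in for whatever a real schemaless store might return after a few releases.

```python
import json

# Hypothetical documents as they might come back from a schemaless store.
# The field names are assumptions made up for this example.
stored = [
    '{"name": "Ada", "email": "ada@example.com"}',
    '{"full_name": "Grace", "email": null}',  # field renamed by a later release
    '{"name": 42}',                           # wrong type, email missing
]


def read_user(raw):
    """Return (name, email), tolerating every document shape seen so far."""
    doc = json.loads(raw)

    # The name may live under the old or the new key, or not be a string at all.
    name = doc.get("name", doc.get("full_name"))
    if not isinstance(name, str):
        name = None

    # The email may be missing, null, or some other type.
    email = doc.get("email")
    if not isinstance(email, str):
        email = None

    return name, email


users = [read_user(raw) for raw in stored]
```

Every branch in read_user is a case that needs its own test, and each new document shape that slips into the store adds another.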

Adds Flexibility

Another argument is that schemaless adds flexibility. Since there is no such thing as schemaless, this flexibility is an illusion. The schema may be explicit or implicit, but a schema does exist. Maybe I am old-fashioned, but I prefer my schema to live in a system that has been purpose-built to do just that: manage schemas. Which leads me to my final point.

Data Outlives Programs

Data is often the one constant across many programs. Version 10 of a program will typically still use data from version 1. Other programs will be built that share the same database. If the schema lives outside the database, every other program must discover and replicate the same rules that the original program implemented. I have seen an argument that a web service could be built and put in front of the schemaless database to enforce the rules, but isn’t that just an RDBMS, albeit a likely poorly implemented one?
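
As a rough illustration of rules living in the database, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The users table, its columns, and the deliberately crude email check are all assumptions made for the example; the point is only that any program connecting to this database gets the same rules without reimplementing them.

```python
import sqlite3

# In-memory database standing in for a shared RDBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%')
    )
    """
)


def insert_user(name, email):
    """Try an insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (name, email) VALUES (?, ?)",
                (name, email),
            )
        return True
    except sqlite3.IntegrityError:
        # The schema rejected the row, regardless of which program sent it.
        return False


accepted = insert_user("Ada", "ada@example.com")         # satisfies every rule
rejected_null = insert_user(None, "grace@example.com")   # violates NOT NULL
rejected_check = insert_user("Linus", "not-an-email")    # violates CHECK
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Only the first row is stored. A second program written years later, in another language, would hit the same constraints with no extra code.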

Schemaless may[1] allow prototypes to be built more quickly, but I think an RDBMS should still be the go-to for most projects. The best thing about schemaless is that there should be plenty of cleanup work for me to do in the future.

[1] There are so many tools that make dealing with RDBMS schemas easy and automatic that schemaless may not even excel at prototyping.