Learn to use a NoSQL database, but not like an RDBMS

Kirk Kirkconnell
3 min read · Jul 2, 2019


It keeps happening. I keep reading posts or talking to people having problems with NoSQL databases, and so many times, they blame the tool. NoSQL databases may no longer be the new kid on the block, but many people continue to misunderstand when, why, and how to use them.

Let’s concentrate on the primary problem: data modeling with NoSQL databases, especially document, wide-column, and key-value databases. Some people still try to use them the way they use an RDBMS, or perhaps worse. They create a schema in the NoSQL database as they would a relational schema and then perform what I call a “naive migration.” They then use the database as a dumping ground for data, hoping to make sense of it later with a query language. That approach can work OK with some NoSQL databases, but it gives up the benefits of using a NoSQL database in the first place. Inevitably, they wonder why the NoSQL database does not perform or scale well, or why it gets too expensive. They complain, then perhaps return to relational databases because “NoSQL didn’t work for me.”

If you are doing these naive actions, you are most likely failing to understand why these NoSQL databases even exist, much less what they do best, the trade-offs you can make with them, the power they bring to the party, and more than likely, how best to model data for easy access. You’re using a NoSQL database like you do a relational database. Please stop.

NoSQL databases perform and scale best when your schema is designed around the application’s access patterns, how frequently those patterns are called, and the velocity at which the data is accessed. The goal should be for the answers to these access patterns to be precomputed, so that asking questions of the data (querying) is rare. This is true whether you are using key-value access, wide-column stores, or JSON documents. Can you ask questions in NoSQL databases? Sure, but that is not where most of them shine. You take a hit on performance, scalability, cost, or a mix of those. The more you try to use them as a general-purpose database, the more you get into the “jack of all trades, master of none” arena that the RDBMS has unfortunately been shoehorned into. For the best performance, scalability, and cost, asking questions of your data should be the minority of requests in OLTP-type NoSQL databases.
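To make "precomputed answers" concrete, here is a minimal sketch in Python. It is only an illustration: the plain dict stands in for a key-value store, and the key formats and the names `put_order` and `get_order_summary` are made up for this example, not taken from any particular database.

```python
# A plain dict standing in for a key-value store (hypothetical).
store = {}


def put_order(customer_id, order_id, total):
    """Write the order and update the precomputed per-customer summary."""
    store[f"ORDER#{order_id}"] = {"customer_id": customer_id, "total": total}

    # Maintain the answer to "how much has this customer ordered?" at write
    # time, so reading it later is one key lookup rather than a query.
    summary_key = f"CUSTOMER#{customer_id}#ORDER_SUMMARY"
    summary = store.get(summary_key, {"order_count": 0, "lifetime_total": 0})
    store[summary_key] = {
        "order_count": summary["order_count"] + 1,
        "lifetime_total": summary["lifetime_total"] + total,
    }


def get_order_summary(customer_id):
    """Read the precomputed answer with a single get; no scan, no query."""
    return store.get(f"CUSTOMER#{customer_id}#ORDER_SUMMARY")


put_order("c1", "o1", 20)
put_order("c1", "o2", 30)
put_order("c2", "o3", 5)
print(get_order_summary("c1"))
```

The trade-off is a little extra work on every write in exchange for a cheap, predictable read. In a real database the two writes would also need to be kept consistent, which this sketch ignores.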

I propose a seemingly simple task: the next time you think about creating a new application with a NoSQL database, or migrating from an RDBMS to a NoSQL database, first document ALL of that workload’s access patterns. What exact data is needed, when, and at what velocity? This should guide you as you create a schema, one that is perhaps more complex on the surface but lets the application assemble the ID, primary key, partition key, or whatever your database calls it, so that it is not asking questions but simply fetching the data efficiently. Once you have that, figure out which questions you still need to satisfy with a query. You need the right balance: quick, cheap data access for most things, and queries only when needed.
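One way to picture this exercise is as a small inventory of access patterns, each paired with the key the application will assemble. The Python sketch below is hypothetical throughout: the pattern names, key templates, and request rates are invented for illustration, and a dict again stands in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    key_template: str      # how the application assembles the key
    calls_per_second: int  # assumed estimate, for illustration only


# The documented access patterns for an imaginary shopping workload.
ACCESS_PATTERNS = {
    p.name: p
    for p in [
        AccessPattern("get_user_profile", "USER#{user_id}#PROFILE", 500),
        AccessPattern("get_cart", "USER#{user_id}#CART", 200),
        AccessPattern("get_order", "USER#{user_id}#ORDER#{order_id}", 50),
    ]
}


def build_key(pattern_name, **ids):
    """Assemble the key for a documented access pattern from known IDs."""
    return ACCESS_PATTERNS[pattern_name].key_template.format(**ids)


def fetch(store, pattern_name, **ids):
    """Get the item directly by its assembled key instead of querying."""
    return store.get(build_key(pattern_name, **ids))


store = {"USER#42#CART": {"items": ["book", "lamp"]}}
print(build_key("get_order", user_id="42", order_id="o7"))
print(fetch(store, "get_cart", user_id="42"))
```

Anything the application needs that does not fit one of these key templates is a candidate for the "queries only when needed" bucket.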
