Why Your NoSQL Database Should Be Multimodal

Published in Couchbase · 4 min read · Mar 25, 2020

by Andrew C. Oliver

IDC recently published its list of the biggest innovators in multimodal databases. But what the heck is a multimodal database? First off, I hate the term multimodal; I prefer “general use” or “evolved.” In essence, a multimodal database is what you always assumed a database was.

In a literal sense, multimodal means more than one mode: a key-value store and a document database together, for instance. There are a lot of places where you just need lookups. In the RDBMS world these were “primary key lookups,” and they are often served from memory (aka a cache). However, most modern applications rely on JSON as a data structure, and it is rare that you don’t need to query the underlying fields, so the document database is still king. There are other combinations, but most organizations need at least these two modes.
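To make the two modes concrete, here is a minimal Python sketch of the difference. It is a toy illustration, not any vendor's API: the `store` dictionary, the `user::N` keys, the sample documents, and the `kv_get` and `doc_query` helpers are all hypothetical.

```python
import json

# Hypothetical sample data: keys and JSON documents invented for illustration.
store = {
    "user::1": json.dumps({"name": "Ada", "city": "London", "plan": "pro"}),
    "user::2": json.dumps({"name": "Lin", "city": "Durham", "plan": "free"}),
    "user::3": json.dumps({"name": "Sam", "city": "London", "plan": "free"}),
}


def kv_get(key):
    """Key-value mode: fetch an opaque value by key, with no knowledge of its fields."""
    return store.get(key)


def doc_query(field, value):
    """Document mode: parse each JSON value and filter on a field inside it."""
    matches = []
    for key, raw in store.items():
        doc = json.loads(raw)
        if doc.get(field) == value:
            matches.append(key)
    return sorted(matches)
```

`kv_get("user::2")` is the “primary key lookup” case. `doc_query("city", "London")` is the kind of question a pure key-value store cannot answer without the application parsing every value itself; a real document database would typically answer it from an index on the field rather than a scan like this one.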

For most of computing history, databases were assumed to be multimodal or — to be more precise — general use. When I started my career in the mid-90s, at Ericsson Cellular, we used Oracle (and later a combination of Oracle and SQL Server) as an operational database. We also used Oracle as a data warehouse. Specialty databases like Teradata were out there, but we didn’t need them for the amount of data we were processing at the time.

The growth of the web and so-called digital transformation changed that. We’ve now got more data than we can ever do anything meaningful with, and the number of users interacting with that data has massively increased. Additionally, the types of data we’re capturing and holding have expanded.

With the NoSQL revolution, we started paying more attention to other database types such as graph databases, key-value stores, and document databases. These stopped being mere theory and were put into production.

At first, the purveyors of these new databases went on the attack: “You don’t need joins” and “SQL considered harmful.” The promise was scale. And having watched the Slashdot effect and massive e-commerce outages every holiday, it was clear that sticking a webserver and an application server in front of a relational database wasn’t really working on the modern Internet. Moreover, object-relational mapping tools weren’t keeping up with the complexity of data, and table structures didn’t easily capture the modern structure of changing data without a lot of painful transformation.

It turns out that NoSQL databases were just immature. MongoDB and Couchbase both added ACID transaction support. Couchbase, among others, added SQL support. Couchbase even added a massively parallel processing (MPP) engine (data warehouses like Teradata have MPP support). Essentially, most of the “you don’t need” and “considered harmful” claims were just vendor marketing or enthusiastic developers trying to talk people out of their objections to immature technology.

So we started using NoSQL databases. It turns out there are many kinds of NoSQL databases. The simplest type is the key-value store. Key-value stores scale so well because they don’t do anything. Okay, that’s an exaggeration, but they merely associate a key with a value. They are basically hashmaps. Some really are just distributed hashmaps. And it turns out that data is still more complicated than that, so we started using document and graph databases, among others.
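The “distributed hashmap” idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: the `ShardedKV` class and its methods are hypothetical, each “node” is just an in-process dictionary, and replication, rebalancing, and failure handling are left out entirely.

```python
import hashlib


class ShardedKV:
    """Toy 'distributed hashmap': each node is a dict, and keys are routed by hash."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)
```

Because every operation touches exactly one node and nothing needs to be coordinated across nodes, adding nodes adds capacity almost for free. That is the sense in which key-value stores scale well by not doing much.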

Ultimately, most of your applications don’t need massive scalability. However, they do need reliability, disaster recovery, and flexibility. Few applications have tens of thousands of users and terabytes of data, but many change frequently. The old relational databases have become less general use not because they changed, but because we did. The types of applications we develop, the speed at which we develop them, how often we change them, how and when we use them, and even how often we use them have all changed.

Some of our applications do have higher scalability requirements, if not today then soon. However, having a different database for each application or, worse, for parts of each application is a lot to manage and requires a lot of diverse skill sets to build or hire. Instead, we need a database that can handle all of our needs: a “multimodal database.”

A general use or multimodal database should handle some basic common requirements like queries, joins, fast reads/writes, transactions, SQL, and at least limited analytics queries. A modern database must handle JSON structures to avoid doing a bunch of ETL in our operational system.

A general use or multimodal database also should handle full-text search. Even back in the day, we had a lot of text matching queries that didn’t justify a full dedicated search engine (or the ETL process required to enable one). However, the old SQL “like” and wildcards don’t do a good job and are really inefficient.
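To see why wildcard matching is inefficient, compare it with the inverted index that full-text search is typically built on. The Python sketch below is illustrative only: the sample `docs` and the `like_scan`, `build_index`, and `search` helpers are hypothetical, and real search engines add stemming, ranking, and much more.

```python
import re
from collections import defaultdict

# Hypothetical sample documents, invented for illustration.
docs = {
    1: "Customer reported dropped calls near the tower",
    2: "Billing dispute over roaming charges",
    3: "Dropped connection during roaming handoff",
}


def like_scan(term):
    """Roughly what LIKE '%term%' does: test every row's text for a substring."""
    needle = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if needle in text.lower())


def build_index(documents):
    """Build an inverted index: each lowercase word maps to the ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


index = build_index(docs)


def search(term):
    """Answer a word query with one dictionary lookup instead of a full scan."""
    return sorted(index.get(term.lower(), set()))
```

The scan's cost grows with the total amount of text on every query, while the index pays that cost once at write time and then answers word queries with a single lookup. The trade-off is that this toy index only matches whole words, so `search("roam")` finds nothing where `like_scan("roam")` would match; full-text engines usually close that gap with stemming and related analysis.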

You may not need SQL, but your analysts do. Nearly every large company has an army of “analysts” who were teethed on Excel and speak SQL as a first language. These people must be fed. Either your database needs to speak SQL or you get to write an ETL process to move the data into one that does.

So yeah, you don’t really want to have to deploy an RDBMS, a document database, a key-value store, and a search engine — you really want to deploy a multimodal database.

Andrew C. Oliver learned to code when he was 8. He founded the Apache POI project and served on the board of the Open Source Initiative.

Written by Couchbase