Making Sense of NoSQL from a Relational Database Perspective

Published in Geek Culture · 6 min read · Sep 22, 2021

I know. Everyone is probably familiar with NoSQL by now. However, I recently had to illustrate to a group of engineers how a NoSQL database differs from a relational database and why it is preferred in certain situations.

While many developers have already worked with NoSQL implementations, there are still a good number who are experts in relational databases but are only just getting exposure to non-relational concepts.

I decided to include some of those illustrations here in the hope that they will help anyone coming from a relational database background understand the underlying data structure and main intentions of NoSQL databases.

Introduction

In the early days of the internet, storage space was at a premium and the internet population was only a fraction of what it is today. Much early software was also developed by corporations for internal use, where there was little need for a high-performance, horizontally scalable data store. The focus was largely on data consistency and storage optimization, which made database normalization an important consideration in data storage design.

Much has changed on the internet since then. Storage is a lot cheaper than it was, and serving high traffic with immediate responses and near-zero downtime is now expected of any internet-facing website. This change drove the gradual adoption of non-relational databases.

NoSQL databases are designed for speed and availability: they simplify how an application retrieves data and enable that data to be replicated across multiple machines. With that said, they also come with their own drawbacks. Many NoSQL databases sacrifice the ACID transactions (Atomicity, Consistency, Isolation, Durability) that are common in relational databases. They may also require additional work to fit new application requirements.

When developing a new solution, consider the purpose and use cases when deciding on which type of database to use. Many architectures integrate both SQL and NoSQL in their solution to get the best of both worlds.

Relational Database Data Structure

In order to understand how a NoSQL database works, let’s first do a quick comparison of how relational and non-relational databases store data. Consider the example of a typical blog or article website like Medium. There are two types of information here: articles and the authors who wrote them.

In a relational database, we would generally have two tables: Articles and Authors:

When a reader views an article on the website, the database joins these two tables and returns the result to the calling application, which is usually an API. Since JavaScript Object Notation (JSON) is the de facto standard for representing structured data on the web today, the API converts the result into JSON and sends it to the frontend website, which displays the article and author to the reader.

Data flow of an article website using relational database
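As a rough sketch of the flow described above, the snippet below uses Python's built-in sqlite3 module as a stand-in for the relational database. The table layout, the sample rows, and the `get_article_json` helper are assumptions made for illustration and are not taken from any real site.

```python
import json
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows reading columns by name
conn.executescript("""
    CREATE TABLE authors (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE articles (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        body      TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Jane Doe');
    INSERT INTO articles VALUES (10, 'Hello NoSQL', 'Some text', 1);
""")


def get_article_json(article_id):
    """Join the two tables, then convert the resulting row to JSON."""
    row = conn.execute(
        """
        SELECT a.id, a.title, a.body, au.name AS author_name
        FROM articles AS a
        JOIN authors AS au ON au.id = a.author_id
        WHERE a.id = ?
        """,
        (article_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no article with id {article_id}")
    return json.dumps(dict(row))


print(get_article_json(10))
```

Every request pays for two steps here: the join inside the database and the conversion of the row into JSON in the API.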

While this is an oversimplified example, consider a real-life scenario with more than two tables: posts, authors, tags, categories, related articles, comments, etc. Coupled with hundreds of thousands of records, looking up and joining these tables to serve readers becomes expensive, especially on a high-traffic website.

NoSQL Data Structure

Now let’s imagine we simplify the above data flow by removing the table joins and the conversion to JSON. In this case the API would directly read a readily available JSON document and return it to the frontend website. This makes things a lot faster and is the basis of what a NoSQL database is built upon. With NoSQL the flow of data now looks like this:

Data flow of an article website using NoSQL database

The same relational tables in our example would look like the following when represented in a NoSQL database. Note that what a relational database calls records are called documents in a NoSQL database.

“Records” in a NoSQL database are known as documents and are usually in JSON format.
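To make the document form concrete, here is a minimal sketch in which a plain Python dictionary stands in for the document store. The key format (`article:10`), the field names, and the `get_article_json` helper are assumptions for illustration. A real NoSQL database would expose its own client API.

```python
import json

# Hypothetical author, embedded in every article document they wrote.
jane = {"id": 1, "name": "Jane Doe"}

# In-memory stand-in for a document store: document id -> stored JSON text.
document_store = {
    "article:10": json.dumps(
        {"id": 10, "title": "Hello NoSQL", "body": "Some text", "author": jane}
    ),
    "article:11": json.dumps(
        {"id": 11, "title": "Joins revisited", "body": "More text", "author": jane}
    ),
}


def get_article_json(article_id):
    """Return the stored JSON as-is: no join and no conversion step."""
    return document_store[f"article:{article_id}"]


print(get_article_json(10))
```

Each document carries its own copy of the author, so a single lookup by key returns everything the frontend needs.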

Performance and Reliability

When we combine the article and its author into a single document, we eliminate the join between these two entities. Each document is now a standalone entity, which makes it easy to place documents across different servers because there is no relationship between them. We are able to achieve both horizontal scalability and high availability when our data is spread across multiple machines.

Documents are spread across multiple machines in a NoSQL database

Horizontal Scalability

Documents are spread out across multiple servers so that no single machine runs out of storage space or computing resources. This allows for practically unlimited data growth, since more servers can be added as needed, while accommodating a high rate of requests from your website.
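One simple way to decide which server holds a document is to hash its id, as in the sketch below. The server names and the `server_for` helper are hypothetical, and this is only the basic idea: real databases typically use more elaborate schemes so that adding a server does not move most of the documents.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["server-a", "server-b", "server-c"]


def server_for(doc_id):
    """Pick a server by hashing the document id (simple modulo placement)."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SERVERS)
    return SERVERS[index]


# Because documents are standalone, each one can be placed independently.
placement = {
    doc_id: server_for(doc_id)
    for doc_id in ("article:10", "article:11", "article:12")
}
print(placement)
```

The same id always maps to the same server, so the application knows where to look without consulting any other document.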

High Availability

The same set of documents can be duplicated on different servers to provide a backup in case there is an outage on one of the servers.
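The idea can be sketched with a toy in-memory model. The `Replica` and `ReplicatedStore` classes below are hypothetical and deliberately ignore the hard parts, such as keeping copies consistent when a write fails halfway or when a server comes back online.

```python
class Replica:
    """One server holding a full copy of the documents (toy model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.documents = {}


class ReplicatedStore:
    """Writes go to every replica; reads use the first replica that is online."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, doc_id, document):
        for replica in self.replicas:
            replica.documents[doc_id] = document

    def get(self, doc_id):
        for replica in self.replicas:
            if replica.online and doc_id in replica.documents:
                return replica.documents[doc_id]
        raise LookupError(f"{doc_id} is not available on any online replica")


store = ReplicatedStore([Replica("server-a"), Replica("server-b")])
store.put("article:10", {"title": "Hello NoSQL"})

store.replicas[0].online = False  # simulate an outage on the first server
print(store.get("article:10"))    # still served, from the second replica
```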

Drawbacks of NoSQL

Application Specific Data

In many cases you need to design the structure of your documents according to what your application requires. While this optimizes storage and performance, it also makes the database less flexible than its relational counterpart. Building a new application usually means needing a different set of documents specific to that application.

Maintenance

Consider our article website example with article and author information in our JSON documents. Now imagine there is a need to add a new field, such as an author biography, to our documents. None of our previously indexed documents contain this field, so we have to re-index all of them to include the authors’ bios. This can be avoided, however, if the information is completely new and previous documents do not need to include it.
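A minimal sketch of such a backfill, assuming documents shaped like the article example with an embedded `author` object. The `add_author_bio` helper and the sample data are hypothetical. In a real database each changed document would also have to be written back and re-indexed.

```python
def add_author_bio(documents, bios_by_author_id):
    """Add a 'bio' field to every embedded author that lacks one.

    Returns the number of documents that had to be changed.
    """
    updated = 0
    for doc in documents:
        author = doc["author"]
        if "bio" not in author and author["id"] in bios_by_author_id:
            author["bio"] = bios_by_author_id[author["id"]]
            updated += 1
    return updated


# Hypothetical existing documents, each with its own copy of the author.
articles = [
    {"id": 10, "title": "Hello NoSQL", "author": {"id": 1, "name": "Jane Doe"}},
    {"id": 11, "title": "Joins revisited", "author": {"id": 1, "name": "Jane Doe"}},
]

changed = add_author_bio(articles, {1: "Writes about databases."})
print(changed, "documents updated")
```

In a relational design the same change would touch one row in the authors table. Here every document that embeds the author has to be touched.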

Storage Size

You may have noticed that our author information is duplicated across documents, because many articles can have the same author. This means using more storage space, which is something to keep in mind when designing your architecture, especially if your database is hosted on a cloud service that charges by storage usage.
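The effect is easy to see by measuring the JSON size of both layouts. The sample author and titles below are made up, and the exact byte counts depend entirely on the data, so treat this as an illustration of the trend and not as a benchmark.

```python
import json

# Hypothetical sample data.
author = {"id": 1, "name": "Jane Doe", "bio": "Writes about databases."}
titles = ["Hello NoSQL", "Joins revisited", "Scaling out"]

# Document layout: every article embeds a full copy of the author.
embedded = [
    {"id": i, "title": title, "author": author}
    for i, title in enumerate(titles)
]

# Relational-style layout: articles only reference the author by id.
referenced = [
    {"id": i, "title": title, "author_id": author["id"]}
    for i, title in enumerate(titles)
]


def size_in_bytes(obj):
    """Size of the object when serialized as UTF-8 encoded JSON."""
    return len(json.dumps(obj).encode("utf-8"))


embedded_size = size_in_bytes(embedded)
normalized_size = size_in_bytes(referenced) + size_in_bytes(author)
print(embedded_size, normalized_size)
```

The gap grows with every additional article by the same author, because the embedded layout stores the author once per article.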

Learning Curve

Whereas relational databases share one common model, there are several types of non-relational databases, including key-value and document stores. Some concepts, like the inverted index in Elasticsearch, are also important to learn when implementing a solution. While having different choices is a good thing, it also brings about a steeper learning curve.
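For readers new to the term, an inverted index maps each word to the documents that contain it. The sketch below shows only that core idea with a hypothetical `build_inverted_index` helper. It is not how Elasticsearch implements it, and it skips real text analysis such as punctuation handling and stemming.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical article titles keyed by document id.
titles = {
    "10": "Hello NoSQL",
    "11": "NoSQL for relational developers",
    "12": "Relational joins revisited",
}

index = build_inverted_index(titles)
print(sorted(index["nosql"]))
print(sorted(index["relational"]))
```

Searching for a word then becomes a dictionary lookup, with no need to scan every document.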

Summary

NoSQL databases are designed for performance, scalability and availability by simplifying data retrieval and enabling data to be replicated across multiple locations.

The storage mechanism is vastly different from a relational database. Records are known as documents in a NoSQL database, and are generally stored in JSON format which your application can readily use without additional joins and parsing.

With that said, NoSQL also comes with its own drawbacks. Trade-offs usually have to be made on flexibility and storage space. When developing a new solution, consider the purpose and use cases when deciding on which type of database to use. Many architecture patterns combine both SQL and NoSQL in their solution.