Making Sense of NoSQL from a Relational Database Perspective
I know. Most people are familiar with NoSQL by now. However, I recently had to explain to a group of engineers how a NoSQL database differs from a relational database and why it is preferred in certain situations.
While many developers have already worked with NoSQL implementations, there are still a good number who are experts in relational databases but are only just getting exposure to non-relational concepts.
I decided to include some of those illustrations here in the hope that they will help anyone coming from a relational database background understand the underlying data structure and main intentions of NoSQL databases.
Introduction
In the early days of the internet, storage space was at a premium and the internet population was only a fraction of what it is today. Much early software was also developed by corporations for internal use, where there was little need for a high-performance, horizontally scalable data store. A lot of the focus was on data consistency and storage optimization, which made database normalization an important consideration in data storage design.
Much has changed on the internet since then. Storage is a lot cheaper than it was, and serving high traffic with immediate responses and near-zero downtime is now expected of any internet-facing website. This change drove the gradual adoption of non-relational databases.
NoSQL databases are designed for speed and availability by simplifying how an application retrieves data and enabling that data to be replicated across multiple machines. With that said, they also come with their own drawbacks. Many NoSQL databases sacrifice the ACID transaction guarantees (Atomicity, Consistency, Isolation, Durability) that are common in relational databases. They may also require additional work to adapt to new application requirements.
When developing a new solution, consider the purpose and use cases when deciding on which type of database to use. Many architectures integrate both SQL and NoSQL in their solution to get the best of both worlds.
Relational Database Data Structure
To understand how a NoSQL database works, let’s first do a quick comparison of how relational and non-relational databases store data. Consider the example of a typical blog or article website like Medium. There are two types of information here: articles and the authors who wrote them.
In a relational database, we would generally have two tables, Articles and Authors.
When a reader views an article on the website, the database joins these two tables and returns the resulting view to the calling application, which is usually an API. Since JavaScript Object Notation (JSON) is the de facto standard for representing structured data on the web today, the resulting data is then serialized to JSON and sent to the frontend website, which displays the article and author to the reader.
While this is an oversimplified example, consider a real-life scenario where there are more than two tables: posts, authors, tags, categories, related articles, comments, etc. Coupled with hundreds of thousands of records, the resource cost of looking up and joining these tables to serve readers would be expensive, especially on a high-traffic website.
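The flow described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table layout, column names and sample rows are assumptions made up for this example, not a schema taken from any real site.

```python
import json
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us turn a result row into a dict
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    INSERT INTO authors VALUES (1, 'Jane Doe');
    INSERT INTO articles VALUES (10, 'Making Sense of NoSQL', 1);
""")


def get_article_json(article_id: int) -> str:
    """Join the two tables, then serialize the joined row to JSON."""
    row = conn.execute(
        """
        SELECT a.id, a.title, au.name AS author_name
        FROM articles AS a
        JOIN authors AS au ON au.id = a.author_id
        WHERE a.id = ?
        """,
        (article_id,),
    ).fetchone()
    # Both the join and this serialization step happen on every request.
    return json.dumps(dict(row)) if row is not None else "null"


print(get_article_json(10))
```

Every read pays for the join and for the conversion to JSON, which is the cost the next section tries to remove.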
NoSQL Data Structure
Now let’s imagine simplifying the above data flow by removing both the table joins and the conversion to JSON. In this case the API would read a readily available JSON document and return it directly to the frontend website. This makes things a lot faster and is the basis of what NoSQL databases are built upon. With NoSQL the flow of data will now be like this:
The same relational tables in our example would look like the following when represented in a NoSQL database. Note that what a relational database calls a record is called a document in a document-oriented NoSQL database.
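As a rough sketch of that idea, the article and its author can be stored together as one JSON document. The in-memory dict below is only a stand-in for a real document store, and the key format and field names are assumptions chosen for this example.

```python
import json

# A plain dict standing in for a document store (an assumption for illustration).
# Each value is already the JSON that the frontend expects.
documents = {
    "article:10": json.dumps({
        "id": 10,
        "title": "Making Sense of NoSQL",
        "author": {"id": 1, "name": "Jane Doe"},  # author embedded in the article
    }),
}


def read_article_document(article_id: int) -> str:
    """Return the stored JSON as-is: a single key lookup, no join, no serialization."""
    return documents[f"article:{article_id}"]


print(read_article_document(10))
```

The read path is now a single lookup by key, at the price of embedding the author inside every article document.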
Performance and Reliability
When we combine the article and its author into a single document, we eliminate the join between these two entities. Each document is now a standalone entity, which makes it easy to place documents on different servers because there is no relationship between them. With our data spread across multiple machines, we can achieve both horizontal scalability and high availability.
Horizontal Scalability
Documents are spread out across multiple servers so that no single machine runs out of storage space or computing resources. This allows the data to keep growing by adding more servers, while accommodating a high rate of requests from your website.
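One simple way to picture how standalone documents get spread across servers is hash-based placement, sketched below. The server names and document IDs are hypothetical, and real databases typically use more elaborate schemes (such as consistent hashing or range partitioning) so that adding a server does not move most documents.

```python
import hashlib

SHARDS = ["server-a", "server-b", "server-c"]  # hypothetical server names


def shard_for(doc_id: str) -> str:
    """Pick a server from a stable hash of the document ID."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Group nine sample documents by the server they land on.
placement: dict[str, list[str]] = {}
for doc_id in (f"article:{n}" for n in range(1, 10)):
    placement.setdefault(shard_for(doc_id), []).append(doc_id)

for server, doc_ids in sorted(placement.items()):
    print(server, doc_ids)
```

Because each document carries everything needed to render an article, a request only has to reach the one server that holds it.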
High Availability
The same set of documents can be duplicated on different servers to provide a backup in case one of the servers has an outage.
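The following toy model illustrates the idea of reading from a surviving copy. The Replica class and the write and read helpers are invented for this sketch; actual databases handle replication internally and add concerns such as consistency between copies, which are ignored here.

```python
class Replica:
    """A toy stand-in for one server holding a copy of the documents."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.docs: dict[str, str] = {}


def write(replicas: list[Replica], doc_id: str, doc: str) -> None:
    """Store the same document on every replica."""
    for replica in replicas:
        replica.docs[doc_id] = doc


def read(replicas: list[Replica], doc_id: str) -> str:
    """Return the document from the first replica that is still online."""
    for replica in replicas:
        if replica.online and doc_id in replica.docs:
            return replica.docs[doc_id]
    raise LookupError(f"no online replica holds {doc_id}")


replicas = [Replica("server-a"), Replica("server-b")]
write(replicas, "article:10", '{"title": "Making Sense of NoSQL"}')

replicas[0].online = False  # simulate an outage on one server
print(read(replicas, "article:10"))  # still served by the other copy
```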
Drawbacks of NoSQL
Application Specific Data
In many cases you would need to design the structure of your documents according to what your application requires. While this optimizes the data layout and performance for that application, it also makes the data less flexible than it would be in a relational database. Building a new application would usually mean needing a different set of documents specific to that application.
Maintenance
Consider our article website example with article and author information in our JSON documents. Now imagine there is a need to add a new field, such as an author biography, to our documents. None of our previously indexed documents contain this field, so we have to re-index all of them to include the authors’ bios. This can, however, be avoided if the information is completely new and previous documents do not need to include it.
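A backfill of this kind might look like the sketch below. The sample documents, the author_bios lookup and the backfill_author_bio helper are all hypothetical; in a real database this would be a bulk update or re-index job using that database's own tooling.

```python
# Existing documents, written before the "bio" field existed (sample data).
documents = [
    {"id": 10, "title": "Making Sense of NoSQL", "author": {"id": 1, "name": "Jane Doe"}},
    {"id": 11, "title": "Joins Explained", "author": {"id": 1, "name": "Jane Doe"}},
]

# Hypothetical source of the new information, keyed by author ID.
author_bios = {1: "Writes about databases."}


def backfill_author_bio(docs: list[dict], bios: dict[int, str]) -> int:
    """Add the new field to every document that lacks it; return how many changed."""
    updated = 0
    for doc in docs:
        author = doc["author"]
        if "bio" not in author:
            author["bio"] = bios.get(author["id"], "")
            updated += 1
    return updated


print(backfill_author_bio(documents, author_bios), "documents updated")
```

The work grows with the number of stored documents, since every old document has to be visited once.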
Storage Size
You may have noticed that our author information is duplicated across documents, because many articles can have the same author. This means using more storage space, which is something to keep in mind when designing your architecture, especially if your database is hosted on a cloud service that charges by storage usage.
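To get a feel for the duplication, the sketch below compares the serialized size of the same made-up data in a normalized layout (author stored once) and a denormalized one (author embedded in every article). The data and the size_in_bytes helper are assumptions for illustration, and JSON text length is only a rough proxy for what a real database would use on disk.

```python
import json

# Sample data invented for this comparison.
author = {"id": 1, "name": "Jane Doe", "bio": "Writes about databases."}
titles = [f"Article {n}" for n in range(1, 101)]

# Relational-style layout: the author appears once, articles reference it by ID.
normalized = {
    "authors": [author],
    "articles": [
        {"id": n, "title": title, "author_id": author["id"]}
        for n, title in enumerate(titles, start=1)
    ],
}

# Document-style layout: every article carries its own copy of the author.
denormalized = [
    {"id": n, "title": title, "author": author}
    for n, title in enumerate(titles, start=1)
]


def size_in_bytes(data) -> int:
    """Size of the data when serialized as UTF-8 JSON."""
    return len(json.dumps(data).encode("utf-8"))


print("normalized:  ", size_in_bytes(normalized), "bytes")
print("denormalized:", size_in_bytes(denormalized), "bytes")
```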
Learning Curve
Unlike relational databases, non-relational databases come in several types, including key-value and document stores. Some concepts, like the inverted index in Elasticsearch, are also important to learn when implementing one as a solution. While having different choices is a good thing, it also brings a steeper learning curve.
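For readers new to the term, an inverted index maps each word to the documents that contain it. The sketch below is a deliberately simplified illustration with made-up documents; it is not how Elasticsearch implements its index, which also involves text analysis, scoring and on-disk data structures.

```python
from collections import defaultdict

# Sample documents keyed by ID (invented for illustration).
docs = {
    1: "NoSQL databases scale horizontally",
    2: "Relational databases use joins",
}


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased word to the set of document IDs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


index = build_inverted_index(docs)
print(sorted(index["databases"]))
print(sorted(index["joins"]))
```

A search for a word then becomes a lookup in this map rather than a scan of every document.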
Summary
NoSQL databases are designed for performance, scalability and availability by simplifying data retrieval and enabling data to be replicated across multiple locations.
The storage mechanism is vastly different from a relational database. Records are known as documents in a NoSQL database, and are generally stored in JSON format which your application can readily use without additional joins and parsing.
With that said, they also come with their own drawbacks. Trade-offs usually have to be made on flexibility and storage space. When developing a new solution, consider the purpose and use cases when deciding on which type of database to use. Many architecture patterns combine both SQL and NoSQL in their solution.