Relational vs NoSQL Database

Chao Zhang
SFU Professional Computer Science
11 min readFeb 13, 2021

A comprehensive comparison to help you to choose the right database for your application

Authors: Chao Zhang, Rong Li, Zhicheng Xu

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit {sfu.ca/computing/pmp}.

Before computers came into our lives, everything was recorded on paper. We are flooded with all kinds of information and thanks to computers, we can interact with them. A database management system (DBMS) is a piece of software that manages and centralized the data. There are many software solutions for DBMS but they primarily serve the same purpose, which is providing data to applications. This article will first explore the history of database development, then discuss the difference between the relational database and NoSQL(non-SQL) in-depth, and finally, examine the future challenges of the database world.

A brief history of database development:

Back in the 1960s, the hierarchical model and the network model were born. Both models were organized into tree-like structures and access data following pointers between parent and child nodes. The explicit links present between nodes make navigating very easy and the fastest possible solution is needed at that time. As a result, these two models were used from early 1960 to 1970. The hierarchical model cannot support complex relationships between records (each record can have only one parent) while the network model allows each record to have more than one parent but it is not getting dominant over the hierarchical model mainly because the relational database came into the picture.

The relational model is a special case of the network model. It presents data in tabular form rather than tree-like structure. Furthermore, it has a schema that helps to maintain the consistency of the data and supports different types of relationships such as One-to-Many, Many-to-Many. More importantly, most relational databases use Structured Query Language (SQL) to access data. However, not highly scalable is one of the most important limitations of relational databases. With the advent of Web 2.0 (Read-write web), there is a vast amount of data flowing to the Internet every second. That’s why NoSQL comes to the attention. Since NoSQL uses different data structures, it is generally faster and easier to scale up. Nevertheless, there is no such thing which can be one-size-fits-all. In the following sections, we will dive deeper into the difference between them.

Network Model: The network model where a node can have multiple parent nodes

What is a relational database?

A relational database stores data in tables containing rows and columns, where each row represents a record and each column represents an attribute. Relationships are established through Primary and Foreign keys. A primary key is a unique identifier that can lead to a specific row of data in the table, and it can be used as a foreign key when referenced by another table. The connection between the primary key and the foreign key creates a relationship between tables. I believe the concept will be best explained by an example.

Example of relations

The image above illustrates a one-to-many relationship among three tables. ‘Id’ is the primary key for all three tables. The ratings table contains movie rating records that are then associated with the user who gives the rating. In addition, a user can leave multiple reviews and a review can only belong to one user. The same goes for movies.

Now we know the concept of a relational database, but how do we query the data? Good question! You can communicate with relational databases using SQL, which is the standard language designed for managing the data housed in a relational database. SQL allows create, retrieve, update, and delete records using a few lines of code. Again, let’s understand it by using some examples.

SELECT Users.First_Name, Users.Last_Name, Ratings.Rating, Ratings.Movie_Id
FROM Users
JOIN Ratings ON Users.Id = Ratings.User_Id

The above SQL demonstrates how all the rating records for all users would be retrieved. The resulting table would look like this.

Result table after join

A very important thing to note is that a relational database has a fixed schema, meaning that each table is predefined and can only store predefined fields. For example, our Users table can only store users’ names and email addresses. When new data is added to an existing table, it has to follow the table’s schema. Keep that in mind when we dive into the NoSQL database.

NoSQL database

A NoSQL is a database that does not use a tabular schema to store data. Instead, NoSQL databases use a storage model that is optimized for the specific requirement of the type of data being stored. Imagine you’re tasked with managing large amounts of unstructured data, such as random social media information, customer surveys, or text from streaming data. There is no schema you can predefine for data like these, and it’s very hard to store such data in an RDBMS. But you can store it with a NoSQL database.

Below is an example of a MongoDB document in which you can do some simple queries. Let’s assume this document is called “users”.

Example of MongoDB document

To find the user whose first name is Bob we can write

db.users.find({ first_name: “Bob” })

This will return all the information related to this user. To update certain fields of a user, we can write

db.users.update({first_name: “John”}, {$set: {gender: “male”}})

which will add a new field gender to the user whose first name is John. Simple right? Of course, these are just some basic operations of MongoDB, there are more complex and powerful operations you can find on their official document.

Data Storage

After a brief introduction to both data models, let’s dive deeper into the underlying difference on data storage level. In the relational data model, the DBMS stores data as relations, which are represented as tables on disk physically. These tables are stored as a collection of records, therefore the data storage of the relational data model is row-oriented. Because of this table structure and each table is restricted by a fixed schema, relational databases are only capable of storing structured data. Contrarily, NoSQL databases allow to store unstructured or semistructured data. Unstructured data typically means data formats that could not be rearranged to a column-row structure. One can think of this blog post as an example of semi-structured data: while the title, author, and the upload date are all structured data, the content is unstructured.

To handle different applications with a variety of different unstructured data formats, four types of NoSQL databases are utilized in the current market.

Four types of NoSQL databases

Key-value databases(eg: Redis, DynamoDB):

As the name suggested, these databases store data as key-value pairs. This type of database is relatively easy to understand. By referencing a key, the corresponding value can be fetched. The biggest advantage of this approach is the capability to store large data entries. It can store a large variety of different types of data formats. These databases only care about storing data while leaving the applications to figure out how to parse and interact with the fetched value.

Columnar databases(eg: HBase, Cassandra):

Data is stored as column-based on disk. Instead of forming a table like what relational databases do, columnar databases store columns separately on disk where every record in the same column is stored together in the same region. It is optimized for applications that fetch columns often. Similar to relational databases, there are primary keys, but the purpose of primary keys is mainly to keep records with the same primary key into one node.

Document databases(eg: MongoDB, CouchBase):

Document databases are used to store semistructured data into document formats such as JSON and XML. These databases have no schema which granted great flexibility for applications to manage their stored contents.

Graph databases(eg: Neo4j):

Graph databases store data and relations in a graph where entity instances are represented as nodes and relations are represented as edges. Due to the schema-less nature of graph databases, it is more flexible than a relational database in terms of adding more logic and different data points. In addition, graph databases can take advantage of some powerful graph associated algorithms such as finding the shortest path.

There are often misunderstandings of NoSQL databases not storing relationships, in fact, NoSQL databases store relations but in very different ways. Some NoSQL databases believe that if entities are related, then they should be grouped and fetched together in the first place.

Advantages and disadvantages for RDBMS and NoSQL

Now, you already have a pretty good understanding of both relational and non-relational databases. Let’s take a look at the advantages and disadvantages of both.

Advantages and Disadvantages of Relational databases
Advantages and Disadvantages of NoSQL databases

ACID vs BASE

One of the reasons why relational databases are still in the dominant position in the current market is the properties of ACID.

ACID Properties Source

It is extremely important to have ACID properties guaranteed in some applications such as banking or health-care systems. Especially in OLTP applications. OLTP stands for Online Transaction Processing. OLTP application typically has a high frequency of short queries that updates the database often and requires an immediate reflection of changes. One of the most classic OLTP applications is ATM. Imagine a customer withdraws his money, if the backend database doesn’t update this deduction immediately, then this customer could withdraw again. ACID properties are here to prevent users from withdrawing more money than they actually have. However, as the data has increased exponentially, the traditional all-in-one approach is not sufficient enough to process such large requests. Therefore, using a distributed system to host databases is required in some applications.

CAP theorem is used to measure different types of databases in a distributed scenario.

CAP Theorem graph source

Relational databases satisfy consistency and availability while losing partition tolerance. It is unrealistic to join relations that are hosted in different nodes across the network due to the extensive amount of traffic loads. Note that the term “consistency” in ACID and CAP has different meanings. “Consistency” in ACID denotes that only valid data following constraints and rules can be written to the database, while “consistency” in CAP means all the associated nodes see the same data in a distributed setting. While ACID focuses on the “CA” part of CAP, BASE is a set of properties that the majority of NoSQL databases follow, which focuses on the “AP” part of CAP. The BASE stands for basically available, soft state and eventual consistency. BASE NoSQL databases are not compromising consistency completely. The idea of eventual consistency is introduced so that every associated node would update the change, but with some delay which could cause the state of the database to change over time by itself (soft state). Such delay can be intolerable in some applications such as the baking scenario that we discussed above. Therefore, it really came down to the types of applications to decide which database to use.

Can new technologies replace the traditional RDBMS?

In the current world when the computation is extremely fast and the explosion of the internet is producing huge amounts of unstructured data. Hadoop comes into the picture. Few people argued that Hadoop is a program model rather than a database system. Hadoop is made up of components such as Hadoop Distributed File System (HDFS), Yarn, and MapReduce for parallel processing. Hadoop allows distributed storage and processing of a huge amount of data in any format i.e. Big Data. Many new technologies are emerging nowadays and we are keeping improving the database/services we are using. There are a lot of great debates among these two technologies. So let’s try to find out what Hadoop is meant for.

OLAP VS OLTP

OLAP vs OLTP:

Hadoop is good at processing data in any format (structured, semi-structured, unstructured) and on a large scale. The main feature of the RDBMS includes the ability to use tables for data storage while maintaining and enforcing certain data relationships. RDBMS works better when the data volume is low, probably gigabytes of data. Hadoop will be more suitable for answering questions in OLAP since it can aggregate large volumes of historical data. However, RDBMS may just hold the current students’ information and be able to answer questions like “How is my son doing at school?” (shown in the above figure) consistently and rapidly.

There are other differences between these two such as Hadoop is designed to be highly scalable and fault-tolerant while RDBMS is expensive to scale up but maintain ACID properties.

Hadoop is not a replacement for RDBMS and it is entirely a different perspective in the world of databases that aim for different problems. As shown in the figure, many organizations are comfortable with SQL & NoSQL database combinations. This reinforces the thesis that one size does not fit all and maybe Hadoop should run in parallel with RDBMS to better fit in your situation.

2019 Database Trends

Conclusion:

From the history of the database, we see how to go from tree-like databases to RDBMS in order to enforce the relationship between data records. With the massive user-generated content through the internet, NoSQL comes into play. We have discussed the tradeoff between them and different use cases. Additionally, many new database solutions such as NewSQL and Hadoop are emerging in the market. We should stop arguing which one is the best tool but rather treat them as complementary and use the right tool for the right job.

Reference:

Khazaei, Hamzeh & Fokaefs, Marios & Zareian, Saeed & Beigi, Nasim & Ramprasad, Brian & Shtern, Mark & Gaikwad, Purwa & Litoiu, Marin. (2015). How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey. Journal of Big Data and Information Analytics (BDIA). 2. 10.3934/bdia.2016004.

Relational Database Definition https://www.omnisci.com/technical-glossary/relational-database

RELATIONAL VS. NON-RELATIONAL DATABASES https://www.pluralsight.com/blog/software-development/relational-vs-non-relationaldatabases#:~:text=Some%20of%20the%20more%20popular,used%20for%20a%20single%20application.

Database history https://medium.com/@rpolding/databases-evolution-and-change-29b8abe9df3e

DigitalOcean. (2021, January 23). A comparison of nosql database management systems and models. Retrieved February 13, 2021, from https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models

Graph databases for beginners: Acid vs. base explained. Retrieved February 13, 2021, from https://neo4j.com/blog/acid-vs-base-consistency-models-explained/

--

--