SQL vs NoSQL Databases — Are They Really That Different?

Balaji Jayasankar
The Startup
Published in
7 min readAug 2, 2020
Photo by Taylor Vick on Unsplash

What are Data Models?

Data Models are an abstract concept. They provide a logical structure to how data is stored on a database.

There are a number of Data Models, but we will mainly be focusing on two kinds of Data Models in this article — Relational and Document Data Models.

In the Relational Data Model, data is stored as a collection of entities — usually as rows in a table. Each entity has a well-defined collection of attributes, analogous to columns in a table. There exist well-defined relations between entities in different tables. These relations serve as a link between different sets of entities(tables) and allow us to access data in different ways while maintaining the underlying structure of the database, i.e storing data in tables.

In the Document Data Model, data is simply stored in the form of key-value pairs. The key is generally in the form of a string and is used to uniquely identify an entity or data object being stored. The data itself, while it can be in any format, is generally stored in the form of JSON or XML.

There is one key difference between the two types of Data Models that we can see here. Unlike in the Relational Data Model, there are no well-defined attributes or relations for an entity in the Document Data Model. We will explore the implications of this in the further sections.

What is a Database?

In the Computer Science domain, a database is a structured collection of data, typically stored on a single machine, or a cluster. There are different types of databases, each offering its own unique set of functionalities. Generally, all databases are expected to provide one key function — the ability to store and retrieve data.

Every database adheres to a Data Model. Databases following the Relational Model are referred to as Relational or SQL Databases. Databases following the Document Model are referred to as Document, Non-Relational or NoSQL Databases.

The problem with the term “NoSQL”

Referring to databases not following the Relational Model as NoSQL is a bit misleading. Many modern NoSQL Databases come with an in-built SQL-like querying functionality.

There is an interesting story behind why the term NoSQL came into existence. Up to early-mid 2000s databases following the Relational Model, commonly called SQL databases, were the norm. The term NoSQL was intended as a Twitter hashtag used for a meetup on open source, distributed, non-relational databases in 2009. However, the term gained popularity in the developer community, and people started referring to non-relational databases as NoSQL.

The term since then has been redefined as “Not Only SQL”.

Comparing SQL and NoSQL Databases

It is worth noting that NoSQL databases can follow data models apart from the Document Model, for example — Graph databases. However, in this comparison, we will mainly consider the Document Data Model, as it is the most popular one.

A crucial and often overlooked point is that the choice of Data Model is just one part of the overall design of any database. There is a wide range of features and functionalities provided by different databases, some of which are linked to the underlying data model, and some are not. For the sake of comparison, we will be looking at five features which are frequently discussed when talking about SQL vs NoSQL databases — Schema, Querying, Replication, Partitioning and Transaction Isolation.

Schema

As discussed earlier, SQL Databases have a well-defined schema, i.e a collection of attributes and relations between the different types of entities. NoSQL Databases are sometimes referred to as schema-less.

Saying that NoSQL Databases have no schema is not entirely correct as well. Data in NoSQL is generally stored in JSON or XML format and is usually parsed into a fixed schema when read by an application. Therefore, rather than saying schema-less, we can refer to NoSQL databases as having schema-on-read.

Conversely, modern SQL Databases can provide us with schema-flexibility similar to that of NoSQL. We have the ALTER TABLE SQL command to easily change the schema if required. Newer versions of SQL databases support the JSON attribute type, which lets us store unstructured data in a relational table.

Querying

Data from SQL Databases is primarily retrieved via SQL Queries. SQL stands for Structured Query Language and is a standard language for accessing and manipulating data in a relational database.

In most NoSQL databases, primarily those following Document Model, the typical way to retrieve data is through the associated Primary Key or Document ID. However, many modern NoSQL databases support SQL-like querying.

NoSQL Databases typically use Secondary Indexes to query unstructured data. We won’t go into the specifics of Secondary Indexes but in a NoSQL context, parts of the document that are to be queried can be stored in a Secondary Index, on which the query is run. This gives us SQL-like query performance on unstructured data, provided the right Secondary Indexes are used.

The querying and indexing features vary in different NoSQL databases, and every database might not provide us with all the functionalities. However, it is safe to say that it is possible to achieve our querying requirements in most cases using NoSQL Databases.

Replication and Partitioning

Replication and Partitioning are essential features of Distributed Data Systems. Distributed Data Systems is a complex topic, which I am planning to cover in another series of posts. However, for the sake of comparison, I will briefly define these terms.

Partitioning is the concept of splitting up a database into multiple components called partitions. These partitions can be then distributed among several machines. Partitioning allows us to have high throughput and lets us handle databases that are too large to fit into a single hard drive.

Replication is the concept of storing multiple copies of the same partition across different machines. This gives the database durability and availability in case of failure in a single machine.

Most NoSQL databases were designed with the concepts of Distributed Data Systems in mind, and therefore have these features by default, whereas most SQL databases were designed to be run on a single machine.

That being said, there is nothing stopping an SQL database from having Replication and Partitioning as features. In fact, Teradata, a Relational Database built in the 1970s, has partitioning as one of its key features. Many relational databases today have the features of Distributed Data Systems like replication and partitioning.

Transaction Isolation

Transaction Isolation is also a complex topic which I will not be able to fully cover in this post. However, I will try to provide enough context so that we can meaningfully do our comparison.

A Database Transaction is a single unit of logic or work done on a database. It can be as simple as a read or a write, or a complex series of steps involving multiple reads, writes and queries.

Transaction Isolation is a concern which comes into the picture when multiple transactions are running on a database at the same time. The problem arises due to multi-threaded execution and is made worse by Replication and Partitioning.

The problem in this scenario is — How can we execute multiple transactions parallelly on a database such that each database transaction produces a result that is logically sound. Databases attempt to solve this problem by providing Transaction Isolation Guarantees. These range from weak guarantees like Read Committed, where only individual reads and writes are covered, to Serializability, which guarantees that every transaction is executed such that it effectively looks like they took place one after the other, with a set of other levels of Transaction Isolation in between.

SQL databases are typically associated with strong Transaction Isolation, whereas NoSQL is associated with weak Transaction Isolation.

The main point to note is that the transactional isolation guarantee provided by any database is completely independent of the underlying Data Model. That is, both SQL and NoSQL databases can theoretically have any level of Transaction Isolation Guarantee.

I am planning to cover all of this in more detail in another series of posts.

Are SQL and NoSQL databases different?

The answer is a bit ambiguous. It is a fact that SQL and NoSQL databases are fundamentally different in their schema. SQL imposes a rigid schema, whereas NoSQL does not have this constraint. However, after comparing the various features of the databases, we see that in practice, they both have a lot in common.

If one has to give an answer, we can say that yes, SQL and NoSQL databases are different, but a lot of the same things can be achieved using either of the two.

How do I decide which database is best suited to my application?

From what we have learned, it is evident that we should not blindly consider SQL vs NoSQL as a deciding factor while choosing a database.

There are a lot of other factors to consider, some of which are:

  • Cost
  • Performance
  • Scalability
  • Durability
  • Functionality
  • Transaction Isolation Guarantees

We should focus on whether the database is capable of serving our needs, and the choice of SQL vs NoSQL is secondary.

Conclusion

I hope that reading this post has given you better clarity on how to compare the different types of databases, and has busted any misconceptions you might have had when comparing SQL to NoSQL.

As a side note, a part of what inspired me to write this post was reading Designing Data-Intensive Applications by Martin Kleppmann. Reading the book has greatly improved my knowledge of how Data Systems work, and I am sure that it is a worthwhile read for anybody interested in building software. Link to the book down below.

--

--

Balaji Jayasankar
The Startup

Software Development Engineer. Building data intensive applications for UK’s largest retailer.