Everything you should know about NoSQL database — System Design

Sourav Bhattacharjee
Published in The Startup
5 min read · May 13, 2020

It is hard to choose between a relational database (RDBMS) and a non-relational database (NoSQL) while designing a system. A fair understanding of the limitations of each makes the decision easier.

SQL vs NoSQL

Before digging deep into NoSQL databases, it’s important to know the limits of relational databases. Relational databases have been around for more than four decades, and they work well. The data is well structured, and records are kept in tables. Tables consist of rows and columns, can define primary keys and unique keys, and can be joined with one another. Another important feature is support for transactions, whose properties are known as ACID. ACID consists of four attributes, described below:

  • Atomicity: A multi-step transaction either completes all of its steps or has no effect at all; a failure partway through rolls back the steps already applied.
  • Consistency: A transaction takes the database from one valid state to another, so the defined constraints hold both before and after it.
  • Isolation: Multiple transactions can run concurrently without interfering with each other.
  • Durability: Once a transaction is committed, its data is stored persistently and survives a server crash or power failure.
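
To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names in it and the transfer helper are made up for illustration. A transfer is two updates, and if the check between them fails, the first update is rolled back as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back
        # every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # first update is undone
print(ok, failed, balances(conn))
```

After the failed transfer the balances are the same as before it, even though the debit had already been executed inside the transaction.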

But with the growth of big data technologies, traditional SQL-based databases proved less equipped to manage rapidly expanding data volumes and increasingly complex data structures. Some of the disadvantages of relational databases are as follows:

  1. Schema: The schema of a relational database is fixed, and you have to design it beforehand. For rapidly growing applications it’s tough to anticipate the shape and complexity of the data up front.
  2. Data Structure: Because relational databases support a limited set of data structures, related data is spread across tables and we need many joins to get the desired result, and joins over large tables are costly.
  3. Scaling: Relational databases are difficult to scale when the data grows very rapidly, because they typically scale up to a single larger server rather than out across many machines.

Are there any workarounds for this?

Fortunately, yes.

Denormalization is one of them: expand a single table with more columns so that joins can be avoided while fetching results. This improves read performance, but the duplicated data can introduce anomalies when copies fall out of sync.
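
The trade-off can be sketched with sqlite3. The table and column names below are invented for the example. The same order data is stored once in normalized form and once in a wide, denormalized table, and a partial update then shows how an anomaly appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    -- Denormalized copy: customer columns are repeated on every order row.
    CREATE TABLE orders_wide (id INTEGER PRIMARY KEY, item TEXT,
                              customer_name TEXT, customer_city TEXT);
    INSERT INTO customers VALUES (1, 'Asha', 'Pune');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
    INSERT INTO orders_wide VALUES (10, 'keyboard', 'Asha', 'Pune'),
                                   (11, 'monitor', 'Asha', 'Pune');
""")

# Normalized read: the city lives in another table, so a join is needed.
joined = conn.execute(
    "SELECT o.item, c.city FROM orders o "
    "JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()

# Denormalized read: one table, no join.
wide = conn.execute(
    "SELECT item, customer_city FROM orders_wide ORDER BY id"
).fetchall()

# The cost: a change of city must touch every copy. Updating only one
# row leaves the same customer with two different cities.
conn.execute("UPDATE orders_wide SET customer_city = 'Delhi' WHERE id = 10")
cities = conn.execute(
    "SELECT DISTINCT customer_city FROM orders_wide ORDER BY customer_city"
).fetchall()
print(joined == wide, cities)
```

Both reads return the same rows, but after the partial update the wide table holds two cities for one customer, which the normalized layout cannot do.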

Another technique is sharding. Here the database is divided into pieces, and the pieces are stored on different servers. This improves both read and write performance, but it is very difficult to manage.
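
A minimal sketch of hash-based sharding follows, with in-memory dictionaries standing in for separate servers. The ShardedStore class and the key format are hypothetical, and real systems often use other schemes such as range-based or consistent hashing.

```python
import hashlib

class ShardedStore:
    """Routes each key to one of several shards by hashing the key."""

    def __init__(self, shard_count):
        # Each dict stands in for a database on a separate server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)

store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]
print(sizes, store.get("user:42"))
```

Reads and writes for different keys now land on different shards, which is where the performance gain comes from. Part of the management difficulty is also visible: with this modulo scheme, changing the shard count remaps most keys, and a query that spans many keys has to visit several shards.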

So, to conclude: relational databases have many advantages and some disadvantages. We can work around the disadvantages, but the workarounds pose challenges of their own.

NoSQL databases, by contrast, naturally allow denormalized data, scalability and a flexible schema. Do these things come for free? No! Nothing comes for free in this world. The price is relaxing the ACID constraints. ACID is an important requirement for some applications, but not for all. Last but not least, NoSQL databases provide new ways of querying large and complex data structures that are hard to express in relational databases. That is one of the key reasons NoSQL is used for data science.

Scaling techniques

Advantages of NoSQL Databases:

  • Flexible Schema: NoSQL databases typically provide a very flexible schema that can be changed easily as requirements change.
  • Horizontal scaling: Capacity can be added by adding cheaper, commodity servers whenever required, whereas SQL databases typically need to scale up vertically (migrate to a larger server) when they exceed capacity.
  • Faster Queries: One key principle of NoSQL databases is “Data that is accessed together should be stored together”. Queries therefore typically work without joins, which makes them faster.
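
The "stored together" principle can be illustrated in plain Python, with no real database involved. The data and function names are invented. The same user profile is read from two layouts: one that has to be assembled at read time, and one where the related data is already embedded.

```python
# Relational-style layout: two "tables" that must be combined on read.
users = {1: {"name": "Asha"}}
orders = [
    {"order_id": 10, "user_id": 1, "item": "keyboard"},
    {"order_id": 11, "user_id": 1, "item": "monitor"},
]

def profile_with_join(user_id):
    """Scan the orders 'table' and attach matching rows to the user."""
    user = users[user_id]
    items = [o["item"] for o in orders if o["user_id"] == user_id]
    return {"name": user["name"], "items": items}

# Document-style layout: the items are embedded in the user record,
# so one lookup by key returns everything the page needs.
user_docs = {1: {"name": "Asha", "items": ["keyboard", "monitor"]}}

def profile_from_document(user_id):
    return user_docs[user_id]

print(profile_with_join(1) == profile_from_document(1))
```

The embedded layout answers the query with a single key lookup, at the price of having to reshape the document if the access pattern changes.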

Some examples of NoSQL databases are Redis, Dynamo, CouchDB, MongoDB, Cassandra, HBase and Neo4j. It’s no surprise that people facing many different problems with relational databases ended up designing different types of NoSQL databases.

Type of databases
  1. Key-Value Stores: These work like a dictionary: you know a key and use it to retrieve its value. It is one of the simplest types of NoSQL database. This kind of database is really useful for caching. Examples: Redis and Dynamo.
  2. Document Databases: Instead of rows and columns, data is stored in documents. Documents are grouped to form a collection. One advantage of this kind of database is that each document can have a different structure. It is really useful for content management and for storing user profiles. Examples: CouchDB and MongoDB.
  3. Wide Column Databases: The idea of a column and a table in a wide column database is different from what it is in a relational database. The data is denormalized and the columns are not fixed: an application can add columns on the fly, and rows in the same table can have different columns. As in document databases, the values can be complex structures such as arrays and lists. This kind of database is really useful for time series data, logging and other write-heavy applications. Examples: Cassandra and HBase.
  4. Graph Databases: These databases are very useful for applications whose relations are best represented as a graph. They are mostly used for social networks, knowledge graphs and similar applications. Example: Neo4j.
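
As a rough mental model only, the four data models can be mimicked with plain Python structures. None of this uses the API of the products named above, and all keys and values are invented for illustration.

```python
# 1. Key-value: an opaque value looked up by its key.
cache = {"session:42": "token-abc"}

# 2. Document: self-describing records; fields may differ per document.
profiles = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "skills": ["go", "sql"]},
]

# 3. Wide column: row key -> {column: value}; rows in the same table
#    need not share columns, and new columns appear on write.
metrics = {
    "sensor-1": {"2020-05-13T10:00": 21.5, "2020-05-13T10:01": 21.7},
    "sensor-2": {"2020-05-13T10:00": 19.0},
}

# 4. Graph: nodes connected by edges, queried by traversal.
follows = {"asha": {"ravi"}, "ravi": {"meera"}, "meera": set()}

def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(cache["session:42"])
print(sorted(reachable(follows, "asha")))
```

The traversal in the last part is the kind of query a graph database is built around; expressing it over relational tables would take one self-join per hop.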

Both relational and NoSQL databases are great at data management. There is no straightforward answer to which one is better. It all depends on the requirements, and the parameters and trade-offs described above help in making the choice.

Sourav Bhattacharjee
The Startup

Software Engineer at Microsoft | IIT Kharagpur | Passionate about putting technologies together to design a system.