NoSQL databases — An Introduction

Animesh Gaitonde
Published in Analytics Vidhya
Sep 21, 2019
NoSQL Databases

Origin of the term NoSQL

During the dot-com era, internet businesses and enterprises relied on traditional relational databases. In the mid-2000s, as the internet spread, companies like Amazon and Google saw surges in traffic and data, and relational databases such as MySQL, PostgreSQL and Oracle struggled to scale. To overcome these limitations, Amazon came up with SimpleDB and Google introduced BigTable. The arrival of these two non-relational databases sparked interest in the tech community. In 2009, Johan Oskarsson organised a meetup to discuss distributed non-relational databases. To publicise the meetup, he used the hashtag #NoSQL on Twitter, and that gave birth to the term NoSQL. In this article, let’s take a tour of the limitations of RDBMS and of how NoSQL databases work, what they can do, and some common misconceptions about them.

Scaling a Relational Database

Let’s imagine we start an internet business in the mid-2000s. Our business picks up and website traffic grows exponentially. Customers complain that web pages load slowly. What do we do next? We ask our smart DBAs to optimise database queries and add indexes to improve performance. A few months down the line, the complaints start again. Vertical scaling now comes to our rescue: spending a few more dollars on a bigger server solves the problem, at least for a while.

Vertical Scaling

There are real limits to how far we can go with vertical scaling of our databases. And what if our website enters a new business and needs to store videos, images, chats and all other forms of data?

Horizontal Scaling

The answer is simple. Since a single machine can store only a limited amount of data, we have to resort to horizontal scaling: we buy hundreds of servers and distribute the data and traffic across them.
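To make "distribute the data across the servers" concrete, here is a minimal sketch of hash-based sharding, assuming every record has a string key and the servers are simply numbered 0 to N-1. The helpers `shard_for` and `distribute` are hypothetical names for illustration, not part of any database's API.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index: stable hash of the key modulo the shard count."""
    # hashlib is used because Python's built-in hash() of a str changes between runs
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard (server) responsible for each one."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

users = [f"user:{i}" for i in range(10)]
for shard, keys in distribute(users, 4).items():
    print(f"server {shard}: {keys}")
```

One catch with plain modulo: growing from 4 to 5 servers reassigns most keys to a different server, which means a large data migration. That is one reason many distributed databases prefer consistent hashing.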

SQL databases were not designed for horizontal scaling. Joining datasets and aggregating data spread across many machines adds a lot of complexity to our design.
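Even a simple `GROUP BY` turns into a two-step "scatter-gather" once rows live on different machines. The sketch below assumes a hypothetical layout where orders are sharded by `order_id`, so one user's orders can sit on any server; each inner list stands in for one server.

```python
from collections import defaultdict

# Hypothetical layout: orders sharded by order_id, one inner list per server.
SHARDS = [
    [{"order_id": 1, "user_id": "alice", "amount": 20.0},
     {"order_id": 4, "user_id": "bob", "amount": 5.0}],
    [{"order_id": 2, "user_id": "alice", "amount": 12.5}],
    [{"order_id": 3, "user_id": "bob", "amount": 7.5},
     {"order_id": 5, "user_id": "carol", "amount": 30.0}],
]

def partial_totals(shard):
    """Step 1 (runs on each server): aggregate only the rows stored locally."""
    totals = defaultdict(float)
    for row in shard:
        totals[row["user_id"]] += row["amount"]
    return totals

def total_spend_per_user(shards):
    """Step 2 (runs on a coordinator): fan out to every shard, then merge."""
    merged = defaultdict(float)
    for shard in shards:  # in a real cluster each of these is a network call
        for user, amount in partial_totals(shard).items():
            merged[user] += amount
    return dict(merged)

print(total_spend_per_user(SHARDS))
# {'alice': 32.5, 'bob': 12.5, 'carol': 30.0}
```

Joins are harder still: matching rows from two tables may sit on different servers, so data has to be shipped across the network before it can be combined.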

Traditional relational databases such as MySQL and PostgreSQL support ACID (Atomicity, Consistency, Isolation and Durability) transactions. Many NoSQL databases instead follow the BASE model (Basically Available, Soft state, Eventually consistent). Let’s take a tour of the features of NoSQL databases.

Features of NoSQL databases

1 — Schema-less

Relational databases have a rigid schema. Users often go through many iterations to model their data, and altering the data type of an attribute becomes a nightmare for developers, leads and DBAs.

Modify database schema

NoSQL overcomes this limitation by providing a flexible schema. These databases abstract away data storage and internal workings from users and let them store user-defined data structures; for example, data can be stored as a JSON object. Users have the flexibility to add, replace or remove attributes from the data.
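A minimal illustration of that flexibility, using plain Python dicts to stand in for JSON documents in a document store. The collection and field names are made up, and this is not any database driver's API.

```python
import json

# Two "user" documents in the same hypothetical collection with different shapes:
# the second one carries extra attributes without any ALTER TABLE.
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ben", "phones": ["555-0100", "555-0101"],
     "address": {"city": "Pune", "zip": "411001"}},
]

doc = users[0]
doc["nickname"] = "A"                # add a new attribute
doc["email"] = "asha@work.example"   # replace an existing value
del doc["nickname"]                  # remove an attribute again

print(json.dumps(users, indent=2))
```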

SQL and NoSQL are analogous to statically typed and dynamically typed programming languages. SQL databases are like C or C++, where you declare the type of the data first and store values later. NoSQL offers Python-like flexibility, where you can assign any value to a variable and it just works.

SQL is analogous to C, C++, Java
NoSQL is analogous to Python

Being schema-agnostic, NoSQL databases are also called schema-on-read databases: you only need to know how the data is structured when you read it, not when you write it.
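Here is a small sketch of what schema-on-read looks like in application code, assuming a hypothetical collection where older records store `name` and newer ones split it into `first` and `last`. The reader, not the database, decides how to interpret each shape.

```python
# Hypothetical raw documents written by two versions of an app.
raw_docs = [
    {"id": 1, "name": "Asha Rao"},
    {"id": 2, "first": "Ben", "last": "Lee", "tier": "gold"},
]

def read_user(doc: dict) -> dict:
    """Apply a schema at read time: normalise the shape and fill in defaults."""
    if "name" in doc:
        full_name = doc["name"]
    else:
        full_name = f'{doc.get("first", "")} {doc.get("last", "")}'.strip()
    return {
        "id": doc["id"],
        "full_name": full_name,
        "tier": doc.get("tier", "standard"),  # default for records written before "tier" existed
    }

for d in raw_docs:
    print(read_user(d))
```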

A flexible schema shortens development time: you no longer have to go through many iterations of data modelling and design, and developers can store and retrieve whatever they want. The main downside of a schema-less design is the loss of control, since the database does not stop inconsistent data from being written. This is mostly a threat when a developer modifies a production system while bypassing the development process.

SQL databases support null values for columns. For example, a banking application’s sign-up page may have many optional fields, such as street name or nickname. If users leave these fields empty, every row still carries a column for each of them, and depending on the storage engine the database may still set aside space in case users fill them in later. In a NoSQL database, you simply don’t store the empty entries, which saves storage.
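The difference is easy to see when both shapes are serialised as JSON. This is only an illustration with made-up field names: the real on-disk cost of a NULL depends on the storage engine, and many relational engines track NULLs compactly.

```python
import json

# Row-style record: every column exists, and the optional ones are explicitly NULL.
row = {"id": 7, "first_name": "Meera", "street": None, "nickname": None}

# Document-style record: optional fields that were never filled in are simply omitted.
doc = {key: value for key, value in row.items() if value is not None}

print(json.dumps(row))  # {"id": 7, "first_name": "Meera", "street": null, "nickname": null}
print(json.dumps(doc))  # {"id": 7, "first_name": "Meera"}
```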

Schema-less doesn’t mean any random garbage can be stored in the database. For example, if a column has a JSON data type, the JSON must be well-formed; the application gets an error if it tries to store a malformed JSON object.
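A minimal application-side sketch of that check, using Python's standard `json` module. It behaves in the same spirit as a database with a JSON column type rejecting invalid documents on insert; `validate_json` is a hypothetical helper, not a database API.

```python
import json

def validate_json(payload: str) -> dict:
    """Parse a JSON payload before storing it, rejecting malformed input."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError as err:
        raise ValueError(f"refusing to store malformed JSON: {err}") from err

print(validate_json('{"name": "Asha", "tags": ["a", "b"]}'))
try:
    validate_json('{"name": "Asha", "tags": ["a", "b"}')  # the array is never closed
except ValueError as e:
    print(e)
```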

2 — Non-relational

Relational databases organise data into rows and columns. You can store data in many tables, and the tables can have different relationships with one another. To fetch the data, you join the tables on the value of an attribute. Application performance degrades once the number of tables being joined reaches double digits, and speed drops significantly when the application joins tables stored on different database servers.

NoSQL databases are typically denormalised, and most of them have no built-in concept of relationships between records. Instead of scattering data across different tables, you store the whole aggregate in a single table.

The major advantages of this approach are:

  • Query speed: reads are significantly faster, since only a lookup on the key attribute is needed and there is no need to join many tables
  • Storage and retrieval: you simply save and fetch a single record

For instance, when designing a food delivery app on an RDBMS, you would create multiple tables: one each for users, restaurants and orders. In a NoSQL database, a single orders table can hold the restaurant and user data, duplicated across many rows. The benefits above outweigh the downside of this duplication.
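The food delivery example can be sketched with in-memory dicts standing in for tables. This is a toy model with made-up names and ids, not how any particular database stores data: the normalised version needs three lookups (a "join"), while the denormalised version answers with one key lookup.

```python
# Normalised (RDBMS-style): three "tables" linked by ids.
users = {101: {"name": "Asha", "city": "Pune"}}
restaurants = {9: {"name": "Spice Hub", "cuisine": "Indian"}}
orders = {5001: {"user_id": 101, "restaurant_id": 9, "total": 18.5}}

def order_view_normalised(order_id):
    """Assemble an order page by 'joining' three tables: three lookups."""
    order = orders[order_id]
    return {
        "order_id": order_id,
        "total": order["total"],
        "user": users[order["user_id"]],
        "restaurant": restaurants[order["restaurant_id"]],
    }

# Denormalised (NoSQL-style): the whole aggregate lives in one record,
# with user and restaurant data copied into every order that references them.
orders_denormalised = {
    5001: {
        "order_id": 5001,
        "total": 18.5,
        "user": {"name": "Asha", "city": "Pune"},
        "restaurant": {"name": "Spice Hub", "cuisine": "Indian"},
    }
}

def order_view_denormalised(order_id):
    """A single key lookup returns everything the order page needs."""
    return orders_denormalised[order_id]

print(order_view_denormalised(5001))
```

The price of the copy is on writes: if Asha moves city, every order that embeds her details has to be updated, or the application has to accept that older orders keep the old value.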

Complex ER diagram

You can avoid creating complex ER diagrams and writing complicated SQL queries. With NoSQL databases, you can speed up your development and focus on getting things done.

3 — High Scalability & Availability

When Google published its BigTable paper, it described BigTable as “a distributed storage system for managing structured data that is designed to scale to a very large size”. NoSQL databases can store petabytes of data across many computers.

As the amount of unstructured data grew exponentially, it became difficult to store it all on a single machine. Relational databases need specialised, high-end hardware to serve growing load without compromising performance. To scale further, it became essential to design systems that store data across a cluster of computers and retrieve it efficiently.

NoSQL databases run on commodity servers, which are cheaper than high-end servers, and more of them can be added as storage requirements grow. Many NoSQL databases, such as Cassandra and Voldemort, spread data evenly across the cluster using a consistent hashing algorithm.

Consistent Hashing
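Here is a minimal textbook consistent-hashing ring with virtual nodes. It is a sketch of the general technique, not the exact scheme used by Cassandra, Voldemort or any other product; the class name, the `vnodes` parameter and the server names (`db-a` and so on) are assumptions for illustration.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Stable position on the ring (MD5 used only to spread keys, not for security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual nodes per server smooth out the distribution
        self._positions = []   # sorted hash positions on the ring
        self._owners = {}      # position -> server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            self._positions.remove(pos)
            del self._owners[pos]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next server position."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._positions)
        return self._owners[self._positions[idx]]

ring = ConsistentHashRing(["db-a", "db-b", "db-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("db-d")
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved")  # expected to be around a quarter
```

The design point is that adding a server only moves the keys that now fall on its slice of the ring, and all of them move to the new server; every other key stays where it was.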

NoSQL databases can replicate data across many machines, so the data stays accessible even if one of the servers dies or crashes. This is what makes NoSQL databases highly available.

Data Replication
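A toy sketch of the idea, assuming a replication factor of three and a store that writes every key to all live replicas. `Replica` and `ReplicatedStore` are hypothetical names; real databases add quorum reads and writes and repair replicas that missed updates.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True

class ReplicatedStore:
    """Toy store that writes every key to all live replicas."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value) -> int:
        written = 0
        for r in self.replicas:
            if r.alive:  # a replica that is down misses this write
                r.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for write")
        return written

    def get(self, key):
        """Serve the read from the first live replica that has the key."""
        for r in self.replicas:
            if r.alive and key in r.data:
                return r.data[key]
        raise KeyError(key)

replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
store = ReplicatedStore(replicas)
store.put("cart:42", ["book", "pen"])
replicas[0].alive = False    # simulate a crashed server
print(store.get("cart:42"))  # still served, now by r2
```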

4 — Open Source

Open-source development sets NoSQL software apart. A few vendors release an open-source product and also sell enterprise add-on features, following a Red Hat-like business model.

Here are a few open-source NoSQL databases:

  • MongoDB
  • Cassandra
  • Redis
  • Voldemort
  • Hypertable
  • Neo4j

Common Misconceptions of NoSQL databases

1 — NoSQL is a single type of database

NoSQL databases are classified by the type of data they hold and by how they work internally. The main types of NoSQL databases are:

  • Key-value: these databases work like a hash map and can store any kind of value. Examples include Redis, Voldemort and Aerospike
  • Wide-column stores: the names and format of columns can vary across rows. Cassandra, BigTable and Hypertable are wide-column stores
  • Document stores: databases such as CouchDB, MongoDB and DocumentDB store data as documents in formats such as JSON or XML
  • Graph databases: databases like Neo4j model entities as graph nodes and relationships between entities as edges between those nodes
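To make the four data models tangible, here is the same kind of data shaped each way with plain Python structures. These are toy in-memory shapes with made-up keys, not how Redis, Cassandra, MongoDB or Neo4j actually store anything internally.

```python
# Key-value: an opaque value behind a key, like a hash map.
kv = {"session:abc123": b"opaque-session-bytes"}

# Wide-column: rows keyed by a row key; each row may have different columns.
wide_column = {
    "user:1": {"name": "Asha", "email": "asha@example.com"},
    "user:2": {"name": "Ben", "last_login": "2019-09-20"},
}

# Document: a nested, self-describing record (JSON-like).
document = {"_id": "user:1", "name": "Asha",
            "orders": [{"id": 5001, "total": 18.5}]}

# Graph: entities are nodes, relationships are labelled edges between them.
nodes = {"asha": {"type": "User"}, "spice_hub": {"type": "Restaurant"}}
edges = [("asha", "ORDERED_FROM", "spice_hub")]

def neighbours(node, relation):
    """Follow outgoing edges that carry the given label."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(neighbours("asha", "ORDERED_FROM"))  # ['spice_hub']
```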

2 — Risk of data loss using NoSQL

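Since many NoSQL databases favour availability over strong consistency, a read may not always return the most recent write. That is a consistency trade-off, not data loss: replicas are eventually consistent, and durability comes from replicating writes and persisting them to storage, which most production NoSQL databases can be configured to do.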
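Here is a minimal sketch of eventual consistency, assuming two replicas that resolve conflicts with last-write-wins timestamps and periodically sync ("anti-entropy"). This is one common technique, not the only one; Dynamo-style systems, for example, may use vector clocks instead. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    # key -> (timestamp, value); the higher timestamp wins
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a: Replica, b: Replica):
    """Exchange state so both replicas keep the newest version of every key."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)

r1, r2 = Replica(), Replica()
r1.write("email", "old@example.com", ts=1)
anti_entropy(r1, r2)
r1.write("email", "new@example.com", ts=2)  # r2 has not seen this write yet
print(r2.read("email"))  # stale read: old@example.com
anti_entropy(r1, r2)
print(r2.read("email"))  # converged: new@example.com
```

The stale read is temporary: no write was lost, and once the replicas sync they agree on the newest value.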

3 — NoSQL is just a buzz word

Amazon, Google, Microsoft, IBM, Oracle and many other large corporations have built NoSQL databases and use them in production systems. Large companies invest in a technology only when they expect a return, so NoSQL is clearly more than hype.

4 — Enhanced RDBMS will replace NoSQL

Highly distributed features of NoSQL are being integrated into RDBMS technology, which has led to the emergence of many NewSQL databases. NewSQL databases address much of the criticism of RDBMS technology. However, NoSQL databases are built to solve different data problems using different data structures.

SDE-3/Tech Lead @ Amazon| ex-Airbnb | ex-Microsoft. Writes about Distributed Systems, Programming Languages & Tech Interviews