SQL or NoSQL — Which one to choose ?

Vivek Singh

Published in

Nerd For Tech

8 min readOct 4, 2021

TOPICS

What is SQL and NoSQL ?
What is ACID and CAP ?
Transactions
Schema
Normalization
Scaling
Queries/Data Retrieval
Archiving Data
Is SQL slow ?
Drawbacks of SQL and NoSQL
When to use SQL? When to use NoSQL ? — Answers are based on all topics mentioned above.

SQL and NoSQL

SQL Databases are relational databases. NoSQL is a non-relational or distributed database.

SQL requires you to define and deploy your schema before you start working on implementing business use cases. NoSQL is flexible and dynamic because data can be created in a NoSQL database without having a proper schema first.

ACID and CAP

CAP — consistency, availability, partition tolerance

This term used with distributed systems, also named as Brewer’s theorem, states that it is impossible for a distributed data store to provide more that two of the three features simultaneously.

Consistency

Every read receives the most recent write or an error. If there are multiple parallel writes and reads in the system, every read will always return the correct write or the last write done on the system.

Availability

Availability in a distributed system ensures that the system remains operational 100% of the time. Every request gets a response(non error) regardless of the individual state of a node. This does not guarantee that the response contains the most recent write(Consistency).

Partition Tolerance

To continue operating even when a node fails to respond. The node which failed will be backed-up by a secondary node, so although some messages may fail while the primary node is down, and the secondary node replaces it; As a system it will be fault tolerant.

MongoDB is a CP data store. It provides Consistency, and partition.

It resolves network partitions by maintaining consistency, while compromising on availability. MongoDB is a single-master system. Mongo can have only one primary node that receives all the write operations. When the primary node becomes unavailable, the secondary node will be elected as the new primary node. As clients can’t make any write requests during this interval of role transfer of nodes, the data remains consistent across the entire network.

Cassandra is an AP database — it delivers availability and partition tolerance but can’t deliver consistency all the time.

Hence it proves that only two features can be guaranteed out of the three in CAP.

ACID — Atomicity, consistency, isolation, and durability.

All changes to data are performed as if they are a single operation.
A transaction is a single logical unit of work which accesses and modifies the contents of a database. Transactions access data using read and write operations.
ACID properties are followed to provide consistency in data before and after a transaction.

Below image has been taken from GeeksforGeeks :

Transactions :

SQL databases are more suited to handle transactions as they provide ACID properties. We can still use NoSQL, however we will have to write additional logic in our own application to manage it.

NoSQL databases like Neo4j, MarkLogic provide ACID properties. However it is still not as robust as SQL Databases.

Generally speaking, NoSQL solutions have lighter weight transactional semantics than relational databases, but still have facilities for atomic operations at some level.

As you go through the article, you will understand the reason why transactions are easy in SQL in comparison to NoSQL. SQL was designed with the transaction feature in mind. SQL is robust with transactions, because all the related data is stored in the same node/machine. While with NoSQL we can have multiple nodes handling the related data(horizontal scaling).

For example all the related data for a table will be in a single machine for SQL, however for NoSQL we can use horizontal partitioning and hence related data is in multiple nodes. So transactions are difficult in NoSQL.

Most of the NoSQL tools relax consistency criteria of the operations to get fault-tolerance and availability for scaling, which makes implementing ACID transactions very hard.

NoSQL databases have eventual consistency (“eventually all accesses to an item will return the last updated value”) . But eventual consistency is hardly acceptable in transactional applications like banking.
The Banking and Financial sector prefer SQL because of Data Integrity and complex querying available.

Partitioning is possible with recent updates of MySQL like distributions, however makes the system complex and harder to manage.

Schema:

SQL follows schema first approach, and all the columns have a specific type like string, timestamp, integer defined in dbs.

NoSQL is schema less. Hence when you insert any data in a document, it could be either a string, an integer, a timestamp, anything literally.

[{“amount”: 9}, {“amount”:”hello”}, {“amout”: 282827199273}]

If you make any error while working with datasets in NoSQL, it becomes trickier and may result in error, if you’re not careful. Errors in data modeling or data inconsistency can cost you a lot.
Every production application has models and validation checks before inserting any data in dbs.
Express.js enforces schema on top of your NoSQL application. Hence there is very less margin of error.

However do you think the schema pattern in SQL takes care of the validation of data at BD level as well, and returns you a proper error while querying ? I think so.

Normalization

This is a DB technique which eliminates data redundancy and Insert, Update, Delete anomalies. There are 5 rules for this — 1NF, 2NF, 3NF, 4NF, 5NF

RDBMSs/SQL have inherent support for normalization and allow data to be managed as a separate concern, whereas NoSQL DBMS do not inherently support normalization, and require the client applications to carefully maintain integrity.

Scaling

Dividing the data across multiple nodes is called scaling in terms of db.

Horizontal scaling means that you scale by adding more machines into your pool of resources while vertical scaling means you add more computing power(CPU or RAM) to an existing machine.

Horizontal scaling in SQL is not impossible, but it takes more effort because of the transaction management being a primary feature in SQL databases.
Transaction or Consistency is not a key feature of NoSQL, hence horizontal scaling is easy using Consistent Hashing.

SQL uses vertical scaling, NoSQL uses horizontal scaling.

Queries/Data retrieval :

Use case : Get all orders with user details .

Operation in SQL

Two tables in SQL — users, orders

Query — select userdetails, order from user, orders where user.id = orders.user_id

Single query fired ? Yes.

Same operation in NoSQL:

Get the orders based on userid

Get user details based on userid

Two queries fired ? Yes

It’s easier to manage and use SQL when we have related data distributed across tables. While working with related data where we need multiple joins, we can go ahead with SQL because we can write complex queries in SQL and the computation of the complex queries are faster in SQL. NoSQL is faster for simple queries like get , insert, filtering, ordering. It becomes hard to query and manage joins in NoSQL if we have related datasets stored in multiple collections.

FYI — From Mongo 3.2 we have $lookup to perform operations between two collections.

Archiving data

Archiving means one can store and access huge volumes of historical data for storage and analytical purpose.
Most of the time we use NoSQL for archiving data, as we know that we do not need the data to be normalized or ACID compliant.

Choosing a DB for analyzing depends on a lot of factors, for example the type of data one is analyzing, how much data one has and how quickly you need it. For example, relational DB is best suited for applications like user behavior analysis. If the data fits into a spreadsheet, then it is better suited for a SQL-type database such as Postgres, BigQuery; as relational databases are good at analyzing data in rows and columns. For semi-structured data like social media, texts or geographical data which requires large amounts of text mining or image processing, NoSQL type databases such as mongoDB, CouchDB works best.

Is SQL slow ?

Did you hear somewhere that NoSQL is faster ? If yes, no that is not the case.

The only reason people have this general misconception that SQL is slower is because they do not try to understand the reason. SQL schemas can run complex queries which can use many tables as part of the join. While with NoSQL we do not have the provision to run complex queries. SQL would perform the same way as NoSQL if you use simple queries over a specific table. Hence the reason people say NoSQL is faster is only because of complx queries which can run on a SQL server unlike NoSQL.

The best approach to understand which one is faster will completely depend on your business use case. What is the CRUD operation that you would want to perform ? What is the vendor or implementation you would choose when selecting the type of schema you want ? There will be a lot of questions like ‘Speed for what?’ What about type of applications in relation to more availability, clusters, partitions, archiving, requirement criticality?

NoSQL applications can perform all data access and modification using primary key-based operations in order to optimize for a NoSQL K/V store.

Advantages of SQL databases are that they are typically more performant when dealing with more complex queries.

Drawbacks :

SQL

SQL cannot be scaled horizontally.
SQL isn’t fault tolerant.
Schema is defined. DB Modification requires effort.
Not advised for application which need Petabytes of data to be stored.

NoSQL

Do not provide Transaction management or ACID support.
Schema less. Hence no validation at DB level.
Transactions are difficult to manage.
Can’t run complex join queries.
Cheaper

When to use what ? -SQL or NoSQL ?

Use SQL if your DB data can be fit into a single machine, without the requirement of scaling up.
Data Integrity is a key feature and you want your business to be ACID compliant. Use SQL based databases.
Use SQL if you have complex joins and related data. Complex queries are easier to write in SQL and computation is fast as well.
NoSQL seems to work better on both unstructured and unrelated data. If you’re not sure about schema, and you know that there could be multiple changes required in near future. Use NoSQL
Use NoSQL if you know you need to store huge data and you know that a single node will not be able to sustain the capacity.
Use NoSQL if most of the queries you have in your systems are filtering, conditioning, grouping or ordering; meaning when you have simple queries.
Use NoSQL if you have frequent modifications to DB and are not sure about schema design
Use SQL for transaction management use cases.
Use NoSQL for graph like data and real time applications like IOT, Gaming

Note : The better solutions are the crossover databases that have elements of both NoSQL and SQL. I have worked on multiple applications which had both a NoSQL and SQL flavor to handle the business requirements.

Conclusion

Please let me know in comments if you think any point needs to be seen from a different perspective. All the follows and claps are highly appreciated. Cheers!