SQL vs NoSQL: An experienced Software Engineer’s perspective

Skrew Everything
From The Scratch
Published in
11 min readApr 13, 2020
Ok, I’m not that old 😅

There are 1000s of articles on “The difference between SQL vs NoSQL” and we don’t need more of those articles but still many people keep writing the same points in every articles. As a result, many beginners can recite the pros and cons like a parrot but tend to struggle while deciding when to use and which one to use.

The problem is not with the beginners but with those redundant articles which contribute no value.

The struggle to decide is inevitable even if you read 100s of articles more because everything is just a copy and paste of previous articles.

I’m not gonna repeat all those redundant points here. Instead, let’s look at the “SQL vs NoSQL” in a different perspective — A Senior Software Engineer’s perspective.

In this article, I’m gonna talk specifically about MySQL vs. MongoDB as they are the most popular ones.

So, When you should use MySQL?

MySQL is not just a database but a relational database. If you want to enforce some relations between the data, then you need a relational database i.e., MySQL.

Seems little vague right? Lets see what you can actually achieve from this relational databases that everyone keeps talking.

Consider a scenario where you have built a webapp where a User can comment on Posts.

This is the minimum schema for that table:

Visual Representation of the schema

So, what type of constraints we can think of?🤔

  1. userName should be unique for every user.
  2. Only users present in the Users Table can write a post
  3. Only users present in the Users Table can comment under the posts present in the Posts Table.

Can you guess a general flow for every table for inserting new rows?

For Users Table:

For Posts Table:

For Comments Table:

So, this is how the naïve approach is when there are no relationships in the MySQL.

We need to check all the constraints in our application server. And this is what you need to do in MongoDB and in MySQL(if you don’t create any constraints or relationships).

This is too much code to write, makes too many requests to database and the transfer of data between database server and application server. Also, there is a possibility of introducing bugs in the code and becomes hard to maintain the code if the complexity of the constraints and relationship increases.

Let’s see how to simplify this process and delegate this whole checking to the database server.

Just introduce some relationships into the schema and you are good to go

The above relationships states that:

  1. userName is unique(defined in the schema but you can’t see here visually)
  2. A user in the Users Table can post any number of Posts
  3. A user in the Users Table can Comment any number of times on any number of posts present in the Posts Table.

So, if a user is not present in the Users Table tries to post a Post, then MySQL automatically throws an Error and no insertion takes place and same thing happens with the Comments Table if either user or post is not found in their respective tables.

Now, instead of checking for whether an insertion can be done explicitly in the application server by you, MySQL takes care of that for you. So, you can just execute only one INSERT command instead of executing 2–3 commands for insertion operation.

Here is the schema used to define tables and relationships and constraints:

Consider another scenario where if the user deletes their account, we have to delete all their posts and comments from all the tables.

In MongoDB, you have to write a command to delete all the posts and comments with the userID, who deleted their account. If you make a mistake, there is a probability to delete the posts and comments with userID as NOT it’s owner. You might think “this is a simple operation and how can I screw up?”. This is just a simple example. In large production systems, you have 1000s of tables and you have to explicitly mention every one of them to delete it correctly. If you miss a single table/collection, then you start to have orphan records, which leads to data integrity problems.

In MySQL, while creating foreign keys, you can provide extra information on what to do if the parent record is edited/deleted. It is called “CASCADING

You can read more about here👇🏻👇🏻👇🏻

Now, you can delegate this responsibility to the server, like what to do when you update and what to do when you delete the parent record. This helps to reduce the bugs or more accurately prevents us from introducing unnecessary bugs. This will in turn helps us to maintain data integrity.

A new thing to remember for you,

MySQL Database actually holds business logic in them

If anyone says “Databases are not supposed to have business logic”, they are wrong or they don’t understand the concept of relational databases. So, next time, hit them with this explanation.

So, When you should use MongoDB?

There are use cases for MongoDB where it excels and where we don’t need the overhead(we will talk about this below) of a SQL.

MongoDB can be used for storing Sessions data/notification systems etc…

In MongoDB, there are no relationships, no normalization, no schema.

So, to store the data of a Session, which doesn’t require any relationships, doesn’t require normalization and schema can change based on what type of data you want to store and there is no need to parse the retrieved data(in JavaScript) as the data returned from the MongoDB is JSON. The same concept applies for creating a notification system where a document is a userID and fields are numbered sequentially for each notification or can be saved in an array. You can just query the MongoDB for new notifications using userID and display them directly, without any processing.

Consider a scenario where you are storing the Session data in the database and like every other session, you need to delete those records after some time for the expiration of the Session.

Can you guess a general flow for deleting the existing expired records?

You need to write a background program or write a function which needs to run for every “n” seconds.

But in MongoDB, you can delegate this to the Database Server. The only work you have to do is to specify the expiry date and MongoDB takes care of it automatically.

You can read more about this functionality here 👇🏻👇🏻👇🏻

Consider another scenario where you have to store logs for the last 7 days and keep deleting the old data and if the logs exceed storage limits, the logs need to overwrite the old data to store new data.

Can you guess a general flow for this?

This is how you need to do if you are using MySQL. But if you are using MongoDB, you can use the combination of “Capped Collections” and “TTL” to do all this for you without writing a single line of code.

Capped Collections:

Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order. Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

Insertion Order:

Capped collections guarantee preservation of the insertion order. As a result, queries do not need an index to return documents in insertion order. Without this indexing overhead, capped collections can support higher insertion throughput.

Automatic Removal of Oldest Documents:

To make room for new documents, capped collections automatically remove the oldest documents in the collection without requiring scripts or explicit remove operations.

Consider the following potential use cases for capped collections:

Store log information generated by high-volume systems. Inserting documents in a capped collection without an index is close to the speed of writing log information directly to a file system. Furthermore, the built-in first-in-first-out property maintains the order of events, while managing storage use.

Cache small amounts of data in a capped collections. Since caches are read rather than write heavy, you would either need to ensure that this collection always remains in the working set (i.e. in RAM) or accept some write penalty for the required index or indexes.

Let’s talk about the overheads that everyone keeps mentioning without any explanation

MySQL has more overhead than MongoDB. If you still haven’t figured that for yourself by now, read the article again.

Most of the Software Engineers whom I interviewed always say one thing about why MongoDB is faster than MySQL

If you are a developer, then most probably you know that the most of the CPU time is wasted on waiting for I/O. So, most of them I interviewed gave me the same answer: “MongoDB performs faster I/O and that’s why it is faster than MySQL

The thing most of them don’t realise is, both of them are open-source projects (GPL) and if MongoDB is implementing a new faster strategy for I/O, then nothing is stopping MySQL team from pulling that code and pasting it in MySQL.

So, why insertions are slower in MySQL than in MongoDB?

Why MySQL is slower than MongoDB is because: MySQL has to check for a lot of rules before inserting, deleting and updating the data. This slows down the MySQL a lot for insertions. Look at the above flowcharts for yourself. For MongoDB, we are doing all the checks explicitly and then inserting into the database. Most of the benchmarks present on the internet doesn’t log the time during the checks we were performing. They only log the time needed for insertions.

But in MySQL, the database server performs all the checks implicitly based on our constraints and foreign keys before inserting into the database. And if you haven’t guessed it, benchmarks measure all these timings.

In MongoDB, an insertion command → Inserts the data

In MySQL, an insertion command → Check all the rules and insert only if it satisfies all those rules.

This is why MySQL has more overhead than MongoDB and effects the insertion throughput.

If you actually log only the time taken to insert into the databases with more complex relations like in real world instead of dumping data, you see that both are not so apart in benchmarks.

So, why retrieving data is slower in MySQL than in MongoDB?

In MongoDB, all the data of an entity is stored in a single document. So, if you want a particular data about a certain entity, then MongoDB uses Indexes to find the location of the document in the filesystem and retrieves that data. The document is stored in one location, so it is easy to retrieve that in one go.

In MySQL, all the data of an entity is stored in different tables. So, if you want a particular data about a certain entity, then MySQL uses many Indexes to find the location of the record in the filesystem and retrieves that data which is scattered around the filesystem and requires to visit different parts of the filesystem.

This slows down the MySQL greatly when compared to MongoDB.

So, what’s the summary?

MySQL is not slower than MongoDB without any advantages and MongoDB is not the database to choose for every project just because you see a bunch of articles claiming MongoDB is faster than MySQL. You also need to look at why MySQL is slower than MongoDB at what costs.

Let me summarise for you for the last time:

MySQL is slower because of the normalization, foreign keys, constraints, associations(relationships between tables) etc but reduces programmer bugs to maintain data integrity perfectly and MySQL does all the heavy lifting behind the scenes. But MySQL requires heavy optimisation from your side to perform fast and efficiently.

MongoDB is faster because there is no normalization(everything is stored in one place), foreign keys, associations etc but the programmer has to take care to maintain the data integrity. And to make sure performance doesn’t degrade with large data, you have to denormalize the data and store the same data in multiple collections or documents and if you want to change the data in one location, you have to programatically update all the locations(if you miss any location by chance, then your data will be inconsistent and data integrity is 💨💨) and let’s not forget about the increase in storage size.

A suggestion in choosing your next database for your personal project

Honestly, it doesn’t matter. The traffic you are expecting is too small to notice any difference between MySQL or MongoDB.

Remember, Twitter and Facebook still uses MySQL for their parts of databases. If they can use it, then you can use it without worrying about the performance in the initial stages of your projects.

So, don’t stress on which one to use. The more important thing is your IDEA. Focus more on that than on choosing types of servers, frontend, backend and database.

Just use whatever you know in your project. If you need an improvement to meet new traffic, then you will be obviously in a position to hire someone to do it from the revenue you get from the traffic.

So, don’t stress. Use whatever you know and improve it with time when the need comes.

Liked it? Then 👏🏻👏🏻👏🏻 it and share it!

--

--

Skrew Everything
From The Scratch

A wannabe artist 👨‍🎨, but can’t draw 😫. A wannabe athlete 🏃‍♂️,but can’t run 🥵.Found my peace with coding 👨‍💻 and writing ✍️. Twitter.com/SkrewEverything