Why nonrelational databases

Raffaello Ippolito
4 min read · Jul 3, 2023

When talking about databases, we usually think of the classic tables full of data. They are convenient, compact and, above all, familiar, even to people who do not work in IT. There are, however, other ways to store data, and this is where the split between relational and nonrelational databases arises.

We call a database relational if it follows a model made of one or more tables, where each table represents an entity, each column an attribute of that entity, and each row an occurrence of it. The various entities can then be linked together by relationships. This is, indeed, the relational model.

There are, however, a number of databases that do not follow the relational model and are therefore called nonrelational databases, the best known and most popular of which is probably MongoDB. There are different nonrelational technologies, but in this article we refer to the document store type, which is the one MongoDB belongs to. This type of database does not store data in tables but in documents; each document corresponds to an object written in JSON format (MongoDB stores it internally in a binary variant called BSON).
This difference in paradigm leads to a number of important differences. Using documents instead of tables allows much more flexible management of records: documents in the same collection do not all need to have the same fields, and in this sense the model can be said to be more adaptable. As a direct consequence, far more complex objects can be stored in a single document than in one row of a table; achieving the same level of complexity with tables requires several tables related to each other.

To make the concept clearer, let us try an example. Suppose we want to save information about the pupils of an elementary school. The school has a number of classes, and each class has a group of pupils. Since the pupil is a rather complex entity, with first name, last name, date of birth, grades in the various subjects, and so on, it is unthinkable to give the class table a column for each piece of information for each pupil, especially because the number of pupils is not fixed. It is therefore necessary to create a second table for the pupils and to add to its attributes the class they belong to, by means of a unique identifier (a key) that is used to perform a join operation between the tables. With MongoDB the join is not the usual way of working (although the $lookup aggregation stage can perform a left outer join between collections), nor is it necessary here: a single document can hold the class together with the whole list of its pupils and all their information.
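The contrast can be sketched in Python. This is only a minimal illustration under a few assumptions: the table and field names (`classes`, `pupils`, `class_id`, `grades`) and the sample pupils are invented for the example, the relational side uses the standard-library `sqlite3` module, and the document side is a plain dictionary serialized with `json`, standing in for a MongoDB document rather than calling MongoDB itself.

```python
import json
import sqlite3

# Relational side: two tables linked by a key (all names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE classes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pupils (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        class_id INTEGER REFERENCES classes(id)
    );
    INSERT INTO classes VALUES (1, '3A');
    INSERT INTO pupils VALUES (1, 'Anna', 'Rossi', 1), (2, 'Luca', 'Bianchi', 1);
""")

# A join is needed to read a class together with its pupils.
rows = conn.execute("""
    SELECT c.name, p.first_name, p.last_name
    FROM classes AS c
    JOIN pupils AS p ON p.class_id = c.id
    ORDER BY p.id
""").fetchall()

# Document side: the class and its pupils live in one nested object.
class_document = {
    "name": "3A",
    "pupils": [
        {"first_name": "Anna", "last_name": "Rossi", "grades": {"math": 8}},
        {"first_name": "Luca", "last_name": "Bianchi", "grades": {"math": 7}},
    ],
}

print(rows)
print(json.dumps(class_document, indent=2))
```

In the relational version the pupil rows and the class row are reassembled at query time; in the document version they are already stored together, and the list of pupils can have any length.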
It is not all plain sailing, though. Suppose the students are university students instead: here a problem arises. While at school each pupil belongs to only one class, at university a student may attend many courses, and saving each student's information in the document of every course they take leads to heavy duplication of data, which makes the system inefficient.
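A rough sketch of this duplication, again with plain Python dictionaries standing in for documents. The student, the course names and the field names are made up for illustration, and the sizes are simply the lengths of the JSON text, not what a real database would occupy on disk.

```python
import json

# One hypothetical student who attends three courses.
student = {"id": 1, "name": "Anna Rossi", "birth_date": "2003-05-14"}

# Embedding: every course document carries its own full copy of the student.
courses_embedded = [
    {"course": "Calculus", "students": [dict(student)]},
    {"course": "Physics", "students": [dict(student)]},
    {"course": "Statistics", "students": [dict(student)]},
]

# Referencing: the student is stored once and courses keep only the id.
students = {1: student}
courses_referenced = [
    {"course": "Calculus", "student_ids": [1]},
    {"course": "Physics", "student_ids": [1]},
    {"course": "Statistics", "student_ids": [1]},
]

copies = sum(len(course["students"]) for course in courses_embedded)
embedded_size = len(json.dumps(courses_embedded))
referenced_size = len(json.dumps(courses_referenced)) + len(json.dumps(students))

print("copies of the student:", copies)
print("embedded size:", embedded_size, "referenced size:", referenced_size)
```

Storing only the identifier removes the copies, but reading a course with its students then needs a second lookup, which is essentially the relational approach again.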

Nonrelational databases and Big Data

Big Data is often associated with nonrelational databases, and this is precisely because of their characteristics.
If you do not know what big data is or what its characteristics are, I invite you to read my previous article What is Big Data.

Among the Vs of big data we find variety: data come from different sources and can be structured in different ways. The flexibility of nonrelational databases allows us to save all this data, whereas a relational database with a fixed schema would reject records that do not fit it.
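To make the point concrete, here is a small sketch. It assumes a hypothetical `readings` table and invented sample records; SQLite (through the standard-library `sqlite3` module) plays the relational database, and a Python list of dictionaries stands in for a document collection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")

# A record with an attribute the schema does not know about is rejected.
try:
    conn.execute(
        "INSERT INTO readings (sensor, temperature, humidity) VALUES (?, ?, ?)",
        ("s1", 21.5, 40),
    )
    rejected = False
except sqlite3.OperationalError as error:
    rejected = True
    print("rejected by the relational schema:", error)

# A document collection accepts records with different shapes.
collection = []
collection.append({"sensor": "s1", "temperature": 21.5})
collection.append({"sensor": "s2", "temperature": 19.0, "humidity": 40})
collection.append({"source": "tweet", "text": "hello", "tags": ["a", "b"]})

print("documents stored:", len(collection))
```

The relational table could of course be altered to add the new column, but that is a schema change that has to be planned, while the document collection takes each record as it comes.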

Another V of big data is volume. The university students example above showed that using MongoDB can lead to higher storage usage, so why is it recommended for large databases? There are two reasons: scalability and speed of execution.
A relational database is traditionally scaled vertically, by increasing the performance of the server; spreading it across several servers is possible but usually harder. Nonrelational databases, in contrast, are designed to scale both vertically and horizontally, that is, by increasing the number of servers, which is usually less expensive.
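The idea behind horizontal scaling can be sketched as hash-based partitioning: each document is assigned to one of several servers according to a hash of its key. The function name `shard_for`, the three in-memory "servers" and the document ids are assumptions made for this illustration; it is not how MongoDB implements sharding internally.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the document key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three dictionaries stand in for three hypothetical servers.
shards = [dict() for _ in range(3)]

for i in range(1000):
    doc_id = f"student-{i}"
    shards[shard_for(doc_id, len(shards))][doc_id] = {"id": doc_id}

# Each server ends up holding only part of the data.
print([len(shard) for shard in shards])
```

With this naive modulo scheme, adding a server changes the shard of most keys, so real systems rely on more careful strategies; the sketch only shows how the load is divided.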
Regarding speed of execution, a document keeps related data together, so a query can often retrieve everything it needs in a single read, without joining several tables, and indexes help locate the documents of interest. The differences in execution time may seem almost imperceptible when working with small databases; it is as the volume increases that they become more and more noticeable.
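As a rough illustration of what an index does, the sketch below compares a full scan with a lookup through a precomputed map. The collection, the `last_name` field and the two function names are hypothetical; real database indexes are typically tree structures, and a Python dictionary is used here only as a simple stand-in.

```python
# A hypothetical collection of documents.
documents = [{"_id": i, "last_name": f"name-{i}"} for i in range(100_000)]


def find_by_scan(last_name):
    """Without an index: inspect every document."""
    return [doc for doc in documents if doc["last_name"] == last_name]


# Build the index once: field value -> positions of matching documents.
index = {}
for position, doc in enumerate(documents):
    index.setdefault(doc["last_name"], []).append(position)


def find_by_index(last_name):
    """With an index: jump straight to the matching documents."""
    return [documents[position] for position in index.get(last_name, [])]


print(find_by_index("name-99999"))
```

The scan does work proportional to the size of the collection, while the indexed lookup does not, which is why the gap grows with volume.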

To summarize: as usual, every application has its own appropriate technology stack, so it is difficult to say a priori what is best. But if you are dealing with big data, it is worth exploring the world of nonrelational databases!

Raffaello Ippolito

Italian software developer and data analytics student. Graduated in Mathematics for Engineering talking about Big Data and Image Processing