A chance for NewSQL databases

Raul Gualda Calderon
Published in Packlink Tech
6 min read · Mar 31, 2020

There was a time when relational databases seemed to be the only place to store our data. They were first proposed in the 70s and have been with us ever since. Decades of evolution have given vendors time to build robust and powerful databases with great features, providing developers with solid systems to store their data.

But of course, relational databases have their drawbacks. In the search for an alternative that solves those problems, an old concept became popular: NoSQL databases. Although this kind of database has existed since the 60s, it was not until the late 2000s that the term NoSQL took on its current meaning and these databases started to become widely popular.

Let’s see what kind of problems developers tried to solve with NoSQL databases and also what kind of problems NoSQL has brought.

CAP Theorem

CAP is an acronym for Consistency, Availability and Partition tolerance. But what does this mean?

  • Consistency: Every read returns the most recent write or an error.
  • Availability: Every request receives a non-error response, but with no guarantee that it reflects the most recent write.
  • Partition tolerance: No matter what kind of communication problem we have between our nodes (lost or delayed messages), the system continues operating.

It is easy to see that there are some contradictions in these statements:

If you want to be consistent you cannot always be available.

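The theorem states that a distributed system can provide at most two of these three guarantees at the same time. Since communication failures between nodes cannot be ruled out, partition tolerance is not really optional, so the practical choice is between consistency and availability. Here is where we can make another classification for databases: AP vs CP.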

  • AP databases: This is where most of the NoSQL databases can be classified. They sacrifice data consistency in order to achieve availability. After a request they will always give us a non-error answer, but we cannot trust that this data has the most recent values.
  • CP databases: This is where most of the Relational databases can be classified. We can trust that the data they return is always the latest write, but we could also receive an error.
Source: https://jvns.ca/images/drawings/cap.png
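
To make the AP vs CP trade-off more tangible, here is a minimal sketch in Python. It is a toy model, not how any real database is implemented: `ToyCluster`, its `mode` argument and the `partitioned` flag are names invented for this illustration, and the cluster holds a single value on two replicas.

```python
class ToyCluster:
    """Two replicas of one value; `partitioned` simulates a broken link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one slot per replica
        self.partitioned = False

    def _check_available(self):
        # A CP system prefers an error over a stale or divergent answer.
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot reach the other replica")

    def write(self, node, value):
        self._check_available()
        if self.partitioned:
            self.values[node] = value      # AP: accept locally, replicas diverge
        else:
            self.values = [value, value]   # healthy link: replicate everywhere

    def read(self, node):
        self._check_available()
        return self.values[node]


ap, cp = ToyCluster("AP"), ToyCluster("CP")
for cluster in (ap, cp):
    cluster.write(0, "v1")
    cluster.partitioned = True   # the link between the replicas goes down

ap.write(0, "v2")
print(ap.read(0), ap.read(1))    # v2 v1  (node 1 answers, but with stale data)

try:
    cp.read(1)
except RuntimeError as error:
    print("CP:", error)          # the CP cluster returns an error instead
```

During the partition the AP cluster keeps answering on both nodes but the answers disagree, while the CP cluster refuses to answer at all. That is exactly the choice the theorem forces on us.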

ACID transactions

One of the most important features that relational databases have is transactions. The term ACID comes from:

  • Atomicity: Transactions are usually composed of multiple statements. Atomicity guarantees that all these statements are executed as a single unit. This means that either all of them are executed or none are executed.
  • Consistency: A transaction must bring the database from one valid state to another valid state. There cannot be inconsistent data even if the transaction fails.
  • Isolation: In databases we often have concurrent transactions. Isolation guarantees that no matter how many transactions are being executed in parallel, the database will end up in the same state as if all of them had been executed sequentially.
  • Durability: Once a transaction has been committed, it will remain committed even after a system failure. Committed data cannot live only in volatile memory.
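
Atomicity and consistency are easy to see in action. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the `open_demo_db` and `transfer` helpers and the sample balances are assumptions made up for this example. The `CHECK` constraint defines what a "valid state" is, and the transaction makes the two updates succeed or fail together.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with two sample accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = open_demo_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails on the debit, and the credit that had already been applied to the other account is rolled back with it, so the balances printed on the second line are the same as on the first.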

Can a NoSQL database have ACID transactions?

Most NoSQL databases are AP databases: having sacrificed consistency, they make it almost impossible to achieve the properties described above.

There is a huge debate about this question. Some people claim that ACID does not contradict the concept of NoSQL, because consistency can be achieved over time. This has given rise to a new concept: eventual consistency. It means that after a certain amount of time, all data in the database becomes consistent. Although this is true, the main problem with this approach is that we cannot be sure WHEN the database is consistent.

This is the reason why the BASE concept was created (in chemistry, a base is the opposite of an acid). BASE stands for:

  • (B)asically (A)vailable: every read or write will use all the nodes available to get the maximum level of consistency, but without the guarantee that full consistency was achieved.
  • (S)oft state: Because consistency can only be achieved over time, we can only give the probability that the current state is consistent.
  • (E)ventually consistent: If we wait long enough, the system will eventually become consistent.
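
Eventual consistency can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Node` class and the `gossip` function are hypothetical names, conflicts are resolved with a "last write wins" rule based on a logical timestamp, and real databases use more elaborate mechanisms.

```python
class Node:
    """A replica that keeps, for each key, the newest (timestamp, value) seen."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def gossip(a, b):
    """Exchange state so both nodes hold the newest version of every key."""
    for key in set(a.data) | set(b.data):
        versions = [n.data[key] for n in (a, b) if key in n.data]
        newest = max(versions, key=lambda entry: entry[0])
        a.data[key] = b.data[key] = newest


n0, n1, n2 = Node("n0"), Node("n1"), Node("n2")
n0.write("status", "draft", timestamp=1)   # accepted by n0 only
n2.write("status", "final", timestamp=2)   # accepted by n2 only

# Soft state: the answer depends on which node we happen to ask.
print([n.read("status") for n in (n0, n1, n2)])   # ['draft', None, 'final']

# A few rounds of gossip later, every replica agrees.
gossip(n0, n1)
gossip(n1, n2)
gossip(n0, n1)
print([n.read("status") for n in (n0, n1, n2)])   # ['final', 'final', 'final']
```

Every write is accepted immediately (basically available), reads between gossip rounds may disagree (soft state), and the replicas converge once enough rounds have run (eventually consistent). What the sketch cannot tell a client is how many rounds are "enough", which is the WHEN problem mentioned above.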

SQL language

This is the main language used to query Relational Databases. It has been with us since the 70s and it has become a standard for any Relational Database. Developers have been mastering this language for decades. Moving from one vendor to another is easy because, with some differences, the query language is very similar.

Why did some NoSQL databases decide to create their own query language?

I cannot find a good answer for this, but in my opinion this decision was a bad idea. A new query language was NOT required to create this kind of database. SQL has proven to be a very powerful language that allows developers to retrieve the data they want. It might need some enhancements in order to support the new features provided by NoSQL databases, so let’s improve SQL, but there is no need for a new language.

NewSQL databases as an alternative

In the previous sections we have seen that although NoSQL databases resolve some of the problems that Relational Databases have, they also introduce new ones. It seems that NoSQL databases are creating as many problems as they are solving.

But is there any other option? Yes! NewSQL databases are here, trying to solve those problems and giving us interesting new features.

The term NewSQL was first used in 2011. The idea behind these databases is to keep the ACID guarantees that Relational Databases have while adding the scalability that NoSQL databases provide.

These are some of the new features that they offer when compared to Relational Databases:

Distributed database across several nodes

Data is replicated across several nodes. If one node fails, the database keeps responding. But this is not the classical master/slave pattern. In NewSQL databases there is no master node, so you can send a request to any node and it will find a way to give you a response.
Although data is replicated, it doesn’t mean that each node has a copy of everything. Only a configured number of nodes store a copy of each piece of data. This way of storing data gives us the opportunity to offer a new feature: Geo-partitioning.
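
As a rough illustration of "only a configured number of nodes store a copy", the sketch below picks the replicas for a key by hashing it. This is an assumption made for the example, not the placement algorithm of any particular NewSQL product; `replica_nodes`, `replication_factor` and the node names are hypothetical.

```python
import hashlib


def replica_nodes(key, nodes, replication_factor=3):
    """Deterministically pick `replication_factor` distinct nodes for `key`."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take consecutive nodes starting at the hashed position, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
for key in ("order:1001", "order:1002", "customer:42"):
    print(key, "->", replica_nodes(key, nodes))
```

Because the function depends only on the key and the list of nodes, every node can compute the same answer. That is one simple way a node that does not hold the data could still know where to forward a request, without any master node being involved.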

Geo-partitioning

Table partitioning is a well-known technique to improve query performance. The idea behind it is to divide a large table into smaller parts without having to create separate tables for each part. With this idea in mind, let’s go one step further and try to exploit the distribution of data across nodes.

We can set up our database to have nodes physically located in different countries. We can configure partitions that store data in nodes physically near where the data is going to be consumed. As long as we query for data that is stored in nodes near us, we will get good performance. But if for any reason we need to query the whole dataset, we can do it, obviously with lower performance. So rather than having a different database for each country/region, we can have a single database without degrading the performance of local queries.
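
The routing idea can be sketched as follows. Everything here is hypothetical: the `GeoPartitionedTable` class, the region names and the country-to-region mapping are invented for the illustration, and real NewSQL databases configure this through their own partitioning settings rather than application code.

```python
class GeoPartitionedTable:
    """One logical table whose rows live in the region that owns their country."""

    def __init__(self, partition_region):
        self.partition_region = partition_region  # country code -> region name
        self.regions = {region: [] for region in set(partition_region.values())}

    def insert(self, row):
        region = self.partition_region[row["country"]]
        self.regions[region].append(row)

    def query_country(self, country):
        """Local query: only the region owning the partition is contacted."""
        region = self.partition_region[country]
        rows = [r for r in self.regions[region] if r["country"] == country]
        return rows, [region]

    def query_all(self):
        """Global query: still possible, but every region must be contacted."""
        contacted = sorted(self.regions)
        rows = [r for region in contacted for r in self.regions[region]]
        return rows, contacted


table = GeoPartitionedTable({"ES": "eu-west", "FR": "eu-west", "US": "us-east"})
table.insert({"id": 1, "country": "ES"})
table.insert({"id": 2, "country": "US"})
table.insert({"id": 3, "country": "FR"})

rows, contacted = table.query_country("ES")
print(len(rows), contacted)   # 1 ['eu-west']

rows, contacted = table.query_all()
print(len(rows), contacted)   # 3 ['eu-west', 'us-east']
```

The application sees a single table, yet a query for Spanish rows never leaves the European nodes. Only the global query pays the cost of contacting every region.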

Are NoSQL databases so bad?

NoSQL databases are great and they have many interesting features that no other kind of database can provide. Reading this article, it might seem that NoSQL databases are a bad choice. But don’t get me wrong: that was not the purpose of the article.

NoSQL databases are good for some specific use cases but they are not general purpose databases. They have some drawbacks that make them a bad choice for many use cases. And that is what I wanted to point out with this article.

Conclusion

If you are looking for an alternative to traditional Relational Databases, please give NewSQL databases serious consideration.
