Antipatterns of using NoSQL

(by Daniel Podolsky, CTO of inCaller)

inCaller
Aug 3, 2016

NoSQL is not a product, not a technology, not even a concept. It is not even a design approach. It is more of a statement renouncing several design patterns that dominated client-server development for years.

I have been in the IT business since 1990, and since 2000 I have been advising various internet startups on building efficient and secure server systems. I currently hold the CTO position at inCaller (www.incaller.org), a unique mobile app that boosts phone call efficiency by adding metadata (text or stickers) directly to the standard call.

Now then. From my personal experience, I can say with confidence that NoSQL is a very powerful tool for boosting the performance of overloaded systems and reducing their overall maintenance costs. And trust me, there are people on this Earth who make quite a profit by using NoSQL in their projects, or who at the very least reduce their losses.

The power of NoSQL is rooted in rejecting DBMS features that are redundant for the task at hand. Using NoSQL correctly in a project requires a shift in the design paradigm: relaxing, or perhaps completely abandoning, the consistency requirements that a traditional DBMS imposes and that the application code is built to support.

Deciding to avoid NoSQL is not cowardice, merely caution.

If you do choose NoSQL, the next question is which NoSQL DBMS suits your goals. The LIST OF NOSQL DATABASES at http://nosql-database.org/ contains more than 225 entries, and simply reading it is quite a commitment. For me, the real choice comes down to Aerospike versus Cassandra.

In truth, though, the choice is not so much between Aerospike and Cassandra as between the read-optimized and write-optimized designs these two products represent.

Write-optimized DBMSs, the most mature of which, in my view, is Cassandra, are very quick at (surprise!) writing data to disk. This is achieved primarily by writing the data to the nearest available area of the disk. This statement isn’t 100% accurate, but it’s fairly close to the truth.

Naturally, with this approach, finding the data on read is no longer a trivial task: a lookup will most likely take more than one step, and some of those steps will actually read from disk.
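The write path described above resembles a log-structured design. Cassandra, for example, appends each write to a commit log, buffers it in an in-memory memtable, and later flushes the memtable to an immutable SSTable file. The toy model below is a minimal sketch under simplifying assumptions (hypothetical class and parameter names, no commit log, no Bloom filters), not Cassandra's code. It shows why reads get harder: a key may live in any of several flushed segments, so a lookup may have to probe them one by one, newest first.

```python
class ToyLogStore:
    """Toy log-structured store: data is never updated in place."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}        # recent writes, held in memory
        self.segments = []        # immutable flushed segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write is only an in-memory insert; no disk seek is needed.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Dump the memtable as a new sorted, immutable segment.
        # Older segments are left untouched, even if they hold stale values.
        self.segments.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        """Return (value, number_of_segments_probed)."""
        if key in self.memtable:
            return self.memtable[key], 0
        probed = 0
        # The newest segment holds the latest value, so search from the end.
        for segment in reversed(self.segments):
            probed += 1
            if key in segment:
                return segment[key], probed
        return None, probed


store = ToyLogStore(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 10), ("e", 5)]:
    store.put(k, v)               # three segments get flushed
print(store.get("a"))             # (10, 1)
print(store.get("b"))             # (2, 3)
```

Real systems use Bloom filters and per-file indexes to skip most segments, but a read can still end up touching several files.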

To remedy this, Cassandra periodically compacts its data files, but even that does not help much. Cassandra is quick at writing, and a finely tuned Cassandra is even quicker, but reading, especially alongside heavy writing, can be a little “laggy”.
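Compaction can be sketched as merging segments so that each key is stored only once, with the newest value winning. This is a minimal sketch under simplifying assumptions: it ignores deletes (tombstones), file I/O, and the different compaction strategies real systems offer. It also hints at why compaction is only a partial remedy: it rewrites data that is already on disk, competing with reads and writes for I/O, while new writes keep producing new segments.

```python
def compact(segments):
    """Merge segments (oldest first) into one sorted segment; newer values win."""
    merged = {}
    for segment in segments:          # later segments overwrite earlier ones
        merged.update(segment)
    # After compaction a lookup needs to probe at most one segment.
    return [dict(sorted(merged.items()))]


segments = [{"a": 1, "b": 2}, {"c": 3}, {"a": 10}]
print(compact(segments))              # [{'a': 10, 'b': 2, 'c': 3}]
```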

If that does not cut it for you, there’s always a read-optimized DBMS, such as Aerospike.

Aerospike is not very popular within the community, largely because it only became available to a wide range of developers two years ago. Until then, it was a purely commercial product with a price that was steep, to say the least. There was a free version, of course, but its limitations were such that it never occurred to anyone to use it in an actual project.

This is now well in the past, however: today Aerospike is open source and has a free Community edition. If you are curious, compare the paid and free versions at http://www.aerospike.com/products-pricing/ and you will see that the free version now has everything needed for serious projects.

But I digress. Aerospike is a read-optimized DBMS that simply works. You create a cluster, load your data, and use it. Among all the read-optimized clustered DBMSs the author has had the chance to work with (and there have been more than ten of them), Aerospike provides the best performance, the best scaling, and the best monitoring and management tools.

The performance is particularly impressive. It is achieved by the following means:

  • The index used for lookups is always kept in RAM.
  • Given a key, this index unambiguously identifies the server, the disk, and the location on that disk where the data resides, so the data can be read in a single step.
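The two-step lookup described above can be modeled roughly as follows. This is a simplified sketch, not Aerospike's implementation. Aerospike hashes each key into a 20-byte RIPEMD-160 digest and maps it to one of 4096 partitions; here SHA-1 stands in for RIPEMD-160 because it is always available in Python's hashlib, the partition is taken from 12 bits of the digest as a simplification, and the round-robin partition map, the Node class, and the device name are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike splits each namespace into 4096 partitions


def digest(set_name: str, key: str) -> bytes:
    # Stand-in: SHA-1 (20 bytes) instead of Aerospike's RIPEMD-160.
    return hashlib.sha1(f"{set_name}:{key}".encode()).digest()


def partition_id(d: bytes) -> int:
    # Simplification: 12 bits of the digest select one of 4096 partitions.
    return int.from_bytes(d[:2], "little") & (N_PARTITIONS - 1)


class Node:
    def __init__(self, name):
        self.name = name
        self.index = {}    # in-RAM index: digest -> (device, offset)
        self.devices = {}  # device -> {offset: record}, stands in for disks

    def write(self, d, record, device="/dev/sdb"):  # hypothetical device name
        storage = self.devices.setdefault(device, {})
        offset = len(storage)          # hypothetical: next free slot
        storage[offset] = record
        self.index[d] = (device, offset)

    def read(self, d):
        # One in-memory index lookup yields the exact disk location,
        # so a single "disk" read returns the record.
        device, offset = self.index[d]
        return self.devices[device][offset]


nodes = [Node("n0"), Node("n1"), Node("n2")]
# Hypothetical partition map: each partition is owned by exactly one node.
partition_map = {p: nodes[p % len(nodes)] for p in range(N_PARTITIONS)}


def put(set_name, key, record):
    d = digest(set_name, key)
    partition_map[partition_id(d)].write(d, record)


def get(set_name, key):
    d = digest(set_name, key)
    return partition_map[partition_id(d)].read(d)


put("users", "alice", {"calls": 3})
print(get("users", "alice"))  # {'calls': 3}
```

The key point is that neither step involves a search: hashing picks the node, and the in-memory index picks the disk location.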

This technique does not noticeably hurt writing either: Aerospike delivers quite impressive write performance as well.

However, if you run out of RAM, Aerospike will stop accepting writes, even if there is free space left on the disk, because every record needs an entry in the in-memory index.
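A quick calculation shows why RAM, not disk, is often the limit. Aerospike's primary index uses 64 bytes of RAM per record, per replica, so one billion records stored with two copies need 128 GB of RAM across the cluster just for the index, regardless of record size. The admission check below is a hypothetical single-node sketch: the function names and the 0.9 threshold are illustrative assumptions, not Aerospike configuration. A write is refused once the index would cross the RAM threshold, even when the disk has plenty of room.

```python
INDEX_ENTRY_BYTES = 64  # RAM per record, per replica, in Aerospike's primary index


def index_ram_bytes(records: int, replication_factor: int = 1) -> int:
    """RAM the in-memory index needs for the given number of records."""
    return records * INDEX_ENTRY_BYTES * replication_factor


def accepts_write(records: int, ram_budget: int, disk_free: int,
                  record_size: int, stop_fraction: float = 0.9) -> bool:
    """Hypothetical admission check for one node (replication ignored)."""
    # Every new record needs one more index entry in RAM...
    if index_ram_bytes(records + 1) > stop_fraction * ram_budget:
        return False                  # ...so writes stop once RAM runs short,
    return disk_free >= record_size   # no matter how much disk is left.


# Worked example: one billion records, two copies of each in the cluster.
print(index_ram_bytes(1_000_000_000, replication_factor=2))  # 128000000000
# A terabyte of free disk does not help once the index budget is used up.
print(accepts_write(records=1_000, ram_budget=64_000,
                    disk_free=10**12, record_size=1_000))     # False
```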

Likewise, if the designated storage area runs out of contiguous free space, Aerospike will stop writing new data until defragmentation and compaction free some up. This procedure starts automatically but runs at low priority, so as not to disrupt reads.
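The "contiguous free space" problem can be illustrated with a toy device made of fixed-size write blocks, where new data always goes into a completely free block and updates leave dead space behind in old ones. Everything here is an assumption for illustration: the block size, the 50% low-water threshold, and the class itself. In Aerospike, defragmentation runs automatically in the background at low priority, as described above, rather than as the synchronous call modeled here.

```python
import math

BLOCK_SIZE = 100  # hypothetical size of one write block, arbitrary units


class Device:
    """Toy model of a device that only writes into completely free blocks."""

    def __init__(self, n_blocks):
        self.live = [0] * n_blocks       # live (still referenced) units per block
        self.in_use = [False] * n_blocks

    def free_block_count(self):
        return self.in_use.count(False)

    def free_bytes(self):
        # Dead space scattered inside partially used blocks counts as free here...
        return sum(BLOCK_SIZE - live if used else BLOCK_SIZE
                   for live, used in zip(self.live, self.in_use))

    def write_block(self, live_units):
        # ...but new data is only ever written into a completely free block.
        for i, used in enumerate(self.in_use):
            if not used:
                self.in_use[i] = True
                self.live[i] = live_units
                return True
        return False                     # no contiguous free block: writes stop

    def invalidate(self, block, units):
        # An update or delete leaves the old copy behind as dead space.
        self.live[block] -= units

    def defrag(self, low_water=0.5):
        # Reclaim blocks whose live fraction fell below the threshold and
        # repack their live data into as few blocks as possible.
        sparse = [i for i, used in enumerate(self.in_use)
                  if used and self.live[i] < low_water * BLOCK_SIZE]
        moved = sum(self.live[i] for i in sparse)
        for i in sparse:
            self.in_use[i] = False
            self.live[i] = 0
        for _ in range(math.ceil(moved / BLOCK_SIZE)):
            chunk = min(BLOCK_SIZE, moved)
            self.write_block(chunk)
            moved -= chunk
        return len(sparse)


dev = Device(n_blocks=4)
for _ in range(4):
    dev.write_block(BLOCK_SIZE)      # fill the device completely
for b in range(4):
    dev.invalidate(b, 70)            # updates leave each block 70% dead
print(dev.free_bytes())              # 280
print(dev.write_block(BLOCK_SIZE))   # False
print(dev.defrag())                  # 4
print(dev.free_block_count())        # 2
print(dev.write_block(BLOCK_SIZE))   # True
```

In this run the device has 280 units of free space, yet it cannot accept a single new block until defragmentation repacks the live data.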

These, together with a couple of less obvious reasons, make Aerospike a read-optimized DBMS, and, in the author’s humble opinion, the read-optimized DBMS of choice ☺

Getting back to product selection: you should weigh this distinction between read optimization and write optimization. Generally speaking, in theory and in practice alike, you will end up choosing between Cassandra and Aerospike, especially if you run a performance test.
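A performance test does not need to be elaborate to be informative. The sketch below times a mixed read/write workload against any store exposed as a pair of get/put callables. The function name and its parameters are hypothetical, and the plain Python dict is only a stand-in, to be replaced by calls into the client library of each DBMS under evaluation. Varying read_ratio is one way to see whether a candidate behaves as read-optimized or write-optimized on your own workload.

```python
import random
import time


def run_mixed_benchmark(get, put, n_ops=10_000, read_ratio=0.8,
                        key_space=1_000, seed=42):
    """Time a random mix of reads and writes against get/put callables."""
    rng = random.Random(seed)            # fixed seed: repeatable workload
    for k in range(key_space):           # preload so reads hit existing keys
        put(f"key{k}", b"x" * 100)
    reads = writes = 0
    start = time.perf_counter()
    for _ in range(n_ops):
        key = f"key{rng.randrange(key_space)}"
        if rng.random() < read_ratio:
            get(key)
            reads += 1
        else:
            put(key, b"y" * 100)
            writes += 1
    elapsed = time.perf_counter() - start
    ops_per_sec = n_ops / elapsed if elapsed > 0 else float("inf")
    return {"reads": reads, "writes": writes, "ops_per_sec": ops_per_sec}


store = {}  # stand-in for a real DBMS client
print(run_mixed_benchmark(store.get, store.__setitem__))
```

Absolute numbers from a dict mean nothing; what matters is the comparison between real candidates on realistic hardware and data volumes.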

You may ask why we at inCaller pay so much attention to DBMS performance and spend so many valuable human and computing resources researching it.

My answer is quite simple: we are a startup building a high-load, fault-tolerant system.

Being a startup means that we must put our ideas into action quickly, shortening the cycle from idea to implementation to presentation to the client.

High load and fault tolerance mean that we need distributed data processing, horizontal scaling, and interchangeable system nodes.
