Cassandra turned my world upside down

Jose Luis Cahisa
Virtualmind
Nov 2, 2018

For those of us who went to college before the new millennium began to roll, NoSQL-related technologies sounded a lot like the hippie movement: a counterculture, youth, and anti-establishment phenomenon, promoting some kind of “free love” with no “relationships”.

These new kids on the block were leaving aside our Sacred “4 Commandments” (or 5 or 6 depending on how “religious” you are) of Database Normalization. For a long time, those commandments were our guide through the darkness of data chaos. How dare they!!!

Well, it turned out that the chaotic world of data, which required our expertise and knowledge to keep neat, sharp and handy, was a little ant… compared with the herd of elephants that our Web 2.0 friends had to face. And some of the basic rules of database design have changed so much that we were forced to rethink many of the things we thought we knew.

My father used to write variable names with one letter to reduce the size of the program in memory. I still remember trying to read his code and then having some sleepless nights afterwards. Software development is always a trade-off. We are always looking for the right balance between storage, memory consumption, and processing power. When relational databases appeared, storage space was expensive compared with processing power. The volume of data to store was significant, so avoiding data duplication made a lot of sense.

These days, the cost of data storage has dropped significantly, internet speed has increased, and the amount of data to be processed means we have to care much more about processing power.

A couple of things are worth mentioning. I’m not an expert in Cassandra or NoSQL; this article is about my honest first impressions on being exposed to these technologies.

I also know that, at this moment, there are many different NoSQL technologies. This article is specifically about Cassandra. And to be even more specific, because Cassandra also has different versions, I will be talking about 2.1 and above.

All this preamble was to introduce this list of things that blew my mind when I started this journey to NoSQL databases. The first thing I want to share with you is that everything is upside-down in Cassandra… well, it turned my mind upside down at least ;-). The more I read, the more I find it is the opposite world of relational database design.

1. Data Duplication is good

If you want to turn the relational world upside-down, this is the place to start, because we are not talking about a little de-normalization, such as having three telephone columns in a row to avoid creating the additional table we should have.

We are talking about duplication of many different fields, including customer names, titles, dates, anything! It is better to duplicate than to join data (which we actually can’t).

It’s even considered good practice to have two different tables containing the same information (the same fields) but with a different primary key structure, to satisfy different queries.
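To make the idea concrete, here is a toy sketch in plain Python, not real Cassandra code. Two dictionaries stand in for two tables that hold the same fields under different keys; the names `videos_by_user`, `videos_by_tag` and `add_video` are made up for illustration.

```python
from collections import defaultdict

# Two toy "tables" holding the same fields, each keyed for one query.
# Plain dicts stand in for Cassandra tables; all names are hypothetical.
videos_by_user = defaultdict(list)   # keyed by user_id
videos_by_tag = defaultdict(list)    # keyed by tag


def add_video(user_id, tag, title):
    """Store the same row twice, once per query it has to answer."""
    row = {"user_id": user_id, "tag": tag, "title": title}
    videos_by_user[user_id].append(dict(row))  # a copy, not a shared reference
    videos_by_tag[tag].append(dict(row))


add_video("ana", "cassandra", "Intro to partitions")
add_video("ana", "python", "Dict tricks")
add_video("bob", "cassandra", "Tuning compaction")

# Each question is answered by a single lookup in "its" table.
titles_for_ana = [r["title"] for r in videos_by_user["ana"]]
titles_for_cassandra = [r["title"] for r in videos_by_tag["cassandra"]]
```

Every video is stored twice, and that is the point: each query gets a structure built just for it.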

2. Create your model based on queries

When designing a relational database, we are thinking about the questions we will ask the database, of course. But at the moment of creating the tables, we are modeling entities and their relationships. We trust that a well-designed database will be able to answer many different questions, and we will be able to create many different reports based on that data. As long as the data is stored, we will be able to create any query we need.

One of the data modeling principles (repeated over and over in Cassandra training) is “Know your queries”. Every table is designed around a specific query we are planning to run. This development process is known as “Query Driven Development” (QDD). You even need to decide, at design time, the order in which the rows must come back to satisfy your query!

As we are not allowed to make joins, all the data we need to answer the query needs to be stored in the same table.
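A small Python sketch of what “deciding the row order at design time” could look like, again with a dictionary pretending to be a table. The query is assumed to be “give me the latest readings for a sensor”; `readings_by_sensor`, `write_reading` and `latest_readings` are hypothetical names, and this is only an analogy for how a table might be laid out, not Cassandra’s actual storage engine.

```python
import bisect
from collections import defaultdict

# Toy table designed for one query: "latest readings for a sensor".
# Rows are grouped by sensor_id and kept sorted by timestamp, newest
# first, so the order is fixed when the table is designed, not when
# the query runs. All names are hypothetical.
readings_by_sensor = defaultdict(list)


def write_reading(sensor_id, timestamp, value):
    partition = readings_by_sensor[sensor_id]
    # Negate the timestamp so ascending sort order means newest first.
    bisect.insort(partition, (-timestamp, value))


def latest_readings(sensor_id, limit):
    rows = readings_by_sensor[sensor_id][:limit]
    return [(-neg_ts, value) for neg_ts, value in rows]


write_reading("s1", 100, 20.5)
write_reading("s1", 300, 21.0)
write_reading("s1", 200, 19.8)

latest = latest_readings("s1", 2)
```

The read is just a slice of data that is already in the right order. If tomorrow we need the oldest readings first, or readings grouped by day, that is a different query and, in this way of thinking, probably a different table.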

3. Join when you write — not when you read

The quick subtitle above this line should have been “No joins”, but this would not be entirely true. In this upside-down world, you need to join the information when you write, because writing is cheap.

But joining tables when you read would be tremendously expensive because the information is stored on many different nodes.

Even joining data in the application is discouraged, because it means the database has not been properly designed. I have to be honest with you… when I first heard “No-Joins”… my hacker soul told me: don’t worry, you can do it later… in the application.
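Here is a hedged sketch of “join when you write”, once more in plain Python with dictionaries instead of a real database. The `customers` lookup, the `orders_by_customer` table and the `place_order` and `orders_for` helpers are all invented for the example.

```python
# Toy in-memory stand-ins; in a relational design "customers" and "orders"
# would be joined at read time. All names here are hypothetical.
customers = {"c1": {"name": "Ana", "city": "Mendoza"}}
orders_by_customer = {}


def place_order(order_id, customer_id, item):
    """Do the 'join' once, at write time, and store the combined row."""
    customer = customers[customer_id]
    row = {
        "order_id": order_id,
        "item": item,
        "customer_name": customer["name"],  # copied in, not referenced
        "customer_city": customer["city"],
    }
    orders_by_customer.setdefault(customer_id, []).append(row)


def orders_for(customer_id):
    """Read path: a single lookup, no join."""
    return orders_by_customer.get(customer_id, [])


place_order("o1", "c1", "keyboard")
place_order("o2", "c1", "monitor")
result = orders_for("c1")
```

The price of this design is visible in the sketch: if Ana changes her name, every copied `customer_name` has to be rewritten. That is the trade we accept in exchange for reads that never need a join.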

4. Writing is Cheap — Reading is Expensive

Writing in a relational database is usually an expensive operation. The database needs to check if the primary key exists or not, and return an error if you want to insert a key already present or update a key which is not there. Indexes need to be updated, relation consistency needs to be verified, and so on.

All those operations have been simplified to the extreme here, and if you want to write, the database just writes. No verifications, just a simple and pure write operation, as fast as possible.

Reading on the other hand, in such a big volume of data, can be very expensive.

5. Upserts

They had to invent a new term to let everybody know their Insert and Update operations are not going to work as expected (from a relational world perspective).

This is closely related to the previous point. To make write operations so fast, we need to give something up.

An Insert of a primary key which is already there will not return a duplicate key error… it will update the row of data instead. No duplicate errors to worry about.

On the other hand, an update of a primary key which doesn’t exist will insert the data in the corresponding partition.
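The behavior described above can be imitated with a few lines of Python. This is a toy model built on a dictionary, not a Cassandra client; `users`, `insert` and `update` are hypothetical names chosen to mirror the two operations.

```python
# Toy table keyed by primary key; a plain dict, not a Cassandra client.
users = {}


def insert(pk, **columns):
    # No "duplicate key" check: existing columns are simply overwritten.
    users.setdefault(pk, {}).update(columns)


def update(pk, **columns):
    # No "row not found" check: a missing row is simply created.
    users.setdefault(pk, {}).update(columns)


insert("u1", name="Ana", city="Mendoza")
insert("u1", city="Cordoba")   # second insert behaves like an update
update("u2", name="Bob")       # update on a missing key behaves like an insert
```

Notice that the two functions have exactly the same body. In this toy model, as in the description above, “insert” and “update” are two names for the same write.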

Cassandra has beautiful eyes, but she is not looking at these details: she has more important things to do ;-)

Final words

If I could take a DeLorean, go back to 1997 and have a talk with my Database Design 101 professor, I can imagine what a legendary argument we would have, one that would very likely end with me being permanently dismissed from his class. Who wants to learn relational databases after all…?

Well, that probably never happened, otherwise I would be a carpenter right now, this post would never have been written and you would be spending your time on some other article. However, I’m still writing this post, so the space-time continuum was not broken by this time travel exercise.

But I definitely had a lot of discussions with the inner version of my professor and all his teachings. Those discussions forced me to break some real neural connections and create many new ones, which makes me feel so happy.

I have great respect for these new alternatives, even when they dare to challenge the normal forms. I’ll keep reading (and probably writing) about NoSQL and Cassandra, so stay tuned.

Jose Luis Cahisa
Virtualmind

Programming since I was 10… (Imagine how big that program is)