“There is nothing either good or bad, but thinking makes it so.” (William Shakespeare, Hamlet)
One of the biggest challenges in today’s software planet is that applications require scalability, high availability and at the same time need to support too many I/O intensive reads and writes coming from thousands of concurrent global users who are not tolerant to any kind of downtime.
Regarding scalability, from the application point of view, there is a plenty of applicable solutions like Docker, AWS elastic beanstalk, auto scaling policies and others instead of using vertical scaling which is of course not the preferred approach in terms of cost and downtime requirement. The application should be developed in a stateless manner for it to be scaled effectively. However, even if we are able to scale our applications with respect to the load we currently have, the data layer is still our bottleneck.
By using classical RDBMs (Relational Database Management Systems) like PostgreSQL, MSSQL or Oracle etc., it is extremely difficult to address these issues. In order to support high I/O extensive queries, we need to use sharding and denormalized views at the same time. However, when we need to scale the DB, re-sharding is an absolute nightmare and at the same time, vertical scaling approach is preferable for RDBMs and it is getting expensive too quickly.
In order to support high availability, we need to replicate the database with master/slave approach but we have to create a watcher mechanism to switch between master to slave servers nonetheless we also need to watch the watcher mechanism for failures to obtain no single point of failure (SPOF) . In addition, when we replicate the data ACID can be problematic because of the lag between the servers. At the end of the day, it is not a trivial task to perform these requirements on RDBSs, we need to have dedicated development and operation engineering teams just for providing these features effectively.
To address specified problems with ease, Apache Cassandra DB comes to rescue. Apache Cassandra provides high availability, scalability, no SPOF and even Multi-DC support out of the box. It is an extremely fast DB in terms of reads and writes. You can add/remove commodity hardware online to the Cassandra DB without any downtime. You can scale horizontally with a linear and predictable performance increase.
According to the CAP Theorem, in a distributed system we have a chance to choose only 2 out of the 3 features: High Availability, Consistency or Partition Tolerance. If we choose High Availability and Partition Tolerance, we need to sacrifice from the Consistency. Cassandra is an AP database with respect to the CAP Theorem. It chooses Availability and Partition Tolerance over Consistency. However, Cassandra has a feature called Tunable Consistency so you can tune your consistency level according to your application’s needs.
How Cassandra achieve to address these issues lies behind its infrastructure. Apache Cassandra uses ring topology instead of master-detail or leader-follower approaches like other cluster systems. It is called masterless architecture. So every node in the ring is equal and able to perform any request.
There is also a replication factor parameter that you can tweak to set how many replicas (default value is 3) you are going to have in your cluster. When we write data to any node in the cluster, Cassandra handles the replication for us. So when any node becomes offline for any reason, your cluster will still be able to perform your queries without any downtime. Nodes are talking to each other in a peer to peer manner with the help of the very lightweight protocol called Gossip Protocol. Therefore, each node knows the other nodes are available or not anytime. There is no config server or Zookeeper etc. So, Cassandra is highly available.
As I mentioned above, you can add/remove nodes online to a Cassandra cluster. Cassandra has a hash ring structure that determines which partition of data will be stored in which node. When you add a node to the system, hash intervals will be updated and all the other nodes send the required data to the incoming new node on the fly. When the new node’s status becomes ready, it starts serving in the cluster like the other nodes. So, Cassandra is scalable for sure.
In order to understand why writes/reads are incredibly fast in Cassandra, we need to have a look at its write and read paths respectively. But before doing that I need to explain Cassandra table model a little bit.
Cassandra table model is so much different than other DBs. Primary keys in Cassandra, include two parts. First one is partition key and second one is clustering column. Partition key determines the node in which our table will be written to whereas clustering column decides the order of the data is written to the disk. Clustering column is a critical concept since reading sequentially from disk is much faster than the random disk access.
Cassandra forces you to query your data over these columns only. This limitation requires you to adapt two approaches while designing your application and data model. The first approach: You need to design your application data queries first and after that design your DB model. This has nothing to do with what we are used to do until now. However, it helps you to address performance issues in your application without happening at all. The second approach is that your data will be duplicated for your application’s different query needs. However, disk is getting cheaper every day. What we need to focus is read/write query performance instead. In engineering there is no silver bullet and there are trade-offs like this in general. You have to sacrifice something in order to gain some other things, but you need to decide which one to sacrifice properly. Cassandra has already made these decisions in the data layer in order for us to get better performance.
Any node in Cassandra cluster that the application connects to make a query is called a Coordinator node. When data comes to coordinator node, it decides its partition node and sends the data to that node in the ring. The node immediately writes the incoming data with its timestamp to the commit.log in disk sequentially and notifies the coordinator node that write operation is performed successfully. After that, commit.log will be rearranged in the memory structure called memtable. When the memtable size reaches a certain point, it is written to disk as it is and this structure is called SSTable and it is immutable. Updates and deletes are not reflected these SSTables, instead new SSTables are created. In regular intervals Cassandra merges these SSTables in order to increase the performance.
When we need to get the data back, the Coordinator node sends the request to all the replica nodes and merges the result and returns to the client. When it is needed to read from multiple SSTables, newer data always wins (remember the timestamp in the commit.log) and if there is any unflushed data in memory, it is also added to the query result. Cassandra also uses bloomfilter and caching mechanisms internally to read the data from disk very fast. In addition, Cassandra also performs read repairs on nodes when it is needed while reading the data. Since Cassandra’s consistency level is tunable, some nodes may not have the latest data.
In order to use Cassandra in your applications, you have two choices. Using open source Apache Cassandra or using DataStax Enterprise product. The latter one is based on open source Cassandra with some other performance improvements and extended support. It also provides integrated Multi-DC Search and integrated Spark for Analytics. Open source Cassandra is perfect for hacking whereas DataStax enterprise is perfect for using in deployment. You may also use DataStax Astra to use it as a service. It also provides free layer for development purposes.
If you want to get more information about Cassandra, I strongly recommend following DataStax from social platforms since they provide a lot of live sessions regularly to help you to get most out of Cassandra in your applications. You may also get certified for Cassandra from DataStax by studying the materials they provide on their academy website to become more confident with Cassandra DB.
 “Cassandra Architecture”, https://cassandra.apache.org/doc/latest/architecture/index.html
 “DataStax Academy”, https://academy.datastax.com/
 “Cassandra The Definitive Guide, Distributed Data at Web Scale.”, Jeff Carpenter & Eben Hewitt