When should you use Cassandra? Use Cases, Alternatives and Drawbacks

Abhinav Vinci
4 min read · Mar 21, 2023


Part 1: Cassandra- But why? Benefits of a columnar DB

How do you decide which column DB to use?

Factors to consider when choosing a column DB:

  1. Workload type and query patterns: For workloads that require complex queries, aggregations, and data warehousing, a columnar analytics database such as Vertica or ClickHouse may be a better choice than Cassandra, which is built for high-volume writes and key-based lookups.
  2. Consistency requirements: Cassandra favors availability and partition tolerance over consistency, which makes it a better fit for applications that can tolerate eventual consistency. If you need strong consistency, HBase or Bigtable might be a better fit.
  3. Data size: If your data set is relatively small and fits on a single machine, a simpler and more lightweight database (for example, MySQL) may be a better choice than Cassandra.
  4. Scalability: Consider how easy it is to scale the database and whether it can handle large amounts of data without sacrificing performance. For example, Cassandra's throughput scales close to linearly as nodes are added to the cluster.
  5. Performance, availability, and fault tolerance: Consider how fast the database can perform reads and writes and whether it can handle real-time data processing. Also consider how it handles hardware failures or network disruptions, and whether it provides replication and data redundancy. For example, Cassandra is designed for high availability and has no single point of failure.
  6. Cost: Columnar databases may have different licensing models and pricing structures. Consider the cost of the database, including licensing fees, infrastructure costs, and ongoing maintenance costs.
  7. Community support: Consider the size and activity of the community of developers and users, as well as the availability of resources and support. For example, Cassandra, HBase, and Bigtable are among the most widely used column DBs.

Alternatives to Cassandra

Apache HBase: HBase is an open-source, distributed, column-family database built on top of Hadoop (it stores its data in HDFS). There are some scenarios where HBase might be a better choice than Cassandra:

  1. Ad-hoc queries: HBase keeps rows sorted by row key, so range scans with server-side filters are a natural fit, and SQL layers from the Hadoop ecosystem such as Apache Phoenix or Apache Hive can be used for ad-hoc queries and data exploration.
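  2. Complex data types: HBase stores every cell as an uninterpreted byte array, so an application can serialize arrays, maps, or nested structures in whatever format it likes. Note that HBase itself has no typed columns, whereas Cassandra's CQL offers native collection types (list, set, map) and user-defined types.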
  3. Integration with Hadoop: HBase is integrated with the Hadoop ecosystem, which makes it a good choice for applications that require data processing using tools like Apache Spark and Apache Hive.
  4. Strong consistency: HBase serves each row from a single RegionServer at a time, so a read always sees the latest successful write to that row.

Bigtable: Google Cloud Bigtable is a fully managed, NoSQL database service that is designed for large-scale, real-time data processing.

  1. Integration with Google Cloud Platform: It is particularly well-suited for use cases that require integration with other Google Cloud services.
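  2. Strong consistency: Bigtable provides strong consistency within a single cluster (replication between clusters is eventually consistent by default) and supports atomic single-row operations. This makes it a good choice for applications that must always read their latest writes, although it does not offer multi-row transactions.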
  3. Managed service: Bigtable is a fully managed service on GCP, which means that Google handles the infrastructure, maintenance, and scaling of the database. If you prefer a managed service to reduce operational overhead, Bigtable might be a better choice.

ScyllaDB: ScyllaDB is an open-source, distributed, NoSQL database that is designed to be a drop-in replacement for Cassandra.

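  1. Performance: ScyllaDB is designed to handle more operations per second at lower latency than Cassandra, largely because it is written in C++ with a shard-per-core architecture and avoids JVM garbage-collection pauses.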
  2. Better hardware utilization: ScyllaDB can better utilize hardware resources, such as CPU and memory, compared to Cassandra, which can help reduce hardware costs.
  3. Materialized views and secondary indexes: ScyllaDB supports materialized views, and its global secondary indexes are built on top of them, which can speed up reads that do not filter on the base table's partition key.

When to use Cassandra over other columnar DBs?

  1. Distributed architecture designed for high availability and fault tolerance: Like other columnar databases, Cassandra scales horizontally by adding more nodes to a cluster. Unlike HBase, which relies on a master node and HDFS, Cassandra has no master: every node plays the same role, so there is no single point of failure. This makes it easier to run large clusters with hundreds or even thousands of nodes.
  2. Tunable consistency: Cassandra lets you choose a consistency level per query (for example ONE, QUORUM, or ALL), trading consistency for availability and latency. Databases that always enforce strong consistency give up some availability during node failures or network partitions, which can make them harder to operate across large or multi-datacenter clusters.
  3. Easy to scale: Cassandra uses a peer-to-peer architecture in which nodes exchange cluster state with each other through a gossip protocol, so nodes can be added or removed without taking the system down. Data is placed using consistent hashing, which spreads partitions evenly across nodes and means that only a fraction of the data has to move when the cluster changes size. This makes it easier to scale read and write operations across large clusters.
  4. Lower cost: Cassandra is open source under the Apache License 2.0, so there are no licensing fees. You still pay for infrastructure and operations.
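
To make the consistent hashing point concrete, here is a minimal sketch of a hash ring in Python. It is a toy model, not Cassandra's implementation: the `HashRing` class, the `token` helper, the node names, and the choice of 64 virtual nodes are all assumptions made for illustration, and MD5 stands in for Cassandra's default Murmur3 partitioner only because it ships with the standard library.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for its even spread)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, for illustration)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (token(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        """Return the node holding the first token at or after the key's token."""
        idx = bisect.bisect_left(self._ring, (token(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


keys = [f"user-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fourth node")
```

Every key that changes owner moves to the new node, and the rest stay where they were. With naive `hash(key) % node_count` placement, most keys would be reassigned whenever the node count changes.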

Drawbacks of Cassandra

  1. Complex: Cassandra is a complex database system with a steep learning curve. It requires specialized knowledge to properly set up and configure, and optimizing performance can be challenging.
  2. Eventual consistency: At low consistency levels such as ONE, Cassandra is eventually consistent: a write can take some time to reach every replica, so a read served by another replica may return stale data. This can result in data inconsistencies if consistency levels are not chosen carefully.
  3. Storage overhead: Cassandra stores metadata such as a write timestamp with each cell, keeps deletes as tombstones until compaction removes them, and writes every row to as many nodes as the replication factor requires. This can result in increased storage requirements compared to other database systems.
  4. Hardware requirements: Cassandra is optimized for running on high-performance hardware, which can be costly. It also requires a significant amount of RAM for optimal performance.
  5. Limited query options: Cassandra's query language (CQL) looks like SQL but has no joins or subqueries, and full-text search needs an external search engine or a custom index. Cassandra is optimized for simple queries that specify the partition key, and it may not perform well for complex queries that scan or aggregate across many partitions.
  6. Data model limitations: Cassandra tables are designed around the queries you plan to run, which usually means denormalizing and duplicating data. This makes it less suitable for transactional data with complex relationships, and there are no referential integrity constraints such as foreign keys.
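
The eventual consistency drawback can be managed with the commonly cited overlap rule: a read is guaranteed to touch at least one replica that has the latest write when the replicas acknowledging the write plus the replicas answering the read exceed the replication factor (W + R > RF). The Python sketch below works through that arithmetic. The function names are hypothetical, only the ONE, QUORUM, and ALL levels are modeled, and details such as multi-datacenter levels, hinted handoff, and read repair are left out.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a consistency level (simplified model)."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(rf: int, write_level: str, read_level: str) -> bool:
    """True when any read replica set must overlap any write replica set."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf


def tolerated_failures(rf: int, level: str) -> int:
    """Replicas that can be down while the level can still be satisfied."""
    return rf - required_acks(level, rf)


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    overlap = read_sees_latest_write(rf, write_level, read_level)
    print(f"write={write_level:<6} read={read_level:<6} overlap guaranteed: {overlap}")
```

In this model, QUORUM writes with QUORUM reads at a replication factor of 3 give overlapping replica sets while still tolerating one replica being down. ALL gives the same overlap but tolerates no failures, which is the trade-off the tunable consistency model exposes.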

In the next blog:

  • Cassandra vs MongoDB
  • Cassandra features and internals
  • Real-world applications
  • Why column-oriented databases can handle flexible and dynamic schema changes more easily than row-oriented databases
