Cassandra vs MongoDB: How to Choose?

Abhinav Vinci
3 min read · May 30, 2024

Cassandra is built on a log-structured merge-tree (LSM) design and stores data on disk in SSTables (Sorted String Tables), which are immutable files sorted by key.

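Log-Structured Merge-tree (LSM) databases organize data for efficient inserts, updates, and deletes, which makes them particularly well suited to write-heavy workloads. Writes are buffered in memory (the memtable), flushed to disk as sorted, immutable files, and those files are merged in the background (compaction), so updates and deletes become new entries rather than in-place modifications.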

Figure: Sorted-list (SSTable) type store
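To make the LSM write path concrete, here is a toy sketch in Python. It is not Cassandra's implementation: the class name `TinyLSM` and its methods are hypothetical, the "SSTables" are sorted in-memory lists rather than files, and the commit log, Bloom filters, and tombstones for deletes are left out. It shows only the core idea: writes touch an in-memory memtable, a full memtable is flushed as an immutable sorted run, reads check the newest data first, and compaction merges runs.

```python
import bisect


class TinyLSM:
    """Toy LSM store: a mutable in-memory memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}                 # recent writes, held in memory
        self.sstables = []                 # sorted runs as (keys, values) pairs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # A write only touches memory; existing runs are never modified.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    @staticmethod
    def _to_run(entries):
        items = sorted(entries.items())    # sort by key, like an SSTable
        return ([k for k, _ in items], [v for _, v in items])

    def flush(self):
        # Freeze the memtable into a new immutable sorted run.
        if self.memtable:
            self.sstables.append(self._to_run(self.memtable))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):
            i = bisect.bisect_left(keys, key)   # binary search inside a sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        # Merge all runs into one; later (newer) runs overwrite older values.
        merged = {}
        for keys, values in self.sstables:
            merged.update(zip(keys, values))
        self.sstables = [self._to_run(merged)] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable is full, so it is flushed as the first run
db.put("a", 10)     # the update is a new entry, not an in-place change
db.flush()          # a second run now shadows the old value of "a"
db.compact()        # one run remains, holding a=10 and b=2
```

Note that `put` never rewrites an existing run; an update is simply a newer entry that shadows the old one until compaction, which is the property that keeps LSM writes cheap.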

MongoDB is built on a B-tree-based storage engine (its default engine, WiredTiger, stores collections and indexes as B-trees). It is a document database built for general-purpose use.

B-tree-based databases use balanced tree structures that provide efficient reads and writes (used, for example, in PostgreSQL and MySQL). They offer more balanced performance across sequential and random access.

Figure: B-tree based store
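For contrast, here is a minimal in-memory B-tree sketch using the classic textbook insertion algorithm with proactive node splitting. It is an illustration under simplifying assumptions: WiredTiger and relational engines use disk-page-based B+-tree variants with caching and logging that this code does not model, and `BTree`/`BTreeNode` are hypothetical names. The point is that an insert walks down the tree and modifies the node where the key belongs, splitting full nodes on the way, rather than appending to a log.

```python
import bisect


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []       # sorted keys stored in this node
        self.children = []   # child nodes (empty for leaves)
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t           # minimum degree: a node holds at most 2t - 1 keys
        self.root = BTreeNode()

    def search(self, key):
        node = self.root
        while True:
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]   # descend into the subtree that may hold key

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # Root is full: grow the tree by one level, then split the old root.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split a full child before descending so it has room for the new key.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)   # modify the leaf where the key belongs

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)        # promote the median into the parent
        parent.children.insert(i + 1, right)

    def in_order(self, node=None):
        # Return all keys in sorted order (left subtree, key, right subtree).
        node = self.root if node is None else node
        if node.leaf:
            return list(node.keys)
        out = []
        for i, k in enumerate(node.keys):
            out.extend(self.in_order(node.children[i]))
            out.append(k)
        out.extend(self.in_order(node.children[-1]))
        return out
```

On disk, each node would correspond to a page, so successive inserts can touch pages scattered across the file; that is the random I/O referred to in the write-performance comparison.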

So, broadly, the difference boils down to LSM-based databases versus B-tree-based databases.

Write Performance

  • Cassandra generally supports faster writes than MongoDB because, at a high level, a write is a simple append operation.
  • MongoDB uses a B-tree-based storage engine, which involves more complex data structures and often requires random I/O during writes, potentially slowing down write performance.
  • Cassandra is preferred for applications requiring high write throughput. Common use cases include time-series data, sensor data, and other write-intensive workloads.
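Time-series data fits Cassandra well when it is modeled around a known query. A common pattern is a partition key of (sensor, day) with the timestamp as a clustering column, so each write lands in exactly one bounded partition and "all readings for sensor X on day D" is a single partition read. The sketch below imitates that layout with plain Python dictionaries; `SensorReadings` and `partition_key` are hypothetical names, and the one-day bucket is an assumed granularity, not a rule.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id, ts):
    # Bucket by sensor and UTC day so no single partition grows without bound.
    return (sensor_id, ts.astimezone(timezone.utc).date().isoformat())


class SensorReadings:
    """In-memory stand-in for a table with partition key (sensor_id, day)
    and clustering column ts."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> [(ts, value), ...] sorted by ts

    def write(self, sensor_id, ts, value):
        # Each write touches exactly one partition and keeps rows in time order.
        bisect.insort(self.partitions[partition_key(sensor_id, ts)], (ts, value))

    def read_day(self, sensor_id, day):
        # The query this model is designed for: one sensor, one day, one partition.
        return list(self.partitions.get((sensor_id, day), []))


store = SensorReadings()
store.write("sensor-1", datetime(2024, 5, 30, 10, 0, tzinfo=timezone.utc), 21.5)
store.write("sensor-1", datetime(2024, 5, 30, 9, 0, tzinfo=timezone.utc), 20.0)
store.write("sensor-1", datetime(2024, 5, 31, 0, 5, tzinfo=timezone.utc), 19.0)
day_rows = store.read_day("sensor-1", "2024-05-30")  # two rows, in timestamp order
```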

Query pattern:

  • MongoDB is often chosen for applications with evolving schemas and complex data structures. It supports a richer query language, including secondary indexes and the aggregation framework.
  • Ad-Hoc Querying: If your use case involves a lot of ad-hoc querying, where the query patterns are not known in advance, Cassandra is not the best choice. MongoDB supports many index types for various use cases.
  • Apache Cassandra has a more structured data storage system than MongoDB: its tables have a schema defined up front. If the data you’re working with has a fixed format, Cassandra is more suitable.
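To illustrate what ad-hoc querying means, here is a toy matcher in Python that accepts query documents shaped like MongoDB's (for example `{"age": {"$gte": 30}}` or dotted paths like `"address.city"`). It is a hypothetical sketch, not MongoDB or PyMongo code: it supports only a few operators, treats missing fields as non-matching, ignores array semantics, and scans every document instead of using an index. The point is that any field can appear in a filter without planning for it in advance.

```python
import operator

_MISSING = object()  # sentinel: field not present in the document

# Hypothetical subset of MongoDB-style comparison operators.
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def get_path(doc, dotted):
    """Follow a dotted path such as "address.city" into nested dicts."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    for path, condition in query.items():
        value = get_path(doc, path)
        if value is _MISSING:
            return False  # simplification: missing fields never match
        if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False  # plain value means equality
    return True


def find(docs, query):
    """Scan every document and keep those matching the ad-hoc query."""
    return [doc for doc in docs if matches(doc, query)]


people = [
    {"name": "Ana", "age": 31, "address": {"city": "Pune"}},
    {"name": "Bo", "age": 24, "address": {"city": "Delhi"}},
    {"name": "Cy", "age": 40},
]
young_in_listed_cities = find(people, {"address.city": {"$in": ["Pune", "Delhi"]}, "age": {"$lt": 30}})
```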

Why should Cassandra be used only when query patterns are limited?

Cassandra encourages denormalization/duplication, where the same data might be stored in multiple tables to support different query patterns.

Cassandra basically works by partitioning and replicating. If all your queries will be based on the same partition key, Cassandra is your best bet. If you need to query on an attribute that is not the partition key, the usual approach is to store the same data again in another table that uses that attribute as its partition key. Now you have two copies of the same data, each with a different partition key.
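A minimal sketch of that query-driven denormalization, with plain dictionaries standing in for two hypothetical tables, `users_by_id` and `users_by_email`. Every insert writes both copies so that each supported query stays a single-key lookup; a query on a column that is not a partition key (here `name`) falls back to a full scan, which Cassandra rejects unless you add `ALLOW FILTERING` or a secondary index. The class and method names are illustrative only, and keeping the copies consistent on updates (for example when an email changes) is not handled.

```python
class DenormalizedUsers:
    def __init__(self):
        self.users_by_id = {}      # "table" partitioned by user_id
        self.users_by_email = {}   # same rows again, partitioned by email

    def insert(self, user_id, email, name):
        row = {"user_id": user_id, "email": email, "name": name}
        # One logical write becomes one write per query table.
        self.users_by_id[user_id] = dict(row)
        self.users_by_email[email] = dict(row)

    def by_id(self, user_id):
        return self.users_by_id.get(user_id)        # single-partition lookup

    def by_email(self, email):
        return self.users_by_email.get(email)       # single-partition lookup

    def by_name(self, name):
        # No table is keyed by name, so the only option is scanning every row.
        return [row for row in self.users_by_id.values() if row["name"] == name]


users = DenormalizedUsers()
users.insert(1, "ana@example.com", "Ana")
users.insert(2, "bo@example.com", "Bo")
```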

In effect, every new partition key means another copy of the data you need to query, and you can’t keep making new copies again and again: each copy adds storage and makes every write more expensive. So when your queries vary widely, i.e. each query filters on a different column in the WHERE clause, Cassandra is not a good option.
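A back-of-the-envelope sketch of why the copies add up. The function name is hypothetical, and it assumes the worst case where every query table holds the full dataset (real query tables often store only the columns their query needs); a replication factor of 3 is a common choice, not a requirement.

```python
def denormalization_cost(dataset_gb, query_tables, replication_factor=3):
    """Rough cost of keeping one table per query pattern.

    Assumes every query table holds the full dataset; ignores compression,
    compaction headroom, and per-row overhead.
    """
    raw_storage_gb = dataset_gb * query_tables * replication_factor
    replica_writes_per_insert = query_tables * replication_factor
    return raw_storage_gb, replica_writes_per_insert


# 500 GB of data, 4 query patterns, replication factor 3
storage_gb, writes = denormalization_cost(500, 4)  # storage_gb == 6000, writes == 12
```

Going from 4 to 8 query patterns doubles both numbers, which is why the variety of queries, not just the data volume, drives the cost.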

Scale and Simplicity:

  • Small-Scale Applications: Cassandra shines in large-scale, distributed environments. For small-scale applications with limited data and traffic, using Cassandra could be overkill.
  • MongoDB is generally recommended for simpler, smaller-scale use cases.
https://www.simplilearn.com/cassandra-vs-mongodb-article

When Not to Use Either (Cassandra or MongoDB)?

  1. Transactions: If your application requires strict ACID (Atomicity, Consistency, Isolation, Durability) transactions across multiple operations, a SQL database is the best choice. (MongoDB has supported multi-document transactions since version 4.0, but relational databases remain the more natural fit.)
  2. Complex Queries and Joins: If you need to support complex queries, including JOIN operations and subqueries, use an RDBMS.
  3. Highly Relational Data: If the data is highly relational, with many interdependent tables and foreign key constraints, a relational database management system (RDBMS) is a better choice.
  4. Analytical Query Support: Neither database is designed for complex analytical queries, which often require joins, aggregations, and multi-table operations. Running such queries on them can be inefficient and slow.
  5. Low-Latency Reads on Small Datasets: Use an in-memory database. In-memory databases store data entirely in RAM, offering extremely low-latency reads and writes.
  6. Data Warehousing: For heavy analytical workloads, data warehousing solutions like Apache Hive, Google BigQuery, or Amazon Redshift may be more suitable, as they are optimized for complex queries and large-scale data analysis.
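As a small illustration of points 1 to 3, here is a sketch using Python's built-in `sqlite3` module as a stand-in relational database. The `accounts`/`transfers` schema and the `transfer` and `transfer_report` helpers are hypothetical. A transfer either commits all of its statements or rolls all of them back, the foreign keys reject references to missing accounts, and one JOIN query answers a relational question directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
CREATE TABLE transfers (
    id INTEGER PRIMARY KEY,
    from_id INTEGER NOT NULL REFERENCES accounts(id),
    to_id INTEGER NOT NULL REFERENCES accounts(id),
    amount INTEGER NOT NULL
);
INSERT INTO accounts (id, owner, balance) VALUES (1, 'Ana', 100), (2, 'Bo', 50);
""")


def transfer(conn, from_id, to_id, amount):
    """Move money atomically: all three statements commit, or none do."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id))
            conn.execute(
                "INSERT INTO transfers (from_id, to_id, amount) VALUES (?, ?, ?)",
                (from_id, to_id, amount),
            )
        return True
    except sqlite3.IntegrityError:  # CHECK or FOREIGN KEY violation
        return False


def transfer_report(conn):
    """Join transfers with both accounts to show who paid whom."""
    return conn.execute("""
        SELECT t.id, a.owner, b.owner, t.amount
        FROM transfers t
        JOIN accounts a ON a.id = t.from_id
        JOIN accounts b ON b.id = t.to_id
        ORDER BY t.id
    """).fetchall()
```

Calling `transfer(conn, 2, 1, 500)` would drive Bo's balance negative, so the CHECK constraint fails and the whole transfer is rolled back rather than half-applied.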

TL;DR: Use Cassandra when you have a huge amount of data and a huge number of queries, but very little variety in those queries.
