Cassandra- But why? Benefits of a columnar DB

Abhinav Vinci
3 min readMar 6, 2023

--

Columnar DB ?

Columnar database, is a type of database that stores and retrieves data by column rather than by row.

In a traditional row-oriented database, data is stored and retrieved by row, which means that all the columns of a row are stored together. In a column-oriented database, however, each column of a table is stored separately, which allows for more efficient storage and retrieval of data.

Some popular column-oriented databases include Apache Cassandra and Apache HBase.

https://bi-insider.com/business-intelligence/column-and-row-based-database-storage/

Why use Columnar DB?

Its Faster:

  • Faster Column Retrival : In a column-oriented database, each column is stored as a separate file, which can be compressed and optimized for efficient data storage and retrieval. This approach allows for faster query performance, because only the columns needed for a particular query need to be accessed.
  • Reduced I/O: Column-oriented databases can often reduce I/O by only reading the columns that are needed for a particular query. This can result in faster query response times and more efficient use of resources.
  • High Performance for Analytical Workloads: Column-oriented databases are optimized for analytical workloads, which involve processing large amounts of data to generate insights. By storing data in columns, column-oriented databases can quickly scan and aggregate data to generate results.

Its Efficient:

  • More efficient in terms of memory usage : Column-oriented databases can be more efficient in terms of memory usage, because data in columns tends to have similar data types and values, which can be compressed more effectively than data in rows.
  • Efficient Compression: Column-oriented databases can often achieve higher compression ratios than row-oriented databases. This can result in significant savings in storage.
  • Flexible Schema: Column-oriented databases can handle flexible and dynamic schema changes more easily than row-oriented databases. This makes it suitable for a wide range of use cases, including time series data, messaging systems, and more.

What is Cassandra

Cassandra is a popular columunar database that is designed to handle large amounts of data across many commodity servers.

  1. Highly Scalable: Cassandra is highly scalable and can handle large amounts of data across multiple nodes with ease. You can add more nodes to your cluster as your data needs grow.
  2. Highly Available: Cassandra is designed for high availability and can handle node failures and network partitions without downtime. This makes it ideal for mission-critical applications.
  3. High Performance: Cassandra is designed to deliver high performance, with low latency and high throughput. It is optimized for write-heavy workloads and can handle millions of writes per second.
  4. Distributed Architecture: Cassandra is a distributed database, which means that it can be deployed across multiple data centers and regions. This allows you to replicate data for disaster recovery and reduce latency for users in different geographic locations.

Cassandra Simple usage example

from cassandra.cluster import Cluster
# Connect to the Cassandra cluster
# we first connect to a Cassandra cluster by creating a Cluster object with the addresses of the nodes in the cluster.
cluster = Cluster(['<cassandra-node-1>', '<cassandra-node-2>', '<cassandra-node-3>'])
# We then create a Session object with the name of the keyspace we want to use.
session = cluster.connect('<keyspace>')
# Create a table
session.execute("""
CREATE TABLE IF NOT EXISTS users (
id int PRIMARY KEY,
name text,
email text
)
""")
# Insert data into the table
session.execute("""
INSERT INTO users (id, name, email)
VALUES (%s, %s, %s)
""", (1, 'Alice', 'alice@example.com'))
# Select data from the table
rows = session.execute("SELECT * FROM users")
for row in rows:
print(row.id, row.name, row.email)
# Update data in the table
session.execute("""
UPDATE users
SET email = %s
WHERE id = %s
""", ('new-email@example.com', 1))
# Delete data from the table
session.execute("""
DELETE FROM users
WHERE id = %s
""", (1,))
# Close the connection to the Cassandra cluster
cluster.shutdown()

In Next Blog: Cassandra Part-2: We cover

  • When to use Cassandra over other columnar DB?
  • Alternatives to Cassandra
  • Drawbacks of Cassandra

--

--