Cassandra DB Replication

Saravanan
Jul 18, 2024

What is Cassandra and its use cases?

Apache Cassandra is an open-source, distributed NoSQL database system designed to handle large amounts of data across many commodity servers with no single point of failure. It is known for its scalability, high availability, and fault tolerance. Cassandra is particularly well-suited for applications that require large-scale, real-time data processing across multiple nodes.

Key Features of Cassandra

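  • Scalability: Capacity grows by adding nodes, without downtime.
  • High Availability: Data stays accessible while individual nodes are down, because every node can serve requests.
  • Fault Tolerance: Data is replicated to multiple nodes, so a node failure need not cause data loss.
  • Distributed Architecture: Data is partitioned across many servers, with no master node.
  • Flexible Schema: Tables are defined in CQL, and columns can be added without downtime.
  • Tunable Consistency: The consistency level is chosen per query, trading consistency against availability and latency.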
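Tunable consistency comes down to simple arithmetic on the replication factor (RF). The sketch below is only an illustration: the helper names quorum and overlaps are made up for this example and are not part of Cassandra. It computes how many replicas a QUORUM request waits for, and checks whether a read level R and write level W are guaranteed to overlap on at least one replica (R + W > RF).

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not part of Cassandra).

# quorum RF: replicas that must respond at consistency level QUORUM,
# i.e. floor(RF / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# overlaps R W RF: succeeds when every read is guaranteed to touch at least
# one replica that acknowledged the latest write (R + W > RF).
overlaps() {
  [ $(( $1 + $2 )) -gt "$3" ]
}

echo "RF=3 quorum: $(quorum 3)"
if overlaps 1 1 3; then
  echo "ONE/ONE at RF=3: reads always see the latest write"
else
  echo "ONE/ONE at RF=3: reads may be stale"
fi
```

With RF=3, reading and writing at QUORUM (2 + 2 > 3) overlaps, while reading and writing at ONE (1 + 1 <= 3) does not.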

Common Use Cases for Cassandra:

  • Real-Time Data Analytics
  • Content Management Systems
  • Internet of Things (IoT)
  • Geospatial Applications

Procedure to set up Cassandra:

Prerequisites:

  1. VMs: Provision the VMs for the primary data center (DC) and the disaster recovery site (DR). Each VM needs a static IP address and sufficient CPU, memory, and disk space. This walkthrough uses three VMs for the DC and three for the DR.
    IPs for DC: 192.168.0.46, 192.168.0.41, 192.168.0.42
    IPs for DR: 192.168.0.43, 192.168.0.44, 192.168.0.45
  2. Java: Cassandra 4.1 runs on Java 8 or Java 11; this walkthrough uses Java 11. Ensure Java 11 is installed on each VM.
  3. Firewall: Open the necessary ports (7000, 7001, 7199, 9042, 9160) on each VM. Port 9160 served the Thrift interface, which was removed in Cassandra 4.0, so it can stay closed on 4.1.5.

Install Java

$ yum install java-11-openjdk-devel -y

Output:


Updating Subscription Management repositories.
Last metadata expiration check: 0:20:02 ago on Saturday 29 June 2024 08:10:13 PM IST.
Package java-11-openjdk-devel-1:11.0.23.0.9-3.el8.x86_64 is already installed.
Dependencies resolved.
Nothing to do.
Complete!

Verify the Java installation

$ java --version

Output:

openjdk 11.0.23 2024-04-16 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.23.0.9-2) (build 11.0.23+9-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.23.0.9-2) (build 11.0.23+9-LTS, mixed mode, sharing)
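
When preparing several VMs, it can help to check the Java major version in a script rather than by eye. This is a minimal sketch that assumes the Java 9+ `--version` banner format shown in the output; java_major is a hypothetical helper name.

```shell
#!/bin/sh
# java_major LINE: print the major version from the first line of
# `java --version`, e.g. "openjdk 11.0.23 2024-04-16 LTS" -> 11.
# Assumes the Java 9+ output format; Java 8 prints a different banner.
java_major() {
  echo "$1" | awk '{ split($2, parts, "."); print parts[1] }'
}

first_line='openjdk 11.0.23 2024-04-16 LTS'   # sample taken from the output above
if [ "$(java_major "$first_line")" = "11" ]; then
  echo "Java 11 detected"
else
  echo "unexpected Java version: $first_line"
fi
```

On a VM, the sample line would be replaced with first_line=$(java --version | head -n 1).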

Download the Apache Cassandra 4.1.5 binary on each VM using the following command.

$ wget https://downloads.apache.org/cassandra/4.1.5/apache-cassandra-4.1.5-bin.tar.gz

Extract the downloaded archive into /opt/cassandra:

$ mkdir -p /opt/cassandra
$ tar -xvzf apache-cassandra-4.1.5-bin.tar.gz -C /opt/cassandra --strip-components=1

Configure Cassandra

Edit the cassandra.yaml configuration file on each VM

Set the cluster name, seed provider, listen address, and snitch properties. Repeat this configuration on each VM, adjusting IP addresses accordingly.

$ vim /opt/cassandra/conf/cassandra.yaml

cluster_name: 'cassandra' # The cluster name must be the same on all DC and DR VMs
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.0.46,192.168.0.41,192.168.0.42,192.168.0.43,192.168.0.44,192.168.0.45" # List all six DC and DR VM IPs here
listen_address: 192.168.0.46 # IP address of the current VM
rpc_address: 192.168.0.46 # IP address of the current VM
broadcast_rpc_address: 192.168.0.46 # IP address of the current VM
endpoint_snitch: GossipingPropertyFileSnitch
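
Only the three address settings differ from node to node, so the per-node values can be generated rather than typed six times. The following is a sketch under the assumption that the IP lists match the prerequisites; seed_list and node_settings are hypothetical helper names, and the output is meant to be copied into cassandra.yaml by hand.

```shell
#!/bin/sh
# Addresses from the prerequisites; adjust to your environment.
DC_IPS="192.168.0.46 192.168.0.41 192.168.0.42"
DR_IPS="192.168.0.43 192.168.0.44 192.168.0.45"

# seed_list: comma-separated list of every DC and DR address,
# matching the seeds value used in this article.
seed_list() {
  echo "$DC_IPS $DR_IPS" | tr ' ' ','
}

# node_settings IP: print the three address lines that differ per node.
node_settings() {
  printf 'listen_address: %s\n' "$1"
  printf 'rpc_address: %s\n' "$1"
  printf 'broadcast_rpc_address: %s\n' "$1"
}

echo "seeds: \"$(seed_list)\""
for ip in $DC_IPS $DR_IPS; do
  echo "# --- $ip ---"
  node_settings "$ip"
done
```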

Edit the cassandra-rackdc.properties configuration file on each VM to set the data center name (dc on the DC VMs, dr on the DR VMs) and the rack (rack1).

$ vim /opt/cassandra/conf/cassandra-rackdc.properties

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
# Use dc=dc on the DC VMs and dc=dr on the DR VMs. Keep comments on their own
# lines: a properties file treats text after the value as part of the value.
dc=dc
rack=rack1
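
The data center name follows directly from which IP list a node belongs to, so the file content can be derived from the address. This is a sketch with hypothetical helper names (dc_name_for, render_rackdc), assuming the IP lists from the prerequisites and a single rack named rack1.

```shell
#!/bin/sh
DC_IPS="192.168.0.46 192.168.0.41 192.168.0.42"
DR_IPS="192.168.0.43 192.168.0.44 192.168.0.45"

# dc_name_for IP: print "dc" or "dr" depending on which list holds the address.
dc_name_for() {
  case " $DC_IPS " in *" $1 "*) echo dc; return 0 ;; esac
  case " $DR_IPS " in *" $1 "*) echo dr; return 0 ;; esac
  echo "unknown address: $1" >&2
  return 1
}

# render_rackdc IP: print the cassandra-rackdc.properties content for a node.
render_rackdc() {
  name=$(dc_name_for "$1") || return 1
  printf 'dc=%s\nrack=rack1\n' "$name"
}

render_rackdc 192.168.0.44
```

Redirecting the output, for example render_rackdc 192.168.0.44 > /opt/cassandra/conf/cassandra-rackdc.properties, would replace the file on that node.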

Set up Environment Variables

$ export CASSANDRA_HOME=/opt/cassandra
$ export CASSANDRA_CONF=$CASSANDRA_HOME/conf
$ export CLASSPATH=$CASSANDRA_HOME/lib/*:$CASSANDRA_CONF
$ export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
$ export PATH=$JAVA_HOME/bin:$PATH
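
Variables set with export last only for the current shell session. One way to keep them across logins is to write them to a small script, assuming login shells source files in /etc/profile.d (as they do on RHEL-family systems). In this sketch, write_cassandra_env is a hypothetical helper, and the demo writes to a temporary file rather than to /etc/profile.d.

```shell
#!/bin/sh
# write_cassandra_env FILE: write the article's exports to FILE.
# The quoted 'EOF' keeps $VARIABLES literal so they expand when FILE is sourced.
write_cassandra_env() {
  cat > "$1" <<'EOF'
export CASSANDRA_HOME=/opt/cassandra
export CASSANDRA_CONF="$CASSANDRA_HOME/conf"
export CLASSPATH="$CASSANDRA_HOME/lib/*:$CASSANDRA_CONF"
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH="$JAVA_HOME/bin:$PATH"
EOF
}

demo_file=$(mktemp)
write_cassandra_env "$demo_file"
. "$demo_file"                      # load the variables into this shell
echo "CASSANDRA_CONF=$CASSANDRA_CONF"
rm -f "$demo_file"
```

To apply it system-wide, the target would be a file such as /etc/profile.d/cassandra.sh (an assumed file name), written as root.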

Open the required ports in the firewall

$ firewall-cmd --zone=public --add-port=7000/tcp --permanent
$ firewall-cmd --zone=public --add-port=7001/tcp --permanent
$ firewall-cmd --zone=public --add-port=7199/tcp --permanent
$ firewall-cmd --zone=public --add-port=9042/tcp --permanent
$ firewall-cmd --zone=public --add-port=9160/tcp --permanent
$ firewall-cmd --reload

List the open ports

$ firewall-cmd --list-ports

Output:

7000/tcp 7001/tcp 7199/tcp 9042/tcp 9160/tcp
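
Because the same five commands run on all six VMs, a loop keeps the port list in one place. The sketch below uses a dry-run switch, the RUN variable, which is a convention of this example and not a firewall-cmd feature: by default it only prints the commands.

```shell
#!/bin/sh
# Ports opened in this article. 9160 is the legacy Thrift port, which
# Cassandra 4.x no longer uses; it is kept here only to mirror the steps above.
PORTS="7000 7001 7199 9042 9160"

# RUN=echo (the default) only prints the commands; run with RUN= to execute them.
RUN=${RUN-echo}

open_cassandra_ports() {
  for port in $PORTS; do
    $RUN firewall-cmd --zone=public --add-port="$port/tcp" --permanent
  done
  $RUN firewall-cmd --reload
}

open_cassandra_ports
```

Saved under a name such as open-ports.sh (hypothetical), running RUN= sh open-ports.sh as root would apply the rules for real.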

Start Cassandra on every VM by running cassandra -R from the /opt/cassandra/bin directory. The -R option allows Cassandra to run as the root user.

$ sh cassandra -R

Output:

Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
CompileCommand: dontinline org/apache/cassandra/db/Columns$Serializer.deserializeLargeSubset(Lorg/apache/cassandra/io/util/DataInputPlus;Lorg/apache/cassandra/db/Columns;I)Lorg/apache/cassandra/db/Columns;
CompileCommand: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubset(Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;ILorg/apache/cassandra/io/util/DataOutputPlus;)V
CompileCommand: dontinline org/apache/cassandra/db/Columns$Serializer.serializeLargeSubsetSize(Ljava/util/Collection;ILorg/apache/cassandra/db/Columns;I)I
CompileCommand: dontinline

Check the status of Cassandra nodes

$ sh nodetool status

Output:

Datacenter: dc
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.41  179.9 KiB   16      15.7%             f0614fda-8982-43b0-8d9c-dcf264b4dfd9  rack1
UN  192.168.0.46  221.92 KiB  16      18.6%             6326b97f-9463-4064-9652-47973b4d80a2  rack1
UN  192.168.0.42  171.42 KiB  16      18.4%             c43e9220-eb95-49ec-991c-50212c249de8  rack1

Datacenter: dr
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.43  143.79 KiB  16      13.5%             3d128b61-ebcf-4fac-8475-720180b203cb  rack1
UN  192.168.0.45  107.71 KiB  16      18.5%             6add4e4a-7d97-47a2-b46e-83845597462f  rack1
UN  192.168.0.44  122.47 KiB  16      15.3%             97c314ac-c123-459a-bfd5-134e68ca02ba  rack1
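
A node is healthy when its line starts with UN (Up, Normal). To confirm that all three nodes in each data center have joined, the status output can be summarized per data center. This sketch assumes the output layout shown above; count_up_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# count_up_nodes: read `nodetool status` output on stdin and print
# "<datacenter> <number of UN nodes>" for each data center, in order.
count_up_nodes() {
  awk '
    /^Datacenter:/ { dc = $2; names[++n] = dc; up[dc] = 0 }
    /^UN /         { up[dc]++ }
    END            { for (i = 1; i <= n; i++) print names[i], up[names[i]] }
  '
}

# Shortened sample input; on a node, pipe `sh nodetool status` in instead.
count_up_nodes <<'EOF'
Datacenter: dc
UN  192.168.0.41  179.9 KiB   16  15.7%  f0614fda-8982-43b0-8d9c-dcf264b4dfd9  rack1
UN  192.168.0.46  221.92 KiB  16  18.6%  6326b97f-9463-4064-9652-47973b4d80a2  rack1
Datacenter: dr
UN  192.168.0.43  143.79 KiB  16  13.5%  3d128b61-ebcf-4fac-8475-720180b203cb  rack1
EOF
```

For the full cluster in this article, the expected summary is three UN nodes for dc and three for dr.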

Connect to Cassandra database

$ sh cqlsh 192.168.0.46

Output:

Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cqlsh>

Configure Authentication Settings in cassandra.yaml for all the Cassandra nodes

Set authenticator to PasswordAuthenticator and authorizer to CassandraAuthorizer.

$ vim /opt/cassandra/conf/cassandra.yaml

authenticator: PasswordAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
# Out of the box, Cassandra provides org.apache.cassandra.auth.{AllowAllAuthorizer,
# CassandraAuthorizer}.
#
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.role_permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
authorizer: CassandraAuthorizer
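
Making the same two-line change on six nodes by hand is error-prone, so it can be scripted. The sketch below assumes that cassandra.yaml contains uncommented authenticator: and authorizer: keys at the start of a line, as the stock file does; enable_auth is a hypothetical helper, and the demo runs on a scratch file rather than the real configuration.

```shell
#!/bin/sh
# enable_auth FILE: rewrite the uncommented authenticator/authorizer lines.
# Commented lines (starting with '#') are left untouched.
enable_auth() {
  sed -e 's/^authenticator:.*/authenticator: PasswordAuthenticator/' \
      -e 's/^authorizer:.*/authorizer: CassandraAuthorizer/' \
      "$1" > "$1.new" && mv "$1.new" "$1"
}

# Demo on a scratch file containing the shipped defaults.
demo=$(mktemp)
printf '%s\n' 'authenticator: AllowAllAuthenticator' \
              '# - AllowAllAuthorizer allows any action to any user' \
              'authorizer: AllowAllAuthorizer' > "$demo"
enable_auth "$demo"
cat "$demo"
rm -f "$demo"
```

Before pointing it at /opt/cassandra/conf/cassandra.yaml, it is worth keeping a backup copy of the file.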

Restart the Cassandra service in all nodes.

Create an admin user by logging in to the Cassandra database as the default cassandra superuser.

$ sh cqlsh -u cassandra -p cassandra 192.168.0.46

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
cassandra@cqlsh>
cassandra@cqlsh> CREATE ROLE admin WITH PASSWORD = 'cassandra@2024' AND SUPERUSER = true AND LOGIN = true;
cassandra@cqlsh>
cassandra@cqlsh> SELECT * FROM system_auth.roles WHERE role = 'cassandra';
 role      | can_login | is_superuser | member_of | salted_hash
-----------+-----------+--------------+-----------+--------------------------------------------------------------
 cassandra |      True |         True |      null | $2a$10$Rkz5TGMbyMqVUiOy9Av8ZuYWkHNmx4jGUY1ELVfd9/iAzb0hWrHVC
(1 rows)
cassandra@cqlsh> SELECT * FROM system_auth.roles WHERE role = 'admin';
 role  | can_login | is_superuser | member_of | salted_hash
-------+-----------+--------------+-----------+--------------------------------------------------------------
 admin |      True |         True |      null | $2a$10$PInhlRT441c0w/AOcZ2nt.0WVFNApFxYT7viQh/sbLCUS7RrTmBmq
(1 rows)

Change the replication class of the system_auth keyspace from SimpleStrategy to NetworkTopologyStrategy so that authentication data is replicated between the DC and DR Cassandra nodes.

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.46

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh> desc keyspaces;
system system_distributed system_traces system_virtual_schema
system_auth system_schema system_views
admin@cqlsh>
admin@cqlsh> desc keyspace system_auth
CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
admin@cqlsh:system_auth> ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc': '1', 'dr': '1'} AND durable_writes = true;

admin@cqlsh> desc keyspace system_auth
CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'dc': '1', 'dr': '1'} AND durable_writes = true;
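
The replication map names each data center exactly as it appears in cassandra-rackdc.properties, followed by the number of copies to keep there. To try other replication factors without retyping the statement, it can be generated. This is a sketch; alter_replication_cql is a hypothetical helper that only prints CQL and does not connect to the cluster.

```shell
#!/bin/sh
# alter_replication_cql KEYSPACE DC_RF DR_RF: print an ALTER KEYSPACE statement
# for the two data centers named in cassandra-rackdc.properties (dc and dr).
alter_replication_cql() {
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', 'dc': '%s', 'dr': '%s'};\n" \
    "$1" "$2" "$3"
}

alter_replication_cql system_auth 1 1
```

The printed statement can be pasted at the cqlsh prompt. With three nodes per data center, a value such as 3 would place a copy of the role data on every node; the value 1 used in this article keeps a single copy per data center.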

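Run nodetool repair system_auth on each node so that the existing role data reaches the new replicas, then check that the Cassandra login works on both DC and DR nodes.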

Disable login for the default cassandra user (optional)

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.46

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh>
admin@cqlsh> ALTER ROLE cassandra WITH LOGIN = false;
admin@cqlsh>
admin@cqlsh> SELECT * FROM system_auth.roles WHERE role = 'cassandra';
 role      | can_login | is_superuser | member_of | salted_hash
-----------+-----------+--------------+-----------+--------------------------------------------------------------
 cassandra |     False |         True |      null | $2a$10$Rkz5TGMbyMqVUiOy9Av8ZuYWkHNmx4jGUY1ELVfd9/iAzb0hWrHVC
(1 rows)
[root@cassandra-dc1 bin]# sh cqlsh -u cassandra -p cassandra 192.168.0.46
Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connection error: ('Unable to connect to any servers', {'192.168.0.46:9042': AuthenticationFailed('Failed to authenticate to 192.168.0.46:9042: Error from server: code=0100 [Bad credentials] message="cassandra is not permitted to log in"',)})

Create a keyspace and a table on a DC Cassandra node

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.46

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh>
admin@cqlsh> CREATE KEYSPACE app_sit WITH replication = {'class': 'NetworkTopologyStrategy', 'dc': 1, 'dr': 1} AND durable_writes = true;
admin@cqlsh> describe keyspaces;
app_sit system system_schema system_virtual_schema
system_auth system_traces system_distributed system_views
admin@cqlsh>
admin@cqlsh> use app_sit;
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> CREATE TABLE emp_details(emp_id int PRIMARY KEY, emp_name text, emp_city text);
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> describe tables;
emp_details
admin@cqlsh:app_sit> INSERT INTO emp_details (emp_id, emp_name, emp_city) VALUES (1, 'victor', 'mitch');
admin@cqlsh:app_sit> INSERT INTO emp_details (emp_id, emp_name, emp_city) VALUES (2, 'john', 'stockton');
admin@cqlsh:app_sit> INSERT INTO emp_details (emp_id, emp_name, emp_city) VALUES (3, 'micheal', 'holding');
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> select * from emp_details;
 emp_id | emp_city | emp_name
--------+----------+----------
      1 |    mitch |   victor
      2 | stockton |     john
      3 |  holding |  micheal
(3 rows)

Check the table details in DR Cassandra nodes to verify the replication

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.43

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.43:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh>
admin@cqlsh> describe keyspaces;
app_sit system system_schema system_virtual_schema
system_auth system_traces system_distributed system_views
admin@cqlsh>
admin@cqlsh> use app_sit;
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> describe tables;
emp_details
admin@cqlsh:app_sit> select * from emp_details;
 emp_id | emp_city | emp_name
--------+----------+----------
      1 |    mitch |   victor
      2 | stockton |     john
      3 |  holding |  micheal
(3 rows)

Write data on a DR Cassandra node

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.43

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.43:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh>
admin@cqlsh> describe keyspaces;
app_sit system system_schema system_virtual_schema
system_auth system_traces system_distributed system_views
admin@cqlsh>
admin@cqlsh> use app_sit;
admin@cqlsh:app_sit>
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> describe tables;
emp_details
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> select * from emp_details;
 emp_id | emp_city | emp_name
--------+----------+----------
      1 |    mitch |   victor
      2 | stockton |     john
      3 |  holding |  micheal
(3 rows)
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> INSERT INTO emp_details (emp_id, emp_name, emp_city) VALUES (4, 'paul', 'george');
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> select * from emp_details;
 emp_id | emp_city | emp_name
--------+----------+----------
      1 |    mitch |   victor
      2 | stockton |     john
      4 |   george |     paul
      3 |  holding |  micheal
(4 rows)

Check the newly added row in DC Cassandra node to verify the replication

$ sh cqlsh -u admin -p cassandra@2024 192.168.0.46

Output:

Warning: Using a password on the command line interface can be insecure.
Recommendation: use the credentials file to securely provide the password.
Connected to cassandra at 192.168.0.46:9042
[cqlsh 6.1.0 | Cassandra 4.1.5 | CQL spec 3.4.6 | Native protocol v5]
Use HELP for help.
admin@cqlsh>
admin@cqlsh>
admin@cqlsh> describe keyspaces;
app_sit system system_schema system_virtual_schema
system_auth system_traces system_distributed system_views
admin@cqlsh>
admin@cqlsh> use app_sit;
admin@cqlsh:app_sit>
admin@cqlsh:app_sit> describe tables;

emp_details
admin@cqlsh:app_sit> select * from emp_details;
 emp_id | emp_city | emp_name
--------+----------+----------
      1 |    mitch |   victor
      2 | stockton |     john
      4 |   george |     paul
      3 |  holding |  micheal
(4 rows)
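
The comparison made by eye above can also be scripted by reading the row-count footer that cqlsh prints after each result. The sketch below uses hypothetical helper names (row_count, same_count) and sample footers copied from the sessions above. Equal counts are a quick sanity check, not proof that the rows are identical.

```shell
#!/bin/sh
# row_count: read cqlsh output on stdin and print N from the "(N rows)" footer.
row_count() {
  sed -n 's/^(\([0-9][0-9]*\) rows)$/\1/p'
}

# same_count A B: succeed when both counts are non-empty and equal.
same_count() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Sample footers taken from the sessions above.
dc_rows=$(printf '%s\n' ' emp_id | emp_city | emp_name' '(4 rows)' | row_count)
dr_rows=$(printf '%s\n' '(4 rows)' | row_count)
if same_count "$dc_rows" "$dr_rows"; then
  echo "DC and DR agree: $dc_rows rows"
else
  echo "row counts differ: dc=$dc_rows dr=$dr_rows"
fi
```

For a live check, the sample lines would be replaced by the output of a SELECT on app_sit.emp_details run through cqlsh against one DC node and one DR node.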

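This setup configures the Cassandra cluster for DC-DR replication: a write made in either data center is replicated to the other. With a replication factor of 1 per data center, each row has a single copy in each data center, so raise the replication factor (for example, to 3) if each data center must also tolerate the loss of a node on its own.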
