Install & run a bare-bones Cassandra cluster on EC2

Mert Çalışkan
3 min read · Jul 27, 2015


I’ve already created a Cassandra cluster on EC2 with Docker, but if you need a bare-bones installation to compare some performance metrics without the hassle of containers, here is how you can achieve it.

Cassandra v2.1.8 encourages the use of JDK 8, but the AMI I used (id: ami-a8221fb5) ships with OpenJDK 7. So first follow the steps here to upgrade to JDK 8 on your EC2 instance.
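For reference, a minimal sketch of that upgrade on the Amazon Linux AMI, assuming the stock yum repositories provide the OpenJDK 8 package:

sudo yum install -y java-1.8.0-openjdk-devel
sudo alternatives --config java   # pick the 1.8.0 entry when prompted
java -version                     # should now report 1.8.0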

Then create the datastax.repo file (or edit it if it already exists) with the following command:

sudo vi /etc/yum.repos.d/datastax.repo

The content of the datastax.repo file should be as follows:

[datastax] 
name = DataStax Repository
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
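If you’d rather not open vi at all, one way to write the same file non-interactively is with a heredoc:

sudo tee /etc/yum.repos.d/datastax.repo > /dev/null <<'EOF'
[datastax]
name = DataStax Repository
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
EOF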

Install Cassandra with the following command; press ‘y’ whenever prompted.

sudo yum install dsc21

To check whether the installation succeeded, execute:

/etc/init.d/cassandra status

You should see the output: cassandra is stopped.

Before starting the cassandra service, we need to configure it for our cluster. The configuration file lives at /etc/cassandra/conf/cassandra.yaml. A sample configuration is given below; the address and seed entries described next are the parts added to or modified from the default configuration file. Note that listen_address and the seed_provider block already exist in the default file, so if you are changing the file via vi, for instance, it’s better to remove them there and add them at the top along with the other entries.

The listen_address should be the private IP address of the instance. The broadcast_address and broadcast_rpc_address should be the public IP address of the instance. The seeds value should list the public IP addresses of all the seed nodes, separated by commas, with the whole value wrapped in quotation marks. If a cluster has 3 nodes and one of them is the seed node, the configuration of all 3 nodes should refer to the same public IP address of that seed node.
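For instance, if there were two seed nodes, the value would look like this (placeholder IPs):

seeds: "<public-ip-of-seed-1>,<public-ip-of-seed-2>"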

listen_address: <private-ip-address-of-instance>
broadcast_address: <public-ip-address-of-instance>
rpc_address: 0.0.0.0
broadcast_rpc_address: <public-ip-address-of-instance>
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "<public-ip-address-of-seed-instance>"
cluster_name: 'Test Cluster'
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
batchlog_replay_throttle_in_kb: 1024
authenticator: AllowAllAuthenticator
authorizer: AllowAllAuthorizer
permissions_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
disk_failure_policy: stop
commit_failure_policy: stop
key_cache_size_in_mb:
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
start_native_transport: true
native_transport_port: 9042
start_rpc: true
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
column_index_size_in_kb: 64
batch_size_warn_threshold_in_kb: 5
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
client_encryption_options:
    enabled: false
    keystore: conf/.keystore
    keystore_password: cassandra
internode_compression: all
inter_dc_tcp_nodelay: false

If you are lazy like me, instead of editing the cassandra.yaml file via vi, you can upload it directly via scp as below. If you encounter any permission errors, just chmod the yaml file on the instance before overwriting it (an example follows the scp command).

scp -i /path/to/myAwesome.pem -oStrictHostKeyChecking=no /path/to/cassandra.yaml ec2-user@<public-ip-address-of-instance>:/etc/cassandra/conf/cassandra.yaml
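If scp fails with a permission error, one way around it is to loosen the file’s permissions on the instance first, over ssh with the same key (assuming the default ec2-user account):

ssh -i /path/to/myAwesome.pem ec2-user@<public-ip-address-of-instance> "sudo chmod 666 /etc/cassandra/conf/cassandra.yaml"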

Now we are ready to start the seed node:

sudo service cassandra start
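To watch the node come up, you can tail its log; with this packaged install the log typically lands under /var/log/cassandra:

sudo tail -f /var/log/cassandra/system.log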

When you execute nodetool status, you should see something like:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns  Host ID                               Rack
UN  <IP>     168.97 MB  256     ?     45efc448-8c7b-4511-9057-5d2b62ae7190  rack1

Adding a new node is a piece of cake: take the same cassandra.yaml, change only the node-specific addresses (sketched below), and start the cassandra service on the new instance.
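Only the address entries differ per node; the seeds value keeps pointing at the seed node (placeholders again):

listen_address: <private-ip-address-of-new-instance>
broadcast_address: <public-ip-address-of-new-instance>
broadcast_rpc_address: <public-ip-address-of-new-instance>

After the service comes up, you should see 2 nodes when you execute nodetool status: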

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns  Host ID                               Rack
UN  <IP1>    168.97 MB  256     ?     45efc448-8c7b-4511-9057-5d2b62ae7190  rack1
UN  <IP2>    169.28 MB  256     ?     bf98b5bc-537f-4c05-ac3d-0a428001211d  rack1

If you’d like to use the CLI for running CQL queries, execute:

cqlsh -u cassandra -p cassandra
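As a quick sanity check, you can query the local system table from the cqlsh prompt:

cqlsh> SELECT cluster_name, release_version FROM system.local;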

You can continue by configuring more nodes to join the cluster.
