Apache Cassandra Clusters on Google Cloud (GCE)

When it comes to one-click installs for a wide range of solutions (not only Apache Cassandra, but other relational and non-relational databases, application servers, appliances and so on), there are prebuilt offerings on every major cloud platform (AWS, Azure and GCE). I haven't looked at the AWS and Azure ones, so I can't speak to them, but on GCE there is a template to start either a single node or a 3-node cluster. As far as I could tell, you can't choose the machine size in either case and can't add additional nodes (not to mention that you'll end up with, for instance, the default 'Test Cluster' cluster name).

So I took the time to put together several scripts that help set up an Apache Cassandra cluster with the desired VM configuration and a few configuration changes. All the scripts are on our git here. You'll also find there a script to start a Cassandra cluster on GCE running in Docker containers (all on the same machine). But let's briefly look at the scripts/commands to start a Cassandra cluster on individual GCE VMs. Note that all interaction with GCE is done through the gcloud utility.

1. gcloud-server-setup.sh

This script starts the first node, which will act as the seed for the others.

It first creates a machine using a few editable variables: NODE for the machine name, ZONE for the deployment zone, PROJECT for the project ID and MACHINE for the VM type (I use g1-small, but feel free to pick another type):
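Before running the command below, the variables need values along these lines (everything here is a hypothetical example; substitute your own names, zone and project ID):

```shell
# Hypothetical example values -- substitute your own machine name, zone and project.
NODE="cassandra-1"        # machine name
ZONE="europe-west1-b"     # deployment zone
PROJECT="my-project-id"   # GCP project ID
MACHINE="g1-small"        # VM type; pick a bigger one for real workloads
echo "Creating $MACHINE instance $NODE in $ZONE (project $PROJECT)"
```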

gcloud compute instances create $NODE --zone $ZONE --machine-type $MACHINE --network "default" --maintenance-policy "MIGRATE" --scopes …… --image "https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-1604-lts" --boot-disk-size "10" --boot-disk-type "pd-standard" --boot-disk-device-name "${NODE}disk1" --project $PROJECT

Once the machine is created, we use the gcloud ssh command to connect to it and perform the "standard" Apache Cassandra installation and configuration steps, all taken from another script also available on git here (note that this particular script installs Cassandra 3.9). After Cassandra is installed, we use yet another script (from here) to do a minimal configuration: change the cluster name, change listen_address and rpc_address (so that Cassandra uses the GCE internal network instead of 127.0.0.1), and add a new user for SSH user/password login (in which case we also need to enable password login in SSH; see the script for details).
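The cassandra.yaml edits boil down to a few sed substitutions. A minimal runnable sketch (the cluster name, internal IP and file path below are illustrative stand-ins, not what the actual script uses; the real file normally lives at /etc/cassandra/cassandra.yaml):

```shell
# Illustrative values -- the real script derives these from the environment.
CLUSTER_NAME="GCE Cluster"
INTERNAL_IP="10.132.0.2"   # the VM's internal GCE address (hypothetical)
CONF="cassandra.yaml"      # stand-in path for /etc/cassandra/cassandra.yaml

# Create a tiny stand-in config so the sed edits below are runnable anywhere.
cat > "$CONF" <<'EOF'
cluster_name: 'Test Cluster'
listen_address: localhost
rpc_address: localhost
EOF

# Rename the cluster and bind to the internal network instead of loopback.
sed -i "s/^cluster_name: .*/cluster_name: '$CLUSTER_NAME'/" "$CONF"
sed -i "s/^listen_address: .*/listen_address: $INTERNAL_IP/" "$CONF"
sed -i "s/^rpc_address: .*/rpc_address: $INTERNAL_IP/" "$CONF"

cat "$CONF"
```

After a restart, Cassandra picks up the new name and binds its listen and RPC ports to the internal address, which is what lets the other cluster nodes reach it.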

2. gcloud-add-replicas.sh

Once the first node is up and running, you can use this script to add one or more nodes to the cluster. It performs largely the same tasks as the initial node setup, with a couple of tweaks (it expects the seed node's IP so the new node knows where to bootstrap its data from). This second script could very well be merged into the first one with a few extra checks and slightly different commands for certain cases.
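The main Cassandra-side difference on a replica is the seed list: the new node's cassandra.yaml must point at the seed's internal IP instead of loopback. A runnable sketch of that one edit (the IP and file path are stand-ins; the seed_provider snippet mirrors the stock 3.9 cassandra.yaml layout):

```shell
SEED_IP="10.132.0.2"    # internal IP of the first node (hypothetical)
CONF="cassandra.yaml"   # stand-in path for /etc/cassandra/cassandra.yaml

# Stand-in for the stock seed_provider section of cassandra.yaml.
cat > "$CONF" <<'EOF'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
EOF

# Replace the loopback seed with the first node's internal address.
sed -i "s/- seeds: \"127.0.0.1\"/- seeds: \"$SEED_IP\"/" "$CONF"
grep "seeds" "$CONF"
```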

Granted, this is not a one-click install (it's maybe three lines to execute in the CLI for a 3-node cluster), but it gives you a lot more flexibility.