Cassandra on ContainerShip

A Quick Setup Guide

Norman Joyner

Published in ContainerShip Articles, May 27, 2015

Background

Getting a containerized Cassandra cluster up and running in production can be a bit tricky. While containerization engines like Docker make shipping applications simple, they can complicate other aspects of running your workloads. For a clustered database such as Cassandra, several questions must be answered before running it in a containerized production environment.

Can I scale my cluster up when necessary?
How will the nodes find each other and communicate?
Can I even run a database in a container, given that it has persistent data?
What happens if everything comes crashing down?

ContainerShip makes running Cassandra simple.

Setup Guide

The steps below walk you through setting up a scalable Cassandra cluster on ContainerShip. Each follower node will run one instance of Cassandra, which joins the existing Cassandra cluster. This guide assumes you already have a ContainerShip cluster running; you can find more information about setting one up in our official docs. It also assumes you are using Navigator, a web UI plugin for ContainerShip, to launch your applications. More information on using Navigator can be found here. While we do not demonstrate CLI or API usage in this article, setting up Cassandra using either will also work.

To get started, open Navigator and create a new application with a sensible name (such as cassandra).
Set the image to containership/cassandra:2.1.5 (the most recent version at the time of writing; check the Docker Hub page for newer tags).
The default command is acceptable, so nothing needs to be entered for this field.
Cassandra needs ports 7000 and 7001 for inter-node communication, 7199 for JMX, 9042 for CQL, and 9160 for Thrift. Since we are configuring our cluster to run one instance of Cassandra on each follower node, set the networking mode to host.
Give the application some resources. Remember, Cassandra runs on the JVM, so be sure to allocate enough memory for each container.
Since we are utilizing host networking mode, the container port can be left blank (autoassigned).
Set a sensible host volume location, which the Cassandra container will bind-mount and write its data to. This location can be any directory on the host system large enough to hold all of Cassandra’s data. Once CodexD is merged into our next release, leaving the host volume blank will automatically use a copy-on-write subvolume in a default location. More on that in a later post.
Set the container volume to /var/lib/cassandra/data, which is where Cassandra is configured to write its data.
Environment variables are not necessary, but setting CASSANDRA_CLUSTER_NAME will override the default value of “ContainerShip Cassandra”.
Click create!
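
If you think in plain Docker terms, the settings above map roughly onto a single docker run invocation. The sketch below only prints that command so you can compare it with what you entered in Navigator. It is an approximation, not how ContainerShip launches containers internally, and a container started this way outside ContainerShip may not be able to discover its peers. The function name cassandra_run_cmd and the example host path are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints a plain-Docker approximation of the Navigator settings.
# The function name and the example arguments are hypothetical.
cassandra_run_cmd() {
  host_volume="$1"    # host directory to bind mount, e.g. /mnt/cassandra
  cluster_name="$2"   # value for CASSANDRA_CLUSTER_NAME
  # --net=host     : host networking mode, so the Cassandra ports bind directly on the host
  # -v host:cont   : bind mount the host volume onto Cassandra's data directory
  # -e NAME=value  : optional override of the default cluster name
  printf 'docker run -d --net=host -v %s:/var/lib/cassandra/data -e CASSANDRA_CLUSTER_NAME="%s" containership/cassandra:2.1.5\n' \
    "$host_volume" "$cluster_name"
}
```

For example, cassandra_run_cmd /mnt/cassandra "ContainerShip Cassandra" prints the command for a host directory of /mnt/cassandra and the default cluster name.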

Once your application is created, you can ensure your cluster scales horizontally by running it once on every follower node. You can easily configure your Cassandra application to do this by setting the constraints.per_host=1 tag, as seen below. More information about tags is available in our docs.

ContainerShip’s navigator web UI illustrating constraints.per_host=1 tag

Your Cassandra application should automatically scale to n/n (where n is the number of active follower nodes in your cluster). As more follower nodes come online, the cluster will scale linearly.

Congratulations! Your Cassandra cluster is now ready to use!

Ensuring Cassandra is Clustered

Before adding data to your new database, you may want to confirm that your Cassandra containers are part of the same cluster by using the nodetool executable. The nodetool command uses JMX, which is now listening on 127.0.0.1:7199 on each follower node. Since JMX only listens on localhost, and nodetool is already installed inside the container, the easiest way to test is to enter the Cassandra container on a follower node.

SSH into a follower node
Enter the Cassandra container by running:

docker exec -it $(docker ps | grep cassandra | awk '{ print $1 }') /bin/bash

Run:

nodetool status

If your Cassandra containers have successfully clustered, you will see something similar to the output below.

nodetool status command demonstrating clustered Cassandra containers
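
If you would rather script this check than read the output by eye, a small helper can count the healthy nodes. The sketch below assumes the usual nodetool status layout, in which each node line starts with a two-letter status/state code and UN means Up/Normal. The function name count_up_normal is hypothetical.

```shell
#!/bin/sh
# Sketch only: count nodes reported as Up/Normal ("UN") in
# `nodetool status` output read from standard input.
count_up_normal() {
  # Node lines begin with a status/state code in the first column;
  # header and separator lines never have "UN" there.
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}
```

Inside the Cassandra container, run nodetool status | count_up_normal and compare the result with the number of active follower nodes in your cluster.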

Backing up your Data

As you have seen so far, getting Cassandra up and running on ContainerShip is a breeze. Now that everything is running smoothly and you’ve entered important data into the database, wouldn’t it be nice to be confident that your data is safe? Well, you can be! Using ContainerShip Cloud, you can take point-in-time backups of your Cassandra cluster. Backing up your Cassandra application and all of its data is as easy as clicking a button.

Backing up Cassandra Test cluster using ContainerShip Cloud

These backups have a wide range of use cases. For example, if your database becomes corrupted, simply roll back to a previous day’s backup. If you spin up a new ContainerShip cluster in another region, you can start running Cassandra there (including all the data) by restoring the backup to the new cluster with a single click. Is your new cluster on a different cloud provider? Or maybe on premises? Not a problem! As long as you connect your ContainerShip cluster to ContainerShip Cloud, you can restore your Cassandra backups anywhere!

We’re working on even more seamless database integrations. If you have any you’d love to see next, feel free to let us know by writing a response below. If this article was useful, please recommend it so others can also see how easy running Cassandra can be using ContainerShip and ContainerShip Cloud!