Five Minute Guide: Getting Started with Cassandra on Docker
If you’re reading this, there’s a good chance your management has come to you with the task of evaluating modern data platforms. Whether you’re looking to replace a legacy RDBMS or building out the data layer for the latest digital exhaust initiative, you’re going to need a local sandbox environment to start.
I’ve found myself in this situation too many times, with a variety of databases, while working with multiple organizations. For that reason alone, I write. I write so that you can skip through the operational steps, and start building brilliant data models, sculpting beautiful queries, and quickly prototyping cloud applications that not only look great, but are responsive, scalable, and indestructible. All of those non-functional requirements of your application are highly dependent on choosing the right data layer.
At the end of this guide, you will have a containerized environment hosting both the DataStax distribution of Cassandra and a notebook-based data exploration tool for data modeling, query profiling, and data visualization.
Step 1. Create Docker Account
As the title suggests, you will need Docker installed, plus access to the Docker Hub repository to pull the Cassandra images. Visit The Docker Store and create a new user account. Docker images are prebuilt, which makes them easier to consume than traditional tarball distributions, with the added benefit of workspace isolation.
* If you already have a Docker account, skip this step. (4 steps to go!)
Step 2. Installing Docker
Containers add a level of platform independence, allowing installation on various operating systems including Linux, Mac, and Windows. Find the installer for your operating system and follow the installation process. After installing, you will have access to the docker command from your command-line terminal.
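Before moving on, it can help to confirm that the install actually worked. The snippet below is a minimal sketch, not an official check: check_docker is a hypothetical helper name, and it only assumes that the docker command is on your PATH and that docker info succeeds when the Docker daemon is running.

```shell
# check_docker is a hypothetical helper (not part of Docker itself).
# It reports whether the CLI is installed and whether the daemon answers.
check_docker() {
  # Is the docker command available at all?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH"
    return 1
  fi
  # docker info talks to the daemon; it fails if the daemon is not running.
  if ! docker info >/dev/null 2>&1; then
    echo "docker CLI found, but the daemon is not reachable"
    return 2
  fi
  echo "docker is ready"
}
```

Paste the function into your terminal and run check_docker. If it reports that the daemon is not reachable, start Docker and try again.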
Step 3. Pull The Docker Images
To pull docker images from the Docker Hub, you will need to provide your docker account credentials (see step 1). Then you can execute the docker pull commands from the terminal.
Following the login command, you will be prompted to provide the docker id and password. Once the login completes, the credentials will be added to your session and used on all subsequent requests.
$> docker login
Pull the DataStax Image
The DataStax Server Image is the DataStax distribution of Apache Cassandra with additional capabilities of Search Engine, Spark Analytics and Graph Components (configurable at the docker run step). For quality and simplicity, this is your best bet.
$> docker pull datastax/dse-server:latest
Pull DataStax Studio Image (Notebook)
The DataStax Studio is a notebook based development tool for data exploration, data modeling, data visualization, and query profiling. Studio also has the ability to save, import and export notebooks. This allows you to share your findings with your team as you go. (Awesome!)
$> docker pull datastax/dse-studio:latest
Step 4. Run The Containers
We will execute the docker run command to create new containers from the pulled images. Once a container is created, you won’t have to perform the run command again; use docker start and docker stop with the container name instead.
Start the DataStax Server Container
The --name parameter provides a human-readable reference for container operations, and can also be used as a resolvable hostname for communication between containers (required for later steps).
As stated before, the DataStax distribution comes with some additional integrations for building different models, making it highly sought after for implementing domain-driven design patterns.
- The -g flag starts a Node with Graph Model enabled
- The -s flag starts a Node with Search Engine enabled
- The -k flag starts a Node with Spark Analytics enabled
$> docker run -e DS_LICENSE=accept --memory 4g --name my-dse -d datastax/dse-server -g -s -k
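The container returns immediately because of the -d flag, but the database inside it can take a while to come up. One way to wait for it is to poll nodetool status until the node is reported as UN (Up/Normal). This is a rough sketch: wait_for_node is a hypothetical helper, and the attempt count and delay are arbitrary defaults you may want to tune.

```shell
# wait_for_node is a hypothetical helper; its name and defaults are assumptions.
# Usage: wait_for_node [container] [attempts] [delay_seconds]
wait_for_node() {
  container="${1:-my-dse}"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # nodetool status prefixes a healthy node's row with "UN" (Up/Normal).
    if docker exec "$container" nodetool status 2>/dev/null | grep -q '^UN'; then
      echo "$container is up"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "$container did not report UN after $attempts attempts"
  return 1
}
```

Calling wait_for_node my-dse before starting Studio avoids connecting to a node that is still booting.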
Start DataStax Studio Container
The --link parameter provides a way to map a hostname to a container IP address. In this example, we link the database container to the Studio container by providing its name, ‘my-dse’. Now Studio can connect to the database using the container name instead of an IP address. (A user-defined bridge network is another way to achieve this.)
The -p flag maps ports between the container and the host. Port 9091 is the default port for Studio.
$> docker run -e DS_LICENSE=accept --link my-dse -p 9091:9091 --memory 1g --name my-studio -d datastax/dse-studio
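For the user-defined bridge alternative mentioned above, a sketch might look like the following. It is meant to be used instead of the two docker run commands in this step, not in addition to them, since the container names would collide. The function name start_sandbox_on_bridge and the network name dse-net are illustrative assumptions; the images, flags, and container names are the ones used in this guide.

```shell
# Alternative to --link: attach both containers to a user-defined bridge network.
# start_sandbox_on_bridge and the default network name "dse-net" are
# illustrative, not anything Docker or DataStax prescribes.
start_sandbox_on_bridge() {
  network="${1:-dse-net}"
  # Containers on the same user-defined bridge can resolve each other by name.
  docker network create "$network" &&
  docker run -e DS_LICENSE=accept --network "$network" --memory 4g \
    --name my-dse -d datastax/dse-server -g -s -k &&
  docker run -e DS_LICENSE=accept --network "$network" -p 9091:9091 --memory 1g \
    --name my-studio -d datastax/dse-studio
}
```

With this setup Studio should still reach the database at the hostname my-dse, because Docker provides name resolution between containers on a user-defined bridge.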
Step 5. Connecting Studio
Visit the Studio page that is now hosted on your docker container by entering http://localhost:9091 in your browser.
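If the page does not load right away, Studio may still be starting. A quick way to check from the terminal, assuming curl is installed on your host, is sketched below. studio_ready is a hypothetical helper name, and the five second timeout is an arbitrary choice.

```shell
# studio_ready is a hypothetical helper; it assumes curl is available.
# It returns curl's exit status: 0 when the URL answers without an HTTP error.
studio_ready() {
  url="${1:-http://localhost:9091}"
  curl --silent --fail --output /dev/null --max-time 5 "$url"
}
```

For example, studio_ready && echo "Studio is answering" prints the message only once the page responds.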
Select the “Working with CQL” notebook
When opening the notebook you will see a connection exception. This is because the default connection in Studio uses localhost. You will need to change localhost to the DataStax Server container name, ‘my-dse’.
Select the “Edit Connection” button and change the Host/IP connection field from localhost to the DSE Server container’s name, ‘my-dse’. Finish by performing a Test. If successful, save the new connection settings.
You now have a fully functional Sandbox Environment! Hopefully you found this tutorial helpful and learned a little bit about Cassandra and Docker along the way.
Cassandra Docker Cheat Sheet
Common commands used when working with Cassandra and Docker:
========= Status =========
$> docker ps
$> docker stats
$> docker inspect my-dse
$> docker exec -it my-dse nodetool status
========== Logs ==========
$> docker logs my-dse
$> docker exec -it my-dse cat /var/log/cassandra/system.log
$> docker logs my-studio
==== Start/Stop/Remove ====
$> docker start my-dse
$> docker stop my-dse
$> docker rm my-dse
======= Additional =======
$> docker inspect my-dse | grep IPAddress
#CQL (Requires IPAddress from above)
$> docker exec -it my-dse cqlsh [IPAddress]
$> docker exec -it my-dse bash
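To avoid copying the IP address by hand, the inspect and cqlsh commands above can be combined. This is a small sketch: container_ip is a hypothetical helper name, and it relies on docker inspect's -f format option to print only the address.

```shell
# container_ip is a hypothetical helper that prints a container's IP address.
# The Go template walks every network the container is attached to.
container_ip() {
  docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "${1:-my-dse}"
}

# Example (run manually): open cqlsh against the address that was found.
#   docker exec -it my-dse cqlsh "$(container_ip my-dse)"
```

If the container is attached to more than one network, the addresses are printed back to back, so this shortcut fits the single-network setup used in this guide.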