Cassandra in Kubernetes

Headless Services, KubeDNS, Init Containers, Lifecycle hooks and other K8s concepts on the way

Alex Punnen
Better Software
6 min read · Apr 2, 2020

The standard documentation on deploying Cassandra in K8s as a StatefulSet is pretty good, and it is almost sufficient for us. Let's dive a little deeper as we do the setup.

Why a StatefulSet and not a Deployment or another controller?

1. We want each pod to have its own Persistent Volume. Deployments are generally meant for stateless applications. A Deployment can also have a PV, but that PV is then shared across all the replicas/pods: one PV for all the pods. That is not what we need.

2. We need each pod to have a stable, unique network identity. Each Cassandra server joins a ring topology with the other servers, and each server is a pod in Kubernetes, so every pod needs a predictable way to reach the others. A StatefulSet gives each pod a stable name and DNS entry, even though the pod IP itself may change on restart. This way, even if a pod is restarted, the other nodes can react to a predictable topology change.

These are the two key features that a StatefulSet offers.

From the official docs: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/

StatefulSets are valuable for applications that require one or more of the following. Stable, unique network identifiers. Stable, persistent storage.

A StatefulSet also requires us to create a Headless Service:

StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods. You are responsible for creating this Service.
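To make the two features and the Headless Service requirement concrete, here is a minimal sketch of a StatefulSet skeleton, written as a Python dict that can be dumped to JSON (kubectl accepts JSON manifests as well as YAML). It is not the manifest from the gist: the `app: cassandra` label, the `cassandra:latest` image, the `cassandra-data` volume name, the `/var/lib/cassandra` mount path and the `1Gi` size are all assumptions for illustration.

```python
import json


def cassandra_statefulset(namespace: str, replicas: int = 3, storage: str = "1Gi") -> dict:
    """Build a bare-bones StatefulSet manifest (illustrative values only)."""
    labels = {"app": "cassandra"}  # assumed label, must match the Service selector
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "cassandra", "namespace": namespace},
        "spec": {
            # Name of the headless Service that provides the network identity.
            "serviceName": "cassandra",
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cassandra",
                        "image": "cassandra:latest",  # assumed image
                        "volumeMounts": [{
                            "name": "cassandra-data",
                            "mountPath": "/var/lib/cassandra",  # assumed data dir
                        }],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "cassandra-data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


statefulset = cassandra_statefulset("green")
print(json.dumps(statefulset, indent=2))
```

The `volumeClaimTemplates` entry is what gives each pod its own volume, and `serviceName` is what ties the pods to the Headless Service.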

Please follow this gist for the tutorial below: https://gist.github.com/alexcpn/6a919b500211b5c253a4e28010325a1d#file-cassandra_green_statefulset-yaml

In the Service definition in the gist, note that clusterIP is given as None for this service. That is what makes it a headless service.
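As a rough illustration of such a Service, here is a sketch that builds a headless Service manifest as a Python dict. The service name `cassandra` and namespace `green` follow the DNS names used in this post; the `app` label and the CQL port 9042 are assumptions, so check them against the gist.

```python
import json


def headless_service(name: str, namespace: str, port: int) -> dict:
    """Build a headless Service manifest: clusterIP is the string "None"."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace, "labels": {"app": name}},
        "spec": {
            "clusterIP": "None",        # no virtual IP, no kube-proxy load balancing
            "selector": {"app": name},  # assumed label; selects the backing pods
            "ports": [{"name": "cql", "port": port}],
        },
    }


service = headless_service("cassandra", "green", 9042)
print(json.dumps(service, indent=2))
```

Note that `clusterIP` is the literal string `"None"`, not a null value.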

From the official site: https://kubernetes.io/docs/concepts/services-networking/service/#headless-services

For headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them. How DNS is automatically configured depends on whether the Service has selectors defined

We have set the selector field, so DNS records are created for each pod behind the service.

From the official docs https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

For headless Services that define selectors, the endpoints controller creates Endpoints records in the API, and modifies the DNS configuration to return records (addresses) that point directly to the Pods backing the Service.

For a headless service, this resolves to multiple answers, one for each pod that is backing the service, and contains the port number and the domain name of the pod of the form auto-generated-name.my-svc.my-namespace.svc.cluster-domain.example.

Let us look at the DNS records of the pods created by the StatefulSet: cassandra-0, cassandra-1 and cassandra-2.

Each record has the form <pod-name>.<headless-service-name>.<namespace>.svc.<cluster-domain>. For example: cassandra-0.cassandra.green.svc.cluster.local, and likewise for cassandra-1 and cassandra-2.
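The naming rule is simple enough to write down as a small helper. This is only a sketch of the pattern; the function name is made up for illustration, and `cluster.local` is the common default cluster domain, which may differ in your cluster.

```python
from typing import List


def statefulset_pod_fqdns(statefulset: str, service: str, namespace: str,
                          replicas: int, cluster_domain: str = "cluster.local") -> List[str]:
    """Return the stable DNS name of each pod: <set>-<ordinal>.<service>.<ns>.svc.<domain>."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


pod_names = statefulset_pod_fqdns("cassandra", "cassandra", "green", 3)
for name in pod_names:
    print(name)
```

Because the ordinals are stable, these names stay the same across pod restarts.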

If we exec into a pod, say cassandra-0, and ping one of these names, it resolves to the IP of that pod. In other words, each pod can find the other nodes through KubeDNS.

Let us use the DNS query utility dig to check this. First, we find the Service IP of KubeDNS (CoreDNS) in our cluster, and then we use that DNS server to query 'cassandra-0.cassandra.green.svc.cluster.local'.

This gives the pod IP.
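The same lookup can be sketched from inside a pod with Python's standard library. Unlike `dig @<server>`, this uses whatever resolver the pod is already configured with. The resolver is passed in as a parameter so the sketch also runs outside a cluster; the fake resolver and the IP `10.244.1.5` below are made up for the demonstration.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_pods(fqdns: Iterable[str],
                 resolver: Callable[[str], str] = socket.gethostbyname) -> Dict[str, Optional[str]]:
    """Resolve each name to an IPv4 address, or None if the lookup fails."""
    resolved: Dict[str, Optional[str]] = {}
    for name in fqdns:
        try:
            resolved[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[name] = None
    return resolved


# Stand-in for KubeDNS so the example runs anywhere (made-up address).
fake_dns = {"cassandra-0.cassandra.green.svc.cluster.local": "10.244.1.5"}


def fake_resolver(name: str) -> str:
    if name not in fake_dns:
        raise socket.gaierror(f"unknown host {name}")
    return fake_dns[name]


lookups = resolve_pods(
    ["cassandra-0.cassandra.green.svc.cluster.local",
     "cassandra-9.cassandra.green.svc.cluster.local"],
    resolver=fake_resolver,
)
print(lookups)
```

Inside a real pod, calling `resolve_pods(names)` without the `resolver` argument would go through KubeDNS.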

In each Cassandra pod, the KubeDNS IP is configured as the default nameserver in /etc/resolv.conf.
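To see what that file tells us, here is a small sketch that pulls the nameserver and search domains out of resolv.conf-style text. The sample content is an assumption of what a pod in the green namespace might contain; `10.96.0.10` is only an example address and will differ per cluster.

```python
from typing import Dict, List


def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Extract nameserver addresses and search domains from resolv.conf text."""
    nameservers: List[str] = []
    search: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blanks and comments
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) > 1:
            nameservers.append(parts[1])
        elif parts[0] == "search":
            search = parts[1:]
    return {"nameservers": nameservers, "search": search}


# Example content only; read /etc/resolv.conf inside a pod for the real values.
sample = """\
nameserver 10.96.0.10
search green.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

dns_config = parse_resolv_conf(sample)
print(dns_config)
```

The search list is why a short name like `cassandra-0.cassandra` can also resolve from a pod in the same namespace.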

And so each Cassandra pod uses KubeDNS to find the others and form a ring. Note that the Cassandra Docker image takes the initial seed as an environment variable. All pods are able to reach this seed, connect to it, and then discover each other through Cassandra's discovery mechanism.

In a regular Cassandra server, the seeds are configured in cassandra.yaml. In the Dockerised version, they are taken from an environment variable, and cassandra.yaml is updated at pod startup with the environment variable values.
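The startup rewrite can be pictured roughly like this. It is a sketch of the idea, not the image's actual entrypoint script: it assumes the variable is called `CASSANDRA_SEEDS` (the name commonly used by Cassandra images) and that the file contains a `- seeds:` line as in the stock cassandra.yaml.

```python
import re
from typing import Mapping


def apply_seeds(yaml_text: str, env: Mapping[str, str]) -> str:
    """Replace the value of the '- seeds:' line with CASSANDRA_SEEDS, if set."""
    seeds = env.get("CASSANDRA_SEEDS")
    if not seeds:
        return yaml_text  # nothing to do, keep the file as shipped
    # A function replacement avoids any backslash handling in the seed string.
    return re.sub(r"(- seeds:).*", lambda m: f'{m.group(1)} "{seeds}"', yaml_text)


stock_yaml = """\
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
"""

rendered = apply_seeds(
    stock_yaml,
    {"CASSANDRA_SEEDS": "cassandra-0.cassandra.green.svc.cluster.local"},
)
print(rendered)
```

Pointing the seed at the stable DNS name of cassandra-0 means the value stays valid even if that pod comes back with a different IP.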

Now we come to another requirement. Say we need the Cassandra server to use password-based authentication. This is just a one-line change in cassandra.yaml.

Let's do this one step at a time, and ignore for the moment that cassandra.yaml gets modified during pod startup. We have a configuration file that needs a change, so how do we load it into a pod? We can create a K8s ConfigMap with the file contents and mount it so that it appears as a file in a directory inside the pod.

There is one problem with this: the destination directory becomes read-only and contains only the file from the ConfigMap. We don't want this.

So we can mount the ConfigMap in another folder and use an Init Container to copy the file from that folder to our destination folder, without overwriting the rest of the destination folder.

What is an Init container?

Init containers are started before the app containers of the pod, run to completion, and can contain utilities or setup scripts not present in the app image. https://kubernetes.io/docs/concepts/workloads/pods/init-containers/

Let's use this. We mount the ConfigMap contents into a temporary directory and use the init container to copy the particular file from there to the target directory. This way the target directory contents do not get overwritten.
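The copy step itself is tiny. In a real init container it would usually be a one-line `cp` in a shell, with both directories mounted as volumes; the sketch below shows the same idea in Python using temporary directories, so it runs anywhere. The directory names and the `logback.xml` file are placeholders for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_config_file(src_dir: Path, dest_dir: Path, filename: str) -> Path:
    """Copy one file into dest_dir, leaving everything else in dest_dir alone."""
    dest = dest_dir / filename
    shutil.copy(src_dir / filename, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp) / "staging"  # stands in for the ConfigMap mount
    target = Path(tmp) / "target"    # stands in for the Cassandra config directory
    staging.mkdir()
    target.mkdir()
    (target / "logback.xml").write_text("<configuration/>\n")  # pre-existing file
    (staging / "cassandra.yaml").write_text("authenticator: PasswordAuthenticator\n")

    copy_config_file(staging, target, "cassandra.yaml")
    target_files = sorted(p.name for p in target.iterdir())

print(target_files)
```

The pre-existing file survives next to the copied one, which is exactly what a direct ConfigMap mount on the target directory would not give us.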

However, since our Cassandra Docker image regenerates cassandra.yaml at pod startup, the above approach, which may work for other scenarios, does not work here.

Instead, we can use the container lifecycle hook PostStart to change the settings we want. I am just doing a crude append to the end of cassandra.yaml, but we could do this more elegantly via sed.
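As a sketch of the more careful variant, the function below replaces an existing `authenticator:` line instead of appending a second one, and only appends when no such line exists. It assumes the setting appears as a top-level `authenticator: <name>` line, as in the stock cassandra.yaml where the default is `AllowAllAuthenticator`; newer Cassandra versions may structure this differently.

```python
import re

AUTH_LINE = "authenticator: PasswordAuthenticator"
AUTH_PATTERN = re.compile(r"^authenticator:.*$", re.MULTILINE)


def enable_password_auth(yaml_text: str) -> str:
    """Set the authenticator to PasswordAuthenticator without duplicating the key."""
    if AUTH_PATTERN.search(yaml_text):
        return AUTH_PATTERN.sub(AUTH_LINE, yaml_text)
    # No existing setting: fall back to appending, as the crude approach does.
    separator = "" if (not yaml_text or yaml_text.endswith("\n")) else "\n"
    return yaml_text + separator + AUTH_LINE + "\n"


original = (
    "cluster_name: 'Test Cluster'\n"
    "authenticator: AllowAllAuthenticator\n"
    "authorizer: AllowAllAuthorizer\n"
)
updated = enable_password_auth(original)
print(updated)
```

Running it twice gives the same result, which matters because a hook can fire again whenever the container restarts.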

Note: since we isolate each Cassandra cluster in its own namespace, it is trivial to spin up more test clusters just by changing the namespace in the StatefulSet.

How to connect

Use a Docker container that has the latest cqlsh, and use one of the IPs from the nodetool status output to connect to Cassandra over CQL.
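A sketch of how such a command line could be assembled is below. It assumes the official `cassandra` image (which ships cqlsh), that the machine running Docker can route to the pod IP, and the default `cassandra`/`cassandra` superuser that exists once password authentication is enabled. The IP `10.244.1.5` is a placeholder; take a real one from nodetool status.

```python
import shlex
from typing import List, Optional


def cqlsh_command(host: str, port: int = 9042, user: Optional[str] = None,
                  password: Optional[str] = None,
                  image: str = "cassandra:latest") -> List[str]:
    """Build a 'docker run ... cqlsh' argument list for connecting to a node."""
    cmd = ["docker", "run", "--rm", "-it", image, "cqlsh", host, str(port)]
    if user is not None:
        cmd += ["-u", user]
    if password is not None:
        cmd += ["-p", password]
    return cmd


command = cqlsh_command("10.244.1.5", user="cassandra", password="cassandra")
print(shlex.join(command))
```

Change the default superuser password before using the cluster for anything beyond a test.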

You can play around with the following examples: https://docs.datastax.com/en/dse/6.7/cql/cql/examples/cyclist_id-rename.html

Reference

https://kubernetes.io/docs/tutorials/stateful-application/cassandra/#creating-a-cassandra-headless-service