So You Want To Configure The Perfect DB Cluster Inside A Kubernetes Cluster
TL;DR: A guide for setting up and configuring dedicated hosts for a database (or any) cluster inside a Kubernetes cluster. This is intended for people who know a thing or two about Kubernetes (mostly one).
In this article I cover the use case I had and how I used the various features of K8s to achieve my goal.
A colleague of mine wanted to add a tracing monitor to our k8s cluster. This included a server, a web interface, and a Cassandra DB backend. He thought it would just mean adding a few pods; he had already downloaded the YAML files and was ready to go.
“Can you take a look before I install them?” he asked.
“NO WAY ON EARTH”, I thought to myself, “am I going to let that setup run on the same node as several of our critical, IO-heavy microservices and assume everything’s going to be fine”. So I told him we’d put it on a separate node — something I’ve done more than a few times before… and we spent the better part of an afternoon editing YAML files.
So… How do we do it? And why?
Our system runs smoothly with all the components scattered randomly across the 10 hosts we have, but we’re growing, performance isn’t scaling the way we’d like, and simply adding more hosts won’t cut it anymore. Also, everyone needs to talk to the database, so having two of our three members on the same host competing for disk access, while the third tries to push all those MBs on a host where the new component is taking 70% of the bandwidth, isn’t helping. I thought we could allocate a dedicated machine with faster storage for each of them. There’s no need to install a new server or configure a firewall, since k8s lets you do it with just a few simple lines of configuration.
The node side
To configure a dedicated k8s node there are two things to keep in mind. First, tell all other pods not to run on this node. And second, tell the scheduler to put the desired pod on this node.
kubectl taint nodes k8s-dedicated-node-1 dedicated=cassandra:NoSchedule
kubectl label nodes k8s-dedicated-node-1 dedicated=cassandra
NOTE: Don’t forget to repeat these commands on the rest of the dedicated nodes.
The first command taints the node with the NoSchedule effect, so no new pods will be scheduled on it unless they tolerate the taint. We’ll come back to this taint when we configure the pod.
TIP: to remove existing pods from the node use
kubectl drain k8s-dedicated-node-1
The second command adds a label to the node; we’ll use it to tell the pod where to run.
The pod side
Once we’re done with the host part, we need to make sure the pods will know to run on these nodes. Let’s add two configurations to the pod’s spec:
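Here’s a sketch of what those two sections can look like in the pod spec (or the pod template of a StatefulSet). The key, value, and effect match the taint and label from the commands above; everything else in the spec is omitted:

```yaml
# Fragment of a pod spec -- only the scheduling sections are shown.
# Assumes the taint and label created by the kubectl commands above.
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "cassandra"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - cassandra
```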
The tolerations section tells the k8s scheduler to ignore the taint, meaning the NoSchedule effect doesn’t apply to this pod. See the key and value? Remember we used them in the taint command? Since there can be many taints, they’re used to match only the taint we defined earlier, and “tolerate” just that one.
Great, but k8s can still choose to put the pod on any host it wants. If the taint tells the scheduler where not to put pods, the node affinity section does the opposite and tells the scheduler where it can/needs to create the pod. It uses the node’s labels to choose which ones are matching candidates.
There are two supported types of node affinity rules:
requiredDuringSchedulingIgnoredDuringExecution means it must be on the matching node(s).
preferredDuringSchedulingIgnoredDuringExecution means the pod should be created on these nodes, but if it can’t be (e.g. because the node doesn’t have enough memory) it can fall back to another node.
Earlier I mentioned Cassandra is a cluster and not a single pod, so one more thing to consider is running each member on a separate host. For this, the scheduler needs to consider not only which node to choose, but also what else is already running on that node. Behold podAntiAffinity, which tells the scheduler to avoid nodes that run other pods carrying the Cassandra pods’ label. The topologyKey limits the check to nodes with the same hostname (i.e. the same node), meaning “allow only one Cassandra pod per hostname”.
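A sketch of what the anti-affinity section can look like — it assumes the Cassandra pods carry the label app=cassandra (adjust to whatever labels your pod template actually uses):

```yaml
# Fragment of the pod's affinity section. The label app=cassandra
# is an assumption; match it to your own pod template's labels.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cassandra
        # Compare pods per node: at most one Cassandra pod per hostname
        topologyKey: kubernetes.io/hostname
```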
Sharp-eyed readers may have noticed I used preferred and not required here. As long as there are enough nodes, the pods are guaranteed to land on separate ones; but if in the future I increase the cluster size and forget to allocate more nodes, two pods can share a node instead of failing to schedule.
If, on the other hand, you’d rather have those pods stuck in a Pending state than sharing a node, use required.
NOTE: Only the label/affinity pair is needed to put the pods on separate nodes, but the taint/tolerations pair makes sure you won’t find any stowaway pods competing for the host’s resources.
A deeper look into selectors
Like most configuration in Kubernetes (and in general), nothing comes without unforeseen consequences, and tainting a node is no exception. You’ve probably run a service that uses a DaemonSet to put a pod on every node; our NoSchedule taint will keep those pods off this node as well.
My setup has a couple of nodes using different taints and labels in the same format (e.g. dedicated=cassandra, dedicated=kafka), so to allow a monitoring agent to run on all nodes I defined the following toleration on the daemon set:
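A sketch of that toleration, placed in the DaemonSet’s pod template — the “Exists” operator matches any value of the dedicated key (cassandra, kafka, and any others we add later):

```yaml
# Fragment of the DaemonSet's pod template spec. "Exists" tolerates
# every taint with the key "dedicated", regardless of its value.
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
```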
Let’s dive into how this works.
A lot of resources in k8s relate to other resources (e.g. services and pods, PVCs and PVs, etc.). A common pattern is to define a label on the target resource, and a matcher on the source resource that describes how it finds the resource(s) it links to. In the case of tolerations, the pod can have multiple rules defining which taints it tolerates, and there’s some flexibility in how the match is done. Each rule works on a single key, and can match either a single value using the “Equal” operator or any value using the “Exists” operator. Lastly, the rule describes the effect it cancels; in our case we ignore the NoSchedule effect.
Kubernetes lets you easily set up and connect various components, but it won’t be long before you need a bit more control over things. It also provides a lot of much-needed flexibility, but that involves diving into more complex configuration.
I hope this guide gives you the tools, and helps you understand how to use them, to achieve the perfect setup. Let me know in the comments below what you think of this article and whether there are other things you’d like to learn about.