Deploying a Zookeeper Cluster

Published in Kafka in Kubernetes, May 21, 2020

This is the third article in our Running Kafka in Kubernetes publication. The previous article is available at https://medium.com/kafka-in-kubernetes/automating-storage-provisioning-41034e570928.

Motivation

Kafka uses Zookeeper to manage service discovery for the Kafka brokers that form the cluster. Zookeeper notifies Kafka of topology changes, so each node in the cluster knows when a broker joins or dies, or when a topic is added or removed. In short, Zookeeper coordinates the operation of the Kafka cluster.

Deploying Zookeeper

We deploy Zookeeper with a Kubernetes YAML manifest. The complete file is available at https://raw.githubusercontent.com/fernandocyder/k8s-practice/master/04.kafka-expose-service/01.zookeeper.yaml.

A few key points make this deployment work.

First, we deploy Zookeeper as a StatefulSet.

The StatefulSet does the following:

Sets 3 replicas.
Gives each Pod an init container and a main container. The init container defines the unique ID of the Zookeeper node. Thanks to the StatefulSet, each Pod has a fixed hostname with an ordinal index suffix, for example zookeeper-0, zookeeper-1, and so on, so the ID can be derived from the hostname.
Defines a volume claim template using the ‘nfs-client’ storage class and mounts two volumes.
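A StatefulSet with these three properties might look roughly like the minimal sketch below. It is not the linked manifest: the image tags (zookeeper:3.6, busybox:1.36), the volume names data and datalog, the 1Gi size, the app: zookeeper label and the zookeeper-config ConfigMap name are our assumptions. The /data and /datalog mount paths are the default data and transaction-log directories of the official zookeeper image. The init container writes the hostname ordinal plus one into /data/myid, so zookeeper-0 becomes server.1; as far as we know, the official image's entrypoint only writes myid when the file does not already exist, so the value from the init container is kept.

```yaml
# Minimal sketch, not the author's manifest: image tags, volume names,
# storage size, labels and the ConfigMap name are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper                # Pods become zookeeper-0, zookeeper-1, zookeeper-2
spec:
  serviceName: zookeeper         # governing headless Service; yields zookeeper-N.zookeeper DNS names
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      initContainers:
        - name: init-myid
          image: busybox:1.36
          command:
            - sh
            - -c
            - |
              # zookeeper-0 -> 1, zookeeper-1 -> 2, ... so the ID matches server.N in ZOO_SERVERS
              HOST=$(hostname)
              ORDINAL=${HOST##*-}
              echo $((ORDINAL + 1)) > /data/myid
          volumeMounts:
            - name: data
              mountPath: /data
      containers:
        - name: zookeeper
          image: zookeeper:3.6   # 3.5+ is needed for the ";2181" client-port syntax in ZOO_SERVERS
          envFrom:
            - configMapRef:
                name: zookeeper-config
          ports:
            - name: client
              containerPort: 2181
            - name: peer
              containerPort: 2888
            - name: election
              containerPort: 3888
          volumeMounts:
            - name: data         # snapshots and myid
              mountPath: /data
            - name: datalog      # transaction logs
              mountPath: /datalog
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nfs-client
        resources:
          requests:
            storage: 1Gi
    - metadata:
        name: datalog
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: nfs-client
        resources:
          requests:
            storage: 1Gi
```

Because of volumeClaimTemplates, each replica gets its own pair of claims (for example data-zookeeper-0 and datalog-zookeeper-0), which the nfs-client storage class provisions on demand.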

Next, we define a ConfigMap whose entries are exposed to the StatefulSet Pods as environment variables.

These environment variables are read by the docker-entrypoint.sh script of the Zookeeper Docker image.
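A minimal sketch of such a ConfigMap, assuming the official zookeeper image (whose entrypoint reads ZOO_TICK_TIME, ZOO_INIT_LIMIT, ZOO_SYNC_LIMIT and ZOO_SERVERS) and the hypothetical name zookeeper-config. The timing values are illustrative, not taken from the original manifest.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zookeeper-config   # hypothetical name, referenced by envFrom in the StatefulSet
data:
  ZOO_TICK_TIME: "2000"    # ConfigMap values must be strings, hence the quotes
  ZOO_INIT_LIMIT: "10"
  ZOO_SYNC_LIMIT: "5"
  # ">-" folds the three lines into one space-separated string without a
  # trailing newline, which is the format ZOO_SERVERS uses
  ZOO_SERVERS: >-
    server.1=zookeeper-0.zookeeper:2888:3888;2181
    server.2=zookeeper-1.zookeeper:2888:3888;2181
    server.3=zookeeper-2.zookeeper:2888:3888;2181
```

ZOO_MY_ID is deliberately absent in this sketch: every Pod shares the same ConfigMap, so the per-node ID has to come from the init container instead.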

Note the ZOO_SERVERS variable: "server.1=zookeeper-0.zookeeper:2888:3888;2181 server.2=zookeeper-1.zookeeper:2888:3888;2181 server.3=zookeeper-2.zookeeper:2888:3888;2181"

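These host names resolve because Zookeeper runs as a StatefulSet governed by a headless Service: each Pod gets a stable DNS name of the form <pod-name>.<service-name>, such as zookeeper-0.zookeeper.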

What is a headless Service? See the Kubernetes documentation at https://kubernetes.io/docs/concepts/services-networking/service/#headless-services:

You can use a headless Service to interface with other service discovery mechanisms, without being tied to Kubernetes’ implementation. For headless Services, a cluster IP is not allocated, kube-proxy does not handle these Services, and there is no load balancing or proxying done by the platform for them.

Here is the YAML to create the headless Service.
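A minimal sketch of such a Service, assuming the name zookeeper (implied by the zookeeper-N.zookeeper host names in ZOO_SERVERS) and the hypothetical Pod label app: zookeeper. Setting clusterIP to None is what makes a Service headless, so DNS returns the individual Pod addresses instead of a single virtual IP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper          # must match serviceName in the StatefulSet and the ZOO_SERVERS host names
  labels:
    app: zookeeper
spec:
  clusterIP: None          # headless: no cluster IP, no kube-proxy load balancing
  selector:
    app: zookeeper         # assumed Pod label
  ports:
    - name: client
      port: 2181
    - name: peer
      port: 2888
    - name: election
      port: 3888
```

Each port needs a name here because the Service exposes more than one port; targetPort defaults to the same value as port.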