Deploying Kafka Broker Cluster

Fernando Karnagi
Kafka in Kubernetes
4 min read · May 21, 2020

This is the fourth article in our Running Kafka in Kubernetes publication. For the previous article, see https://medium.com/kafka-in-kubernetes/deploying-zookeeper-cluster-3acdcc7ed340.

Motivation

Deploying Kafka in Kubernetes is challenging, especially when we need to run multiple brokers and separate the connections of internal and external applications.

Kafka Docker Image

As of this article's publication, there is no official open-source Kafka Docker image published by the Apache Kafka community. Confluent publishes one, but for this exercise and demo we will not use it.

We created our own Docker image. You can refer to the Dockerfile here.

The main logic for starting the Kafka broker is defined in docker-entrypoint.sh.

The Docker image is published at https://hub.docker.com/repository/docker/fernandokarnagi/kafka-server; the current tag is fernandokarnagi/kafka-server:v7.

If you look closely, we define a few key environment variables (a sketch of how they can be wired into the container spec follows this list):

  1. ZOOKEEPER. This defines all the ZooKeeper nodes with their listening port numbers. This is the value that we use: zookeeper-0.zookeeper:2181,zookeeper-1.zookeeper:2181,zookeeper-2.zookeeper:2181
  2. HOSTNAME. This defines the hostname of each Pod. Since we use a StatefulSet, each Pod is guaranteed its own fixed hostname, for example kafka-0, kafka-1, etc.
  3. LOG_DIRS. This defines the Kafka log directory.
  4. CONF_DIR. This defines the directory in which to look up the configuration file.
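
For illustration, here is a minimal sketch of how these variables might be wired into the Kafka container spec. The LOG_DIRS and CONF_DIR paths are assumptions for this sketch, and HOSTNAME is derived from the Pod name via the downward API:

env:
  - name: ZOOKEEPER
    value: "zookeeper-0.zookeeper:2181,zookeeper-1.zookeeper:2181,zookeeper-2.zookeeper:2181"
  - name: HOSTNAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name   # resolves to kafka-0, kafka-1, ...
  - name: LOG_DIRS
    value: "/var/lib/kafka/data"   # assumed path
  - name: CONF_DIR
    value: "/etc/kafka"            # assumed path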

Tell Me More About Listeners and Advertised Listeners

“The key thing is that when you run a client, the broker you pass to it is just where it’s going to go and get the metadata about brokers in the cluster from. The actual host and IP that it will connect to for reading/writing data is based on the data that the broker passes back in that initial connection — even if it’s just a single node and the broker returned is the same as the one it’s connected to.” (quoted from https://www.confluent.io/blog/kafka-listeners-explained/)

Why don't we use a load balancer? https://medium.com/code-tech/kafka-in-aws-with-ssl-offloading-using-load-balancer-c337da1435c3 explains why:

  1. Placing a load balancer in front of Kafka makes no sense for load distribution, because the Kafka client is aware of the broker IPs and does its own load balancing, acting more like a client-side load balancer
  2. It may throttle the data transfers between the consumers and the brokers

So, we need to expose each Pod, each running an individual broker with its own unique ID, so that clients can reach it directly.

Then How to Allow External Access

When running on a PaaS or on-premises, a first, simple approach to exposing each instance of our Kafka cluster to the outside world is to create a Service of type LoadBalancer that routes traffic to a specific Pod of the StatefulSet. Luckily, when the StatefulSet controller creates a Pod, it also adds the label statefulset.kubernetes.io/pod-name with the unique name of that Pod. Using this label, we can create an exposed Service for each Pod of our Kafka StatefulSet, as sketched below. The catch is that we then need as many external load balancers and DNS names as there are running Pods, which might not be feasible.
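
For illustration, a per-Pod Service for the first broker might look roughly like this; the Service name and port are assumptions for this sketch:

apiVersion: v1
kind: Service
metadata:
  name: kafka-0-lb
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0   # label added by the StatefulSet controller
  ports:
    - port: 9092
      targetPort: 9092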

Another approach is to share one external address across all of your Kafka Pods, where each Pod gets its own dedicated port. To achieve this, we need some kind of load balancer that routes traffic to the correct Kafka Pod. For this, we will use Ingress.

Reference: https://tothepoint.group/accessing-kafka-on-google-kubernetes-engine-from-the-outside-world/

Separate External and Internal Access

Let's separate the internal and external listeners. The internal listener is used for communication among the brokers and by other Pods in the same cluster. The external listener is used by applications outside the K8S cluster.

The first thing to do is to configure the Kafka broker to support this requirement. Take a look at the server properties generator below.

# $SVR_INDEX is the broker ordinal (0, 1, 2, ...); $PORT_INDEX is this broker's unique external port
echo "broker.id=$SVR_INDEX"
echo "listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:$PORT_INDEX"
echo "advertised.listeners=INTERNAL://$HOSTNAME.kafka:9092,EXTERNAL://broker.mydomain.com:$PORT_INDEX"
echo "listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
# required, since no listener is named PLAINTEXT, the brokers must talk to each other over INTERNAL
echo "inter.broker.listener.name=INTERNAL"

Deploy the Kafka StatefulSet

Deploy the StatefulSet YAML to bring up the Kafka cluster with 3 replicas; a minimal sketch follows.
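
This is a minimal sketch of such a StatefulSet, assuming a headless Service named kafka and omitting volumes, probes, and most of the environment variables described earlier:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka            # headless Service giving Pods stable DNS names like kafka-0.kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: fernandokarnagi/kafka-server:v7
          ports:
            - containerPort: 9092   # INTERNAL listener
          env:
            - name: ZOOKEEPER
              value: "zookeeper-0.zookeeper:2181,zookeeper-1.zookeeper:2181,zookeeper-2.zookeeper:2181"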

Let's also create one separate Service for each Kafka Pod. Each of these is basically a headless Service that targets one particular Pod and exposes the EXTERNAL listener of that particular Pod, as sketched below.
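
For example, a sketch of such a Service for kafka-0, assuming that broker's EXTERNAL listener sits on port 9093:

apiVersion: v1
kind: Service
metadata:
  name: kafka-0-external
spec:
  clusterIP: None                                 # headless: resolves directly to the Pod IP
  selector:
    statefulset.kubernetes.io/pod-name: kafka-0
  ports:
    - port: 9093                                  # this broker's EXTERNAL listener ($PORT_INDEX)
      targetPort: 9093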

Kubernetes Ingress Does Not Allow TCP

As https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/ explains, Ingress does not support TCP or UDP services. For this reason the NGINX Ingress controller uses the flags --tcp-services-configmap and --udp-services-configmap to point to an existing ConfigMap, where the key is the external port to use and the value indicates the service to expose, using the format <namespace/service name>:<service port>:[PROXY]:[PROXY]

We also created one for this purpose: https://raw.githubusercontent.com/fernandocyder/k8s-practice/master/04.kafka-expose-service/05.kafka-tcp-service.yaml.
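
For illustration, assuming the per-Pod Services above live in a kafka namespace and the brokers' external ports are 9093, 9094, and 9095, such a ConfigMap could look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  "9093": kafka/kafka-0-external:9093   # <external port>: <namespace/service>:<service port>
  "9094": kafka/kafka-1-external:9094
  "9095": kafka/kafka-2-external:9095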

This is our configuration in our Rancher setup.

Conclusion

In this article, you have seen how we created a Kafka cluster and separated internal and external access. In the next article, https://medium.com/kafka-in-kubernetes/accessing-kafka-broker-87aa7928a6e9, you will see how we verify the setup.
