Accessing Kafka Brokers through a Load Balancer

Fernando Karnagi
Kafka in Kubernetes
4 min read · Jun 5, 2020

Motivation

We have Kafka deployed as a StatefulSet in our K8S cluster, and we need to expose it to external clients that act as message producers and consumers.

Our Kafka service is backed by three (3) brokers and three (3) ZooKeeper nodes.

We would like to expose our Kafka brokers to external clients through load balancers.

A bit about Kafka Architecture

Kafka consists of Records, Topics, Consumers, Producers, Brokers, Logs, Partitions, and Clusters. A Kafka Topic is a stream of records. A topic has a Log which is the topic’s storage on disk. A Topic Log is broken up into partitions and segments. The Kafka Producer API is used to produce streams of data records. The Kafka Consumer API is used to consume a stream of records from Kafka. A Broker is a Kafka server that runs in a Kafka Cluster. Kafka Brokers form a cluster. The Kafka Cluster consists of many Kafka Brokers on many servers.

Reference: http://cloudurable.com/blog/kafka-architecture/index.html
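To make the vocabulary above concrete, here is a toy, in-memory model of a topic split into partitions, where each partition is an append-only log and an offset is a record's position within its own partition. It is only a sketch: the `Topic` class is hypothetical, and it ignores segments, replication, and brokers entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy model of a Kafka topic (hypothetical, for illustration only):
    a list of partitions, each an append-only log whose index is the offset."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One empty append-only log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append(self, partition: int, record: str) -> int:
        """Append a record to one partition's log and return its offset."""
        log = self.partitions[partition]
        log.append(record)
        return len(log) - 1

    def read(self, partition: int, offset: int) -> list:
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    orders = Topic("orders", num_partitions=3)
    print(orders.append(0, "order-1"))  # 0
    print(orders.append(0, "order-2"))  # 1
    print(orders.append(1, "order-3"))  # 0 (offsets are per partition)
    print(orders.read(0, 1))            # ['order-2']
```

Note that offsets restart at zero in every partition, which is why ordering in Kafka is guaranteed only within a partition, not across a whole topic.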

Kafka Brokers

A Kafka cluster is made up of multiple Kafka Brokers. Each Kafka Broker has a unique ID (number). Kafka Brokers contain topic log partitions. Connecting to one broker bootstraps a client to the entire Kafka cluster.
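Bootstrapping can be sketched as follows: the client tries the addresses it was configured with until one broker answers, and that answer lists every broker's advertised address. This is a toy simulation, not the Kafka wire protocol; `fetch_metadata`, `bootstrap`, and all hostnames are hypothetical stand-ins for a real client's metadata request.

```python
# Toy network (hypothetical): the only reachable addresses are the ones
# the brokers advertise. Broker ID -> advertised (host, port).
CLUSTER_METADATA = {
    0: ("kafka-0.example.internal", 9092),
    1: ("kafka-1.example.internal", 9092),
    2: ("kafka-2.example.internal", 9092),
}


def fetch_metadata(address: tuple) -> dict:
    """Stand-in for a metadata request: any live broker can answer it,
    and the answer lists the advertised address of every broker."""
    if address not in CLUSTER_METADATA.values():
        raise ConnectionError(f"cannot reach {address[0]}:{address[1]}")
    return dict(CLUSTER_METADATA)


def bootstrap(bootstrap_servers: list) -> dict:
    """Try each bootstrap address in turn; the first broker that answers
    tells the client how to reach the whole cluster."""
    for address in bootstrap_servers:
        try:
            return fetch_metadata(address)
        except ConnectionError:
            continue  # try the next bootstrap address
    raise ConnectionError("no bootstrap server reachable")


if __name__ == "__main__":
    brokers = bootstrap([("stale-host", 9092), ("kafka-1.example.internal", 9092)])
    for broker_id, (host, port) in sorted(brokers.items()):
        print(broker_id, f"{host}:{port}")
```

One reachable broker is enough: the client learns about brokers 0 and 2 even though it only contacted broker 1.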

Each broker hosts a number of partition replicas, and each replica is either the leader or a follower for its partition. By default, all reads and writes for a partition go through its leader, and the leader coordinates updating the follower replicas with new data. If a leader fails, a follower replica takes over as the new leader. Please refer to https://sookocheff.com/post/kafka/kafka-in-a-nutshell/ for more details about partitions and brokers. In short, a topic may have multiple partitions, and each partition, not each topic, has a leader. Kafka tries to spread leaders evenly across brokers, so if your topic has multiple partitions, it will have multiple leaders and your writes will be spread across brokers.
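How leadership spreads writes across brokers can be sketched with a simplified round-robin placement. This is an assumption for illustration: Kafka's real replica assignment also places followers and may start from a random broker, and `assign_leaders` and `writes_per_broker` are hypothetical helpers.

```python
from collections import Counter


def assign_leaders(num_partitions: int, broker_ids: list, start: int = 0) -> dict:
    """Simplified round-robin leader placement: partition p is led by
    broker_ids[(start + p) % len(broker_ids)]."""
    n = len(broker_ids)
    return {p: broker_ids[(start + p) % n] for p in range(num_partitions)}


def writes_per_broker(record_partitions: list, leaders: dict) -> Counter:
    """Every write goes to the leader of the record's partition, so count
    how many writes each broker receives."""
    return Counter(leaders[p] for p in record_partitions)


if __name__ == "__main__":
    leaders = assign_leaders(num_partitions=6, broker_ids=[0, 1, 2])
    print(leaders)  # {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2}
    # 60 records spread round-robin over the 6 partitions.
    records = [i % 6 for i in range(60)]
    print(writes_per_broker(records, leaders))  # each broker receives 20 writes
```

With a single partition, the same function would send all 60 writes to one broker, which is the point the paragraph makes about using multiple partitions.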

Kafka Listeners

Kafka also has the listeners and advertised.listeners properties, which often confuse first-time users. To put it simply, listeners defines the network interfaces and ports your broker binds to, while advertised.listeners defines the hostnames or IPs and ports the broker registers in ZooKeeper and hands out to clients. If you put a hostname in advertised.listeners, your clients WILL have to be able to resolve and reach that hostname. A client first connects to one of the addresses in its bootstrap configuration (bootstrap.servers); once that connection is made, the broker returns cluster metadata containing the advertised.listeners of every broker, and the client then connects directly to those advertised addresses. Clients do not connect to ZooKeeper themselves.
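The difference between the two properties can be sketched by parsing both values and printing, per listener name, where the broker binds versus what it advertises. The `parse_listeners` and `describe` helpers and all hostnames are hypothetical, and the parser only handles the simple `NAME://host:port` form.

```python
def parse_listeners(value: str) -> dict:
    """Parse a value like 'INTERNAL://0.0.0.0:9092,EXTERNAL://:9093'
    into {listener_name: (host, port)}. An empty host is kept as ''."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))
    return result


def describe(listeners: str, advertised: str) -> list:
    """For each listener name, show where the broker binds and what
    address (if any) it tells clients to connect to."""
    bound = parse_listeners(listeners)
    published = parse_listeners(advertised)
    lines = []
    for name, (host, port) in bound.items():
        if name in published:
            adv_host, adv_port = published[name]
            lines.append(f"{name}: binds {host or '*'}:{port}, advertises {adv_host}:{adv_port}")
        else:
            lines.append(f"{name}: binds {host or '*'}:{port}, not advertised")
    return lines


if __name__ == "__main__":
    for line in describe(
        "INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093",
        "INTERNAL://kafka-0.kafka-headless:9092,EXTERNAL://kafka-0.lb.example.com:9093",
    ):
        print(line)
```

The bind address 0.0.0.0 is fine for listeners but meaningless to a client, which is why the advertised value must be a name or IP that clients can actually reach.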

Kafka on Kubernetes, Lesson to Learn

This article https://medium.com/@tsuyoshiushio/configuring-kafka-on-kubernetes-makes-available-from-an-external-client-with-helm-96e9308ee9f4 gives detailed information on how its author configured Kafka on Kubernetes to be reachable from an external client.

In summary, here are the key takeaways:

  1. Multiple broker Pods cannot be exposed through a single ClusterIP Service, because a client must reach the specific broker that leads a partition, not whichever Pod the Service happens to route to.
  2. The external IP and port must be set in the advertised.listeners configuration of each broker Pod.
  3. For each advertised listener, there has to be a corresponding entry in the broker's listeners configuration.
  4. advertised.listeners and listeners must be defined uniquely for each broker.
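Takeaways 3 and 4 are easy to get wrong when broker configurations are templated, so a small check before deploying can help. The sketch below uses a hypothetical `validate` function and input shape; it only checks that every advertised listener has a matching listener and that no two brokers advertise the same address, and it does not talk to Kafka.

```python
def validate(brokers: dict) -> list:
    """Check takeaways 3 and 4 for a set of brokers.

    `brokers` maps broker ID -> {"listeners": {name: (host, port)},
                                 "advertised": {name: (host, port)}}.
    Returns a list of human-readable problems (empty if none).
    """
    problems = []
    owner_of = {}  # advertised (host, port) -> broker ID that first used it
    for broker_id, cfg in sorted(brokers.items()):
        for name, address in cfg["advertised"].items():
            # Takeaway 3: every advertised listener needs a matching listener.
            if name not in cfg["listeners"]:
                problems.append(f"broker {broker_id}: {name} is advertised but not in listeners")
            # Takeaway 4: no two brokers may advertise the same address.
            owner = owner_of.setdefault(address, broker_id)
            if owner != broker_id:
                problems.append(
                    f"broker {broker_id}: {address[0]}:{address[1]} "
                    f"is already advertised by broker {owner}"
                )
    return problems


if __name__ == "__main__":
    good = {
        i: {
            "listeners": {"INTERNAL": ("0.0.0.0", 9092), "EXTERNAL": ("0.0.0.0", 9093)},
            "advertised": {
                "INTERNAL": (f"kafka-{i}.kafka-headless", 9092),
                "EXTERNAL": (f"kafka-{i}.lb.example.com", 9093),
            },
        }
        for i in range(3)
    }
    print(validate(good))  # []
```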

Proposed Architecture

Based on the above articles, here is the design of our Kafka cluster in K8S with three (3) nodes.

At the Pod level, we defined a StatefulSet of three (3) Pods to run ZooKeeper and another StatefulSet of three (3) Pods to run the Kafka brokers. Each ZooKeeper Pod has its own name registered in the Kubernetes internal DNS, and so does each Kafka Pod. One Service of type NodePort (which also gets a ClusterIP) is created for each Kafka broker.

Each broker is configured with two listeners, INTERNAL and EXTERNAL, each listening on a different port.

At the Node level (we have three (3) nodes), each node exposes three (3) ports, one for each NodePort Service. We also defined three (3) separate load balancers, one per broker.

With the above architecture, each broker's EXTERNAL listener is advertised with its load balancer's resolvable domain name and port.
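The per-broker Services can be sketched as plain manifests. The snippet assumes a StatefulSet named `kafka` behind a headless Service `kafka-headless` in namespace `kafka`, the default `cluster.local` cluster domain, container port 9093 for the EXTERNAL listener, and node ports starting at 31090; all of these names and numbers are hypothetical. Each Service selects exactly one Pod through the `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller adds to every Pod it creates.

```python
import json

CLUSTER_DOMAIN = "cluster.local"  # assumed default cluster domain


def pod_dns_name(statefulset: str, ordinal: int, headless_service: str, namespace: str) -> str:
    """Stable DNS name of a StatefulSet Pod governed by a headless Service."""
    return f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{CLUSTER_DOMAIN}"


def broker_service(statefulset: str, ordinal: int, namespace: str,
                   node_port: int, container_port: int = 9093) -> dict:
    """NodePort Service (which also gets a ClusterIP) targeting exactly one broker Pod."""
    pod = f"{statefulset}-{ordinal}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{pod}-external", "namespace": namespace},
        "spec": {
            "type": "NodePort",
            # Label set by the StatefulSet controller on each Pod it creates.
            "selector": {"statefulset.kubernetes.io/pod-name": pod},
            "ports": [{
                "name": "external",
                "port": container_port,
                "targetPort": container_port,
                "nodePort": node_port,
            }],
        },
    }


if __name__ == "__main__":
    for i in range(3):
        print(pod_dns_name("kafka", i, "kafka-headless", "kafka"))
        print(json.dumps(broker_service("kafka", i, "kafka", 31090 + i), indent=2))
```

kubectl accepts manifests in JSON as well as YAML, so the generated objects can be saved and applied as-is. Selecting by Pod name rather than by a shared app label is what keeps each Service pinned to a single broker.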
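Putting the INTERNAL and EXTERNAL listeners together, the listener-related part of each broker's server.properties might be generated as below. This is a sketch under assumptions: PLAINTEXT security on both listeners, inter-broker replication over INTERNAL, and hypothetical load balancer hostnames such as `kafka-0.lb.example.com`. The advertised EXTERNAL port is the load balancer's port, which in this design forwards to the broker's NodePort.

```python
def broker_properties(broker_id: int, internal_host: str, lb_host: str, lb_port: int,
                      internal_port: int = 9092, external_port: int = 9093) -> str:
    """Render the listener settings of server.properties for one broker."""
    props = {
        "broker.id": str(broker_id),
        # Bind both listeners on all interfaces inside the Pod.
        "listeners": f"INTERNAL://0.0.0.0:{internal_port},EXTERNAL://0.0.0.0:{external_port}",
        # INTERNAL: the Pod's DNS name; EXTERNAL: this broker's own load balancer.
        "advertised.listeners": (
            f"INTERNAL://{internal_host}:{internal_port},"
            f"EXTERNAL://{lb_host}:{lb_port}"
        ),
        # Custom listener names must be mapped to a security protocol.
        "listener.security.protocol.map": "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT",
        # Brokers replicate to each other over the INTERNAL listener.
        "inter.broker.listener.name": "INTERNAL",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    for i in range(3):
        print(broker_properties(
            broker_id=i,
            internal_host=f"kafka-{i}.kafka-headless.kafka.svc.cluster.local",
            lb_host=f"kafka-{i}.lb.example.com",
            lb_port=9093,
        ))
        print()
```

Only the advertised values differ between brokers; the bind addresses stay identical, which is what lets a single StatefulSet template serve all three Pods once the per-broker values are injected.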

Why NodePort, not a Kubernetes LoadBalancer Service

According to https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services---service-types, a LoadBalancer Service "exposes the service externally using a cloud provider’s load balancer. NodePort and ClusterIP services, to which the external load balancer will route, are automatically created."

So a LoadBalancer Service is built on top of a NodePort and a ClusterIP. The only thing it adds for the developer is automation: it creates the NodePort, the ClusterIP, and the external load balancer, and registers the load balancer's members. In addition, the LoadBalancer Service type only works out of the box with cloud providers such as AWS, GCP (GKE), and Azure; it is not supported out of the box in an on-premise setup, which is why we create the NodePort Services and load balancers ourselves.
