Photo by Joshuva Daniel on Unsplash

A Step by Step Kafka Guide

Unlock the Power of Apache Kafka with The Official Docker Image

A Detailed Guide to Building and Deploying Kafka with the Official Docker Image in KRaft Mode

Towards Data Engineering
6 min read · Mar 11, 2024


Do you remember our latest Kafka article?
If you don’t, that’s okay, I will tell you ^_^.

A few days ago, we celebrated a big event: the introduction of the official Kafka Docker image in the latest release, Kafka 3.7.

Since this docker image is new, I didn’t find enough articles about the subject.

So let’s do a detailed guide about deploying a Kafka cluster using docker-compose.

Prerequisites

Set Up a Cluster of 3 Nodes

The docker-compose file

We’re going to use the official docker-compose example published by Apache Kafka on GitHub:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
version: '2'
services:
  kafka-1:
    image: apache/kafka:latest
    hostname: kafka-1
    container_name: kafka-1
    ports:
      - 29092:9092
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093'
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:19092,PLAINTEXT_HOST://localhost:29092
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      CLUSTER_ID: '4L6g3nShT-eMCtK--X86sw'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'

  kafka-2:
    image: apache/kafka:latest
    hostname: kafka-2
    container_name: kafka-2
    ports:
      - 39092:9092
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093'
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:19092,PLAINTEXT_HOST://localhost:39092
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      CLUSTER_ID: '4L6g3nShT-eMCtK--X86sw'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'

  kafka-3:
    image: apache/kafka:latest
    hostname: kafka-3
    container_name: kafka-3
    ports:
      - 49092:9092
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: 'broker,controller'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093'
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:19092,PLAINTEXT_HOST://localhost:49092
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      CLUSTER_ID: '4L6g3nShT-eMCtK--X86sw'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'

As you can see, this docker-compose file deploys 3 Kafka nodes in plaintext mode (no encryption or authentication).
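The three services are almost identical. As a rough illustration, here is a small Python sketch of the values that change from one node to the next. The function name node_settings is hypothetical, and the host-port formula is only an observation about this particular file (29092, 39092, 49092), not a Kafka rule.

```python
def node_settings(node_id: int) -> dict:
    """Return the values that differ for node `node_id` (1, 2 or 3)."""
    name = f"kafka-{node_id}"
    # Host ports used in the compose file above: 29092, 39092, 49092.
    host_port = (node_id + 1) * 10000 + 9092
    return {
        "hostname": name,
        "container_name": name,
        "ports": f"{host_port}:9092",
        "KAFKA_NODE_ID": node_id,
        "KAFKA_ADVERTISED_LISTENERS": (
            f"PLAINTEXT://{name}:19092,PLAINTEXT_HOST://localhost:{host_port}"
        ),
    }


for node_id in (1, 2, 3):
    print(node_settings(node_id))
```

Everything else (roles, quorum voters, cluster ID, replication settings, log directory) is the same on all three nodes.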

There are multiple docker-compose files to choose from, depending on what you want to deploy.

Also, notice that there are only Kafka services to deploy, no need for ZooKeeper anymore (it’s important to mention that ZooKeeper is planned to be removed in Apache Kafka 4.0).

So Kafka is gonna work in KRaft mode.

Kafka in KRaft Mode

Apache Kafka Raft (KRaft) is the consensus protocol that was introduced in KIP-500 to remove Apache Kafka’s dependency on ZooKeeper for metadata management.

This greatly simplifies Kafka’s architecture by consolidating responsibility for metadata into Kafka itself, rather than splitting it between two different systems: ZooKeeper and Kafka.

KRaft mode makes use of a new quorum controller service in Kafka which replaces the previous controller.

the 2 modes, image by Confluent

If we use Kafka in KRaft mode, we do not need to use ZooKeeper for cluster coordination or storing metadata. Kafka coordinates the cluster itself using brokers that act as controllers. Kafka also stores the metadata used to track the status of brokers and partitions.

1- The first step is creating an ID to identify the cluster

The ID is used when formatting the log directories of each node you add to the cluster, so every node must be given the same value.

If we return to the docker-compose file, we will see the same CLUSTER_ID set for the 3 nodes:

CLUSTER_ID: '4L6g3nShT-eMCtK--X86sw'
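If you want your own ID instead of the one from the example, here is a minimal Python sketch. It assumes the ID follows the format printed by Kafka's kafka-storage.sh random-uuid tool: a UUID encoded as URL-safe base64 with the padding removed (22 characters). The function names are hypothetical.

```python
import base64
import uuid


def new_cluster_id() -> str:
    """Encode a random UUID as 22 URL-safe base64 characters (padding removed)."""
    while True:
        encoded = base64.urlsafe_b64encode(uuid.uuid4().bytes)
        candidate = encoded.rstrip(b"=").decode("ascii")
        # Skip IDs with a leading dash so they are never mistaken for a CLI option.
        if not candidate.startswith("-"):
            return candidate


def looks_like_cluster_id(value: str) -> bool:
    """Check that `value` decodes to exactly 16 bytes, the size of a UUID."""
    if len(value) != 22:
        return False
    try:
        raw = base64.urlsafe_b64decode(value + "==")
    except ValueError:
        return False
    return len(raw) == 16


print(new_cluster_id())
print(looks_like_cluster_id("4L6g3nShT-eMCtK--X86sw"))
```

Whatever value you pick, remember to put the same one on all three nodes.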

2- Setup the necessary configuration for each broker

If we return to the docker-compose above, we will see

  • All three nodes take both the broker and the controller roles
KAFKA_PROCESS_ROLES: 'broker,controller'
  • A unique node.id is specified for each node in the cluster
KAFKA_NODE_ID: 1
KAFKA_NODE_ID: 2
KAFKA_NODE_ID: 3
  • KAFKA_CONTROLLER_QUORUM_VOTERS lists the nodes that take part in the quorum that elects the active controller, using the format nodeId@host:port. Here, all 3 nodes are included in the list.
KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093'
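To make the nodeId@host:port format concrete, here is a small Python sketch that parses the voters string and computes the majority a Raft-style quorum needs. The names Voter, parse_quorum_voters and majority are hypothetical helpers, not part of any Kafka client.

```python
from typing import List, NamedTuple


class Voter(NamedTuple):
    node_id: int
    host: str
    port: int


def parse_quorum_voters(value: str) -> List[Voter]:
    """Split 'id@host:port,id@host:port,...' into Voter tuples."""
    voters = []
    for entry in value.split(","):
        node_id, _, address = entry.strip().partition("@")
        host, _, port = address.rpartition(":")
        voters.append(Voter(int(node_id), host, int(port)))
    return voters


def majority(voter_count: int) -> int:
    """Number of voters that must be available for the quorum to make progress."""
    return voter_count // 2 + 1


voters = parse_quorum_voters("1@kafka-1:9093,2@kafka-2:9093,3@kafka-3:9093")
print(voters)
print(majority(len(voters)))
```

With 3 voters the majority is 2, so the controller quorum of this cluster keeps working if one node goes down.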

3- Set up log directories (where the Kafka data logs are stored) for each node

If we return to the docker-compose above, we will see that KAFKA_LOG_DIRS is set to the same value for the 3 nodes. That is fine here, because each container has its own filesystem.

KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'

Other Parameters & Configs

Let’s run through the other parameters in the docker-compose file:

  • image
image: apache/kafka:latest

This specifies the Docker image to be used for the container. In this case, apache/kafka:latest indicates that the container will use the latest version of the official Apache Kafka image available on Docker Hub.

  • hostname
hostname: kafka-1
hostname: kafka-2
hostname: kafka-3

It sets the hostname inside the container. This is particularly useful for network communication between containers, especially in a setup where services need to communicate with each other using hostnames rather than IP addresses.

In this case, each Kafka broker is given a unique hostname (kafka-1, kafka-2, kafka-3) for identification within the internal network.

  • container_name
container_name: kafka-1
container_name: kafka-2
container_name: kafka-3

This specifies a custom name for the container. It’s a way to assign a readable, user-defined name to the container instead of the auto-generated ones. The names here (kafka-1, kafka-2, kafka-3) match their respective hostnames for consistency and ease of identification.

  • ports
ports:
- 29092:9092
ports:
- 39092:9092
ports:
- 49092:9092

This publishes the container’s ports on the host. The port mappings are defined as <host_port>:<container_port>, allowing external applications (e.g., Kafka clients) to connect to the Kafka brokers.
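The port mapping has to agree with the listener settings of the same node: the container port should be the one the PLAINTEXT_HOST listener binds to, and the host port should be the one that listener advertises. Here is a hedged Python sketch of that check; all function names are hypothetical and the parsing only covers the simple name://host:port form used in this compose file.

```python
def parse_port_mapping(mapping: str) -> tuple:
    """Split '<host_port>:<container_port>' into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def listener_ports(listeners: str) -> dict:
    """Map each listener name to its port, e.g. 'PLAINTEXT_HOST' -> 9092."""
    ports = {}
    for entry in listeners.split(","):
        name, _, address = entry.partition("://")
        ports[name] = int(address.rpartition(":")[2])
    return ports


def host_mapping_is_consistent(mapping: str, listeners: str, advertised: str) -> bool:
    """Check the published port against the PLAINTEXT_HOST listener settings."""
    host_port, container_port = parse_port_mapping(mapping)
    return (
        listener_ports(listeners)["PLAINTEXT_HOST"] == container_port
        and listener_ports(advertised)["PLAINTEXT_HOST"] == host_port
    )


print(host_mapping_is_consistent(
    "29092:9092",
    "PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092",
    "PLAINTEXT://kafka-1:19092,PLAINTEXT_HOST://localhost:29092",
))
```

If the two sides disagree, a client on the host can usually still open the first connection but then fails when it follows the advertised address.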

The other Environment Variables

  • KAFKA_LISTENERS:
KAFKA_LISTENERS: 'PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092'

This config defines the listeners the node binds to for network connections. In this case, there are three: one for internal communication between brokers (PLAINTEXT, port 19092), one for controller communication (CONTROLLER, port 9093), and one exposed to the host (PLAINTEXT_HOST, port 9092).

  • KAFKA_INTER_BROKER_LISTENER_NAME:
KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'

This config sets the listener name for inter-broker communication. Here, it’s set to PLAINTEXT, indicating that brokers communicate with each other using the PLAINTEXT listener.

  • KAFKA_TRANSACTION_STATE_LOG_MIN_ISR:
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1

This is the minimum number of in-sync replicas (ISRs) that must be available for the transaction state log to be writable. Set to 1 in this configuration.

  • KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR:
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

This config sets the replication factor for the transaction state log. Like the offsets topic, it’s set to 1 for this simple setup.

  • KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

This is the replication factor for the offsets topic. Since this is a minimal setup, it's set to 1.

  • KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

This sets how long the group coordinator waits for more consumers to join a new group before performing the first rebalance. It's set to 0 here, meaning the first rebalance starts immediately without delay.

  • KAFKA_LISTENER_SECURITY_PROTOCOL_MAP:
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'

This config maps listener names to security protocols. This setup indicates that all three listeners (CONTROLLER, PLAINTEXT and PLAINTEXT_HOST) use the PLAINTEXT protocol.
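Every listener name declared in KAFKA_LISTENERS needs an entry in the security protocol map. As a quick sanity check before deploying, something like the following Python sketch can compare the two strings. The helper names are hypothetical and the parsing assumes the simple comma-separated format used in this file.

```python
def listener_names(listeners: str) -> set:
    """Extract the listener names from 'NAME://host:port,NAME://host:port'."""
    return {entry.partition("://")[0] for entry in listeners.split(",")}


def protocol_map(value: str) -> dict:
    """Turn 'NAME:PROTOCOL,NAME:PROTOCOL' into a dictionary."""
    return dict(entry.split(":", 1) for entry in value.split(","))


def unmapped_listeners(listeners: str, security_map: str) -> set:
    """Return listener names that have no entry in the security protocol map."""
    return listener_names(listeners) - set(protocol_map(security_map))


missing = unmapped_listeners(
    "PLAINTEXT://:19092,CONTROLLER://:9093,PLAINTEXT_HOST://:9092",
    "CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT",
)
print(missing or "every listener has a security protocol")
```

For the compose file above nothing is missing, which is what we want.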

Let’s Deploy the Cluster

Let’s deploy our cluster by running this command:

docker-compose up -d

If we go check in Portainer:

image by Author

We will find the 3 nodes running.
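If you don't use Portainer, a quick way to see whether the three published ports answer from the host is a plain TCP connection test. This is only a sketch with the Python standard library; port_is_open is a hypothetical helper, and an open port only shows that the container is listening, not that the broker is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host ports published by the compose file for kafka-1, kafka-2 and kafka-3.
for port in (29092, 39092, 49092):
    state = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"localhost:{port} is {state}")
```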

Let’s check our Cluster Health

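Choose one broker, open a shell inside its container, go to /opt/kafka/bin (where the official image keeps the Kafka scripts), and run this command:

./kafka-topics.sh --create --topic "NewTopic" --bootstrap-server kafka-1:19092

Note the port: inside the Docker network we use the internal PLAINTEXT listener on 19092. The listener on 9092 advertises localhost:29092, which is only reachable from the host machine.

If all is good, this will create a new topic named “NewTopic”.

Let’s check that the topic was created with this command:

./kafka-topics.sh --list --bootstrap-server kafka-1:19092
Inside node kafka-1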

And voilà, our topic was created successfully!

Cleaning up

Don’t forget to remove the running containers:

docker-compose down

If you have any questions, feedback, or would like to share your experiences, please feel free to reach out in the comments section.

Clap my article 50 times 👏, that will really help me out and boost this article to others ✍🏻❤️.

Thank you 🫶!
