Installing Apache Kafka on Rocky Linux or CentOS
Apache Kafka is an open-source distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data streaming. It acts as a publish-subscribe system, allowing applications to publish and consume data streams in a scalable manner. Kafka is widely used for ingesting, storing, and processing large volumes of data, making it a popular choice for organizations dealing with large-scale data streams.
In this guide, we will learn how to install Apache Kafka on Rocky Linux / CentOS.
Prerequisites
To set up Apache Kafka, we need to install a Java JDK on our operating system. In this guide, we will use OpenJDK 17. On Rocky Linux, we can install it using the following command:
sudo dnf install java-17-openjdk java-17-openjdk-devel
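Before moving on, it is worth a quick sanity check that the JDK is actually on the PATH (a small sketch; the exact version string varies by build):

```shell
# Sanity check: is a Java runtime available?
if command -v java >/dev/null 2>&1; then
  java -version   # should report an openjdk 17.x build
else
  echo "java not found - install java-17-openjdk first" >&2
fi
```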
Download Kafka
To install Apache Kafka on Rocky Linux or CentOS, first we need to download the binary release from the official Apache Kafka downloads page. In this guide we will use Kafka version 3.5.1 built for Scala 2.13.
Download the binary file using the wget command:
wget https://downloads.apache.org/kafka/3.5.1/kafka_2.13-3.5.1.tgz
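Optionally, verify the archive before extracting it. Apache publishes a SHA512 checksum alongside each release; you can compute the local hash (sha512sum ships with coreutils) and compare it by eye against the published one:

```shell
# Print the local SHA512 hash of the downloaded archive;
# compare it against kafka_2.13-3.5.1.tgz.sha512 on downloads.apache.org
sha512sum kafka_2.13-3.5.1.tgz
```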
Extract the downloaded archive
tar -xvzf kafka_2.13-3.5.1.tgz
Move the extracted folder to the /usr/local directory
sudo mv kafka_2.13-3.5.1 /usr/local/kafka
Kafka as Background Service
Kafka relies on Zookeeper to store the cluster metadata. To keep Kafka running in the background, create a systemd service for both Kafka and Zookeeper.
Zookeeper service
Create a zookeeper service by creating a “zookeeper.service” file in /etc/systemd/system
sudo nano /etc/systemd/system/zookeeper.service
Below is the content of the zookeeper.service file
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Kafka Service
Create a kafka service by creating a “kafka.service” file in /etc/systemd/system
sudo nano /etc/systemd/system/kafka.service
Below is the content of the kafka.service file
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-17-openjdk"
ExecStart=/usr/bin/bash /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Running the service via systemctl
After creating both service files, we need to reload the systemd daemon first with the following command:
sudo systemctl daemon-reload
Start both the zookeeper and kafka services:
sudo systemctl start zookeeper
sudo systemctl start kafka
Use the enable command to ensure both services start automatically after a system reboot:
sudo systemctl enable zookeeper
sudo systemctl enable kafka
We can check whether the services are running with the “systemctl status” command:
sudo systemctl status zookeeper kafka
If both services are active, the status output should show “active (running)” for each unit.
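Another quick way to confirm both daemons are actually up is to check that their ports are listening (ss is part of iproute2; the ports are the defaults used in this guide):

```shell
# Zookeeper listens on 2181 and Kafka on 9092 by default
ss -tln | grep -E ':(2181|9092)\b'
```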
Configuration
Kafka is currently running with its default configuration. To configure both the kafka and zookeeper settings:
Navigate to the config subdirectory inside the kafka installation location. In this guide we used “/usr/local/kafka”
cd /usr/local/kafka/config
Zookeeper configuration
To modify the zookeeper configuration, edit the “zookeeper.properties” file with a text editor.
Here are the contents of the zookeeper.properties file. Modify the configuration to suit your needs.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
admin.enableServer=false
# admin.serverPort=8080
We can change the directory where zookeeper stores its data by modifying the "dataDir" value, and the port zookeeper listens on by modifying the "clientPort" value.
For example, with the settings above zookeeper will use port 2181 for client connections and store its data under the /tmp directory.
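Since /tmp is typically cleared on reboot, for anything beyond a quick test you may want to move the data directory somewhere persistent. A sketch, assuming /var/lib/zookeeper as the (arbitrary) target path:

```shell
# Create a persistent data directory and point Zookeeper at it
sudo mkdir -p /var/lib/zookeeper
sudo sed -i 's|^dataDir=.*|dataDir=/var/lib/zookeeper|' /usr/local/kafka/config/zookeeper.properties
```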
Kafka configuration
To modify the kafka configuration, edit the “server.properties” file with a text editor.
Here are the notable settings we will modify:
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://localhost:9092
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
############################# Log Retention Policy #############################
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
############################# Zookeeper #############################
zookeeper.connect=localhost:2181
broker.id is the unique id for each broker.
listeners is the address the socket server listens on. If you want your kafka instance to be accessible from other machines, change localhost:port to the current server’s address and the port you want to use. For example, if your local IP is 192.168.69.69 and you want to use the default port (9092):
listeners=PLAINTEXT://192.168.69.69:9092
For this tutorial, we will use localhost for simplicity.
log.dirs is the directory where kafka stores its log files. For example, we are currently storing them under the /tmp directory.
num.partitions is the default number of partitions created per topic. For now we will leave it at 1.
log.retention.hours is the number of hours before the logs are automatically deleted.
zookeeper.connect is the zookeeper client address.
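As with zookeeper, the default /tmp/kafka-logs location does not survive a reboot. A sketch of switching to a persistent directory (the path is an example, not mandated by Kafka):

```shell
# Create a persistent log directory and point Kafka at it
sudo mkdir -p /var/lib/kafka-logs
sudo sed -i 's|^log\.dirs=.*|log.dirs=/var/lib/kafka-logs|' /usr/local/kafka/config/server.properties
```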
After modifying the configuration, restart both services with the “systemctl restart” command:
sudo systemctl restart zookeeper
sudo systemctl restart kafka
Firewall Configuration
If you want your kafka and zookeeper instances to be accessible from remote machines, we need to whitelist the ports they use. On our server we add the ports to the public zone in firewalld, using the --permanent flag to keep our rules persistent:
sudo firewall-cmd --permanent --zone=public --add-port=9092/tcp
sudo firewall-cmd --permanent --zone=public --add-port=2181/tcp
sudo firewall-cmd --reload
Topics
In Apache Kafka, a topic is a logical channel used to organize and categorize data streams. Each topic is identified by a unique name, and when messages are published to a topic, they are appended to that topic’s log. Topics in Kafka are typically used to represent different types of data or events.
To create a topic we can use the shell scripts provided in the bin directory:
cd /usr/local/kafka/bin
Use the kafka-topics.sh script
./kafka-topics.sh --create --bootstrap-server [server]:[port] --replication-factor 1 --partitions 1 --topic [topic-name]
Change [server] and [port] according to the listeners value from the previous configuration, and change [topic-name] to your desired topic name. For example, our kafka instance is at localhost:9092 and we want to create the topic “topic1”:
./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic1
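To confirm the topic was actually created, the same script can list all topics on the broker and describe a single one (these flags are part of the standard kafka-topics.sh tooling):

```shell
# List all topics on the broker, then show partition details for topic1
./kafka-topics.sh --list --bootstrap-server localhost:9092
./kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic topic1
```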
After creating a topic we can test it directly using the provided scripts. First we will publish a message to our topic:
./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic1
With this we can type any text and publish it. But first, let’s open another terminal session on the same machine and run the consumer script:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning
The text we published via the producer script should be received on the consumer window.
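The two interactive windows can also be collapsed into a non-interactive smoke test, piping a message into the producer and reading exactly one message back (--max-messages makes the consumer exit instead of waiting forever):

```shell
# Publish one message non-interactively...
echo "hello kafka" | ./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic1
# ...then consume a single message from the beginning and exit
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning --max-messages 1
```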
At the end of this tutorial, we have learned how to set up and configure Apache Kafka and Zookeeper on our own Rocky Linux / CentOS machine.