Installing Apache Kafka on Rocky Linux or CentOS
Apache Kafka is an open-source distributed event streaming platform designed for high-throughput, fault-tolerant, real-time data streaming. It acts as a publish-subscribe system, allowing applications to publish and consume data streams in a scalable manner. Kafka is widely used for ingesting, storing, and processing large volumes of data, making it a popular choice for organizations dealing with large-scale data streams.
In this guide, we will learn how to install Apache Kafka on Rocky Linux / CentOS.
Prerequisites
To set up Apache Kafka, we need to install a Java JDK on our operating system. In this guide, we will use OpenJDK 17. On Rocky Linux, we can install it using the following command:
sudo dnf install java-17-openjdk java-17-openjdk-devel
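Before moving on, it is worth a quick sanity check that the JDK is actually on the PATH (a small sketch; the exact version string varies by build):

```shell
# Sanity check: is a Java runtime available?
if command -v java >/dev/null 2>&1; then
  java -version   # should report an openjdk 17.x build
else
  echo "java not found - install java-17-openjdk first" >&2
fi
```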
Download Kafka
To install Apache Kafka on Rocky Linux or CentOS, first we need to download the binary release from the official Apache Kafka downloads page. In this guide we will use Kafka version 3.5.1 built for Scala 2.13.
Download the binary file using the wget command:
wget https://downloads.apache.org/kafka/3.5.1/kafka_2.13-3.5.1.tgz
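Optionally, verify the archive before extracting it. Apache publishes a SHA512 checksum alongside each release; you can compute the local hash (sha512sum ships with coreutils) and compare it by eye against the published one:

```shell
# Print the local SHA512 hash of the downloaded archive;
# compare it against kafka_2.13-3.5.1.tgz.sha512 on downloads.apache.org
sha512sum kafka_2.13-3.5.1.tgz
```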
Extract the downloaded archive
tar -xvzf kafka_2.13-3.5.1.tgz
Move the extracted folder to the /usr/local directory
sudo mv kafka_2.13-3.5.1 /usr/local/kafka
Kafka as Background Service
Kafka relies on Zookeeper to store the cluster metadata. To keep Kafka running in the background, create a systemd service for both Kafka and Zookeeper.
Zookeeper service
Create a zookeeper service by creating a “zookeeper.service” file in /etc/systemd/system
sudo nano /etc/systemd/system/zookeeper.service
Below is the content of the zookeeper.service file
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Kafka Service
Create a kafka service by creating a “kafka.service” file in /etc/systemd/system
sudo nano /etc/systemd/system/kafka.service
Below is the content of the kafka.service file
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-17-openjdk"
ExecStart=/usr/bin/bash /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/bin/bash /usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Running the service via systemctl
After creating both service files, we need to reload the systemd daemon first with the following command:
sudo systemctl daemon-reload
Start both the zookeeper and kafka services:
sudo systemctl start zookeeper
sudo systemctl start kafka
Use the enable command to ensure both services start automatically after a system reboot:
sudo systemctl enable zookeeper
sudo systemctl enable kafka
We can check whether the services are running with the “systemctl status” command:
sudo systemctl status zookeeper kafka
If both services are active, the status output should show “active (running)” for each unit.
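Another quick way to confirm both daemons are actually up is to check that their ports are listening (ss is part of iproute2; the ports are the defaults used in this guide):

```shell
# Zookeeper listens on 2181 and Kafka on 9092 by default
ss -tln | grep -E ':(2181|9092)\b'
```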
Configuration
Kafka is currently running with its default configuration. To configure both the kafka and zookeeper settings:
Navigate to the config subdirectory inside the kafka installation location. In this guide we used “/usr/local/kafka”
cd /usr/local/kafka/config
Zookeeper configuration
To modify the zookeeper configuration, edit the “zookeeper.properties” file with a text editor.
Here are the contents of the zookeeper.properties file. Modify the configuration to suit your needs.
# the directory where the snapshot is stored.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections since this is a non-production config
maxClientCnxns=0
# Disable the adminserver by default to avoid port conflicts.
# Set the port to something non-conflicting if choosing to enable this
admin.enableServer=false
# admin.serverPort=8080
We can change the directory where zookeeper stores its data by modifying the "dataDir" value, and the port zookeeper listens on by modifying the "clientPort" value.
For example, with the settings above zookeeper will use port 2181 for client connections and store its data under the /tmp directory.
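Since /tmp is typically cleared on reboot, for anything beyond a quick test you may want to move the data directory somewhere persistent. A sketch, assuming /var/lib/zookeeper as the (arbitrary) target path:

```shell
# Create a persistent data directory and point Zookeeper at it
sudo mkdir -p /var/lib/zookeeper
sudo sed -i 's|^dataDir=.*|dataDir=/var/lib/zookeeper|' /usr/local/kafka/config/zookeeper.properties
```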
Kafka configuration
To modify the kafka configuration, edit the “server.properties” file with a text editor.
Here are the notable settings we will modify:
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
############################# Socket Server Settings #############################
# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://localhost:9092
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
############################# Log Retention Policy #############################
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
############################# Zookeeper #############################
zookeeper.connect=localhost:2181
broker.id is the unique id for each broker.
listeners is the address the socket server listens on. If you want your kafka instance to be accessible from other machines, change localhost:port to the current server’s address and the port you want to use. For example, if your local IP is 192.168.69.69 and you want to use the default port (9092):
listeners=PLAINTEXT://192.168.69.69:9092
For this tutorial, we will use localhost for simplicity.
log.dirs is the directory where kafka stores its log files. For example, we are currently storing them under the /tmp directory.
num.partitions is the default number of partitions created per topic. For now we will leave it at 1.
log.retention.hours is the number of hours before the logs are automatically deleted.
zookeeper.connect is the zookeeper client address.
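As with zookeeper, the default /tmp/kafka-logs location does not survive a reboot. A sketch of switching to a persistent directory (the path is an example, not mandated by Kafka):

```shell
# Create a persistent log directory and point Kafka at it
sudo mkdir -p /var/lib/kafka-logs
sudo sed -i 's|^log\.dirs=.*|log.dirs=/var/lib/kafka-logs|' /usr/local/kafka/config/server.properties
```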
After modifying the configuration, restart both services with the “systemctl restart” command:
sudo systemctl restart zookeeper
sudo systemctl restart kafka
Firewall Configuration
If you want your kafka and zookeeper instances to be accessible from remote machines, we need to whitelist the ports they use. On our server we add the ports to the public zone in firewalld, using the --permanent flag to keep our rules persistent:
sudo firewall-cmd --permanent --zone=public --add-port=9092/tcp
sudo firewall-cmd --permanent --zone=public --add-port=2181/tcp
sudo firewall-cmd --reload
Topics
In Apache Kafka, a topic is a logical channel used to organize and categorize data streams. Each topic is identified by a unique name, and when messages are published to a topic, they are appended to that topic’s log. Topics in Kafka are typically used to represent different types of data or events.
To create a topic we can use the shell scripts provided in the bin directory:
cd /usr/local/kafka/bin
Use the kafka-topics.sh script
./kafka-topics.sh --create --bootstrap-server [server]:[port] --replication-factor 1 --partitions 1 --topic [topic-name]
Change [server] and [port] according to the listeners value from the previous configuration, and change [topic-name] to your desired topic name. For example, our kafka instance is at localhost:9092 and we want to create the topic “topic1”:
./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic topic1
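To confirm the topic was actually created, the same script can list all topics on the broker and describe a single one (these flags are part of the standard kafka-topics.sh tooling):

```shell
# List all topics on the broker, then show partition details for topic1
./kafka-topics.sh --list --bootstrap-server localhost:9092
./kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic topic1
```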
After creating a topic we can test it directly using the provided scripts. First we will publish a message to our topic:
./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic1
With this we can type any text and publish it. But first, let’s open another terminal session on the same machine and run the consumer script:
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning
The text we published via the producer script should be received on the consumer window.
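The two interactive windows can also be collapsed into a non-interactive smoke test, piping a message into the producer and reading exactly one message back (--max-messages makes the consumer exit instead of waiting forever):

```shell
# Publish one message non-interactively...
echo "hello kafka" | ./kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic1
# ...then consume a single message from the beginning and exit
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic1 --from-beginning --max-messages 1
```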
At the end of this tutorial, we have learned how to set up and configure Apache Kafka and Zookeeper on our own Rocky Linux / CentOS machine.