Running Kafka on Production

Pradyumna Khawas
WorkIndia.in
Oct 6, 2021

Apache Kafka is an open-source distributed event streaming platform used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. In this article, we will discuss the tips and tricks to make sure your Kafka setup is unbreakable.

To use Kafka in production, one needs to ensure that the setup is resilient, fault-tolerant and highly available. The setup should be easily scalable as well.

Zookeeper Ensemble

We will start with setting up the Zookeeper ensemble on multiple nodes. We must ensure that the number of Zookeeper nodes is 2n + 1, where n is any number greater than 0. Having an odd number of servers allows Zookeeper to perform majority elections for leadership. Such an ensemble can tolerate up to n faulty or dead nodes; losing more than n nodes brings the ensemble down. For example, a 3-node ensemble tolerates one failure, while a 5-node ensemble tolerates two.

Zookeepers do not require big machines. An instance with 1 core, 2GB RAM and a disk with enough storage for logs will suffice. The Zookeeper nodes are pretty stable and do not go down unless hardware limits are reached. Nevertheless, it is advisable to use at least 3 nodes in the ensemble.

Kafka version 2.8.0 introduced early access to Zookeeper-less Kafka as part of KIP-500 using KRaft mode. The implementation is only partially complete and thus is not to be used in production environments.

We will start with a basic Ubuntu VM for the nodes. Follow the steps given below to set up a Zookeeper node.

1. Install Java 8.

sudo apt update && sudo apt install -y openjdk-8-jdk-headless

2. Download and extract the Kafka release.

wget https://downloads.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
tar -xzf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0

3. Select a data directory (e.g. /path/to/the/dir/) where Zookeeper will store its data, and create it if it does not already exist.

4. Create a file named myid in the data directory that contains the id of this node. The id can be 0, 1, 2, 3, and so on, and the file should contain nothing but that id:

<id>
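
As a concrete sketch (assuming the data directory chosen in step 3 and an id of 0 for this node), the directory and file can be created with:

mkdir -p /path/to/the/dir
echo 0 > /path/to/the/dir/myid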

5. Edit the Zookeeper config file config/zookeeper.properties

dataDir=/path/to/the/dir/
clientPort=2181
tickTime=2000
initLimit=5
syncLimit=2
maxClientCnxns=60
admin.enableServer=false
server.0=1.2.3.4:2888:3888
server.1=1.2.3.5:2888:3888
server.2=1.2.3.6:2888:3888

Here, in the last section of the config, an entry for each node in the ensemble needs to be added in the following manner:

server.<id>=<ip-addr>:2888:3888

The tickTime, initLimit, syncLimit and maxClientCnxns can be set according to the requirements.

6. Start Zookeeper.

export KAFKA_HEAP_OPTS="-Xms1536M -Xmx1536M"
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

A heap memory of 1–2GB is sufficient.
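
To sanity-check that the node is up and serving requests, you can query its status with the srvr four-letter-word command (assuming netcat is installed; srvr is whitelisted by default):

echo srvr | nc localhost 2181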

Kafka Cluster

A Kafka cluster consists of one or more Kafka brokers. A Kafka broker receives messages from producers and stores them on disk, keyed by a unique offset. The broker allows consumers to fetch messages by topic, partition and offset.

In production, the cluster must have at least three nodes with a topic replication factor of two or more. Such a setup ensures availability when some brokers are down and makes maintenance (adding, upgrading or removing brokers) hassle-free. Kafka brokers store data in logs on disk, so having a fast disk helps. Always mount a separate disk for storing the Kafka broker logs, and set up the cluster on a low-latency network.

Kafka brokers require medium-sized instances. The size of the broker depends directly on the size of the data and the rate of data flow. Adding new brokers to the cluster spreads the partitions across more machines and reduces the load on each broker. It is good to start with at least 4 cores and 16 GB of RAM.

We will start with a basic Ubuntu VM for the nodes. Follow the steps given below to set up a Kafka broker node.

1. Install Java 8.

sudo apt update && sudo apt install -y openjdk-8-jdk-headless

2. Download and extract the Kafka release.

wget https://downloads.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
tar -xzf kafka_2.13-3.0.0.tgz
cd kafka_2.13-3.0.0

3. Edit the Kafka broker config file config/server.properties

Set broker.id as 0, 1, 2, 3, and so on. Specify zookeeper.connect as hostname1:port1,hostname2:port2,hostname3:port3 and advertised.listeners as the private IP of the instance. The log directories can be set using log.dirs.

broker.id=0
advertised.listeners=PLAINTEXT://1.2.3.7:9092
zookeeper.connect=1.2.3.4:2181,1.2.3.5:2181,1.2.3.6:2181
listeners=PLAINTEXT://:9092
log.dirs=/data/logs/kafka

Set the number of threads for the various processes and tune them according to the instance used. These settings helped us improve the performance of our brokers drastically.

num.network.threads=150
num.io.threads=80
background.threads=80
num.recovery.threads.per.data.dir=3
num.replica.fetchers=3

Set the size of the socket buffers, maximum request size and the fetch size.

socket.send.buffer.bytes=10485760
socket.receive.buffer.bytes=10485760
socket.request.max.bytes=41943040
message.max.bytes=10485760
fetch.max.bytes=41943040
replica.fetch.max.bytes=10485760
replica.fetch.response.max.bytes=41943040

Set the default replication factor and number of partitions based on the requirement. The replication factor of the offsets and transaction topics needs to be set to 2 or 3.

default.replication.factor=3
num.partitions=1

offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
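
Topics can also be created explicitly, overriding these defaults per topic, using the CLI bundled with the Kafka release (a sketch, assuming a hypothetical topic named events and the broker address from the config above):

bin/kafka-topics.sh --create --topic events \
  --partitions 6 --replication-factor 3 \
  --bootstrap-server 1.2.3.7:9092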

Set Kafka broker log configuration such as retention hours and retention check interval.

log.retention.hours=48
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
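
These are cluster-wide defaults; retention can also be overridden for an individual topic with kafka-configs.sh (a sketch, again using the hypothetical events topic and a one-day retention):

bin/kafka-configs.sh --bootstrap-server 1.2.3.7:9092 \
  --entity-type topics --entity-name events \
  --alter --add-config retention.ms=86400000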

4. Start the Kafka broker.

export KAFKA_HEAP_OPTS="-Xms8G -Xmx8G"
bin/kafka-server-start.sh -daemon config/server.properties

A heap memory of 6–8GB is more than sufficient.
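
Once all the brokers are running, you can confirm that they have registered with the ensemble by listing the broker ids in Zookeeper (a quick check using the zookeeper-shell utility shipped with the release and the ensemble address from earlier):

bin/zookeeper-shell.sh 1.2.3.4:2181 ls /brokers/ids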

At WorkIndia, we have a three-node Zookeeper ensemble backing a three-node Kafka broker cluster on AWS. Each Zookeeper node is a t3.small EC2 instance and the Kafka brokers are i3.xlarge EC2 instances. This setup handles ~1.8 million messages per hour. In the beginning, the cluster gave us issues here and there, but after tweaking the configs (as described above), we found the balance to run a critical Kafka cluster carefree. Ironic as it may seem, we haven't had any downtime on our cluster for a year now.
