Journey of Apache Kafka & Zookeeper Administrator ( Part 4 )

Davinder Pal · Published in Analytics Vidhya · 4 min read · Aug 23, 2020

June 2019 ( continued )

In the previous article, I explained the different aspects of Apache Kafka. In this article, I will cover optimizing those aspects of Apache Kafka.

After installing Apache Kafka and setting up monitoring, I was quite confident that it would work and that I would be able to handle good traffic with it.

Around that time, I read LinkedIn's blog post "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)", so I decided I would break that number with my own settings. Honestly, I did break that number: I was able to reach 2.3 million messages per second of throughput on 3 cheap Kafka nodes.

Here is the story of 1–2 weeks of painful and rigorous testing and optimization of Apache Kafka.

I know I should share the raw Kafka test results here, but that isn't possible right now since I left the company, so I will share my experience instead.

Basic Things Required for Testing
1. Cheap Kafka Nodes with Kafka 2.1.1
2. Detailed Monitoring on Kafka Nodes
3. Separate Test Machine

Cheap Kafka Nodes
I had 3 Kafka nodes (installed and configured with my ansible-playbook). Each node had:
* 6 Vcore
* 10 Gbps Network
* 12–24 GB RAM
* 100 GB Disk ( Backed by SAN, over 10 Gbps Network )

JVM Settings ( OpenJDK 1.8 )
6 GB Heap Size
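For reference, the broker heap is normally set through the standard environment variable read by kafka-server-start.sh; pinning -Xms to -Xmx below is my assumption (to avoid heap-resize pauses), not necessarily the exact flags used:

```shell
# Set a fixed 6 GB heap for the broker (picked up by kafka-server-start.sh).
# Matching -Xms to -Xmx is an assumption here, to avoid resize pauses.
export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
echo "$KAFKA_HEAP_OPTS"
```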

Different Topic Configurations

kafkaperf1R1: Partition 1 with Replica 1
kafkaperf1R3: Partition 1 with Replica 3
kafkaperf3R1: Partition 3 with Replica 1
kafkaperf3R3: Partition 3 with Replica 3
kafkaperf6R1: Partition 6 with Replica 1
kafkaperf6R3: Partition 6 with Replica 3
kafkaperf9R1: Partition 9 with Replica 1
kafkaperf9R3: Partition 9 with Replica 3
kafkaperf12R1: Partition 12 with Replica 1
kafkaperf12R3: Partition 12 with Replica 3
kafkaperf15R1: Partition 15 with Replica 1
kafkaperf15R3: Partition 15 with Replica 3
kafkaperf18R1: Partition 18 with Replica 1
kafkaperf18R3: Partition 18 with Replica 3
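Since the fourteen topics follow a simple naming pattern (kafkaperf&lt;P&gt;R&lt;R&gt;), their creation can be scripted. A minimal sketch: it prints the kafka-topics.sh commands instead of running them, so you can review before piping to a shell, and the /opt/kafka path is an assumed install location:

```shell
#!/bin/sh
# Print one kafka-topics.sh create command per test topic (kafkaperf<P>R<R>).
# KAFKA_HOME=/opt/kafka is an assumed install path; adjust to your layout.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"

gen_create_cmds() {
  for p in 1 3 6 9 12 15 18; do
    for r in 1 3; do
      echo "${KAFKA_HOME}/bin/kafka-topics.sh --create --topic kafkaperf${p}R${r} --partitions ${p} --replication-factor ${r} --bootstrap-server localhost:9092"
    done
  done
}

gen_create_cmds
# Review the output, then execute it with: gen_create_cmds | sh
```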

How to Perform Tests
Create a topic (the --partitions / --replication-factor flags match the topic's name):
bin/kafka-topics.sh --create --topic kafkaperf1R1 --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092
Now, the 1st round of tests:

bin/kafka-producer-perf-test.sh --topic kafkaperf1R1 --num-records 100000 --record-size 100 --throughput 100000 --producer-props acks=1

Check the results and the Kafka monitoring system: is Kafka choking anywhere? Are there spikes or anything unusual in the monitoring charts?
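Rather than typing the perf-test command fourteen times, the whole round-1 sweep can be generated the same way as the topic creation. Again a sketch that only prints the commands (the record count, size, and throughput are the round-1 values above):

```shell
#!/bin/sh
# Print a producer perf-test command for every test topic.
# Round-1 parameters: 100k records, 100-byte messages, 100k msg/s target, acks=1.
gen_perf_cmds() {
  for p in 1 3 6 9 12 15 18; do
    for r in 1 3; do
      echo "bin/kafka-producer-perf-test.sh --topic kafkaperf${p}R${r} --num-records 100000 --record-size 100 --throughput 100000 --producer-props acks=1"
    done
  done
}

gen_perf_cmds
```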

Very important: document all tests in Microsoft OneNote or a similar tool so you can later compare results or show/export them to PDF if required.

OS & Network Optimizations

- name: OS Tuning
  sysctl:
    name: "{{ item.key }}"
    value: "{{ item.value }}"
    state: present
  loop:
    - { "key": "vm.max_map_count", "value": "{{ kafkaVmMaxMapCount }}" }

- name: Networking Tuning
  sysctl:
    name: "{{ item.key }}"
    value: "{{ item.value }}"
    sysctl_set: yes
    state: present
    reload: true
  loop:
    - { "key": "net.ipv4.tcp_max_syn_backlog", "value": "40000" }
    - { "key": "net.core.somaxconn", "value": "40000" }
    - { "key": "net.ipv4.tcp_sack", "value": "1" }
    - { "key": "net.ipv4.tcp_window_scaling", "value": "1" }
    - { "key": "net.ipv4.tcp_fin_timeout", "value": "15" }
    - { "key": "net.ipv4.tcp_keepalive_intvl", "value": "60" }
    - { "key": "net.ipv4.tcp_keepalive_probes", "value": "5" }
    - { "key": "net.ipv4.tcp_keepalive_time", "value": "180" }
    - { "key": "net.ipv4.tcp_tw_reuse", "value": "1" }
    - { "key": "net.ipv4.tcp_moderate_rcvbuf", "value": "1" }
    - { "key": "net.core.rmem_default", "value": "8388608" }
    - { "key": "net.core.wmem_default", "value": "8388608" }
    - { "key": "net.core.rmem_max", "value": "134217728" }
    - { "key": "net.core.wmem_max", "value": "134217728" }
    - { "key": "net.ipv4.tcp_mem", "value": "134217728 134217728 134217728" }
    - { "key": "net.ipv4.tcp_rmem", "value": "4096 277750 134217728" }
    - { "key": "net.ipv4.tcp_wmem", "value": "4096 277750 134217728" }
    - { "key": "net.core.netdev_max_backlog", "value": "300000" }
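After the playbook runs, it is worth spot-checking that the kernel actually holds these values. A small sketch that reads /proc/sys directly and compares against the expected value (the helper names check_val and check_sysctl are mine, not from the playbook):

```shell
#!/bin/sh
# Compare an actual value against an expected one; prints OK or MISMATCH.
check_val() {
  if [ "$1" = "$2" ]; then echo "OK"; else echo "MISMATCH"; fi
}

# Read a single-valued sysctl via /proc (dots become slashes), then compare.
check_sysctl() {
  key_path=$(echo "$1" | tr '.' '/')
  actual=$(cat "/proc/sys/${key_path}" 2>/dev/null)
  echo "$1: $(check_val "$actual" "$2")"
}

check_sysctl net.core.somaxconn 40000
check_sysctl net.ipv4.tcp_fin_timeout 15
```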

Kafka Parameters Tuning

### Production Optimization Parameters
### if nothing is set then it will use default values.
kafkaDefaultReplicationFactor: 3
kafkaMinInsyncReplicas: 2
kafkaBackgroundThread: 10
kafkaMessagesMaxBytes: 1000012 # 1MB approx
kafkaReplicaFetchMaxBytes: 2000000 # this should be higher than kafkaMessagesMaxBytes
kafkaQueuedMaxRequests: 500
kafkaNumReplicaFetchers: 1
kafkaNumNetworkThreads: 6
kafkaNumIoThreads: 8
kafkaSocketSendBufferBytes: 102400
kafkaSocketReceiveBufferBytes: 102400
kafkaSocketRequestMaxBytes: 104857600
kafkaNumPartitions: 1
kafkaNumRecoveryThreadsPerDataDir: 1
kafkaOffsetsTopicReplicationFactor: 3
kafkaTransactionStateLogReplicationFactor: 3
kafkaTransactionStateLogMinIsr: 3
kafkaLogFlushIntervalMessages: 10000
kafkaLogFlushIntervalMs: 1000
kafkaLogRetentionHours: 168
kafkaLogSegmentBytes: 2073741824 # ~2 GB; need to ask a Kafka expert whether the 1 GB default or 2 GB+ is better here
kafkaLogRetentionCheckIntervalMs: 300000
kafkaGroupInitRebalanceDelayMs: 3

Now the 2nd round of tests. I know rerunning all the tests is a painful process, but someone has to do it; otherwise no one will know what happens when you put stress on Kafka and how it behaves with the new settings. Again, document all tests in OneNote or a similar tool.

Now the 3rd round of tests. First, check the previous runs and you will see that once you reach partition 6 and higher with replica 3, you get very good latency at 1 million messages per second of throughput. Let's push throughput even further, to 2 million, and this time run tests only on the partition-6-and-higher configurations. Again, document all tests in OneNote or a similar tool.
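For these runs, the only change from round 1 is the throughput target (and enough records to sustain it; the 20 million record count below is my choice for a longer run, not a value from the original tests):

```shell
#!/bin/sh
# Round 3: same 100-byte records, but a 2 million msg/s target.
# Printed rather than executed; --num-records 20000000 is an assumed run length.
TOPIC="${TOPIC:-kafkaperf6R3}"
CMD="bin/kafka-producer-perf-test.sh --topic ${TOPIC} --num-records 20000000 --record-size 100 --throughput 2000000 --producer-props acks=1"
echo "$CMD"
```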

Now the 4th / 5th / 6th rounds of tests. First, check which were the best cases from the 3rd round and drop the underperforming topics if required. Let's raise the throughput to 2.1 / 2.2 / 2.3 million and keep an eye on the monitoring system for where the system is choking, plus any unusual spikes. Again, document all tests in OneNote or a similar tool.

Here is the last scenario. Instead of increasing throughput, try increasing the record size to 512 KB / 1 MB and see what happens to the network: can you reach your network limits? These will be our 7th and 8th rounds of tests. Again, document all tests in OneNote or a similar tool.
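A quick back-of-envelope check shows why large records hit the network first: at 1 MiB per record, a 10 Gbps NIC carries only about 1,200 records per second, so record throughput collapses long before CPU becomes a factor. A sketch of the arithmetic (ignoring protocol and replication overhead):

```shell
#!/bin/sh
# How many 1 MiB records/sec can a 10 Gbps NIC carry, ignoring overhead?
nic_bytes_per_sec=$((10 * 1000 * 1000 * 1000 / 8))   # 10 Gbps -> 1.25 GB/s
record_size=$((1024 * 1024))                          # 1 MiB per record
max_records_per_sec=$((nic_bytes_per_sec / record_size))
echo "$max_records_per_sec"
```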

Interestingly, CPU, memory, and JVM metrics like GC time and heap used never changed, even as I was reaching the 2.3 million target. Beyond 2.3 million, average and 99th-percentile latency climbed into seconds, which was a little too much for my requirements.

The journey will continue with the next topic ( Yahoo Kafka Manager aka CMAK )!
