How Is Kafka So Fast?

Faruk BOZAN
Published in emlakjet
3 min read, Jul 29, 2024

Kafka is one of the most popular solutions for message brokering, real-time data streaming, and similar systems. This article covers why Kafka is so fast and how it achieves high throughput. Please note that this article is not about “What is Kafka?” or “Kafka Architecture”. Let’s dive deep.

We will look at Kafka’s speed and high throughput in five sections.

1 — Partitioning

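A partition is a subcomponent of a Kafka topic. You can create a topic with one or more partitions. The producer sends each message to a partition chosen from the message key and Kafka’s partitioning algorithm, so messages with the same key land in the same partition. Consumers and consumer groups then read the partitions in parallel. With a consumer group in particular, each partition is assigned to one consumer in the group, so the work is split across consumers and Kafka can consume messages more efficiently.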
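To make the key-to-partition idea concrete, here is a minimal sketch. It is an illustration only: the function names `partition_for` and `group_by_partition` are made up for this article, and `zlib.crc32` is a stand-in hash, not the hash Kafka's own partitioner uses. The point is just that a stable hash of the key, taken modulo the partition count, always sends the same key to the same partition.

```python
import zlib
from collections import defaultdict


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed message.

    Assumption: crc32 is used here only as a stable stand-in hash.
    It is NOT the hash function Kafka's partitioner uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def group_by_partition(messages, num_partitions: int):
    """Group (key, value) pairs by the partition they would be sent to."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


if __name__ == "__main__":
    demo = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
    for partition, items in sorted(group_by_partition(demo, 3).items()):
        print(partition, items)
```

Because both `user-1` messages hash to the same partition, a single consumer sees them in order, while other keys can be processed in parallel by other consumers.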

2 — Distributed Architecture

Kafka brokers benefit greatly from a distributed architecture. You can spread brokers across locations, much like a CDN. Kafka has a leader partition mechanism: the producer sends each message to the leader partition, and Kafka replicates the message to the other replica partitions. By default, consumers read messages from the leader partition, but you can change this behaviour. A consumer can read from the closest broker even if that broker does not hold the leader partition.
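The "closest broker" behaviour can be pictured with a small model. In Kafka itself this is driven by configuration (the consumer setting `client.rack` together with a rack-aware `replica.selector.class` on the brokers), not by application code. The sketch below is a simplified model of the idea, and the `Replica` class and `select_replica` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    is_leader: bool
    in_sync: bool = True


def select_replica(replicas: List[Replica], client_rack: Optional[str]) -> Replica:
    """Choose which replica a consumer should read from.

    Simplified model: prefer the leader, unless an in-sync replica
    lives in the consumer's own rack and the leader does not.
    """
    leader = next(r for r in replicas if r.is_leader)
    if client_rack is None or leader.rack == client_rack:
        return leader
    for replica in replicas:
        if replica.in_sync and replica.rack == client_rack:
            return replica
    # No usable replica nearby: fall back to the leader.
    return leader


if __name__ == "__main__":
    replicas = [
        Replica(broker_id=1, rack="eu-west", is_leader=True),
        Replica(broker_id=2, rack="us-east", is_leader=False),
    ]
    print(select_replica(replicas, "us-east").broker_id)
```

A consumer in the same rack as a follower reads locally and avoids a cross-region round trip, while producers still write to the leader.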

3 — Sequential I/O

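Kafka is built around offsets. Each message in a partition has an offset value, so both you and Kafka always know which message comes next. Messages are appended to the end of the log and consumed forward, not backward, which gives FIFO (first in, first out) ordering within a partition. Because reads and writes are sequential, this design also spares the hard disk from costly seeks.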
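The offset idea can be sketched as a tiny append-only log. This is a toy model, not Kafka's storage format: the `AppendOnlyLog` class, its length-prefixed record layout, and the in-memory position index are all assumptions made for illustration. What it shows is that writes only ever go to the end of the file, and a reader seeks once to its starting offset and then reads straight ahead.

```python
import os
import struct


class AppendOnlyLog:
    """Toy append-only log; assumes it starts from a new, empty file."""

    def __init__(self, path: str):
        self.path = path
        self.positions = []  # positions[offset] = byte position of that record
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload: bytes) -> int:
        """Write a record at the end of the file and return its offset."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            self.positions.append(f.tell())
            f.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
            f.write(payload)
        return len(self.positions) - 1

    def read_from(self, offset: int):
        """Yield (offset, payload) pairs, reading forward from `offset`."""
        if offset < 0 or offset >= len(self.positions):
            return
        with open(self.path, "rb") as f:
            f.seek(self.positions[offset])  # one seek, then sequential reads
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break
                (size,) = struct.unpack(">I", header)
                yield offset, f.read(size)
                offset += 1
```

A consumer only needs to remember the last offset it processed. To resume, it calls `read_from(last_offset + 1)` and continues forward.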

4 — Page Cache / OS Buffer

Kafka does not keep messages in an in-process memory cache. Kafka is a JVM-based application, and if heap usage grew, the garbage collector would have to run more often to reclaim memory, which has a cost. Instead, Kafka relies on the operating system’s cache, known as the page cache or OS buffer. Kafka also relies directly on operating system calls for its I/O.
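The difference between "hand the data to the OS" and "force it onto the disk" can be sketched in a few lines. This is only an illustration of the general idea, with a made-up helper named `append_record`. It is not how the Kafka broker is implemented. A plain write lands in the page cache and the operating system decides when to flush it, while `os.fsync` forces the flush immediately and is much more expensive.

```python
import os


def append_record(f, payload: bytes, durable: bool = False) -> None:
    """Append bytes to an open binary file.

    With durable=False the data is handed to the OS and normally sits in
    the page cache until the OS writes it back. With durable=True we also
    ask the OS to flush it to the storage device right away.
    """
    f.write(payload)
    f.flush()  # move data from Python's buffer to the OS (page cache)
    if durable:
        os.fsync(f.fileno())  # force the cached pages onto the disk
```

Readers that open the same file shortly afterwards are usually served from the page cache, so no application-level cache (and no extra garbage collector pressure) is needed.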

5 — Zero Copy Pattern

The zero-copy pattern is one of the most critical design decisions in Kafka. With this design, Kafka skips some copy steps to gain speed and throughput. Without zero copy (ZC), data follows the path Disk -> OS Buffer -> Kafka -> Socket Buffer -> NIC Buffer, where NIC stands for network interface card. With ZC, some steps are skipped and the final path is Disk -> OS Buffer -> NIC Buffer. As you can see, the two intermediate steps, the copy into the Kafka application and the copy into the socket buffer, are no longer needed.
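The two paths can be compared side by side in a short sketch. Kafka is written in Java, where this is typically done with `FileChannel.transferTo`. The Python version below is only an analogy, and the function names `send_with_copy` and `send_zero_copy` are made up for this article. `socket.sendfile` uses the operating system's `sendfile` call where it is available and quietly falls back to ordinary sends where it is not.

```python
import socket


def send_with_copy(sock: socket.socket, path: str, chunk_size: int = 64 * 1024) -> int:
    """Traditional path: read file data into the application, then send it."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # copy: OS buffer -> application memory
            if not chunk:
                break
            sock.sendall(chunk)  # copy: application memory -> socket buffer
            total += len(chunk)
    return total


def send_zero_copy(sock: socket.socket, path: str) -> int:
    """Zero-copy style path: let the OS move file data to the socket.

    socket.sendfile uses os.sendfile when the platform supports it, so the
    bytes do not pass through this process's memory in that case.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

Both functions deliver the same bytes and return the number of bytes sent. The difference is how many times the data is copied, and how many switches between user space and kernel space happen, along the way.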

Conclusion

Kafka owes its performance to several design patterns and decisions. We covered five of the best known. You can gain even more performance by changing default settings such as producer ACK modes and consumer delivery methods. See you in the next post.

