Unlocking the Need for Speed: The Secrets Behind Kafka’s Blazing Performance

Nishant Sharma
3 min read · Oct 3, 2023


Franz Kafka was a novelist and short-story writer based in Prague, widely regarded as one of the major figures of 20th-century literature. Apache Kafka, the subject of this article, is named after him.

Kafka is an open-source distributed event streaming platform optimized for high throughput. So when we say it’s fast, we mean Kafka’s ability to move a lot of data at very low latency (hundreds of MB/s and hundreds of thousands of messages per second, with latencies as low as 2 ms). Think of it as a very large pipe moving liquid: the greater the pipe’s diameter, the more liquid it can convey.

Several design decisions made this possible, some of which are beyond the scope of this article. We will cover the two most important decisions, which carry the most weight.

Two fundamental principles that lie at the core of Kafka’s blazing fast performance are Sequential I/O and Zero Copy Principle. In this comprehensive exploration, we’ll delve into these principles to understand why Kafka is hailed as a high-speed data processing powerhouse.

1. Sequential I/O:

There is a common misconception that disk access is always slower than memory access, but it largely depends on the access pattern. Disk access is slow for random access, since the drive must physically move its read/write head from one location to another. For sequential access, however, the head doesn’t need to jump around, so reading and writing blocks of data one after another is much faster. Kafka takes advantage of this fact. It stores messages in partitions, and each partition contains messages in the order they were produced. When Kafka writes data, it appends it to the end of a partition. This sequential write pattern allows Kafka to take full advantage of the underlying hardware’s ability to read and write data sequentially, which is significantly faster than random access.
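The append-only pattern described above can be sketched in a few lines. This is an illustrative toy, not Kafka’s actual on-disk format: each record gets a length prefix and is always written at the end of the file, so the disk only ever sees sequential writes, and reads are a single forward scan.

```python
import os
import struct
import tempfile

class AppendOnlyLog:
    """Toy append-only log, loosely modeled on a Kafka partition segment."""

    def __init__(self, path):
        # 'ab' opens in append mode: every write goes to the end of the file,
        # which is exactly the sequential-write pattern Kafka relies on.
        self._writer = open(path, "ab")
        self._path = path

    def append(self, payload: bytes) -> None:
        # 4-byte big-endian length prefix, then the payload itself.
        self._writer.write(struct.pack(">I", len(payload)) + payload)
        self._writer.flush()

    def read_all(self):
        # Sequential scan from offset 0 — no seeking between records.
        records = []
        with open(self._path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break
                (size,) = struct.unpack(">I", header)
                records.append(f.read(size))
        return records

path = os.path.join(tempfile.mkdtemp(), "partition-0.log")
log = AppendOnlyLog(path)
log.append(b"event-1")
log.append(b"event-2")
print(log.read_all())  # [b'event-1', b'event-2']
```

Because consumers read a partition in order, the read side is sequential too — the same property that makes the writes fast makes the reads fast.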

2. Zero Copy Principle:

Traditional Data Copying:

In conventional data transmission, moving data from one location to another (for example, from a file on disk to a network socket) entails multiple copies: the kernel reads the data from disk into the page cache, copies it into an application buffer, then copies it back into a kernel socket buffer before it reaches the network card. These copies, and the accompanying context switches, consume CPU cycles and memory bandwidth, slowing down the overall transfer.
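The traditional path looks like the classic read/write loop below. It is a sketch using in-memory streams rather than a real file and socket, but the structure is the point: every chunk passes through a user-space buffer on its way from source to destination.

```python
import io

def copy_traditional(src, dst, chunk_size=64 * 1024):
    """Classic copy loop: every chunk passes through a user-space buffer.

    On a real file-to-socket transfer this costs two extra copies per chunk
    (kernel page cache -> user buffer, user buffer -> kernel socket buffer),
    plus a read() and write() system call each time around the loop.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)   # copy into user space
        if not chunk:
            break
        dst.write(chunk)               # copy back out of user space
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 200_000)
dst = io.BytesIO()
n = copy_traditional(src, dst)
print(n)  # 200000
```

The data itself is never inspected or modified by the application — it is copied in and straight back out, which is pure overhead.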

Efficient Data Transfer in Kafka:

Kafka achieves zero copy through the sendfile system call (exposed in Java as FileChannel.transferTo, and often combined with scatter-gather DMA in the kernel). When a producer sends a message, the broker appends it to the partition log, where it lands in the operating system’s page cache. When a consumer fetches data, Kafka transfers the bytes from the page cache directly to the network socket, without copying them through the application’s user-space memory. This eliminates the redundant copies, reduces CPU usage, and significantly enhances performance.
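A minimal sketch of the same idea using the sendfile(2) system call, which is what sits beneath Kafka’s FileChannel.transferTo. Note this is Linux-specific: Python’s os.sendfile requires kernel support, and file-to-file transfers work on Linux 2.6.33 and later. The file and directory names are illustrative.

```python
import os
import tempfile

def zero_copy_transfer(src_path, dst_path):
    """Transfer a file with sendfile(2): the kernel moves the bytes directly
    from the page cache to the destination descriptor, so the data never
    enters this process's user-space memory."""
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src_fd).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(dst_fd, src_fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset
    finally:
        os.close(src_fd)
        os.close(dst_fd)

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "segment.log")
dst = os.path.join(tmp, "out.bin")
with open(src, "wb") as f:
    f.write(b"kafka-message-batch" * 1000)

sent = zero_copy_transfer(src, dst)
print(sent)  # 19000
```

In Kafka’s real hot path the destination is a consumer’s socket rather than another file, but the mechanism is the same: the broker asks the kernel to ship bytes from the page cache straight out, skipping the user-space round trip shown in the traditional copy loop above.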

In summary, Kafka’s speed can be attributed to its smart use of Sequential I/O, ensuring data is read and written in an orderly fashion, and Zero Copy, which minimizes data copying overhead.

References:

ByteByteGo — https://www.youtube.com/@ByteByteGo/ [ Best resource for system design ]

Kafka official Documentation — https://kafka.apache.org/documentation/

About the Author:

I’m Nishant Sharma, a backend developer with two years of work experience, currently working at Navyug InfoSolutions. When I’m not coding, I’m a barbecue enthusiast and a fervent football fan. Connect with me on LinkedIn for professional updates. Thank you!
