What the hell is going on in Kafka — part 1

Deepanshu Gupta
5 min read · Jun 3, 2020


Have you ever wondered what is going on inside Kafka? We can read all about Kafka and how it is one of the best tools for pub-sub mechanisms, but what exactly goes on inside it? Well, I'm starting this blog series to cover various touch-points on Kafka and how it works. Before I start, I would like to thank draw.io, where I made all the pictures in this blog. I'm also using the Kafka logo in these pictures, so shout-out to its creator. I believe pictures say a lot more than words, and honestly I enjoy using draw.io, so let there be pictures. I've tried to cite as many sources as possible; if I missed any, please leave a comment. With this, let's start.

What is Kafka?

It is a high throughput distributed messaging system.

There are two keywords here: high throughput and distributed. Being distributed means:

  1. You can add more machines to spread the data load.
  2. In case of a machine failure, there is no data loss.

Now let's list the various problems that can occur in a messaging system and how Kafka handles them. But before delving into that, let's see how Kafka stores a message, so we can relate the problems of a messaging system to Kafka's solutions.

A disclaimer first: this is just a visualization, not how Kafka actually stores the data. I'll show an actual storage example later.

We are using a filesystem example to visualize how the data is stored on a broker. Now some definitions:

  • Broker — the place where all of Kafka's data is stored. It is the physical machine. In the picture above, the Topics folder lives on a broker.
  • Topic — compared to a traditional messaging system, it is the queue name into which a publisher pushes messages. But it is more of a logical entity, and it can span one or more brokers. In the picture above, every subdirectory of Topics is a Kafka topic.
  • Partition — where the actual data is stored; a topic is made up of many partitions. As said earlier, a topic spans brokers, and its partitions are divided among the brokers almost equally. For example, if we have 3 brokers and a topic has 5 partitions, then two brokers will get 2 partitions each and the remaining broker will get the last one. In the picture above, 0.master, 1.master and 2.replica, 3.replica are master and replica partitions.
  • Master Partition — the main partition to which Kafka writes messages and from which consumers read them. (The Kafka documentation calls this the leader partition.)
  • Replica Partition — the fail-safe mechanism in case a broker crashes. How is it done? I'll tell you in a minute.
  • Message — the data we are crazy about. A partition stores incoming messages in a log format. In every message, one can find 3 things:
  1. ID — also called the offset, the unique identifier of a message. It starts at 0 for each partition, and every partition has its own incremental sequence.
  2. Timestamp — the time at which the message was inserted into Kafka. It is set either by the producer when it publishes the message or by Kafka when it inserts the message.
  3. Content — the actual message.
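The definitions above can be sketched in a few lines of code. This is a toy model for intuition only, not Kafka's real implementation; the round-robin spread of partitions over brokers and the names `Partition`/`Message` are my own simplifications.

```python
from dataclasses import dataclass, field
from itertools import cycle
import time

@dataclass
class Message:
    offset: int        # unique per partition, starts at 0
    timestamp: float   # set when the message is inserted
    content: str

@dataclass
class Partition:
    log: list = field(default_factory=list)

    def append(self, content: str) -> Message:
        # the next offset is simply the current length of the log
        msg = Message(offset=len(self.log), timestamp=time.time(), content=content)
        self.log.append(msg)
        return msg

def assign_partitions(num_partitions: int, brokers: list) -> dict:
    """Spread a topic's partitions over brokers as evenly as possible."""
    assignment = {b: [] for b in brokers}
    for p, b in zip(range(num_partitions), cycle(brokers)):
        assignment[b].append(p)
    return assignment

# 5 partitions over 3 brokers: two brokers get 2, one gets 1
print(assign_partitions(5, ["broker-0", "broker-1", "broker-2"]))
```

Running this shows the 3-brokers/5-partitions split from the example above, and appending to a `Partition` shows each partition keeping its own offset sequence starting at 0.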

Now that we know how messages are stored in Kafka, we can look at some problems with traditional messaging systems and how Kafka solves them.

Issues with Traditional Messaging Systems

Let's take each of the cases above and understand what is going on.

  1. Single host hosting the queue dies.
    Here, the link between the producer and the consumer is broken, and the whole purpose is defeated. Since Kafka is a distributed system, even if a single machine dies, Kafka will find all the leader partitions on that broker and elect new leaders for them from the replicas on other brokers. A simple diagram and example make it clear.
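Here is a toy sketch of that re-election step. Real Kafka's controller is far more involved (it elects leaders from the in-sync replica set, ISR); this simulation just assumes every partition has at least one surviving replica and promotes the first one.

```python
# Cluster metadata: partition id -> current leader and its replica brokers.
topic = {
    0: {"leader": "broker-0", "replicas": ["broker-1"]},
    1: {"leader": "broker-1", "replicas": ["broker-2"]},
    2: {"leader": "broker-0", "replicas": ["broker-2"]},
}

def handle_broker_failure(topic, dead):
    for meta in topic.values():
        # the dead broker can no longer serve as a replica
        meta["replicas"] = [b for b in meta["replicas"] if b != dead]
        if meta["leader"] == dead:
            # promote a surviving replica to leader
            # (assumes at least one replica is still alive)
            meta["leader"] = meta["replicas"].pop(0)
    return topic

handle_broker_failure(topic, "broker-0")
print(topic[0]["leader"], topic[2]["leader"])  # broker-1 broker-2
```

After broker-0 dies, the partitions it led (0 and 2) get new leaders on the surviving brokers, so producers and consumers keep working.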

2. Buggy Consumer
Let's take an example: we have a consumer that has processed 10 messages, and we then discover the consumer is faulty. After correcting the code, the consumer should be able to reprocess those 10 messages correctly. But the broker has already deleted the messages, since they had fulfilled their purpose. For rectification, the producer has to send those messages again, making it a tightly coupled system.
But Kafka, as we saw earlier, writes messages to a log file with precise offsets. After rectification, the consumer can start from any offset in the partition.
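With a real client library, this rewind is typically a seek call (for example, kafka-python's `consumer.seek`). The sketch below simulates the idea in plain Python, with no broker required; the log contents are made up for illustration.

```python
# A single partition's log: the broker keeps every message with its offset,
# so a fixed consumer can rewind instead of asking the producer to resend.
log = [f"msg-{i}" for i in range(15)]  # offsets 0..14

def consume_from(log, offset):
    """Yield (offset, message) pairs starting from a chosen offset."""
    for off in range(offset, len(log)):
        yield off, log[off]

# The buggy consumer had processed offsets 0-9; after the fix, rewind to 0
# and reprocess exactly those 10 messages.
replayed = list(consume_from(log, 0))[:10]
print(replayed[0], replayed[-1])  # (0, 'msg-0') (9, 'msg-9')
```

The producer is never involved in the replay, which is exactly the decoupling the post describes.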

3. Slow Consumption/No Consumption
Traditional messaging systems, being single-host, require their consumers to be fast and reliable. But sometimes a lack of resources or an unresponsive application leads to slow or no consumption. This can be dangerous if unmonitored, as the growing backlog can bloat the host and consume all its disk space.
Kafka solves this problem using log retention. It is a configurable value that tells Kafka to keep a message for a certain period. After that, the message gets deleted, even if consumers haven't consumed it. This value is generally large (the default is 7 days). This may seem like a dangerous property, but Kafka works on the principle of Dumb Broker / Smart Consumer: the broker doesn't care whether consumers have read a message or not. It simply keeps the message for the retention period and then deletes it.
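A minimal sketch of that retention rule follows. Note a simplification: real Kafka deletes whole log segment files once they age out, not individual messages, and the window is set via broker configs such as `log.retention.hours` (default 168, i.e. 7 days).

```python
import time

RETENTION_SECONDS = 7 * 24 * 3600  # 7 days, matching Kafka's default

def purge_expired(log, now, retention=RETENTION_SECONDS):
    """Drop messages older than the retention window, whether or not
    any consumer has read them (Dumb Broker / Smart Consumer)."""
    return [(ts, msg) for ts, msg in log if now - ts < retention]

now = time.time()
log = [(now - 8 * 24 * 3600, "old"),   # 8 days old: past retention, deleted
       (now - 3600, "fresh")]          # 1 hour old: kept
log = purge_expired(log, now)
print(log)  # only "fresh" survives
```

The key point the code makes concrete: expiry depends only on the message's age, never on consumer progress, so a slow consumer can't bloat the broker forever.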

Now one can argue that these problems can still occur with Kafka. It's true: Kafka is good, even great, but not invincible. It is up to us developers to have monitoring and alerts in place so we can act quickly on any problems. Kafka helps us mitigate these problems, but does not fully eradicate them.

What’s next?

So far we have

  • talked about some elements of Kafka and their definitions,
  • visualized how Kafka stores data, and
  • talked about issues with messaging systems and how Kafka helps mitigate them.

With this, I would like to end the first part of this series. I hope you liked it. The second part will be available soon; in it we'll go over Kafka as a distributed system, a bit on ZooKeeper, and the actual storage of data in Kafka.



Deepanshu Gupta

Coding enthusiast, Problem-solver, Developer… too much to write. Just check out my profile here https://deepanshugupta.in/