Everything We Need To Know About Kafka

Manisha Ekanayake
3 min read · Jul 1, 2022

In our degree program, we studied fundamental data structures such as queues, lists, and stacks, and we learned how they differ from one another. In the business world, we work instead with message brokers such as Kafka, RabbitMQ, ActiveMQ, and HiveMQ. We tend to map what we learned about those data structures directly onto these products. As a result, we fail to use many of their features, because we never learn how each product actually works. The problem shows up when we scale an application and a topic or queue ends up with multiple consumers. Let’s see how Kafka handles that situation.

What is Kafka?

Kafka is a distributed data store optimized for ingesting and processing streaming data in real time. Streaming data is data that is continuously generated by thousands of data sources, which typically send their data records simultaneously. A streaming platform needs to handle this constant influx of data and process it sequentially and incrementally.
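
To make "sequentially and incrementally" concrete, here is a minimal sketch in plain Python. It does not use Kafka at all; `running_average` and the sample readings are hypothetical names chosen for illustration. The point is only that each record updates the result as it arrives, without waiting for the full data set.

```python
from typing import Iterable, Iterator


def running_average(records: Iterable[float]) -> Iterator[float]:
    """Yield the average of all records seen so far, one record at a time."""
    count = 0
    total = 0.0
    for value in records:  # records are handled strictly in arrival order
        count += 1
        total += value
        yield total / count  # the result is updated incrementally


# Hypothetical sensor readings standing in for a stream of records.
readings = [10.0, 20.0, 30.0]
averages = list(running_average(readings))
print(averages)
```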

Let’s take some use cases and try to understand how Kafka performs in different scenarios.

Use case 1

In Kafka, one topic can have multiple partitions. In the figure for this use case, there is an Employee topic with 3 partitions. The Project consumer group contains a single consumer (Consumer 1), which takes messages from all three partitions. Now we scale the application.

Use case 2

When we scale the application, we add one more consumer. Now Consumer 1 listens to Partition 0 and Partition 1, and Consumer 2 listens to Partition 2. However, we cannot predict this split in advance: which consumer gets which partitions depends on how the consumers and the topic are coordinated within the group.

Use case 3

In use case 3, we have 3 consumers and 3 partitions, so each consumer reads from exactly one partition.
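
The layouts in use cases 1 to 3 can be reproduced with a small pure-Python sketch. It does not talk to a real broker: `assign_partitions` is a hypothetical helper that hands out partitions in contiguous ranges, which happens to match the figures. In a real cluster the outcome depends on the assignment strategy configured for the consumer group.

```python
def assign_partitions(partitions, consumers):
    """Split the partitions of one topic across the consumers of one group.

    Illustrative range-style split: every consumer gets an equal share and
    the first few consumers absorb any remainder.
    """
    members = sorted(consumers)
    base, extra = divmod(len(partitions), len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


employee_partitions = [0, 1, 2]  # the Employee topic from the use cases

use_case_1 = assign_partitions(employee_partitions, ["consumer-1"])
use_case_2 = assign_partitions(employee_partitions, ["consumer-1", "consumer-2"])
use_case_3 = assign_partitions(
    employee_partitions, ["consumer-1", "consumer-2", "consumer-3"]
)

print(use_case_1)  # one consumer owns all three partitions
print(use_case_2)  # the partitions are split two and one
print(use_case_3)  # one partition per consumer
```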

Use case 4

Here, we add one more consumer to scale further. Since the topic has only 3 partitions, Kafka does not assign a partition to Consumer 4, so it receives no messages and stays idle. Any further consumers we add to this group will be idle as well.

Now let’s assume Consumer 1 or Consumer 2 crashes. Because an idle consumer is already running, Kafka rebalances the group and hands the orphaned partition over to it. We can keep such a standby consumer running, but we need to decide whether that is a valid use case for us.
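
The idle consumer and the hand-over after a crash can be simulated in a few lines. This is a sketch under assumptions, not Kafka's actual rebalance protocol: `assign_partitions` is a hypothetical range-style helper, and the "crash" is modelled simply by recomputing the assignment without the failed member.

```python
def assign_partitions(partitions, consumers):
    """Illustrative range-style split of partitions across one group's consumers."""
    members = sorted(consumers)
    base, extra = divmod(len(partitions), len(members))
    assignment = {}
    start = 0
    for index, member in enumerate(members):
        count = base + (1 if index < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


partitions = [0, 1, 2]
group = ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]

# Four consumers, three partitions: one consumer ends up with nothing to read.
before_crash = assign_partitions(partitions, group)
idle = [member for member, owned in before_crash.items() if not owned]

# Model a crash of consumer-1 by rebalancing across the survivors.
survivors = [member for member in group if member != "consumer-1"]
after_crash = assign_partitions(partitions, survivors)

print(idle)         # the consumer without a partition
print(after_crash)  # every partition has an owner again
```

Which survivor takes over which partition depends on the assignment strategy in a real cluster; the sketch only shows that every partition gets an owner again.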

Use case 5

If we attach another consumer group to the same topic, both groups receive the messages independently of each other. The Project consumer group has 3 consumers, so each consumer reads from one partition. The Finance consumer group has only two consumers, so Consumer 2 listens to two partitions and Consumer 1 listens to one.

So how does a consumer know where to pick up messages? Kafka retains messages after they have been read, and it maintains an offset for each consumer group on each partition. Based on that offset, Kafka knows which message the group should receive next.
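
A tiny in-memory model can illustrate how retention and per-group offsets fit together. `PartitionLog`, its methods, and the sample records are hypothetical and exist only for this sketch; a real Kafka client commits offsets through the broker rather than through a local dictionary.

```python
class PartitionLog:
    """Hypothetical in-memory stand-in for a single Kafka partition."""

    def __init__(self):
        self.records = []    # retained records; reading never removes them
        self.committed = {}  # group name -> offset of the next record to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        """Return the next batch for a group and advance only that group's offset."""
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for employee in ["alice", "bob", "carol"]:
    log.append(employee)

project_first = log.poll("project", max_records=2)
project_second = log.poll("project", max_records=2)  # resumes after the first batch
finance_first = log.poll("finance", max_records=2)   # starts from the beginning

print(project_first, project_second, finance_first)
print(log.committed)
```

Because the offsets are stored per group, the Finance group starts from the first record even though the Project group has already read the whole partition.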

Kafka Rule: The number of consumers inside a consumer group should be less than or equal to the number of partitions in the topic; any extra consumers stay idle.

Reference: https://youtu.be/wP-FMNuO3D0
