Kafka Partitions and Consumer Groups in 6 mins

Ahmed Gulab Khan
Published in Javarevisited
6 min read · Feb 28, 2022

In my previous article, we discussed how Kafka works and went through some basic Kafka terminology. In this article, we will go over how Partitions and Consumer Groups work in Kafka.

If you haven’t gone through my previous article or if you’re new to Kafka, I recommend going through it first, as it will help you get a basic understanding of how Kafka works.

So, what is a Partition?

Before talking about partitions we need to understand what a topic is. In Kafka, a topic is basically a storage unit where all the messages sent by the producer are stored. Generally, similar data is stored in individual topics. For example, you can have a topic named “user” where you only store the details of your users, or you can have a topic named “payments” where you only store all the payment-related details. A topic can be further subdivided into multiple storage units and these subdivisions of a topic are known as partitions.

By default, a topic is created with only 1 partition and whatever messages are published to this topic are stored in that partition. If you configure a topic to have multiple partitions, then the messages sent by the producers are distributed across these partitions such that each message is stored in exactly one partition, i.e. no two partitions have the same message/event.
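To make the "each message lands in exactly one partition" idea concrete, here is a minimal sketch. It assumes that every message carries a key and that the partition is picked by hashing that key. The function names (`choose_partition`, `publish`) are made up for illustration, and CRC32 is only a stand-in hash: Kafka's default partitioner uses a different hash function (murmur2) for keyed messages.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Assumption: CRC32 is a stand-in hash for this sketch only.
    Kafka's default partitioner hashes keys with murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(partitions, key, value):
    """Append the message to exactly one partition and return its index."""
    index = choose_partition(key, len(partitions))
    partitions[index].append((key, value))
    return index


# A hypothetical topic with 3 partitions, modelled as 3 lists.
partitions = [[] for _ in range(3)]
for i in range(9):
    publish(partitions, f"user-{i}", f"event-{i}")

# Every message is stored once, in a single partition.
total = sum(len(p) for p in partitions)
```

Because the partition depends only on the key, messages with the same key always end up in the same partition.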

All the partitions in a topic also have their own offsets (if you don’t know what an offset is, I recommend you check out this article where I have discussed it).
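The point about each partition having its own offsets can be sketched with a tiny in-memory model. This is not how Kafka stores data on disk; the `Topic` class below is a hypothetical stand-in where every partition is an append-only list and the offset is simply the position of the message in that list.

```python
class Topic:
    """Toy model of a topic: one append-only list per partition."""

    def __init__(self, name, num_partitions=1):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Store the message and return its offset within that partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1


topic = Topic("payments", num_partitions=3)
first = topic.append(0, "payment-a")   # first message in partition 0
second = topic.append(0, "payment-b")  # second message in partition 0
other = topic.append(1, "payment-c")   # first message in partition 1
```

Here `first` and `other` are both offset 0, because offsets are counted separately inside each partition, not across the whole topic.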

As an example, a producer producing messages to a Kafka topic with 3 partitions would look like this:

Now, what is a Consumer Group?

A bunch of consumers can form a group in order to cooperate and consume messages from a set of topics. This grouping of consumers is called a Consumer Group. If two consumers have subscribed to the same topic and are present in the same consumer group, then these two consumers would be assigned different sets of partitions and neither of them would receive the same messages as the other.

Note: Consumer Groups can help attain a higher consumption rate when multiple consumers are consuming from the same topic.

Now, let’s go through a few scenarios to better understand the above concepts.

Scenario 1: Let’s say we have a topic, TopicT1, with 4 partitions and 1 consumer group consisting of only 1 consumer. The consumer has subscribed to TopicT1 and is assigned to consume from all the partitions. This scenario can be depicted by the picture below:

Scenario 2: Now let’s consider we have 2 consumers in our consumer group. These 2 consumers would be assigned to read from different partitions — Consumer1 assigned to read from partitions 0, 2; and Consumer2 assigned to read from partitions 1, 3.

Note: Kafka assigns the partitions of a topic to the consumers in a consumer group so that each partition is consumed by exactly one consumer in the consumer group. As a result, while the assignment is stable, a message is read by only a single consumer in the consumer group.

Since the messages stored in individual partitions of the same topic are different, the two consumers would never read the same message, thereby avoiding the same messages being consumed multiple times at the consumer side. This scenario can be depicted by the picture below:

But, what if the number of consumers in a consumer group is more than the number of partitions? Check out Scenario 3.

Scenario 3: Let’s say we have 5 consumers in the consumer group, which is more than the number of partitions of TopicT1. Four of the consumers would each be assigned a single partition and the remaining consumer (Consumer5) would be left idle. This scenario can be depicted by the picture below:
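The assignments in Scenarios 1 to 3 can also be sketched in a few lines of code. This is a simplified round-robin assignment written only for illustration (`assign_round_robin` is a made-up helper). The real assignment strategy in Kafka is configurable via `partition.assignment.strategy`, so the exact partition-to-consumer mapping may differ.

```python
def assign_round_robin(num_partitions, consumers):
    """Give each partition to exactly one consumer, cycling through the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Scenario 1: 4 partitions, 1 consumer -> it gets all partitions.
scenario_1 = assign_round_robin(4, ["Consumer1"])

# Scenario 2: 4 partitions, 2 consumers -> 2 partitions each.
scenario_2 = assign_round_robin(4, ["Consumer1", "Consumer2"])

# Scenario 3: 4 partitions, 5 consumers -> one consumer gets nothing.
scenario_3 = assign_round_robin(
    4, ["Consumer1", "Consumer2", "Consumer3", "Consumer4", "Consumer5"]
)
idle = [consumer for consumer, parts in scenario_3.items() if not parts]
```

In this sketch `scenario_2` gives partitions 0 and 2 to Consumer1 and partitions 1 and 3 to Consumer2, and `idle` contains only Consumer5.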

Okay, and what if you want multiple consumers to read from the same partition? Check out Scenario 4.

Scenario 4: If you want to assign multiple consumers to read from the same partition, then you can add these consumers to different consumer groups, and have both of these consumer groups subscribed to TopicT1. Here, the messages from Partition0 of TopicT1 are read by Consumer1 of ConsumerGroup1 and Consumer1 of ConsumerGroup2. This scenario can be depicted by the picture below:
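Scenario 4 works because each consumer group keeps track of its own position in a partition. Here is a rough sketch of that idea, assuming a partition is just a list and a group's position is a single integer; the `GroupCursor` class and its `poll` method are hypothetical and only mimic the behaviour, not the Kafka client API.

```python
# Partition0 of TopicT1, modelled as a plain list of messages.
partition0 = ["m0", "m1", "m2"]


class GroupCursor:
    """Tracks how far one consumer group has read in a partition."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.next_offset = 0

    def poll(self, log, max_records=1):
        """Return up to max_records unread messages and advance the position."""
        records = log[self.next_offset:self.next_offset + max_records]
        self.next_offset += len(records)
        return records


group1 = GroupCursor("ConsumerGroup1")
group2 = GroupCursor("ConsumerGroup2")

# Both groups read the same partition independently of each other.
read_by_group1 = group1.poll(partition0, max_records=3)
read_by_group2 = group2.poll(partition0, max_records=2)
```

Reading in one group does not move the position of the other group, so both groups eventually see every message in Partition0.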

All of that seems okay, but when should you add consumers to the same consumer group versus when should you add the consumers to a different consumer group?

  • If your goal is to attain a higher throughput or increase the consumption rate for a particular topic, then adding multiple consumers in the same consumer group would be the way to go. The number of active consumers in a consumer group can be at most equal to the number of partitions of the topic. Adding more consumers than the number of partitions would cause the extra consumers to remain idle, since Kafka never assigns the same partition to two consumers in a consumer group. When the number of consumers consuming from a topic is equal to the number of partitions in the topic, each consumer reads messages from a single partition and the message consumption happens in parallel, thereby increasing the consumption rate.
  • But if you have a use case where you want multiple consumers to consume the same messages of a topic, then have multiple consumer groups subscribed to the topic and add consumers to these consumer groups as per your requirements.

Some common Q&As:

Q. Is having a Consumer Group mandatory in Kafka?

A. Yes, you have to tell Kafka which consumer group a consumer belongs to. If you subscribe to a topic without setting the consumer group id in your app, you will get an exception. If you start a consumer to consume from a topic using the Kafka CLI command without specifying a group, then a new consumer group is created with the name console-consumer-<some_random_number> and the consumer automatically falls under this consumer group.

Q. If a new consumer joins a consumer group, how would the partitions be assigned to the new consumer?

A. Let’s say we have 1 topic with 3 partitions, and 1 consumer group consisting of 2 consumers. Out of the 3 partitions, 2 would be assigned to one consumer and the remaining partition would be assigned to the other consumer. Now, consider these two cases:

  • Case 1: If a new consumer joins the consumer group, rebalancing happens and each consumer is now assigned a single partition (since we now have an equal number of partitions and consumers).
  • Case 2: If a consumer goes down, then there’d be only 1 consumer left in the consumer group and all the partitions would be assigned to this consumer through rebalancing.
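The two cases above can be played out with a small simulation. It assumes a simple round-robin assignment that is recomputed from scratch whenever group membership changes; the `rebalance` helper is made up for this sketch, and Kafka's actual assignors may distribute the partitions differently.

```python
def rebalance(num_partitions, consumers):
    """Recompute partition ownership for the current members of the group."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Starting point: 3 partitions, 2 consumers -> one consumer owns 2 partitions.
initial = rebalance(3, ["Consumer1", "Consumer2"])

# Case 1: a new consumer joins -> one partition per consumer.
after_join = rebalance(3, ["Consumer1", "Consumer2", "Consumer3"])

# Case 2: a consumer goes down -> the remaining consumer owns everything.
after_failure = rebalance(3, ["Consumer1"])
```

In every state, each of the 3 partitions has exactly one owner; only the owner changes when the group changes.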

Q. What is rebalancing in Kafka?

A. Rebalancing is the re-assignment of partition ownership among the consumers within a given consumer group, such that every partition is assigned to exactly one consumer in the group. Rebalancing happens when:

  • A new consumer joins the consumer group
  • An existing consumer goes down
  • New partitions are added
  • An existing consumer is considered dead by the Group coordinator

Q. What is a Group coordinator?

A. A Group coordinator is a Kafka broker which receives heartbeats from all consumers of a consumer group. Every consumer group has a group coordinator.
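To give a feel for how heartbeats let the coordinator decide that a consumer is dead, here is a simplified sketch. The `GroupCoordinator` class below is a toy model, not the broker implementation. The timeout plays the role of the consumer setting `session.timeout.ms`, and timestamps are passed in explicitly (in seconds) so that the example is deterministic.

```python
class GroupCoordinator:
    """Toy coordinator that tracks the last heartbeat of each group member."""

    def __init__(self, session_timeout_s):
        self.session_timeout_s = session_timeout_s
        self.last_heartbeat = {}

    def heartbeat(self, consumer_id, now):
        """Record that the consumer was alive at time `now`."""
        self.last_heartbeat[consumer_id] = now

    def expire_dead_members(self, now):
        """Remove and return members whose last heartbeat is too old."""
        dead = sorted(
            consumer
            for consumer, seen in self.last_heartbeat.items()
            if now - seen > self.session_timeout_s
        )
        for consumer in dead:
            del self.last_heartbeat[consumer]
        return dead


coordinator = GroupCoordinator(session_timeout_s=10.0)
coordinator.heartbeat("Consumer1", now=0.0)
coordinator.heartbeat("Consumer2", now=0.0)
coordinator.heartbeat("Consumer1", now=8.0)  # Consumer2 stays silent

dead = coordinator.expire_dead_members(now=12.0)
alive = sorted(coordinator.last_heartbeat)
```

At time 12, Consumer2 has been silent for longer than the timeout, so it is dropped from the group, which is the kind of event that triggers a rebalance.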

Q. What is a Group Leader?

A. The first consumer that joins a consumer group is called the Group Leader of that consumer group.

Q. Can we decrease the number of partitions in a topic?

A. No, Apache Kafka doesn’t support decreasing the number of partitions of a topic. The data sent to a topic is spread across its partitions, so removing one of them would mean losing the data stored in it.

More Kafka articles in the series that you can go through:

  1. Apache Kafka: A Basic Intro
  2. 3 Simple steps to set up Kafka locally using Docker

Follow for the next Kafka blog in the series. I shall also be posting more articles talking about Software engineering concepts.

You can also find me on GitHub and Dev.to.
