Consumer Groups in Apache Kafka

Ashwitha B G
4 min read · Sep 28, 2020

In the previous posts, we introduced Kafka and saw how it achieves fault tolerance by replicating partitions across brokers. In this post, let's see how consumers are scaled and how Kafka keeps consumption fault tolerant.

Before looking at how consumers are scaled, let's understand how a consumer keeps track of the last offset it has read.

Offset management

As we saw earlier, each consumer of a topic reads messages at its own pace. Just as we keep a bookmark at the last page we read in a book, a consumer keeps track of the last message it has read. It commits that offset to the Kafka cluster, which stores it in the internal __consumer_offsets topic. So even if a consumer goes down, it can fetch the last committed offset from this topic and resume reading from there.
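To make the bookmark idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Kafka client code: the `committed_offsets` dict stands in for the offsets topic, and the `consume` function, the group name `billing`, and the topic name `orders` are all made up for illustration. The sketch stores the position of the next message to read, which is one common way to represent the committed offset.

```python
# Toy model of offset tracking; a dict stands in for the offsets topic.
committed_offsets = {}  # (group, topic, partition) -> next offset to read


def consume(group, topic, partition, log, max_messages):
    """Read up to max_messages from log, starting at the group's committed offset."""
    key = (group, topic, partition)
    start = committed_offsets.get(key, 0)  # no commit yet: start from the beginning
    batch = log[start:start + max_messages]
    # Commit the position of the next message to read, like moving a bookmark.
    committed_offsets[key] = start + len(batch)
    return batch


log = ["m0", "m1", "m2", "m3", "m4"]
first = consume("billing", "orders", 0, log, 3)   # reads m0, m1, m2
# Imagine the consumer crashes here; a restarted one resumes from the commit.
second = consume("billing", "orders", 0, log, 3)  # reads m3, m4
print(first, second)
```

Because the bookmark lives outside the consumer, the second call does not need to know anything about the first one.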

Figure: Offset management

Because both producers and brokers can be scaled, a large number of messages can accumulate in a Kafka topic. At some point we need to scale the consumers as well so they can keep up. How do we scale consumers?

Consumer Groups

Instead of having a single consumer read all the messages, we use a consumer group: a set of independent consumers that work together as a team.

Figure: Consumer groups

Each consumer in a consumer group reads messages from one or more partitions, so the partitions are consumed in parallel and overall read throughput increases. In the above example, two consumer groups read from the partitions of the topic. The first group has two consumers, each reading from two partitions. The second group has four consumers, each reading from its own partition.

Figure: Extra consumer in the consumer group

If a consumer group has more consumers than the topic has partitions, the extra consumers sit idle, because no two consumers in the same group can read from the same partition. An idle consumer is given a partition only when one of the active consumers goes down.
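The partition-to-consumer mapping described so far can be sketched in a few lines of Python. This is a simplified round-robin split written for illustration; the function name `assign_partitions` and the consumer names are hypothetical, and Kafka's own assignment strategies are more involved than this.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
two = assign_partitions(partitions, ["c1", "c2"])
four = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
five = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])
# Consumers that ended up with no partition are the idle ones.
idle = [consumer for consumer, owned in five.items() if not owned]
print(two, four, idle)
```

With four partitions, two consumers get two partitions each, four consumers get one each, and a fifth consumer is left with nothing to read.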

Figure: Consumer group with 2 consumers

Let’s take the above example, where there are 2 consumers in the consumer group. Consumer 1 is pulling messages from partition 1 and consumer 2 is pulling messages from partition 2. Suppose consumer 2 goes down. Does the consumer group stop reading from partition 2?

Figure: Consumer group with 1 consumer

No. If a consumer goes down, its partitions are reassigned to one of the remaining consumers in the same group. This process is called consumer rebalancing.
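Here is a rough sketch of that rebalancing step, using the same two-consumer example. `ToyGroup` is a hypothetical in-memory class written for this post, not part of any Kafka library, and it simply recomputes the whole assignment whenever membership changes.

```python
class ToyGroup:
    """Hypothetical in-memory model of one consumer group; not a Kafka API."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []
        self.assignment = {}

    def _rebalance(self):
        # Redistribute every partition across the members that are still alive.
        self.assignment = {member: [] for member in self.members}
        for index, partition in enumerate(self.partitions):
            if self.members:
                owner = self.members[index % len(self.members)]
                self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        self.members.remove(member)
        self._rebalance()


group = ToyGroup(["partition-1", "partition-2"])
group.join("consumer-1")
group.join("consumer-2")
before = dict(group.assignment)
group.leave("consumer-2")  # consumer 2 goes down
after = dict(group.assignment)
print(before, after)
```

After consumer 2 leaves, consumer 1 owns both partitions, so the group keeps reading from partition 2.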

How do we know if a consumer goes down? That’s where the group coordinator comes in.

Group coordinator

We saw that committed offsets are stored in the __consumer_offsets topic. Each consumer group maps to one partition of this topic, and the broker that leads that partition acts as the group coordinator for the group. It receives heartbeats from the consumers in the group and helps assign partitions to them. How do consumers get the topic partitions they read from?
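Before answering that question, the two coordinator roles above can be sketched in Python. The first part mimics how a group id is commonly mapped to a partition of the offsets topic (a Java-style string hash, made non-negative, modulo the partition count, with 50 being the default value of `offsets.topic.num.partitions`). The second part flags consumers whose heartbeat is older than a session timeout. The function names, the group id `billing`, and the timestamps are assumptions made up for illustration, and the hash helper assumes ASCII group ids.

```python
OFFSETS_TOPIC_PARTITIONS = 50  # default for offsets.topic.num.partitions


def java_string_hash(text):
    """Mimic Java's String.hashCode() as a signed 32-bit int (ASCII input assumed)."""
    value = 0
    for char in text:
        value = (31 * value + ord(char)) & 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def coordinator_partition(group_id, num_partitions=OFFSETS_TOPIC_PARTITIONS):
    """Pick the offsets-topic partition whose leader coordinates this group."""
    hashed = java_string_hash(group_id)
    # The smallest 32-bit int has no positive counterpart, so map it to 0.
    positive = 0 if hashed == -0x80000000 else abs(hashed)
    return positive % num_partitions


def expired_members(last_heartbeat, now, session_timeout):
    """Return members whose last heartbeat is older than the session timeout."""
    return sorted(
        member for member, seen in last_heartbeat.items()
        if now - seen > session_timeout
    )


partition = coordinator_partition("billing")
heartbeats = {"consumer-1": 98.0, "consumer-2": 80.0}  # seconds, made-up values
dead = expired_members(heartbeats, now=100.0, session_timeout=10.0)
print(partition, dead)
```

In this run consumer 2 has been silent for 20 seconds against a 10 second timeout, so it is the one treated as down.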

Figure: Group coordinator

  1. When a consumer group starts up, each consumer sends a request to the group coordinator to join the group.
  2. The first consumer to send this request is elected leader of the consumer group.
  3. The leader assigns a subset of the partitions to each consumer and sends this assignment to the group coordinator.
  4. The group coordinator then informs each consumer which partitions it should read from.
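The four steps above can be sketched as a small simulation. `ToyCoordinator`, `leader_assign`, and the consumer names are hypothetical and exist only for this example; the method names loosely echo the join and sync requests of the Kafka protocol, but nothing here talks to a real broker.

```python
class ToyCoordinator:
    """Hypothetical stand-in for the group coordinator broker."""

    def __init__(self):
        self.members = []      # join order is preserved
        self.leader = None
        self.assignment = {}

    def join_group(self, member_id):
        # Steps 1 and 2: the first member to join becomes the group leader.
        self.members.append(member_id)
        if self.leader is None:
            self.leader = member_id
        return self.leader

    def sync_group(self, member_id, proposed=None):
        # Step 3: only the leader's proposed assignment is stored.
        if member_id == self.leader and proposed is not None:
            self.assignment = proposed
        # Step 4: each member learns its own partitions from the coordinator.
        return self.assignment.get(member_id, [])


def leader_assign(members, partitions):
    """Run by the group leader: a simple round-robin split."""
    plan = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        plan[members[index % len(members)]].append(partition)
    return plan


coordinator = ToyCoordinator()
for member in ["consumer-a", "consumer-b"]:
    coordinator.join_group(member)

plan = leader_assign(coordinator.members, [0, 1, 2])
leader_view = coordinator.sync_group(coordinator.leader, plan)
follower_view = coordinator.sync_group("consumer-b")
print(coordinator.leader, leader_view, follower_view)
```

Note that the leader computes the plan, while the coordinator only stores it and hands each consumer its own share.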

Summary

Kafka consumers are scaled by adding consumers to a consumer group. Each consumer in the group reads from one or more partitions, and each partition is read by only one consumer in the group. The group coordinator monitors consumer health through heartbeats.
