Apache Kafka Guide #5: Consumer Groups

Paul Ravvich
Apache Kafka At the Gates of Mastery
6 min read · Dec 24, 2023


Hi, this is Paul, and welcome to part #5 of my Apache Kafka guide. Today we’re going to talk about how Consumer Groups work.

When a Kafka application needs to scale, it uses multiple consumers. These consumers collectively read data as part of a consumer group. For instance, consider a Kafka topic divided into five partitions, read by a consumer group named consumer-group.

Each consumer in a group reads from its own exclusive partitions

This group comprises three consumers, labeled one, two, and three. Each consumer is assigned to read from specific, unique partitions within this group. For example, Consumer 1 might read from partitions zero and one, while Consumer 2 might handle partitions two and three. Consumer 3, on the other hand, would read from partition four. This arrangement ensures that each of the three consumers — one, two, and three — collectively cover all the partitions, with each one reading from a distinct partition. As a result, the group collectively reads the entire Kafka topic.
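The split described above can be sketched in a few lines of Python. This is a toy simulation of how a group coordinator might spread partitions across members, similar in spirit to Kafka’s range assignor; the function name and data shapes are illustrative, not Kafka’s actual API.

```python
# Toy sketch of splitting partitions across a consumer group's members.
# Not Kafka's real API -- just the assignment idea from the example above.

def assign_partitions(num_partitions, consumers):
    """Spread partition ids across consumers as evenly as possible."""
    assignment = {c: [] for c in consumers}
    base, extra = divmod(num_partitions, len(consumers))
    partition = 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)  # first consumers absorb the remainder
        assignment[consumer] = list(range(partition, partition + count))
        partition += count
    return assignment

# Five partitions split across three consumers, as in the example above:
print(assign_partitions(5, ["consumer-1", "consumer-2", "consumer-3"]))
# {'consumer-1': [0, 1], 'consumer-2': [2, 3], 'consumer-3': [4]}
```

Each partition appears in exactly one consumer’s list, so the group as a whole covers the entire topic.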

Too many consumers

What happens if your consumer group has more consumers than there are partitions?

Consider topic A as an example, with partitions 0, 1, and 2. Suppose we have a consumer group application with consumers 1, 2, and 3. In this scenario, the mapping is straightforward: each consumer reads from one partition. We can still add another consumer to this group, but Consumer 4 will then be inactive: a standby consumer that does not read from any of the topic’s partitions. It’s important to understand that this is a normal occurrence.

Consumer 4 won’t assist Consumer 1 in reading from Partition 0. Instead, it will remain inactive.
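A small sketch makes the idle consumer visible. The assignment function below is an illustrative round-robin-style split, not Kafka’s actual assignor API; with more consumers than partitions, the extra member simply receives an empty assignment.

```python
# Toy sketch: with more consumers than partitions, the extras get nothing.
# Illustrative round-robin-style assignment, not Kafka's actual API.

def assign_partitions(num_partitions, consumers):
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

# Three partitions, four consumers: consumer-4 stays idle.
result = assign_partitions(3, ["consumer-1", "consumer-2", "consumer-3", "consumer-4"])
print(result["consumer-4"])  # [] -- a standby consumer with no partitions
```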

Additionally, it’s entirely feasible to have multiple consumer groups reading from the same topic. Let’s illustrate this with an example. Returning to our topic, which has three partitions, we initially encounter Consumer Group Application 1, comprising two consumers. These consumers will divide the task of reading from the topic’s partitions: Consumer 1 will read from two partitions, while Consumer 2 will read from just one, and this arrangement works well.

Subsequently, we have Consumer Group Application 2 with three consumers. Each consumer in this group will read from a unique partition. Lastly, if there’s a third consumer group with a single consumer, this individual will be responsible for reading from all the partitions of the topic.

Thus, it’s entirely acceptable for a topic to be read by multiple consumer groups, which means each partition may be read by several consumers in total. Within a single consumer group, however, each partition is assigned to exactly one consumer.

The rationale for having several consumer groups becomes clear when we revisit the car example I mentioned earlier. We discussed a location service and a notification service, both utilizing the same data streams from cars’ GPS. This scenario necessitates one consumer group for each service — one for the location service and another for the notification service.

When establishing distinct consumer groups, as we will explore in the programming segment, we utilize the consumer property called group.id to name a consumer group. This allows consumers to identify the group to which they belong.
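The configuration can be sketched as plain key/value properties. `group.id`, `bootstrap.servers`, and `auto.offset.reset` are standard Kafka consumer property names; the broker address and group names below are made up for illustration, echoing the location and notification services from the car example.

```python
# Sketch of the consumer configuration that places a consumer in a group.
# Property names are standard Kafka consumer configs; the values
# (broker address, group names) are hypothetical.

location_consumer_config = {
    "bootstrap.servers": "localhost:9092",   # hypothetical broker address
    "group.id": "location-service",          # all members of this group share this name
    "auto.offset.reset": "earliest",         # where to start when no offset is committed
}

notification_consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "notification-service",      # a different group -> independent progress
    "auto.offset.reset": "earliest",
}

print(location_consumer_config["group.id"])  # location-service
```

Because the two configs carry different `group.id` values, each service reads the full topic independently, with its own offsets.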

Consumer Offsets

Consumer groups are even more powerful than they first appear: they let Kafka track consumer offsets. But what exactly are these? Essentially, Kafka stores the offsets up to which a consumer group has been reading. These offsets are stored in a special Kafka topic named __consumer_offsets; the double underscore prefix denotes an internal topic. For clarity, let’s consider an example illustrating the significance of consumer offsets.

Imagine a Kafka topic to which we have written a large number of messages, each identified by its offset within a partition.

Within the consumer group, each consumer periodically commits the offsets it has read. Once committed, the consumer can later resume reading from that offset onward.

This process is crucial. After processing data received from Apache Kafka, the consumer periodically commits these offsets, instructing the Kafka brokers to record them in the __consumer_offsets topic. By doing so, we inform the Kafka broker about our reading progress in the topic.

Why do this periodically? If a consumer fails and then restarts, it can resume reading from where it left off, thanks to the recorded consumer offsets. Kafka recognizes, for instance, that in partition two, the reading reached offset 4. Upon restarting, it will begin sending data from this point forward.

This ability to resume from a specific point, a consequence of the consumer group offsets, provides a mechanism to replay data from the point of failure or crash.

  1. Consumer offsets maintain the record of the reading position for a consumer group.
  2. The committed offsets are stored in a topic named __consumer_offsets.
  3. After processing the data received from Kafka, a consumer in a group should periodically commit the offset (the Kafka broker, not the group, will record this in __consumer_offsets).
  4. If a consumer fails, it can resume reading from where it last stopped, thanks to the saved consumer offsets.
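The commit-and-resume behavior summarized above can be simulated in a few lines. Below, a plain dictionary stands in for the __consumer_offsets topic; the function names and shapes are invented for illustration, not Kafka’s API.

```python
# Toy simulation of committing offsets and resuming after a crash.
# `committed` stands in for the __consumer_offsets topic. Not Kafka's API.

committed = {}  # (group, partition) -> next offset to read

def consume(group, partition, log):
    """Read from the committed offset if one exists, else from the start."""
    start = committed.get((group, partition), 0)
    for offset in range(start, len(log)):
        yield offset, log[offset]

def commit(group, partition, next_offset):
    committed[(group, partition)] = next_offset

log = ["m0", "m1", "m2", "m3", "m4"]  # messages in one partition

# First consumer processes two messages, commits after each, then "crashes".
for offset, msg in consume("app", 0, log):
    if offset == 2:
        break  # simulated crash before processing m2
    commit("app", 0, offset + 1)

# A restarted consumer resumes at the committed offset, not at zero.
resumed = [msg for _, msg in consume("app", 0, log)]
print(resumed)  # ['m2', 'm3', 'm4']
```

The restarted consumer picks up exactly where the failed one left off, which is the replay mechanism the section describes.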

Consumer Delivery Semantics

Kafka offers different delivery semantics for consumers, which we will delve into later in this course. By default, Java consumers automatically commit offsets in an “at least once” mode. However, if you opt for manual committing, you can choose between three delivery semantics. I’ll elaborate on these later in the course.

At least once

Firstly, the “at least once” approach means offsets are committed right after processing a message. If there’s an error during processing, the message might be read again. This can lead to duplicate message processing, so it’s important to ensure processing is idempotent. That means reprocessing messages won’t adversely affect your system.
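An idempotent handler can be sketched as follows: the same message delivered twice (because a crash happened after processing but before the commit) is applied only once. The message ids and handler are invented for illustration.

```python
# Sketch of idempotent processing under at-least-once delivery:
# duplicates are detected by message id and skipped. Toy code, not Kafka's API.

processed_ids = set()
results = []

def idempotent_handler(msg_id, payload):
    if msg_id in processed_ids:
        return  # duplicate delivery: already applied, safe to skip
    processed_ids.add(msg_id)
    results.append(payload)

# Message 7 is delivered twice (crash before its commit), but applied once.
for msg_id, payload in [(7, "order"), (7, "order"), (8, "refund")]:
    idempotent_handler(msg_id, payload)

print(results)  # ['order', 'refund']
```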

At most once

Secondly, the “at most once” option involves committing offsets as soon as messages are received by consumers. If processing fails, some messages may be lost, since they won’t be reread due to the earlier offset commitment. Essentially, we encounter messages at most once in this scenario.
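The loss scenario can be simulated directly: committing before processing means a failed message is never redelivered. The list and failure condition below are invented for illustration; this is a toy model, not Kafka’s API.

```python
# Sketch of at-most-once: commit immediately on receipt, process afterwards.
# If processing fails, the offset is already committed, so the message is
# never redelivered -- it is effectively lost. Toy simulation.

log = ["a", "b", "c"]
committed = 0
processed = []

for offset, msg in enumerate(log):
    committed = offset + 1          # commit BEFORE processing
    try:
        if msg == "b":
            raise RuntimeError("processing failed")
        processed.append(msg)
    except RuntimeError:
        pass                        # "b" is lost: its offset is already committed

print(processed)  # ['a', 'c'] -- message 'b' was seen at most once
```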

Exactly once

Lastly, there’s “exactly once” processing, where messages are processed only once. In Kafka-to-Kafka workflows, this is achieved with the transactional API, which is straightforward to use, especially with the Kafka Streams API. For Kafka-to-external-system workflows, an idempotent consumer is necessary. This introduction covers these key concepts.
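The “idempotent consumer” idea for external systems can be sketched as an upsert: write each record under a key derived from its (topic, partition, offset), so a redelivered record overwrites rather than duplicates. The store and key scheme below are hypothetical illustrations.

```python
# Sketch of an idempotent consumer writing to an external system:
# keying writes by (topic, partition, offset) makes retries overwrite
# instead of duplicate. Toy code; the "store" is just a dict.

external_store = {}  # stands in for a database table keyed by record id

def upsert(topic, partition, offset, value):
    external_store[(topic, partition, offset)] = value  # overwrite on retry

# The same record delivered twice ends up stored exactly once.
upsert("cars.gps", 0, 41, {"lat": 52.1, "lon": 21.0})
upsert("cars.gps", 0, 41, {"lat": 52.1, "lon": 21.0})

print(len(external_store))  # 1
```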

See you in the next part of the guide!

Paul Ravvich

Thank you for reading until the end.
