Apache Kafka Guide #37 Consumer Offset Reset Behavior

Paul Ravvich
Apache Kafka At the Gates of Mastery
3 min read · Apr 4, 2024


Hi, this is Paul, and welcome to part #37 of my Apache Kafka guide. Today we will discuss Consumer Offset Reset Behavior.

Consumer Offset Reset Behavior

Consumers are designed to read from a log continuously, as we have already seen to some extent. However, if your application hits a bug, your consumer can go down for a while. Keep in mind that, by default, Kafka retains data for seven days.

This means that if your consumer is offline for more than seven days, the offsets it would resume from point to data that has already been deleted, so they are invalid.

  • Consumers are expected to read from the log continuously.
  • If Kafka has a retention of 7 days and the consumer is down for more than 7 days, its offsets are invalid.

This discussion leads us to the topic of Consumer Offset Reset behavior.

With auto.offset.reset=latest, which is the default, the consumer starts reading from the end of the log. With auto.offset.reset=earliest, it starts reading from the beginning of the log. With auto.offset.reset=none, the consumer throws an exception if no offset is found. This can be preferable when processing should not continue until a possible data recovery need has been addressed.

That is how the auto offset reset mechanism works. It is also important to know that consumer offsets themselves can be lost. For Kafka versions older than 2.0, offsets are lost if the consumer has not read new data for one day. From version 2.0 on, the threshold is seven days. You can control this with the broker setting offsets.retention.minutes, which many teams raise to at least a month so that a consumer group that has been idle for a while does not lose its position.

The behavior of the Consumer:

  • auto.offset.reset=latest: read from the end of the log.
  • auto.offset.reset=earliest: read from the start of the log.
  • auto.offset.reset=none: throw an exception when no offset is found.
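
To make the three settings concrete, here is a minimal Python sketch of the decision described above. It is a simplified model, not the Kafka client's actual code: the function resolve_start_offset, its parameters, and the OffsetResetError class are hypothetical names made up for this illustration.

```python
from typing import Optional


class OffsetResetError(Exception):
    """Hypothetical stand-in for the exception raised with auto.offset.reset=none."""


def resolve_start_offset(committed: Optional[int], log_start: int,
                         log_end: int, auto_offset_reset: str) -> int:
    """Return the offset a consumer would start reading from (simplified model)."""
    # A committed offset is usable only if it still points inside the log.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No usable offset: the reset policy decides what happens next.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    if auto_offset_reset == "none":
        raise OffsetResetError("no valid offset found and auto.offset.reset=none")
    raise ValueError(f"unknown auto.offset.reset value: {auto_offset_reset}")


# The consumer committed offset 120, but retention already deleted everything below 500.
print(resolve_start_offset(committed=120, log_start=500, log_end=900,
                           auto_offset_reset="earliest"))  # prints 500
```

With "latest" the same call would return 900, and with "none" it would raise, which is the point where you would stop and decide how to recover the data.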

Consumer offsets can be lost when:

  • The Consumer hasn’t read new data for 1 day (Kafka < 2.0)
  • The Consumer hasn’t read new data for 7 days (Kafka ≥ 2.0)

You can adjust this period using the Broker setting: offsets.retention.minutes
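
Because the broker setting is expressed in minutes, it helps to do the arithmetic once. The sketch below is only an illustration of the rule above: the helper names to_retention_minutes and offsets_expired are hypothetical, and the 30-day value is an example, not a recommendation from Kafka itself.

```python
from datetime import timedelta


def to_retention_minutes(period: timedelta) -> int:
    """Convert a retention period into a value for offsets.retention.minutes."""
    return int(period.total_seconds() // 60)


def offsets_expired(idle: timedelta, retention_minutes: int) -> bool:
    """Simplified model: offsets are lost once the idle time exceeds the retention."""
    return idle > timedelta(minutes=retention_minutes)


default_from_2_0 = to_retention_minutes(timedelta(days=7))   # 10080 minutes
one_month = to_retention_minutes(timedelta(days=30))         # 43200 minutes

# A consumer that was down for 8 days loses its offsets with the 7-day default,
# but keeps them when the retention is raised to 30 days.
print(offsets_expired(timedelta(days=8), default_from_2_0))  # prints True
print(offsets_expired(timedelta(days=8), one_month))         # prints False
print(f"offsets.retention.minutes={one_month}")
```

The last line prints the property in the form you would put into the broker configuration.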

Replaying the data

To replay data for a consumer group, first take all the consumers in that group offline. Then use the kafka-consumer-groups command to set the offsets to the position you need, and restart your consumers. In short, set a suitable data retention period, longer than seven days if necessary, along with a matching offset retention period. Verify that the auto offset reset behavior is what you expect for your consumers. And if you run into unexpected behavior, you can replay the data to recover.

To replay data for a Consumer Group:

  • Take all Consumers in the group down
  • Use the kafka-consumer-groups command to set your offsets
  • Restart the Consumers
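
The middle step can be sketched as a small Python helper that assembles the kafka-consumer-groups call. This is a sketch under assumptions: the helper build_reset_command is hypothetical, the broker address, group, and topic names are placeholders, and in the Apache Kafka distribution the script is usually named kafka-consumer-groups.sh. Resetting to the earliest offset is just one example of a target position.

```python
import shlex
from typing import List


def build_reset_command(bootstrap_server: str, group: str, topic: str,
                        execute: bool = False) -> List[str]:
    """Build a kafka-consumer-groups call that rewinds a group to the earliest offset."""
    cmd = [
        "kafka-consumer-groups",
        "--bootstrap-server", bootstrap_server,
        "--group", group,
        "--topic", topic,
        "--reset-offsets",
        "--to-earliest",
    ]
    # --dry-run only shows the planned offsets; --execute actually applies them.
    cmd.append("--execute" if execute else "--dry-run")
    return cmd


# Placeholder values; all consumers in the group must be stopped before running this.
command = build_reset_command("localhost:9092", "my-group", "my-topic")
print(shlex.join(command))
```

Running the dry run first lets you check the planned offsets before you apply them with execute=True. The resulting list can be passed to subprocess.run on a machine where the Kafka command-line tools are installed.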

Don’t forget:

  • Define the correct data retention period and offset retention period.
  • Make sure the auto offset reset behavior is the one you need.
  • Use replay in case of unexpected behavior.
