In the Land of Streams — Kafka Part 3: Offsets and How to Handle Them

A Kafka Streaming Ledger

Giannis Polyzos
4 min read · Dec 13, 2022
Image source: https://www.vecteezy.com/free-vector/squirrel-cartoon

This blog post series consists of the following parts:
- Part 1: A Producer’s Message
- Part 2: The Rise of the Consumers
- Part 3: Offsets and how to handle them (this blog post)
- Part 4: My Cluster is Lost!! — Embracing Failure

This part aims to cover the following:

  1. What’s the role of offsets in Kafka
  2. What are the caveats when working with offsets
  3. Different approaches for handling offsets

It’s a cycle — the message lifecycle

Up to this point, we have seen the whole message lifecycle in Kafka — PPC (Produce, Persist, Consume).

One really important thing though — especially when you need to trust that your system processes each message exactly once — is committing offsets.

Fetching messages from Kafka, processing them, and then marking them as processed with such guarantees has a few pitfalls and is not provided out of the box.

This is what we will look at next, i.e. what do I need to take into account to get the best possible exactly-once processing guarantees out of my application?

Committing Offsets Scenarios

We will take a look at a few scenarios for committing offsets and what caveats each approach might have.

Scenario 1: Committing Offsets Automatically

This is the default behavior, with enable.auto.commit set to true. The caveat here is that messages are consumed and their offsets are committed periodically in the background, BUT this doesn’t mean they have been successfully processed. If processing fails for some reason, the offset might already have been committed, and as far as Kafka is concerned that message has been handled successfully.
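To make the default behavior concrete, here is a minimal sketch of such an “autopilot” consumer. The property keys are the standard Kafka consumer settings; the topic name, group id, and the process() helper are made-up placeholders for the example.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AutoCommitConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-consumer-group");   // example group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");         // the default: commit in the background
        props.put("auto.commit.interval.ms", "5000");    // default commit interval (5 seconds)

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // example topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // If this throws, the offset may still be committed on a later poll,
                    // so Kafka will consider the message processed.
                    process(record);
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Placeholder for the actual business logic.
        System.out.println("Processing " + record.key() + " -> " + record.value());
    }
}
```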

Scenario 2: Committing Offsets Manually

Setting enable.auto.commit to false takes Kafka consumers out of the “autopilot mode” and it’s up to the application to commit the offsets. This can be achieved by using the commitSync() or commitAsync() methods on the consumer API.
When committing offsets manually, we can do so either after the whole batch returned from the poll() method has finished processing, in which case all the offsets up to the highest one in the batch are committed, or after each individual message finishes its processing, for even stronger guarantees.

Committing offsets per message
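A rough sketch of the per-message approach, assuming a consumer configured like the one above but with enable.auto.commit set to false (TopicPartition and OffsetAndMetadata come from the Kafka clients library):

```java
// Assumes a KafkaConsumer<String, String> named `consumer`, configured like the
// example above but with enable.auto.commit=false.
// Extra imports: org.apache.kafka.common.TopicPartition,
//                org.apache.kafka.clients.consumer.OffsetAndMetadata
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record);  // placeholder for the actual business logic

        // Commit only this record's offset. The committed offset is the position
        // of the *next* message to read, hence offset + 1.
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1)));
    }
}
```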

Committing offsets per batch
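And a corresponding sketch of the per-batch approach, again assuming auto-commit is disabled; the no-argument commitSync() commits the highest offsets returned by the last poll():

```java
// Assumes the same manually-committing consumer as in the previous sketch.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record);  // placeholder for the actual business logic
    }
    // Commit the offsets of everything returned by the last poll(),
    // but only after the whole batch has been processed.
    consumer.commitSync();
}
```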

This gives us control over how message offsets are committed and we can trust that we will wait for the actual processing to finish before committing the offset.
For those who want to account for (or at least try to account for) every unhappy path, there is also the possibility that things fail in the commit process itself. In this case the messages will simply be reprocessed.
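One common pattern for dealing with this (a general pattern, not the only option) is to use commitAsync() with a callback on the hot path and fall back to a blocking commitSync() on shutdown. In the sketch below, running is a hypothetical shutdown flag:

```java
// Assumes the same manually-committing consumer; `running` is a hypothetical
// flag flipped to false by a shutdown hook.
try {
    while (running) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record);  // placeholder for the actual business logic
        }
        // Non-blocking commit on the hot path; just log failures, since a later
        // successful commit will supersede this one anyway.
        consumer.commitAsync((offsets, exception) -> {
            if (exception != null) {
                System.err.println("Commit failed for " + offsets + ": " + exception);
            }
        });
    }
} finally {
    try {
        // A final blocking, retrying commit before closing. If even this fails,
        // the uncommitted messages are simply reprocessed after a restart.
        consumer.commitSync();
    } finally {
        consumer.close();
    }
}
```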

Scenario 3: Idempotency with External Storage

You can use an external datastore and keep track of the offsets there (for example, Cassandra).

Consuming messages and using something like a transaction that covers both processing the message and committing its offset guarantees that the two either succeed or fail together, and thus idempotency is ensured.

One thing to note here is that the offsets are now stored in an external datastore. When a new consumer starts or a rebalance takes place, you need to make sure your consumer fetches the offsets from that external datastore.

One way to achieve this is to add a ConsumerRebalanceListener and, when the onPartitionsRevoked and onPartitionsAssigned methods are called, store (commit) or retrieve the offsets from the external datastore, as shown in the sketch below.
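A minimal sketch of such a listener, where OffsetStore is a hypothetical abstraction over the external datastore (for example a Cassandra table keyed by topic and partition):

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;

public class ExternalStoreRebalanceListener implements ConsumerRebalanceListener {

    private final KafkaConsumer<String, String> consumer;
    private final OffsetStore offsetStore;  // hypothetical wrapper around the external database

    public ExternalStoreRebalanceListener(KafkaConsumer<String, String> consumer, OffsetStore offsetStore) {
        this.consumer = consumer;
        this.offsetStore = offsetStore;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Persist the current position of every partition we are about to lose.
        for (TopicPartition partition : partitions) {
            offsetStore.save(partition, consumer.position(partition));
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // Resume each newly assigned partition from the offset stored externally.
        for (TopicPartition partition : partitions) {
            consumer.seek(partition, offsetStore.read(partition));
        }
    }
}

// Hypothetical interface over the external datastore (e.g. a Cassandra table).
interface OffsetStore {
    void save(TopicPartition partition, long offset);
    long read(TopicPartition partition);
}
```

The listener is then registered via consumer.subscribe(topics, listener). In practice, the offsetStore.save(...) call for each processed message would happen inside the same database transaction as the processing itself, which is what gives the idempotency described above.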

Wrapping Up

In this post, we saw the importance of offsets. The key takeaways:

  1. Consuming a message is different from actually processing it successfully
  2. Auto-committing offsets can have a negative impact on your application’s guarantees

We also reviewed a few different ways you might approach committing offsets back to Kafka, along with the caveats you might encounter with each approach.

Check out next: Part 4: My Cluster is Lost!! — Embracing Failure


Giannis Polyzos

Staff Streaming Product Architect @ Ververica ~ Stateful Stream Processing and Streaming Lakehouse https://www.linkedin.com/in/polyzos/