Processing guarantees in Kafka

How do we guarantee all messages are processed?

How do we avoid or handle duplicate messages?

  • No guarantee — No explicit guarantee is provided, so consumers may process messages once, multiple times or never at all.
  • At most once — This is “best effort” delivery semantics. Consumers will receive and process messages exactly once or not at all.
  • At least once — Consumers will receive and process every message, but they may process the same message more than once.
  • Effectively once — Also contentiously known as exactly once, this promises consumers will process every message once.
Delivery guarantees can break down at several points in the pipeline:

  • Producer failure
  • Consumer publish remote call failure
  • Messaging system failure
  • Consumer processing failure

Kafka Consumer API

No guarantee

At most once
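At-most-once semantics fall out of committing consumer offsets *before* doing the work: if the consumer crashes after the commit but before processing, those records are skipped forever. A pseudocode sketch of the loop (method names follow the Kafka Consumer API):

```
// Pseudocode: at-most-once = commit progress BEFORE doing the work.
while (running) {
    records = consumer.poll(timeout);
    consumer.commitSync();   // offsets saved first
    process(records);        // a crash here loses these records for good
}
```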

At least once
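At-least-once is typically achieved the other way around: disable auto-commit, process first, and commit offsets only after processing succeeds, so a crash between the two causes reprocessing (hence possible duplicates) rather than loss. One subtlety worth sketching in plain Java (no Kafka client; the partition names and offsets are illustrative): Kafka expects the committed offset to be the offset of the *next* record to read, not the last one processed.

```java
import java.util.HashMap;
import java.util.Map;

public class AtLeastOnceCommit {
    // Kafka interprets a committed offset as "the next record to read",
    // so the correct value to commit is lastProcessedOffset + 1 per partition.
    public static Map<String, Long> offsetsToCommit(Map<String, Long> lastProcessed) {
        Map<String, Long> toCommit = new HashMap<>();
        lastProcessed.forEach((partition, offset) -> toCommit.put(partition, offset + 1));
        return toCommit;
    }

    public static void main(String[] args) {
        Map<String, Long> lastProcessed = new HashMap<>();
        lastProcessed.put("events-0", 41L); // last record processed on partition events-0
        System.out.println(offsetsToCommit(lastProcessed)); // {events-0=42}
    }
}
```

Committing `offset` instead of `offset + 1` is a classic off-by-one bug that silently reprocesses one record per partition on every restart.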

Effectively once

Idempotent writes
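Idempotent writes are enabled with a single producer setting, `enable.idempotence`, which makes the broker de-duplicate retried sends per producer and partition using internal sequence numbers. A minimal configuration sketch (the bootstrap address is an assumption for illustration):

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    public static Properties idempotentProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("enable.idempotence", "true"); // broker drops duplicate retries
        props.put("acks", "all");                // required by idempotence
        return props;
    }
}
```

Note this only removes duplicates introduced by the producer's own retries; it does nothing about duplicates created by re-sending after an application restart.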

Transactions
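Transactions need configuration on both sides: the producer declares a stable `transactional.id` (which also implies idempotence), and consumers opt in to `isolation.level=read_committed` so they never see records from aborted or still-open transactions. A configuration sketch (broker address and transactional id are illustrative):

```java
import java.util.Properties;

public class TransactionalConfig {
    // Producer side: a transactional.id enables the transactional API.
    // It must be stable across restarts so the broker can fence zombies.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("transactional.id", "my-app-producer-1"); // hypothetical id
        return props;
    }

    // Consumer side: hide records from uncommitted transactions.
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("isolation.level", "read_committed");
        return props;
    }
}
```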

Effectively once production of data to Kafka

producer.initTransactions();
try {
    producer.beginTransaction();
    // Send the source offsets and the output records in a single transaction
    sourceOffsetRecords.forEach(producer::send);
    outputRecords.forEach(producer::send);
    producer.commitTransaction();
} catch (KafkaException e) {
    // Abort so read_committed consumers never see partial results
    producer.abortTransaction();
}

Effectively once consuming of data from Kafka
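The consuming side combines the pieces above into a read-process-write loop where the consumer's offsets are committed *inside* the producer's transaction, so output records and progress become visible atomically. A pseudocode sketch (`sendOffsetsToTransaction` is the real Kafka producer method; `transform` and `offsetsOf` are hypothetical helpers, and downstream consumers are assumed to use `read_committed`):

```
// Pseudocode: consume-transform-produce with offsets in the transaction.
producer.initTransactions();
while (running) {
    records = consumer.poll(timeout);
    producer.beginTransaction();
    for (record : records) {
        producer.send(transform(record));   // write the results
    }
    // Commit the consumer group's offsets as part of the same transaction.
    producer.sendOffsetsToTransaction(offsetsOf(records), consumerGroupId);
    producer.commitTransaction();
}
```

If the transaction aborts, neither the output nor the offset advance is visible, and the batch is simply reprocessed.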

Effectively once in Kafka Streams
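Kafka Streams wraps the whole transactional dance behind a single setting, `processing.guarantee`. A configuration sketch (the application id and broker address are illustrative; newer brokers and clients also support the cheaper `exactly_once_v2` value):

```java
import java.util.Properties;

public class StreamsEosConfig {
    public static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");    // hypothetical app id
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // Enables Streams' built-in use of transactions + idempotent producers.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }
}
```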

Side effects

Performance

Effectively once for the win?

Composing guarantees

Deduplication

Using a key to identify duplicates
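The simplest consumer-side scheme is to remember every message key already processed and drop repeats. A minimal in-memory sketch (in practice the set must be bounded, e.g. by a time window or TTL, since it grows without limit, and it must survive restarts to be of any use across crashes):

```java
import java.util.HashSet;
import java.util.Set;

public class KeyDeduplicator {
    private final Set<String> seenKeys = new HashSet<>();

    // Returns true if the message should be processed,
    // false if the key has been seen before (a duplicate).
    public boolean firstTimeSeen(String messageKey) {
        return seenKeys.add(messageKey); // add() is false when already present
    }
}
```

Usage: `firstTimeSeen("order-123")` returns true the first time and false for every subsequent delivery of the same key.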

Using a sequence number for deduplication
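If each source numbers its messages monotonically, the consumer only needs to remember the highest sequence seen per source, which is far cheaper than remembering every key. A sketch under that assumption (this mirrors what Kafka's idempotent producer does broker-side per producer and partition):

```java
import java.util.HashMap;
import java.util.Map;

public class SequenceDeduplicator {
    private final Map<String, Long> highestSeen = new HashMap<>();

    // Accept a message only if its sequence number is higher than the
    // last one seen from the same source; assumes monotonic numbering.
    public boolean accept(String sourceId, long sequence) {
        Long last = highestSeen.get(sourceId);
        if (last != null && sequence <= last) {
            return false; // duplicate or stale
        }
        highestSeen.put(sourceId, sequence);
        return true;
    }
}
```

The trade-off: this rejects out-of-order delivery as well as duplicates, so it only works where each source genuinely sends in order.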

Using a datastore for deduplication
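When results land in an external datastore, the store itself can enforce uniqueness: an atomic insert-if-absent on the message id makes reprocessing a no-op. A sketch using an in-memory map as a stand-in for the datastore (in production this would be, e.g., a database insert guarded by a unique-key constraint):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class DatastoreDeduplicator {
    // Stand-in for a datastore with a unique-key constraint on message id.
    private final ConcurrentMap<String, String> store = new ConcurrentHashMap<>();

    // Atomically record the message; returns false if already processed,
    // so a redelivered message has no further effect.
    public boolean recordIfNew(String messageId, String payload) {
        return store.putIfAbsent(messageId, payload) == null;
    }
}
```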

Summary

Andy Bryant

Andy Bryant

More from Medium

How To Obtain Kafka Consumer Lags in Pyspark Structured Streaming (Part 1)

Kafka Producer/Consumer with Message key and offset

Spark Structured Streaming — Programming Model

Spark Streaming