Kafka: What is the 2 Generals’ Problem, and can it affect my Kafka Producer?

We look at the paradox and at the delivery guarantees the Kafka producer offers to deal with it.

Abhinav Kapoor
CodeX
6 min read · Sep 2, 2023

Before we get into the details of the Kafka producer and the 2 Generals’ Problem, let's refresh the basics.

Kafka is a distributed messaging system with one or more producers appending messages to the broker, and one or more consumer groups reading from the broker.

The data is stored in the broker. For each partition, one broker acts as the leader, and for durability Kafka also replicates the data to other brokers known as follower brokers. If the leader broker fails, one of the followers is elected as the new leader.

Kafka producers & consumers communicating via messages over a broker.

The message delivery from producer to consumer can be broken down into 2 separate processes —

  1. Producer to Broker — Delivery guarantees are handled by acknowledgement from the broker.
  2. Broker to Consumer — Based on polling, the consumer controls how many times it can read & process a message.

For the same reason, I have split the delivery guarantees into 2 articles: delivery semantics for the producer, and processing of messages by the consumer.

What is 2 Generals’ Problem?

Scene showing 2 Generals’ problem — Armies must launch a coordinated attack to be successful. Messenger has to run through hostile territory and avoid being captured. Image Credit — Author.

The 2 Generals’ Problem takes the example of 2 army units that plan to attack a fortified city from 2 different directions. In order to succeed, the 2 Generals must launch a coordinated attack from both directions.

The problem is that the messengers carrying the message must pass through a valley guarded by the city’s forces. As the message travels through hostile territory, the defenders may intercept, capture, or kill the messenger. There is no guarantee that the last message has reached the General on the other side. Without this guarantee, both Generals are in a dilemma: General A doesn't know if the message to attack reached General B, and General B doesn't know if the confirmation reached General A. Thus both will hesitate to attack without knowing whether the other army unit will arrive for the coordinated attack.

The solution looks simple —

  1. The Leader General, General A, initiates the message to attack: “We attack at 4 AM”
  2. General B must acknowledge this: “Got the message, we attack at 4 AM”

This looks like enough to form a consensus, but General B doesn’t know if General A received the acknowledgement. So General A needs to send another acknowledgement, “Attack confirmed”, and then General A in turn doesn't know whether that one arrived. This starts a never-ending cycle of acknowledgements, because whoever sends the last message can never be sure it was delivered. So the problem is unsolvable.
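The never-ending cycle can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: messages alternate between the Generals, every message except the last one is assumed to arrive, and the function names (knowledge_after, last_sender) are made up for illustration. It shows that, however long the exchange, the sender of the last message observes exactly the same thing whether that message arrived or was lost.

```python
def knowledge_after(n_messages, last_delivered):
    """Count the messages each General has received after an exchange.

    Messages alternate A->B, B->A, A->B, ... All but the last message are
    assumed to arrive; the last one arrives only if last_delivered is True.
    """
    received = {"A": 0, "B": 0}
    for i in range(n_messages):
        receiver = "B" if i % 2 == 0 else "A"
        is_last = i == n_messages - 1
        if not is_last or last_delivered:
            received[receiver] += 1
    return received


def last_sender(n_messages):
    """Return which General sends the final message of the exchange."""
    return "A" if (n_messages - 1) % 2 == 0 else "B"


for n in range(1, 6):
    sender = last_sender(n)
    seen_if_delivered = knowledge_after(n, last_delivered=True)[sender]
    seen_if_lost = knowledge_after(n, last_delivered=False)[sender]
    # The last sender has received the same number of messages in both
    # cases, so it cannot tell them apart.
    print(n, sender, seen_if_delivered == seen_if_lost)
```

Adding one more acknowledgement only moves the uncertainty to the other General; it never removes it.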

There are some pragmatic ways to address it —

  1. Cost Intensive: If there are enough messengers to risk, the Generals can send several messengers at once, hoping that some make it to the other side. Let's say 100 messengers are sent, each carrying a sequence number. Even if the General on the other side receives only one message, it knows when to attack, and the missing sequence numbers also tell the General how safe the valley is.
  2. Time Intensive: If there are not enough messengers to risk, another approach is to let the absence of messages build confidence. Let's say the time to cross the valley is 1 hour. General A keeps sending a message to General B every hour until it receives a single acknowledgement from General B. If General B does not receive any further message for a few hours after sending the confirmation, it most likely means that General A has received the acknowledgement.
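Both approaches trade certainty for a probability that can be made very high. As a rough sketch, assume each messenger is captured independently with the same probability p_capture (an assumption for illustration; the story gives no numbers). Then sending n messengers fails only if all n are captured, and General B's silence for k hours is misleading only if General A never got the acknowledgement and k hourly messengers in a row were lost.

```python
def p_at_least_one_arrives(n_messengers, p_capture):
    """Cost-intensive approach: chance that at least one messenger gets through."""
    return 1.0 - p_capture ** n_messengers


def p_silence_is_misleading(k_hours, p_capture):
    """Time-intensive approach: chance that k hourly messengers were all lost.

    This is the probability of hearing nothing for k hours even though
    General A is still sending, i.e. has not received the acknowledgement.
    """
    return p_capture ** k_hours


# Even with a dangerous valley (half of the messengers are captured),
# 10 messengers almost surely deliver the message at least once.
print(p_at_least_one_arrives(10, 0.5))   # 0.9990234375
# Three silent hours leave a 12.5% chance that General A is still retrying.
print(p_silence_is_misleading(3, 0.5))   # 0.125
```

The probability never reaches 1, which is why these are mitigations rather than solutions.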

Delivery guarantees from the Producer to the Broker

When the producer sends a message to Kafka, it can choose among several delivery guarantees to suit the context.

1. No Guarantee / Fire and Forget:

The producer does not wait for an acknowledgement, so the message may be lost before it is ever written to the broker’s persistent storage.

Fire and Forget. Image credit & source https://developer.confluent.io/courses/architecture/guarantees/

Producer Setting: acks = 0

There is no durability guarantee, but it is the fastest option.

From the producer’s point of view, this can be considered as At-most-once delivery, as the messages may be lost but are not duplicated.

2. Only the Leader Broker saves the message & acknowledges it.

The leader broker acknowledges as soon as it has written the message to its own storage. The data is replicated to the follower brokers afterwards.

Acknowledgement from leader Broker. Image credit & source https://developer.confluent.io/courses/architecture/guarantees/

Producer Setting: acks = 1

The durability is much improved, but there is still a small chance of losing the message: if the leader fails after acknowledging the message but before the followers have replicated it, the newly elected leader will not have it.

The latency is higher than with acks = 0, as the producer waits for a round trip to the leader broker.

3. Strongest guarantee. All (in sync) brokers save the message & acknowledge it.

The broker acknowledges when all the broker replicas (in-sync replicas, to be precise) have written the message to their persistent storage.

Acknowledgement when the message is written to all brokers. Image credit & source https://developer.confluent.io/courses/architecture/guarantees/

Producer Setting: acks = all

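The durability is the strongest of the three settings. The latency is also the highest, roughly 2.5 times that of the leader-only setting.

From the producer’s point of view, this can be considered as At-least-once delivery: as long as the producer retries when an acknowledgement is missing, messages are not lost (but may be delivered more than once).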
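The three settings can be summarised as "how many copies exist at the moment the producer is told the write succeeded". The sketch below is a simplified model, not Kafka code: the function names are hypothetical, and it ignores broker-side settings that also influence durability.

```python
def copies_before_ack(acks, in_sync_replicas):
    """Number of replicas that hold the message when the producer is acknowledged.

    acks is the producer setting as a string: "0", "1" or "all".
    A result of 0 means the producer does not wait for any write.
    """
    if acks == "0":
        return 0
    if acks == "1":
        return 1
    if acks == "all":
        return in_sync_replicas
    raise ValueError(f"unknown acks setting: {acks!r}")


def acked_message_survives_leader_failure(acks, in_sync_replicas):
    """True if at least one replica besides the leader has the message at ack time."""
    return copies_before_ack(acks, in_sync_replicas) >= 2


for setting in ("0", "1", "all"):
    print(
        setting,
        copies_before_ack(setting, in_sync_replicas=3),
        acked_message_survives_leader_failure(setting, in_sync_replicas=3),
    )
```

With 3 in-sync replicas, only acks = all guarantees that an acknowledged message is already on a broker other than the leader.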

How can 2 Generals’ problem affect Kafka Producer?

As we know from the 2 Generals’ Problem, when a General doesn't receive the acknowledgement, it doesn't know whether the message itself was lost or only the acknowledgement. To be sure of the delivery, it sends the message multiple times, which leads to duplicate messages.

Similarly, when the Kafka producer is configured to wait for acknowledgements and doesn't receive one, it has little choice but to resend the message. If the broker had already written the message and only the acknowledgement was lost, the resend creates a duplicate entry, and the consumer ends up receiving the message more than once.
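The failure mode is easy to reproduce in a toy model. The sketch below is not Kafka client code: Broker and send_with_retries are hypothetical names, and the lost acknowledgement is injected by hand through the ack_lost_on_attempts parameter.

```python
class Broker:
    """A toy broker that appends every message it receives to its log."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)


def send_with_retries(broker, message, ack_lost_on_attempts, max_attempts=5):
    """Send a message and retry until an acknowledgement arrives.

    ack_lost_on_attempts is a set of 1-based attempt numbers on which the
    write succeeds but the acknowledgement is lost on the way back.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        broker.append(message)  # the write itself always succeeds here
        if attempt not in ack_lost_on_attempts:
            return attempt      # acknowledgement received, stop retrying
    raise RuntimeError("no acknowledgement received, giving up")


broker = Broker()
attempts = send_with_retries(broker, "order-42", ack_lost_on_attempts={1})
print(attempts)     # 2
print(broker.log)   # ['order-42', 'order-42']
```

The producer behaved correctly from its own point of view, yet the log now contains the message twice.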

Due to acknowledgement failure, the producer retries resulting in duplicate messages and out-of-order messages. Image credit & source https://developer.confluent.io/courses/architecture/guarantees/

What can be done to avoid producing duplicate messages?

First of all, duplicate messages may or may not be a problem. If exactly-once semantics are desired, the Kafka producer supports an idempotent delivery option which guarantees that resending will not result in duplicate entries. This allows the producer to retry until it succeeds without the possibility of duplicates.

Kafka transparently detects and ignores the duplicates. To achieve this, the broker assigns each producer an ID and deduplicates messages using a sequence number that the producer sends along with every message.
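The deduplication idea can be sketched with the same kind of toy model. This is a deliberate simplification and not how the broker is actually implemented: IdempotentBroker and send_with_retries are hypothetical names, and the sketch only remembers the last sequence number per producer. The point is that a retry carries the same sequence number as the original attempt, so the broker can recognise it.

```python
class IdempotentBroker:
    """A toy broker that drops messages it has already appended."""

    def __init__(self):
        self.log = []
        self._next_producer_id = 0
        self._last_sequence = {}  # producer id -> last sequence number appended

    def register_producer(self):
        """Hand out a new producer ID, as the broker does for each producer."""
        producer_id = self._next_producer_id
        self._next_producer_id += 1
        self._last_sequence[producer_id] = -1
        return producer_id

    def append(self, producer_id, sequence, message):
        """Append the message unless this sequence number was already written."""
        if sequence <= self._last_sequence[producer_id]:
            return  # duplicate of an earlier write: acknowledge, do not append
        self.log.append(message)
        self._last_sequence[producer_id] = sequence


def send_with_retries(broker, producer_id, sequence, message, ack_lost_on_attempts):
    """Retry with the SAME sequence number until an acknowledgement arrives."""
    attempt = 1
    while True:
        broker.append(producer_id, sequence, message)
        if attempt not in ack_lost_on_attempts:
            return attempt
        attempt += 1


broker = IdempotentBroker()
pid = broker.register_producer()
send_with_retries(broker, pid, 0, "order-42", ack_lost_on_attempts={1, 2})
send_with_retries(broker, pid, 1, "order-43", ack_lost_on_attempts=set())
print(broker.log)   # ['order-42', 'order-43']
```

Three writes of "order-42" reached the broker, but only one entry was stored, and the order of the messages is preserved.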

Producer retries resulting in exactly-once messaging semantics & preserving the order. Image credit & source https://developer.confluent.io/courses/architecture/guarantees/

From the producer’s point of view, this is exactly-once semantics, achieved using at-least-once delivery with deduplication.

The idempotent producer needs acks = all. It also requires retries to be greater than 0 and max.in.flight.requests.per.connection to be at most 5.

Important: For the default behaviour, check the documentation of your language-specific SDK. For Java, idempotence is enabled by default in versions ≥ 3.0.0 (KAFKA-10619, Enable producer idempotence by default).

But for .NET, it is off by default: https://docs.confluent.io/platform/current/clients/confluent-kafka-dotnet/_site/api/Confluent.Kafka.ProducerConfig.html#Confluent_Kafka_ProducerConfig_EnableIdempotence.
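Since the defaults differ between SDKs, one option is to set the relevant properties explicitly rather than rely on them. The sketch below writes the settings as a plain Python dictionary using the standard Kafka producer property names (enable.idempotence, acks, retries, max.in.flight.requests.per.connection). The values and the idempotence_problems helper are illustrative assumptions, not part of any Kafka client; the checks mirror the constraints that the Kafka producer documentation lists for idempotence (acks = all, retries greater than 0, at most 5 in-flight requests per connection).

```python
# Explicit producer settings for idempotent delivery (illustrative values).
IDEMPOTENT_PRODUCER_CONFIG = {
    "enable.idempotence": True,
    "acks": "all",
    "retries": 2147483647,
    "max.in.flight.requests.per.connection": 5,
}


def idempotence_problems(config):
    """Return a list of settings that conflict with an idempotent producer.

    Hypothetical helper for illustration. A missing key is reported as a
    problem, because the default depends on the SDK in use.
    """
    problems = []
    if config.get("enable.idempotence") is not True:
        problems.append("enable.idempotence must be true")
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be all")
    if int(config.get("retries", 0)) <= 0:
        problems.append("retries must be greater than 0")
    if int(config.get("max.in.flight.requests.per.connection", 6)) > 5:
        problems.append("max.in.flight.requests.per.connection must be at most 5")
    return problems


print(idempotence_problems(IDEMPOTENT_PRODUCER_CONFIG))   # []
print(idempotence_problems({"enable.idempotence": True, "acks": "1"}))
```

How such a dictionary is passed to the producer depends on the client library, so check its documentation for the exact form.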

Summary

  1. The 2 Generals’ Problem demonstrates the challenges of distributed computing in the presence of possible failures. The pragmatic solutions accept that failures happen and try to mitigate them.
  2. The Kafka producer offers 3 kinds of delivery guarantees.
  3. In the absence of an acknowledgement, the Kafka producer can produce duplicate messages. If exactly-once semantics are desired, idempotence can be turned on. Check your language-specific SDK for the default behaviour.

I hope you liked the article. In the subsequent one, I’ll cover the delivery guarantees for the consumer.

Reference & Further Study

2 Generals’ Problem — https://finematics.com/two-generals-problem/

Kafka Producer guarantees — https://developer.confluent.io/courses/architecture/guarantees/
