Advanced Kafka Producer Configurations

Krishna
Published in kafka with spring
4 min read · Jun 20, 2024

Introduction

In the previous article we saw a basic Kafka producer example. In this article I am going to cover advanced Kafka producer configurations for sending messages with high compression, throughput, and request efficiency, and for handling failure scenarios.

Batching

First, we will try to understand a few producer configurations related to batching. A batch is a group of Kafka messages.

max.in.flight.requests.per.connection (default = 5) -> how many unacknowledged requests the producer can have in flight in parallel on a single broker connection. A produce request can carry one batch per partition.

linger.ms (default = 0) -> how long the producer waits before sending a batch to Kafka. This can introduce a small delay in message delivery, but it reduces the number of network calls between the producer and the Kafka cluster (since messages are sent in batches), which increases throughput and producer efficiency.

batch.size (default = 16 KB) -> as our code executes kafkaTemplate.send() (i.e., the producer sends a message to Kafka), messages first accumulate in a temporary buffer. Messages keep accumulating in the buffer until the configured wait time (linger.ms) elapses, at which point the batch is sent to Kafka. If the batch fills up before linger.ms expires, it is sent right away.

By default, Kafka producers try to send records as soon as possible. With max.in.flight.requests.per.connection = 5 (the default), up to 5 requests can be in flight at once. Once that limit is reached, the producer batches subsequent messages until linger.ms elapses, after which the batch is ready to be sent. Even when a batch is ready, the producer will not send it to Kafka while the maximum number of in-flight requests (controlled by max.in.flight.requests.per.connection) has been reached. The producer waits until some of the in-flight requests are acknowledged before sending the next batch.
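The batching settings above can be sketched as plain producer properties. This is a minimal example using the standard Kafka property names; the broker address and the chosen values (20 ms linger, 32 KB batches) are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

public class BatchingConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        // Placeholder broker address -- replace with your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        // Wait up to 20 ms for a batch to fill before sending it.
        props.put("linger.ms", "20");
        // Allow batches up to 32 KB (the default is 16 KB = 16384 bytes).
        props.put("batch.size", "32768");
        // At most 5 unacknowledged requests per broker connection (the default).
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("linger.ms=" + producerProps().getProperty("linger.ms"));
    }
}
```

In a Spring Boot application, the same settings can be supplied via spring.kafka.producer.properties.* in application.yml instead of building a Properties object by hand.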

Handling Failure Scenarios in Kafka Producers

System failures and network failures are common in software systems, and in fact we cannot prevent them from happening. The only thing we can do is design our systems to be resilient to such failures. To handle these scenarios, the Kafka producer has a config called Producer Retries.

retries (default = Integer.MAX_VALUE): the producer attempts to send a message up to this configured number of retries before marking it as failed.

delivery.timeout.ms (default = 120000, i.e., 2 minutes): the producer does not retry a record forever even with retries = Integer.MAX_VALUE; it is bounded by this timeout. A record fails if it cannot be delivered within this configured time.

retry.backoff.ms (default = 100 ms): the producer waits this long between retries. You can tune this value using the retry.backoff.ms parameter.

With retries, delivery timeout, and retry backoff configured, the producer can retry failed requests (due to a network glitch, an insufficient number of in-sync replicas, a Kafka cluster outage, or a request that reached Kafka but whose ack never made it back to the producer because of a network error) so that these requests are eventually delivered to Kafka and acknowledged.
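The three retry-related settings can be sketched the same way. Again the broker address is a placeholder, and the values shown are simply the defaults described above made explicit.

```java
import java.util.Properties;

public class RetryConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Retry effectively forever on transient failures...
        props.put("retries", String.valueOf(Integer.MAX_VALUE));
        // ...but give up on a record after 2 minutes end to end.
        props.put("delivery.timeout.ms", "120000");
        // Wait 100 ms between retry attempts.
        props.put("retry.backoff.ms", "100");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("delivery.timeout.ms="
                + producerProps().getProperty("delivery.timeout.ms"));
    }
}
```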

Problems with Retries:

Duplicate Messages

Duplicate Messages: the producer sends a request to Kafka, Kafka writes the message and sends an ack, but the ack never reaches the producer due to a network error. The producer therefore retries the "failed" request and this time gets the ack. In this scenario the producer has sent the same message to Kafka twice, ending up with duplicate messages. In the next section we will cover how to avoid duplicate messages in case of retries.

Ordering: Kafka guarantees ordering within a partition. Assume the producer sent two requests one after the other to the same partition, but due to some issue the first request failed to reach Kafka while the second arrived successfully. When the first request is retried, it lands after the second, so ordering within the partition is lost. We will cover how to avoid this scenario too.

Kafka Idempotent Producer

Idempotent producer ensures that duplicates are not introduced due to unexpected producer retries.

Version Availability

Producer idempotence can be enabled for Kafka versions >= 0.11.

Internal working of Idempotent producer:

The producer is assigned a unique producer id (PID), and each message is assigned a sequence number that the producer increments for every message. This PID-sequence number combination is maintained per topic partition on the producer side. On the broker side, the largest PID-sequence number is maintained per topic partition. Any message with a lower or equal sequence number is discarded (as it is a duplicate).
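The broker-side duplicate check can be illustrated with a toy sketch. This is a simplified model, not the real broker logic: it keeps only the highest sequence number seen per (PID, partition) and rejects anything at or below it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the broker-side duplicate check for an idempotent producer:
// remember the highest sequence number seen per (PID, partition) and
// discard any message whose sequence number is not strictly higher.
public class DedupSketch {
    private final Map<String, Integer> highestSeq = new HashMap<>();

    /** Returns true if the message is accepted, false if it is a duplicate. */
    public boolean accept(long pid, int partition, int sequence) {
        String key = pid + "-" + partition;
        int last = highestSeq.getOrDefault(key, -1);
        if (sequence <= last) {
            return false; // retried duplicate -- discard
        }
        highestSeq.put(key, sequence);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        System.out.println(broker.accept(42L, 0, 0)); // true: first delivery
        System.out.println(broker.accept(42L, 0, 0)); // false: retry of the same message
        System.out.println(broker.accept(42L, 0, 1)); // true: next message in sequence
    }
}
```

This is why a retry of an already-acknowledged request (the duplicate scenario above) is silently dropped by the broker instead of being written twice.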

Idempotence can be enabled using enable.idempotence=true in kafka producer.
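In configuration form, enabling idempotence looks like the sketch below. Note that idempotence requires acks=all and max.in.flight.requests.per.connection <= 5; the broker address is a placeholder.

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Turn on the idempotent producer (the default since Kafka 3.0).
        props.put("enable.idempotence", "true");
        // Idempotence requires acks=all...
        props.put("acks", "all");
        // ...and at most 5 in-flight requests per connection.
        props.put("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("enable.idempotence="
                + producerProps().getProperty("enable.idempotence"));
    }
}
```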

Now the Default

Starting with Kafka 3.0, producers have enable.idempotence=true and acks=all by default. See KIP-679 for more details.

Conclusion

Fine-tuning Kafka producer properties like batching, compression, and retries according to your requirements is important. It is also recommended to enable idempotence in producers.
