Apache Kafka Guide #32 Producer linger.ms and batch.size settings

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · Mar 19, 2024

Hi, this is Paul, and welcome to part #32 of my Apache Kafka guide. Today we will discuss how to tune the producer’s linger.ms and batch.size settings.

Today, we’re going to explore how to improve batching in Apache Kafka. By default, a Kafka producer tries to send records as soon as a send operation is executed. However, an important configuration parameter, max.in.flight.requests.per.connection=5, stipulates that at most five requests (message batches) can be in flight between the producer and a broker at the same time. This allows for a degree of parallelism.

Once this limit is reached and earlier requests are still in flight, Kafka accumulates incoming messages into batches for the next transmission. This batching increases overall throughput while keeping latency low.

Additionally, batching improves the compression ratio when compression is enabled, leading to more efficient data transmission. In short, batching is a key feature of Apache Kafka that improves both throughput and compression.
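To make the compression point concrete, here is a minimal sketch that uses only the Java standard library. It does not use Kafka at all: it gzips a set of hypothetical JSON-like messages one by one and then as a single concatenated "batch", so you can compare the sizes. The class name, the sample payloads, and the choice of gzip are my assumptions for illustration; Kafka's own record batch format and codecs will give different numbers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Illustrative only: compares per-message gzip with gzip over a whole batch.
final class BatchCompressionDemo {

    // Gzip a byte array and return the compressed size in bytes.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Hypothetical payloads that share most of their structure.
    static List<byte[]> sampleMessages(int count) {
        List<byte[]> messages = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String json = "{\"userId\":" + i + ",\"event\":\"page_view\",\"page\":\"/home\"}";
            messages.add(json.getBytes(StandardCharsets.UTF_8));
        }
        return messages;
    }

    // Total size when every message is compressed on its own.
    static int compressedIndividually(List<byte[]> messages) {
        int total = 0;
        for (byte[] message : messages) {
            total += gzipSize(message);
        }
        return total;
    }

    // Size when all messages are concatenated and compressed together.
    static int compressedAsBatch(List<byte[]> messages) {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (byte[] message : messages) {
            batch.write(message, 0, message.length);
        }
        return gzipSize(batch.toByteArray());
    }

    public static void main(String[] args) {
        List<byte[]> messages = sampleMessages(100);
        System.out.println("individually: " + compressedIndividually(messages) + " bytes");
        System.out.println("as one batch: " + compressedAsBatch(messages) + " bytes");
    }
}
```

The batch wins because the compressor sees the repeated field names and values across messages, and the fixed gzip header and trailer are paid once instead of once per message.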

To further refine the batching process, two key settings are available: linger.ms and batch.size. The linger.ms setting determines the maximum time the producer waits before sending a batch. Its default is zero in the Kafka versions this guide was written against (Kafka 4.0 raised the default to 5). Setting it to, for example, five milliseconds introduces a slight delay during which the producer can accumulate more messages in the batch before sending it off. The batch.size setting, on the other hand, sets the maximum batch size in bytes: a batch is sent as soon as it is full, even if linger.ms has not yet elapsed. Together, these settings let you tune batching to your specific requirements.

By default, the Kafka producer tries to send messages as soon as possible:

  • max.in.flight.requests.per.connection=5: up to 5 requests (message batches) can be in flight between the producer and a broker
  • If more messages must be sent while 5 requests are already in flight, Kafka batches them until the next send
  • Batched messages have a higher compression ratio, which improves efficiency

Two settings to adjust batching:

  • linger.ms: (default=0) how long to wait before sending a batch
  • batch.size: the maximum batch size in bytes; if batches fill up before linger.ms elapses, consider increasing it

Example

Here is an illustration of the concept.

Consider a producer that sends messages 1, 2, and 3. The producer waits up to linger.ms before closing the batch, which lets it keep adding messages in the meantime. Once linger.ms has elapsed, the result is a single batch, sent as one request. The maximum size of this batch is determined by batch.size. The batch is then transmitted to Kafka, compressed if producer compression is enabled.
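The same idea can be sketched as a toy simulation. This is not how the Kafka producer is implemented internally. It is a simplified single-partition model, with a hypothetical class name and made-up arrival times, in which a batch is closed either when linger.ms has elapsed since its first record or when the next record would push it past batch.size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of linger.ms and batch.size for one partition (illustrative only).
final class LingerBatchSimulator {

    // arrivalTimesMs must be sorted ascending; sizesBytes[i] is the size of record i.
    // Returns the number of records in each batch, in send order.
    static List<Integer> simulate(long[] arrivalTimesMs, int[] sizesBytes,
                                  long lingerMs, int batchSizeBytes) {
        List<Integer> batchCounts = new ArrayList<>();
        int count = 0;      // records in the currently open batch
        int bytes = 0;      // bytes in the currently open batch
        long openedAt = 0;  // arrival time of the first record in the open batch

        for (int i = 0; i < arrivalTimesMs.length; i++) {
            boolean lingerExpired = count > 0 && arrivalTimesMs[i] - openedAt >= lingerMs;
            boolean wouldOverflow = count > 0 && bytes + sizesBytes[i] > batchSizeBytes;
            if (lingerExpired || wouldOverflow) {
                batchCounts.add(count); // close and "send" the open batch
                count = 0;
                bytes = 0;
            }
            if (count == 0) {
                openedAt = arrivalTimesMs[i]; // this record opens a new batch
            }
            count++;
            bytes += sizesBytes[i];
        }
        if (count > 0) {
            batchCounts.add(count); // flush whatever is still open
        }
        return batchCounts;
    }

    public static void main(String[] args) {
        long[] times = {0, 1, 2, 10, 11};
        int[] sizes = {100, 100, 100, 100, 100};
        System.out.println(simulate(times, sizes, 5, 16 * 1024)); // [3, 2]
        System.out.println(simulate(times, sizes, 0, 16 * 1024)); // [1, 1, 1, 1, 1]
        System.out.println(simulate(times, sizes, 5, 200));       // [2, 1, 2]
    }
}
```

With a 5 ms linger the first three records share one batch, with a linger of zero every record goes out alone, and with a tiny batch size the size limit closes batches before the linger does.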

Batch size

Let’s delve into the concept of “batch.size.”

Its default value is 16 KB (16384 bytes). This setting is the maximum amount of data, in bytes, that a single batch can contain. Increasing the batch size to 32 KB or 64 KB can improve compression, throughput, and request efficiency, because fewer requests are sent.
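As a rough back-of-the-envelope check, the sketch below estimates how many batches are needed to send a fixed number of records at different batch sizes. The class name, the 512-byte record size, and the one million records are assumptions for illustration. The estimate ignores per-record overhead and compression, and assumes every batch is filled completely, so treat the results as an upper bound on the benefit.

```java
// Rough estimate of batch counts for a single partition (illustrative only).
final class BatchRequestEstimate {

    // Upper bound on records per batch; a record larger than the batch still goes out alone.
    static int recordsPerBatch(int batchSizeBytes, int recordSizeBytes) {
        return Math.max(1, batchSizeBytes / recordSizeBytes);
    }

    // Batches needed to send totalRecords if every batch is completely filled.
    static long batchesNeeded(long totalRecords, int batchSizeBytes, int recordSizeBytes) {
        int perBatch = recordsPerBatch(batchSizeBytes, recordSizeBytes);
        return (totalRecords + perBatch - 1) / perBatch; // ceiling division
    }

    public static void main(String[] args) {
        long totalRecords = 1_000_000;
        int recordSizeBytes = 512; // hypothetical average record size
        for (int kb : new int[] {16, 32, 64}) {
            long batches = batchesNeeded(totalRecords, kb * 1024, recordSizeBytes);
            System.out.println(kb + " KB batch.size -> " + batches + " batches");
        }
        // 16 KB -> 31250, 32 KB -> 15625, 64 KB -> 7813
    }
}
```

Doubling the batch size roughly halves the number of batches, but only if enough records arrive within linger.ms to actually fill them.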

If a single message is larger than the batch size, it is not batched with other messages and is sent on its own right away. Note that batches are allocated per partition, so setting the batch size too high can waste memory; increase it with care.

Furthermore, monitoring the average batch size becomes possible through the use of Kafka producer metrics. This monitoring is crucial for those looking to optimize their producers’ performance.

  • The maximum number of bytes that can be in a batch
  • Increasing the batch size can help increase compression, throughput, and efficiency of requests
  • Messages bigger than the batch size will not be batched
  • A batch is allocated per partition
  • If the batch size is too high, you waste memory
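To get a feel for the memory cost, the sketch below multiplies the number of partitions the producer writes to by batch.size and compares the result with the producer's buffer.memory setting. buffer.memory is not covered in this article; its default of 32 MB comes from the Kafka producer configuration, and the class name and the 1000-partition figure are assumptions. The model counts only one open batch per partition, so real usage can be higher.

```java
// Rough memory estimate for open batches (illustrative only).
final class BatchMemoryEstimate {

    // Default value of the producer's buffer.memory setting: 32 MB.
    static final long DEFAULT_BUFFER_MEMORY_BYTES = 32L * 1024 * 1024;

    // Memory needed if each active partition holds one full-size open batch.
    static long openBatchBytes(int activePartitions, int batchSizeBytes) {
        return (long) activePartitions * batchSizeBytes;
    }

    // True if one open batch per partition fits into the given buffer.
    static boolean fitsInBuffer(int activePartitions, int batchSizeBytes, long bufferMemoryBytes) {
        return openBatchBytes(activePartitions, batchSizeBytes) <= bufferMemoryBytes;
    }

    public static void main(String[] args) {
        int partitions = 1000; // hypothetical number of partitions written to
        for (int kb : new int[] {16, 32, 64}) {
            long needed = openBatchBytes(partitions, kb * 1024);
            boolean fits = fitsInBuffer(partitions, kb * 1024, DEFAULT_BUFFER_MEMORY_BYTES);
            System.out.println(kb + " KB x " + partitions + " partitions = "
                    + needed + " bytes, fits in default buffer: " + fits);
        }
    }
}
```

In this example 16 KB and 32 KB batches fit into the default buffer, while 64 KB batches across 1000 partitions would not, which is the kind of situation the warning above is about.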

You can monitor the actual batch size with Kafka producer metrics (for example, batch-size-avg).

High Throughput Producer

To sum up, building a high-throughput producer involves several steps. First, increase linger.ms so that the producer waits a few milliseconds for batches to accumulate before dispatching them. To let those batches grow, also increase batch.size, which enables larger batches and more efficient requests. Finally, enable compression at the producer level to make transmissions more efficient. In code, these changes look like this:

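// Assumes an existing java.util.Properties instance named "properties"
// and: import org.apache.kafka.clients.producer.ProducerConfig;
properties.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(32 * 1024)); // 32 KB batches
properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "20"); // wait up to 20 ms for a batch to fill
properties.setProperty(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // compress batches with Snappy

  • Increase linger.ms: the producer waits for batches to fill before sending them
  • If batches are consistently full and you have memory to spare, increase batch.size to send bigger batches
  • Enable producer compression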
