Kafka Producer Overview

Sylvester John
5 min read · May 5, 2019


This article is a continuation of part 1 of the Kafka technical overview series. In part 2, let's look at how the Kafka producer works and at its important configurations.

Producer Role

The primary role of a Kafka producer is to take producer properties and records as input and write them to the appropriate Kafka broker. Producers serialize, partition, compress, and load-balance data across brokers based on partitions.

Properties

Some of the producer properties are bootstrap.servers, acks, batch.size, linger.ms, key.serializer, value.serializer, and many more. We will discuss some of these properties later in this article.
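
As a quick sketch, here is how a few of those properties are typically wired together when constructing a producer (the broker addresses and values below are placeholders, not recommendations):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class ProducerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Brokers used for the initial connection (placeholder host names).
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        // Acknowledgments required before a send counts as successful: 0, 1, or "all".
        props.put("acks", "1");
        // Maximum size (in bytes) of a batch per partition.
        props.put("batch.size", 16384);
        // How long to wait for a batch to fill before sending it anyway.
        props.put("linger.ms", 5);
        // Serializers applied to the record key and value.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
```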

Producer record

A message that should be written to Kafka is referred to as a producer record. A producer record must have the name of the topic it should be written to and the value of the record. Other fields such as partition, timestamp, and key are optional.
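
A minimal sketch of the common ways to build and send a record (the topic name "orders" and the key/value strings are made-up examples):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerRecordExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic and value only; key, partition, and timestamp stay unset.
            producer.send(new ProducerRecord<>("orders", "order-created"));
            // Topic, key, and value; the key drives partition selection.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"));
            // Topic, explicit partition, key, and value.
            producer.send(new ProducerRecord<>("orders", 0, "customer-42", "order-created"));
        }
    }
}
```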

Broker and metadata discovery

Bootstrap server

Any broker in a Kafka cluster can act as a bootstrap server. Generally, a list of bootstrap servers is passed instead of just one, and at least two are recommended so the producer can still connect if one of them is down.

In order to send the producer record to the appropriate broker, the producer first establishes a connection to one of the bootstrap servers. The bootstrap server returns the list of all brokers available in the cluster and metadata details such as topics, partitions, replication factor, and so on. Based on this broker list and metadata, the producer identifies the broker that hosts the leader partition for the producer record and writes to that broker.
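
The same metadata the producer discovers can be inspected by hand with the AdminClient; a small sketch (the broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class MetadataExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Any reachable broker can serve as the bootstrap server.
        props.put("bootstrap.servers", "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // The bootstrap broker answers with the full broker list.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.println("Broker " + node.id() + " at " + node.host() + ":" + node.port());
            }
        }
    }
}
```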

Workflow

The workflow of a producer involves five important steps:

  1. Serialize
  2. Partition
  3. Compress
  4. Accumulate records
  5. Group by broker and send

Serialize

In this step, the producer record is serialized using the serializers passed to the producer; both the key and the value are serialized with their configured serializers. Built-in serializers include the string, byte array, and ByteBuffer serializers.
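
If the built-in serializers don't fit your data type, the Serializer interface can be implemented directly; a minimal sketch that mirrors what the built-in string serializer does (the class name is made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.common.serialization.Serializer;

// Turns a String into UTF-8 bytes, much like the built-in StringSerializer.
public class SimpleStringSerializer implements Serializer<String> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() { }
}
```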

Partition

In this step, the producer decides which partition of the topic the record should be written to. By default, the murmur2 algorithm is used for partitioning: it generates a hash from the key, and the hash is mapped to one of the partitions. If no key is passed, partitions are chosen in a round-robin fashion.

It’s important to understand that by assigning the same key to a set of records, Kafka ensures those messages are written to the same partition in the order received, as long as the number of partitions does not change. If you want to retain the order of messages, use an appropriate key for them. A custom partitioner can also be passed to the producer to control which partition a message is written to.
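
As a sketch of that hook, here is a toy partitioner (the "vip-" routing rule is invented purely for illustration):

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Toy rule: keys starting with "vip-" always land on partition 0;
// all other records are spread over the remaining partitions.
public class VipPartitioner implements Partitioner {

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (key != null && key.toString().startsWith("vip-")) {
            return 0;
        }
        if (numPartitions == 1) {
            return 0;
        }
        int hash = key != null ? key.hashCode()
                : (value != null ? value.hashCode() : 0);
        // floorMod keeps the result non-negative even for negative hashes.
        return 1 + Math.floorMod(hash, numPartitions - 1);
    }

    @Override
    public void close() { }
}
```

The class is registered with the producer through the partitioner.class property.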

Compression

In this step, the producer record is compressed before it’s written to the record accumulator. By default, compression is not enabled in the Kafka producer. The supported compression types are:

  • none (the default, i.e. no compression)
  • gzip
  • snappy
  • lz4
  • zstd (Kafka 2.1 and later)

Compression enables faster transfer not only from the producer to the broker but also during replication. It gives better throughput, lower latency, and better disk utilization. Refer to http://blog.yaorenjie.com/2017/01/03/Kafka-0-10-Compression-Benchmark/ for benchmark details.
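
Turning compression on is a single property; a sketch (snappy is just an example choice):

```java
import java.util.Properties;

public class CompressionConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Compression is applied per batch on the producer side;
        // valid values are none, gzip, snappy, lz4, and zstd.
        props.put("compression.type", "snappy");
    }
}
```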

Record accumulator

In this step, records are accumulated in a buffer per partition of a topic and grouped into batches based on the producer’s batch size property. Each partition of a topic gets its own accumulator buffer.

Sender thread

In this step, the batches sitting in the record accumulator are grouped by the broker they are to be sent to. The records in a batch are sent based on the batch.size and linger.ms properties: a batch is sent when either the defined batch size or the defined linger time is reached, whichever comes first.
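
A sketch of the two properties working together; with these illustrative values, a batch is sent once it reaches 32 KB or once 10 ms have passed since its first record arrived, whichever comes first:

```java
import java.util.Properties;

public class BatchingConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Send a partition's batch once it reaches 32 KB...
        props.put("batch.size", 32768);
        // ...or once its oldest record has waited 10 ms, whichever is first.
        props.put("linger.ms", 10);
    }
}
```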

Duplicate message detection

Producers may send a duplicate message when a message was committed by Kafka but the acknowledgment never reached the producer, for example because of a network failure. From Kafka 0.11 onward, to avoid duplicates in this scenario, Kafka tracks each message by its producer ID and sequence number. When a message arrives for an already committed message with the same producer ID and sequence number, Kafka treats it as a duplicate: it does not commit the message again, but it does send the acknowledgment back to the producer, so the producer can treat the message as sent.
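
This producer-ID and sequence-number tracking is what Kafka calls the idempotent producer, and it is enabled with a single property; a sketch:

```java
import java.util.Properties;

public class IdempotenceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Gives the producer an ID and numbers each message so the broker
        // can discard duplicates caused by retries (Kafka 0.11+).
        props.put("enable.idempotence", true);
        // Idempotence requires acks=all and a positive retry count.
        props.put("acks", "all");
    }
}
```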

A few other producer properties

  • buffer.memory — the amount of memory allocated to the producer for buffering records
  • retries — the number of times to retry a failed send. Default is 0. Retries may cause out-of-order messages.
  • max.in.flight.requests.per.connection — the number of requests that may be sent without waiting for an acknowledgment. Default is 5. Set this to 1 to avoid out-of-order messages caused by retries.
  • max.request.size — the maximum size of a message. Default is 1 MB.
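
A sketch setting the four properties above (the values are illustrative, not recommendations):

```java
import java.util.Properties;

public class TuningConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Total memory available for buffering records awaiting send (32 MB).
        props.put("buffer.memory", 33554432);
        // Retry failed sends; retries can reorder messages...
        props.put("retries", 3);
        // ...unless only one request may be in flight per connection.
        props.put("max.in.flight.requests.per.connection", 1);
        // Largest request/message the producer will send (1 MB).
        props.put("max.request.size", 1048576);
    }
}
```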

Summary

Based on the producer workflow and producer properties, tune the configuration to achieve the desired results. In particular, focus on the properties below.

  • batch.size — the maximum size (in bytes) of a batch per partition
  • linger.ms — the time to wait before sending the current batch
  • compression.type — the compression applied to message batches

In part 3 of the series, let’s look at Kafka producer delivery semantics and how to tune some of the producer properties to achieve the desired results.
