Apache Kafka Guide #31 Producer Message Compression

Paul Ravvich
Apache Kafka At the Gates of Mastery
4 min read · Mar 14, 2024



Hi, this is Paul, and welcome to part 31 of my Apache Kafka guide. Today we will discuss message compression at the producer level.

Producer Message Compression

Producers typically transmit data to Kafka in a text-based format such as JSON. In such scenarios, applying compression at the producer is important: it reduces the size of each message batch, making it both quicker to send to Apache Kafka and cheaper to store on disk. Compression can be applied at different levels: at the producer, which requires no changes to the brokers or consumers, or at the broker/topic level, which behaves differently, as discussed later in this article.

Several compression types are available: none (the default), gzip, lz4, snappy, and zstd (introduced in Kafka 2.1). Compression works by consolidating repeated values, so the larger the batch of messages, the more effective it is. An informative blog post by Cloudflare compares compression techniques in Kafka and is a recommended read for those deeply interested in the subject.

  • Producers often send text-based data such as JSON; in this case, it is important to apply compression at the producer.
  • Compression can be enabled at the producer and does not require any configuration change on the broker or consumer (a configuration sketch follows this list).
  • compression.type can be none (default), gzip, lz4, snappy, or zstd (since Kafka 2.1).
  • Compression is more effective for bigger message batches.
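As a concrete illustration, here is a minimal sketch of enabling snappy compression on a Java producer. The bootstrap address and the "events" topic are placeholders for this sketch, not values from this guide:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class CompressedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            // Enable compression at the producer; brokers and consumers need no changes
            props.put("compression.type", "snappy");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                String json = "{\"user\":\"alice\",\"action\":\"click\"}";
                // "events" is a placeholder topic name for this sketch
                producer.send(new ProducerRecord<>("events", json));
            }
        }
    }

Note that consumers need no matching setting: the codec is recorded in the batch metadata, and the consumer decompresses transparently.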

How Does Producer Message Compression Work?

Compression reduces the size of data that needs to be sent or stored. For instance, consider a producer with a batch of 100 messages intended for Kafka, from message one to message 100. When compression is enabled, the entire batch is compressed into a smaller, more compact format. As illustrated in the diagram, this significantly decreases the batch's size, so the compressed batch can be transmitted to Kafka more rapidly and stored on disk more efficiently.

Apache Kafka Producer Message Compression
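To get a feel for why a batch of similar messages shrinks so much, here is a small self-contained sketch using only java.util.zip (no Kafka involved) that gzips a batch of 100 similar JSON messages and prints the size reduction; the message contents are made up for the demo:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public class BatchCompressionDemo {
        public static void main(String[] args) throws IOException {
            // Build a batch of 100 similar JSON messages, like a producer batch
            StringBuilder batch = new StringBuilder();
            for (int i = 1; i <= 100; i++) {
                batch.append("{\"id\":").append(i)
                     .append(",\"event\":\"page_view\",\"source\":\"web\"}\n");
            }
            byte[] raw = batch.toString().getBytes(StandardCharsets.UTF_8);

            // Compress the whole batch at once, as Kafka does per batch
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(raw);
            }
            byte[] compressed = out.toByteArray();

            System.out.printf("raw: %d bytes, compressed: %d bytes (%.1fx smaller)%n",
                    raw.length, compressed.length, (double) raw.length / compressed.length);
        }
    }

The repeated field names and values compress away almost entirely, which is exactly why bigger batches of similar messages give better ratios.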

Advantages of Producer Message Compression

Message compression offers numerous benefits. First, it significantly reduces the size of producer requests, potentially by up to four times, leading to faster data transfer across the network. This results in reduced latency, improved throughput, and better disk utilization on the Kafka brokers, since compressed messages occupy less space on disk.

Disadvantages of Producer Message Compression

Despite these advantages, the downsides of compression are relatively minor on modern hardware. The producer must spend some CPU on compression, and consumers must spend CPU to decompress the message batches they read. For a good balance of speed and compression ratio, snappy or lz4 are recommended starting points, although other codecs may suit your data better; measure with your own data sets.

Optimization

Further optimization can be achieved by adjusting two settings, linger.ms and batch.size, which encourage larger batches, leading to more effective compression and higher throughput. We will explore them in detail in the next part of this guide.
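As a preview, extending the producer configuration from the earlier sketch, the two settings look like this; the values are illustrative starting points, not recommendations:

    // Illustrative batching settings to pair with compression; tune for your workload.
    // Reuses the `props` object from the producer sketch above.
    props.put("compression.type", "snappy");
    props.put("linger.ms", "20");      // wait up to 20 ms so batches can fill up before sending
    props.put("batch.size", "65536");  // allow batches of up to 64 KB (the default is 16384 bytes)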

In my experience working with clients using Apache Kafka, particularly in scenarios involving high throughput streams, I consistently advocate for enabling compression in production settings. The implementation of this simple setting can resolve numerous issues, underscoring its significant impact on streamlining operations.

Message Compression at the Broker / Topic level

Compression can also be configured at the broker level (applying to all topics) or at the topic level (applying to a single topic). The relevant broker setting is compression.type=producer, the default. With this configuration, the broker accepts compressed batches from producer clients and writes them directly to the topic's log file without recompressing them, which is efficient but leaves the compression responsibility with the producer.

Alternatively, setting compression.type=none makes the broker decompress all batches sent to Apache Kafka. Though this approach might seem inefficient, it remains an option. You can also specify a particular compression type, such as compression.type=lz4, which has its own behavior: if the topic's compression type matches the producer's, the data is not recompressed and is stored as-is; if they differ, the broker decompresses the batch and recompresses it with the specified algorithm, such as lz4.

It's important to note that enabling compression on the broker side increases CPU usage. The optimal strategy is to have all producers compress data before sending it and to keep the default compression.type=producer on the broker, minimizing resource consumption. If you cannot control the compression settings of your producers but still want compressed data on disk, enabling compression at the broker level is feasible, though it costs extra CPU and may affect performance. Always test thoroughly before making such a change in production; a sketch of changing the topic-level setting follows the summary below.

  • compression.type=producer (default): the broker keeps the producer's compression and writes the batch directly to the topic's log without recompressing it.
  • compression.type=none: all batches are decompressed by the broker.
  • Warning: broker-side compression takes extra CPU.
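For completeness, here is a hedged sketch of setting compression.type at the topic level with the Java Admin client. The broker address and the "events" topic name are placeholders; as warned above, test the CPU impact before doing this in production:

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class TopicCompressionConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

            try (Admin admin = Admin.create(props)) {
                // "events" is a placeholder topic name for this sketch
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
                AlterConfigOp op = new AlterConfigOp(
                        new ConfigEntry("compression.type", "lz4"), AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> change = Map.of(topic, List.of(op));
                admin.incrementalAlterConfigs(change).all().get(); // block until applied
            }
        }
    }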

Thank you for reading until the end.
