Kafka Log Retention and Cleanup Policies

Sunny Garg
5 min read · Jul 28, 2019


Apache Kafka provides two types of Retention Policies.

Time Based Retention:

Once the configured retention time has been reached for a segment, it is marked for deletion or compaction, depending on the configured cleanup policy. The default retention period for segments is 7 days.

Here are the parameters (in decreasing order of priority) that you can set in your Kafka broker properties file: ‘log.retention.ms’, ‘log.retention.minutes’ and ‘log.retention.hours’.
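The precedence between the three time-based broker properties (log.retention.ms, then log.retention.minutes, then log.retention.hours) can be sketched in a few lines of Python. This is only an illustration of the rule: the helper name effective_retention_ms is hypothetical and is not part of any Kafka API.

```python
from typing import Mapping, Optional

DEFAULT_RETENTION_HOURS = 168  # 7 days, the default for log.retention.hours


def effective_retention_ms(props: Mapping[str, str]) -> Optional[int]:
    """Resolve the retention time in milliseconds.

    The finest-grained property that is set wins:
    log.retention.ms > log.retention.minutes > log.retention.hours.
    Returns None when the resolved value is negative (no time limit).
    """
    if "log.retention.ms" in props:
        ms = int(props["log.retention.ms"])
    elif "log.retention.minutes" in props:
        ms = int(props["log.retention.minutes"]) * 60 * 1000
    else:
        hours = int(props.get("log.retention.hours", DEFAULT_RETENTION_HOURS))
        ms = hours * 60 * 60 * 1000
    return None if ms < 0 else ms


# With nothing set, the 7 day default applies.
print(effective_retention_ms({}))
# log.retention.ms overrides log.retention.hours when both are present.
print(effective_retention_ms({"log.retention.ms": "1000", "log.retention.hours": "1"}))
```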

Size Based Retention:

In this policy, we configure the maximum size of the Log data structure for a topic partition (‘log.retention.bytes’ at the broker level). Once the Log grows beyond this size, Kafka starts deleting its oldest segments. This policy is not popular because it does not give good visibility into when a message will expire. However, it can come in handy when we need to control the size of a Log due to limited disk space.
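A simplified model of size-based retention, written in Python, may make the behaviour easier to picture. It assumes segments are listed oldest first, that the last one is the active segment and is never deleted, and that a segment is only removed if the log would still hold at least the configured size afterwards. These are assumptions about the broker's behaviour made for illustration, and segments_to_delete is a hypothetical helper.

```python
from typing import List


def segments_to_delete(segment_sizes: List[int], retention_bytes: int) -> List[int]:
    """Return the indexes of the oldest segments that size-based retention would drop.

    segment_sizes is ordered oldest first; the last entry is the active segment.
    A negative retention_bytes means size-based retention is disabled.
    """
    if retention_bytes < 0:
        return []
    excess = sum(segment_sizes) - retention_bytes
    doomed = []
    # Walk from the oldest segment, skipping the active (last) one.
    for index, size in enumerate(segment_sizes[:-1]):
        if excess - size < 0:
            break  # removing this segment would shrink the log below the limit
        excess -= size
        doomed.append(index)
    return doomed


# 350 bytes of log with a 200 byte limit: only the oldest segment goes.
print(segments_to_delete([100, 100, 100, 50], 200))
```

Note that in this model the log can stay somewhat above the limit, because whole segments are the unit of deletion.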

So far we have seen what retention policies are. Once the retention limit is reached, cleanup policies come into the picture. Let's look at cleanup policies now.

Log Cleanup Policies! What are these?

In Kafka, unlike other messaging systems, the messages on a topic are not immediately removed after they are consumed. Instead, the configuration of each topic determines how much space the topic is permitted and how it is managed.

The concept of making data expire is called cleanup. It is a topic-level configuration, and it is important because it keeps the log from growing without bound.

Types of Cleanup Policies:

Delete Policy:

This is the default cleanup policy. It discards old segments when their retention time or size limit has been reached.

Compact Policy:

This will enable Log Compaction on a topic. The idea is to selectively remove records for each partition where we have a more recent update with the same primary key. This way the log is guaranteed to have at least the last state for each key.

Let's first look at a high-level diagram of an immutable log stream for a single topic, demonstrating compaction.

Source: https://kafka.apache.org

As we can see here, compaction cleans up the older value V1 for key K1 and keeps only the latest copy, with value V4.
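The effect of compaction on a single partition can be sketched in Python as follows. This is a toy model, not the broker's implementation: records are plain (offset, key, value) tuples and compact is a hypothetical function that keeps only the most recent record for each key.

```python
from typing import List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving offsets and order."""
    # Later records overwrite earlier ones, so this ends up holding
    # the highest offset seen for every key.
    latest_offset = {key: offset for offset, key, _ in log}
    return [rec for rec in log if latest_offset[rec[1]] == rec[0]]


log = [
    (0, "K1", "V1"),
    (1, "K2", "V2"),
    (2, "K1", "V3"),
    (3, "K1", "V4"),
]
# K1 keeps only its latest value V4; K2 is untouched.
print(compact(log))  # [(1, 'K2', 'V2'), (3, 'K1', 'V4')]
```

The surviving records keep their original offsets and relative order; compaction only removes entries.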

Here are a few important configurations for log compaction.

Delete and Compact Both:

We can specify both delete and compact values for the cleanup.policy configuration at the same time. In this case, the log is compacted, but the cleanup process also follows the retention time or size limit settings.

When both methods are enabled, capacity planning is simpler than when you only have compaction set for a topic. However, some use cases for log compaction depend on messages not being deleted by log cleanup, so consider whether using both is right for your scenario.

How to choose Cleanup Policy?

The broker config ‘log.cleanup.policy’ (‘cleanup.policy’ at the topic level) accepts ‘delete’, ‘compact’, or both as the comma-separated list ‘compact,delete’.

What is Log Cleaner?

The log cleaner performs log compaction. It is a pool of background compaction threads.

How does each compaction thread work?

Source: https://kafka.apache.org
  1. It chooses the log that has the highest ratio of log head to log tail
  2. It creates a succinct summary of the last offset for each key in the head of the log
  3. It recopies the log from beginning to end, removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately, so the additional disk space required is just one additional log segment (not a full copy of the log).
  4. The summary of the log head is essentially just a space-compact hash table. It uses exactly 24 bytes per entry. As a result with 8GB of cleaner buffer one cleaner iteration can clean around 366GB of log head (assuming 1k messages).
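The 366GB figure in step 4 follows from simple arithmetic, reproduced here in Python. The sketch assumes 8GB means 8 GiB and "1k messages" means 1024 bytes each, which is the reading that matches the quoted number; cleanable_head_bytes is a hypothetical helper.

```python
BYTES_PER_ENTRY = 24  # size of one entry in the cleaner's key-to-offset table


def cleanable_head_bytes(buffer_bytes: int, avg_message_bytes: int) -> int:
    """Estimate how much log head one cleaner pass can cover."""
    entries = buffer_bytes // BYTES_PER_ENTRY  # one entry per message
    return entries * avg_message_bytes


buffer_bytes = 8 * 1024**3  # 8 GiB of cleaner buffer
head_bytes = cleanable_head_bytes(buffer_bytes, 1024)  # 1 KiB messages

print(buffer_bytes // BYTES_PER_ENTRY)  # 357913941 entries
print(head_bytes / 10**9)               # roughly 366.5 (GB)
```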

How to enable Log Cleaner, and other configurations

A few FAQs about Log Cleaner
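The log cleaner is controlled through broker properties. The Python snippet below collects a few of the commonly tuned ones and renders them as lines for a broker properties file. The values shown are the defaults as I understand them from the Kafka documentation for recent releases, so please verify them against your version; render_properties is a hypothetical helper used only to print the fragment.

```python
from typing import Mapping

# Broker-level log cleaner settings (values are assumed defaults, verify per version).
LOG_CLEANER_PROPS = {
    "log.cleaner.enable": "true",                   # turn the cleaner on
    "log.cleaner.threads": "1",                     # size of the cleaner thread pool
    "log.cleaner.dedupe.buffer.size": "134217728",  # memory for the key-to-offset table
    "log.cleaner.min.cleanable.ratio": "0.5",       # dirty ratio that makes a log eligible
    "log.cleaner.delete.retention.ms": "86400000",  # how long tombstones are kept
}


def render_properties(props: Mapping[str, str]) -> str:
    """Render settings as key=value lines for a properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(render_properties(LOG_CLEANER_PROPS))
```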

1. Does Log Cleaner impact read performance?

No. Cleaning does not block reads and can be throttled to use no more than a configurable amount of I/O throughput to avoid impacting producers and consumers.

2. Does the offset of a message change after compaction?

No. The offset for a message never changes. It is the permanent identifier for a position in the log.

3. Does Log Cleaner also delete tombstone messages?

Yes. A message with a key and a null payload is considered a tombstone. The log cleaner deletes these tombstones as well, once they have been retained for the configured ‘delete.retention.ms’.

4. Does the order of messages change after compaction?

No. Ordering of messages is always maintained. Compaction will never re-order messages, just remove some.
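The answers to questions 2 to 4 can be seen together in a small Python model of compaction with tombstones. It is a sketch under simplifying assumptions: a tombstone's age is measured from the record's own timestamp (the broker's actual bookkeeping differs), and Record and compact_with_tombstones are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    key: str
    value: Optional[str]  # None marks a tombstone
    timestamp_ms: int


def compact_with_tombstones(
    log: List[Record], now_ms: int, delete_retention_ms: int
) -> List[Record]:
    """Keep the latest record per key and drop tombstones past their retention."""
    latest_offset = {rec.key: rec.offset for rec in log}
    kept = []
    for rec in log:
        if latest_offset[rec.key] != rec.offset:
            continue  # superseded by a newer record with the same key
        if rec.value is None and now_ms - rec.timestamp_ms > delete_retention_ms:
            continue  # tombstone older than the retention window
        kept.append(rec)
    return kept


log = [
    Record(0, "a", "1", 0),
    Record(1, "b", "2", 0),
    Record(2, "a", None, 1000),  # tombstone for key "a"
    Record(3, "c", "3", 2000),
]

# Shortly after the delete, the tombstone is still visible to consumers.
print([r.offset for r in compact_with_tombstones(log, 5000, 10000)])   # [1, 2, 3]
# Once the retention window has passed, the tombstone is gone as well.
print([r.offset for r in compact_with_tombstones(log, 20000, 10000)])  # [1, 3]
```

In both runs the surviving records keep their original offsets and appear in their original order.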

I hope you learned something from this story. Feel free to share your suggestions, or write to me about any topic you would like covered.

Write to me at sunnyg28@gmail.com
