Apache Kafka Guide #20: Log Cleanup Policies
Hi, this is Paul, and welcome to part #20 of my Apache Kafka guide. Today we will discuss how log cleanup policies work.
Kafka topics hold data that eventually expires according to specific policies; this process of data expiration is known as log cleanup. In Kafka, there are two main log cleanup policies to consider. The first is log.cleanup.policy=delete. This is the standard setting for all user-created topics: data is removed based on its age, and by default this period is one week, so data in Kafka is deleted after a week under this policy. Log deletion can also be triggered by the maximum log size, which we will explore in the upcoming lecture. The default for this size limit is -1, meaning no limit, but the one-week age limit still applies. The second policy is log.cleanup.policy=compact. This is the default for a specific internal Kafka topic: the consumer offsets topic.
Delete policy: log.cleanup.policy=delete
- Delete based on the age of data (default is one week)
- Delete based on the max size of the log (default is -1, which means infinite)
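As a minimal sketch of tuning the delete policy per topic, here is how you might set both limits with the kafka-configs tool. The topic name my-topic and the address localhost:9092 are placeholder assumptions; retention.ms and retention.bytes are the per-topic counterparts of the broker-level log.retention.ms and log.retention.bytes.

# my-topic and localhost:9092 are placeholders for your own topic and broker
$ kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config retention.ms=604800000,retention.bytes=-1

Here 604800000 ms is exactly one week, matching the default age limit, and -1 keeps the size limit infinite.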
Compact policy: log.cleanup.policy=compact (Kafka default for the __consumer_offsets topic)
- Delete based on the keys of messages: only the most recent value for each key is kept
- Old duplicates are deleted after the active segment is committed
- Infinite time and space retention
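As a hedged sketch, this is how you could create a compacted topic of your own. The topic name my-compacted-topic, the partition and replication counts, and the address localhost:9092 are illustrative assumptions, not values from this guide.

# All names and counts below are illustrative
$ kafka-topics --bootstrap-server localhost:9092 --create \
  --topic my-compacted-topic \
  --partitions 3 --replication-factor 1 \
  --config cleanup.policy=compact

The per-topic cleanup.policy setting overrides the broker-level log.cleanup.policy default for this one topic.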
Let's look at the consumer offsets topic itself. This topic has many partitions (50 by default). Notably, its settings use producer for the compression type and compact for the cleanup policy. This confirms that this internal topic has its own cleanup policy, which we'll explore in upcoming guides. The log compaction policy deletes messages based on the most recent occurrence of each key: older values for a key are removed after the active segment is committed. It allows unlimited retention in both time and space, which gives it some unique properties. You can inspect the topic yourself:
$ kafka-topics --bootstrap-server <host:port> --describe --topic __consumer_offsets
Why is log cleanup necessary in Kafka, and when does it occur?
Log cleanup in Kafka is crucial for managing disk space: it deletes data that is obsolete under the configured policy, whether or not it has been consumed. This keeps disk usage bounded and reduces the maintenance work needed on the Kafka cluster.
How frequently does log cleanup run?
It operates on partition segments: cleanup is evaluated as segments are created and closed, so more (smaller) segments mean more frequent cleanup. However, it's important not to clean logs too often, as this process requires CPU and RAM resources to identify what needs to be deleted.
There's a setting named log.cleaner.backoff.ms that controls this. Its default value is 15000, meaning the cleaner checks for work every 15 seconds. Understanding this helps in managing log cleanup effectively.
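One lever for influencing how often cleanup can be triggered on a topic is the segment size. As a sketch under the same placeholder assumptions (my-topic, localhost:9092), segment.bytes controls when a new segment is rolled; the 512 MiB value below (half the 1 GiB default) is purely illustrative.

# Rolling smaller segments makes them eligible for cleanup sooner
$ kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.bytes=536870912

Smaller segments close sooner and become eligible for cleanup earlier, at the cost of more open files and more frequent cleaner work.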
Deleting data from Apache Kafka:
- Controls the size of the data on disk
- Decreases maintenance work on the Kafka cluster
How often does log cleanup run?
- It happens on partition segments
- Smaller segments mean more frequent log cleanup; bigger segments mean rarer cleanup
- It takes CPU and RAM resources, so it shouldn't run too often
- By default the cleaner checks for work every 15 seconds (log.cleaner.backoff.ms)
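To tie the settings from this part together, here is a sketch of the relevant broker-level entries in server.properties; the values shown are the Kafka defaults discussed above, so copying them changes nothing.

# Default cleanup policy for topics without a per-topic override
log.cleanup.policy=delete
# Age-based retention: one week
log.retention.hours=168
# Size-based retention: -1 means no limit
log.retention.bytes=-1
# How often the log cleaner checks for work: every 15 seconds
log.cleaner.backoff.ms=15000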
Thank you for reading until the end. Before you go:
- Please consider clapping and following the writer! 👏
- Follow us on Twitter (X) and LinkedIn