Apache Kafka Guide #21 Log Cleanup Delete

Paul Ravvich
Apache Kafka At the Gates of Mastery
3 min read · Feb 8, 2024

Hi, this is Paul, and welcome to part #21 of my Apache Kafka guide. Today we will discuss how the Delete log cleanup policy works.

Let’s discuss the primary log cleanup policy, ‘delete’, which is the default. Two main settings control it. The first is log.retention.hours, which determines how long your data is kept. The default is 168 hours, or one week. A larger value keeps data longer and so uses more disk space. A smaller value keeps less data and needs less disk space, but consumers that stay inactive for longer than the retention period can miss data that has already been deleted. My advice is to increase this setting and allocate more disk space: disk space is relatively inexpensive nowadays, whereas data that consumers never got to read can be costly.
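To get a feeling for the trade-off between retention time and disk usage, here is a rough back-of-the-envelope sketch. The function name, the steady write rate, and the replication factor parameter are my own assumptions for illustration; real usage also depends on compression, segment sizes, and how evenly data arrives.

```python
def estimated_disk_bytes(write_bytes_per_sec: int,
                         retention_hours: int,
                         replication_factor: int = 1) -> int:
    """Rough estimate of the bytes kept on disk for a topic.

    Assumes a constant write rate (hypothetical) and that every byte
    is kept for the whole retention period on each replica.
    """
    retention_seconds = retention_hours * 3600
    return write_bytes_per_sec * retention_seconds * replication_factor


# 1 MB/s for the default 168 hours versus a doubled retention of 336 hours.
one_week = estimated_disk_bytes(1_000_000, 168)
two_weeks = estimated_disk_bytes(1_000_000, 336)
print(one_week / 1e9, "GB for one week")
print(two_weeks / 1e9, "GB for two weeks")
```

Under these assumptions, doubling the retention time doubles the disk space needed, which is usually a cheap price for not losing data.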

Additionally, log.retention.ms and log.retention.minutes also control the retention time. When more than one of them is set, the smaller unit takes precedence: log.retention.ms overrides log.retention.minutes, which overrides log.retention.hours.
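The precedence rule can be sketched in a few lines of Python. This is only a model of the rule described above, not Kafka's own code: the function name is hypothetical, and None stands for "this property is not set".

```python
from typing import Optional

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE


def effective_retention_ms(retention_ms: Optional[int] = None,
                           retention_minutes: Optional[int] = None,
                           retention_hours: int = 168) -> int:
    """Return the retention time in milliseconds, or -1 for unlimited.

    The smallest unit that is set wins: ms, then minutes, then hours.
    """
    if retention_ms is not None:
        millis = retention_ms
    elif retention_minutes is not None:
        millis = retention_minutes * MS_PER_MINUTE
    else:
        millis = retention_hours * MS_PER_HOUR
    # Any negative value is treated as "keep forever" in this sketch.
    return -1 if millis < 0 else millis


print(effective_retention_ms())                      # only the default hours
print(effective_retention_ms(retention_minutes=30))  # minutes override hours
print(effective_retention_ms(retention_ms=-1))       # ms overrides everything
```

Note that precedence depends on which property is set, not on which value is numerically smaller.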

Another related setting is log.retention.bytes, which defines the maximum size in bytes of each partition in your topic. The default is -1, which means no size limit. Setting it is useful for keeping the size of each partition’s log within a certain limit.

Log Retention Hours (log.retention.hours)

  • Number of hours to keep the data (default is 168 hours, i.e. 1 week)
  • A higher number = more disk space required
  • A lower number = less disk space used, but if your consumers are inactive for too long, they can lose data
  • log.retention.ms and log.retention.minutes also set the retention time; the smaller unit has priority

Log Retention Bytes (log.retention.bytes)

  • Max size in bytes for each partition. Default = -1, which means unlimited
  • Useful as a threshold to cap the size of the log

Log Cleanup Policy: Delete

A partition’s log is split into segments, and new data is written to the active, most recent segment. Older segments are deleted either because they have become too old (time retention) or because the partition’s log has grown too large (size retention).
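Here is a simplified Python model of that behavior. It is a sketch under my own assumptions, not Kafka's implementation: the Segment class and the function name are hypothetical, the last segment in the list is treated as the active one and is never deleted, and a value of -1 disables the corresponding rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    size_bytes: int
    last_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(segments: List[Segment], now_ms: int,
                       retention_ms: int, retention_bytes: int) -> List[str]:
    """Return names of segments to delete; input is ordered oldest first."""
    if not segments:
        return []
    active = segments[-1]
    closed = list(segments[:-1])
    deleted: List[str] = []

    # Rule 1: time retention. Drop the oldest segments whose newest
    # record is older than the retention period.
    if retention_ms >= 0:
        while closed and now_ms - closed[0].last_timestamp_ms > retention_ms:
            deleted.append(closed.pop(0).name)

    # Rule 2: size retention. Drop the oldest segments only while the
    # log would still be at or above the limit after the deletion.
    if retention_bytes >= 0:
        total = sum(s.size_bytes for s in closed) + active.size_bytes
        excess = total - retention_bytes
        while closed and excess - closed[0].size_bytes >= 0:
            excess -= closed[0].size_bytes
            deleted.append(closed.pop(0).name)

    return deleted


log = [
    Segment("seg-0", 100, 0),
    Segment("seg-1", 100, 1000),
    Segment("seg-2", 100, 2000),
    Segment("seg-3", 50, 3000),  # active segment
]
print(segments_to_delete(log, now_ms=3000, retention_ms=1500, retention_bytes=-1))
print(segments_to_delete(log, now_ms=3000, retention_ms=-1, retention_bytes=200))
```

In this model, deletion always works on whole segments, so the size limit is approximate: the second call removes only seg-0 and leaves 250 bytes on disk, slightly above the 200-byte limit.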

There are two common ways to configure the delete policy. The first is a one-week retention period, which is the default. This is achieved by setting log.retention.hours to 168 and log.retention.bytes to -1.

The second option keeps data for an unlimited time but limits each topic partition to 500 megabytes. For this, set log.retention.ms to -1 and log.retention.bytes to 524288000 (500 × 1024 × 1024 bytes).

Two common configurations:

  • Time retention: log.retention.hours=168 and log.retention.bytes=-1
  • Size retention: log.retention.ms=-1 and log.retention.bytes=524288000
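If you want to double-check the byte value or generate these settings in a script, a small sketch like the following may help. The helper names are hypothetical; the property names and values are the ones listed above.

```python
def mebibytes_to_bytes(mib: int) -> int:
    """Convert a size in units of 1024 * 1024 bytes to plain bytes."""
    return mib * 1024 * 1024


def render_properties(settings: dict) -> str:
    """Render settings as key=value lines, one per property."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Option 1: keep data for one week, with no size limit.
time_retention = {
    "log.retention.hours": 168,
    "log.retention.bytes": -1,
}

# Option 2: keep data forever, but cap each partition at 500 megabytes.
size_retention = {
    "log.retention.ms": -1,
    "log.retention.bytes": mebibytes_to_bytes(500),
}

print(render_properties(time_retention))
print(render_properties(size_retention))
```

The conversion confirms that 500 megabytes, counted in multiples of 1024, is 524288000 bytes.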

Thank you for reading until the end.
