Kafka — Data Durability and Availability Guarantees
May 3, 2023
One of the key components of Kafka’s durability and availability guarantees is replication: each partition is copied to several brokers, so the data can survive the loss of a broker.
Producers, for their part, have some choice over when they receive the broker’s acknowledgment of a produce request’s success or failure.
Producer acks = 0
- The producer configuration acks directly affects the durability guarantees, and it is also one of several points where we trade durability against latency.
- Setting acks=0, also known as “fire and forget” mode, gives the lowest latency, since the producer doesn’t wait for a response from the broker.
- But this setting provides no durability guarantee at all: the partition leader might never receive the data, for example because of a transient connectivity issue or an ongoing leader election.
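To make the acks=0 risk concrete, here is a toy Python model, not Kafka client code: `ToyLeader` and `send_fire_and_forget` are hypothetical names, and the `reachable` flag stands in for a connectivity blip or a leader election. It shows that the producer reports success either way, so a lost record goes unnoticed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLeader:
    """Toy partition leader (hypothetical, not a Kafka API).

    `reachable` models a transient connectivity issue or a leader election.
    """
    reachable: bool = True
    log: list[str] = field(default_factory=list)


def send_fire_and_forget(leader: ToyLeader, record: str) -> bool:
    """acks=0: hand the record to the network and return immediately."""
    if leader.reachable:
        leader.log.append(record)
    # The producer never waits for a response, so from its point of view
    # every send "succeeds", whether or not the leader stored the record.
    return True


leader = ToyLeader()
send_fire_and_forget(leader, "order-1")
leader.reachable = False            # e.g. an election is in progress
send_fire_and_forget(leader, "order-2")
print(leader.log)                   # ['order-1']: order-2 was silently lost
```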
Producer acks = 1
- With acks=1, durability improves a little, since we know the data was written to the leader replica. The cost is somewhat higher latency than acks=0, because the producer now waits for all the steps in the send request process, which we saw in the Inside the Apache Kafka Broker module.
- We are also not taking full advantage of replication, because we don’t wait for the data to reach the follower replicas. If the leader fails before the followers have copied a record, that acknowledged write can be lost.
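The acks=1 failure mode can be sketched the same way. In this simplified Python model (the `acks1_leader_crash` helper is hypothetical), the leader acknowledges each record as soon as it is in its own log, the follower has copied only part of that log, and the follower then takes over after the leader crashes.

```python
def acks1_leader_crash(records: list[str], replicated_before_crash: int) -> list[str]:
    """Toy acks=1 scenario (hypothetical helper, not a Kafka API).

    Returns the records that were acknowledged to the producer but are
    missing after a follower takes over as leader.
    """
    leader_log: list[str] = []
    acknowledged: list[str] = []
    for record in records:
        leader_log.append(record)       # written to the leader's local log
        acknowledged.append(record)     # acks=1: acknowledged right away
    # The follower has only fetched a prefix of the leader's log so far.
    follower_log = leader_log[:replicated_before_crash]
    # The leader crashes and the follower is elected as the new leader.
    return [r for r in acknowledged if r not in follower_log]


print(acks1_leader_crash(["a", "b", "c"], replicated_before_crash=1))  # ['b', 'c']
```

Everything the follower had not yet fetched is gone, even though the producer was told those sends succeeded.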
Producer acks = all
- The highest level of durability comes with acks=all (or acks=-1), which is also the default since Kafka 3.0. With this setting, the send request is not acknowledged until the data has been written to the leader replica and to all of the follower replicas in the ISR (in-sync replica) list.
- With a replication factor of N and all N replicas in the ISR, we could lose N-1 brokers without losing any acknowledged data. However, latency is higher, because the producer waits for the replication process to complete.
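The N-1 arithmetic can be checked by brute force. This toy Python sketch (hypothetical `write_acks_all` and `survives` helpers, with brokers as integer IDs and logs as lists) writes a record to every ISR member, then tries every possible set of failed brokers. It treats a record as surviving if any remaining replica holds it, ignoring how Kafka actually elects the new leader.

```python
from itertools import combinations


def write_acks_all(logs: dict[int, list[str]], isr: set[int], record: str) -> None:
    """Toy acks=all: every replica in the ISR appends the record before the ack."""
    for broker in isr:
        logs[broker].append(record)


def survives(logs: dict[int, list[str]], record: str, failures: int) -> bool:
    """True if the record outlives every possible choice of `failures` dead brokers."""
    brokers = list(logs)
    for down in combinations(brokers, failures):
        survivors = [b for b in brokers if b not in down]
        if not any(record in logs[b] for b in survivors):
            return False
    return True


# Replication factor N = 3, all replicas in sync: N-1 = 2 failures are tolerated.
logs = {0: [], 1: [], 2: []}
write_acks_all(logs, isr={0, 1, 2}, record="x")
print(survives(logs, "x", failures=2))   # True

# Same topic after the ISR has shrunk to the leader alone: acks=all now
# protects no better than acks=1, and losing one broker can lose the record.
logs = {0: [], 1: [], 2: []}
write_acks_all(logs, isr={0}, record="y")
print(survives(logs, "y", failures=1))   # False
```

The second case matters because “all” only means all replicas currently in the ISR, which is the gap the next setting addresses.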
Topic min.insync.replicas
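- The topic-level configuration min.insync.replicas works together with acks to enforce durability guarantees more effectively. Because acks=all waits only for the replicas currently in the ISR, it offers less protection when the ISR shrinks. min.insync.replicas closes that gap: it tells the broker to reject writes to the topic unless at least that many replicas are in the ISR, and the producer receives a NotEnoughReplicas error instead of an acknowledgment.
- Combined with acks=all, this ensures that any event accepted onto the topic is stored in at least min.insync.replicas replicas before the send is acknowledged. The setting is not checked for producers using acks=0 or acks=1.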
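The interaction between the two settings fits in a small decision function. This is a hypothetical Python model, not broker code: `replicas_holding_at_ack` returns how many replicas are guaranteed to hold a record when the producer is acknowledged, and `NotEnoughReplicas` stands in for the error Kafka returns when the ISR is too small. The example assumes a replication factor of 3 and min.insync.replicas=2, a commonly used pairing.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for Kafka's NOT_ENOUGH_REPLICAS error (toy model)."""


def replicas_holding_at_ack(acks: str, isr_size: int, min_insync_replicas: int) -> int:
    """How many replicas are guaranteed to hold a record when the producer gets its ack.

    Hypothetical helper, not a Kafka API. min.insync.replicas is only checked
    for acks=all (or -1); with acks=0 or acks=1 it is ignored.
    """
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR has {isr_size} replica(s); min.insync.replicas={min_insync_replicas}"
            )
        return isr_size   # the leader waits for every replica in the ISR
    if acks == "1":
        return 1          # only the leader's copy is confirmed
    if acks == "0":
        return 0          # nothing is confirmed
    raise ValueError(f"unsupported acks value: {acks!r}")


# Replication factor 3, min.insync.replicas=2
print(replicas_holding_at_ack("all", isr_size=3, min_insync_replicas=2))  # 3
print(replicas_holding_at_ack("all", isr_size=2, min_insync_replicas=2))  # 2
print(replicas_holding_at_ack("1", isr_size=1, min_insync_replicas=2))    # 1
try:
    replicas_holding_at_ack("all", isr_size=1, min_insync_replicas=2)
except NotEnoughReplicas as err:
    print("rejected:", err)
```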
Producer Idempotence
- Kafka also provides ordering guarantees, which come mainly from partitioning and from the fact that partitions are append-only, immutable logs.
- Events are written to a particular partition in the order they were sent, and consumers read them from that partition in the same order. However, failures can cause duplicates: if a write succeeds but its acknowledgment is lost, the producer retries and the same event is written twice, which throws off the ordering.
- To prevent this, we can use the Idempotent Producer. Each producer gets an ID and tags its writes with sequence numbers, so the broker can recognize a retried write it has already stored and discard the duplicate.
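Here is a simplified Python model of the broker-side check behind this. `ToyPartition` is hypothetical, and it simplifies Kafka’s mechanism: real brokers track sequence numbers per record batch for each producer ID and partition, while here every record carries its own number. In the Java client the feature is controlled by the `enable.idempotence` producer setting.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Simplified broker-side deduplication for an idempotent producer (toy model)."""
    records: list[str] = field(default_factory=list)
    last_seq: dict[int, int] = field(default_factory=dict)  # producer ID -> last sequence written

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return False   # a retry of something already written: acknowledge, don't rewrite
        if seq != last + 1:
            # A gap means a write went missing; the real broker rejects it with an error.
            raise ValueError(f"sequence gap: expected {last + 1}, got {seq}")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return True


p = ToyPartition()
p.append(producer_id=7, seq=0, value="debit $10")
p.append(producer_id=7, seq=1, value="credit $10")
# The ack for seq=1 was lost, so the producer retries the same send:
p.append(producer_id=7, seq=1, value="credit $10")
print(p.records)   # ['debit $10', 'credit $10']
```

The retried send is recognized by its (producer ID, sequence number) pair and dropped, so the log keeps one copy in the original order.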
End-to-End Ordering Guarantee
- Combining acks=all, producer idempotence, and keyed events results in a powerful end-to-end ordering guarantee. Events with a given key always land in the same partition, in the order they were sent, and consumers always read them from that partition in exactly that order, as long as the topic’s partition count does not change.
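The keyed half of the guarantee can be sketched as follows. This Python toy maps each key to a partition with a hash modulo the partition count and appends to per-partition lists. One assumption to flag loudly: Kafka’s default partitioner hashes keys with murmur2, whereas this sketch swaps in `zlib.crc32`, so the partition numbers will not match a real cluster; `partition_for`, `produce`, and `consume_key` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed partition count; must stay fixed for the guarantee to hold


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: list[tuple[str, str]]) -> dict[int, list[tuple[str, str]]]:
    """Append each (key, value) event to its key's partition, in send order."""
    partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key)].append((key, value))
    return partitions


def consume_key(partitions: dict[int, list[tuple[str, str]]], key: str) -> list[str]:
    """Read the key's partition from the beginning, in offset order."""
    return [value for k, value in partitions[partition_for(key)] if k == key]


events = [
    ("user-42", "login"),
    ("user-7", "login"),
    ("user-42", "add-to-cart"),
    ("user-7", "logout"),
    ("user-42", "checkout"),
]
print(consume_key(produce(events), "user-42"))  # ['login', 'add-to-cart', 'checkout']
```

Because the key alone decides the partition, changing the partition count would remap keys, which is why the guarantee assumes a fixed number of partitions.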
If you are looking for more summarized articles on Apache Kafka, you can also check out my previous articles, such as Foundational Concepts of Kafka and Its Key Principles and Why is Apache Kafka fast?