Kafka Replication & Min In-Sync Replicas
Kafka is a distributed messaging system that provides resiliency, high availability, and fault tolerance. One of the means it uses to achieve this is data replication across broker nodes. If a broker node fails then replicated data in a topic partition is not lost, and it can still be consumed from replica partitions. The level of redundancy is configurable, but the cost of redundancy is an increase in latency as the data is replicated. Understanding this configuration, from producer acks, to replication factor and minimum in-sync replicas, is therefore essential.
Minimum In-sync Replicas
When a Kafka producer writes a message to a topic, it writes it to the partition replica leader. This is a replica that has been voted the leader by the broker from its list of in-sync replicas that are distributed across a cluster of broker nodes. The data written to the leader by the producer is then replicated across the partition replica followers. This is controlled by the topic’s replication factor. A value of 3 means that the data is replicated from the leader to two follower partitions, ensuring a total of three replicas hold the data.
While the data is replicated across the follower partitions, the min.insync.replicas configuration parameter controls the minimum number of these replicas (including the leader) that must successfully write the data to their log file when the producer is configured with acks equal to all. While there is a minimum number of in-sync replicas configured, all replicas that are in-sync at the time of the write must acknowledge the write before it is considered successful.
The partition replica leader tracks the set of replicas that are in-sync, in the set known as the ISR (In-Sync Replicas). Partition replica followers send fetch requests to the leader to get the latest log entries. Each partition replica leader tracks the lag of its follower replicas. If the leader does not receive a fetch request within the configured replica.lag.time.max.ms, or the follower has not consumed up to the leader’s log end within this period, then the follower is considered out of sync and is removed from the ISR.
A partition replica is also considered out of sync if it loses connectivity with Zookeeper…