Preferred leader election in Kafka
In the Kafka cluster, we have observed that some partition leaders are not on their preferred broker. This post explains how leader election works, what events can move a partition's leadership away from its preferred broker, and what steps can be taken to bring it back. Most of the references in this post are from the book “Kafka: The Definitive Guide”, whose authors include one of Kafka's original creators, and from the official Kafka documentation.
How the leader gets selected:
To start with, let’s understand how leader election works.
Whenever a new topic is created, Kafka assigns each partition an ordered list of replicas and elects a leader for it. The first replica in that list is the preferred leader, and it is the one elected as leader when the topic is created.
Reference from Kafka: The Definitive Guide:
“The first replica in the list is always the preferred leader. This is true no matter who is the current leader and even if the replicas were reassigned to different brokers using the replica reassignment tool. In fact, if you manually reassign replicas, it is important to remember that the replica you specify first will be the preferred replica.”
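The rule in the quote can be sketched in a few lines of Python. This is only a toy model, not Kafka code: the `assignment` dictionary, the broker ids and the `preferred_leader` helper are made up for illustration.

```python
# Toy model: each partition maps to its ordered replica list (broker ids).
# The layout below is hypothetical.
assignment = {
    0: [1, 2],
    1: [2, 3],
    2: [3, 4],
    3: [4, 1],
}

def preferred_leader(replicas):
    """Return the preferred leader: always the first replica in the list."""
    return replicas[0]

preferred = {p: preferred_leader(r) for p, r in assignment.items()}
print(preferred)  # {0: 1, 1: 2, 2: 3, 3: 4}
```

Note that the function never looks at who the current leader is: the preferred leader depends only on the order of the replica list.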
Let's understand this with an example:
.Consider a 4-node cluster
.a topic with 4 partitions
.a replication factor of 2
[Diagram of the example cluster. Image source: Medium]
As seen in the diagram above, the leader replicas are distributed across the brokers, and the follower replica of each partition lives on a different broker. By default, all reads and writes for a partition go through its leader, so it is important that leaders are spread evenly across brokers.
So far we have seen that at topic creation time Kafka takes care of distributing the partition leaders evenly.
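To make the even spread concrete, here is a minimal sketch of a round-robin placement for the example above (4 brokers, 4 partitions, replication factor 2). It is a simplification and not Kafka's actual assignment algorithm; `assign_replicas` is a hypothetical helper, and the resulting layout may differ from the one in the diagram.

```python
from collections import Counter

def assign_replicas(brokers, num_partitions, replication_factor):
    """Simplified round-robin placement: partition p starts at broker index
    p mod N and takes the following brokers as its followers."""
    n = len(brokers)
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

assignment = assign_replicas(brokers=[1, 2, 3, 4], num_partitions=4, replication_factor=2)

# The first replica of each list is the preferred leader.
leaders_per_broker = Counter(replicas[0] for replicas in assignment.values())

print(assignment)                # {0: [1, 2], 1: [2, 3], 2: [3, 4], 3: [4, 1]}
print(dict(leaders_per_broker))  # {1: 1, 2: 1, 3: 1, 4: 1}
```

Each broker ends up as preferred leader for exactly one partition, which is the balanced state the rest of this post tries to get back to.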
A partition's preferred leader is fixed when the topic is created. Over time, however, events such as a broker shutdown, a crash, or a broker failing to register its heartbeat cause leadership to change, and one of the follower replicas becomes the leader.
For example, in the diagram above, if broker 4 dies, broker 3 becomes the leader for partition 2 of the topic.
In that case leadership is skewed: the distribution of leaders is no longer even.
At this point we have established that when a leader replica is not on its preferred broker, the result is an uneven distribution of leaders.
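The skew can be reproduced with a small simulation. This is a toy model under stated assumptions: the replica layout is hypothetical (chosen so that partition 2 lives on brokers 4 and 3, as in the example), every live replica is assumed to be in sync, and `elect_leaders` is an illustrative helper rather than anything from Kafka.

```python
from collections import Counter

# Hypothetical layout: partition -> ordered replica list (first = preferred leader).
assignment = {0: [2, 1], 1: [3, 2], 2: [4, 3], 3: [1, 4]}

def elect_leaders(assignment, live_brokers):
    """For each partition, pick the first replica that sits on a live broker.
    Assumes every live replica is in sync; yields None if no replica is live."""
    return {
        p: next((b for b in replicas if b in live_brokers), None)
        for p, replicas in assignment.items()
    }

before = elect_leaders(assignment, live_brokers={1, 2, 3, 4})
after = elect_leaders(assignment, live_brokers={1, 2, 3})  # broker 4 is down

print(sorted(Counter(before.values()).items()))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
print(sorted(Counter(after.values()).items()))   # [(1, 1), (2, 1), (3, 2)]
```

After broker 4 goes down, broker 3 leads two partitions while the others lead one each.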
A few questions come to mind at this point:
.What if the broker that was preferred for this partition comes back?
.If leaders are not equally distributed, how do we recover from this state?
We will take these one by one.
What if the broker that was preferred for this partition comes back?
Once the broker is up again, it will try to take back the load that was originally assigned to it.
But it can only do that if its replica is in sync at the moment leadership is to be moved back. If the replica is out of sync, leadership stays with the current broker, to avoid inconsistency and data loss. Leadership is moved back automatically only if auto.leader.rebalance.enable (discussed in detail in the next section) is set to true, in which case Kafka periodically checks whether each partition is led by its preferred replica. An important point here is that even with auto leader rebalance set to true, this is an asynchronous operation. It does not guarantee that leadership will move immediately (the reason being out-of-sync replicas), but eventually it will.
One workaround is to stop producing messages to that particular topic or, to be more precise, that partition, so that the replica can catch up.
A few lines from Kafka: The Definitive Guide on this:
“Kafka brokers do not automatically take partition leadership back (unless auto leader rebalance is enabled, but this configuration is not recommended) after they have released leadership (e.g., when the broker has failed or been shut down).”
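The periodic check described above can be modelled roughly as follows. This is a simplified sketch, not the controller's real code: the function names and the data layout are hypothetical, and the default threshold of 10 mirrors the broker setting leader.imbalance.per.broker.percentage. The key point it illustrates is that a partition is only eligible to move back when its preferred replica is in the in-sync replica set (ISR).

```python
def imbalance_ratio(broker, assignment, leaders):
    """Fraction of partitions preferring `broker` that it does not currently lead."""
    preferred = [p for p, replicas in assignment.items() if replicas[0] == broker]
    if not preferred:
        return 0.0
    not_led = [p for p in preferred if leaders[p] != broker]
    return len(not_led) / len(preferred)

def partitions_to_rebalance(assignment, leaders, isr, threshold_pct=10):
    """Partitions whose leadership could move back to the preferred replica.
    A partition qualifies only if its preferred replica is in the ISR."""
    moves = []
    for broker in sorted({replicas[0] for replicas in assignment.values()}):
        if imbalance_ratio(broker, assignment, leaders) * 100 > threshold_pct:
            moves += [
                p for p, replicas in assignment.items()
                if replicas[0] == broker and leaders[p] != broker and broker in isr[p]
            ]
    return moves

# Hypothetical state: broker 4 lost partition 2 to broker 3 and has just restarted.
assignment = {0: [2, 1], 1: [3, 2], 2: [4, 3], 3: [1, 4]}
leaders = {0: 2, 1: 3, 2: 3, 3: 1}

lagging_isr = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: {1}}          # broker 4 not caught up
caught_up_isr = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {1, 4}}  # broker 4 back in sync

print(partitions_to_rebalance(assignment, leaders, lagging_isr))    # []
print(partitions_to_rebalance(assignment, leaders, caught_up_isr))  # [2]
```

While broker 4 is still catching up, nothing moves, which is why the rebalance is eventual rather than immediate.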
Now for the next question.
If leaders are not equally distributed, how do we recover from this state?
We now know that leadership rebalancing can be triggered periodically and that the property controlling this is auto.leader.rebalance.enable. However, Kafka: The Definitive Guide does not recommend this setting, and it is not clear from the book why.
To answer this to a certain extent, I found this bug in the Kafka issue tracker.
link: https://issues.apache.org/jira/browse/KAFKA-4084
In a nutshell, automatic leader rebalancing has performance implications for clusters. Details can be read at the link above. The fix is in version 1.1.
So the question is: if we can't set the property because it's not recommended, how do we get out of this state?
The steps are below:
Step 1:
Do a rolling restart or, if only a few brokers are holding leadership for partitions that prefer other brokers, restart just those brokers, then check whether leadership is balanced.
Step 2:
If you are running a version that contains the fix, the auto.leader.rebalance.enable property can be used.
Its definition can be found here: https://kafka.apache.org/documentation/
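Step 3:
A third option is the preferred replica election tool that ships with Kafka (kafka-preferred-replica-election.sh in older releases, kafka-leader-election.sh in newer ones). It moves leadership back to the preferred replicas and works asynchronously.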
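Kafka's election tooling can be limited to specific partitions by passing it a JSON file that lists them. The sketch below builds such a document in Python; the topic name "my-topic" and the `election_payload` helper are hypothetical, and the exact command-line options depend on your Kafka version, so check the tool's help output before relying on this.

```python
import json

def election_payload(topic, partitions):
    """Build the {"partitions": [...]} document listing the partitions
    for which a preferred leader election should be requested."""
    return {"partitions": [{"topic": topic, "partition": p} for p in partitions]}

payload = election_payload("my-topic", [2])  # hypothetical topic name
print(json.dumps(payload))  # {"partitions": [{"topic": "my-topic", "partition": 2}]}
```

The output can be saved to a file and handed to the tool through its --path-to-json-file option. Without such a file, the older tool attempts the election for all partitions.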
I hope this gives a fair insight into how leadership works in Kafka.
Reference:
.https://www.confluent.io/resources/kafka-the-definitive-guide/