Apache Kafka® is a distributed streaming platform and high availability can be achieved by proper configuration settings in most of the environment. But if you want to set up a Kafka cluster in Microsoft Azure, with its unique Fault Domain and Update Domain flavor and the SLA for Virtual Machine, the high availability Kafka service could be impaired even if you strictly follow suggestions from HCC (Hortonworks Community Connection).
Build a Kafka Azure Client Tool to achieve High Availability Kafka Service in Microsoft Azure Cloud.
Kafka application is designed to be highly available and resilient to node failures if you can manage your kafka and topic configuration properly, such as use right replica factor for redundancy and use rack awareness (kafka 0.10) to spread replicas across racks. But the rack awareness feature will be impaired if you setup Kafka cluster on Microsoft Azure Cloud.
The Virtual Machine on Microsoft Azure could be impacted from some scenarios: unplanned hardware maintenance, unexpected downtime, and planned maintenance. VM will experience Unexpected downtime (reboot) and in some cases loss of the temporary drive. In additional to that, VM could be rebooted and relocated to a different server or different rack. To reduce the impact to your system because of one or more of these events, Microsoft recommended user to configure multiple virtual machines in an availability set for high availability and redundancy.
What is availability set?
An availability set is a logical grouping of VMs within a data center to provide for redundancy and availability. Each virtual machine in Azure cloud is assigned a fault domain and an update domain by the underlying Azure platform. Fault domains define the group of virtual machines that share a common power source and network switch (similar concept as Rack). Update domain indicates groups of virtual machines and underlying physical hardware that can be rebooted at the same time.
Being aware of Azure VM maintenance and unexpected downtime could impact the high availability Kafka service, Microsoft has provided a rebalance tool in their HDinsight managed service. The tool will provide rebalance feature for existing topic partitions, but not for topic creation and will not be aware of the VM(broker) domains changing after reboot. And this tool is only accessible in specific Microsoft managed service environment. So we need an independent tool which is generic and accessible by any Kafka instance or cluster.
Scope and Features
Our Kafka Azure Client Tool can be used as same fashion as Kafka client to achieve High Availability in Azure Cloud. It has features as
- Create the topic partitions in high availability set
- Rebalance the existing partitions which are in risk, reassign them to high available set
It also has additional features as
- Set up cronjob to detect the VM domain change and auto trigger the rebalance job for those partitions impacted, reassign the partition to make them high available again.
- Be used as java API, called by any java program to create topic, reassign the topic partitions in Azure environment with High Availability feature.
A: The following example has 6 Virtual Machine created on 2 Fault Domain and 3 Update Domain. The topic has 3 partitions: P0, P1, P3. Each partition has 2 replicas: Replica1, Replica2
(Left) The partitions are all in risk for the high availability, reasons are as explained.
(Right) After using Kafka Azure Client Tool to rebalance the topic. All 3 partitions’ replicas are reassigned to proper brokers(VM) and achieved high availability — replicas are all on different Fault Domain and different Update Domain.
B: This example explains how high availability is re-achieved after broker is down and brought up on different Update Domain which risks the high availability for the replicas resides on this broker.
1. Topic is created by Kafka Azure Client Tool and are high available.
2. Broker 3 is down and brought up on different Fault Domain and Update Domain. It changed from FD2UD2 to FD1UD4. Partition 1 and Partition 2 are in risk for high availability because Broker 2 and Broker 3 are on same Fault Domain FD1. Update Domain job will detect the domain change for Broker 3 and rebalance the partitions resides on Broker 3
3. Rebalance job is auto-triggered for Broker 3. After reassigning the replicas for Partition 1 and Partition 2, the high availability is re-achieved again.
In order to 1) find a set of high available brokers for replica efficiently for all the partitions 2) fairly distribute the replicas across Kafka cluster, following algorithms and logic are implemented to achieve the goals.
- Work out a FaultDomain/UpdateDomain list, from which an availability set can be got in sequence
e.g. 20 VMs in an Azure Cloud with 3 Fault Domain and 20 Update Domain
- Get fair replicas distribution of replicas by picking random starting point. The leader and follower counts will be considered to elect the lead for the replicas.
- When the broker VM resides in an environment that availability set in sequence is violated from the random starting point, the tool will switch the window in calculated FDUD list to next index.
- In some circumstance, such as most of the brokers are down, or only 1 or 2 Fault domains are utilized. The FDUD list could result to “No High Available Set is available”. But you still want to create the topics to avoid data lost, “-force” option can be used to create the topic or reassign the replicas regardless. When the Kafka cluster is back to healthy, you can trigger the rebalance task to move partition replicas back to high availability set.
- VM Fault Domain and Update Domain info will be created and stored as ZNode in zookeeper. (/brokers/domains/<brokerId>→ FD0UD0)
Kafka open issues
During the development, we have encountered some Kafka open issues for replica reassignment feature. We have made some improvement in our Kafka Azure Client Tool for some open issue. For some we will improve in the future and adopt the new protocol when Kafka has their issue fixed.
- KAFKA-1792 change behavior of — generate to produce assignment config with fair replica distribution and minimal number of reassignments
Reassignment partition is a labor-intensive and the [-generate] option will not take into consideration current replica assignment and will not try to minimize number of replica moves between brokers. A big replica reassignment could impact the cluster performance. It is to be addressed in above Kafka open ticket.
Improved in Kafka Azure Client Tool: rebalance feature is able to take specific broker Id as argument and only reassign the partitions reside on this broker to minimize number of replica movement.
- KAFKA-5601 Refactor ReassignPartitionsCommand to use AdminClient
Currently the ReassignPartitionsCommand (used by kafka-reassign-partitions.sh) talks directly to ZooKeeper. It would be better to have it uses the AdminClient API instead. Kafka Azure Client Tool is also use ReassignPartitionsCommand to reassign the replicas via ZooKeeper. When this Kafka open ticket resolved, we can change it to use AdminClient API accordingly.
Improved in Kafka Azure Client Tool: The New Topic Creation uses new AdminClient API.
Other improvements in the future
Disk usage can be taken into consideration when assigning the brokers for replicas.
Automatically reassign some partition loads to newly joined brokers to achieve better fair distribution and load balance.