Zookeeper in Kafka

Logeesan Jegatheswaran
3 min read · Aug 21, 2019


This post continues my previous article. Zookeeper is an essential component of Kafka. In this post you will learn what Zookeeper is and which services it provides to Kafka.

What is Zookeeper?

Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind Zookeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch. The service itself is distributed and highly reliable.

The service implements consensus, group management, and presence protocols so that applications do not need to implement them on their own. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.
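To make the directory-tree data model more concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the real Zookeeper client API: the class `ZNodeTree` and the paths `/app` and `/app/config` are made up for illustration.

```python
class ZNodeTree:
    """Toy in-memory model of a Zookeeper-style namespace (hypothetical, not the real API)."""

    def __init__(self):
        # Map absolute path -> data bytes; the root node always exists.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        # As in a file system, a node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def children(self, path):
        # Direct children only: names under the prefix that contain no further "/".
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"max_connections=10")
print(tree.children("/app"))
```

Unlike a plain file system, every node in this model can hold data and have children at the same time, which is the property the real service is known for.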

Zookeeper data is kept in memory, which lets Zookeeper achieve high throughput and low latency. Zookeeper excels in performance, reliability, and strict ordering. Its performance means it can be used in large distributed systems. Its reliability keeps it from being a single point of failure. Its strict ordering means that sophisticated synchronization primitives can be implemented at the client.

Common components of Zookeeper architecture

Figure: Zookeeper architecture (Ref: https://zookeeper.apache.org/doc/r3.5.1-alpha/zookeeperOver.html)
  • Node: A system (machine) installed in the cluster
  • ZNode: A data node in the Zookeeper tree whose status can be updated by other nodes in the cluster
  • Client Applications: The tools that interact with the distributed applications
  • Server Applications: Allow the client applications to interact through a common interface

The service is replicated over a set of servers (called an “ensemble”), each of which maintains an in-memory database containing the entire data tree as well as a transaction log and snapshots stored persistently. A client connects to a single Zookeeper server and maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client connects to a different server.
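The reconnect behaviour described above can be sketched as follows. This is a simplified simulation under stated assumptions, not the real client: `ToyClient`, the `is_up` callback, and the server names `zk1:2181` to `zk3:2181` are hypothetical, and the real client's server selection may differ.

```python
class ToyClient:
    """Toy client that holds one connection and fails over to another server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.current = None  # server we are currently connected to, if any

    def connect(self, is_up):
        # Try every server except the one we just lost; take the first reachable one.
        for server in self.servers:
            if server != self.current and is_up(server):
                self.current = server
                return server
        raise ConnectionError("no reachable server in the ensemble")


ensemble = ["zk1:2181", "zk2:2181", "zk3:2181"]
up = set(ensemble)

client = ToyClient(ensemble)
client.connect(lambda s: s in up)   # initial connection
up.discard(client.current)          # simulate that server going down
client.connect(lambda s: s in up)   # client moves to a different server
print(client.current)
```

Because every server in the ensemble holds the full data tree, the client can continue against any remaining server after the switch.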

Why is Zookeeper essential for Apache Kafka?

Zookeeper in Kafka

Controller election

Within a Kafka cluster, a single broker serves as the active controller, which is responsible for managing the state of partitions and replicas. For example, if there are 10 brokers, exactly one of them acts as the controller.
The controller maintains the leader-follower relationship across all partitions. If a broker is about to fail, the controller tells replicas on other brokers to take over as partition leaders for the partitions led by the failing broker. If the controller broker itself shuts down, a new controller is elected to take over these duties. This controller election is done through Zookeeper.
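One common way to picture this election is that brokers race to create a single ephemeral node, and whoever succeeds is the controller until its session ends. The sketch below simulates that idea in plain Python. `ToyZooKeeper` and `elect_controller` are hypothetical helpers, and the `/controller` path is an assumption about how Kafka names this node.

```python
class ToyZooKeeper:
    """Toy stand-in for Zookeeper that only tracks ephemeral nodes."""

    def __init__(self):
        self.ephemeral = {}  # path -> id of the session that owns the node

    def create_ephemeral(self, path, session_id):
        # Creation fails if the node already exists, so only one creator wins.
        if path in self.ephemeral:
            return False
        self.ephemeral[path] = session_id
        return True

    def close_session(self, session_id):
        # Ephemeral nodes disappear together with the session that created them.
        for path in [p for p, s in self.ephemeral.items() if s == session_id]:
            del self.ephemeral[path]


def elect_controller(zk, broker_ids):
    """Each broker tries to create the node; return whoever holds it afterwards."""
    for broker_id in broker_ids:
        if zk.create_ephemeral("/controller", broker_id):
            return broker_id
    return zk.ephemeral["/controller"]


zk = ToyZooKeeper()
print(elect_controller(zk, [1, 2, 3]))  # broker 1 wins the first race
zk.close_session(1)                     # controller broker shuts down
print(elect_controller(zk, [2, 3]))     # one of the remaining brokers takes over
```

In the simulation the brokers try in list order, so the result is deterministic. In a real cluster the winner is simply whichever broker creates the node first.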

Configuration of topics

Zookeeper stores the configuration of all topics, including the list of existing topics, the number of partitions for each topic, the location of all the replicas, the configuration overrides for each topic, and which node is the preferred leader.
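As an illustration of what such topic metadata might look like, the sketch below parses a JSON document of the kind Kafka keeps per topic. The JSON layout is an assumption and may differ between Kafka versions. It also assumes the first replica in each list is the preferred leader. The helper `summarize_topic` and the broker ids are made up for the example.

```python
import json

# Assumed shape of a topic's replica assignment: partition id -> list of broker ids.
assignment_json = '{"version": 1, "partitions": {"0": [1, 2], "1": [2, 3], "2": [3, 1]}}'


def summarize_topic(raw):
    """Derive the partition count and preferred leaders from the assignment."""
    partitions = json.loads(raw)["partitions"]
    return {
        "partition_count": len(partitions),
        # Assumption: the first replica listed is treated as the preferred leader.
        "preferred_leaders": {int(p): replicas[0] for p, replicas in partitions.items()},
    }


print(summarize_topic(assignment_json))
```

Keeping this data in one shared place means every broker sees the same number of partitions and the same replica placement for a topic.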

Access Control Lists

Access control lists or ACLs for all the topics are also maintained within Zookeeper.

Membership of the clusters

Zookeeper also maintains a list of all the brokers that are functioning at any given moment and are a part of the cluster.
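A rough way to think about membership tracking is that each broker keeps a session alive with heartbeats, and a broker whose heartbeats stop is dropped from the list once its session times out. The sketch below simulates this with explicit timestamps. `ToyMembership` and the timeout value are hypothetical and only illustrate the idea.

```python
class ToyMembership:
    """Toy broker registry driven by heartbeats and a session timeout."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}  # broker id -> time of the most recent heartbeat

    def heartbeat(self, broker_id, now):
        self.last_heartbeat[broker_id] = now

    def live_brokers(self, now):
        # A broker counts as live while its last heartbeat is within the timeout.
        return sorted(
            b for b, t in self.last_heartbeat.items()
            if now - t <= self.session_timeout
        )


members = ToyMembership(session_timeout=6)
members.heartbeat(1, now=0)
members.heartbeat(2, now=0)
members.heartbeat(1, now=5)            # broker 2 stops sending heartbeats
print(members.live_brokers(now=8))     # only broker 1 is still within the timeout
```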
