Apache Kafka Guide #9 Zookeeper

Paul Ravvich
Apache Kafka At the Gates of Mastery
3 min read · Dec 30, 2023


Hi, this is Paul, and welcome to part 9 of my Apache Kafka guide. Today we’re going to talk about how Zookeeper works.

Zookeeper is the software that manages Kafka brokers: it maintains the list of brokers in the cluster. It’s crucial for Kafka, especially when a broker fails, because it helps elect new partition leaders and notifies brokers about changes such as new topics, topic deletions, and brokers going up or down. As a result, Zookeeper holds a significant amount of Kafka metadata.
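To make the "list of brokers plus notifications" idea concrete, here is a small toy model in Python. It is only a sketch: the class name `ToyBrokerRegistry`, its methods, and the event names are all made up for illustration and are not Zookeeper's or Kafka's real API.

```python
from typing import Callable, Dict, List


class ToyBrokerRegistry:
    """Toy in-memory stand-in for the broker list Zookeeper keeps.

    Hypothetical names throughout; this is not a real Zookeeper client.
    """

    def __init__(self) -> None:
        self._brokers: Dict[int, str] = {}  # broker id -> "host:port"
        self._watchers: List[Callable[[str, int], None]] = []

    def watch(self, callback: Callable[[str, int], None]) -> None:
        # Anyone interested in membership changes registers a callback.
        self._watchers.append(callback)

    def register(self, broker_id: int, address: str) -> None:
        # A broker starts up and announces itself.
        self._brokers[broker_id] = address
        self._notify("broker_up", broker_id)

    def unregister(self, broker_id: int) -> None:
        # A broker goes away; only notify if it was actually registered.
        if self._brokers.pop(broker_id, None) is not None:
            self._notify("broker_down", broker_id)

    def live_brokers(self) -> List[int]:
        return sorted(self._brokers)

    def _notify(self, event: str, broker_id: int) -> None:
        for callback in self._watchers:
            callback(event, broker_id)


if __name__ == "__main__":
    registry = ToyBrokerRegistry()
    registry.watch(lambda event, broker_id: print(event, broker_id))
    registry.register(1, "host1:9092")
    registry.register(2, "host2:9092")
    registry.unregister(1)
    print(registry.live_brokers())
```

The real thing is distributed and far more involved, but the shape is similar: a shared registry of live brokers, plus a way to be told when that registry changes.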

Up to version 2.x, Kafka could not run without Zookeeper; since Kafka’s inception, Zookeeper had been indispensable. From Kafka 3.x onwards, however, Kafka can operate without Zookeeper by using the Kafka Raft mechanism, known as KRaft, which was declared production-ready for new clusters in Kafka 3.3. To learn more, search for ‘KIP-500’. In Kafka 4.x, Zookeeper is no longer part of Kafka at all.
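One practical way to tell the two modes apart is to look at a node's properties. The sketch below, in Python, assumes the properties have already been loaded into a dict; the helper name `metadata_mode` and the host and port values are placeholders of mine, while `zookeeper.connect`, `process.roles`, `node.id`, and `controller.quorum.voters` are the actual Kafka configuration keys.

```python
def metadata_mode(props: dict) -> str:
    """Guess whether a Kafka node is configured for KRaft or for Zookeeper.

    Hypothetical helper: it only inspects which well-known keys are present.
    """
    if "process.roles" in props:
        return "kraft"
    if "zookeeper.connect" in props:
        return "zookeeper"
    return "unknown"


# Zookeeper-based broker (placeholder host and port).
zk_broker = {
    "broker.id": "1",
    "zookeeper.connect": "localhost:2181",
}

# KRaft node acting as both broker and controller (placeholder values).
kraft_node = {
    "node.id": "1",
    "process.roles": "broker,controller",
    "controller.quorum.voters": "1@localhost:9093",
}

if __name__ == "__main__":
    print(metadata_mode(zk_broker))
    print(metadata_mode(kraft_node))
```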

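A Zookeeper cluster typically runs an odd number of servers, commonly 1, 3, 5, or 7, because it needs a majority of servers to agree and an even count adds no extra fault tolerance. More than 7 is rarely recommended, since every write has to be acknowledged by a majority. The cluster has a leader-follower structure: the leader handles writes, and the followers serve reads.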
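The reason odd numbers are preferred comes down to simple majority arithmetic. Here is a minimal Python sketch of it; the function names are mine, and the only assumption is that the cluster needs a strict majority of servers to stay available.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still alive."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    for n in (1, 3, 4, 5, 7):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 servers tolerate only 1 failure, the same as 3, so the extra server buys nothing in terms of fault tolerance.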

An outdated yet still prevalent misconception is that Kafka consumers store offsets in Zookeeper. However, since Kafka version 0.10, consumer offsets are stored in an internal Kafka topic named ‘__consumer_offsets’, not in Zookeeper. This clarification is important because the confusion persists.
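If it helps, you can picture the offsets topic as a log of keyed records where the latest record per key wins. The Python sketch below is a toy model under that assumption: `ToyOffsetsTopic` and its methods are hypothetical names, and nothing here talks to a real Kafka cluster.

```python
from typing import Dict, List, Optional, Tuple

# A commit is keyed by (consumer group, topic, partition).
OffsetKey = Tuple[str, str, int]


class ToyOffsetsTopic:
    """Toy model of an offsets log: append commits, latest value per key wins."""

    def __init__(self) -> None:
        self._log: List[Tuple[OffsetKey, int]] = []

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        # Every commit is appended as a new record; nothing is updated in place.
        self._log.append(((group, topic, partition), offset))

    def compacted(self) -> Dict[OffsetKey, int]:
        # Keep only the most recent offset for each key.
        latest: Dict[OffsetKey, int] = {}
        for key, offset in self._log:
            latest[key] = offset
        return latest

    def fetch(self, group: str, topic: str, partition: int) -> Optional[int]:
        return self.compacted().get((group, topic, partition))


if __name__ == "__main__":
    offsets = ToyOffsetsTopic()
    offsets.commit("billing", "orders", 0, 10)
    offsets.commit("billing", "orders", 0, 25)
    print(offsets.fetch("billing", "orders", 0))
```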

  • Zookeeper keeps a list of brokers and manages them.
  • Zookeeper helps with leader election for partitions.
  • Zookeeper sends notifications about topics being created or deleted, brokers going up or down, etc.
  • A Zookeeper cluster runs an odd number of servers (1, 3, 5, 7). More than 7 Zookeeper instances is rarely recommended.
  • A Zookeeper cluster has a Zookeeper leader for writes and Zookeeper followers for reads.
  • Zookeeper has not stored consumer offsets since Kafka v0.10.
  • Apache Kafka 2.x does not work without Zookeeper.
  • Kafka 3.x can work without Zookeeper (KIP-500, Kafka Raft; production-ready for new clusters since Kafka 3.3).
  • Kafka 4.x has no Zookeeper at all.

Zookeeper Cluster

Consider a Zookeeper cluster with three servers. One of them acts as the leader, and the other two are followers. The brokers connect to Zookeeper, which is how they access their metadata.

Regarding Kafka clients, over time, they have transitioned to depend solely on the brokers for connectivity, rather than Zookeeper.

Previously, producers, consumers, and administrative clients connected through Zookeeper. Therefore, you might still encounter references to the Zookeeper option in various online resources.

All Kafka clients and command-line interfaces have transitioned to use only Kafka brokers for establishing connections. This applies even to consumers, which previously connected to Zookeeper. Since Kafka version 2.2, the kafka-topics command also points at Kafka brokers instead of Zookeeper. This shift is significant because the community has made a concerted effort to move all commands from Zookeeper to Kafka. It is crucial for a future without Zookeeper in Kafka: clients face no issues because they never expect a Zookeeper to be present. Another reason for phasing out Zookeeper is that it is less secure than Kafka. Therefore, if Zookeeper is still in use, it should be configured to accept connections only from Kafka brokers and not from Kafka clients.
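To see what that change looks like on the command line, here is a small Python sketch that builds the argument list for listing topics. The helper name and the addresses are placeholders of mine; `--bootstrap-server`, `--zookeeper`, and `--list` are the real kafka-topics flags, and the `--zookeeper` one only exists in older Kafka releases (it was removed from the tool in Kafka 3.0).

```python
from typing import List


def kafka_topics_list_command(connect_to: str, use_zookeeper: bool = False) -> List[str]:
    """Build the argv for listing topics (hypothetical helper, nothing is executed).

    use_zookeeper=True reproduces the legacy form found in older tutorials.
    """
    flag = "--zookeeper" if use_zookeeper else "--bootstrap-server"
    return ["kafka-topics.sh", flag, connect_to, "--list"]


if __name__ == "__main__":
    # Current form: talk to a broker (placeholder address).
    print(" ".join(kafka_topics_list_command("localhost:9092")))
    # Legacy form: talk to Zookeeper (placeholder address).
    print(" ".join(kafka_topics_list_command("localhost:2181", use_zookeeper=True)))
```

If you find a tutorial using the second form, swapping the flag and pointing it at a broker address is usually all it takes.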

See you in the next part of the guide!

