What is ZooKeeper?

In distributed applications, we face major challenges such as data inconsistency and a lack of consensus (general agreement) among nodes. Because every distributed system has to maintain consistency, a lot of effort goes into overcoming these challenges. ZooKeeper provides a centralized service for maintaining configuration information, naming, and distributed synchronization.

To achieve high availability, multiple ZooKeeper servers can be used instead of one. A group of ZooKeeper servers is called an ensemble. Every server in the ensemble stores a copy of the data; the data is replicated across the hosts in the ensemble to guarantee that it stays highly available.

The servers in the ensemble must all know about each other. Each server maintains an in-memory image of the state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service is available. For example, a ZooKeeper ensemble with five servers remains available as long as three of the five servers are healthy.
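The majority rule above is simple arithmetic, and a short sketch makes it concrete. The function names below are illustrative only and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still healthy."""
    return ensemble_size - quorum_size(ensemble_size)


def is_available(ensemble_size: int, healthy: int) -> bool:
    """The service stays up only while the healthy servers form a majority."""
    return healthy >= quorum_size(ensemble_size)


for n in (3, 4, 5, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```

Note that going from five servers to six does not increase the number of failures that can be tolerated, which is why ensembles are usually given an odd number of servers.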

Furthermore, a leader for the ensemble is chosen through leader election.

What is leader election?

When the ZooKeeper servers first start up, a leader is elected. The leader is responsible for maintaining consensus. Leader election also takes place when the existing leader fails.

In an ensemble, clients may connect to any of the active servers. Any server can answer read queries directly from its own copy of the data. An update command, however, must pass through the leader: the server that receives it forwards it to the leader. All update requests go through the leader so that they are applied in a single, consistent order across the ensemble.

How is high availability guaranteed by ZooKeeper?

When the leader receives an update command, it stamps the update with a number (the ZooKeeper transaction ID, or zxid) that reflects its position in the order of all transactions. The leader then broadcasts the update request along with the stamped number and waits until a majority of the following servers (followers) respond. When a follower receives the update request, it checks the stamped number. If the number is larger than the last transaction recorded in its own log, it replies to the leader that it agrees on the number. Only once a majority of the servers in the ensemble have acknowledged the request does the leader enter the transaction into its own log and replicate it across the ensemble. The update is then executed and a response is sent back to the client.

Figure: write request passing through the leader
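The write path described above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not ZooKeeper's actual implementation: the Leader and Follower classes and their methods are hypothetical, the leader is assumed to count itself toward the majority, and recovery of followers that missed updates is not modelled. The real ZAB protocol differs in details (for example, followers persist a proposal before acknowledging it).

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    name: str
    alive: bool = True
    log: list = field(default_factory=list)  # (zxid, update) pairs

    def last_zxid(self) -> int:
        return self.log[-1][0] if self.log else 0

    def on_proposal(self, zxid: int) -> bool:
        """Acknowledge only if the stamp is newer than the last logged one."""
        return self.alive and zxid > self.last_zxid()

    def on_commit(self, zxid: int, update: str) -> None:
        if self.alive:
            self.log.append((zxid, update))


@dataclass
class Leader:
    followers: list
    log: list = field(default_factory=list)
    next_zxid: int = 1

    def write(self, update: str) -> bool:
        # Stamp the update with the next number in the global order.
        zxid = self.next_zxid
        self.next_zxid += 1
        # Assumption: the leader counts itself as one acknowledgment.
        acks = 1 + sum(f.on_proposal(zxid) for f in self.followers)
        ensemble_size = len(self.followers) + 1
        if acks < ensemble_size // 2 + 1:
            return False  # no majority, the update is not applied
        # Majority reached: log locally, then replicate to the followers.
        self.log.append((zxid, update))
        for f in self.followers:
            f.on_commit(zxid, update)
        return True


followers = [Follower(f"f{i}") for i in range(4)]  # five servers in total
leader = Leader(followers)
print(leader.write("set /app1 v1"))  # True: all five servers agree

followers[0].alive = followers[1].alive = False
print(leader.write("set /app1 v2"))  # True: three of five still agree

followers[2].alive = False
print(leader.write("set /app1 v3"))  # False: only two of five remain
```

The third write is rejected because two healthy servers out of five are not a majority, which matches the availability rule for a five-server ensemble.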

Data model of ZooKeeper

ZooKeeper maintains a hierarchical structure of nodes, known as znodes. Each znode in the namespace has data associated with it and may also have children. The node structure is very similar to that of a standard file system.

Figure: znode hierarchy

The name of a znode is a sequence of path elements separated by a slash (/). Every znode is identified by its path. There are two main types of znodes.

  1. Persistent znodes - These znodes are persisted on disk and remain even after the session that created them is no longer active. To create a persistent znode, use the command create /<path> <data>.
  2. Ephemeral znodes - These znodes exist only until the session that created them ends. When the session ends, the znode is deleted. To create an ephemeral znode, use the command create -e /<path> <data>. The -e flag indicates that the znode is ephemeral.
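To illustrate the path-based hierarchy and the difference in lifetime between the two znode types, here is a minimal in-memory sketch. The ZNodeStore class is hypothetical and only mimics the behaviour described above; it is not how ZooKeeper stores data, and session IDs are plain integers chosen for the example.

```python
class ZNodeStore:
    """Toy namespace keyed by path; not the real ZooKeeper data structures."""

    def __init__(self):
        # path -> (data, owner session ID, or None for persistent znodes)
        self.nodes = {"/": (b"", None)}

    def create(self, path, data, session=None, ephemeral=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (data, session if ephemeral else None)

    def children(self, path):
        """Names of the direct children of a znode, like `ls <path>`."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        """Ending a session deletes every ephemeral znode it owns."""
        owned = [p for p, (_, owner) in self.nodes.items()
                 if owner is not None and owner == session]
        for p in owned:
            del self.nodes[p]


store = ZNodeStore()
store.create("/app1", b"config")                               # persistent
store.create("/app1/worker", b"", session=42, ephemeral=True)  # ephemeral
print(store.children("/app1"))  # ['worker']

store.close_session(42)
print(store.children("/app1"))  # []
print(store.children("/"))      # ['app1']
```

The persistent znode /app1 survives the end of session 42, while the ephemeral znode /app1/worker disappears with it.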

Commands to interact with ZooKeeper

Once the ZooKeeper servers are up, run zkCli.cmd in /ZooKeeper/bin. In the command prompt that opens, run the following commands.

  1. create /app1 <some_arbitrary_data> creates a new znode with the path “/app1”
  2. ls / lists all the znodes under the root

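3. get /app1 shows the statistics associated with the znode /app1 (in ZooKeeper 3.5 and later, get prints only the data unless the -s flag is given, as in get -s /app1)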

Figure: znode statistics

Statistics of a znode

Znodes maintain statistics that include the data associated with the node, the data length, version numbers, transaction IDs, timestamps, the number of children, and the session ID of the znode's owner (ephemeralOwner).

Every change made to a znode increases one of its version numbers, and each version number has a different meaning. dataVersion is the number of changes made so far to the data associated with the znode. cversion is the number of changes made to the children of the znode. aclVersion is the number of changes made to the ACL (Access Control List) of the znode.

ctime is the creation time and mtime is the last modification time of the znode.

cZxid is the ZooKeeper transaction ID of the change that created this znode. mZxid is the ZooKeeper transaction ID of the change that last modified this znode.

ephemeralOwner is non-zero only if the znode is ephemeral, in which case it holds the session ID of the owner of the znode. For persistent znodes it is always zero.
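The bookkeeping behind these fields can be sketched with a small model. The Stat fields below reuse the names printed by the ZooKeeper CLI, but the StatTracker class is a hypothetical, simplified stand-in: it ignores ACLs, timestamps, and deletion, and assumes transaction IDs simply count up from 1.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Stat:
    cZxid: int               # transaction that created the znode
    mZxid: int               # transaction that last modified its data
    dataVersion: int = 0     # number of changes to the data
    cversion: int = 0        # number of changes to the children
    aclVersion: int = 0      # number of changes to the ACL
    ephemeralOwner: int = 0  # owner session ID, 0 for persistent znodes
    dataLength: int = 0
    numChildren: int = 0


class StatTracker:
    """Toy model of how znode statistics change; not ZooKeeper's own code."""

    def __init__(self):
        self._zxid = count(1)  # assumed: zxids are handed out sequentially
        self.stats = {"/": Stat(cZxid=0, mZxid=0)}

    def create(self, path, data, owner_session=0):
        parent = self.stats[path.rsplit("/", 1)[0] or "/"]
        zxid = next(self._zxid)
        self.stats[path] = Stat(cZxid=zxid, mZxid=zxid,
                                ephemeralOwner=owner_session,
                                dataLength=len(data))
        # Adding a child is a change to the parent's children.
        parent.cversion += 1
        parent.numChildren += 1

    def set_data(self, path, data):
        stat = self.stats[path]
        stat.mZxid = next(self._zxid)
        stat.dataVersion += 1
        stat.dataLength = len(data)


tracker = StatTracker()
tracker.create("/app1", b"hello")     # zxid 1
tracker.create("/app1/p1", b"child")  # zxid 2, bumps /app1's child counters
tracker.set_data("/app1", b"hello2")  # zxid 3, bumps /app1's dataVersion

app1 = tracker.stats["/app1"]
print(app1.cZxid, app1.mZxid, app1.dataVersion, app1.cversion, app1.numChildren)
# 1 3 1 1 1
```

Creating a child changes only the parent's cversion and numChildren, while changing the data changes only dataVersion and mZxid, so the counters show what kind of change happened to a znode.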

4. create /app1/p1 <app1’s_child's_data> creates a child znode under /app1.

5. ls /app1 lists all child znodes of /app1

6. get /app1 gets the statistics of znode /app1

Here we can see that numChildren and cversion have each increased by 1, because we created the child znode /app1/p1.

7. get /app1/p1 gives the statistics of znode /app1/p1

So far, we have seen a high-level picture of the ZooKeeper service, which can be used to maintain consensus in a distributed system.

Example scenario of ZooKeeper

For instance, imagine an online game where users must enter a username in order to join. The username is no longer valid once the game is over, so we need to ensure that each username is unique for the duration of the session. For this purpose, we can create an ephemeral znode named after the username the player provides. This znode can also store data about the player, and it is deleted once the game is over and the player's session ends. Since no two znodes can share the same path, the uniqueness of the username is guaranteed.
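The scenario can be sketched without a running ZooKeeper server by simulating the one property it relies on: a path can be created only once, and it vanishes when its owning session ends. The Lobby class, the UsernameTaken exception, and the /players/<username> path layout are all assumptions made for this illustration.

```python
class UsernameTaken(Exception):
    """Raised when the znode path for a username already exists."""


class Lobby:
    """Simulates ephemeral znodes under a hypothetical /players parent."""

    def __init__(self):
        self._znodes = {}  # path -> (owner session ID, player data)

    def join(self, session_id, username, data=b""):
        path = f"/players/{username}"  # hypothetical path layout
        if path in self._znodes:
            # Two znodes cannot share a path, so the name is already in use.
            raise UsernameTaken(username)
        self._znodes[path] = (session_id, data)
        return path

    def end_session(self, session_id):
        # Ephemeral znodes are removed together with their session.
        self._znodes = {p: v for p, v in self._znodes.items()
                        if v[0] != session_id}


lobby = Lobby()
print(lobby.join(1, "neo", b"level=3"))  # /players/neo

try:
    lobby.join(2, "neo")
except UsernameTaken:
    print("name already in use")         # printed: session 1 holds the name

lobby.end_session(1)                     # the game is over for session 1
print(lobby.join(2, "neo"))              # /players/neo
```

With a real ZooKeeper client the existence check and the creation would be a single create call that fails if the node already exists, so the server, rather than the application, decides which player gets the name.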

Conclusion

ZooKeeper is a service that can be used to solve distributed systems problems effectively, and it is based on the ZAB (ZooKeeper Atomic Broadcast) algorithm. The ZooKeeper service guarantees that updates from a client are applied in the order in which they were sent, and that clients see the same view of the service regardless of which server they are connected to.