How to Install a ZooKeeper Cluster in Standalone Mode

Guisse
Apr 3, 2023

Introduction

I hope this helps anyone who needs to build a quick proof of concept (POC) by deploying a Kafka and ZooKeeper cluster in standalone mode.

Before we start, what is streaming data?

Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using stream processing techniques without having access to all of the data. In addition, it should be considered that concept drift may happen in the data which means that the properties of the stream may change over time.

It is typically used in the context of Big Data, where it is generated from many different sources at high speed. (Source: Wikipedia)

1. Zookeeper in standalone

Architecture

Minimum number of nodes: 3

Client: Clients, the nodes in our distributed application cluster, access information from the server. At regular intervals, every client sends a message to the server to let the server know that the client is alive. Similarly, the server sends an acknowledgement when a client connects. If there is no response from the connected server, the client automatically redirects the message to another server.

Server: A server, one of the nodes in our ZooKeeper ensemble, provides all the services to clients. It sends acknowledgements to the client to indicate that the server is alive.

ZooKeeper ensemble: A group of ZooKeeper servers. The minimum number of nodes required to form a fault-tolerant ensemble is 3.

  • If we have a single node, the ZooKeeper ensemble fails when that node fails. This is a “Single Point of Failure” and is not recommended in a production environment.
  • If we have two nodes and one node fails, we do not have a majority either, since one out of two is not a majority.
  • If we have three nodes and one node fails, we still have a majority, so three is the minimum requirement. A ZooKeeper ensemble must have at least three nodes in a live production environment.
  • If we have four nodes and two nodes fail, the ensemble fails again, so four nodes tolerate no more failures than three. The extra node does not serve any purpose, so it is better to add nodes in odd numbers.
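
The majority rule behind this list can be checked with a little arithmetic. The following is a minimal sketch, not part of ZooKeeper itself; the helper names `quorum` and `tolerated_failures` are made up for illustration. It assumes a simple majority quorum, that is floor(n/2) + 1 servers.

```shell
#!/bin/sh
# Majority (quorum) needed for an ensemble of n servers: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of server failures an ensemble of n servers can survive
# while still keeping a majority.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four nodes both tolerate a single failure, which is why adding a fourth node brings no benefit.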

Leader: The server node that performs automatic recovery if any of the connected nodes fail. Leaders are elected on service startup.

Follower: A server node that follows the leader's instructions.

Benefits of Distributed Applications

  • Reliability − Failure of a single system or a few systems does not cause the whole system to fail.
  • Scalability − Performance can be increased as and when needed by adding more machines, with minor changes to the configuration of the application and no downtime.
  • Transparency − Hides the complexity of the system and shows itself as a single entity / application.

Installation in standalone

Prerequisite: a running Linux VM (Ubuntu, CentOS, Red Hat).
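
Before going further, it may help to confirm that the tools used in this guide are present. ZooKeeper runs on the JVM, so Java is needed in addition to `wget` and `tar`. This is a small sketch; the helper name `have_cmd` is made up for illustration.

```shell
#!/bin/sh
# Return 0 if the named command is on PATH, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in this guide are available.
for cmd in java wget tar; do
    if have_cmd "$cmd"; then
        echo "found:   $cmd"
    else
        echo "missing: $cmd"
    fi
done
```

Anything reported as missing can be installed with your distribution's package manager before continuing.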

### Step 1: Creating a User for ZooKeeper

a. Create a separate user for the ZooKeeper service by typing:

sudo useradd zookeeper -m

The -m flag creates a home directory for the user. In this case, it will be /home/zookeeper

b. Next, set bash as the default shell for the new user with the command:

sudo usermod --shell /bin/bash zookeeper

c. Set a password for the user:

sudo passwd zookeeper

d. Then, add the user to the sudoers group for it to have sudo privileges:

sudo usermod -aG sudo zookeeper

e. Check to verify that the user is now a superuser by listing the accounts in the sudoers group:

sudo getent group sudo
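
The commands above use the `sudo` group, which is the Debian/Ubuntu convention. On CentOS and Red Hat the equivalent administrative group is usually `wheel`. The sketch below detects which one exists; the helper name `admin_group` is made up for illustration.

```shell
#!/bin/sh
# Print the admin group present on this system: "sudo" (Debian/Ubuntu)
# or "wheel" (CentOS/Red Hat). Returns 1 if neither exists.
admin_group() {
    for g in sudo wheel; do
        if getent group "$g" >/dev/null 2>&1; then
            echo "$g"
            return 0
        fi
    done
    return 1
}

group=$(admin_group) || group=""
if [ -n "$group" ]; then
    echo "Admin group: $group"
    echo "Run: sudo usermod -aG $group zookeeper"
else
    echo "No sudo or wheel group found"
fi
```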

### Step 2: Creating a ZooKeeper Data Directory

Before downloading ZooKeeper, create a directory structure where it can store configuration and state data.

a. To store the data on the local machine, first create a new ZooKeeper directory by running:

sudo mkdir -p /data/zookeeper

b. Then, give the zookeeper user ownership of that directory:

sudo chown -R zookeeper:zookeeper /data/zookeeper

### Step 3: Download ZooKeeper

a. Go back to the command line and move to the /opt directory:

cd /opt

b. Use the wget command to download the .tar.gz archive, using the link copied from the official Apache download page:

sudo wget https://dlcdn.apache.org/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
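
It is a good habit to verify the archive before extracting it. The sketch below compares the file's SHA-512 digest with the first field of a checksum file. The helper name `verify_sha512` is made up for illustration, and the `.sha512` file name in the usage comment is an assumption based on how Apache usually publishes checksums next to releases.

```shell
#!/bin/sh
# Compare a file's SHA-512 digest with the first field of a checksum file.
# Returns 0 on match, 1 on mismatch or if the checksum file is empty.
verify_sha512() {
    file=$1
    sumfile=$2
    expected=$(awk '{print $1; exit}' "$sumfile")
    actual=$(sha512sum "$file" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (assumes the checksum file was downloaded next to the archive):
#   verify_sha512 apache-zookeeper-3.7.1-bin.tar.gz \
#                 apache-zookeeper-3.7.1-bin.tar.gz.sha512 && echo "checksum OK"
```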

c. Extract the file by running:

sudo tar -xvf apache-zookeeper-3.7.1-bin.tar.gz

d. Rename the extracted directory to zookeeper with the command:

sudo mv apache-zookeeper-3.7.1-bin zookeeper

e. Give the zookeeper user ownership of that directory by running:

sudo chown -R zookeeper:zookeeper /opt/zookeeper

### Step 4: Configuring ZooKeeper as a Single Node

The next step is creating a configuration file for ZooKeeper. The configuration below sets up ZooKeeper in standalone mode (used for developing and testing). For production environments, you need to run ZooKeeper in replicated mode with at least three nodes.

a. To configure ZooKeeper in standalone mode, create a new zoo.cfg file in the zookeeper directory:

sudo vi /opt/zookeeper/conf/zoo.cfg

Copy the following configuration into zoo.cfg:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=5
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=2
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=60
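
Because initLimit and syncLimit are expressed in ticks, their real durations depend on tickTime. A quick calculation with the values from the configuration above makes this concrete. Note that these two limits only come into play once several servers synchronize with each other.

```shell
#!/bin/sh
# Values copied from the zoo.cfg above.
tickTime=2000
initLimit=5
syncLimit=2

# initLimit and syncLimit are counted in ticks; convert them to milliseconds.
init_timeout_ms=$(( tickTime * initLimit ))
sync_timeout_ms=$(( tickTime * syncLimit ))

echo "initial synchronization timeout: ${init_timeout_ms} ms"
echo "request/acknowledgement timeout: ${sync_timeout_ms} ms"
```

With these settings the initial synchronization phase may take up to 10 seconds, and a request may wait up to 4 seconds for its acknowledgement.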

b. Create the data directory referenced by dataDir:

sudo mkdir -p /tmp/data/zookeeper

### Step 5: Configuring ZooKeeper with Multiple Nodes

a. In this example we have 3 nodes. Copy the zookeeper directory three times:

cp -R zookeeper zk1 && cp -R zookeeper zk2 && cp -R zookeeper zk3

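b. Update the configuration file in each copy. The following steps assume the files are named zk1.cfg, zk2.cfg, and zk3.cfg, one per node.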
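
The contents of the per-node files are not shown here, so the following is only a sketch of what they might look like when all three nodes run on the same machine. The helper name `write_zk_cfg` and the port layout are assumptions: client ports 2181 to 2183, quorum ports 2888 to 2890, and election ports 3888 to 3890, all on localhost. The `server.N=host:quorumPort:electionPort` lines tell each node about its peers.

```shell
#!/bin/sh
# Write zkN.cfg for node N (first argument) into a directory (second argument).
# Hypothetical helper; the port numbers are an assumed layout for a
# single-machine test ensemble, not values taken from the article.
write_zk_cfg() {
    id=$1
    outdir=$2
    {
        echo "tickTime=2000"
        echo "initLimit=5"
        echo "syncLimit=2"
        echo "dataDir=/tmp/data/zk${id}"
        echo "clientPort=$(( 2180 + id ))"
        echo "maxClientCnxns=60"
        # Optional: avoids three embedded admin servers competing for port 8080.
        echo "admin.enableServer=false"
        for n in 1 2 3; do
            echo "server.${n}=localhost:$(( 2887 + n )):$(( 3887 + n ))"
        done
    } > "${outdir}/zk${id}.cfg"
}

# Generate the three files in a scratch directory for review.
outdir=$(mktemp -d)
for id in 1 2 3; do
    write_zk_cfg "$id" "$outdir"
done
cat "$outdir/zk2.cfg"
```

After reviewing the generated files, each one would be copied into the conf directory of its node, for example zk1.cfg into zk1/conf.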

c. Create a data directory for each node. These are the dataDir locations referenced in the zk[1,2,3].cfg files:

sudo mkdir -p /tmp/data/zk1 && \
sudo mkdir -p /tmp/data/zk2 && \
sudo mkdir -p /tmp/data/zk3

d. Create a file named myid in each data directory:

sudo touch /tmp/data/zk1/myid && \
sudo touch /tmp/data/zk2/myid && \
sudo touch /tmp/data/zk3/myid

e. Each zookeeper server should have a unique number in the myid file. For example, server 1 will have value 1, server 2 will have value 2 and so on.

sudo sh -c "echo '1' > /tmp/data/zk1/myid" && \
sudo sh -c "echo '2' > /tmp/data/zk2/myid" && \
sudo sh -c "echo '3' > /tmp/data/zk3/myid"
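
The same step can be written as a loop, which scales better when there are more nodes. This is a sketch with a made-up helper, `write_myid`, that also rejects ids outside the range 1 to 255 that ZooKeeper documents for myid. It writes into a scratch directory so it can be tried safely.

```shell
#!/bin/sh
# Write the numeric server id (first argument) into <datadir>/myid.
# Rejects anything that is not an integer between 1 and 255.
write_myid() {
    id=$1
    datadir=$2
    case "$id" in
        ''|*[!0-9]*) echo "invalid id: $id" >&2; return 1 ;;
    esac
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "id out of range: $id" >&2
        return 1
    fi
    mkdir -p "$datadir" && echo "$id" > "$datadir/myid"
}

# Scratch directory for a dry run; use /tmp/data for the layout in this guide.
base=$(mktemp -d)
for i in 1 2 3; do
    write_myid "$i" "$base/zk$i"
done
cat "$base/zk1/myid" "$base/zk2/myid" "$base/zk3/myid"
```

To apply it to the real layout, set `base=/tmp/data` and run the script as a user who can write to those directories.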

### Step 6: Starting and Connecting to ZooKeeper

a. To start the ZooKeeper service, use the following command.

Single node:

sudo ./bin/zkServer.sh start conf/zoo.cfg

Verify that the process is running by checking the log output.

Multiple nodes:

sudo ./zk1/bin/zkServer.sh start zk1/conf/zk1.cfg && \
sudo ./zk2/bin/zkServer.sh start zk2/conf/zk2.cfg && \
sudo ./zk3/bin/zkServer.sh start zk3/conf/zk3.cfg
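
Once the servers are up, each one can be asked which role it has. Running `zkServer.sh status` with the same configuration file is one way. Another is the `srvr` four-letter command sent to the client port, whose output includes a `Mode:` line. The sketch below extracts that line; the helper name `zk_mode` is made up for illustration, the sample input is abbreviated, and the usage comment assumes `nc` is installed.

```shell
#!/bin/sh
# Extract the "Mode:" value (standalone, leader, or follower) from the
# output of the srvr four-letter command, read on standard input.
zk_mode() {
    sed -n 's/^Mode: *//p'
}

# Dry run on an abbreviated sample of srvr output.
printf 'Zookeeper version: 3.7.1\nMode: follower\nNode count: 5\n' | zk_mode

# Against a running server (assumes nc is installed):
#   echo srvr | nc 127.0.0.1 2181 | zk_mode
```

In a healthy three-node ensemble, one server should report leader and the other two follower.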

b. Connect to ZooKeeper with the command:

Single node:

bin/zkCli.sh -server 127.0.0.1:2181

Multiple nodes:

zk1/bin/zkCli.sh -server 127.0.0.1:2181
# or, to connect through the second node:
# zk2/bin/zkCli.sh -server 127.0.0.1:2182

c. To close the session, type:

quit

### Step 7: Creating and Using a Systemd Unit File

a. Use your editor to create a .service file named zk.service at /etc/systemd/system/.

sudo vi /etc/systemd/system/zk.service

b. Add the following lines to the file to define the service:

[Unit]
Description=Zookeeper Daemon Mamadou cire GUISSE
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
WorkingDirectory=/opt/zookeeper
User=zookeeper
Group=zookeeper
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=default.target

c. Save the file and exit the editor.

d. Now that your systemd configuration is in place, you can start the service

sudo systemctl start zk

e. Once you’ve confirmed that your systemd file can successfully start the service, you will enable the service to start on boot.

sudo systemctl enable zk

f. Check or change the state of the ZooKeeper service:

sudo systemctl [status|stop|restart] zk

Your single-node or multi-node ZooKeeper deployment is now ready to use.

You can send me a message if you have a question.

Thank you for following this proof of concept!

Guisse

Big Data Engineer, Full-Stack Developer, DevOps and Cloud Builder