Journey of Apache Kafka & Zookeeper Administrator ( Part 2 )

Davinder Pal
Published in Analytics Vidhya
Aug 21, 2020

June 2019 ( continued ) ( Apache Zookeeper )

In the previous article, I explained how the installation folders would be structured, so all that remained was to implement it. That is where Ansible showed its magic on Apache Zookeeper.

I have written Ansible Playbooks which automate each aspect of Apache Zookeeper Administration.

Example:
1. Basic Setup
2. Configuration Modifications like JVM / Logging / etc.
3. Production Optimization for OS + Zookeeper
4. Upgrade Cluster
5. Logging, Monitoring & Alerting Setup

GitHub CodeBase: 116davinder/zookeeper-cluster-ansible

Common
It contains the basic tasks for the Apache Zookeeper setup process:
1. Install packages like wget/tar/nc/net-tools.
2. Create the zookeeper user and groups.
3. Create the required directories, such as data & logs.
4. Apply system tuning for the OS, network, and file system.
5. Disable the system firewall or iptables (for now).

Java
This role will allow the user to install / upgrade different versions of Java like 1.8 / 11 / 13 / 14 / etc.

Install
1. It will upload Apache Zookeeper Tar.gz from Ansible Server to Zookeeper Nodes.
2. It will unpack Tar.gz to a given location like the “/zookeeper” folder.
3. It will create a symbolic link for the zookeeper to the given version of Apache Zookeeper.
4. It will create “/etc/profile.d/zookeeper.sh” for environment setup.

Configure
This role will actually create/update configuration for Apache Zookeeper.
1. Create or update zoo.cfg / log4j.properties / java.env.
2. Auto-generate myid for each host based on its IP address.

Service
This role will create/update the SystemD file for Apache Zookeeper.

ServiceState
This role will allow the user to restart/stop/start service.

PortCheck
This role will allow the user to make a basic check on a given port with status up or down.
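
As a rough illustration of what such an up/down check boils down to, here is a minimal Python sketch. It is not the role's actual implementation; the function name `port_status` and the two-second timeout are assumptions made for the example.

```python
import socket


def port_status(host: str, port: int, timeout: float = 2.0) -> str:
    """Return "up" if a TCP connection to host:port succeeds, else "down"."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return "down"
```

For a Zookeeper node this would typically be called as `port_status("localhost", 2181)`, the client port used elsewhere in this article.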

Nri-Zookeeper
This role will allow the user to install New Relic based Monitoring setup for Apache Zookeeper.

Let’s do it :)

While writing the Ansible playbooks for Apache Zookeeper, I realized that a few problems had to be solved first.

Problem 1: how to generate a predictable myid for each zookeeper node?

Method 1: # Using Jinja 2 / Native Ansible ( Preferred )
{% set id = hostvars[inventory_hostname]['ansible_default_ipv4']['address'].split('.')[3] | int | abs %}{{ id }}
Method 2: # Using Bash Shell
shell: "echo {{ ansible_ssh_host }} | cut -d . -f 4"
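
Both methods take the last octet of the node's IPv4 address as its myid. The Python sketch below mirrors that logic only to make the edge cases visible; the function names are hypothetical and not part of the playbooks. It assumes the ZooKeeper documentation's stated myid range of 1 to 255, and it shows that two nodes in different subnets can end up with the same id.

```python
import ipaddress


def myid_from_ip(ip: str) -> int:
    """Return the last octet of an IPv4 address, like split('.')[3] | int."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    last_octet = int(addr) & 0xFF
    if last_octet == 0:
        raise ValueError(f"{ip}: last octet 0 is outside the 1-255 myid range")
    return last_octet


def find_collisions(ips):
    """Map each duplicated myid to the list of addresses that share it."""
    seen = {}
    for ip in ips:
        seen.setdefault(myid_from_ip(ip), []).append(ip)
    return {myid: hosts for myid, hosts in seen.items() if len(hosts) > 1}
```

Running `find_collisions` over the inventory before deploying is a cheap way to confirm that the last-octet scheme is safe for a given cluster.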

Problem 2: how to set JVM Properties for Zookeeper?

Searching Google gave me a couple of options, but none of them was native. I finally checked the Apache Zookeeper documentation, which recommends using java.env. I then wondered what format the file needs; after a few more searches and some trial and error, one format worked for me.

export JVMFLAGS="-Xmx{{ zookeeperXmx }} -Xms{{ zookeeperXms }}"

Problem 3: how to generate zoo.cfg for each server?

This was another problem: I had to add each zookeeper node's address to zoo.cfg along with its myid.

Method 1: # Using Jinja 2 / Native Ansible ( Preferred )
{% for host in groups['clusterNodes'] %}
{% if not host | ipaddr %}
{% set ip = hostvars[host]['ansible_default_ipv4']['address'] %}
{% else %}
{% set ip = host %}
{% endif %}
{% set id = ip.split('.')[3] | int | abs %}
server.{{ id }}={{ ip }}:2888:3888
{% endfor %}
Method 2: # Using Bash Shell
shell: "echo server.$(echo {{ item }} | cut -d . -f 4)={{ item }}:2888:3888 >> {{ zookeeper_install_dir }}/conf/zoo.cfg"
loop: "{{ groups['zookeeper'] }}"
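
To make the expected result concrete, the following Python sketch produces the same `server.<id>=<ip>:2888:3888` lines from a list of node addresses. It is only an illustration of the template's output, with a hypothetical function name and made-up example addresses.

```python
def server_lines(ips, peer_port=2888, election_port=3888):
    """Build zoo.cfg server entries, using the last IPv4 octet as the id."""
    lines = []
    for ip in ips:
        myid = int(ip.split(".")[3])  # same rule as the Jinja2 template
        lines.append(f"server.{myid}={ip}:{peer_port}:{election_port}")
    return lines


if __name__ == "__main__":
    # Example addresses are invented for the demonstration.
    for line in server_lines(["10.0.0.11", "10.0.0.12", "10.0.0.13"]):
        print(line)
```

Every node gets the full list, so the generated block is identical on all servers; only the myid file differs per host.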

Problem 4: Optimization
Why optimize? What to optimize? How to optimize?
I had to read quite a lot of blogs about Red Hat 7 Optimizations for Apache Zookeeper. Finally, I was able to gather different tweaks for OS / Network / Ulimits.
Ansible Code: systemTuning.yml

Problem 5: how to monitor logs?

This was the easiest one: I had to use Splunk. I used a single index for both Apache Kafka and Apache Zookeeper.

[default]
host = $HOSTNAME

[monitor:///zookeeper/zookeeper-logs/*.out]
disabled = false
index = kafka
sourcetype = zookeeper
crcSalt = <SOURCE>

Problem 6: how to monitor Apache Zookeeper, and what to monitor?

This was also one of the easiest to solve: I had to use New Relic at that time.
Just follow the steps from newrelic/nri-zookeeper.

After following the New Relic guide, I realized that I had deployed more than 10 different Apache Zookeeper clusters, so how was I going to tell them apart in the New Relic dashboard? New Relic has a very elegant solution for this: update zookeeper-config.yml to use labels. Each cluster then gets its own unique environment name.

integration_name: com.newrelic.zookeeper

instances:
  - name: {{ ansible_fqdn }}
    command: metrics
    arguments:
      host: localhost
      port: 2181
      cmd: nc
    labels:
      role: zookeeper
      env: {{ zookeeperEnvironment }}
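
The `cmd: nc` argument above hints that the metrics are pulled over Zookeeper's client port. As a rough illustration only, and assuming the data comes from Zookeeper's `mntr` four-letter command (which prints tab-separated key/value lines), the same information can be read and parsed in a few lines of Python. The helper names are hypothetical and are not part of nri-zookeeper.

```python
import socket


def fetch_four_letter(host, port, command="mntr", timeout=2.0):
    """Send a four-letter command to Zookeeper and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn tab-separated 'key<TAB>value' lines into a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines that are not key/value pairs
        metrics[key.strip()] = value.strip()
    return metrics
```

A manual check would look like `parse_mntr(fetch_four_letter("localhost", 2181))`. Note that newer Zookeeper releases only answer four-letter commands listed in `4lw.commands.whitelist`.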

Once I was able to publish the Apache Zookeeper metrics, I realized that system metrics (CPU / memory / disk) were also being published by the New Relic Infra Agent, and I needed those in my dashboards as well, so I did a little more research on how to find them easily. Luckily :) the New Relic Infra Agent supports labels too, so I only needed to add the same labels to its config (/etc/newrelic-infra.yml).

custom_attributes:
  label.env: {{ zookeeperEnvironment }}

Finally! Ansible Code Worked :)
It took me a couple of days because of the above-mentioned problems. “If you are persistent then problems will go away eventually”.

I took a flexible approach in my Ansible code because I wanted to be able to update each part of the Apache Zookeeper setup (JVM, logging, zoo.cfg, etc.) separately when required.

Base Playbooks
clusterSetup.yml: It will install Apache Zookeeper on a given environment.
clusterNewRelicSetup.yml: It will install New Relic Monitoring Setup.
clusterUpgrade.yml: It will upgrade Apache Zookeeper to New Version.

Maintenance Playbooks
Note:
The playbooks below restart Apache Zookeeper in a rolling fashion to avoid an outage.
clusterJava.yml: It will install/update java packages.
clusterJvmConfigs.yml: It will update java.env.
clusterLogging.yml: It will update log4j.properties.
clusterRemoveNodes.yml: It will decommission nodes.
clusterRemoveOldVersions.yml: It will remove old versions of configs folders.
clusterSystemUpgrade.yml: It will upgrade OS if required.
clusterRollingRestart.yml: It will do a rolling restart of Apache Zookeeper.
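
The idea behind a rolling restart can be sketched in Python as follows. This is a simplified illustration and not the logic of the playbooks: the `restart` and `is_healthy` callables are placeholders the caller must supply (for example a service restart and a port check), and the retry values are arbitrary.

```python
import time


def rolling_restart(hosts, restart, is_healthy, retries=10, delay=3.0,
                    sleep=time.sleep):
    """Restart hosts one at a time, waiting for each to become healthy."""
    done = []
    for host in hosts:
        restart(host)
        for _ in range(retries):
            if is_healthy(host):
                break
            sleep(delay)
        else:
            # Stop immediately so the remaining nodes keep serving traffic.
            raise RuntimeError(f"{host} did not recover; restarted so far: {done}")
        done.append(host)
    return done
```

Stopping at the first node that fails to recover matters here: with only one node down at a time, the rest of the ensemble can keep its quorum.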

Manual Steps :(
Creating the New Relic dashboard was another challenge because I had never used New Relic before, and there was a bit of a learning curve.

A couple of things to remember: the New Relic Infra Agent publishes each kind of metric to a different database in New Relic Insights.
SystemSample: Used to store CPU metrics.
StorageSample: Used to store disk metrics.
NetworkSample: Used to store network metrics.
ZookeeperSample: Used to store the actual Zookeeper metrics.

Use New Relic API Explorer to import the below dashboard JSON code.
New Relic Dashboard Code: newrelic-dashboard-zookeeper.json
New Relic Dashboard Sample: Apache-Zookeeper.pdf

My GitHub Repository has other Playbooks / Roles as well but I will cover them in the next articles because it’s my story and this article is not right for them.

The journey of Apache Kafka will start in the Next Article!
