Journey of Apache Kafka & Zookeeper Administrator ( Part 2 )

Published in Analytics Vidhya, Aug 21, 2020

June 2019 ( continued ) ( Apache Zookeeper )

In the previous article, I explained how the installation folders would be structured, so all that remained was to implement it. That is where Ansible shows its magic on Apache Zookeeper.

I have written Ansible playbooks that automate each aspect of Apache Zookeeper administration.

Example:
1. Basic Setup
2. Configuration Modifications like JVM / Logging / etc.
3. Production Optimization for OS + Zookeeper
4. Upgrade Cluster
5. Logging, Monitoring & Alerting Setup

GitHub CodeBase: 116davinder/zookeeper-cluster-ansible

Common
It contains the basic tasks for the Apache Zookeeper setup process:
1. Installing packages like wget/tar/nc/net-tools.
2. Creating the zookeeper user and groups.
3. Creating required directories like data & logs.
4. System tuning for OS/Network/File System.
5. Disabling the system firewall or iptables ( for now ).

Java
This role allows the user to install / upgrade different versions of Java like 1.8 / 11 / 13 / 14 / etc.

Install
1. It will upload Apache Zookeeper Tar.gz from Ansible Server to Zookeeper Nodes.
2. It will unpack Tar.gz to a given location like the “/zookeeper” folder.
3. It will create a symbolic link for the zookeeper to the given version of Apache Zookeeper.
4. It will create “/etc/profile.d/zookeeper.sh” for environment setup.
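
Steps 2 and 3 above can be sketched in plain shell. This is only an illustration under stated assumptions, not the role's actual code: the function name install_zookeeper and the symlink name "current" are hypothetical, and the sketch assumes the archive has a single top-level directory, as the official apache-zookeeper-<version>-bin.tar.gz archives do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (names and paths are assumptions, not the role's code):
# unpack a ZooKeeper tarball under a base directory and repoint a stable
# "current" symlink at the unpacked version.
install_zookeeper() {
    local tarball="$1"    # e.g. /tmp/apache-zookeeper-<version>-bin.tar.gz
    local base_dir="$2"   # e.g. /zookeeper
    local version_dir
    mkdir -p "$base_dir" || return 1
    tar -xzf "$tarball" -C "$base_dir" || return 1
    # Name of the top-level directory inside the archive.
    version_dir="$(tar -tzf "$tarball" | head -n 1 | cut -d / -f 1)"
    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$base_dir/$version_dir" "$base_dir/current"
}
```

Keeping a version-independent symlink means the service file and the environment script can point at one fixed path, and an upgrade only has to unpack the new version and move the link.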

Configure
This role creates or updates the configuration for Apache Zookeeper.
1. Create or update zoo.cfg / log4j.properties / java.env.
2. Auto-generate myid for each host based on its IP address.

Service
This role will create/update the systemd unit file for Apache Zookeeper.

ServiceState
This role will allow the user to restart/stop/start service.

PortCheck
This role will allow the user to make a basic check on a given port with status up or down.

Nri-Zookeeper
This role will allow the user to install New Relic based Monitoring setup for Apache Zookeeper.

Let’s do it :)

While writing the Ansible playbooks for Apache Zookeeper, I realized that a couple of problems had to be solved first.

Problem 1: how to generate a predictable myid for each zookeeper node?

Method 1: # Using Jinja 2 / Native Ansible ( Preferred )
{% set id = hostvars[inventory_hostname]['ansible_default_ipv4']['address'].split('.')[3] | int | abs %}{{ id }}

Method 2: # Using Bash Shell
shell: "echo {{ ansible_ssh_host }} | cut -d . -f 4"
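
To sanity-check the IDs before running a playbook, the same last-octet rule can be tried by hand. The sketch below is a minimal illustration, and the function name myid_from_ip is hypothetical (it is not part of the repository). It assumes IPv4 addresses and adds a range check, because the ZooKeeper documentation asks for a myid between 1 and 255.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): derive a ZooKeeper myid
# from the last octet of an IPv4 address, mirroring the rule above.
myid_from_ip() {
    local ip="$1"
    local id="${ip##*.}"          # keep everything after the last dot
    # Reject anything that is not a plain number.
    case "$id" in
        ''|*[!0-9]*) echo "not an IPv4 address: $ip" >&2; return 1 ;;
    esac
    # ZooKeeper expects a myid in the range 1-255.
    if [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
        echo "last octet out of range: $ip" >&2
        return 1
    fi
    echo "$id"
}
```

One caveat of this scheme: the last octet is only unique when all nodes of a cluster sit in the same /24 network, so two nodes such as 10.0.1.17 and 10.0.2.17 would collide.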

Problem 2: how to set JVM Properties for Zookeeper?

Research on Google provided me with a couple of options, but none of them was native. I finally checked the Apache Zookeeper documentation, which recommends using java.env. I was then unsure what format this file should use, and after a few more searches on Google and some trial and error, one format worked for me.

export JVMFLAGS="-Xmx{{ zookeeperXmx }} -Xms{{ zookeeperXms }}"

Problem 3: how to generate zoo.cfg for each server?

This was another problem: I had to add each zookeeper node's address to zoo.cfg along with its myid.

Method 1: # Using Jinja 2 / Native Ansible ( Preferred )
{% for host in groups['clusterNodes'] %}
{% if not host | ipaddr %}
{% set ip = hostvars[host]['ansible_default_ipv4']['address'] %}
{% else %}
{% set ip = host %}
{% endif %}
{% set id = ip.split('.')[3] | int | abs %}
server.{{ id }}={{ ip }}:2888:3888
{% endfor %}

Method 2: # Using Bash Shell
shell: "echo server.$(echo {{ item }} | cut -d . -f 4)={{ item }}:2888:3888 >> {{ zookeeper_install_dir }}/conf/zoo.cfg"
loop: "{{ groups['zookeeper'] }}"
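
The server lines that either method produces can be previewed with a few lines of shell. This is a minimal sketch under the same last-octet assumption, and the function name zoo_server_lines is hypothetical. It expects plain IPv4 addresses as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print one zoo.cfg
# "server.<myid>=<ip>:2888:3888" line per IPv4 address given as an argument.
zoo_server_lines() {
    local ip
    for ip in "$@"; do
        # myid is the last octet, the same rule used for the myid file.
        printf 'server.%s=%s:2888:3888\n' "${ip##*.}" "$ip"
    done
}

# Example:
#   zoo_server_lines 10.0.3.11 10.0.3.12
# prints:
#   server.11=10.0.3.11:2888:3888
#   server.12=10.0.3.12:2888:3888
```

Printing the whole block and writing the file once is also what makes the template approach safer to re-run: appending with >> adds the same lines again on every run.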

Problem 4: Optimization
Why optimize? What to optimize? How to optimize?
I had to read quite a lot of blogs about Red Hat 7 Optimizations for Apache Zookeeper. Finally, I was able to gather different tweaks for OS / Network / Ulimits.
Ansible Code: systemTuning.yml

Problem 5: how to monitor logs?

This was the easiest one: I had to use Splunk, with a single index shared by Apache Kafka and Apache Zookeeper.

[default]
host = $HOSTNAME

[monitor:///zookeeper/zookeeper-logs/*.out]
disabled = false
index = kafka
sourcetype = zookeeper
crcSalt = <SOURCE>

Problem 6: how to monitor Apache Zookeeper, and what to monitor?

This was also one of the easier problems: I had to use New Relic at that time.
Just follow steps from newrelic/nri-zookeeper.

After following the New Relic guide, I realized that I had deployed more than 10 different Apache Zookeeper clusters, so how was I going to tell them apart in the New Relic dashboard? New Relic has a very elegant solution for this: update zookeeper-config.yml to use labels. Each cluster then has its own unique environment name.

integration_name: com.newrelic.zookeeper

instances:
  - name: {{ ansible_fqdn }}
    command: metrics
    arguments:
      host: localhost
      port: 2181
      cmd: nc
    labels:
      role: zookeeper
      env: {{ zookeeperEnvironment }}
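
To check by hand what a setup like this can see, ZooKeeper's mntr four-letter command can be queried over nc directly. The sketch below assumes that mntr is also the source the integration reads, which the cmd: nc setting suggests but the article does not state. The helper name mntr_value is hypothetical. On ZooKeeper 3.5.3 and later, four-letter commands only answer if they are listed in 4lw.commands.whitelist.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nri-zookeeper): read "mntr" output on
# stdin (tab-separated key/value lines) and print the value for one key.
# Exits non-zero when the key is not present.
mntr_value() {
    awk -F '\t' -v key="$1" '$1 == key { print $2; found = 1 }
                             END { exit found ? 0 : 1 }'
}

# Example against a local node (assumes nc is installed and port 2181):
#   echo mntr | nc localhost 2181 | mntr_value zk_server_state
```

Keys such as zk_server_state, zk_avg_latency and zk_outstanding_requests are part of the standard mntr output and are a reasonable starting point for dashboards and alerts.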

Once I was able to publish Apache Zookeeper metrics, I realized that system metrics ( CPU / Memory / Disk ) were also being published by the New Relic Infra Agent, and I wanted to use those in my dashboards as well, so I did a little more research on how to find them with ease. Luckily :) the New Relic Infra Agent supports labels as well, so I just needed to update its config ( /etc/newrelic-infra.yml ) with the same labels.

custom_attributes:
  label.env: {{ zookeeperEnvironment }}

Finally! Ansible Code Worked :)
It took me a couple of days because of the above-mentioned problems. “If you are persistent then problems will go away eventually”.

I used a flexible approach for my Ansible code because I wanted to be able to update each part of the Apache Zookeeper setup separately when required, like JVM / Logging / zoo.cfg / etc.

Base Playbooks
clusterSetup.yml: It will install Apache Zookeeper on a given environment.
clusterNewRelicSetup.yml: It will install New Relic Monitoring Setup.
clusterUpgrade.yml: It will upgrade Apache Zookeeper to New Version.

Maintenance Playbooks
Note: the playbooks below restart Apache Zookeeper in a rolling fashion to avoid an outage.
clusterJava.yml: It will install/update java packages.
clusterJvmConfigs.yml: It will update java.env.
clusterLogging.yml: It will update log4j.properties.
clusterRemoveNodes.yml: It will decommission nodes.
clusterRemoveOldVersions.yml: It will remove old versions of configs folders.
clusterSystemUpgrade.yml: It will upgrade OS if required.
clusterRollingRestart.yml: It will do a rolling restart of Apache Zookeeper.
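
The idea behind a rolling restart can be sketched in shell as follows. This is not the playbook's actual logic, only an illustration under stated assumptions: SSH access to every node, a systemd unit named zookeeper, client port 2181, and the ruok four-letter command being enabled. All three function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a rolling restart (not the playbook's actual tasks).
# Assumptions: SSH access to each node, a systemd unit named "zookeeper",
# client port 2181, and the "ruok" four-letter command being enabled.

restart_node() {
    ssh "$1" sudo systemctl restart zookeeper
}

wait_until_ok() {
    local host="$1" attempt
    for attempt in 1 2 3 4 5 6 7 8 9 10; do
        # A healthy server answers "ruok" with "imok".
        if [ "$(echo ruok | nc -w 2 "$host" 2181 2>/dev/null)" = "imok" ]; then
            return 0
        fi
        sleep 3
    done
    return 1
}

# Restart one node at a time and stop at the first node that does not
# come back, so the untouched nodes can keep the quorum alive.
rolling_restart() {
    local host
    for host in "$@"; do
        restart_node "$host" || return 1
        wait_until_ok "$host" || return 1
    done
}
```

In Ansible the same one-node-at-a-time behaviour is usually obtained with serial: 1 on the play, combined with a health check before moving on to the next host.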

Manual Steps :(
Creating the New Relic dashboard was another challenge because I had never used New Relic before, and it was a bit of a learning curve for me.

A couple of things to remember: the New Relic Infra Agent publishes each kind of metric to a different database in New Relic Insights.
SystemSample: Used to store CPU metrics.
StorageSample: Used to store disk metrics.
NetworkSample: Used to store network metrics.
ZookeeperSample: Used to store the actual Zookeeper metrics.

Use New Relic API Explorer to import the below dashboard JSON code.
New Relic Dashboard Code: newrelic-dashboard-zookeeper.json
New Relic Dashboard Sample: Apache-Zookeeper.pdf

My GitHub Repository has other Playbooks / Roles as well but I will cover them in the next articles because it’s my story and this article is not right for them.

The journey of Apache Kafka will start in the Next Article!

Written by Davinder Pal