Kafka and Zookeeper with Docker

λ.eranga
Mar 3, 2018 · 7 min read

Background

In this post I will discuss some internal details of Kafka and Zookeeper, then show how to deploy a single-node Kafka and Zookeeper service with Docker. All the deployment files related to Kafka and Zookeeper can be found in gitlab. Please clone the repo and follow along with the post.

About Kafka

Kafka is a fast, scalable, durable and fault-tolerant publish-subscribe messaging system which can be used for real-time data streaming. It can be described as a distributed commit log which follows a publish-subscribe architecture. Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.

Messages in Kafka are simple byte arrays (a String, JSON etc. is serialized to bytes). When publishing, a key can be attached to a message. The producer sends all messages with the same key to the same partition.
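
The same-key, same-partition rule can be sketched in shell. This is only an illustration of a hash-then-modulo scheme: the function name partition_for_key is made up for this post, and cksum stands in for the hash function Kafka's own producer uses, so the partition numbers printed here will not match a real cluster.

```shell
# Toy partitioner: hash the key, then take the hash modulo the partition count.
# ASSUMPTION: cksum (a CRC) is a stand-in for the hash Kafka really uses.
partition_for_key() {
  key=$1
  partitions=$2
  crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( crc % partitions ))
}

# Messages that share a key always map to the same partition.
for key in user-1 user-2 user-1; do
  echo "$key -> partition $(partition_for_key "$key" 3)"
done
```

Because the partition depends only on the key and the partition count, changing the number of partitions of a topic changes where existing keys land.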

About Zookeeper

Zookeeper is a configuration management tool for distributed systems. It provides features for distributed applications such as distributed configuration management, leader election, consensus handling, coordination and locks.

Zookeeper stores its data in a tree-like structure. The tree is known as the Data Tree and its nodes are known as zNodes. Zookeeper follows standard Unix notation for file paths. For example, /A/B/C denotes the path to zNode C, where C has B as its parent and B has A as its parent.
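
Since zNode paths use Unix path notation, ordinary path tools can walk the tree. The sketch below prints a zNode and each of its ancestors up to the root; znode_ancestors is a hypothetical helper name, and nothing here talks to a real Zookeeper server.

```shell
# Print a zNode path followed by each of its ancestors, ending at the root.
# dirname works here because zNode paths look like Unix file paths.
znode_ancestors() {
  path=$1
  # Only absolute paths are valid zNode paths; reject anything else.
  case $path in
    /*) ;;
    *) echo "not an absolute zNode path: $path" >&2; return 1 ;;
  esac
  while [ "$path" != "/" ]; do
    echo "$path"
    path=$(dirname "$path")
  done
  echo "/"
}

znode_ancestors /A/B/C
# prints:
# /A/B/C
# /A/B
# /A
# /
```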

You can read more about Zookeeper, its architecture and the theoretical concepts behind it from here.

Zookeeper for Kafka

Distributed systems need to handle coordination tasks, configuration management, state management etc. Some distributed systems build their own tools or mechanisms to manage these tasks. Cassandra, Elasticsearch and Mongodb are examples of this kind of system. Other distributed systems such as Kafka, Storm and HBase have chosen existing configuration management tools like Zookeeper, Etcd etc.

Kafka uses Zookeeper to manage the following tasks:

  1. Electing a controller - The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, it is the controller that tells other replicas to become partition leaders to replace the partition leaders on the node that is going away. Zookeeper is used to elect a controller, make sure there is only one, and elect a new one if it crashes.
  2. Cluster membership - Which brokers are alive and part of the cluster? This is also managed through Zookeeper.
  3. Topic configuration - Which topics exist, how many partitions each has, where the replicas are, who the preferred leader is, and what configuration overrides are set for each topic.
  4. Manage quotas - How much data each client is allowed to read and write.
  5. Access control - Who is allowed to read from and write to which topic.
  6. Consumer groups (old high level consumer) - Which consumer groups exist, who their members are, and what the latest offset is that each group got from each partition.
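
Each of these responsibilities is backed by a zNode subtree. The lookup below is a rough map for Zookeeper-based Kafka versions. The function name znode_for and the task labels are made up for this sketch, and the exact paths can differ between Kafka versions, so treat them as starting points to explore with ls rather than a specification.

```shell
# Map a task label to the zNode subtree where Kafka usually keeps that state.
# ASSUMPTION: paths as commonly seen in Zookeeper-based Kafka; verify on your cluster.
znode_for() {
  case $1 in
    controller)   echo /controller ;;
    membership)   echo /brokers/ids ;;
    topics)       echo /brokers/topics ;;
    topic-config) echo /config/topics ;;
    quotas)       echo /config/clients ;;
    acls)         echo /kafka-acl ;;
    consumers)    echo /consumers ;;
    *)            echo "unknown task: $1" >&2; return 1 ;;
  esac
}

for task in controller membership topics topic-config quotas acls consumers; do
  echo "$task -> $(znode_for "$task")"
done
```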

Run Kafka and Zookeeper

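# start Zookeeper and expose its client port (2181)
docker run -d \
--name zookeeper \
-p 2181:2181 \
jplock/zookeeper

# start Kafka and point it at Zookeeper; 9092 is the broker port
docker run -d \
--name kafka \
-p 7203:7203 \
-p 9092:9092 \
-e KAFKA_ADVERTISED_HOST_NAME=10.4.1.29 \
-e ZOOKEEPER_IP=10.4.1.29 \
ches/kafka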

KAFKA_ADVERTISED_HOST_NAME is the IP address of the machine (my local machine) on which the Kafka container is running. ZOOKEEPER_IP is the IP address of the machine running the Zookeeper container. This way, producers and consumers can reach Kafka and Zookeeper by using that IP address.

docker run \
--rm ches/kafka kafka-topics.sh \
--create \
--topic senz \
--replication-factor 1 \
--partitions 1 \
--zookeeper 10.4.1.29:2181

The Kafka container's bin directory contains shell scripts which can be used to manage topics, consumers, publishers etc. Here I use the kafka-topics.sh script to create a topic named senz. 10.4.1.29:2181 is the host and port where Zookeeper is running.

docker run \
--rm ches/kafka kafka-topics.sh \
--list \
--zookeeper 10.4.1.29:2181

This command will list all available topics.

docker run --rm --interactive \
ches/kafka kafka-console-producer.sh \
--topic senz \
--broker-list 10.4.1.29:9092

This command creates a producer for the senz topic. The producer takes input from the command line and publishes it to Kafka. --broker-list 10.4.1.29:9092 specifies the host and port of the Kafka broker.

docker run --rm \
ches/kafka kafka-console-consumer.sh \
--topic senz \
--from-beginning \
--zookeeper 10.4.1.29:2181

This command creates a consumer for the senz topic. The consumer reads messages from the topic and prints them to the command line. The --from-beginning parameter tells it to read from the beginning of the topic. --zookeeper 10.4.1.29:2181 specifies the host and port of Zookeeper.

Kafkacat client

kafkacat is a command line client utility for interacting with Kafka. It supports various operations, for example listing topics, creating topics and creating publishers/consumers.

kafkacat can be installed with apt on Linux or brew on macOS. The following commands install kafkacat and list the topics on a broker.

# linux
sudo apt-get install kafkacat
# macos
brew install kafkacat

# command
kafkacat -L -b <kafka broker host>:<kafka broker port>
# example
kafkacat -L -b 169.254.252.155:9092

The kafkacat -L command connects to the Kafka broker on the 169.254.252.155 server and lists all available topics.

# command 
kafkacat -P -b <kafka broker host>:<kafka broker port> -t <topic>
# example
kafkacat -P -b 169.254.252.155:9092 -t senz

This command creates a publisher for a topic named senz and publishes messages to that topic. If the given topic does not exist, it is created automatically with the given name (this relies on the broker allowing automatic topic creation, which is the default).
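
kafkacat can also attach a key to each message through its -K option, which takes the delimiter that separates key from value on each input line. The helper below only formats such lines. keyed_line is a hypothetical name for this sketch, and the kafkacat call is left as a comment because it needs a reachable broker.

```shell
# Build one "key:value" input line in the format expected by kafkacat -P -K:
# Note: a key that itself contains ':' would be split in the wrong place.
keyed_line() {
  printf '%s:%s\n' "$1" "$2"
}

keyed_line user-1 "hello"
keyed_line user-1 "world"

# With a running broker, the lines could be piped to the producer, e.g.:
#   keyed_line user-1 "hello" | kafkacat -P -b 169.254.252.155:9092 -t senz -K:
```

Both lines carry the key user-1, so the broker would receive them as two messages with the same key.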

# command 
kafkacat -C -b <kafka broker host>:<kafka broker port> -t <topic>
# example
kafkacat -C -b 169.254.252.155:9092 -t senz

This command creates a consumer for the senz topic. The topic must already exist before a consumer is created; otherwise kafkacat raises an error.

Inside Zookeeper

We can go inside the Zookeeper container and see how Kafka-related data (broker and publisher details) is stored.

docker exec -it zookeeper bash

Inside the bin directory we can find the commands which are available to manage Zookeeper. zkCli.sh opens a client shell against the server:

bin/zkCli.sh -server 127.0.0.1:2181

Since we are inside the Zookeeper container, we can specify the server address as localhost (127.0.0.1). From the client shell we can list the zNodes that Kafka has created:

ls /
ls /brokers
ls /brokers/topics
ls /consumers
ls /consumers/console-consumer-1532/owners
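
The client shell prints the children of a zNode as a bracketed, comma-separated list such as [senz, test]. A small helper, sketched here under that assumption, turns such a line into one name per line so it can be passed to other tools. The name zk_children and the sample topic names are illustrative only.

```shell
# Split a zkCli-style child list like "[a, b, c]" into one name per line.
# ASSUMPTION: names contain no spaces, commas or brackets.
zk_children() {
  printf '%s\n' "$1" | tr -d '[] ' | tr ',' '\n'
}

zk_children '[senz, test]'
# prints:
# senz
# test
```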

Multi node kafka cluster

In this article I have deployed one Kafka broker with one Zookeeper node. To avoid a single point of failure we need to run a multi-node Kafka cluster. I have written about deploying a multi-node Kafka cluster here.

Kubernetes deployment

The final step is to integrate the Docker deployment with Kubernetes. I have written an article about deploying a Kafka/Zookeeper cluster with Kubernetes. You can read it from here.

Reference

  1. https://www.quora.com/What-is-the-actual-role-of-Zookeeper-in-Kafka-What-benefits-will-I-miss-out-on-if-I-don%E2%80%99t-use-Zookeeper-and-Kafka-together
  2. https://stackoverflow.com/questions/23751708/kafka-is-zookeeper-a-must
  3. https://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
  4. http://cloudurable.com/blog/what-is-kafka/index.html
  5. https://medium.com/@itseranga/kafka-zookeeper-cluster-on-kubernetes-43a4aaf27dbb

Rahasak

Have less, be more
