Multi-host Kafka cluster with Docker

(λx.x)eranga
Published in Effectz.AI
Mar 16, 2019

Background

In my previous post I explained how to deploy Kafka and ZooKeeper with Docker. In that post I deployed one ZooKeeper node and one Kafka node with Docker, with all the services running on a single host. In this post I’m gonna show how to deploy a multi-node Kafka cluster with 3 ZooKeeper nodes and 3 Kafka nodes.

All the deployment files and source code related to this post are available in this gitlab repo. Please clone the repo before continuing with the post.

Server setup

I’m gonna deploy the Kafka nodes and ZooKeeper nodes on three different servers. Following is the server setup. Each server runs one ZooKeeper node and one Kafka node.

Docker compose

I’m deploying all the services with Docker Compose. Following is the docker-compose.yml file. It contains the configurations for 3 ZooKeeper nodes and 3 Kafka nodes.

Kafka log retention policy

In the compose file above I have defined the Kafka log retention policy configuration. Kafka uses a log data structure to manage its messages. A log is basically an ordered set of segments (a segment is a collection of messages). Kafka applies retention at the segment level rather than at the message level. Hence, Kafka keeps removing the oldest segments from the log once they violate the retention policies. There are two types of retention policy in Kafka.

  1. Time based retention - Defines how long a segment can live. Once this time is exceeded, the segment is marked for deletion. The default retention time for segments is 7 days.
  2. Size based retention - Defines the maximum size of the log for a topic partition. When the partition's log grows beyond this size, Kafka starts deleting the oldest segments. This policy is ideal when we need to control the size of the log because of limited disk space.
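
To make the two policies concrete, here is a minimal Python sketch of segment-level retention. It is a simplified model written for this post, not Kafka's actual code: the `Segment` class and the `apply_retention` function are hypothetical names, and it ignores details such as the active segment and the delay between marking and physically deleting files. It assumes the broker only deletes whole segments, oldest first, and that the size policy removes a segment only when the remaining log would still be at least the configured size.

```python
from dataclasses import dataclass
from typing import List

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    """A closed log segment (hypothetical model): a batch of messages in one file."""
    base_offset: int
    size_bytes: int
    newest_message_ms: int  # timestamp of the newest message in the segment


def apply_retention(segments: List[Segment], now_ms: int,
                    retention_ms: int, retention_bytes: int) -> List[Segment]:
    """Return the surviving segments. Input must be ordered oldest first.

    A negative retention_bytes disables the size based policy.
    """
    kept = list(segments)  # work on a copy, leave the caller's list untouched

    # Time based: drop the oldest segments whose newest message is too old.
    while kept and now_ms - kept[0].newest_message_ms > retention_ms:
        kept.pop(0)

    # Size based: drop the oldest segments, but only while the log would
    # still hold at least retention_bytes after the deletion.
    if retention_bytes >= 0:
        excess = sum(s.size_bytes for s in kept) - retention_bytes
        while kept and excess - kept[0].size_bytes >= 0:
            excess -= kept.pop(0).size_bytes

    return kept


# Four 100-byte segments, written 170, 70, 10 and 1 hours ago.
segments = [
    Segment(base_offset=0, size_bytes=100, newest_message_ms=0),
    Segment(base_offset=100, size_bytes=100, newest_message_ms=100 * HOUR_MS),
    Segment(base_offset=200, size_bytes=100, newest_message_ms=160 * HOUR_MS),
    Segment(base_offset=300, size_bytes=100, newest_message_ms=169 * HOUR_MS),
]

survivors = apply_retention(segments, now_ms=170 * HOUR_MS,
                            retention_ms=168 * HOUR_MS, retention_bytes=200)
print([s.base_offset for s in survivors])  # [200, 300]
```

In this run the first segment is dropped by the time based policy (it is older than 168 hours, i.e. 7 days), and the second one by the size based policy, because the remaining 300 bytes exceed the 200-byte limit by a full segment. Note that messages are never removed one by one; a message lives as long as the segment that contains it.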

I have set up these configurations with two environment variables. KAFKA_LOG_RETENTION_HOURS defines the log.retention.hours (time based retention) value and KAFKA_LOG_RETENTION_BYTES defines the log.retention.bytes (size based retention) value. These fields are set in the Kafka configuration file, which is located inside the Kafka container (/opt/kafka/config/server.properties).
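
The relation between the two environment variable names and the two property names suggests a simple naming rule: strip the KAFKA_ prefix, lowercase the rest, and replace underscores with dots. The Python sketch below illustrates that rule only. It is an assumption about how the image translates variables, not the image's actual startup script; the `env_to_properties` function is a hypothetical helper, and the values 168 and 1073741824 are illustrative, not taken from the compose file.

```python
from typing import Dict, Mapping


def env_to_properties(env: Mapping[str, str], prefix: str = "KAFKA_") -> Dict[str, str]:
    """Map KAFKA_* environment variables to server.properties style keys.

    Assumed rule: drop the prefix, lowercase, turn '_' into '.'.
    """
    props: Dict[str, str] = {}
    for name, value in env.items():
        if name.startswith(prefix) and len(name) > len(prefix):
            key = name[len(prefix):].lower().replace("_", ".")
            props[key] = value
    return props


# Illustrative values: 168 hours is 7 days, 1073741824 bytes is 1 GiB.
example_env = {
    "KAFKA_LOG_RETENTION_HOURS": "168",
    "KAFKA_LOG_RETENTION_BYTES": "1073741824",
    "PATH": "/usr/bin",  # unrelated variables are ignored
}

for key, value in sorted(env_to_properties(example_env).items()):
    print(f"{key}={value}")
# log.retention.bytes=1073741824
# log.retention.hours=168
```

To check what the container really ended up with, look at /opt/kafka/config/server.properties inside the running Kafka container rather than relying on this rule.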

Deploy services

As mentioned previously, I’m deploying these services on three servers. Each server runs one ZooKeeper node and one Kafka node. SSH into each server and deploy the services in the following order.

References

  1. https://medium.com/rahasak/kafka-and-zookeeper-with-docker-65cff2c2c34f
  2. https://www.allprogrammingtutorials.com/tutorials/configuring-messages-retention-time-in-kafka.php
  3. https://stackoverflow.com/questions/42311100/kafka-optimal-retention-and-deletion-policy
  4. https://stackoverflow.com/questions/52970153/kafka-how-to-avoid-running-out-of-disk-storage
