Hello Kafka World! The complete guide to Kafka with Docker and Python

Doron Vainrub
Big Data Engineering
Jun 6, 2019

What is Kafka? — A quick overview

For a complete guide, the Kafka documentation does an excellent job. Below is just a quick overview.

Pub/Sub system

Apache Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.

Kafka stores streams of records (messages) in topics. Each record consists of a key, a value, and a timestamp.

Producers write data to topics and consumers read from topics.

Topics and Logs

A topic is a category to which records are published. It can have zero, one, or many consumers that subscribe to the data written to it.

For each topic, the Kafka cluster maintains a partitioned log. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.

Producers and Consumers

Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic; this is usually done in a round-robin fashion or by partitioning on the record's key.

Consumers are organized into consumer groups, and each group can subscribe to one or more topics. A group can contain multiple consumer instances.

Each message published to a topic is delivered to exactly one consumer instance within every subscribing group. Since records with the same key always go to the same partition, all messages with the same key arrive at the same consumer.

Setting up Kafka (and Zookeeper) with Docker

The following steps use bash commands. If you are on Windows, use the equivalent commands.

Docker

Install Docker and Docker Compose.

We are going to use these two Docker images: wurstmeister/kafka and wurstmeister/zookeeper.

Setting up Kafka’s Docker

Clone the kafka-docker Git repository and bring up the containers.
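A sketch of those steps, assuming the wurstmeister/kafka-docker repository (which also provides the start-kafka-shell.sh and broker-list.sh helpers used below):

> git clone https://github.com/wurstmeister/kafka-docker.git
> cd kafka-docker
# Set KAFKA_ADVERTISED_HOST_NAME in docker-compose.yml to your Docker host IP, then:
> docker-compose up -d
# Optionally start a second broker, so the replication factor of 2 used below works:
> docker-compose scale kafka=2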

Hello Kafka World!

Kafka Shell

To start the Kafka Shell, just run:

> ./start-kafka-shell.sh <DOCKER_HOST_IP/KAFKA_ADVERTISED_HOST_NAME>
# In my case:
> ./start-kafka-shell.sh 172.17.0.1

Hello Topic

From within the Kafka Shell, run the following to create and describe a topic:

> $KAFKA_HOME/bin/kafka-topics.sh --create --topic test \
--partitions 4 --replication-factor 2 \
--bootstrap-server `broker-list.sh`
> $KAFKA_HOME/bin/kafka-topics.sh --describe --topic test \
--bootstrap-server `broker-list.sh`

Hello Producer

Initialize the producer and write messages to Kafka’s brokers.

> $KAFKA_HOME/bin/kafka-console-producer.sh --topic=test \
--broker-list=`broker-list.sh`
>> Hello World!
>> I'm a Producer writing to 'test'

Hello Consumer

Initialize the consumer from another Kafka Shell instance, and it will start reading the messages sent by the producer.

> $KAFKA_HOME/bin/kafka-console-consumer.sh --topic=test \
--from-beginning --bootstrap-server `broker-list.sh`

Kafka and Python

If you are inside the Kafka Shell, you’ll need to install python3:

> apk add python3

Install the kafka-python package and start a Python shell:

> pip3 install kafka-python
> python3

You can also try Confluent's Python client, confluent-kafka.

Python Kafka Consumer

From one of the Kafka Shell instances, run python3 and write:
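A minimal consumer sketch using kafka-python; the topic name matches the one created earlier, while the broker address and group name are placeholders you should adapt:

from json import loads
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'test',                                  # the topic created earlier
    bootstrap_servers=['172.17.0.1:32768'],  # placeholder: use the output of broker-list.sh
    auto_offset_reset='earliest',            # start at the oldest message if no offset is committed
    enable_auto_commit=True,                 # commit offsets periodically in the background
    group_id='my-group',                     # placeholder consumer group name
    value_deserializer=lambda m: loads(m.decode('utf-8')))  # parse the producer's JSON

for message in consumer:
    print('{} from partition {}'.format(message.value, message.partition))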

  • auto_offset_reset: Tells the consumer where to start reading when it has no valid committed offset: 'earliest' moves to the oldest available message, 'latest' to the most recent.
  • enable_auto_commit: If True, the consumer's offsets are periodically committed in the background.
  • value_deserializer: A function that defines how to deserialize each message's raw bytes. Here, it parses the JSON sent by the producer.
  • bootstrap_servers: The broker addresses to connect to. In my case, this is the output of running 'broker-list.sh'.

Python Kafka Producer

From another Kafka Shell instance, run python3 and write:
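And a matching producer sketch, again with a placeholder broker address; it sends a small JSON payload every second:

from json import dumps
from time import sleep
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['172.17.0.1:32768'],  # placeholder: use the output of broker-list.sh
    value_serializer=lambda m: dumps(m).encode('utf-8'))  # serialize dicts to JSON bytes

for i in range(100):
    producer.send('test', value={'number': i})  # no key, so partitions are chosen round-robin
    sleep(1)

producer.flush()  # block until all buffered messages are delivered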

Access Kafka from outside Docker


If you want to run the code above from outside the Kafka Shell (that is, outside a Docker container), you'll need to make a few small changes. Let's start!

Expose the Docker network and Kafka brokers

You’ll need to edit the docker-compose.yml file so it looks like this:
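The exact file depends on your environment; here is a minimal sketch based on the kafka-docker project's two-listener pattern, where the INSIDE listener serves clients within the Docker network and the OUTSIDE listener is mapped to port 9092 on the host:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      # host port 9092 maps to the OUTSIDE listener inside the container
      - "9092:9094"
    environment:
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9092,OUTSIDE://0.0.0.0:9094
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9092,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock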

We basically added some environment variables that define two listeners: INSIDE, for clients within the Docker network, and OUTSIDE, which is exposed to the host machine. If you want to read more about this, the kafka-docker project's wiki page on connectivity has a great explanation.

Consume/Produce from within a Docker container

Previously, you had to start a Kafka Shell and then create a consumer/producer that connected to one of the brokers listed by 'broker-list.sh'.

With this new configuration, you'll need to initialize the consumer/producer from within the Kafka container and connect to kafka:9092, the INSIDE listener.
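For example, a minimal sketch of a consumer running inside the container, assuming the listener setup from the compose file above:

from kafka import KafkaConsumer

# Inside the Docker network, the broker is reachable via the INSIDE listener
consumer = KafkaConsumer('test', bootstrap_servers=['kafka:9092'])
for message in consumer:
    print(message.value)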

Consume/Produce from Python in your host machine

Same code as above, but change bootstrap_servers to 'localhost:9092'.
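For example, a quick producer sketch from the host:

from kafka import KafkaProducer

# On the host, the OUTSIDE listener is mapped to localhost:9092
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])
producer.send('test', b'Hello from the host!')  # raw bytes, since no serializer is configured
producer.flush()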
