Hello Kafka World! The complete guide to Kafka with Docker and Python

Doron Vainrub
Jun 6, 2019 · 4 min read

What is Kafka? — A quick overview

For a complete guide, the Kafka documentation does an excellent job. Below is just a quick overview.

Apache Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.

Kafka stores streams of records (messages) in topics. Each record consists of a key, a value, and a timestamp.

Producers write data to topics and consumers read from topics.

A topic is a category to which records are published. It can have zero, one, or many consumers that subscribe to the data written to it.

For each topic, the Kafka cluster maintains a partitioned log. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.


Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This is usually done in a round-robin fashion, or by partitioning on the record's key.

Consumer groups can subscribe to one or more topics. Each one of these groups can be configured with multiple consumers.

Every message in a topic is delivered to one of the consumer instances inside the group subscribed to that topic. All messages with the same key arrive at the same consumer.


Setting up Kafka (and Zookeeper) with Docker

The following steps use bash commands. If you are on Windows, use the equivalent commands for your shell.


Install Docker and Docker Compose.

We are going to use these two docker images: wurstmeister/kafka and wurstmeister/zookeeper.

Clone the kafka-docker git project and bring up the containers, as sketched below.
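A minimal sketch of these steps, assuming the wurstmeister/kafka-docker repository (which provides the docker-compose.yml and the helper scripts used below):

> git clone https://github.com/wurstmeister/kafka-docker.git
> cd kafka-docker
# Set KAFKA_ADVERTISED_HOST_NAME in docker-compose.yml to your Docker host IP (e.g. 172.17.0.1), then:
> docker-compose up -d
# Two brokers are needed for the replication factor of 2 used below:
> docker-compose scale kafka=2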

Hello Kafka World!


To start the Kafka shell, just run the command:

> ./start-kafka-shell.sh <DOCKER_HOST_IP/KAFKA_ADVERTISED_HOST_NAME>
# In my case:
> ./start-kafka-shell.sh 172.17.0.1

From within the Kafka Shell, run the following to create and describe a topic:

> $KAFKA_HOME/bin/kafka-topics.sh --create --topic test \
--partitions 4 --replication-factor 2 \
--bootstrap-server `broker-list.sh`
> $KAFKA_HOME/bin/kafka-topics.sh --describe --topic test \
--bootstrap-server `broker-list.sh`

Initialize the producer and write messages to Kafka’s brokers.

> $KAFKA_HOME/bin/kafka-console-producer.sh --topic=test \
--broker-list=`broker-list.sh`
>> Hello World!
>> I'm a Producer writing to 'test'

Initialize the consumer from another Kafka shell; it will start reading the messages sent by the producer.

> $KAFKA_HOME/bin/kafka-console-consumer.sh --topic=test \
--from-beginning --bootstrap-server `broker-list.sh`

Kafka and Python

If you are inside the Kafka Shell, you’ll need to install python3:

> apk add python3

Install Kafka’s python package and initialize python’s shell:

> pip3 install kafka-python
> python3

You can also try Confluent's Kafka Python package, confluent-kafka.

From one of the Kafka shell instances, run python3 and create a consumer, as sketched after the parameter list below:

  • auto_offset_reset: Tells the consumer where to start reading when it has no valid committed offset (for example, after a crash). 'earliest' moves to the oldest available message, 'latest' to the most recent.
  • enable_auto_commit: If True, the consumer's offset is periodically committed in the background.
  • value_deserializer: A callable that defines how to deserialize the data. In this case, it parses the JSON sent by the producer.
  • bootstrap_servers: The broker addresses to connect to. In my case, the output of running broker-list.sh.
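A minimal consumer sketch using kafka-python; the broker address is illustrative, so substitute the output of broker-list.sh:

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'test',                            # the topic created earlier
    auto_offset_reset='earliest',      # start from the oldest message when no offset is committed
    enable_auto_commit=True,           # commit offsets periodically in the background
    value_deserializer=lambda v: json.loads(v.decode('utf-8')),
    bootstrap_servers=['172.17.0.1:32768'],  # illustrative; use the output of broker-list.sh
)

# Block and print every message as it arrives.
for message in consumer:
    print(message.value)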

From another Kafka shell instance, run python3 and create the matching producer:
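A producer sketch to match, serializing values to JSON so the consumer's value_deserializer can parse them (the broker address is again illustrative):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['172.17.0.1:32768'],  # illustrative; use the output of broker-list.sh
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

# Send a few JSON messages to the 'test' topic.
for i in range(5):
    producer.send('test', {'number': i})

producer.flush()  # block until all buffered messages are delivered

The consumer in the other shell should now print the five JSON messages.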

Access Kafka from outside Docker


If you want to run the code above from outside the Kafka shell (outside a Docker container), you'll need to make a few small changes. Let's start!

You'll need to edit the docker-compose.yml file so it looks something like this:
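A sketch of the edited file, assuming the wurstmeister images. The INSIDE/OUTSIDE listener names and the 9092:9094 port mapping are illustrative choices, arranged so that containers reach the broker as kafka:9092 while the host reaches it as localhost:9092:

version: '2'
services:
  zookeeper:
    image: wurstmeister/zookeeper
    ports:
      - "2181:2181"
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9094"   # host port 9092 -> the container's OUTSIDE listener
    environment:
      KAFKA_LISTENERS: INSIDE://0.0.0.0:9092,OUTSIDE://0.0.0.0:9094
      KAFKA_ADVERTISED_LISTENERS: INSIDE://kafka:9092,OUTSIDE://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock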

We basically added some environment variables to expose the Kafka brokers to the host machine. If you want to read more about this, see the kafka-docker project's connectivity documentation.

Previously, you had to start a Kafka shell and then create a consumer/producer that connected to one of the brokers listed by broker-list.sh.

With this new configuration, clients running inside the Docker network connect to the broker as kafka:9092, while clients on the host machine connect to localhost:9092.

Same code as above, but you'll need to change the bootstrap_servers to be 'localhost:9092':
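For example, the consumer from earlier, now run from the host:

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'test',
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    value_deserializer=lambda v: json.loads(v.decode('utf-8')),
    bootstrap_servers=['localhost:9092'],  # the listener exposed to the host
)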
