Hello Kafka World! The complete guide to Kafka with Docker and Python
What is Kafka? — A quick overview
For a complete guide, the Kafka documentation does an excellent job. Below is just a quick overview.
Apache Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.
Kafka stores streams of records (messages) in topics. Each record consists of a key, a value, and a timestamp.
Producers write data to topics and consumers read from topics.
Topics and Logs
A topic is a category to which records are published. It can have zero, one, or many consumers that subscribe to the data written to it.
For each topic, the Kafka cluster maintains a partitioned log. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
Producers and Consumers
Producers publish data to the topics of their choice. It is responsible for choosing which record to assign to which partition within the topic. This is usually done in a round-robin fashion or partitioning by a key or value.
Consumer groups can subscribe to one or more topics. Each one of these groups can be configured with multiple consumers.
Every message in a topic is delivered to one of the consumer instances inside the group subscribed to that topic. All messages with the same key arrive at the same consumer.
Setting up Kafka (and Zookeeper) with Docker
The following steps use bash commands. If you are on Windows use the equivalents.
Setting up Kafka’s Docker
Clone the Kafka’s git project and initialize the docker image.
Hello Kafka World!
To start it just run the command:
> ./start-kafka-shell.sh <DOCKER_HOST_IP/KAFKA_ADVERTISED_HOST_NAME># In my case:
> ./start-kafka-shell.sh 172.17.0.1
From within the Kafka Shell, run the following to create and describe a topic:
> $KAFKA_HOME/bin/kafka-topics.sh --create --topic test \
--partitions 4 --replication-factor 2 \
--bootstrap-server `broker-list.sh`> $KAFKA_HOME/bin/kafka-topics.sh --describe --topic test \
Initialize the producer and write messages to Kafka’s brokers.
> $KAFKA_HOME/bin/kafka-console-producer.sh --topic=test \
>> Hello World!
>> I'm a Producer writing to 'hello-topic'
Initialize the consumer from another Kafka terminal and it will start reading the messages sent by the producer.
> $KAFKA_HOME/bin/kafka-console-consumer.sh --topic=test \
--from-beginning --bootstrap-server `broker-list.sh`
Kafka and Python
If you are inside the Kafka Shell, you’ll need to install python3:
> apk add python3
Install Kafka’s python package and initialize python’s shell:
> pip3 install kafka
You can also try Confluent’s Kafka Python Package
Python Kafka Consumer
From one of the Kafka Shell’s instances, run python3 and write:
- auto_offset_reset: Tells the consumer from where to start reading if it crashes. ‘earliest’ will move to the oldest available message, ‘latest’ will move to the most recent.
- enable_auto_commit: If True, the consumer’s offset will be periodically committed in the background
- value_deserializer: A method that defines how to deserialize the data. In this case, it will read the data coming in JSON format from the producer.
- bootstrap_servers: In my case, is the output of running ‘broker-list.sh’
Python Kafka Producer
From another Kafka Shell’s instance, run python3 and write:
Access Kafka from outside Docker
If you want to run the code above from outside Kafka’s shell (outside a Docker container) you’ll need to do some easy changes. Let’s start!
Expose the Docker network and Kafka brokers
You’ll need to edit the docker-compose.yml file so it looks like this:
We basically added some environment to expose Kafka brokers to the host machine, if you want to read more about this, you can read this great explanation or read more about docker and Kafka’s connectivity.
Consume/Produce from within a Docker container
Previously, you had to start a Kafka Shell and then create a consumer/producer that connected to one of the servers in ‘broker-list.sh’.
With this new configuration, you’ll need to initialize the consumer/producer from within the Kafka docker and connect to the host kafka:9092.
Consume/Produce from Python in your host machine
Same code as above, but you’ll need to change the bootstrap_servers to be ‘localhost:9092’