A Simulation of Distributed Streaming: Consumer-producer communication using Python, Docker, and FastAPI with Apache Kafka

piash.tanjin
Cloud Native Daily
5 min read · May 14, 2023


Key Insights You’ll Gain From This Article

  • What distributed streaming is
  • What “streaming” means
  • What Kafka is
  • What ZooKeeper is
  • Key components of Kafka
  • Kafka using Docker
  • The connection between producer and consumer

What is distributed streaming?
Distributed streaming refers to the processing of continuous, real-time data streams across a distributed system. One of its primary benefits is the ability to process and analyze large volumes of data in real time.

What does “streaming” mean?
The term “streaming” refers to the continuous transmission or processing of data, rather than the transfer of discrete packets or files.

What is Kafka?
Apache Kafka is a distributed event-streaming platform; its primary purpose is to provide a distributed platform for handling large volumes of real-time data streams.

What is ZooKeeper?
ZooKeeper is an open-source, distributed coordination service that is often used in distributed systems to manage configuration information, naming, synchronization, and other critical aspects of distributed applications. It stores data in a distributed, tree-like structure whose nodes are called znodes. ZooKeeper uses the Zab (ZooKeeper Atomic Broadcast) protocol to guarantee the consistency and ordering of updates across the system. In addition, ZooKeeper replicates its data across a cluster of servers known as an ensemble, which provides fault tolerance by keeping multiple copies of the data on different nodes. Even if some nodes fail or become unavailable, the remaining nodes in the ensemble can continue to operate and serve client requests. This is why Kafka is traditionally deployed alongside ZooKeeper: ZooKeeper handles the coordination that lets Kafka work as a distributed system.
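
To get a feel for znodes, here is a minimal sketch using the kazoo Python client. The server address and znode paths are assumptions for illustration, not values from this tutorial's setup:

```python
from kazoo.client import KazooClient

# Connect to a ZooKeeper server (address is an assumption for this example).
zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# znodes form a tree, addressed by slash-separated paths like a filesystem.
zk.ensure_path("/app/config")
if not zk.exists("/app/config/timeout"):
    zk.create("/app/config/timeout", b"30")

# Reading a znode returns its data plus a stat with version/metadata,
# which ZooKeeper keeps consistent across the ensemble via Zab.
data, stat = zk.get("/app/config/timeout")
print(data.decode("utf-8"), stat.version)

zk.stop()
```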

Key components of Kafka

  1. Producers: Producers are responsible for publishing data records to Kafka topics. A producer sends messages to a specific topic, which can be thought of as a category or feed name to which records are published.
  2. Topics: Topics are the central channels or feeds in Kafka where messages are published. A topic represents a particular stream of data. Producers publish messages to topics, and consumers read messages from topics.
  3. Consumers: Consumers are responsible for subscribing to Kafka topics and processing the messages published to those topics. They read messages from topics and can process them in real time or store them for further analysis or processing.
  4. Partitions: Partitioning works much like database sharding. In Kafka, a partition is a linearly ordered, immutable sequence of records within a topic. Each partition is an independent, ordered log, and every message within a partition is assigned a unique offset.
  5. Offset: An offset is a unique identifier that represents the position of a message within a partition of a topic. It is an integer value assigned to each message as it is written to the partition. Consumers use the offset to keep track of their progress in reading messages from a particular partition.
  6. Consumer Group: A consumer group is a logical grouping of consumers that work together to consume data from one or more Kafka topics. Consumer groups allow messages to be processed in parallel and provide benefits such as load balancing, fault tolerance, and high throughput (see the sketch after this list).
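
To make these components concrete, here is a minimal sketch using the kafka-python client. The broker address, topic name, and group id are illustrative assumptions, not values from the repository:

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: publishes records to a topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(3):
    producer.send("demo-topic", f"event {i}".encode("utf-8"))
producer.flush()

# Consumer: joins a consumer group and reads from the topic.
consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",
    auto_offset_reset="earliest",  # start from the oldest available offset
    consumer_timeout_ms=5000,      # stop iterating when the topic goes idle
)
for record in consumer:
    # Each record carries the partition it was written to and its offset.
    print(record.partition, record.offset, record.value.decode("utf-8"))
```

Running a second consumer with the same group_id would split the topic's partitions between the two instances, which is how consumer groups achieve load balancing.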

Kafka Using Docker
In this tutorial, we will run Kafka using Docker.
First, clone the GitHub repository: https://github.com/TanjinAlam/sync-kafka-producer-consumer.git

The Docker Compose file defines the following services:
1. Zookeeper
2. Kafka
3. Kafdrop
4. Consumer
5. Publisher

Inside the Docker network, the services connect to the Kafka broker at kafka:29029 (Kafka speaks its own binary protocol, so there is no http:// prefix).
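
To illustrate, the publisher service can be a small FastAPI app along these lines. This is a hedged sketch, not the repository's exact code; the endpoint path, topic name, and environment variable are assumptions:

```python
# publisher service (sketch)
import os
from fastapi import FastAPI
from kafka import KafkaProducer

app = FastAPI()

# Inside the compose network the broker is reachable as kafka:29029;
# the environment variable lets docker-compose override it.
producer = KafkaProducer(
    bootstrap_servers=os.getenv("KAFKA_BOOTSTRAP_SERVERS", "kafka:29029")
)

@app.post("/publish")
def publish(message: str):
    # Send the message to the topic and block until it is delivered.
    producer.send("demo-topic", message.encode("utf-8"))
    producer.flush()
    return {"status": "published", "message": message}
```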

The connection between producer and consumer

Start the containers -> docker compose up
Producer: http://localhost:8000/docs

[Screenshot: Kafka producer Swagger UI]

Consumer: http://localhost:8001/docs

[Screenshot: Kafka consumer Swagger UI]

Kafdrop: http://localhost:9000

[Screenshot: Kafdrop cluster information]

Let's see how the communication happens between producer and consumer.

Now make a POST request from http://localhost:8000/docs, which is the producer.

Since we have now sent a message from the producer, it will show up in the Docker container logs. Before checking them, make a GET request to the consumer API at http://localhost:8001/docs. This causes the consumer to poll the topic, so its container log is refreshed with the message it consumed from the producer.

[Screenshot: consumer API GET request in Swagger UI]
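
For context, the consumer service can follow the same pattern: a GET endpoint that polls the topic and logs whatever it receives. Again, this is a sketch under assumed names, not the repository's exact code:

```python
# consumer service (sketch)
import os
from fastapi import FastAPI
from kafka import KafkaConsumer

app = FastAPI()

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers=os.getenv("KAFKA_BOOTSTRAP_SERVERS", "kafka:29029"),
    group_id="demo-group",
    auto_offset_reset="earliest",
    consumer_timeout_ms=1000,  # stop polling when no new messages arrive
)

@app.get("/consume")
def consume():
    messages = [record.value.decode("utf-8") for record in consumer]
    for msg in messages:
        print(f"Consumed: {msg}")  # visible via `docker logs consumer -f`
    return {"messages": messages}
```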

After invoking the request, let's jump into the Docker containers to see the detailed logs of the publisher and consumer.

Use docker logs consumer -f to follow the consumer container's logs.
Use docker logs publisher -f to follow the producer container's logs.
