Get Started with Kafka Producers and Consumers

Janani Seelam
3 min read · Aug 21, 2023


What to expect:

  1. An understanding of Kafka topics, partitions, producers, and consumers
  2. An understanding of how records are produced and consumed

Prerequisites

Familiarity with the publisher/subscriber model

Overview

Excited to learn? Let’s get started.

Kafka is built on the publisher/subscriber model: an application (or user) produces records to Kafka, and those records can be consumed by the same or a different application. The key terms are:

Producer: A process that writes records to Kafka

Consumer: A process that reads the produced records

Consumer group: A set of consumers that share the work of consuming a topic

Topic: A logical grouping of similar data

Partition: Each topic is split into one or more partitions, which store the records (see the sketch after this list)
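
To make partitions concrete, here is a minimal sketch you can try once the local setup below is running. It assumes the test-topic created later in this post; parse.key and key.separator are standard console-producer properties, and records sharing a key always land in the same partition:

bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
# Then type records such as:
#   user1:first record
#   user1:second record (same key, so same partition as the first)
#   user2:another record (different key, possibly a different partition)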

Kafka Local Setup

  1. Download Kafka from https://www.apache.org/dyn/closer.cgi?path=/kafka/3.5.0/kafka_2.13-3.5.0.tgz
  2. Untar the archive and cd into kafka_2.13-3.5.0 (see the commands after this list)
  3. Run bin/zookeeper-server-start.sh config/zookeeper.properties to start ZooKeeper
  4. Run bin/kafka-server-start.sh config/server.properties to start a broker
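
Collected as shell commands, and assuming the archive was downloaded to the current directory (adjust the path and version to your download), the steps look like this; run ZooKeeper and the broker in separate terminals:

tar -xzf kafka_2.13-3.5.0.tgz
cd kafka_2.13-3.5.0
bin/zookeeper-server-start.sh config/zookeeper.properties # Terminal 1: ZooKeeper on port 2181
bin/kafka-server-start.sh config/server.properties # Terminal 2: broker on port 9092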

Now we have a basic Kafka setup ready. Let us understand how produced records are consumed by a consumer in a consumer group.

Create a topic, which tells the producer and consumer where to send and receive records:

cd ~/Path/To/kafka_2.13-3.5.0
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --replication-factor 2 --partitions 2

Did it error out? Yup, that’s expected. Let us understand why it errors!

Replication factor: The number of copies of each partition kept across brokers

Since we are running locally, step 4 started only a single broker, but the topic we are trying to create has a replication factor of 2. Kafka throws an error because there is no second broker on which the partition can be replicated. To resolve this, let’s start at least one more broker.

5. Open a new terminal and follow the process below to create another broker. (We can create further brokers by replacing 1 with another number in the commands below.)

cp config/server.properties config/server1.properties # Create properties for the new Kafka broker
vi config/server1.properties
# Set broker.id to a unique value, e.g., 1
# Set log.dirs to a new directory, e.g., /tmp/kafka-logs-1
# Change the listener port from 9092 to 9093, e.g., listeners=PLAINTEXT://:9093
# Save and exit
bin/kafka-server-start.sh config/server1.properties # Start Kafka broker 1

6. Now, execute the create-topic command again; this time it creates a topic with 2 partitions, each replicated 2 times.
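
To confirm the topic was created as intended, kafka-topics.sh offers a --describe option that prints each partition’s leader and replica placement:

bin/kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092
# Prints PartitionCount: 2, ReplicationFactor: 2 and, for each partition, its leader and replica brokers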

7. Open a new terminal and run the command below to start a producer

bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092,localhost:9093

8. Open a new terminal and run the command below to start a consumer

bin/kafka-console-consumer.sh --topic test-topic --bootstrap-server localhost:9092,localhost:9093 --from-beginning --group test-topic-group

With this, our basic Kafka setup is complete.

How are produced records consumed?

Record: A single unit of data sent by a producer and received by a consumer

Offset: A sequential identifier for each record within a partition
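
To see offsets in action, the console consumer can print each record’s partition and offset alongside its value. A minimal sketch, assuming the print.offset and print.partition formatter properties available in recent Kafka releases (including 3.5):

bin/kafka-console-consumer.sh --topic test-topic --bootstrap-server localhost:9092 --from-beginning --property print.partition=true --property print.offset=true
# Each consumed line is now prefixed with the record’s partition number and offset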

Go to the terminal where we started the producer and start sending records: type some data and press Enter. Switch to the terminal where we started the consumer, and we can observe the produced records being consumed (printed line by line).

Let’s make it interesting now. Open a new terminal and start another consumer (same command as above). Come back to the producer terminal and send the records 1,2,3,4,5,6,7,8,9,10. Open the first consumer’s terminal: boom! Some of the records are missing. What just happened? Open the second consumer’s terminal, and we find the missing records there. Because both consumers belong to the same consumer group, Kafka assigns each of the topic’s two partitions to a different consumer, so the records are split between them and consumed in parallel.
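
A consumer in a different group, by contrast, shares partitions with no one and therefore receives every record. A quick sketch (the group name another-group is just an illustrative choice):

bin/kafka-console-consumer.sh --topic test-topic --bootstrap-server localhost:9092,localhost:9093 --from-beginning --group another-group
# This consumer is assigned both partitions, so it prints all ten records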

Considering there is only one producer, there are 3 possibilities for the relationship between a topic’s partitions and its consumers:

  1. Number of partitions = Number of consumers
  2. Number of partitions > Number of consumers
  3. Number of partitions < Number of consumers

In the first case, each consumer is assigned exactly one partition and consumes only that partition’s records (the example explained above, where the topic has 2 partitions and we have 2 consumers).

In the second case, when there are more partitions than consumers, one or more consumers will consume records from more than one partition.

In the third case, when there are fewer partitions than consumers, (number of consumers minus number of partitions) consumers remain passive, taking over only when an actively consuming consumer goes down.
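
These partition assignments can be inspected directly. The kafka-consumer-groups.sh tool that ships with Kafka shows, per partition, which consumer owns it along with its current offset and lag:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group test-topic-group
# Output columns include TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG and CONSUMER-ID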

That is how producers, topics, partitions, and consumers work together. We can try out all of the above possibilities by creating topics and starting consumers accordingly.

This is my first blog; any review comments or feedback are welcome. Thank you.
