Apache Kafka and its Components

Karthik Kumar B
4 min read · Sep 22, 2023


Apache Kafka is one of the most widely used open-source distributed event streaming platforms. It was originally developed at LinkedIn and open-sourced in early 2011.

Why do we need Kafka?

As the image below shows, connecting every system directly to every other system creates a large number of point-to-point connections, which makes maintaining those integrations and sending data across different platforms complex.

Without Kafka

With Kafka in place, this is how it would look at a high level.

With Kafka implementation

How does it work?

At a high level this is how Kafka works.

It is a pub/sub (publish/subscribe) model in which messages/events are sent from a publisher to a consumer/subscriber via a message broker.

Apache Kafka Architecture and Components

The main components of Kafka are:

Producer

Consumer

Broker

Cluster

Topic

Partitions

Offset

Consumer Groups

ZooKeeper

Explanation of the components

The producer is responsible for sending messages/events via the broker, which is simply a Kafka server. The broker handles the exchange of messages between producers and consumers.

Producer-Consumer
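To make this concrete, here is a minimal sketch of a Java producer sending a message to a broker. It is illustrative only: the broker address localhost:9092 and the topic name "orders" are assumptions, not something from this post.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "orders" is a hypothetical topic used for illustration.
            producer.send(new ProducerRecord<>("orders", "order-created: id=42"));
            producer.flush(); // make sure the message reaches the broker before exiting
        }
    }
}
```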

A Kafka cluster is made up of one or more brokers.

Kafka Cluster

The term topic in Kafka is used to segregate different types of events or messages, which makes it easy for different subscribers/consumers to subscribe only to the topics they want to consume. Listeners can then respond to the messages that belong to the topics they are listening to.

Each Kafka topic is split into multiple partitions. This helps in handling huge amounts of data and improves performance, because messages for a topic are spread across its partitions and can be written and read in parallel. If one partition becomes unavailable, the other partitions can continue to handle the load (provided the topic is replicated across brokers) without application downtime. We decide the number of partitions when creating a Kafka topic.

Kafka Partitioning
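Since the partition count is chosen at topic-creation time, here is a hedged sketch of creating a topic with a given number of partitions using the Java AdminClient. The broker address, the topic name "orders", and the counts are illustrative assumptions.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // "orders" with 3 partitions and replication factor 1 (illustrative values).
            NewTopic topic = new NewTopic("orders", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```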

When a producer sends a message without a key, we don't choose the partition ourselves; Kafka's default partitioner spreads such messages across the topic's partitions (roughly round-robin in older clients, a sticky partitioner in newer ones). If a message does have a key, all messages with the same key go to the same partition.

In Kafka, a sequence number is assigned to each message within a partition of a topic. This sequence number is called the offset, and it is used to track which messages have been consumed.

Kafka Offset
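As a rough sketch of how offsets show up in code (the broker address, topic, and group name below are assumptions), a consumer can print each record's partition and offset and then commit, which records how far this group has read:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors");        // illustrative consumer group name
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");          // commit offsets manually below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("orders")); // illustrative topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // Kafka resumes from the committed offset after a restart
            }
        }
    }
}
```

This is why a consumer that goes down can pick up where it left off: on restart it resumes from the last committed offset for its group.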

The images below show how the offset helps when consumption fails.

Consumer Failure/Consumer Application being down
After the consumer application is back up and running.

Multiple consumers grouped into a single unit by specifying a group name form a consumer group. This helps divide the workload across the consumers to achieve better throughput. Within a group, each partition is consumed by only one consumer at a time, but we don't control which consumer is assigned to which partition.

Consumer Group
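For example (an illustrative sketch, not from the post), the only thing that places consumers in the same group is sharing a group.id. Running the consumer from the offset sketch above in several processes with the configuration below makes Kafka split the topic's partitions among them and rebalance whenever an instance joins or leaves:

```java
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
props.put("group.id", "order-processors");        // same group id in every instance

// With a 3-partition topic and 3 running instances, each instance typically
// gets one partition; if an instance stops, its partition is reassigned.
```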

ZooKeeper is a prerequisite for Kafka. It is used for coordination and to track the status of the Kafka cluster's nodes (brokers). It also keeps track of Kafka topics, partitions, offsets, etc.

Zookeeper

Conclusion

I have covered why Apache Kafka is useful and walked through each of its components.

What’s next? I will write a blog on setting up Apache Kafka, with an example of a producer and consumer using Spring Boot 3, along with how to write end-to-end (E2E) integration tests with Testcontainers.

Thanks for reading! See you soon with another blog :)

