Apache Kafka and its Components
Apache Kafka is one of the most widely used open-source distributed event streaming platforms. It was originally developed at LinkedIn and open sourced in early 2011.
Why do we need Kafka?
As the image below shows, point-to-point integrations create many connections, which become complex to maintain as data is sent across different platforms.
With Kafka in place, this is how it looks at a high level.
How does it work?
At a high level, Kafka follows a publish/subscribe (pub/sub) model: messages/events are sent from a publisher (producer) to subscribers (consumers) via a message broker.
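The pub/sub flow can be sketched with a toy in-memory broker. This is only an illustration of the model, not real Kafka; the `Broker` class and the "orders" topic name are made up for the example.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker illustrating the pub/sub flow (not real Kafka)."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The broker fans the message out to every subscriber of the topic.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("orders", received.append)   # consumer side
broker.publish("orders", "order-created")     # producer side
print(received)  # ['order-created']
```

The key point is the decoupling: the producer only knows the broker and the topic, never the consumers.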
Apache Kafka Architecture and Components
Kafka is made up of the following components:
Producer
Consumer
Broker
Cluster
Topic
Partitions
Offset
Consumer Groups
ZooKeeper
Explanation of the components
The producer is responsible for sending messages/events to a broker, which is simply a Kafka server. The broker handles the message exchange between producers and consumers.
A Kafka cluster can contain one or more brokers.
The term topic in Kafka is used to segregate different types of events or messages, which makes it easy for consumers to subscribe only to the particular topics they want to consume. Listeners can then respond to the messages that belong to the topics they are listening on.
Each Kafka topic is broken into multiple partitions. This helps in handling huge amounts of data and improves performance, since the load is spread across partitions and can be consumed in parallel. (Availability when a broker goes down comes from partition replicas, which let the remaining copies keep serving the load without application downtime.) We decide the number of partitions when creating the Kafka topic.
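As a concrete example, this is roughly how a topic with three partitions is created with the `kafka-topics.sh` tool that ships with Kafka. The topic name, broker address, and counts are placeholders; this assumes a broker is already running locally.

```shell
# Create a topic named "orders" with 3 partitions and replication factor 1
# (topic name and broker address are just examples)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 3 \
  --replication-factor 1
```

The partition count can be increased later, but it cannot be decreased, so it is worth choosing with some headroom.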
When a producer sends a message without a key, we don't control which partition it goes into: the client spreads messages across partitions (classically round robin; newer clients use a "sticky" batching variant). When a message has a key, the key is hashed to pick the partition, so messages with the same key always land in the same partition.
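The partition-selection logic can be sketched as below. This is a simplification of what the real producer does (Kafka hashes keys with murmur2, not Python's `hash`), and the function name is invented for the example.

```python
def choose_partition(key, num_partitions, rr_counter):
    """Sketch of how a producer picks a partition (simplified vs. real Kafka).

    With a key: hash the key so the same key always maps to the same partition.
    Without a key: rotate round robin using a per-producer counter.
    """
    if key is not None:
        return hash(key) % num_partitions  # real Kafka uses murmur2, not Python's hash
    return rr_counter % num_partitions

# Keyless messages rotate across 3 partitions:
print([choose_partition(None, 3, i) for i in range(6)])  # [0, 1, 2, 0, 1, 2]

# Keyed messages stick to one partition (same key -> same partition):
assert choose_partition("user-42", 3, 0) == choose_partition("user-42", 3, 99)
```

This is also why keys matter for ordering: Kafka only guarantees ordering within a partition, so giving related messages the same key keeps them in order.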
In Kafka, each message within a partition of a topic is assigned a sequence number. This sequence number is called the offset, and it is used to track which messages have been consumed.
The image below describes how Kafka offsets help during a consumption failure.
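The recovery idea can be sketched in a few lines: the consumer commits the offset it has reached, and after a crash it simply resumes from the last committed offset. The list-backed partition and `consume` helper are invented for this illustration.

```python
partition = ["m0", "m1", "m2", "m3", "m4"]  # messages at offsets 0..4

committed_offset = 0  # next offset to read, as last committed by the consumer

def consume(batch_size):
    """Read a batch starting at the committed offset, then commit the new position."""
    global committed_offset
    batch = partition[committed_offset:committed_offset + batch_size]
    committed_offset += len(batch)
    return batch

assert consume(2) == ["m0", "m1"]
# If the consumer crashes and restarts here, it resumes from the committed
# offset rather than re-reading the whole partition:
assert consume(2) == ["m2", "m3"]
```

Real Kafka consumers commit offsets to the broker (or manage them manually), but the resume-from-last-committed-position behavior is the same idea.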
Multiple consumers grouped into a single unit under a shared group name form a consumer group. This divides the workload among the consumers in the group to achieve better throughput. Within a group, each partition is consumed by exactly one consumer at a time, and we don't control which consumer is assigned which partition.
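The partition-to-consumer assignment can be sketched as follows. Kafka has several built-in assignment strategies (range, round robin, sticky); this simplified round-robin version, with invented names, just shows the core invariant that each partition goes to exactly one consumer in the group.

```python
def assign_partitions(partitions, consumers):
    """Simplified round-robin assignment: every partition is owned by exactly
    one consumer in the group, and the load is spread evenly."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}
```

A consequence worth noting: with more consumers than partitions, the extra consumers sit idle, so the partition count caps the useful parallelism of a group.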
ZooKeeper has traditionally been a prerequisite for Kafka. It is used for coordination and to track the status of the Kafka cluster nodes, and it also keeps track of Kafka topics, partitions, offsets, etc. (Newer Kafka versions can run without ZooKeeper in KRaft mode, where the brokers manage this metadata themselves.)
Conclusion
I have covered why we need Apache Kafka and what each of its components does.
What’s next? I will write a blog on setting up Apache Kafka, with an example of a Producer and Consumer using Spring Boot 3, along with how to write E2E integration tests with Testcontainers.
Thanks for reading! See you soon with another blog :)