Apache Kafka: A Basic Intro

Ahmed Gulab Khan
Published in Nerd For Tech
5 min read · Feb 26, 2022

In this blog post, I’ll give a brief, basic introduction to Apache Kafka and the terminology you need to know to get started with it.

Kafka — What is it?

In a nutshell, Kafka is a distributed system that lets multiple services communicate with each other through its queue-based architecture. With that out of the way, let’s go over some basic Kafka terminology ;)

Broker: A broker is a server that runs Kafka and handles the communication between multiple services. Multiple brokers form a Kafka cluster.

Event: The messages produced to or consumed from a Kafka broker are called events. They are stored as bytes on the broker’s disk.

Producer and Consumer: Services that produce events to a Kafka broker are called producers, and services that consume those events are called consumers. The same service can act as both a producer and a consumer.

Topic: Topics are used to separate the different types of events stored in Kafka. In short, a topic is like a folder in a file system that holds only the events or messages of one specific type, for example “payment-details” or “user-details”.

Partition: A topic can be further divided into partitions to achieve higher throughput. A partition is the smallest storage unit and holds a subset of a topic’s data.

Replication Factor: A replica of a partition is a backup copy of that partition. A topic’s replication factor determines how many copies of each of its partitions the Kafka cluster maintains. For example, a topic with 1 partition and a replication factor of 2 has two copies of that partition, holding the same data, stored on different brokers in the cluster.
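
To make the replication factor concrete, here is a minimal Python sketch of how replicas could be spread over brokers. It is a toy model, not Kafka’s actual placement algorithm: the function name `assign_replicas`, the broker names, and the simple round-robin placement are all assumptions made for illustration.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Toy placement: put each partition's replicas on distinct brokers."""
    # A replica only adds safety if it lives on a different broker, so the
    # replication factor cannot be larger than the number of brokers.
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for partition in range(num_partitions):
        placement[partition] = [
            brokers[(partition + replica) % len(brokers)]
            for replica in range(replication_factor)
        ]
    return placement


# The example from the text: 1 partition, replication factor 2.
placement = assign_replicas(1, 2, ["broker-1", "broker-2", "broker-3"])
print(placement)  # {0: ['broker-1', 'broker-2']}
```

The single partition ends up with two copies on two different brokers, so losing one broker does not lose the data.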

Offset: To keep track of which events a consumer has already consumed, Kafka stores an index that points to the latest consumed message. This index is called the offset. If a consumer goes down, the offset tells us exactly where it has to resume consuming events. A producer producing messages to a Kafka topic with 3 partitions would look like this:

[Figure: a producer writing messages to a Kafka topic with 3 partitions]
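
As a rough illustration of offsets, the sketch below models a single partition as an append-only Python list and shows a consumer resuming after a restart. The `Partition` class and its methods are hypothetical names for this toy model, not part of any Kafka client. One convention to note: Kafka’s committed offset is the position of the next record to read, which is one past the last record consumed, and the sketch follows that convention.

```python
class Partition:
    """Toy append-only log: a record's offset is its position in the list."""

    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        return self.records[offset:]


partition = Partition()
offsets = [partition.append(e) for e in ["e0", "e1", "e2", "e3", "e4"]]
print(offsets)  # [0, 1, 2, 3, 4]

# The consumer handles three events, committing after each one, then "crashes".
committed = 0  # next offset to read
for event in partition.read_from(committed)[:3]:
    committed += 1

# After a restart it resumes from the committed offset instead of from zero.
remaining = partition.read_from(committed)
print(committed, remaining)  # 3 ['e3', 'e4']
```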

Zookeeper: Zookeeper is an extra service in the Kafka cluster that maintains the cluster ACLs, tracks the status of the Kafka broker nodes, and maintains the client quotas (how much data a producer/consumer is allowed to write/read). Older Kafka versions also stored consumer offsets in Zookeeper; newer versions keep them in an internal Kafka topic instead. Recent Kafka releases can also run without Zookeeper altogether, using the built-in KRaft mode.

Consumer Group: A set of consumers can join a group to cooperate in consuming messages from a set of topics. This grouping is called a consumer group. If two consumers in the same consumer group subscribe to the same topic, each is assigned a different set of partitions, so the two never receive the same messages. With multiple consumers subscribed to the same topic, a consumer group can achieve a higher consumption rate.
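
The way a group splits partitions among its members can be sketched in a few lines of Python. This is a simplified round-robin assignment with made-up consumer names; real Kafka clients choose among several assignment strategies, so treat `assign_partitions` as an illustrative assumption only.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment of a topic's partitions within one group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# Three partitions shared by two consumers: no partition is assigned twice.
print(assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0, 2], 'consumer-b': [1]}

# More consumers than partitions: the extra consumer gets nothing to read.
print(assign_partitions([0], ["consumer-a", "consumer-b"]))
# {'consumer-a': [0], 'consumer-b': []}
```

The second call shows why adding consumers beyond the partition count does not increase the consumption rate: a partition is read by at most one consumer in the group.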

I have talked in more detail about Kafka Partitions and Consumer Groups in this article; be sure to check it out.

How are events produced and consumed?

Let’s consider two cases to better understand this:

Scenario 1:

In the first scenario, we have 1 Kafka broker with 1 topic (say Topic-A, with partition count and replication factor both set to 1), 1 producer, and 1 consumer in our Kafka cluster, as depicted below.

Suppose our producer sends events (either synchronously or asynchronously) to Topic-A on the Kafka broker. Since there is only one partition, every event sent to the topic is stored in that partition’s queue, and each new event is appended to the end.

The consumer subscribes to Topic-A and consumes all the events from the Kafka broker in the same order in which they were produced.

Scenario 2:

Now consider the same case, but with Topic-A having 2 partitions instead of 1. The events sent from the producer to Topic-A are distributed across these 2 partitions so that each event lands in exactly one partition; no two partitions receive the same event. The consumer then receives events from the partitions assigned to it.
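
Scenario 2 can be sketched with a toy partitioner in Python. The assumptions here: `choose_partition` is a hypothetical helper, CRC32 stands in for the hash function a real Kafka client would use, and keyless events are simply spread round-robin. The point is only that each event is written to exactly one partition and that events with the same key always land in the same partition.

```python
import zlib
from itertools import count

_next_keyless = count()  # counter used to spread keyless events


def choose_partition(key, num_partitions):
    """Toy partitioner: hash the key if present, otherwise go round-robin."""
    if key is None:
        return next(_next_keyless) % num_partitions
    # CRC32 is a deterministic stand-in for the client's real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


topic_a = {0: [], 1: []}  # Topic-A with 2 partitions
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "pay"),
    (None, "ping"),
    (None, "pong"),
]
for key, value in events:
    topic_a[choose_partition(key, len(topic_a))].append((key, value))

for partition, stored in topic_a.items():
    print(partition, stored)
```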

Common Q&As

Q. Does Kafka push messages to the consumer?

A. No. Kafka consumers follow a pull strategy rather than a push strategy: the consumer is responsible for initiating requests to the Kafka broker to retrieve messages.
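
A pull loop can be illustrated with a small Python sketch. The `fetch` function below is a hypothetical stand-in for a broker answering a consumer’s request; it is not a real client API. Note that the broker side never sends anything on its own: the consumer asks for the next batch and decides when to ask again.

```python
def fetch(log, offset, max_records):
    """Toy broker: return up to max_records starting at the given offset."""
    return log[offset:offset + max_records]


log = [f"event-{i}" for i in range(5)]  # what the broker currently stores

position = 0   # the consumer tracks where it is in the log
consumed = []
while True:
    batch = fetch(log, position, max_records=2)  # the consumer initiates the request
    if not batch:
        break  # nothing new; a real consumer would keep polling instead
    consumed.extend(batch)
    position += len(batch)

print(consumed)  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Because the consumer controls the batch size and the polling rate, a slow consumer simply falls behind rather than being overwhelmed.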

Q. Do the messages in Kafka get deleted once consumed?

A. No. Messages stored in Kafka are not deleted once consumed. Instead, they are deleted under one of the following retention policies:

  • Time-based retention: messages are deleted after a configured time period has passed, or
  • Size-based retention: the oldest messages are deleted once a partition exceeds a configured maximum size
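
The two retention policies can be sketched as a toy Python function. In Kafka they correspond to the topic settings `retention.ms` and `retention.bytes`; everything else here (the `apply_retention` name, the record layout, deleting record by record) is an assumption for illustration. Kafka itself removes whole log segments rather than individual messages.

```python
def apply_retention(records, now_ms, retention_ms=None, retention_bytes=None):
    """Toy retention over (timestamp_ms, payload) pairs, oldest first."""
    kept = list(records)
    if retention_ms is not None:
        # Time-based: drop records older than the retention period.
        kept = [r for r in kept if now_ms - r[0] <= retention_ms]
    if retention_bytes is not None:
        # Size-based: drop the oldest records until the log fits the limit.
        while kept and sum(len(payload) for _, payload in kept) > retention_bytes:
            kept.pop(0)
    return kept


records = [(1_000, b"aaaa"), (2_000, b"bbbb"), (3_000, b"cccc")]

print(apply_retention(records, now_ms=3_500, retention_ms=2_000))
# [(2000, b'bbbb'), (3000, b'cccc')]

print(apply_retention(records, now_ms=3_500, retention_bytes=4))
# [(3000, b'cccc')]
```

In both cases, whether a message has been consumed plays no role in whether it is kept.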

Q. Are the same messages sent to partitions of the same topic?

A. No. Each message is written to exactly one partition of the topic, so no two partitions receive the same message from a producer.

Q. Does consumption from different partitions of the same topic happen in the same order the messages were sent by the producer?

A. Kafka guarantees the order of messages within each individual partition. However, when a consumer reads from several partitions of a topic, the combined stream of messages across those partitions is not guaranteed to be in the order in which the producer sent them.
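
A small Python sketch makes the difference between per-partition order and overall order visible. The key-to-partition mapping and the order in which the consumer drains the partitions are fixed by hand here; both are assumptions chosen to show one possible outcome.

```python
# Events in the order the producer sent them: (key, sequence number).
produced = [("user-1", 1), ("user-2", 1), ("user-1", 2), ("user-2", 2)]

# Hand-picked mapping for the example: each key goes to one partition.
key_to_partition = {"user-1": 0, "user-2": 1}
partitions = {0: [], 1: []}
for key, seq in produced:
    partitions[key_to_partition[key]].append((key, seq))

# One possible consumption order: all of partition 1, then all of partition 0.
consumed = partitions[1] + partitions[0]

print(consumed == produced)  # False: the overall order differs from the send order
print([seq for key, seq in consumed if key == "user-1"])  # [1, 2]: order within a partition holds
```

If the relative order of a set of events matters, giving them the same key keeps them in the same partition, where order is preserved.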

If you want to know how partitions and consumer groups work in Kafka, you can go through the next article, Kafka Partitions and Consumer Groups in 6 mins.

More Kafka articles in the series that you can go through:

  1. Kafka Partitions and Consumer Groups in 6 mins
  2. 3 Simple steps to set up Kafka locally using Docker

Follow for the next Kafka blog in the series. I’ll also be posting more articles on software engineering concepts.

You can also find me on GitHub and Dev.to.
