Basic Kafka for the Uninitiated

A Visual Introduction to What Kafka Is, What Problem It Solves and How We Got Here, for Non-Tech Folks


This article is split into four parts: intro, producer details, consumer details, and TL;DR + suggestions.

Let’s start with the basics of communication: We have the source of the communication (a producer), the communication itself (a message) and someone to receive and process the communication (a consumer).

Rule: Producer sends messages
Rule: Consumer reads messages

Now, there are many ways in which this communication can work. Let’s look at the two most common:

  • The consumer requests a message and the producer gives a response. This is called “request-response based” communication.
  • The producer simply keeps producing, and hopes that consumers are reading appropriately. This is called “publish-subscribe based” (pub-sub) communication. Kafka is a pub-sub system.

When to use “req-resp based”:

  • the message needs to be produced/consumed at a particular time
  • each message needs to be unique and sent to an individual consumer
  • the communication success needs to be acknowledged by both parties
  • the communication needs to be relatively small and can be slow

Real world example: when I call someone on a phone to get a message (request), I expect to get either a positive (phone picked up) or negative (phone call rejected) response. While it might take longer (waiting for them to pick up, introduce myself etc), I know that the conversation is unique, I want the response immediately, and all parties will know I got the message.

When to use “pub-sub based”:

  • the message can be produced ahead of time and read whenever the consumer gets to it
  • the communication needs to be read by many possible consumers
  • the producer doesn’t need to know that the consumer has consumed the message
  • the communication needs to be large (or in large numbers) and fast

Real world example: when I email someone and ask them to send me something whenever they can. I might not get the response immediately, but I can process it quickly when it arrives, the same message can go to many people at the same time, and not all parties need to know when I read it.

For the rest of this post, I’m going to focus on publish-subscribe (“pub-sub”) based communication.


About the producer

Now, with our pub-sub model above, what if we scale up and increase the number of messages to be communicated? Well, the producer would then create the messages and send each one, one after the other, to the consumer. The consumer, on its end, would receive and process each message.
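If you’re curious what this looks like in code, here is a rough sketch using the kafka-python library (the server address, the destination name “orders” and the messages themselves are placeholders I’ve made up for illustration):

from kafka import KafkaProducer

# Connect to a Kafka server (assumed here to be running locally on port 9092)
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Send a handful of messages, one after the other
for text in [b"message 1", b"message 2", b"message 3"]:
    producer.send("orders", value=text)  # "orders" names the destination; more on that below

producer.flush()  # wait until everything has actually been sent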

But what if the consumer stops reading for a while? And what if it loses track of where it last read? Well, to keep track of this ordering, we annotate each message with a marker (an offset).

Rule: A message’s order is denoted by its offset
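Here is the same idea from the consumer’s side, again a kafka-python sketch with placeholder names. Every message the consumer receives carries its offset:

from kafka import KafkaConsumer

# Read from the same destination the producer above wrote to
consumer = KafkaConsumer("orders",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")  # start from the oldest message still kept

for message in consumer:
    # every message knows its own position in the sequence: its offset
    print(message.offset, message.value)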

But what if we want to talk (publish messages) to multiple consumers at the same time? Well, simply put the messages into a list (a topic), and have each consumer read from the topic.

Rule: A topic is the ordered list of all the published messages

But what if we want to send a whole bunch of messages? That’s fine. But since we don’t have infinite memory, at some point messages will be deleted from the topic. How we decide what gets deleted is a subject unto itself, beyond the scope of this article.

Rule: A topic has a certain retention time/space. If that is exceeded, messages will start to be deleted
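Retention is something you set when a topic is created (or change later). A minimal sketch, assuming the kafka-python admin client and a made-up topic name:

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Create a topic that keeps messages for 7 days (retention.ms is in milliseconds)
admin.create_topics([
    NewTopic(name="orders",
             num_partitions=1,      # partitions are explained in the next section
             replication_factor=1,
             topic_configs={"retention.ms": str(7 * 24 * 60 * 60 * 1000)}),
])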

But what if we want to send a whole bunch of messages all at once, and are worried that messages will not be read soon enough, or might even be deleted before they’re read? Well, let’s make a bunch of mini lists (partitions) within our bigger list (the topic), so that they can be read in parallel.

But how do we choose which partition to put the message into? Well, for that, we pick a particular part of the message (the key) that:

  • is in every message
  • acts as an identifier
  • if we looked at all the messages ever sent in the topic, its values would be evenly distributed

For example, take these messages:

offset 60, message{id: abc; quantity:5;timestamp:1430}
offset 61, message{id: def; quantity:5;timestamp:1431}
offset 62, message{id: def; quantity:3;timestamp:1432}
offset 63, message{id: def; quantity:2;timestamp:1433}
offset 64, message{id: abc; quantity:2;timestamp:1434}
offset 65, message{id: def; quantity:3;timestamp:1435}
offset 66, message{id: abc; quantity:5;timestamp:1436}
offset 67, message{id: abc; quantity:5;timestamp:1437}

The “id” field works best as the key: the timestamp never repeats (it’s always increasing), the quantity looks consistent now but could be wildly different later, while the id appears in every message, identifies it, and its values are spread evenly across the topic.

Rule: Topics can be split into partitions. Messages are put into partitions based on keys.
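In code, the key is simply something you pass alongside each message. A small sketch (same kafka-python and placeholder-name assumptions as before), where the “id” field from the example above is used as the key, so every message with id “abc” lands in the same partition:

import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

orders = [
    {"id": "abc", "quantity": 5, "timestamp": 1430},
    {"id": "def", "quantity": 5, "timestamp": 1431},
    {"id": "def", "quantity": 3, "timestamp": 1432},
]

for order in orders:
    # messages with the same key always land in the same partition
    producer.send("orders",
                  key=order["id"].encode(),
                  value=json.dumps(order).encode())

producer.flush()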


About the consumer

Now, say we need to create a new consumer. How will it know where to find the topic and the current offsets and where to start reading from? And, where does the topic even sit? The solution to all this is the Kafka Broker.

Rule: Kafka brokers contain topics. Brokers are collected in “broker clusters”.

But what if we need multiple consumers on the same topic? Well, we could simply point those consumers at the same topic. But to help each consumer know which offsets it is responsible for, we put the consumers into a common collection (a consumer group) that divides the messages between the consumers in that group. Note, however, that each of these consumers is still an independent program; the group is a shared label, not a merged consumer.
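In code, a consumer group is just a shared name. A sketch (still kafka-python, placeholder names): every copy of this script you run with the same group_id is handed a share of the topic’s partitions, and the broker remembers the group’s offsets:

from kafka import KafkaConsumer

consumer = KafkaConsumer("orders",
                         bootstrap_servers="localhost:9092",  # the broker(s) holding the topic
                         group_id="order-readers",            # the consumer group this consumer joins
                         auto_offset_reset="earliest")

for message in consumer:
    # each consumer in the group only sees the partitions assigned to it
    print(message.partition, message.offset, message.value)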


TL;DR + Suggestions

  • Producer creates messages
  • Consumer reads messages
  • Kafka is publish-subscribe (“pub-sub”) based communication
  • A message’s order is denoted by its offset
  • A topic is the ordered list of all the published messages
  • A topic will have a certain retention time/space. If that is exceeded, messages will start to be deleted
  • Topics can be split into partitions. Messages are put into partitions based on keys.
  • Kafka brokers contain topics. Brokers are collected in broker clusters

Additional Reading:

Topics not covered here, but worth reading about separately:

  • Consumer group lead and followers
  • East west cluster maintenance
  • Kafka monitoring
  • Offset resetting

Good luck and happy messaging!
