Apache Kafka: Distributed Event Streaming Platform

An introduction for beginners to Kafka

Manthan Gupta
TheLeanProgrammer
4 min read · Sep 20, 2021


What is Publish/Subscribe Messaging?

Before diving into Kafka, it’s crucial to get some basics right, and pub/sub messaging is one of them. In a pub/sub model, publishers send messages to a topic rather than to specific receivers; a broker then delivers each message to every subscriber of that topic. Publishers classify messages by topic, and subscribers receive only the messages for the topics they have subscribed to.

There are many use cases for pub/sub messaging, but one of the most popular is the chat application. When a user sends a message to a chat room, it is published to that chat room’s topic, and all subscribers, i.e. the members of the chat room, receive it.
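The chat-room flow above can be modeled with a minimal in-process broker. This is a toy sketch to illustrate the pattern, not Kafka itself; all names here are illustrative:

```python
from collections import defaultdict

class Broker:
    """Toy pub/sub broker: fans each published message out to all subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Every subscriber of the topic receives the message.
        for callback in self.subscribers[topic]:
            callback(message)

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("chat-room-42", inbox_a.append)
broker.subscribe("chat-room-42", inbox_b.append)
broker.publish("chat-room-42", "hello everyone")
print(inbox_a, inbox_b)  # both chat-room members receive the message
```

Note that the publisher never addresses a specific receiver; it only names the topic, and the broker handles delivery.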

Source: AWS

What is event streaming?

In layman's terms, event streaming is capturing data in real time, as streams of events, from event sources such as sensors, mobile devices, databases, cloud services, and software applications.

These events are stored durably so they can be processed, analyzed, and replayed later. Event streaming is an implementation of the pub/sub architecture with a twist: it ensures the right information is in the right place at the right time.

Event streaming has a plethora of use cases across industries and companies, including processing financial transactions in real time, tracking shipments as they move, and capturing and analyzing data from IoT devices.

Source: Confluent

But what is Apache Kafka?

According to the official website, Apache Kafka is an open-source distributed event streaming platform used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Kafka provides three main functions:

  1. Publish and subscribe to streams of events
  2. Durable and reliable storage of streams of events
  3. Processing of streams in real time

And all of this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner.

This comes in handy when creating a data pipeline that tracks user activity in real time; here Kafka is used to ingest and store the streaming data. Kafka is also often used as a broker that mediates and processes communication between two applications.

Source: Confluent

The Architecture

An event records the fact that something happened in the system being observed. Producers are client applications that publish events to Kafka, and consumers are those that subscribe to and read these events. The two are fully decoupled in Kafka, which helps it achieve high scalability.

Events are stored durably and reliably in topics. A topic can be thought of as a folder and events as the files inside it. Topics are multi-producer and multi-subscriber: a topic can have zero to many producers writing events to it, as well as zero to many consumers subscribing to it.
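The folder/file analogy can be sketched as an append-only log per topic: events are retained after delivery, and each consumer keeps its own read position, so many subscribers can read the same events independently. This is a simplified model for illustration, not Kafka's actual storage format:

```python
class Topic:
    """Simplified model of a Kafka topic: an append-only log of events."""

    def __init__(self, name):
        self.name = name
        self.log = []  # events are retained, not deleted after delivery

    def append(self, event):
        self.log.append(event)
        return len(self.log) - 1  # offset of the newly written event

    def read(self, offset):
        # Each consumer tracks its own offset and reads from there.
        return self.log[offset:]

orders = Topic("orders")  # hypothetical topic name
for event in ["order-created", "order-paid", "order-shipped"]:
    orders.append(event)

# Two independent consumers at different offsets see different slices:
print(orders.read(0))  # a brand-new consumer replays everything
print(orders.read(2))  # a caught-up consumer sees only the latest event
```

Because the log is durable, a new consumer can join later and still replay the full history of the topic.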

A topic is divided into partitions that are spread over several Kafka brokers. This is very important for scalability, as it allows clients to read from and write to multiple brokers at the same time.
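The spreading of a topic across brokers works through partitions: an event's key is hashed to pick a partition, so events with the same key always land in the same partition (preserving per-key ordering) while load is distributed across brokers. The sketch below illustrates the idea; Kafka's default partitioner uses a murmur2 hash, and `crc32` here is just a stand-in:

```python
import zlib

NUM_PARTITIONS = 3  # illustrative partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Hash the key and map it onto a partition index.
    # (Kafka's default partitioner uses murmur2; crc32 is a stand-in.)
    return zlib.crc32(key.encode()) % num_partitions

# Events with the same key always route to the same partition,
# so all of user-1's events stay in order relative to each other.
events = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
for key, value in events:
    print(f"{key} -> partition {partition_for(key)}")
```

Since each partition can live on a different broker, producers and consumers for the same topic can work against several machines in parallel.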

To make the data highly available and fault-tolerant, topics can be replicated across brokers, data centers, and geographic regions.

Source: Apache Kafka

This blog is just a primer to get you started with Kafka. If you want to dive deeper, I recommend the official Apache Kafka documentation.

If you liked what you read, don’t forget to click that clap button and support me. You can reach out to me on LinkedIn and Twitter to share your thoughts about the blog or just to connect.

Don’t forget to follow The Lean Programmer Publication for more such articles, and subscribe to our newsletter tinyletter.com/TheLeanProgrammer
