A Brief Introduction to Apache Kafka

A little overview of the Apache Kafka platform

Dario De Santis
Javarevisited

--


In recent years, technologies for building real-time data pipelines and event streaming applications have emerged, bringing with them horizontal scalability and fault tolerance. One of these technologies is Apache Kafka.

Introduction

Apache Kafka is an open-source distributed streaming platform, initially developed at LinkedIn and later donated to the Apache Software Foundation. The project, written in Scala and Java, aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. By definition, a streaming platform has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
  • Store streams of records in a fault-tolerant durable way
  • Process streams of records as they occur
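The first capability, publishing records, can be sketched with Kafka's Java client API. This is a minimal example, not a production setup: the broker address `localhost:9092` and the topic name `events` are assumptions, and error handling is omitted for brevity.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {

    // Builds the minimal configuration a producer needs: where to find the
    // cluster and how to serialize keys and values into bytes.
    static Properties producerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and the topic "events" are placeholders for this
        // sketch; substitute your own broker address and topic name.
        try (KafkaProducer<String, String> producer =
                 new KafkaProducer<>(producerConfig("localhost:9092"))) {
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            producer.flush(); // block until buffered records have been sent
        }
    }
}
```

The producer batches records in the background; `flush()` is called here only so the short-lived program does not exit before the record leaves the client.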

In general, Kafka is used to build real-time event streaming applications. To understand how Kafka works, let's look at some basic concepts:

  • Kafka is run as a cluster on one or more servers that can span multiple datacenters
  • the Kafka cluster stores streams of records in categories called topics
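Reading records from a topic works through the consumer side of the same Java client API. As above, this is a sketch under assumptions: the broker address, the topic name `events`, and the consumer group id `demo-group` are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {

    // Minimal consumer configuration: cluster address, a consumer group id
    // (consumers in the same group share a topic's partitions), and
    // deserializers to turn bytes back into strings.
    static Properties consumerConfig(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }

    public static void main(String[] args) {
        // Broker address, topic and group id below are assumed values.
        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(consumerConfig("localhost:9092", "demo-group"))) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // poll() fetches whatever records arrived since the last call
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Note the poll loop: a Kafka consumer pulls records rather than having them pushed, which lets each consumer control its own pace through the stream.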

Software Architect, writing about Java, Spring, Microservices, Kubernetes and Cloud-native programming. Editor for Javarevisited.