A Brief Introduction to Apache Kafka

A little overview of the Apache Kafka platform

Dario De Santis
Javarevisited

--


In recent years, technologies for building real-time data pipelines and event streaming applications have emerged, bringing with them horizontal scalability and fault tolerance. One of these technologies is Apache Kafka.

Introduction

Apache Kafka is an open-source distributed streaming platform, initially developed at LinkedIn and later donated to the Apache Software Foundation. The project, written in Scala and Java, aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. By definition, a streaming platform has three key capabilities:

  • Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system
  • Store streams of records in a fault-tolerant durable way
  • Process streams of records as they occur
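The first capability, publishing records, can be sketched with Kafka's Java client API. This is a minimal example, not a production setup: the broker address `localhost:9092` and the topic name `events` are assumptions, and error handling is omitted for brevity.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SimpleProducer {

    // Builds the minimal configuration a producer needs: where to find the
    // cluster and how to serialize keys and values into bytes.
    static Properties producerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and the topic "events" are placeholders for this
        // sketch; substitute your own broker address and topic name.
        try (KafkaProducer<String, String> producer =
                 new KafkaProducer<>(producerConfig("localhost:9092"))) {
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            producer.flush(); // block until buffered records have been sent
        }
    }
}
```

The producer batches records in the background; `flush()` is called here only so the short-lived program does not exit before the record leaves the client.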

In general, Kafka is used to build real-time event streaming applications. To understand how Kafka works, let's look at some basic concepts:

  • Kafka is run as a cluster on one or more servers that can span multiple datacenters
  • the Kafka cluster stores streams of records in categories called topics
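Reading records from a topic works through the consumer side of the same Java client API. As above, this is a sketch under assumptions: the broker address, the topic name `events`, and the consumer group id `demo-group` are placeholders.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SimpleConsumer {

    // Minimal consumer configuration: cluster address, a consumer group id
    // (consumers in the same group share a topic's partitions), and
    // deserializers to turn bytes back into strings.
    static Properties consumerConfig(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        return props;
    }

    public static void main(String[] args) {
        // Broker address, topic and group id below are assumed values.
        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(consumerConfig("localhost:9092", "demo-group"))) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                // poll() fetches whatever records arrived since the last call
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Note the poll loop: a Kafka consumer pulls records rather than having them pushed, which lets each consumer control its own pace through the stream.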

Software Architect, writing about Java, Spring, Microservices, Kubernetes and Cloud-native programming. Editor for Javarevisited.