What is Apache Kafka?

In less than 5 minutes or your time back ;)

Humberto Rendon
Byte-Sized Data
3 min read · Feb 3, 2023

Apache Kafka is an open-source distributed event streaming platform built around a producer/consumer architecture. It delivers messages with latencies as low as 2 ms and scales to trillions of messages per day. Its data storage is durable and fault-tolerant, and it can stretch clusters across availability zones or connect separate clusters across regions to provide high availability.

What is it for?

Kafka is like the body’s nervous system: it carries events between systems as they happen. It’s designed for real-time processing with high throughput and low latency, which makes it well suited to big data applications. It’s used to build real-time data pipelines and streaming applications.

Who is it for?

Kafka is typically used by Data Engineers, Big Data Engineers, DevOps Engineers, Solutions Architects, Software Engineers, Systems Engineers, IT Operations, Site Reliability Engineers, Cloud Engineers, Data Scientists, Data Analysts and Data Administrators.

How does it work?

Kafka organizes data into topics: producers write (publish) events to a topic, and consumers read (subscribe to) them. Because it uses a publish-subscribe architecture, a producer can publish to multiple topics, a consumer can subscribe to multiple topics, and several consumers can read the same topic independently.
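
To make the publish-subscribe idea concrete, here is a toy model in plain shell. It is not Kafka and does not use any Kafka API: a "topic" is just an append-only text file, a producer appends one line per event, and each consumer remembers its own offset (how many lines it has already read). The functions `produce` and `consume`, the `./toy-topics` directory and the consumer names are hypothetical, chosen only for this sketch.

```shell
#!/bin/sh
# Toy model of a publish-subscribe log. This is NOT Kafka: a "topic" is
# just an append-only text file, and each consumer keeps its own offset.
set -eu

DATA_DIR="./toy-topics"   # hypothetical storage directory for this sketch
mkdir -p "$DATA_DIR"
rm -f "$DATA_DIR"/*       # start every run with empty topics

# produce TOPIC MESSAGE: append one event to the end of the topic's log.
produce() {
  printf '%s\n' "$2" >> "$DATA_DIR/$1.log"
}

# consume CONSUMER TOPIC: print the events this consumer has not read yet,
# then save its new offset (the number of lines read so far). A consumer
# with no saved offset starts at offset 0, i.e. from the first event.
consume() {
  offset_file="$DATA_DIR/$1.$2.offset"
  log_file="$DATA_DIR/$2.log"
  offset=0
  if [ -f "$offset_file" ]; then
    offset=$(cat "$offset_file")
  fi
  if [ ! -f "$log_file" ]; then
    return 0
  fi
  tail -n "+$((offset + 1))" "$log_file"
  wc -l < "$log_file" | tr -d ' ' > "$offset_file"
}

# Demo: one producer, two independent consumers on the same topic.
produce quickstart-events "Random chimp event"
produce quickstart-events "Random chimp event 2 Return of Harambe"
consume alice quickstart-events   # alice reads both events
consume bob quickstart-events     # bob independently reads the same two events
produce quickstart-events "Random chimp event 3"
consume alice quickstart-events   # alice only gets the new event
```

Running it prints the first two events twice (once for each consumer) and then the third event once, because reading does not remove anything from the log; each consumer just advances its own offset. Real Kafka additionally splits topics into partitions, replicates them across brokers, and tracks committed offsets per consumer group, none of which this sketch attempts.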

What does it look like?

1. Download and extract

$ tar -xzf kafka-3.3.2-src.tgz
$ cd kafka-3.3.2-src

2. Start the environment (requires Java 8+)

$ ### TERMINAL 1 - ZooKeeper ###
$ # The -src archive contains source code only. Without this build step the
$ # scripts fail with "Classpath is empty. Please build the project first".
$ # (The prebuilt binary release, kafka_2.13-3.3.2.tgz, skips this step.)
$ ./gradlew jar -PscalaVersion=2.13.8

$ # Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties

$ ### TERMINAL 2 - Kafka Broker ###
$ # Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

3. Create a topic

$ ### TERMINAL 3 - Client ###
$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092

4. Write some events

$ ### TERMINAL 3 - Client ###
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
> Random chimp event
> Random chimp event 2 Return of Harambe
> ^C  # press Ctrl+C to quit the producer

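5. Read some events

The --from-beginning flag makes the consumer start from the first event in the topic; without it, the console consumer only prints events written after it starts.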

$ ### TERMINAL 3 - Client ###
$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
Random chimp event
Random chimp event 2 Return of Harambe
^C  # press Ctrl+C to quit the consumer

Who uses it?

Kafka is used by some of the largest companies in the world; according to the Apache Kafka website, more than 80% of Fortune 100 companies use it. It is the most popular open-source stream-processing software for collecting, processing, storing, and analyzing data at scale.

Kafka allows organizations to modernize their data strategies with an event streaming architecture. It’s used across industries, from computer software and financial services to healthcare, government, and transportation.

Company Use Cases

  • Pinterest: Powers the real-time predictive budgeting system of its advertising infrastructure, making spend predictions more accurate.
  • Adidas: Integrates source systems and enables teams to implement real-time event processing (monitoring, analytics, and reporting solutions).
  • Trivago: Stream processing of application logs.
  • Uber: A core part of its overall infrastructure stack, powering various online and near-real-time use cases.
  • Spotify: Part of its log delivery system.
  • Netflix: Real-time monitoring and event-processing pipeline.
