Starting with Apache Kafka

Daniel Christofolli
4 min read · Aug 2, 2021


Event Streaming

If something happens, we can say that an event has occurred, and when an event occurs, it generates data. Event streaming consists of capturing event data as soon as the event occurs, then storing and processing those events.

It can be applied to many use cases, such as processing real-time financial transactions or tracking a fleet of cars.

Kafka

According to the Apache Kafka documentation, Kafka is a distributed event streaming platform. But what does that mean? In essence, Kafka is composed of a set of servers and clients that can run on a single machine or be spread across multiple servers, working virtually as a single system and communicating over the TCP protocol. Kafka uses this structure to publish/subscribe to, process, and store streams of events in a distributed, scalable, fault-tolerant, and secure way.

Kafka components

Servers

They can be brokers or Kafka Connect workers. Brokers are the storage layer, while Kafka Connect is a tool for streaming data between Apache Kafka and other systems, such as APIs, databases, or other Kafka clusters.

Clients

They allow producers (publishers) and consumers (subscribers) to be created in applications and microservices. There are clients for many programming languages.
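For example, a minimal producer written with the official Java client could look like the sketch below (the topic name, broker address, and the key/value used here are just illustrative assumptions):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Address of the Kafka broker (assumed to be the local test broker)
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one event: the key identifies the entity, the value is the event body
            producer.send(new ProducerRecord<>("my_test_topic", "car-42", "lat=1.23,lon=4.56"));
        } // close() flushes any pending records before returning
    }
}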

Zookeeper

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the brokers and the cluster structure.

Kafka concepts

Kafka has a few basic but important concepts that we need to understand before getting our hands dirty:

  • Event -> a record of something that happened. In Kafka, events are objects composed of:
    - Event key -> optional. By default, the producer spreads events across partitions; a custom key can be set to control which partition an event goes to.
    - Event timestamp -> assigned by the producer when the event is recorded; it can also be set explicitly, for example when replaying historical data.
    - Event value -> the event body. It can be a number, a string, a JSON document, or anything else.
  • Producers -> the client applications responsible for publishing events to a Kafka topic.
  • Consumers -> the client applications that subscribe to one or more topics and read the published events.

Important: Producers and consumers are totally decoupled from each other. A producer can publish events even when no consumer is active; when an inactive consumer becomes active again, it can read the records that were published while it was down. Likewise, consumers keep running even when no producer is working.

  • Topic -> a topic is like a folder in a file system, and it is responsible for storing events. Unlike traditional messaging systems, Kafka does not delete events when they are consumed; they are kept for a predefined retention time.
    - Partitions -> a topic can be spread over several “buckets” located on different Kafka brokers, which gives high scalability because it allows client applications to write to and read from many brokers at the same time. Events with the same event key are written to the same partition, and any consumer reads a partition's events in exactly the same order as they were written (see the consumer sketch after this list).
(Figure: P1-P4 represent the partitions of a topic; the colors represent event keys, and events with the same key stay on the same partition.)
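To make the consumer side concrete, here is a minimal sketch using the official Java client (the topic name, group id, and broker address are assumptions). It prints each record together with the partition it came from, so you can see that records with the same key always arrive from the same partition, in order:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-test-group");     // consumers in the same group share the partitions
        props.put("auto.offset.reset", "earliest"); // start from the oldest retained records
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my_test_topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Records with the same key always come from the same partition, in write order
                    System.out.printf("partition=%d key=%s value=%s%n",
                            record.partition(), record.key(), record.value());
                }
            }
        }
    }
}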

These are the basic concepts we need to start playing with Kafka. Let's go!

Hands-on

Apache Kafka installation

There are two ways to run Kafka: a conventional installation, or Docker containers managed with docker-compose. In this tutorial, I'll use the second option.

Step by step

First, you must have docker and docker-compose installed on your machine. Download this docker-compose.yml file and open a terminal in the same directory where the docker-compose file is located. Then run docker-compose up -d and wait for Docker to download the ZooKeeper and Kafka images and create the containers for you. After that, run docker-compose ps to make sure both containers are up and running.
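In case the link is unavailable, the compose file essentially defines one ZooKeeper container and one Kafka broker. A minimal docker-compose.yml along these lines should work (this is a sketch assuming the Confluent images; the original file may differ):

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1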

Now, run the command below to see the containers that are running:

docker ps --format "{{.ID}}: {{.Names}}"

Now we can enter the Kafka container and run commands. Note that the output of the command above has two lines, one for each container; one of them shows the ID and name of the Kafka container. Run docker exec -it {container-id} /bin/bash with that ID to enter the Kafka container. Now we can create a Kafka topic using the following command:

kafka-topics --create --topic my_test_topic --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092
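If you want to confirm how the topic was set up, you can also describe it; the output lists each partition with its leader and replicas:

kafka-topics --describe --topic my_test_topic --bootstrap-server localhost:9092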

Note that we just created a topic named my_test_topic, with 3 partitions and replication factor 1, that is, without replication. Now, create a Kafka console producer with the command:

kafka-console-producer --topic my_test_topic --bootstrap-server localhost:9092

Open a new terminal window and run the docker exec command again:

docker exec -it {container-id} /bin/bash

And run the command below to create a Kafka console consumer:

kafka-console-consumer --topic my_test_topic --bootstrap-server localhost:9092

Now we can go back to the terminal window where the Kafka producer is running, write some messages, and read them in the terminal window where the Kafka consumer is running.
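Note that, by default, the console consumer only shows events published after it starts. Because Kafka keeps the events in the topic, you can also replay everything from the beginning by starting the consumer with the --from-beginning flag:

kafka-console-consumer --topic my_test_topic --from-beginning --bootstrap-server localhost:9092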

Conclusion

In this post, I explained the basic concepts of Apache Kafka and showed how to create a simple Kafka topic, a Kafka producer, and a Kafka consumer; then we sent and read messages through the producer and consumer, like in a conventional messaging system. There are many other ways to use Kafka. In the next post, I'll show how to develop a Spring Boot application with Spring Kafka. See ya!
