Apache Kafka for Beginners — Learn Kafka Hands-On

Rishabh Agarwal
5 min read · Jul 10, 2022


“An ounce of practice is better than tons of theory” — Vishnudevananda Saraswati

To understand a new technology, there is nothing more effective than trying it hands-on. Doing things yourself creates the link between theory and practice. Learning by doing is the preferred approach of many successful engineers, and we will take the same approach to learning Apache Kafka.

We have already covered the basics of Apache Kafka needed to begin using it. If you are unfamiliar with terms such as Producers, Consumers, Events, and Topics, I would suggest giving this article a look.

From here on, I will assume that all readers are comfortable with these basic terms. But there is one more component we need to understand before we can actually begin using Kafka.

ZooKeeper


ZooKeeper is an important part of the Kafka ecosystem. It maintains critical metadata such as cluster information and details of consumer clients, and it keeps track of brokers, topics, and partitions. It also notifies the Kafka server about events such as broker crashes and newly created topics. Note that in the ZooKeeper-based setup used in this tutorial, no Kafka server can run without a ZooKeeper server.

Thus we now have all the knowledge to start our own Kafka cluster on local machines. We begin with the downloading process.

Downloading Apache Kafka

Start by creating a new directory. Call it whatever you like; I am using the name kafka.

$ mkdir ~/kafka
$ cd ~/kafka

Once inside the directory, we will download the latest version of Apache Kafka. We use wget here, but you can also download the file manually from here. If you do, do not forget to move the file to ~/kafka.

$ wget https://dlcdn.apache.org/kafka/3.2.0/kafka_2.13-3.2.0.tgz
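Optionally, you can verify the integrity of the download before extracting it. Apache publishes a SHA-512 checksum next to each release; the URL below follows the standard Apache layout and is an assumption on my part (older releases tend to move to archive.apache.org), so adjust it if needed and compare the two outputs by eye.

$ wget https://downloads.apache.org/kafka/3.2.0/kafka_2.13-3.2.0.tgz.sha512
$ cat kafka_2.13-3.2.0.tgz.sha512
$ sha512sum kafka_2.13-3.2.0.tgz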

The next step is to extract the archive and start using its contents. The following command untars the downloaded file.

$ tar -xvzf kafka_2.13-3.2.0.tgz

After running this command, you will see a new directory. You can verify it with the ls command as shown below.

$ ls
kafka_2.13-3.2.0 kafka_2.13-3.2.0.tgz

With this, the download process is complete. Change into the newly created directory and get ready for the upcoming section.

$ cd kafka_2.13-3.2.0
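Before moving on, it is worth a quick glance at the bin directory: every step in this tutorial uses one of the shell scripts shipped there. Listing it shows, among many others (output trimmed here), the scripts we are about to use.

$ ls bin
...
kafka-console-consumer.sh
kafka-console-producer.sh
kafka-server-start.sh
kafka-topics.sh
zookeeper-server-start.sh
...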

Starting the ZooKeeper Server

The very first step in bringing up the Kafka cluster is to start ZooKeeper. For this, Apache Kafka provides us with a shell script, bin/zookeeper-server-start.sh. Let us try running it with the following command:

$ bin/zookeeper-server-start.sh
USAGE: bin/zookeeper-server-start.sh [-daemon] zookeeper.properties

The script does not run, but its usage message hints at the fix: we need to pass a ZooKeeper properties file as an argument.

Kafka ships with default configuration files, including one for ZooKeeper at config/zookeeper.properties. Looking inside, we find properties such as the port ZooKeeper listens on and the directory where it stores its metadata.
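The key defaults in that file look roughly like this (the values may differ slightly between versions):

# directory where ZooKeeper stores its snapshots and metadata
dataDir=/tmp/zookeeper
# the port clients (here, the Kafka broker) connect to
clientPort=2181
# disable the per-IP connection limit; fine for local use
maxClientCnxns=0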

For now, we can use this configuration without any change. Thus we issue the following command:

$ bin/zookeeper-server-start.sh config/zookeeper.properties

Once we issue this command, our ZooKeeper server should start without any issues. Leave this shell session open for the rest of this tutorial.

Starting the Kafka Server

The next step is to bring up the Kafka server. Again, Kafka provides us with a shell script for this task, found at bin/kafka-server-start.sh. Let us run it to bring up the server.

$ bin/kafka-server-start.sh
USAGE: bin/kafka-server-start.sh [-daemon] server.properties [--override property=value]*

We again need to pass a properties file to the script. Fortunately, Kafka has us covered here as well with a default file at config/server.properties, which is just fine for us to begin with.
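A few lines in this file are worth noting. The values below are the stock defaults as shipped with this release (they may vary slightly across versions):

# unique id of this broker within the cluster
broker.id=0
# where the broker stores its log segments (the actual message data)
log.dirs=/tmp/kafka-logs
# default partition count for newly created topics
num.partitions=1
# where to find the ZooKeeper server we started earlier
zookeeper.connect=localhost:2181

With that context, let us pass the properties file to the script.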

$ bin/kafka-server-start.sh config/server.properties

You will see the Kafka server start and begin listening on port 9092.
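With both processes running, you can optionally peek into ZooKeeper to confirm that the broker has registered itself. Kafka ships a small client script, bin/zookeeper-shell.sh, for exactly this; after some connection log output (omitted below) you should see the id of our single broker.

$ bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
[0]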

This completes the setup process. Let us now create a new topic, publish messages to it and consume messages from it.

Creating a New Topic

In this section, we will create our very first topic, again using a script provided by Kafka. Topic-related tasks are handled by bin/kafka-topics.sh, which takes flags and parameters as input. Issue the following command to create a topic called my-first-topic.

$ bin/kafka-topics.sh --create --topic my-first-topic --bootstrap-server localhost:9092

Note that we pass the --create flag to indicate that we want a new topic, along with the endpoint where the Kafka server is running. This creates the topic.
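To double-check that the topic exists, the same script offers a --describe flag. The output below is a trimmed sketch; the exact columns (such as TopicId) vary between Kafka versions.

$ bin/kafka-topics.sh --describe --topic my-first-topic --bootstrap-server localhost:9092
Topic: my-first-topic  PartitionCount: 1  ReplicationFactor: 1  ...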

Producing Events/Messages to the New Topic

Now it is time to publish some messages to the newly created topic. We will use another script for this, found at bin/kafka-console-producer.sh, passing it some flags and parameters to start publishing.

$ bin/kafka-console-producer.sh --topic my-first-topic --bootstrap-server localhost:9092

After running this command, you will see a prompt (>) in your terminal. Type a message and hit enter to publish it. You can publish as many messages as you want.

> First Event in my topic
> Second Event in my topic

You can either leave the producer running or close it by hitting Ctrl + C.
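As a small optional extension, the console producer can also send keyed messages. The parse.key and key.separator properties used below are standard console-producer options; messages that share a key always land in the same partition.

$ bin/kafka-console-producer.sh --topic my-first-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
> user1:First keyed event
> user2:Second keyed event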

Consuming Events/Messages from the New Topic

Once messages have been published to the topic, we can create consumers to read them. Kafka provides a consumer script at bin/kafka-console-consumer.sh. Let us run it with the appropriate flags and parameters.

$ bin/kafka-console-consumer.sh --topic my-first-topic --from-beginning --bootstrap-server localhost:9092
First Event in my topic
Second Event in my topic

As soon as you run this command, the consumer reads the events from the topic and prints them on your screen. If you publish more messages to the topic, they will also appear in the consumer's output.

With that, you have completed an end-to-end Kafka example. You can now spin up ZooKeeper and a Kafka server, create new topics, and produce to and consume from them. Many more combinations are possible, with several producers publishing to and several consumers consuming from the same topic. Readers are encouraged to go ahead and try some of these, starting with the consumer-group experiment sketched below.
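If several consumers are started with the same --group flag, Kafka splits the topic's partitions among them instead of delivering every message to every consumer (with our single-partition topic, only one group member receives messages at a time). The group name my-first-group below is just an arbitrary example.

$ bin/kafka-console-consumer.sh --topic my-first-topic --group my-first-group --bootstrap-server localhost:9092

You can then list all known groups with the companion script:

$ bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
my-first-group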

This concludes the article; I really hope you liked it. Continue your learning path by reading this article, and if you do not want to miss upcoming articles, do follow me. You can also find me on Twitter.

