Kafka Unleashed: A Comprehensive Guide to Setting Up Kafka

Manan Patadiya
Simform Engineering
8 min read · Oct 12, 2023

Discover Apache Kafka: Streamline data, master installation, and configure with ease.

In today’s data-driven world, real-time data processing has become a critical component for many applications. In response to this demand, Apache Kafka has emerged as the go-to solution for managing and processing data streams at scale. It is a robust and open-source stream processing platform.

In this blog, we will discuss the fundamental concepts of Apache Kafka, its pros and cons, when and why to use it, and the steps to configure it in the system. By the end of this journey, you’ll be well-equipped to use Apache Kafka to its full potential, from understanding the theory to configuring it within your system.

So, let’s start with basic information about Kafka and its benefits.

What is Kafka?

Apache Kafka is a powerful and versatile open-source stream-processing platform and messaging system. It is designed to handle real-time data streams efficiently, making it an essential tool in the modern data-driven landscape.

Kafka serves as a messaging platform that operates on a publish/subscribe model, with built-in support for replication, partitioning, fault tolerance, and high throughput. It's particularly beneficial for applications that process data at large scale. Kafka's primary application is building real-time streaming data pipelines. By combining fault-tolerant storage with stream processing, Kafka supports storing and analyzing both historical records and live data streams.

Advantages of using Kafka

Let’s check out the key benefits of using Kafka in your projects.

High throughput
High throughput refers to the ability to process many messages within a given time frame. Apache Kafka handles large volumes of incoming messages efficiently: a single broker can comfortably process many thousands of messages per second, and the default maximum size of a single produce request is about one megabyte (tunable via the producer's max.request.size setting).

Scalability
Apache Kafka is highly scalable. It supports efficient sequential writes and separates topics into partitions to facilitate highly scalable reads and writes. This helps Kafka enable multiple producers and consumers to read and write simultaneously. Also, because Kafka works in a distributed manner, you can expand its capacity by introducing new nodes to the cluster.
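To make the partitioning idea concrete, here is a simplified sketch of how keyed records map to partitions. Kafka's real default partitioner hashes record keys with murmur2; the CRC used below is just an illustrative stand-in.

```python
# Illustrative sketch of how keyed records map to partitions.
# Kafka's real default partitioner uses murmur2 hashing; this sketch
# uses a simple CRC for demonstration only.
import zlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    """Route a record key to a partition deterministically."""
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 3
keys = [b"truck-1", b"truck-2", b"truck-1", b"truck-3"]
assignments = [assign_partition(k, NUM_PARTITIONS) for k in keys]

# The same key always lands on the same partition, so per-key ordering
# is preserved while different keys spread across partitions.
assert assignments[0] == assignments[2]
print(assignments)
```

Because each partition can live on a different broker, adding brokers and partitions is what lets Kafka scale reads and writes horizontally.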

High reliability
Apache Kafka is also very popular for its strong reliability and is designed to quickly recover from any problems. Kafka can copy and share data with multiple recipients. What makes it even more robust is that messages remain accessible even after they’ve been used. This means Kafka’s sender and receiver can operate at different times, making the system more reliable and resilient against issues.

Low latency
Low latency in Apache Kafka means messages are processed quickly, reducing the time it takes for data to move. This is crucial for high-speed data processing.

Real-Time Streaming
Kafka excels at handling real-time data streams, making it perfect for applications that require instant updates, like tracking the location of delivery trucks or monitoring IoT devices.

Durability
Kafka stores data securely on disk, like a reliable vault. This means your information is safe, even if something unexpected happens, like a power outage or a system crash.
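The durability guarantee comes from the fact that every record is appended to a log on disk. Here is a toy illustration of that idea (not Kafka's actual storage format): records written to the log survive a "restart", simulated by closing and reopening the file.

```python
# Toy append-only log: records written to disk survive a "restart"
# (closing and reopening the file), loosely mimicking how Kafka
# persists messages. This is an illustration, not Kafka's real format.
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "segment-0.log")

# "Produce" two records by appending them to the log file.
with open(log_path, "a", encoding="utf-8") as log:
    log.write("order-created\n")
    log.write("order-shipped\n")

# Simulate a restart: reopen the file and replay everything from disk.
with open(log_path, "r", encoding="utf-8") as log:
    replayed = [line.strip() for line in log]

assert replayed == ["order-created", "order-shipped"]
print(replayed)
```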

Why choose Kafka over other messaging systems?

Kafka sets itself apart from conventional messaging queues through many distinct characteristics. Notably, Kafka preserves messages even after consumption, whereas its counterpart, RabbitMQ, promptly removes messages once they are consumed.

Unlike RabbitMQ's approach of pushing messages to consumers, Kafka's consumers pull messages from the brokers at their own pace.

Kafka’s horizontal scalability contrasts with the vertical scalability of traditional messaging queues.

When not to use Kafka:

While Kafka is a powerful messaging system, there are scenarios where it may not be the best choice. Here are situations when you might want to consider alternatives:

  1. Simple applications: If you’re building a straightforward application with minimal data exchange needs and low data volumes, Kafka’s complexity may outweigh its benefits. Simpler messaging systems or direct HTTP-based communication might be more suitable.
  2. Limited resources: If you have limited resources, both in terms of hardware and personnel, setting up and managing a Kafka cluster can be resource-intensive. In such cases, a lighter messaging system might be a more efficient choice.
  3. Small-scale projects: For small-scale projects or prototypes where the overhead of configuring and maintaining Kafka outweighs the advantages, opting for simpler messaging solutions is a practical decision.

What are the components of Kafka architecture?

Producer: The producer in Kafka is responsible for sending data or messages to Kafka topics. It plays a crucial role in initiating data flow within the Kafka ecosystem, allowing various data sources to publish information that can be processed by consumers.

Broker: Kafka brokers are the core servers that manage the storage, retrieval, and distribution of messages. They handle tasks like message persistence, partitioning of data, and replication across nodes to ensure fault tolerance and reliability.

Topic: Topics act as message categories in Kafka, where producers publish records. Each topic represents a stream of related data, providing an organizational structure that simplifies data management and consumption.

Partition: Kafka topics are divided into partitions, allowing for parallel data processing. Partitions enable horizontal scalability by distributing the data load across multiple brokers, enhancing both performance and fault tolerance.

Consumer: Consumers retrieve and process messages from Kafka topics. They enable downstream applications to react to real-time data, making them fundamental for building data-driven applications, analytics, and more.

Offset: Offsets are unique identifiers for messages within a partition. Consumers use offsets to keep track of their progress in reading messages, allowing them to resume from where they left off even after restarts.
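The offset mechanic can be pictured as a consumer holding a cursor into the partition's log. A simplified sketch (the helper `poll` below is a stand-in for a real consumer's poll/commit cycle, not Kafka's API):

```python
# Simplified model of consumer offsets: a partition is a list of
# messages, and a consumer remembers the offset of the next message
# to read, so it can resume after a restart.
partition = ["msg-0", "msg-1", "msg-2", "msg-3"]

committed_offset = 0  # stored "durably", like Kafka's __consumer_offsets

def poll(partition, offset, max_records=2):
    """Return up to max_records messages starting at offset."""
    batch = partition[offset:offset + max_records]
    return batch, offset + len(batch)

# First session: read two messages, then "commit" the new offset.
batch, committed_offset = poll(partition, committed_offset)
print(batch)  # ['msg-0', 'msg-1']

# After a restart, the consumer resumes from the committed offset.
batch, committed_offset = poll(partition, committed_offset)
print(batch)  # ['msg-2', 'msg-3']
```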

Zookeeper: Although not a part of Kafka itself, Zookeeper is often used to manage and coordinate Kafka brokers. It maintains metadata about brokers, topics, and partitions, ensuring proper synchronization and configuration management.

How to configure Apache Kafka in our system

We will need the items listed below installed on our system to configure Kafka.

  • Apache Kafka
  • Java Runtime Environment (JRE)

You can download these tools from the following links:

1. Apache Kafka — Download Kafka

2. JRE — Download Java for Windows

First, download the Apache Kafka archive from the link above; it will land in the Downloads folder on your PC. Extract the zip file, rename the extracted folder to "kafka", and, for easy access, move it to the C: drive.

Now follow the steps outlined below:

  1. Go to the config directory on your computer. In my system, it is C:\kafka\config.
  2. Open the server.properties file in a text editor like Notepad.
  3. Find the line log.dirs=/tmp/kafka-logs and replace it with log.dirs=C:/kafka/kafka-logs.

4. Save the changes and close the server.properties file.

5. Now, let’s follow the same steps for zookeeper.properties.

6. Open the file zookeeper.properties.

7. Find and replace the line dataDir=/tmp/zookeeper with dataDir=C:/kafka/zookeeper-data.


8. Save and close the file.
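If you prefer not to edit the files by hand, the two changes above can be scripted. The sketch below works on in-memory copies of the two files' contents; the paths assume the C:/kafka layout used in this guide.

```python
# Sketch of scripting the two config edits above. The target paths
# assume the C:/kafka layout from this guide; adjust if yours differs.
def set_property(text: str, key: str, value: str) -> str:
    """Replace 'key=...' lines in a .properties file's text."""
    lines = []
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            line = f"{key}={value}"
        lines.append(line)
    return "\n".join(lines)

# Minimal excerpts of the two files, for illustration.
server_props = "broker.id=0\nlog.dirs=/tmp/kafka-logs\n"
zk_props = "dataDir=/tmp/zookeeper\nclientPort=2181\n"

server_props = set_property(server_props, "log.dirs", "C:/kafka/kafka-logs")
zk_props = set_property(zk_props, "dataDir", "C:/kafka/zookeeper-data")

print(server_props)
print(zk_props)
```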

Now switch to the "kafka" folder, open a Command Prompt there, and run the following command to start Zookeeper:

.\bin\windows\zookeeper-server-start.bat .\config\zookeeper.properties

Once this command executes successfully, Zookeeper starts and, by default, listens for client connections on port 2181.

Now open another command prompt in the same location and run the following command to start Kafka:

.\bin\windows\kafka-server-start.bat .\config\server.properties

Once it executes successfully, the Kafka broker starts and its startup logs confirm that it has connected to Zookeeper.

Now, let’s create Topic(s).

To create a topic, please use the following steps:

Go to C:\kafka\bin\windows, open a new command prompt, and run the below command:

kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic Test

--create: Indicates that you want to create a new topic.

--bootstrap-server localhost:9092: Specifies the Kafka broker(s) to connect to for metadata.

--replication-factor 1: Sets the replication factor for the topic. In this example, it's set to 1, meaning there is only one copy of each message in the topic.

--partitions 1: Specifies the number of partitions for the topic. In this case, there is only one partition.

--topic Test: Specifies that we want to create a topic named 'Test'.
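To see why --partitions and --replication-factor matter together, here is a sketch of how partition replicas could be spread across brokers. It uses simple round-robin placement, a simplification of Kafka's actual assignment logic (which also considers rack awareness and balance).

```python
# Simplified round-robin replica assignment: each partition gets
# `replication_factor` replicas on consecutive brokers. Kafka's real
# assignment is more sophisticated (rack awareness, balancing, etc.).
def assign_replicas(num_partitions, replication_factor, brokers):
    assert replication_factor <= len(brokers), "RF cannot exceed broker count"
    layout = {}
    for p in range(num_partitions):
        layout[p] = [brokers[(p + r) % len(brokers)]
                     for r in range(replication_factor)]
    return layout

# The single-broker setup from this guide: 1 partition, RF 1.
print(assign_replicas(1, 1, ["broker-0"]))  # {0: ['broker-0']}

# A 3-broker cluster with 3 partitions and RF 2, for contrast.
print(assign_replicas(3, 2, ["b0", "b1", "b2"]))
```

Note that the replication factor can never exceed the number of brokers, which is why this single-broker walkthrough uses --replication-factor 1.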


Execute the following command and our producer will be ready to send messages:

kafka-console-producer.bat --broker-list localhost:9092 --topic Test

(On Kafka 2.5 and newer, you can use --bootstrap-server localhost:9092 in place of the older --broker-list flag.)

Now, to receive those messages, we have to start a console consumer, as mentioned below:

Open a new command prompt in the same directory and run the below command:

kafka-console-consumer.bat --topic Test --bootstrap-server localhost:9092 --from-beginning

kafka-console-consumer.bat: The script used to launch the Kafka console consumer on Windows.

--topic Test: Specifies the name of the Kafka topic you want to consume messages from.

--bootstrap-server localhost:9092: Specifies the Kafka broker(s) to connect to. In this example, it's set to "localhost:9092", the default address for a Kafka broker running on the local machine.

--from-beginning: Tells the consumer to start consuming messages from the beginning of the topic. If you omit this option, the consumer will only see messages that are produced after it starts running.

Now you can type a message in the producer's command prompt, and it will appear in the consumer's command prompt.
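Conceptually, the producer/consumer round trip above looks like this. The in-memory `Topic` class below is a stand-in for a real Kafka topic, written just to illustrate the semantics of --from-beginning.

```python
# In-memory stand-in for a Kafka topic, illustrating why
# --from-beginning matters: the topic retains messages, and a
# consumer chooses where in the retained log to start reading.
class Topic:
    def __init__(self):
        self.log = []  # retained messages (never deleted in this sketch)

    def produce(self, message):
        self.log.append(message)

    def consume(self, from_beginning=False):
        # A real consumer tracks offsets; this sketch just picks a
        # starting point in the retained log.
        start = 0 if from_beginning else len(self.log)
        return self.log[start:]

test_topic = Topic()
test_topic.produce("hello")
test_topic.produce("kafka")

# A late-joining consumer without --from-beginning sees nothing old...
print(test_topic.consume())                     # []
# ...while --from-beginning replays the whole retained log.
print(test_topic.consume(from_beginning=True))  # ['hello', 'kafka']
```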


Conclusion

This blog covered the essentials of Apache Kafka. Starting with Kafka's fundamental concepts, we explored its core components, discussed the advantages of Kafka and the scenarios where it is not the right fit, and walked step by step through downloading Kafka and configuring it on your system. If you're a beginner, this should equip you to start harnessing the power of Apache Kafka for your data streaming needs.

For more updates on the latest tools and technologies, follow the Simform Engineering blog.

Follow Us: Twitter | LinkedIn
