A simple understanding of Kafka

Kafka is, at its core, a distributed event streaming system from the Apache Software Foundation, released under the Apache License. Everything that happens in the world can be treated as an event. Kafka works on a pub-sub (publish and subscribe) mechanism and stores its data in topics. A topic is an immutable, append-only stream of records. Applications that publish their content to topics are called producer applications, or simply producers. Applications that subscribe to a topic and consume its contents are called consumer applications, or simply consumers.

Consider a scenario where one application (say a database server) requires data from another application, and the data is supplied directly. But if hundreds of applications require data from the same producer application, it is no longer practical for that application to serve each of the hundred consumer applications point-to-point. This is where Kafka comes into play.

Unlike a conventional queue, where data is gone once it has been read, Kafka keeps all the data in its topics for a configurable retention period, and any number of consumers can consume data from the topics within that period.
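As a rough illustration, this retention period is configured broker-wide in config/server.properties (168 hours, i.e. one week, is the shipped default) and can also be overridden per topic with the retention.ms topic configuration:

# in config/server.properties: how long messages are kept before deletion
log.retention.hours=168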

Kafka can be downloaded from the official Kafka website at https://kafka.apache.org/downloads . Download one of the “Binary downloads” and extract it. Kafka can also be downloaded as part of the Confluent Platform at https://www.confluent.io/download/. Confluent packages Kafka together with other services such as KSQL, Kafka Connect, and Schema Registry.
Let’s take a look at Kafka’s architecture. A single instance of a Kafka server is called a broker. Several Kafka brokers together form a Kafka cluster. The brokers in Kafka are coordinated by Apache Zookeeper. In recent versions, Zookeeper ships along with Kafka, so a separate installation is not required.
START ZOOKEEPER:
First, start the Zookeeper service before starting Kafka. Zookeeper runs on port 2181 by default. This article gives the commands to start the services for both the official Kafka download and the Confluent download.
Kafka official download:
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
Kafka confluent download:
./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
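For reference, the bundled Zookeeper configuration is tiny; the defaults shipped with Kafka look like this:

# in config/zookeeper.properties (etc/kafka/zookeeper.properties for Confluent)
# directory where Zookeeper stores its snapshot data
dataDir=/tmp/zookeeper
# port on which clients (the Kafka brokers) connect
clientPort=2181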
START KAFKA SERVER:
Once Zookeeper is started, the Kafka server has to be started. The commands are
Kafka official download:
./bin/kafka-server-start.sh ./config/server.properties
Kafka confluent download:
./bin/kafka-server-start ./etc/kafka/server.properties
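A few entries in the server configuration are worth knowing about; the shipped defaults are:

# in config/server.properties (etc/kafka/server.properties for Confluent)
# unique id of this broker within the cluster
broker.id=0
# directory where topic partition data is stored on disk
log.dirs=/tmp/kafka-logs
# Zookeeper connection string used by the broker
zookeeper.connect=localhost:2181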
CREATE A TOPIC:
The next step is the creation of a topic. Before that, it is useful to know the structure of a topic. Topics are divided into partitions, and each record within a partition is assigned a sequential id called an offset (just like the index of an array, where the partition is the array and the offset is the index). Consumers read from the different partitions using these offsets. A topic can have any number of partitions, and a partition can grow to any size, but a single partition must fit completely on one server. There is no ordering of messages across different partitions, but within a partition, messages are ordered by offset.
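To make partitions and offsets concrete, here is a minimal sketch using Kafka’s Java consumer API; the topic name demo-topic, partition 0, and starting offset 5 are placeholder assumptions for illustration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetSeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // address one specific partition of the (assumed) topic demo-topic
            TopicPartition partition = new TopicPartition("demo-topic", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, 5L); // jump to offset 5 within that partition
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}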

The next concept is replicas, which are Kafka’s failover mechanism. Each partition can be replicated, and the replicas are kept on different servers. Only one replica of a partition is elected as the leader at a time; all reads and writes happen on the leader, while the other replicas simply copy the messages of the leader. The command for topic creation is as follows.

Kafka official download:
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topicname>
Kafka confluent download:
./bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic <topicname>
The number of partitions and the replication factor are specified directly on the command line, as above.
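To check how the partitions and replicas of a topic were actually assigned, including which replica is the current leader, the bundled describe option can be used; substitute the topic name as before:

./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic <topicname>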
COMMAND LINE PRODUCER:
Kafka provides a core producer API which can be used to publish messages to a topic. The Kafka server runs on port 9092 by default. The commands used are
Kafka official download:
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic <topicname>
Kafka confluent download:
./bin/kafka-console-producer --broker-list localhost:9092 --topic <topicname>
Once the console producer is started, every line typed into it is published as a message to the topic.
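The same producer API used by the console tool can be called programmatically. Below is a minimal sketch with Kafka’s Java client; the topic name demo-topic and the message contents are placeholder assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // the broker started above
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous; records with the same key land in the same partition
            producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello kafka"));
        } // closing the producer flushes any buffered records
    }
}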
COMMAND LINE CONSUMER:
To consume the messages in a topic, Kafka provides a consumer API. The commands used for consuming are
Kafka official download:
./bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic <topicname> --from-beginning
Kafka confluent download:
./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic <topicname> --from-beginning
Consumers can also be grouped into consumer groups, in which the partitions of a topic are divided among the members so that each record is processed by only one consumer of the group (a sketch follows below). Please refer to https://kafka.apache.org/quickstart for further configuration details.
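For completeness, here is a minimal consumer sketch in Java matching the producer above; the group id demo-group and the topic name demo-topic are placeholder assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "demo-group"); // consumers in one group share the partitions
        props.put("auto.offset.reset", "earliest"); // same effect as --from-beginning for a new group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}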
FIND CONSUMER OFFSET:
The current offsets of a topic can be found using the following command (a --time value of -1 returns the latest offset of each partition, while -2 returns the earliest).
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic <topicname> --time -1
These are the basic operations in Kafka. Compared with most messaging systems such as ActiveMQ and RabbitMQ, Kafka has better throughput, built-in partitioning, replication, and fault tolerance, which makes it a good solution for large-scale message processing applications. A lot more can be done with Kafka, and it can be studied in depth through the official documentation at https://kafka.apache.org/.
