Getting Started with Apache Kafka

I have started learning Kafka and thought I would document my understanding of Apache Kafka as I make progress. The agenda for this post is:

  • What is Apache Kafka?
  • Core Concepts
  • Pre-requisite(s)
  • Installing Apache Zookeeper (commonly called Zookeeper)
  • Installing Apache Kafka (commonly called Kafka)
  • Send message to topic via command-line
  • Receive message by consuming the topic from command-line

For the purposes of installing software, I will be making use of Docker.

What is Kafka?

Kafka is a streaming platform that enables applications to publish and subscribe to streams of records (events).

Kafka also acts as a storage system: it stores the records and makes them available for processing by applications.

Kafka is fault-tolerant, as the data is partitioned across machines and replicated for redundancy.

Core Concepts

In order to work with Kafka, we must understand some of the terms used in this project.

Topics

Topics are channels where records about a certain interest are published. For example, ESPN is a content provider which publishes information about sports. At the same time, there are many people interested in this information, and they subscribe to ESPN newsletters.

To relate this to the analogy: newsletters are topics where information about sports is published, and there are people (applications) consuming this information.

In Kafka there can be multiple topics. For example, multiple applications could be sending information to different topics, say Sports and Weather. Each topic can be subscribed to by multiple subscribers.

Two topics in a Kafka cluster

Kafka stores these records in log files, specifically in a structured commit log. This makes writes highly efficient, since they are sequential appends at the tail of the log.

Partitions

Kafka does not persist the records of a topic in one log file. If it did, it would not be scalable, since all the records of a topic might not fit on a single machine. So how does Kafka store these records?

Kafka runs as a cluster of multiple machines (standalone mode is a cluster of one machine). Every topic is split into partitions, and records are persisted in the partitions with a sequential id (called the offset).

Topic with partitions P0 and P1.

Each partition has one server that acts as the leader and zero or more servers that act as followers, passively replicating the leader. In the event of leader failure, one of the followers is elected as the new leader.
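To make these ideas concrete, here is a minimal sketch in Python (purely illustrative, not how Kafka is actually implemented) of a topic split into partitions, where each partition assigns sequential offsets to appended records:

```python
# Illustrative sketch only: a "topic" modeled as a set of append-only
# partition logs, with per-partition sequential offsets.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Records with the same key always land in the same partition.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1  # sequential id within the partition
        return p, offset

sports = Topic("Sports", num_partitions=2)
for score in ["FT: 2-1", "HT: 0-0", "FT: 3-0"]:
    partition, offset = sports.append("football", score)
    print(f"partition={partition} offset={offset}")
```

Note that the offset is only meaningful within a single partition; there is no global ordering across partitions.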

Producers

These are applications which send records of interest to some topic.

Consumers

These are applications interested in the records of certain topics.

Consumer Groups

Each consumer is labelled with a consumer group. A record published to a topic is delivered to exactly one consumer instance within each subscribing consumer group.

For example, Mike and Pat are interested in the Sports topic, and Maria and Jen are interested in the Weather topic.

2 Consumer groups with 2 consumers each

When a record arrives in the Sports topic, it will be sent to one of the instances (Mike or Pat), and likewise, when a record arrives in the Weather topic, one of the instances (Maria or Jen) will receive it.
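As an illustrative Python sketch (not Kafka's actual group-assignment protocol; simple round-robin stands in here for Kafka's balancing), delivery within a group might look like:

```python
import itertools

# Illustrative sketch: each record on a topic is delivered to exactly one
# consumer instance per subscribing group (round-robin within the group here).

class ConsumerGroup:
    def __init__(self, name, consumers):
        self.name = name
        self._next = itertools.cycle(consumers)

    def deliver(self, record):
        # Only one instance in the group receives this record.
        return next(self._next)

sports_readers = ConsumerGroup("sports-readers", ["Mike", "Pat"])
print(sports_readers.deliver("FT: 2-1"))  # Mike
print(sports_readers.deliver("HT: 1-0"))  # Pat
```

If every consumer is put in its own group, every consumer sees every record (broadcast); if all consumers share one group, each record is processed once (load balancing).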

Broker

A broker is a running instance of the Kafka daemon, listening for client requests.

Now that we have covered some of the players in the Kafka ecosystem, let's get our hands dirty.

Pre-requisite(s)

You must have Docker installed and access to command-line terminal.

Installing Zookeeper

Zookeeper is a distributed coordination service. Kafka uses Zookeeper to store metadata about the Kafka cluster and consumer clients.

For now, we will be working with a standalone Zookeeper, not a Zookeeper cluster (also known as an ensemble).

Open a terminal and start Zookeeper:

✗ docker run --name zk -p 2181:2181 -p 2888:2888 -p 3888:3888 -d zookeeper
Unable to find image 'zookeeper:latest' locally
latest: Pulling from library/zookeeper
3690ec4760f9: Already exists 
cfdb77eb56b4: Pull complete
857cbad9cd9a: Pull complete
711263dfc2db: Pull complete
eb4bdb431d73: Pull complete
45d8562ee836: Pull complete
874864a3453a: Pull complete
Digest: sha256:50cfe2c77fe203ab528ffb808f1a8b505db73b1f85aedbc52e4fdde69e2ebfe8
Status: Downloaded newer image for zookeeper:latest
81429f15ed706c034c2652cc5d05b61d792e38f3c94aee17516ba85ffbbf906d
✗ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
81429f15ed70 zookeeper "/docker-entrypoint.s" 8 seconds ago Up 7 seconds 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp zk

As you can see, Zookeeper is up and running, and we have mapped ports 2181, 2888 and 3888 from the container to localhost.
We can also confirm this by telnetting to Zookeeper on port 2181 and sending the srvr command, which responds with information from the Zookeeper server.

✗ telnet localhost 2181
Trying ::1...
Connected to localhost.
Escape character is '^]'.
srvr
Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x0
Mode: standalone
Node count: 4
Connection closed by foreign host.

Installing Kafka

In a separate terminal session, we will start a Linux container with JRE 8.

docker run --link zk --name kafka -it java:jre bash

This links our Zookeeper container with our Linux container so that they can talk to each other. Your terminal prompt will change, since you are no longer in your localhost terminal but inside a Linux container called kafka. Continue the installation with the following instructions inside the kafka container.

mkdir -p /usr/local
cd /usr/local
wget http://www-us.apache.org/dist/kafka/0.10.1.0/kafka_2.10-0.10.1.0.tgz
tar -xf kafka_2.10-0.10.1.0.tgz
rm kafka_2.10-0.10.1.0.tgz
mv kafka_2.10-0.10.1.0/ kafka
cd kafka

We now need to tell Kafka where our Zookeeper is running. We do this by setting the zookeeper.connect property in config/server.properties.

sed "116s/.*/zookeeper.connect=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT/" config/server.properties > config/server1.properties
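With the containers linked as above, $ZK_PORT_2181_TCP_ADDR holds the Zookeeper container's IP address (172.17.0.2 below is just an example; yours will likely differ). After the sed command, line 116 of config/server1.properties should read something like:

```properties
zookeeper.connect=172.17.0.2:2181
```

You can verify the substitution worked with `grep zookeeper.connect config/server1.properties`.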

Now we are ready to start Kafka:

bin/kafka-server-start.sh -daemon config/server1.properties

You can follow logs/server.log to confirm that Kafka is running successfully.

Send Message

Before we can send a message, we need to create a topic. To do this, run:

bin/kafka-topics.sh --create --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --replication-factor 1 --partitions 1 --topic test

To confirm that the topic was created successfully, run the describe command:

# bin/kafka-topics.sh --describe --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0

Now send the message

# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
I am Sending Secret Message!

Once you run the command, the command line waits to accept input, one message per line.

Receive Message

On a separate terminal session, get inside the kafka container

$ docker exec -it kafka bash
/# cd /usr/local/kafka

To receive the message, run the following command

# bin/kafka-console-consumer.sh --topic test --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
I am Sending Secret Message!

This concludes our introduction to Kafka. In later posts, we will use Kafka from application to send and receive messages. Stay tuned!
