Getting Started with Apache Kafka
I have started learning Kafka and decided to document my understanding of Apache Kafka as I make progress. The agenda for this post is:
- What is Apache Kafka?
- Core Concepts
- Installing Apache Zookeeper (commonly called Zookeeper)
- Installing Apache Kafka (commonly called Kafka)
- Sending a message to a topic from the command line
- Receiving the message by consuming the topic from the command line
For the purposes of installing the software, I will be making use of Docker.
What is Kafka?
Kafka is a streaming platform that enables applications to publish and subscribe to streams of records (events).
Kafka also acts as a storage system: it stores the records and makes them available for processing by applications.
Kafka is fault-tolerant, as the data is partitioned across machines and replicated for redundancy.
In order to work with Kafka, we must understand some of the terms used in this project.
Topics are channels where records about a certain interest are published. For example, ESPN is a content provider that publishes information about sports. At the same time, many people are interested in this information, and they subscribe to ESPN newsletters.
To relate this analogy to Kafka, newsletters are topics where information about sports is published, and there are people (applications) consuming this information.
In Kafka there can be multiple topics. For example, multiple applications could be sending information to different topics, such as Sports and Weather. Each topic can have multiple subscribers.
Kafka stores these records in log files, specifically in a structured commit log. This makes writes highly efficient, since they are sequential and happen at the tail of the log.
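The append-only log can be sketched in a few lines of Python. This is purely illustrative, not Kafka's actual implementation: every write goes to the tail, and every record is addressed by its position.

```python
# Illustrative sketch, not Kafka code: a topic's log as an append-only list.
log = []

def append(record):
    """Write at the tail; the record's position is its sequential id."""
    offset = len(log)
    log.append(record)
    return offset

def read(offset):
    """Consumers read records by position, in write order."""
    return log[offset]

print(append("goal scored"))   # 0
print(append("match ended"))   # 1
print(read(0))                 # goal scored
```

Because appends only ever touch the tail, the disk access pattern stays sequential, which is where the write efficiency comes from.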
Kafka does not persist all the records of a topic in one log file. If it did, it would not be scalable, since not all the information about a topic could fit on a single machine. So how does Kafka store these records?
Kafka runs as a cluster of one or more machines (a standalone setup is simply a cluster of one). Every topic is partitioned, and records are persisted in the partitions with a sequential id called the offset.
Each partition has one server that acts as the leader and zero or more servers that act as followers and passively replicate the leader. In the event of a leader failure, one of the followers is elected as the new leader.
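To make the partition-and-offset idea concrete, here is a small Python sketch (again illustrative, not Kafka code): a topic is split into several independent logs, a record's key determines its partition, and each partition hands out its own sequential offsets.

```python
from zlib import crc32

# Illustrative sketch, not Kafka code: a topic as three partitions,
# each an independent append-only log with its own offsets.
topic = {p: [] for p in range(3)}

def send(key, record):
    # Hash the key so the same key always lands in the same partition.
    partition = crc32(key.encode()) % len(topic)
    topic[partition].append(record)
    offset = len(topic[partition]) - 1
    return partition, offset

p1, o1 = send("espn", "goal!")
p2, o2 = send("espn", "full time")
print(p1 == p2, o1, o2)   # True 0 1  (same key -> same partition, offsets grow)
```

Ordering is therefore guaranteed only within a partition, not across the whole topic, which is the trade-off Kafka makes for scalability.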
Producers are applications that send records of interest to a topic.
Consumers are applications interested in the records of a certain topic.
Each consumer is labelled with a consumer group. A record is delivered to exactly one consumer instance within each subscribing consumer group.
For example, Mike and Pat are interested in the Sports topic, and Maria and Jen are interested in the Weather topic.
When a record arrives in the Sports topic, it will be sent to one of the instances (Mike or Pat); likewise, when a record arrives in the Weather topic, one of the instances (Maria or Jen) will receive it.
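The delivery rule above can be sketched as follows (illustrative Python, not Kafka code; real Kafka assigns partitions to group members rather than routing individual records, but the visible effect is the same: one copy per group, one recipient inside the group).

```python
from itertools import cycle

# Illustrative sketch, not Kafka code: each consumer group gets its own
# copy of a record, but only ONE member inside the group receives it.
groups = {
    "sports-readers": cycle(["Mike", "Pat"]),
    "weather-readers": cycle(["Maria", "Jen"]),
}

def deliver(group, record):
    consumer = next(groups[group])   # pick one member (round robin here)
    return consumer, record

print(deliver("sports-readers", "score update"))   # ('Mike', 'score update')
print(deliver("sports-readers", "injury news"))    # ('Pat', 'injury news')
print(deliver("weather-readers", "storm alert"))   # ('Maria', 'storm alert')
```

Putting every instance of one application in the same group spreads the load across them; putting two applications in different groups gives each application the full stream.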
A broker is a running instance of the Kafka daemon, listening for client requests. A Kafka cluster consists of one or more brokers.
Now that we have covered some of the players in the Kafka ecosystem, let’s get our hands dirty.
You must have Docker installed and access to command-line terminal.
Zookeeper is a distributed coordination service. Kafka uses Zookeeper to store metadata about the Kafka cluster and consumer clients.
For now, we will be working with a standalone Zookeeper rather than a Zookeeper cluster (also known as an ensemble).
Open a terminal and start a Zookeeper container:
✗ docker run --name zk -p 2181:2181 -p 2888:2888 -p 3888:3888 -d zookeeper
Unable to find image 'zookeeper:latest' locally
latest: Pulling from library/zookeeper
3690ec4760f9: Already exists
cfdb77eb56b4: Pull complete
857cbad9cd9a: Pull complete
711263dfc2db: Pull complete
eb4bdb431d73: Pull complete
45d8562ee836: Pull complete
874864a3453a: Pull complete
Status: Downloaded newer image for zookeeper:latest
✗ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
81429f15ed70 zookeeper "/docker-entrypoint.s" 8 seconds ago Up 7 seconds 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp zk
As you can see, Zookeeper is up and running, and ports 2181, 2888 and 3888 inside the container are mapped to our localhost.
We can also confirm this by telnetting to Zookeeper on port 2181 and sending the srvr command, which responds with status information from Zookeeper.
✗ telnet localhost 2181
Connected to localhost.
Escape character is '^]'.
srvr
Zookeeper version: 3.4.9-1757313, built on 08/23/2016 06:50 GMT
Latency min/avg/max: 0/0/0
Node count: 4
Connection closed by foreign host.
In a separate terminal session, we will start a Linux container with JRE 8:
docker run --link zk --name kafka -it java:jre bash
This links our Zookeeper container with our Linux container so that they can talk to each other. Your terminal prompt will change, since you are no longer in your localhost terminal but inside a Linux container called kafka. Continue the installation by running the following commands inside the kafka container:
mkdir -p /usr/local; cd /usr/local
# download the Kafka 0.10.1.0 release (available from the Apache archive)
wget https://archive.apache.org/dist/kafka/0.10.1.0/kafka_2.10-0.10.1.0.tgz
tar -xf kafka_2.10-0.10.1.0.tgz
mv kafka_2.10-0.10.1.0/ kafka
cd kafka
We now need to tell Kafka where our Zookeeper is running. We do this by setting the zookeeper.connect property in config/server.properties. The ZK_* environment variables below are injected into the container by Docker’s --link:
sed "s|^zookeeper.connect=.*|zookeeper.connect=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT|" config/server.properties > config/server1.properties
Now, we are ready to start Kafka
bin/kafka-server-start.sh -daemon config/server1.properties
You can tail logs/server.log to confirm that Kafka started successfully.
Before we can send a message, we need to create a topic. To do this, run:
bin/kafka-topics.sh --create --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --replication-factor 1 --partitions 1 --topic test
To confirm that the topic is created successfully, run the describe command
# bin/kafka-topics.sh --describe --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --topic test
Topic:test PartitionCount:1 ReplicationFactor:1 Configs:
Topic: test Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Now send a message:
# bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
I am Sending Secret Message!
Once you run the command, the command line waits to accept input, one message per line.
In a separate terminal session, get inside the kafka container:
$ docker exec -it kafka bash
/# cd /usr/local/kafka
To receive the message, run the following command
# bin/kafka-console-consumer.sh --topic test --zookeeper $ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT --from-beginning
Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
I am Sending Secret Message!
This concludes our introduction to Kafka. In later posts, we will use Kafka from an application to send and receive messages. Stay tuned!