How to process streams with Kafka Streams?
Step 1 — Setup Kafka with Docker
Kafka Streams is a powerful, easy-to-use Java library that ships as part of Apache Kafka. It makes distributed stream processing straightforward. This article shows you how to create a basic data streaming application in Java.
Prerequisites:
- A computer with Docker installed
- Java 8 (this application uses Java 8, but you can adapt it to other versions)
I used Landoop’s fast-data-dev Docker image so as not to spend time configuring Kafka and ZooKeeper. It also includes a web UI where you can browse topics and the data in them. Here is the command for setting up the Docker container; after running it you can access the UI at http://localhost:3030:
$ docker run -d \
-p 2181:2181 -p 3030:3030 \
-p 8081-8083:8081-8083 \
-p 9581-9585:9581-9585 \
-p 9092:9092 \
-e ADV_HOST=127.0.0.1 \
--name=kafka landoop/fast-data-dev:latest
Now, using the following commands, get into the Docker container and create two topics named messages and processedMessages:
$ docker exec -it kafka /bin/bash
$ cd /bin
$ kafka-topics \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic messages
$ kafka-topics \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic processedMessages
Step 2 — Create the Kafka Streams Application
Kafka, ZooKeeper, and the topics are ready. Now we can create a Java application with Maven using the following pom.xml file. The file contains the kafka-streams dependency from Confluent’s Maven repository.
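The original pom.xml is not embedded here; a minimal sketch of the relevant parts might look like the following. The groupId, artifactId, and kafka-streams version are my own placeholders — use whichever release matches your broker:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>kafka-streams-demo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <!-- Confluent's Maven repository, which mirrors the Kafka artifacts -->
  <repositories>
    <repository>
      <id>confluent</id>
      <url>https://packages.confluent.io/maven/</url>
    </repository>
  </repositories>

  <dependencies>
    <!-- Kafka Streams; pulls in kafka-clients transitively -->
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>
</project>
```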
Now create a Java class with a main method to implement the stream. Messages arriving at the messages topic will be modified and streamed to the processedMessages topic. The following class does the streaming:
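The original class is not embedded here; a minimal sketch using the StreamsBuilder API (kafka-streams 1.0+) might look like this. The class name and application id are my own choices; the topic names and the " - Processed" suffix match the article:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MessageProcessor {

    public static void main(String[] args) {
        // Minimum required configuration: application id, broker list, default Serdes
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Topology: read from "messages", append a suffix, write to "processedMessages"
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("messages");
        source.mapValues(value -> value + " - Processed")
              .to("processedMessages");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the streams client cleanly on JVM shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The mapValues call is where the per-record processing happens; any stateless transformation of the value can go there.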
The configuration values in the file are the minimum required for a Kafka Streams application to work. The full list of configuration parameters can be found at http://docs.confluent.io/current/streams/developer-guide.html#required-configuration-parameters
The application id must be the same for all distributed instances that do the same job. I will cover how Kafka and distributed applications behave in my next post.
The bootstrap servers config lists the Kafka broker URLs, and a Serde defines how an object is serialized and deserialized. Available Serdes can be found at http://docs.confluent.io/3.0.0/streams/developer-guide.html#available-serializers-deserializers-serdes
Step 3 — Testing the Application
Now let’s run the application. Once it is running, we can start sending messages to our messages topic. Run the following commands in the Docker container to send messages to the topic:
$ kafka-console-producer --broker-list localhost:9092 --topic messages
this is a message
this is another message
Now check the topics section of the GUI and find the messages topic. You should see the messages we just sent. Then check the processedMessages topic. You should see the messages processed by our Kafka Streams application.
this is a message - Processed
this is another message - Processed
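If you prefer the terminal over the GUI, you can also read the processed topic with the console consumer that ships with Kafka (run inside the container):

```shell
$ kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic processedMessages \
  --from-beginning
```

The --from-beginning flag replays the topic from the start, so you should see every processed message, not just new ones.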
As you can see, Kafka Streams makes it very easy to create applications that process data as a stream. With the Kafka Streams library, a queue can be processed quickly by distributed Java applications.
In my next post, I will cover how Kafka and Kafka Streams applications behave when they are distributed.
References:
Confluent Kafka Stream examples on Github
Landoop/fast-data-dev on Github
Image from: https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/