How to process streams with Kafka Streams?

Yağız Demirsoy · hepsiburadatech · Aug 19, 2017

Step 1 — Set Up Kafka with Docker

Kafka Streams is a powerful, easy-to-use Java library that ships as a component of Apache Kafka. It makes distributed stream processing very simple. This article will show you how to create a basic data streaming application with Java.

Prerequisites:

  • A Docker installed computer
  • Java 8 (this application uses Java 8, but you can adapt it to work with older versions)

I used Landoop’s fast-data-dev Docker image so I wouldn’t have to spend time configuring Kafka and ZooKeeper. It also contains a GUI where you can visually inspect topics and the data in them. Here is the command for setting up the Docker container. After running it, you can access the GUI at http://localhost:3030:

$ docker run -d \
-p 2181:2181 -p 3030:3030 \
-p 8081-8083:8081-8083 \
-p 9581-9585:9581-9585 \
-p 9092:9092 -e ADV_HOST=127.0.0.1 \
--name=kafka landoop/fast-data-dev:latest

Now, using the following commands, get into the Docker container and create two topics named messages and processedMessages:

$ docker exec -it kafka /bin/bash
$ cd /bin
$ kafka-topics \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic messages
$ kafka-topics \
--create \
--zookeeper localhost:2181 \
--replication-factor 1 \
--partitions 1 \
--topic processedMessages

Step 2 — Create the Kafka Streams Application

Kafka, ZooKeeper, and the topics are ready. Now we can create a Java application with Maven using the following pom.xml file, which pulls in the kafka-streams dependency from Confluent’s Maven repository.
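A minimal pom.xml along these lines works (the groupId, artifactId, and version numbers here are placeholders; adjust them to the Kafka release you use):

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>kafka-streams-demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <!-- Compile for Java 8 -->
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <!-- Confluent's Maven repository -->
    <repositories>
        <repository>
            <id>confluent</id>
            <url>http://packages.confluent.io/maven/</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- The Kafka Streams library -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>1.0.0</version>
        </dependency>
    </dependencies>
</project>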

Now create a Java class with a main method to implement the stream. Messages arriving on the messages topic will be modified and streamed to the processedMessages topic. The following class does the streaming:
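A minimal sketch of such a class, assuming string keys and values (the class name and application id are placeholders; this uses the StreamsBuilder API from Kafka 1.0+):

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MessageProcessor {

    public static void main(String[] args) {
        // Minimum required configuration for a Kafka Streams application
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "message-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read from "messages", append a marker, write to "processedMessages"
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("messages");
        source.mapValues(value -> value + " - Processed")
              .to("processedMessages");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the streams client cleanly when the JVM shuts down
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}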

The configurations in this class are the minimum required for a Kafka Streams application to work. The full list of configuration parameters can be found at: http://docs.confluent.io/current/streams/developer-guide.html#required-configuration-parameters

The application id must be the same for all instances of an application that run distributed and do the same job. I will cover how Kafka and distributed applications behave in my next post.

The bootstrap servers config holds the Kafka broker URLs, and a Serde defines how keys and values are serialized and deserialized. The available Serdes are listed at: http://docs.confluent.io/3.0.0/streams/developer-guide.html#available-serializers-deserializers-serdes
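If a particular stream needs Serdes different from the configured defaults, you can also pass them explicitly when reading and writing (a sketch; Consumed and Produced come from org.apache.kafka.streams.kstream, and the topic names are the ones used above):

// Explicit Serdes per stream, overriding the default Serde configuration
KStream<String, String> source = builder.stream("messages",
        Consumed.with(Serdes.String(), Serdes.String()));
source.mapValues(value -> value + " - Processed")
      .to("processedMessages", Produced.with(Serdes.String(), Serdes.String()));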

Step 3 — Testing the Application

Now let’s run the application. Once it is running, we can start sending messages to our messages topic. Run the following commands in the Docker container to send messages to the topic:

$ kafka-console-producer --broker-list localhost:9092 --topic messages
this is a message
this is another message

Now check the topics section of the GUI and find the messages topic. You should see the messages we just sent. Then check the processedMessages topic. You should see the messages processed by our Kafka Streams application.
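If you prefer the command line over the GUI, the same image also bundles the Kafka console consumer, so you can read the processed topic directly:

$ kafka-console-consumer --bootstrap-server localhost:9092 --topic processedMessages --from-beginning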

this is a message - Processed
this is another message - Processed

As you can see, it is very easy to create applications that process data as a stream with Kafka Streams. A topic can be processed quickly by distributed Java applications using the Kafka Streams library.

In my next post, I will describe how Kafka and Kafka Streams applications behave when they run distributed.

References:

Confluent Docs

Confluent Kafka Streams examples on GitHub

Landoop/fast-data-dev on GitHub

Image from: https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
