Your First Data Pipeline Using Kafka

In this article we are going to build a very basic data pipeline using Kafka.

In another article I have already discussed how to set up Apache Kafka.

If you are working with a large amount of data and using Kafka, you are probably using a stream processing framework such as Samza or Spark to process the data coming from the Kafka producer. After the desired operations on the Kafka topic's data, the result is dumped to a sink.

But here, instead of a full-featured framework, we are using Kafka's own stream processing library, Kafka Streams, which is available since the Kafka 0.10 release.
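Under the hood, a Kafka Streams application is just a standalone Java program configured through plain `Properties`. The sketch below shows the minimal shape of that configuration; the application id is a hypothetical name, and the broker address assumes the local single-node setup used in this article:

```java
import java.util.Properties;

public class PipelineConfig {
    // Minimal Kafka Streams configuration sketch (stdlib only).
    public static Properties baseConfig() {
        Properties props = new Properties();
        // Hypothetical id; Kafka Streams uses it to namespace
        // the application's consumer group and local state
        props.put("application.id", "console-printer-pipeline");
        // Broker address for a local single-node Kafka (assumption)
        props.put("bootstrap.servers", "localhost:9092");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(baseConfig().getProperty("application.id"));
    }
}
```

These two properties are the required baseline; a real application would also configure serdes for the key and value types of its topics.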

Rough diagram of our implementation:

The REST Proxy implementation of Kafka from Confluent is used to expose the Kafka producer over HTTP. The REST Proxy depends on three other services: ZooKeeper, Kafka, and the Schema Registry.

$ ./bin/zookeeper-server-start ./etc/kafka/ &
$ ./bin/kafka-server-start ./etc/kafka/ &
$ ./bin/schema-registry-start ./etc/schema-registry/ &
$ ./bin/kafka-rest-start ./etc/kafka-rest/ &

The Kafka topic will be written to through the REST Proxy's HTTP API (which listens on port 8082 by default).

curl -X POST \
  -H "Content-Type: application/vnd.kafka.json.v1+json" \
  -d '{"records":[{"key":"12354", "value":{"foo6":"bar1"}}]}' \
  "http://localhost:8082/topics/<topic-name>"

Now this data will be processed with the Kafka Streams API. In this post we will simply print the data from the Kafka topic to the console.

$ git clone
$ cd examples/kafka-streams/src/main/java/io/confluent/examples/streams/

Now add the snippet below before the stream starts.

textLines.foreach(new ForeachAction<String, String>() {
    @Override
    public void apply(String key, String value) {
        System.out.println(key + ": " + value);
    }
});
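`ForeachAction` is a single-method callback interface, so on Java 8 the anonymous class collapses to a lambda: `textLines.foreach((key, value) -> System.out.println(key + ": " + value));`. The callback shape is analogous to the standard `BiConsumer`, which lets us sketch the same per-record logic without any Kafka dependency:

```java
import java.util.function.BiConsumer;

public class ForeachSketch {
    public static void main(String[] args) {
        // Same (key, value) -> print logic as the ForeachAction above,
        // modeled with the stdlib BiConsumer interface
        BiConsumer<String, String> printRecord =
            (key, value) -> System.out.println(key + ": " + value);

        // Hypothetical record, matching the one posted via the REST Proxy
        printRecord.accept("12354", "{\"foo6\":\"bar1\"}");
    }
}
```

In the actual Streams application, Kafka invokes this callback once for every record consumed from the topic.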

Then compile and run:

$ mvn -DskipTests=true clean package
$ java -cp target/streams-examples-3.1.2-standalone.jar \