Your First Data Pipeline Using Kafka

In this article we are going to build a basic data pipeline using Kafka.

In another article I have already discussed how to set up Apache Kafka.

If you are working with a large amount of data in Kafka, you are probably using a stream processing framework such as Samza or Spark to process the records coming from Kafka producers. After the desired operations are applied to a Kafka topic's data, the result is written to a sink.

But here, instead of a full-featured framework, we are using Kafka's own stream processing library, Kafka Streams, which is available since the Kafka 0.10 release.

Rough diagram of our implementation.

The REST Proxy implementation of Kafka from https://www.confluent.io/ is used to expose the Kafka producer over HTTP. Start the REST Proxy and the services it depends on: ZooKeeper, Kafka, and the Schema Registry.

$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties &
$ ./bin/kafka-server-start ./etc/kafka/server.properties &
$ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties &
$ ./bin/kafka-rest-start ./etc/kafka-rest/kafka-rest.properties &

The Kafka topic will be written to through the REST API:

curl -X POST \
-H "Content-Type: application/vnd.kafka.json.v1+json" \
-d '{"records":[{"key":"12354", "value":{"foo6":"bar1"}}]}' \
"http://localhost:8082/topics/gps-logs"
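The same call can also be made programmatically. The sketch below is a minimal Java 11+ version of the curl command above, using the standard `java.net.http.HttpClient`; the class name `RestProxyProducer`, the `buildPayload` helper, and the REST Proxy address `localhost:8082` are my own illustrative choices, not part of the original article.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProducer {

    // Builds the v1 embedded-JSON payload: one record with a key and a JSON value.
    static String buildPayload(String key, String valueJson) {
        return "{\"records\":[{\"key\":\"" + key + "\",\"value\":" + valueJson + "}]}";
    }

    public static void main(String[] args) {
        String payload = buildPayload("12354", "{\"foo6\":\"bar1\"}");
        System.out.println("Posting: " + payload);

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8082/topics/gps-logs"))
                .header("Content-Type", "application/vnd.kafka.json.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("REST Proxy replied: " + response.body());
        } catch (Exception e) {
            // The proxy is not reachable, e.g. the services above are not running.
            System.out.println("REST Proxy not reachable: " + e.getMessage());
        }
    }
}
```

On success the proxy replies with the partition and offset each record was written to.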

Now this data will be processed with the Kafka Streams API. In this post we will simply print the data from the Kafka topic to the console.
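Before wiring this into the Confluent example below, here is a minimal self-contained sketch of what such a Streams application looks like against the 0.10-era API used by these examples. It needs the `kafka-streams` jar on the classpath; the class name `PrintTopicExample`, the application id, and the broker address `localhost:9092` are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class PrintTopicExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "print-gps-logs");
        // Assumes a broker on the default port; adjust to your setup.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        // Consume the topic we produced to via the REST Proxy and print each record.
        KStream<String, String> lines = builder.stream("gps-logs");
        lines.foreach((key, value) -> System.out.println(key + ": " + value));

        new KafkaStreams(builder, props).start();
    }
}
```

The rest of the post takes the equivalent shortcut of patching one of Confluent's ready-made examples instead of writing the application from scratch.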

$ git clone https://github.com/confluentinc/examples.git
$ cd examples/kafka-streams/src/main/java/io/confluent/examples/streams/

Now add the snippet below in WordCountLambdaExample.java, before the stream is started.

// Requires: import org.apache.kafka.streams.kstream.ForeachAction;
textLines.foreach(new ForeachAction<String, String>() {
    @Override
    public void apply(String key, String value) {
        System.out.println(key + ": " + value);
    }
});

Then compile and run:

$ mvn -DskipTests=true clean package
$ java -cp target/streams-examples-3.1.2-standalone.jar \
io.confluent.examples.streams.WordCountLambdaExample