Real-time activity tracking with Kafka

Chamath Kirinde
Oct 10, 2017

Baby Steps

As LinkedIn’s member base grew, the site’s functionality became more complex by the day. In 2010, LinkedIn decided to invest in redesigning its infrastructure so that its many data pipelines could scale without much hassle. The result was Kafka, a single, distributed pub-sub platform that handles the real-time data streams of every pipeline. The following year, Kafka was open-sourced under Apache, and it has been used in large-scale production ever since.


What is Apache Kafka?

Kafka is a fast, scalable, durable, and fault-tolerant pub-sub messaging system. It’s written in Scala and Java and uses Apache ZooKeeper for reliable distributed coordination. Kafka provides four core APIs: Producer, Consumer, Streams, and Connector.
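
To make the pub-sub model a little more concrete, here is a minimal in-memory sketch of how a Kafka topic behaves: a topic is split into partitions, each partition is an append-only log, every record gets a sequential offset within its partition, and consumers read by offset instead of having messages deleted on delivery. This is plain Python for illustration only, not the Kafka client API; the `Topic` class is hypothetical, and `zlib.crc32` stands in for the murmur2 hash that Kafka’s default partitioner applies to record keys.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    value: str
    partition: int
    offset: int


class Topic:
    """Hypothetical in-memory model of a Kafka topic (not the Kafka client API)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        # Each partition is an independent, append-only list of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Same key -> same partition, so records for one key stay in order.
        # zlib.crc32 stands in for the murmur2 hash Kafka's default partitioner uses.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[p]
        record = Record(key, value, p, len(log))  # offset = position in the partition
        log.append(record)
        return record

    def poll(self, partition, offset, max_records=10):
        # Reading never removes records; any consumer can re-read from any offset.
        return self.partitions[partition][offset:offset + max_records]


if __name__ == "__main__":
    reads = Topic("article-reads")
    first = reads.send("article-42", "read")
    second = reads.send("article-42", "read")
    print(first.partition == second.partition, first.offset, second.offset)  # True 0 1
    print(len(reads.poll(first.partition, 0)))  # 2
```

In a real Kafka cluster, records are eventually removed according to the topic’s retention settings rather than kept forever, and partitions are replicated across brokers for fault tolerance.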

Figure (source: http://kafka.apache.org/documentation.html)

Real-world use cases of Kafka

Lately, “fast data” has been overtaking “big data” as a household term, as companies struggle to process real-time data streams. Because Kafka handles real-time data feeds with high throughput, low latency, and strong durability guarantees, more than a third of Fortune 500 companies now use it in production, for tasks such as messaging, website activity tracking, log aggregation, and stream processing.


Why Kafka?

Handling the enormous volumes of real-time data generated by systems such as IoT devices has posed major challenges for large enterprises. Earlier technologies and tools were not built for the scale and speed of these systems, giving rise to a growing need for real-time analytics rather than traditional, batch-oriented big data analytics. As a fast, scalable, durable, and fault-tolerant pub-sub streaming platform, Apache Kafka is well equipped to address many of these business problems.


Messaging with Kafka in UltraESB-X

In this article, we discuss how to integrate Apache Kafka with UltraESB-X for real-time messaging, using UltraStudio. If you want to know more about UltraESB-X, this post is a good starting point. UltraStudio is a graphical IDE for building, testing, and deploying integration projects without any fuss.

Use Case

Hogwarts has been in deep water lately. After Dolores Umbridge took over as headmistress, not even a pixie could flutter without her knowing. She made sure that the school’s webmaster, Prof. Quirinus Quirrell, was dancing under her Imperius Curse too. All he had to do was monitor web activity on the school’s news articles and find out which ones were gaining momentum. The original website looks as follows, with all read counts at 0.

Figure: Original website view with 0 read counts

UltraESB-X takes care of the magic

Quirrell delegates this task to one of the Slytherin students, Gregory Goyle. Goyle decides to use UltraESB-X to integrate the web server with a Kafka server to provide real-time activity tracking. Whenever a reader requests the full text of a news article, as shown below, the request is used to feed an internal statistics engine that keeps track of read counts.

Figure: Original website view with updated read counts

Once the read events are in Kafka, they can be used to:
  1. Persist the records, feeding them into real-time processing or loading them into Hadoop or data warehousing systems for offline processing and reporting
  2. Update the read count shown on the website
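
Both tasks can consume the same read events independently because Kafka tracks a committed offset per consumer group: every group receives every record in the topic, each at its own pace. The sketch below models that idea in plain Python for a single partition. It illustrates the semantics only and is not the Kafka client API; the classes and the group names `warehouse-loader` and `read-counter` are hypothetical.

```python
from collections import Counter


class TopicLog:
    """Hypothetical single-partition topic: an append-only list of records."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)


class ConsumerGroup:
    """Models one consumer group's committed offset on that partition."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.committed = 0  # offset of the next record this group will read

    def poll(self):
        batch = self.log.records[self.committed:]
        # Advance the committed offset past the returned batch (auto-commit style).
        self.committed += len(batch)
        return batch


if __name__ == "__main__":
    log = TopicLog()
    for article in ["a1", "a2", "a1"]:
        log.append(article)

    warehouse = ConsumerGroup("warehouse-loader", log)  # task 1: persist the records
    counter = ConsumerGroup("read-counter", log)        # task 2: maintain read counts

    print(warehouse.poll())  # ['a1', 'a2', 'a1']
    counts = Counter(counter.poll())
    print(counts["a1"], counts["a2"])  # 2 1

    log.append("a2")
    print(warehouse.poll(), counter.poll())  # ['a2'] ['a2']
```

Because neither group’s progress affects the other, a slow warehouse loader never delays the read counts shown on the website.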

Prerequisites

The complete source of the website, with all its functionality, is available here; the integration part is covered throughout this article. Before going into the details, make sure you have the latest version of UltraStudio (17.07.1 at the time of writing). This tutorial assumes that Kafka and ZooKeeper are installed and running, and that there is a Kafka topic we can work with. If you are starting fresh, simply follow the first three steps of the Kafka quick-start guide.

Step-by-step Walkthrough

Segment 1:

First, let’s look at recording news article reads. For brevity, only the required configuration parameters are shown.

Figure: NIO HTTP Ingress Connector properties
Figure: Extract HTTP Query Parameter properties
Figure: String Payload Setter properties
Figure: Kafka Egress Connector properties
Figure: Add New Transport Header properties
Figure: Complete integration flow for Segment 1
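
Reading the elements in order, the Segment 1 flow takes an incoming HTTP request, pulls a value out of its query string, wraps it in a string payload, publishes that payload to Kafka, and adds a transport header to the response. The plain-Python sketch below models that chain to make the data flow easier to follow; it is not UltraESB-X code. The query parameter name `articleId`, the XML payload shape `<read><articleId>42</articleId></read>`, and the `Access-Control-Allow-Origin` header are all assumptions, and the `publish` callback is a hypothetical stand-in for the Kafka Egress Connector.

```python
from urllib.parse import urlparse, parse_qs
from xml.sax.saxutils import escape


def extract_query_parameter(url, name):
    """Mirrors 'Extract HTTP Query Parameter': first value of `name`, or None."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


def build_payload(article_id):
    """Mirrors 'String Payload Setter'; the XML shape is hypothetical."""
    return "<read><articleId>{}</articleId></read>".format(escape(article_id))


def handle_read(url, publish):
    """Hypothetical Segment 1 flow: HTTP request in, Kafka record out, HTTP response back."""
    article_id = extract_query_parameter(url, "articleId")  # assumed parameter name
    if article_id is None:
        return 400, {}, "missing articleId"
    publish(build_payload(article_id))  # stands in for the Kafka Egress Connector
    # Stands in for 'Add New Transport Header'; which header is added is an
    # assumption (a CORS header helps if the page calls the ESB from another origin).
    headers = {"Access-Control-Allow-Origin": "*"}
    return 200, headers, ""


if __name__ == "__main__":
    sent = []
    status, headers, _ = handle_read("http://esb.example/read?articleId=42", sent.append)
    print(status, sent)  # 200 ['<read><articleId>42</articleId></read>']
```

Passing the publisher in as a callback keeps the request handling testable without a running broker; in the real flow, the Kafka Egress Connector plays that role.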

Segment 2:

Next, the messages stored in the Kafka topic are ready to be ingested by Spark, Storm, or any other stream-processing engine. To keep things easy to follow, we’ll simply consume the messages and inject the read events for each news article into an internal statistics engine. For that, we also need a custom processing element that updates this statistics engine with each new impression.

Figure: Impression Injector custom processing element logic
Figure: Kafka Ingress Connector properties
Figure: XPath String Extractor properties
Figure: Impression Injector properties
Figure: Complete integration flow for Segment 2
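
The consumer side can be modeled the same way: for each record delivered by the Kafka Ingress Connector, an XPath expression pulls the article identifier out of the payload, and the impression injector increments that article’s count in the statistics engine. This is a hedged plain-Python sketch, not the author’s Impression Injector code. `StatisticsEngine` is a hypothetical stand-in for the internal engine, the payload shape `<read><articleId>42</articleId></read>` is assumed, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import threading
import xml.etree.ElementTree as ET
from collections import Counter


class StatisticsEngine:
    """Hypothetical in-memory statistics engine holding per-article read counts."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()  # consumers may call in from several threads

    def add_impression(self, article_id):
        with self._lock:
            self._counts[article_id] += 1

    def read_count(self, article_id):
        with self._lock:
            return self._counts[article_id]  # 0 for unseen articles


def extract_article_id(payload, xpath="articleId"):
    """Mirrors 'XPath String Extractor' using ElementTree's limited XPath support."""
    root = ET.fromstring(payload)
    return root.findtext(xpath)


def inject_impressions(records, engine):
    """Mirrors the 'Impression Injector': one impression per consumed record."""
    for payload in records:
        article_id = extract_article_id(payload)
        if article_id:  # skip records without an article id
            engine.add_impression(article_id)


if __name__ == "__main__":
    engine = StatisticsEngine()
    inject_impressions(
        ["<read><articleId>42</articleId></read>",
         "<read><articleId>42</articleId></read>",
         "<read><articleId>7</articleId></read>"],
        engine,
    )
    print(engine.read_count("42"), engine.read_count("7"))  # 2 1
```

The lock matters because an ingress connector may deliver records on more than one thread, while the website reads counts concurrently.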

Segment 3:

Finally, the website needs to show the updated read counts. A third flow receives an HTTP request for an article, reads its current count from the statistics engine through another custom processing element, Read Count to Payload Setter, and returns that count in the response payload.

Figure: Read Count to Payload Setter custom processing element logic
Figure: NIO HTTP Ingress Connector properties
Figure: Extract HTTP Query Parameter properties
Figure: Read Count to Payload Setter properties
Figure: Add New Transport Header properties
Figure: Complete integration flow for Segment 3
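
In Segment 3 the direction reverses: the website asks the ESB for an article’s current read count and receives it in the response payload. A hedged plain-Python sketch of that flow follows; it is not UltraESB-X code. The `articleId` parameter, the JSON response body, the `Access-Control-Allow-Origin` header, and the plain dictionary used in place of the statistics engine are all assumptions.

```python
import json
from urllib.parse import urlparse, parse_qs


def read_count_to_payload(article_id, counts):
    """Mirrors 'Read Count to Payload Setter'; the JSON body shape is hypothetical."""
    return json.dumps({"articleId": article_id, "reads": counts.get(article_id, 0)})


def handle_read_count(url, counts):
    """Hypothetical Segment 3 flow: query parameter in, read count payload out."""
    values = parse_qs(urlparse(url).query).get("articleId")  # assumed parameter name
    if not values:
        return 400, {}, ""
    body = read_count_to_payload(values[0], counts)
    headers = {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": "*",  # assumed header added by the flow
    }
    return 200, headers, body


if __name__ == "__main__":
    counts = {"42": 2}  # stand-in for the statistics engine's current state
    status, headers, body = handle_read_count("http://esb.example/count?articleId=42", counts)
    print(status, body)  # 200 {"articleId": "42", "reads": 2}
```

Unknown articles report zero reads instead of an error, which matches the original website view where every count starts at 0.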

