DATA INGESTION PLATFORM(DIP) — REAL TIME DATA ANALYSIS — FLINK STREAMING

Aug 23, 2016

The previous blog, DiP (Storm Streaming), showed how we can leverage the power of Apache Storm and Kafka to do real time data ingestion and visualization.

This blog is an extension of that one and focuses on using Flink Streaming to perform real time data ingestion.

Besides Flink, DiP currently supports three more data streaming engines: Apache Storm, Apache Spark and Apache Apex.

This work is based on the Xavient co-dev initiative, where your engineers can work with our team to contribute and build your own platform to ingest any kind of data in real time.

All you need is a running Hadoop cluster with Kafka, Flink, Hive, HBase and Zeppelin. You can deploy the application on top of your existing cluster and ingest any kind of data.

You can download the code base from GitHub.

Flink Streaming Features

One runtime for streaming and batch processing
Java and Scala client bindings
Declarative API
Very high throughput
Own memory management inside the JVM
Growing community support

Technology Stack

Source System: Web Client
Messaging System: Apache Kafka
Target System: HDFS, Apache HBase, Apache Hive
Reporting System: Apache Phoenix, Apache Zeppelin
Streaming API: Apache Flink
Programming Language: Java
IDE: Eclipse
Build Tool: Apache Maven
Operating System: CentOS 7

High Level Process Workflow with Flink-Streaming

Input to the application can be fed from a user interface that lets you either enter the data manually or upload it as an XML, JSON or CSV file for bulk processing.
The ingested data is published to the Kafka broker, which streams it to the Kafka consumer process.
Once the message type is identified, the content of the message is extracted from the Kafka source and sent to the different sinks for persistence.
A Hive external table provides data storage through HDFS, and Phoenix provides an SQL interface for HBase tables.
Reporting and visualization of data is done through Zeppelin.
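
The "identify the message type, then send it to the sinks" step can be sketched in plain Java. This is only an illustration under stated assumptions, not the DiP code: the class `MessageRouter`, the `Sink` interface and the detection rule (look at the first character, then fall back to a comma check) are all hypothetical, and in the real job this logic would sit inside the Flink operators that read from the Kafka source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: classify a raw message payload and fan it out to sinks. */
class MessageRouter {

    /** Payload formats the front end accepts, plus a fallback. */
    enum MessageType { XML, JSON, CSV, UNKNOWN }

    /** Stand-in for a persistence target such as HDFS or HBase (assumed interface). */
    interface Sink {
        void write(MessageType type, String payload);
    }

    /** Guess the format from the first non-blank character, then from commas. */
    static MessageType detect(String payload) {
        if (payload == null) return MessageType.UNKNOWN;
        String s = payload.trim();
        if (s.isEmpty()) return MessageType.UNKNOWN;
        char first = s.charAt(0);
        if (first == '<') return MessageType.XML;
        if (first == '{' || first == '[') return MessageType.JSON;
        if (s.indexOf(',') >= 0) return MessageType.CSV;
        return MessageType.UNKNOWN;
    }

    /** Send a recognised message to every sink; return false when it is dropped. */
    static boolean route(String payload, List<Sink> sinks) {
        MessageType type = detect(payload);
        if (type == MessageType.UNKNOWN) return false;
        for (Sink sink : sinks) {
            sink.write(type, payload.trim());
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        List<Sink> sinks = new ArrayList<>();
        // Two in-memory sinks standing in for the HDFS and HBase writers.
        sinks.add((type, payload) -> stored.add("hdfs:" + type));
        sinks.add((type, payload) -> stored.add("hbase:" + type));

        route("{\"id\": 1}", sinks);
        route("1,alice,42", sinks);
        System.out.println(stored); // [hdfs:JSON, hbase:JSON, hdfs:CSV, hbase:CSV]
    }
}
```

Keeping detection separate from routing means a new format or a new sink can be added without touching the other half; a production job would likely use a real parser rather than a first-character check.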

DiP Front End

Open the UI for the application by visiting the URL “http://<host>:<port>/DataIngestGUI/UI.jsp”. It will look like this:

Flink Execution Flow

The job submitted to Flink will look like this:

DiP Data Visualization

Using Apache Zeppelin, data ingested into HBase can be viewed as reports or graphs through the Phoenix interpreter, which provides an SQL-like interface to HBase tables. These graphs can be embedded in other applications using iframes.
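
To make the "SQL-like interface to HBase" concrete, here is a minimal Java sketch that runs an aggregate query through Phoenix over JDBC. The table `DIP_MESSAGES` and the column `MSG_TYPE` are made-up names for illustration, not the DiP schema, and the ZooKeeper quorum, port and znode have to come from your own cluster. It assumes the Phoenix client jar is on the classpath when a connection is actually opened.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: read an HBase table through Phoenix's SQL interface. */
class PhoenixReport {

    /** Phoenix JDBC URL of the form jdbc:phoenix:<zookeeper quorum>:<port>:<znode>. */
    static String jdbcUrl(String zkQuorum, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znode;
    }

    /** Reject anything that is not a plain identifier, since it is concatenated into SQL. */
    static String identifier(String name) {
        if (name == null || !name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("not a plain identifier: " + name);
        }
        return name;
    }

    /** Aggregate query of the kind a chart could be drawn from. */
    static String countSql(String table, String column) {
        String col = identifier(column);
        return "SELECT " + col + ", COUNT(*) AS TOTAL FROM " + identifier(table)
                + " GROUP BY " + col;
    }

    /** Run the aggregate and collect one (value, count) entry per group. */
    static Map<String, Long> countBy(Connection conn, String table, String column)
            throws SQLException {
        Map<String, Long> counts = new LinkedHashMap<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countSql(table, column))) {
            while (rs.next()) {
                counts.put(rs.getString(1), rs.getLong(2));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws SQLException {
        // Table and column names are invented for this example.
        String table = "DIP_MESSAGES";
        String column = "MSG_TYPE";
        if (args.length < 3) {
            // No cluster details given: only show the query that would be sent.
            System.out.println(countSql(table, column));
            return;
        }
        // Needs a reachable cluster and the Phoenix client jar on the classpath.
        String url = jdbcUrl(args[0], Integer.parseInt(args[1]), args[2]);
        try (Connection conn = DriverManager.getConnection(url)) {
            countBy(conn, table, column)
                    .forEach((value, total) -> System.out.println(value + "\t" + total));
        }
    }
}
```

The same SELECT statement could be pasted into a Zeppelin paragraph bound to the Phoenix interpreter to get the chart directly, without any Java code.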

Credits:

Xavient Information Systems

Technical team:

Neeraj Sabharwal

Mohiuddin Khan Inamdar

Sumit Chauhan

Gautam Marya

Puneet Singh

Written by Xavient