Stream Processing in Enterprise Integration World

Mohanadarshan Vivekanandalingam
Published in siddhi-io · Nov 18, 2018

Some of you might be wondering about the title and what the relationship between Stream Processing and Enterprise Integration is. Before discussing the subject in more detail, we need to understand what is meant by Stream Processing and why we need Enterprise Integration; then we’ll focus on understanding the relationship between the two.

What is Stream Processing?

I don’t want to bring a totally new interpretation to this. Let me give you the well-accepted definitions given by the analysts.

Forrester defines Stream Processing as follows: at the heart of all streaming flows is a sequence of streaming operators that are configured and threaded together to process and analyze the incoming data streams.

It also defines Stream Processing applications as software that provides analytical operators to orchestrate data flow, calculate analytics, and detect patterns on event data from multiple, disparate live data sources, allowing developers to build applications that sense, think, and act in real time.

Gartner defines [2] Stream Processing platforms as software systems that perform real-time or near-real-time calculations on event data “in motion.” The input is one or more event streams containing data about customer orders, insurance claims, bank deposits/withdrawals, tweets, Facebook postings, emails, financial or other markets, or sensor data from physical assets such as vehicles, mobile devices or machines. The platforms process the input data as it arrives (hence “in motion”), before optionally storing it in some persistent store. They retain a relatively small working set of stream data in memory, just long enough to perform calculations on a set of recent data for the duration of a time window.

More or less, both analysts provide similar definitions for Stream Processing. In simple terms, Stream Processing is an approach to processing incoming events on the fly, based on predefined queries.

You can refer to the article below, written by Dr. Srinath Perera, to get more in-depth knowledge about Stream Processing and its internals.

Why Enterprise Integration?

I am sure all of you know or are aware of integration (if not, why are you looking at this article, noh? 😃). But let’s find out what analysts say about Enterprise Integration.

Gartner says Enterprise Integration (or Enterprise Application Integration) enables independently designed applications to work together. Key capabilities of application integration technologies include communication functionality that reliably moves messages among endpoints; support for fundamental web and web services standards; functionality that dynamically binds consumer and provider endpoints; message validation, mapping, transformation, and enrichment functionality; orchestration; and support for multiple interaction patterns, content-based routing, and typed messages.

Now that we have seen what analysts say about Stream Processing and Enterprise Integration, it is time to understand the relationship between them.

Streaming Data Integration

I believe most of you have worked with either stream processing use cases, integration use cases, or both. When you deal with streaming data in the integration world, it is called Streaming Data Integration. Even though the concept of streaming data integration has been around for a long time, the buzzword Streaming Data Integration has only emerged in the recent past. It has become one of the common requirements of an Event Stream Processing platform. It is also highlighted in the Gartner report referred to above, which mentions Stream Data Integration as one of the two subsegments of the latest event stream processing applications.

As per Gartner, streaming data integration primarily focuses on the ingestion and processing of data sources targeting real-time extraction, transformation and loading (ETL) and data integration use cases. The products filter and enrich the data, and optionally calculate time-windowed aggregations, before storing the results in a database, file system or some other store such as an event broker. Analytics, dashboards and alerts are a secondary concern for products in this subsegment.

Legacy Approach For Streaming Data Integration

In the past, this was done by integrating an Integration Server with a Stream Processor/Complex Event Processor. Refer to the figure below to get a better understanding of the approach.

Legacy Approach for Streaming Data Integration

As shown in the figure above, an Integration Server or Enterprise Service Bus (ESB) is used to connect with data sources and consume streaming data. Those events are then passed through, or minimally processed/transformed, and sent to the Complex Event Processor (CEP) for event processing. In this approach, the CEP does the stateful event processing.

Latest Approaches For Streaming Data Integration

Currently, there are various products and platforms available in the market to achieve this, but an open source solution is an added advantage. WSO2 Stream Processor provides the extensive capabilities required for Streaming Data Integration.

WSO2 Stream Processor


WSO2 Stream Processor is a lightweight, open source, high performance stream processing platform that understands streaming SQL queries in order to capture, analyze, process and act on events in real time. This facilitates real-time, intelligent, actionable business insights. With the product’s simple deployment and its ability to adapt to changes rapidly, enterprises can go to market faster and achieve greater ROI. Unlike other offerings, it provides high availability and high throughput with just two nodes, in addition to its distributed deployment option. The Siddhi Streaming SQL language also enables users to adapt to the market faster with quicker development times.

Let’s have a quick look at the Streaming Data Integration related capabilities.

  • Extensive connector support to consume events from, or publish events to, various event sources/sinks: HTTP, Kafka, TCP, In-memory, WSO2Event, Email, JMS, File, RabbitMQ, MQTT, WebSocket, Amazon SQS and Twitter, with support for custom connectors.
  • Support to connect with various external data stores: RDBMS (H2, MySQL, MS SQL, PostgreSQL, Oracle and DB2), HBase, Solr, MongoDB, Redis, Hazelcast, Elasticsearch and Cassandra.
  • Extensive support for event transformation and manipulation with dedicated feature support: JSON operations, regex operations, string operations, math operations, time operations, etc.
  • Support for Change Data Capture (CDC): WSO2 Stream Processor provides built-in support for Change Data Capture. At the moment, this support is available through a Siddhi extension.

You can refer to the blog below, written by Suho (Team Lead of the Stream Processor product), to get more interesting facts related to ETL with Stream Processor.

Ballerina

Ballerina is a cloud native programming language designed to provide first-class support for integration constructs.


Ballerina is a compiled, transactional, statically and strongly typed programming language with textual and graphical syntaxes. Ballerina incorporates fundamental concepts of distributed system integration into the language and offers a type safe, concurrent environment to implement microservices with distributed transactions, reliable messaging, stream processing, and workflows. Ballerina is a language designed to be integration simple. Based around the interactions of sequence diagrams, Ballerina has built-in support for common integration patterns. (Source: https://ballerina.io)

Ballerina provides a number of features that are required to make integration simple. Below are some of the areas that Ballerina focuses on.

  • API Constructs
  • Asynchronous
  • Workers
  • JSON and XML
  • Annotations
  • Streams
  • Transactions
  • Taint Analysis

You can refer to the link below to get a better understanding of the language, its design principles and philosophy.

As an integration-focused language, Ballerina clearly understands the need for stream processing in the integration world. Because of this, Ballerina provides first-class, native support for stream processing.

Now, if you have both integration capabilities and stream processing capabilities in one place, that would be the ideal situation. Cool, isn’t it? 😄

Now, let’s look at a simple use case in the integration space:

an API that is able to serve HTTP requests.

With a few lines of code, you can develop such an HTTP service and run it; a sketch is shown below. Now, let’s assume that you want to throttle those requests. This requirement can be implemented quite simply using the stream processing support available in Ballerina.
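For illustration, here is a minimal sketch of what such a service could look like, roughly in the Ballerina syntax that was current around the time of writing (the 0.98x releases); the service name, resource name, port and payload are all illustrative, and the service syntax has changed in later Ballerina releases.

```ballerina
import ballerina/http;

// Listener endpoint for the service (the port is illustrative).
endpoint http:Listener apiListener {
    port: 9090
};

// A minimal HTTP service exposing a single resource.
service<http:Service> orderAPI bind apiListener {
    getOrders(endpoint caller, http:Request request) {
        http:Response response = new;
        response.setTextPayload("list of orders");
        // Send the response back to the client.
        _ = caller->respond(response);
    }
}
```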

Query for a Throttling Requirement

Below is a simple query example that triggers an alert when API requests exceed the expected limit within a given time period. The streaming query listens to the events coming through requestStream, and if it identifies more than 10 requests from the same host within the last 10 seconds, it sends an alert by publishing an event to the stream that holds the alerts.
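As a sketch, such a throttling query could look roughly like the following, again using the streaming constructs of the Ballerina releases of that period; the record types, stream names (requestStream, alertStream) and the wrapper function are illustrative, and, as noted at the end of this post, these constructs have been changing rapidly.

```ballerina
// Illustrative event and alert types for the throttling scenario.
type ClientRequest record {
    string host;
};

type ThrottleAlert record {
    string host;
    int reqCount;
};

stream<ClientRequest> requestStream;
stream<ThrottleAlert> alertStream;

// Deploys the streaming query; the forever block keeps it running
// against every event published to requestStream.
function deployThrottlingQuery() {
    forever {
        from requestStream
        window time(10000)  // sliding window of 10 seconds (in milliseconds)
        select requestStream.host, count() as reqCount
        group by requestStream.host
        having reqCount > 10
        => (ThrottleAlert[] alerts) {
            // More than 10 requests from the same host within the last
            // 10 seconds: publish an alert event for each such host.
            foreach alert in alerts {
                alertStream.publish(alert);
            }
        }
    }
}
```

In a complete example, the HTTP resource shown earlier would publish a ClientRequest event to requestStream for every incoming call, and a subscriber on alertStream would decide how to throttle the offending host.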

You can find more examples and references on Ballerina’s stream processing capabilities in the references below.

How to Select the Best Option?

Some of you might question why we have two different approaches to achieve the same thing. Yes, it is a very important question to raise. The simple answer is code vs. configuration.

It is the user’s choice. Some users like to meet their needs with configuration, while others like to code. It also depends on how the solution aligns with your existing infrastructure and model. If you are planning to go with a layered architecture, then WSO2 Stream Processor would be an ideal choice.

If you lean more towards a microservices approach and love to code, then Ballerina would be the ideal way to meet your streaming requirements.

Note: Just a heads up, Ballerina streaming constructs are evolving rapidly these days, so expect some interesting updates and changes 😌.
