What’s New in Siddhi CNSP 5.1.0?

Suhothayan Sriskandarajah · Published in siddhi-io · Oct 17, 2019

After three months of continuous effort and many sleepless nights, we finally released Siddhi Cloud-Native Stream Processor 5.1.0 on the 8th of October 2019. It was not a simple task; releasing the Siddhi Stream Processor on time was a marathon effort.

Before going into the details of the release, we would like to share a few words about the journey of Siddhi. As many of you already know, the Siddhi engine (library) is already used in various open-source and proprietary frameworks and tools, such as Apache Flink, Apache Eagle, Uber, and the Punch platform. In particular, Siddhi has been a core component of various WSO2 products such as WSO2 Complex Event Processor, WSO2 Data Analytics Server, WSO2 Stream Processor, WSO2 API Manager (for throttling purposes), and WSO2 Identity Server (for adaptive authentication), and it has served many customers and their needs. All these years, Siddhi has been pitched as an on-premise Complex Event Processing and Stream Processing engine.

As you are aware, many companies nowadays are revisiting their enterprise architecture and moving applications to the cloud. As an initial step, they have started converting their services and apps into microservices. Since stream processing and event-driven data processing are key components of enterprise architecture, we have seen such companies struggle to port legacy big data solutions to the cloud, or try to build microservices using on-premise solutions such as the Siddhi library, in order to fill the serious gap for a stream processing system in the cloud.

For a stream processor to play in the cloud-native space, it should be lightweight and loosely coupled, and it should fit into an agile DevOps process. However, most traditional stream processors are heavy and depend on bulky monolithic technologies, which makes them harder to move to the cloud. Therefore, based on user feedback, we have implemented the Siddhi Stream Processor (https://siddhi.io) as a 100% open-source stream processor under Apache License v2, to solve event-driven data processing and stream processing use cases natively in the cloud. Users can now use Siddhi to collect, inject, process, analyze, integrate (with services and databases), and send messages to other systems, all in one tool, both natively on the cloud and on-premise.

In the following sections, we highlight some of the key features of Siddhi Stream Processor 5.1.0 and provide the relevant details.

Native support to run Siddhi Streaming Apps in K8s

One of the major features embedded in this release is the ability to natively deploy and run Siddhi apps in Kubernetes. To provide native Kubernetes support, we have developed a Kubernetes operator for Siddhi. The Siddhi operator is responsible for deploying, running, and reconciling Siddhi app deployments in a Kubernetes cluster. Users can now deploy Siddhi apps directly in a Kubernetes cluster through the operator's Custom Resource Definition (CRD), namely SiddhiProcess, using simple kubectl commands. Siddhi supports two types of deployments via the Siddhi operator:

  1. Simple Deployment
  2. Distributed Deployment

The simple deployment deploys the selected Siddhi applications in a single pod in Kubernetes, whereas the distributed deployment breaks a Siddhi app into stateless and stateful parts, integrates them using the NATS (and NATS Streaming) messaging system, and deploys them in two different pods as shown below, such that they automatically achieve high availability and zero event loss. Find more details about the Siddhi Kubernetes deployments in the official documentation.

Distributed Siddhi deployment in K8s supporting High Availability

You can also try out the deployment in Katacoda.
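As a rough sketch of what this looks like in practice, the following is a minimal SiddhiProcess manifest. The app name, streams, and query are hypothetical, and the exact apiVersion and field names should be verified against the Siddhi operator documentation for your operator version:

    # monitor-app.yaml (deploy with: kubectl apply -f monitor-app.yaml)
    apiVersion: siddhi.io/v1alpha2
    kind: SiddhiProcess
    metadata:
      name: monitor-app
    spec:
      apps:
        - script: |
            @App:name("MonitorApp")

            @source(type='http', receiver.url='http://0.0.0.0:8080/production',
                    @map(type='json'))
            define stream ProductionStream (name string, amount double);

            @sink(type='log')
            define stream HighProductionStream (name string, amount double);

            -- forward only high-volume production events
            from ProductionStream[amount > 100]
            select name, amount
            insert into HighProductionStream;

      container:
        image: "siddhiio/siddhi-runner-alpine:5.1.0"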

Tooling support to generate Docker and K8s artifacts

Previously, to deploy Siddhi apps in Docker or Kubernetes environments, users were required to write Dockerfiles and Kubernetes YAML files on their own. Writing these files requires a reasonable amount of knowledge of, and experience with, Docker and Kubernetes. Users also had to write the same types of files again and again, even though they all follow common standards, which made the process time-consuming and error-prone. This release introduces tooling support that automates the generation of Docker and Kubernetes artifacts for Siddhi apps.

Now, users can implement and test Siddhi apps using the Siddhi tooling editor and then generate Docker and K8s artifacts with a few button clicks. Apart from letting users download the artifacts, the tooling editor also supports building Docker images and pushing them directly to a Docker registry, making the overall Siddhi app development lifecycle more convenient and better aligned with CI/CD processes.

Proven CI/CD reference implementation for Siddhi

CI/CD has become an essential part of every organization's effective software development and delivery process. It consists of two separate but complementary parts that enable users to continuously integrate Siddhi applications using automated testing, and to continuously deliver the apps by verifying that they are ready for production. With Siddhi 5.1.0, we support the entire CI/CD pipeline from development to deployment, as shown in the reference implementation below.

CI/CD reference implementation for Siddhi

With the new Siddhi test framework, users can now create sandbox Siddhi runtimes and perform unit tests on Siddhi apps in isolation, without connecting to any external systems. They can then perform integration tests on those Siddhi apps by integrating them with external systems running as Docker containers, and finally run the same integration tests against an actual deployment environment as black-box tests to verify the deployment before moving into production. This streamlines CI/CD and enables users to move to production much faster. Refer to the blog “Building an efficient CI/CD pipeline for Siddhi” for detailed information on using the Siddhi test framework.

Caching and query optimizations when integrating with external databases

Database integration is an integral part of many stream processing use cases, as it helps to enrich event data. However, traditional databases can be slow and can introduce high latency into the event processing pipeline. With this release, Siddhi introduces caching support for all external databases to reduce data retrieval latency, with caching policies such as FIFO, LRU, and LFU, and with preloading of data wherever possible. For more information, refer to the blog “In-memory Database Caching for Stream Processing — Siddhi”.
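As a hedged sketch, enabling a cache on an RDBMS-backed table looks roughly like the following. The table schema and connection details are hypothetical, and the cache parameter names follow the siddhi-store-rdbms documentation, so verify them against your extension version:

    -- RDBMS-backed table with an LRU cache of up to 1000 rows
    @store(type='rdbms',
           jdbc.url='jdbc:mysql://localhost:3306/production',
           username='root', password='root',
           jdbc.driver.name='com.mysql.jdbc.Driver',
           @cache(size='1000', cache.policy='LRU',
                  retention.period='5 min', purge.interval='1 min'))
    define table StockTable (symbol string, price float, amount double);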

Further, Siddhi has also been improved to reduce RDBMS query latency by up to 3 times, by performing most of the execution directly in the database itself. For more information, refer to the blog “Siddhi 5.1 reduces Database Query Latency up to 3 times”.
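For illustration, in a stream-table join such as the following (the stream, table, and attribute names are hypothetical), the filtering and aggregation can now be pushed down and executed largely inside the database, instead of loading the matching rows into memory first:

    -- aggregate matching rows of the RDBMS-backed StockTable per request
    from CheckStockStream join StockTable
        on CheckStockStream.symbol == StockTable.symbol
    select StockTable.symbol, sum(StockTable.amount) as totalAmount
    group by StockTable.symbol
    insert into AggregateStockStream;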

gRPC Connector

gRPC is a modern, high-performance RPC framework. Due to its high performance, it has become widely used in cloud environments. In this release, we have published an IO extension for Siddhi to perform gRPC communications, along with a mapper to support arbitrary Protobuf messages over gRPC. The gRPC IO extension supports the following features:

  1. gRPC sink
    The gRPC sink publishes events to a gRPC server. It uses a fire-and-forget approach; in other words, it sends events in a streaming manner and does not wait for a response.
  2. gRPC call sink & gRPC call-response source
    The gRPC call sink sends requests to a gRPC server, and the gRPC call-response source receives the corresponding responses from that server. Note that these two cannot be used in isolation.
  3. gRPC source
    The gRPC source listens for gRPC requests sent by gRPC clients in a streaming manner and passes them to Siddhi for processing. Note that here the clients do not expect any response from the server.
  4. gRPC service source & gRPC service response sink
    The gRPC service source listens for gRPC requests from clients and synchronously sends responses back to them via the gRPC service response sink. Note that these two cannot be used in isolation.

In all these cases, the input and output messages can be formatted as JSON, TEXT, or XML messages, or mapped to Protobuf messages using the respective mappers.
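For example, a fire-and-forget gRPC sink and a matching gRPC source can be configured roughly as follows. The stream definitions and URLs are hypothetical, and the default org.wso2.grpc.EventService endpoint shown here should be verified against the siddhi-io-grpc documentation for your extension version:

    -- publishes events to a gRPC server in fire-and-forget mode
    @sink(type='grpc',
          publisher.url='grpc://localhost:5858/org.wso2.grpc.EventService/consume',
          @map(type='json'))
    define stream FooStream (message string);

    -- listens for events streamed by gRPC clients; no response is sent
    @source(type='grpc',
            receiver.url='grpc://0.0.0.0:5858/org.wso2.grpc.EventService/consume',
            @map(type='json'))
    define stream BarStream (message string);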

Support for complex List & Map operations

List and Map data types are essential for Siddhi apps that perform complex message transformations. These are now supported through the siddhi-execution-list and siddhi-execution-map extensions, which provide useful functions for processing list and map data structures, such as create(), isMap()/isList(), size(), getKeys(), getValues(), etc. Furthermore, Siddhi also supports a scatter-gather mode of processing: each item in a list can be split out using tokenize(), processed in isolation, and then combined back using collect(), as shown below.

Scatter-gather data pipelining
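A rough sketch of this scatter-gather flow follows. The stream definitions are hypothetical, and the exact output attribute names of tokenize() and the windowing used with collect() should be checked against the siddhi-execution-list documentation:

    define stream OrderStream (orderId string, items object);

    -- scatter: emit one event per item in the list
    from OrderStream#list:tokenize(items)
    select orderId, value as item
    insert into ItemStream;

    -- (process each item in isolation here)

    -- gather: collect the processed items back into a single list
    from ItemStream#window.batch()
    select orderId, list:collect(item) as items
    group by orderId
    insert into ProcessedOrderStream;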

Pattern query improvements

When counting patterns are used, such as a condition e2=TempStream[temp > 50]<1:5> that matches between one and five events, Siddhi now allows retrieving all the events matched against the counting condition as a list. Since lists are not native to Siddhi, the list is returned with type ‘object’, but it can be accessed and processed with the newly introduced siddhi-execution-list extension, as shown below.
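A hedged example (the streams and thresholds are hypothetical):

    define stream TempStream (roomNo int, temp double);

    -- matches one to five high-temperature readings after a warm-up event
    from every e1=TempStream[temp > 40] -> e2=TempStream[temp > 50]<1:5>
    select e1.roomNo, e2.temp as tempList
    insert into AlertStream;

Here tempList is returned with type ‘object’ and can be processed downstream with functions such as list:size(tempList) from the siddhi-execution-list extension.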

Conclusion

You can download Siddhi 5.1.0 from the siddhi.io site and try out the newly introduced use case guides, each of which covers a complete end-to-end use case.

For more information about writing Siddhi queries, refer to the Siddhi Query Guide, and refer to the documentation to deploy Siddhi in Java, Python, Docker, and Kubernetes.

For questions and support, please reach out to the development team via the mailing lists, StackOverflow, or Slack. You can also participate in the weekly sync-up meetings and start collaborating and contributing to the project.

Thank You

Siddhi Team
