Anomaly Detection in Machines via High Speed RDF Message Processing: A Solution based on WSO2 Siddhi

Malith Jayasinghe
4 min readOct 29, 2017

--

photo credit

Introduction

RDF is a standard model developed for exchange of data on the web. RDF was adopted as a W3C recommendation in 1999. The RDF 1.0 and RDF 1.1 specifications were published 2004 and 2014 respectively. Some core benefits of using RDF message include easier data integration, easier versioning (it can independently version clients and services), consistent semantics across services and emphasis on domain modeling.

Siddhi is a 100% open source Java library thoroughly optimized for high performance. It performs Stream Processing and Complex Event Processing on real time data streams. It listens to events from data streams, detects complex conditions described via a Streaming SQL language, and triggers actions. It performs both Stream Processing and Complex Event Processing. A Siddhi Application can be written in a Streaming SQL language to process event streams and identify complex event occurrences. A Siddhi Application can run

  1. By embedding Siddhi as a Java library (in your project)
  2. Or within WSO2 DAS

The solution presented in this blog uses Siddhi as a library. We consider RDF messages that are generated by machines digital and analogue sensors embedded within manufacturing equipment. These messages contain measurements and setting parameters from injection molding machines equipped with sensors that measure various parameters of a production process: distance, pressure, time, frequency, volume, temperature, time, speed and force. All the measurements are taken at a certain point in time resulting in a 120-dimensional vector consisting of values of different types (e.g. text or numerical values). The aim is to automatically detect abnormal behavior of a manufacturing machine based on the observation of the stream of measurements provided by such a machine. The anomalies detection is done for each numerical value in the vector. This involved (1) Finding Clusters (2) Training a Markov Model and (3) Finding Anomalies.

Note that the solution presented in the blog has been specially developed for DEBS 2017. It can be easily extended or modified to function in similar other similar scenarios/environments depending on the requirements. The performance evaluation and correctness of the outputs were tested using the HOBBIT platform (automated evaluation platform).

Architecture

The following figure shows the overall architecture of the system.

The system consists of multiple components. RabbitMQ message queue (component of HOBBIT), which is a message broker is used to input the data to the solution and output queue (component of HOBBIT) into which the solution publishes the anomalies. The input data for the solution is in the form of RDF messages and are generated using the data generators of the HOBBIT platform and fed into a RabbitMQ queue (refer to the diagram). The correctness of the anomalies detected by solution is verified (by the HOBBIT platform) using the output anomalies.

As illustrated in the figure the RabbitMQ thread pool fetches the messages from the RabbitMQ queue. In order to achieve the best latency, we process RDF messages in multiple phases (in a pipeline fashion). Some initial processing (i.e. extraction of machine number and timestamp) of RDF messages is done by the RabbitMQ threads and the output is published to the disruptor (disruptor allows the consumers to scale by allowing the events to be processed concurrently with minimal contention). The disruptor has two handlers

Handler 1:

(Further) RDF Processing

The extraction of each pair property (e.g. temperature) and value (e.g. 30) from the RDF message

Clustering and Markov model generation

Recall that the aim is to detect abnormal behavior of a manufacturing machine based on the observation of the stream of measurements provided by such a machine. In order to achieve this, the data produced by each sensor needs to be clustered and the state transitions between the observed clusters have to be modeled as a Markov chain. The clustering and Markov model generation is implemented using the WSO2 Siddhi library which is the core component of this solution. The results are published to ring buffer.

Handler 2: Anomaly Detection

The anomaly handler reads the result published back to the ring buffer in the call back of the Siddhi query. A comparison is then made whether the observed transitional probability for the last n events in the given window is less than the threshold probability defined. If a lesser value is observed an RDF message is generated as described in the challenge containing the values of the probability, machine, dimension and the anomaly number. This message is then published to the output queue specified in the platform

Performance

On the HOBBIT platform the solution processed 35 megabytes/second with while the events spent only 1 ms time on average within the solution (this excludes the time that event spent RabbitMQ which is not a component of the solution rather a component of the evaluation system). This result was obtained when using a sliding time window of size 10 s, a probability threshold of 0.005, transition count of 5, maximum cluster iteration of 50 and a clustering precision of 0.00001

Conclusion

In this blog, I presented a solution which is cable of detecting abnormal behavior of a manufacturing machine based on the observation of the stream of measurements provided by such machines. The solution was based on WSO2 Siddhi. While developing the solution we went through numerous performance optimization phases (particularly in relation to RDF message processing)and the average latency of the final solution was around 1 ms.

--

--

Malith Jayasinghe

Software Architect, Programmer, Computer Scientist, Researcher, Senior Director (Platform Architecture) at WSO2