GCP IIoT: Industrial anomaly detection

Artem Gusenkov
Badal-io
Feb 21, 2023

Introduction

This is the 3rd part of the series of blog posts demonstrating how to build an Industrial Internet of Things (IIoT) data management and analytics platform on the Google Cloud Platform (GCP). In previous parts, we created a framework for first-mile data collection, storage, real-time analytics, and alerting using FogLAMP, Dataflow, and BigQuery. We also set up a suite of ready-to-use Looker dashboards for visualizing and analyzing real-time IIoT data with a focus on oil & gas use cases.

This blog post will describe how to design an anomaly detection system specific to the industrial sector using GCP. More specifically, the project will demonstrate how anomaly detection algorithms are applied to streaming data using BigQuery ML.

How does anomaly detection help the industrial sector?

Recognizing and responding to anomalies, especially in the metal industry, is essential for maintaining stable and efficient business processes. One way to achieve this is to apply the various tools offered by the Google Cloud Platform, which make it possible to combine a wide range of artificial intelligence (AI) and machine learning (ML) capabilities with an enterprise-class streaming analytics platform.

Given that any delay in an industrial process, including equipment breakdowns and unscheduled maintenance, can lead to significant financial losses, businesses want to keep their processes running uninterrupted for as long as possible. One way to reduce the risk of unexpected failure is to predict the future condition of a specific piece of equipment (in our case, the chain of an industrial ladle) based on its current motion behavior. For this purpose, anomaly detection allows us to identify abnormal patterns in the data streams and detect the behavioral signs that signify deformation of the chain.

What is BigQuery ML?

The GCP tools used in this project make it possible to operationalize complex projects that would otherwise require advanced techniques for managing large amounts of data. Specifically, BigQuery ML is a tool that allows one to create and execute ML models in BigQuery using standard SQL queries. BigQuery speeds up model development by removing the need to export data from the data warehouse, and it reduces complexity because fewer tools are required. Furthermore, it makes productionization easier and faster, since moving and formatting data for Python ML frameworks is not necessary when models are trained in BigQuery.

Industrial Ladle Failure Detection

Figure 1: Industrial ladle

The object of interest is an industrial ladle, a bucket-shaped container or vessel used to transport and pour out molten metal. It needs to be strong enough to contain a heavy load of metal and as heat-resistant as a furnace. Considering the specifics of the instrument and its high operating loads, the susceptibility of the chain to mechanical damage becomes evident. Additionally, the mechanics of the instrument make it prone to sudden ruptures. As mentioned previously, any unexpected interruption of the industrial process imposes a high risk of financial losses. Thus, predicting the occurrence of such accidents could significantly reduce the potential risks and allow for more effective cost planning.

As part of our project, an anomaly detection system was built to recognize abnormal patterns of industrial ladle movements.

The data used in the project contains the following:

  • ladle’s chain movement setpoint velocity — a desired velocity chosen by the ladle control system
  • ladle’s chain actual movement velocity
  • chain’s position, which describes its fulcrum point

Micro-ruptures and other damage at the chain’s fulcrum induce deviations of the actual velocity from the setpoint velocity that are more pronounced than those observed under normal conditions. These outliers can be considered an indication that particular segments of the chain need to be inspected for possible damage. As such, the purpose of the project is to detect anomalies, i.e., to determine statistically significant deviations of the actual motion velocity from the setpoint velocity.

Architecture

The key components of the anomaly detection pipeline are highlighted below:

Figure 2: Architecture of GCP Anomaly detection pipeline

Main steps of the pipeline:

  1. Ingesting data from the FogLAMP instance to a Pub/Sub topic.
  2. Normalizing the data sample and writing it to BigQuery with Dataflow.
  3. BigQuery ML-based clustering and anomaly detection using scheduled BigQuery scripts.
  4. Retraining the clustering model using scheduled BigQuery scripts.

Data streaming

Data is generated based on a historical dataset of industrial ladle movements. Combining multiple measurements with Gaussian noise allows us to generate new time series objects, each representing a whole cycle of ladle movements. FogLAMP collects and publishes the generated objects to a Pub/Sub topic. Part 1 of the blog post series contains more details on the ingestion process.
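
The generation itself happens upstream in FogLAMP, but for illustration, similar Gaussian noise could also be produced directly in BigQuery SQL via the Box-Muller transform, since BigQuery has no built-in normal random function. Below is a minimal sketch; the table name, column names, and noise amplitude are hypothetical:

-- Box-Muller transform: two uniform RAND() draws yield one standard
-- normal draw, scaled here by a hypothetical noise amplitude of 0.05
SELECT
  ts,
  value + 0.05 * SQRT(-2 * LN(1 - RAND())) * COS(2 * ACOS(-1) * RAND()) AS noisy_value
FROM foglamp_demo.ladle_hist_measurements  -- hypothetical historical table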

Below is an example of a single data sample describing a ladle movement cycle in time series format.

Figure 3: Example of a cycle of an industrial ladle movement: time series visualization
Figure 4: Example of a cycle of an industrial ladle movement: a JSON message

After normalization, the data is organized and stored as follows:

Figure 5: Example of a cycle of an industrial ladle movement: normalized data stored in BigQuery

BigQuery: Anomaly calculation

First, the k-means clustering algorithm in BigQuery ML is used to train a model that labels clusters of the various operating modes and setpoint velocities. After experimenting with multiple cluster counts, model evaluation based on the elbow method suggested using eleven clusters. The standard deviation of the setpoint velocity was then calculated for each cluster for further use.

Figure 6: Elbow method and the average of the squared distances from the cluster centers
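
The comparison behind the elbow plot can be scripted with ML.EVALUATE, which for k-means models returns the mean squared distance to the cluster centers. A minimal sketch, assuming one model has been trained per candidate cluster count (the model names below are hypothetical):

-- Compare candidate cluster counts; lower mean squared distance is better,
-- and the "elbow" marks the point of diminishing returns
SELECT 8 AS num_clusters, mean_squared_distance
FROM ML.EVALUATE(MODEL industrial_ladle_demo.clustering_model_8)
UNION ALL
SELECT 11 AS num_clusters, mean_squared_distance
FROM ML.EVALUATE(MODEL industrial_ladle_demo.clustering_model_11)
UNION ALL
SELECT 14 AS num_clusters, mean_squared_distance
FROM ML.EVALUATE(MODEL industrial_ladle_demo.clustering_model_14)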

BigQuery ML allows training and storing a model directly in BigQuery without any external tools. Below is the code for training the clustering model:

CREATE OR REPLACE MODEL industrial_ladle_demo.clustering_model
OPTIONS (model_type = 'kmeans', num_clusters = 11, standardize_features = false)
AS
SELECT
  value AS setpoint_velocity
FROM foglamp_demo.measurements_raw
WHERE property_measured = 'setpoint_velocity'
  AND value IS NOT NULL

The next step is to apply the clustering model to each new data sample and to detect anomalies by calculating Z-scores. This approach is reasonable because the deviations of the actual velocity from the setpoint are normally distributed. By default, the threshold was set to three standard deviations; however, it can be tuned further to achieve the desired sensitivity.

The per-cluster statistics are calculated as follows:

CREATE TABLE industrial_ladle_demo.ladle_clusters_statistics AS
SELECT
  CENTROID_ID,
  STDDEV(setpoint_velocity) AS velocity_std,
  3 AS num_std,
  3 * STDDEV(setpoint_velocity) AS velocity_std_threshold
FROM ML.PREDICT(
  MODEL industrial_ladle_demo.clustering_model,
  (
    SELECT setpoint_velocity
    FROM industrial_ladle_demo.ladle_hist
  ))
GROUP BY CENTROID_ID
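
The statistics table is then joined against freshly ingested samples: for a sample assigned to cluster c, the Z-score is ABS(actual_velocity - setpoint_velocity) / velocity_std, and the sample is flagged once the deviation exceeds the cluster's threshold. The exact scheduled query is not shown in this post; a minimal sketch, assuming hypothetical names for the incoming measurements table, its columns, and the results table, could look like this:

INSERT INTO industrial_ladle_demo.ladle_anomalies  -- hypothetical results table
SELECT
  p.event_ts,
  p.position,
  p.setpoint_velocity,
  p.actual_velocity,
  -- Z-score of the deviation, based on the per-cluster standard deviation
  ABS(p.actual_velocity - p.setpoint_velocity) / s.velocity_std AS z_score
FROM ML.PREDICT(
  MODEL industrial_ladle_demo.clustering_model,
  (
    -- only samples ingested since the previous 15-minute run
    SELECT event_ts, position, setpoint_velocity, actual_velocity
    FROM industrial_ladle_demo.ladle_measurements  -- hypothetical source table
    WHERE event_ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 15 MINUTE)
  )) AS p
JOIN industrial_ladle_demo.ladle_clusters_statistics AS s
  USING (CENTROID_ID)
WHERE ABS(p.actual_velocity - p.setpoint_velocity) > s.velocity_std_threshold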

A scheduled BigQuery task calculates anomalies every 15 minutes. Finally, the anomaly detection results are stored in BigQuery as follows:

Figure 7: Anomaly detection results table

Given that the results stored in BigQuery provide both the time of the anomaly and the fulcrum point of the chain, it is easy to determine whether the chain needs inspection and which particular segment should be examined.

Figure 8: Example of a cycle of industrial ladle movements with anomalies detected

BigQuery ML: Clustering model retraining

Since BigQuery ML trains models using standard SQL, the entire model creation and training process can be automated with scheduled queries. The clustering model retraining and the per-cluster standard deviation statistics calculation are therefore scheduled to run weekly, a period long enough for a meaningful amount of new data to accumulate.
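
As a sketch, such a weekly scheduled script could simply combine the two statements shown earlier into one multi-statement query, so that the model and its statistics table are always refreshed together:

-- retrain the clustering model on the accumulated data
CREATE OR REPLACE MODEL industrial_ladle_demo.clustering_model
OPTIONS (model_type = 'kmeans', num_clusters = 11, standardize_features = false)
AS
SELECT value AS setpoint_velocity
FROM foglamp_demo.measurements_raw
WHERE property_measured = 'setpoint_velocity'
  AND value IS NOT NULL;

-- recreate the per-cluster statistics from the retrained model
CREATE OR REPLACE TABLE industrial_ladle_demo.ladle_clusters_statistics AS
SELECT
  CENTROID_ID,
  STDDEV(setpoint_velocity) AS velocity_std,
  3 AS num_std,
  3 * STDDEV(setpoint_velocity) AS velocity_std_threshold
FROM ML.PREDICT(
  MODEL industrial_ladle_demo.clustering_model,
  (SELECT setpoint_velocity FROM industrial_ladle_demo.ladle_hist))
GROUP BY CENTROID_ID;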

Visualization

One way to implement visualization for the project is to use the suite of ready-to-use Looker dashboards. These dashboards provide real-time analytics of the target variable and generate alerts for detected anomalies. A detailed description of this approach can be found in part 2 of the blog post series.

Conclusion

This blog post has demonstrated how to build a streaming anomaly detection solution for industrial use cases with GCP tools, and BigQuery ML in particular. Although detecting anomalies based on the historical distribution of the target variable may not be perfectly accurate, it does surface abnormal cases. Further investigation of the patterns in which anomalies appear could yield a more robust solution and should be case-specific.
