How We Adapted Our Models to Better Fit the Business Needs

Oren
Machines talk, we tech.
7 min read · Jan 17, 2022

At Augury we focus on bringing value to our customers by training predictive models that support smarter and more efficient maintenance of their machines. In this series of posts, we will review our recent process of adapting the way we build, train and evaluate models to meet business needs. We see this as a good example of business-driven algo development.

First, we’ll go over some background on the problem we’re trying to solve at Augury, then focus on the approach we had taken until recently. In the next post we’ll elaborate on the recent alterations, the motivation behind them, and the results.

Learning from Vibration Data at Augury

Augury’s Halo endpoint contains multiple sensor types, such as vibration, temperature and magnetic-field sensors. After the vibration signal from our IoT devices has been transformed into an analog electrical signal, we convert it into digital data, preprocess it and pass it to our algorithms. We use it to train AI models to detect physical problems, such as component imbalance, bearing failures, electrical faults, and more.

Models Overview

In the preprocessing step, the received data is transformed into a set of predefined features. These are domain-related and specifically designed to represent certain physical phenomena that have been identified as indicative of at least one fault (or that help differentiate between faults).

Here are some examples of features:
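The exact feature set isn’t listed here, but for illustration, below is a minimal Python sketch of the kind of classic vibration features such a preprocessing step might compute (RMS, crest factor, kurtosis, and energy around a fault frequency). The function, its parameters and the chosen features are hypothetical and not Augury’s actual feature set.

```python
# Illustrative only: a few classic vibration features computed from a raw
# time-domain recording. Parameters and feature choices are hypothetical.
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis

def extract_features(signal: np.ndarray, fs: float, fault_freq_hz: float) -> dict:
    """Compute simple statistical and spectral features from a vibration signal."""
    rms = np.sqrt(np.mean(signal ** 2))        # overall vibration energy
    crest = np.max(np.abs(signal)) / rms       # impulsiveness (e.g. bearing impacts)
    kurt = kurtosis(signal)                    # heavy tails, an early bearing-wear cue

    # Energy in a narrow band around a (hypothetical) fault frequency
    freqs, psd = welch(signal, fs=fs, nperseg=min(4096, len(signal)))
    band = (freqs >= 0.95 * fault_freq_hz) & (freqs <= 1.05 * fault_freq_hz)
    fault_band_energy = float(np.trapz(psd[band], freqs[band]))

    return {
        "rms": float(rms),
        "crest_factor": float(crest),
        "kurtosis": float(kurt),
        "fault_band_energy": fault_band_energy,
    }
```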

As for algorithms, we have two main paths: Anomaly detection and Fault detection.

Anomaly detection algorithms are a family of algorithms concerned with identifying significant overall changes in a machine’s health data over time. They are optimized for recall, as we do not tolerate misses.

Fault detection algorithms are a family of algorithms concerned with determining whether specific health conditions are present or absent on a machine at any given time. They are optimized for precision, as they provide an insight layer that should be accurate and reliable.
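To make “optimized for recall” versus “optimized for precision” concrete, here is a rough sketch of how operating thresholds could be chosen for the two families from validation scores. This is only an illustration of the idea (not our actual tuning code), and it assumes y_true/y_score arrays from a validation set.

```python
# Sketch: choosing operating thresholds differently for the two model families,
# given validation labels and scores (hypothetical inputs).
from sklearn.metrics import precision_recall_curve

def threshold_for_recall_floor(y_true, y_score, min_recall=0.98):
    """Anomaly-detection style: highest threshold that still meets the recall floor."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    ok = recall[:-1] >= min_recall            # recall[:-1] aligns with thresholds
    return thresholds[ok].max() if ok.any() else thresholds.min()

def threshold_for_precision_floor(y_true, y_score, min_precision=0.95):
    """Fault-detection style: lowest threshold whose precision meets the floor."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    ok = precision[:-1] >= min_precision
    return thresholds[ok].min() if ok.any() else thresholds.max()
```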

When a model alerts but the algorithms aren’t confident enough to generate an automatic alert, we send the alert for review by an Augury expert, who gives a second opinion and labels it so we can inform the customer of the findings. In any case, expert feedback provides the label used in our dataset to train future models.

Let’s focus on the fault detectors

As noted, fault detectors are our way of providing accurate, specificity-driven insights to customers, and they fall under the responsibility of a designated squad. This squad is driven by the general goal of providing accurate insights that support faster analysis by the vibration experts or the customer, and that promote complete automation (which facilitates more scalable growth in the number of machines covered). Since the number of covered machines is growing rapidly, we aim to provide fully automatic, high-quality coverage to as many machines as possible, to avoid bottlenecks in expert reviews.

Let’s review the data and its unique characteristics, including some of the labeling pitfalls. Then we’ll outline our existing framework, explain why we decided it is not the optimal way to meet the squad’s business goals, and present our proposed alternative solution.

Our Data

As explained, our features are a set of discrete values extracted from the signal.

Our endpoint continually records data (once every hour, for 4 seconds). Each recording goes to the sensitivity-driven anomaly detector.

When an anomaly is triggered, the alert is enriched with any additional context we can generate, such as specific fault confidences, and delivered to our users. In cases where confidence is not high enough, we can turn to our vibration experts for a second opinion and further interaction with the customers.

Some of the key challenges we face are related to data and labeling. The way we account for them defines the strengths and weaknesses of our solution from a business and algo perspective. Our aim is to define clear business goals and propagate this view into an overall algorithmic solution, all the way from the general approach to metric optimization.

Some of the main challenges are:

Label Inconsistency
As the vibration experts are exposed to more evidence over the lifetime of the machine, or following interaction with the customers, they may change their initial interpretation. Moreover, as the problem does not necessarily have a single clear answer, we might see variations in the distribution of labeling between different experts, where some might have slightly different biases or personal guidelines.

Imbalance
There can be severe imbalance, as only 1%–20% of the data is positive, depending on the specific fault. This fraction decreases significantly as the severity level rises (normally 50–70% of the positives are in the lowest of the 3 severity levels). Additionally, as we only have an expert review for sessions that have passed the anomaly detector, the training data doesn’t necessarily represent the real-world distribution of the data.

Multi Labels
Sessions tend to include more than one fault with various degrees of typicality for the present faults, and there might be interaction or interference between faults. In addition to co-occurrence of faults, there can be a causal relationship between faults (e.g. unbalance leading to bearing wear) that is very hard to model and account for.

Customer-facing Labels
As our primary goal is to assist the customer, annotators constantly interact with them. In uncertain and non-urgent cases, they might choose to first alert on the fault that is easier or cheaper for the customer to validate, even if they suspect there might be additional conditions.

Rigid and Limited Labels
When we ask the annotators to choose a severity level (as it is important to our customers), we force them to assign a discrete label about a phenomenon that’s more likely continuous, which may cause the labeling to be more sensitive to personal bias.

Current Approach

In order to enhance a specific session with fault-specific insights, we use what we denote as detectors. Each detector is a microservice that wraps a predictive (normally Deep-Learning) model with the required preprocessing & post-processing to generate insight and alerts. It analyzes a machine’s condition from a specific fault (or fault family) perspective and yields fault-specific output, such as “The motor bearing has Bearing Wear”.
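As a rough sketch of this pattern (class and field names here are hypothetical, not our actual service code), a detector chains preprocessing, the wrapped model and post-processing into a single fault-specific insight:

```python
# Hypothetical sketch of a detector: a thin wrapper around a predictive model,
# with the preprocessing and post-processing needed to emit a fault-specific insight.
from dataclasses import dataclass

@dataclass
class FaultInsight:
    fault: str          # e.g. "bearing_wear"
    component: str      # e.g. "motor"
    confidence: float
    severity: str       # e.g. "alarm"

class BearingWearDetector:
    def __init__(self, model, alert_threshold=0.8):
        self.model = model                  # the wrapped (e.g. deep-learning) model
        self.alert_threshold = alert_threshold

    def preprocess(self, session):
        # Placeholder: turn the raw session into the model's input features.
        return session["features"]

    def postprocess(self, score, session):
        severity = "danger" if score > 0.95 else "alarm" if score > self.alert_threshold else "healthy"
        return FaultInsight("bearing_wear", session["component"], score, severity)

    def analyze(self, session):
        score = float(self.model.predict(self.preprocess(session)))
        return self.postprocess(score, session)
```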

Each detector is responsible for managing all the events related to its specific fault throughout the machine’s lifecycle: the affected component (e.g. the motor), the timeframe of the fault, its severity, the confidence level and any other possible information layer.

As noted, each detector is responsible for a specific fault (e.g. bearing wear, misalignment) or a fault family (e.g. electrical faults).

Insights provided by a detector can be beneficial in two ways:

1. Facilitate and improve the process of reviewing a session by an expert

2. Support scale by allowing automation (skipping expert review) in cases of severe conditions and/or high confidence

Currently, the main usage of the detectors is (1): providing the expert with additional information that should help them make quicker and more accurate decisions.

However, the main problem with the current approach is that while the number of machines we cover increases exponentially over time (we just raised $180M!), the number of machines an expert can support increases much more slowly. We must therefore introduce a fully automated flow for as many machines as possible. A prerequisite for automation is severity estimation, or at least identification of the highly severe cases (danger, i.e. two weeks to expected failure).

Hence, as a first iteration of a model that supports automation, we decided to build a severity estimator, with the possibility of reducing its scope to detecting only the most severe cases. We have 4 levels of severity in our system (Healthy, Monitor, Alarm, Danger), so this is basically a multiclass ordinal regression problem, with the option of transforming it into a binary danger detector.
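In code terms, the framing might look roughly like this sketch: the four ordered levels are encoded as cumulative binary targets for ordinal training, and can also be collapsed into a single binary “danger” target. The level names follow our system; everything else is illustrative.

```python
# Sketch: encoding the 4 ordered severity levels for ordinal training, with the
# option of reducing the task to a binary danger detector. Illustrative only.
import numpy as np

LEVELS = ["healthy", "monitor", "alarm", "danger"]
RANK = {name: i for i, name in enumerate(LEVELS)}   # ordered 0..3

def ordinal_targets(labels):
    """Encode each label as 3 cumulative binary targets: y>healthy, y>monitor, y>alarm."""
    ranks = np.array([RANK[l] for l in labels])
    return np.stack([(ranks > k).astype(int) for k in range(len(LEVELS) - 1)], axis=1)

def danger_targets(labels):
    """Reduced scope: detect only the most severe level."""
    return np.array([int(l == "danger") for l in labels])

# Example: ["monitor", "danger"] -> ordinal [[1, 0, 0], [1, 1, 1]], danger [0, 1]
```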

We tried various common methods to train such a model: ordinal regression, transfer learning from a binary model, bagging of designated binary sub-models (danger vs. all, alarm vs. all, danger+alarm vs. all, etc.), level-specific feature engineering/selection, and more.
While we reached seemingly adequate results from a purely metric point of view, when we translated them into the business needs and the properties of the actual data, we failed to reach satisfactory results.
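As an aside on how such binary sub-models can be used: assuming cumulative sub-models of the form P(severity > level), one common way to turn their outputs into per-level probabilities is to take adjacent differences. The sketch below illustrates that idea only; it is not our actual pipeline.

```python
# Sketch: combining cumulative binary sub-model outputs P(severity > level)
# into per-level probabilities via adjacent differences. Illustrative only.
import numpy as np

def decode_severity(p_greater: np.ndarray) -> np.ndarray:
    """p_greater: (n_samples, 3) columns = P(y>healthy), P(y>monitor), P(y>alarm).
    Returns (n_samples, 4) with P(y == healthy/monitor/alarm/danger)."""
    n = p_greater.shape[0]
    cdf = np.hstack([np.ones((n, 1)), p_greater, np.zeros((n, 1))])
    probs = cdf[:, :-1] - cdf[:, 1:]          # P(y == k) = P(y > k-1) - P(y > k)
    return np.clip(probs, 0.0, 1.0)           # guard against non-monotone sub-models

# Example: [0.9, 0.6, 0.1] -> [0.1, 0.3, 0.5, 0.1] (healthy, monitor, alarm, danger)
```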

For example, transfer learning from a binary model yielded this TPR/FPR curve:

BUT we need to remember that when it comes to automatic alerts, every False Positive is highly costly and we can’t afford these cases.
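In practice, that constraint translates into picking the automation operating point at a very strict false-positive budget and accepting whatever recall is left. A sketch of that selection (the budget value is illustrative, and y_true/y_score are assumed validation data):

```python
# Sketch: selecting an operating point on the ROC curve under a strict
# false-positive-rate budget. The budget value is illustrative.
import numpy as np
from sklearn.metrics import roc_curve

def operating_point_at_fpr_budget(y_true, y_score, max_fpr=0.01):
    """Return (threshold, achievable TPR) at the best point within the FPR budget."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    ok = fpr <= max_fpr                      # always non-empty: fpr starts at 0
    best = np.argmax(tpr[ok])                # best recall within the budget
    return thresholds[ok][best], tpr[ok][best]
```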

This realization is compounded by the fact that the aforementioned challenges become even more problematic when detecting high severity or building (ordinal) multiclass classifiers. For instance, a machine whose fault severity changes from high to low (but is still faulty) due to a customer’s actions might not affect binary detection at all, but it does hurt us when trying to learn severity.
Also, the imbalance is much more extreme when estimating a severity level than when detecting the existence of a certain fault. Therefore, inaccurate labels have a much larger impact on the task of predicting severity than on binary fault detection, as each prediction shifts the PR curve to a much greater extent than in our standard binary detectors.

Taking all of that together, we realized that when building models whose eventual goal is to support automation, we need to take a different approach than when building models that aim to support expert labeling, as the challenges are amplified and the requirements differ.

These issues could not be mitigated by common practices, such as custom metrics or weighting, given the current data and approach.

In the next post we’ll outline the alternative solution for severity estimation, along with results and a comparison.
