Machine Learning for Predictive Maintenance

Roshan Alwis
Published in Tech Vision
Nov 17, 2016

We live in a world that links our lives with technology. We spend most of our time with machines, directly or indirectly: maybe it is your watch, or the elevator you take to your floor. It is inconvenient when they break down. If you are stuck in a broken elevator, it will test your temper and waste your precious time.

Predictive maintenance tries to predict failures and carry out preventive operations to improve the availability of systems. Some breakdowns can be costly, and predictive maintenance, if done properly, can lead to major cost savings and increased availability.

These savings come in two forms.

  1. First, by avoiding downtime we eliminate its cost, which is sometimes lost money and sometimes unhappy customers.

  2. Second, it lets us optimize periodic maintenance operations.

Clear the Way to an Implementation

Identify the problem domain

Implementing such a system is not trivial. It requires a serious study of the event history to identify which events are relevant to the context. It is better to model the scenario using the minimum number of parameters, because increasing the number of parameters also increases the complexity of the model. So unnecessary events should be removed. For example, the effect of wind direction on a car breakdown may be irrelevant or insignificant in this context. But there can be some connections that we cannot ignore.

Early predictions and Late predictions

Maintenance needs to be predicted early, but not too early [Early predictions]. Equally, predicting maintenance after a failure has already occurred is worthless [Late predictions]. A late prediction is the worst thing that could happen. So, when implementing such a system, it is advisable to avoid late predictions and to minimize early predictions.
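
One way to make the early/late distinction concrete is to count both kinds of error and to charge late predictions more than early ones. The sketch below is only an illustration: the function names, the sample numbers and the weights (late errors cost five times as much as early ones) are assumptions made for this example, not values from the experiment.

```python
def classify_errors(predicted, actual):
    """Count early and late predictions.

    Both arguments hold the remaining time to failure (for example, in
    operating cycles). A prediction is "early" when the model expects the
    failure sooner than it really happens, and "late" when it expects the
    failure later than it really happens.
    """
    early = sum(1 for p, a in zip(predicted, actual) if p < a)
    late = sum(1 for p, a in zip(predicted, actual) if p > a)
    return early, late


def asymmetric_cost(predicted, actual, early_weight=1.0, late_weight=5.0):
    """Mean absolute error in which late predictions are weighted more.

    The weights are hypothetical. In practice they would reflect the cost
    of unnecessary maintenance versus the cost of an unplanned failure.
    """
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = p - a
        weight = late_weight if diff > 0 else early_weight
        total += weight * abs(diff)
    return total / len(actual)


# Made-up values, in cycles until failure.
predicted = [40, 55, 20, 90]
actual = [50, 50, 20, 80]

print(classify_errors(predicted, actual))   # (early count, late count)
print(asymmetric_cost(predicted, actual))
```

A symmetric metric would treat a prediction that is 10 cycles early and one that is 10 cycles late as equally bad. A weighted cost like this one is a simple way to steer a model away from late predictions.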

Turbofan Engine Degradation Use case

NASA has used fixed control parameter schedules to control a nominal production engine. But deterioration of engine components causes off-nominal operation, which results in a loss of performance at run time. So a fixed control schedule is not optimal for an engine that deteriorates over time.

Data set

The data set was published by NASA for predicting the remaining useful life (RUL) of turbofan engines from their sensor readings. It consists of several multivariate time series, and each data set is further divided into training, test and ground truth data. All the engines are of the same type, but each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. There are three operational settings that have an effect on the performance of each engine, and 21 sensors collect different measurements related to the engine state at run time. The data are also contaminated with sensor noise.

The engine operates normally at the beginning of each time series and develops a fault at some point during the series. In the training data, the fault grows in magnitude until system failure. In the test data, the time series ends some time prior to the failure. The objective of this experiment is to predict the number of operational cycles remaining after the last recorded cycle, that is, how long the engine will continue to operate.
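
To train a regression model on such data, each row needs a target value. A common approach, sketched below on a tiny synthetic table, is to derive the RUL of every cycle from the last recorded cycle of the same engine. The column names (`unit`, `cycle`, `sensor_1`) and the values are assumptions made for this sketch, and the calculation assumes that each training series runs all the way to failure.

```python
import pandas as pd

# Tiny synthetic stand-in for the training data: two engines, one row per
# cycle. Column names and values are made up for this sketch.
train = pd.DataFrame({
    "unit":     [1, 1, 1, 1, 2, 2, 2],
    "cycle":    [1, 2, 3, 4, 1, 2, 3],
    "sensor_1": [0.10, 0.12, 0.18, 0.30, 0.11, 0.20, 0.35],
})

# Assuming every training series ends at failure, the last recorded cycle
# of a unit is its failure point, so RUL = last cycle - current cycle.
last_cycle = train.groupby("unit")["cycle"].transform("max")
train["rul"] = last_cycle - train["cycle"]

print(train)
```

Each engine's RUL counts down to zero at its final row, which gives the model one labelled example per cycle.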

Flow

Phase 1: Define the Evaluation Criteria

Since this is a regression problem, the evaluation method needs to suit that context. The purpose of evaluation is to measure the difference between the actual value and the predicted value. Root Mean Squared Error (RMSE) is a good choice because it penalizes large errors severely.

After a model is trained, it is applied to the test data set to predict values. The root mean squared error can then be calculated from the predicted values and the ground truth data.
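
The effect of squaring can be seen in a short sketch. The RUL values below are made up for illustration. Both sets of predictions are off by 8 cycles in total, but one spreads the error evenly and the other concentrates it in a single engine.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up ground-truth RUL values for four engines.
actual = [112, 98, 69, 82]

spread_out = [110, 100, 71, 80]   # four errors of 2 cycles each
one_big = [112, 98, 69, 74]       # a single error of 8 cycles

print(rmse(actual, spread_out))
print(rmse(actual, one_big))
```

The second score is twice the first even though the total absolute error is identical, which is the sense in which RMSE penalizes large errors more heavily.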

Phase 2: Identifying the Present State

It is important to know the present state of the experiment before going into detailed testing. So, without any feature engineering, the original data set was run through several different machine learning algorithms. The results are given below.

Phase 3: Use Auto-encoders to Remove Noises

The problem description states that the sensor readings are contaminated with noise, so the noise needs to be reduced before proceeding to the next step. An auto-encoder can be used for this: it measures the reconstruction error of the readings, and based on that error it is possible to remove the noise.
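
A minimal sketch of this idea follows. It is not the setup used in the experiment: it builds a very small auto-encoder from scikit-learn's `MLPRegressor` by training the network to reproduce its own input, and it runs on synthetic data. The network size, the 5% cut-off and all variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for sensor data: 5 correlated channels driven by
# 2 hidden factors, plus small Gaussian noise. This is not the NASA data.
factors = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
readings = factors @ mixing + 0.05 * rng.normal(size=(500, 5))

scaled = StandardScaler().fit_transform(readings)

# An auto-encoder is trained to reproduce its own input through a narrow
# hidden layer (2 units here), so it can only keep the dominant structure.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(scaled, scaled)

# The reconstruction is a smoothed estimate of each row; the per-row mean
# squared difference is the reconstruction error.
reconstructed = autoencoder.predict(scaled)
errors = np.mean((scaled - reconstructed) ** 2, axis=1)

# Hypothetical rule: treat the 5% of rows with the largest error as noisy.
threshold = np.quantile(errors, 0.95)
clean = scaled[errors <= threshold]

print(clean.shape, round(float(threshold), 4))
```

Two uses are possible from here: drop the rows with a high reconstruction error, as above, or replace the raw readings with `reconstructed` as a denoised version. Which one suits the data better is a judgment call.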

Phase 3: Selecting Features

Use feature importance or recursive feature elimination to select the features that contribute most to the final output. Selecting all the features might not be the optimal approach, since it can increase the complexity of the model.

Note: Features should be selected according to the algorithm that is going to be used.
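
Both approaches can be sketched with scikit-learn. The example below uses synthetic data in which only the first two of six features influence the target. The random forest model, the feature names and the choice to keep two features are assumptions made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Synthetic data: 6 candidate features, only the first two drive the target.
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)
names = [f"sensor_{i + 1}" for i in range(6)]

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Feature importance: fit once and rank the features by the model's scores.
model.fit(X, y)
ranked = sorted(zip(names, model.feature_importances_), key=lambda t: -t[1])

# Recursive feature elimination: repeatedly refit and drop the weakest
# feature until only n_features_to_select remain.
selector = RFE(model, n_features_to_select=2).fit(X, y)
selected = [name for name, keep in zip(names, selector.support_) if keep]

print(ranked[:2])
print(selected)
```

Because the ranking comes from the fitted model, a different algorithm can rank the same features differently, which is why the selection should be done with the algorithm that will be used in the end.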

Phase 4: Feature Engineering

In this phase, the training data set is extended with new features (columns) generated from the existing data. For example, the moving average, moving standard deviation and auto-correlation of a sensor reading can be added as new features to the data set.

This helps machine learning algorithms identify the patterns in the data set.
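
As a rough sketch, the moving average and moving standard deviation can be added with pandas rolling windows. The table, the column names and the window length of 3 cycles are assumptions made for this example.

```python
import pandas as pd

# Tiny synthetic table: two engines, four cycles each.
df = pd.DataFrame({
    "unit":     [1, 1, 1, 1, 2, 2, 2, 2],
    "cycle":    [1, 2, 3, 4, 1, 2, 3, 4],
    "sensor_1": [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 13.0, 16.0],
})

window = 3  # hypothetical window length, in cycles

# Compute the rolling statistics per engine so that one engine's readings
# never leak into another engine's window.
grouped = df.groupby("unit")["sensor_1"]
df["sensor_1_avg"] = grouped.transform(
    lambda s: s.rolling(window, min_periods=1).mean()
)
df["sensor_1_std"] = grouped.transform(
    lambda s: s.rolling(window, min_periods=1).std()
)

# The first cycle of each engine has a single value, so its standard
# deviation is undefined (NaN); this sketch fills it with 0.
df["sensor_1_std"] = df["sensor_1_std"].fillna(0.0)

print(df)
```

Smoothed columns like these make a slow drift in a noisy sensor easier for a model to pick up than the raw reading alone.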

Phase 5: Optimizing Hyper-Parameters Using Grid Search

Just as the data set is tuned, the model also needs to be tuned so that it can identify the existing patterns in the data set. A grid search can identify the best hyper-parameters according to the evaluation metric.

Each model normally comes with default hyper-parameter values, which might not be the optimal configuration for the given context.
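
A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`, scored by RMSE to match the evaluation criterion. The model, the parameter grid and the synthetic data below are assumptions chosen to keep the example small. They are not the configuration used in the experiment.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic regression data standing in for the engineered feature table.
X = rng.normal(size=(200, 4))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] ** 2 + rng.normal(size=200)

# Hypothetical grid: every combination (2 x 3 = 6) is trained and scored.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [2, 5, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(-search.best_score_)  # flip the sign back to get the RMSE
```

With per-engine time series, it is usually safer to keep all rows of one engine in the same fold (for example with `GroupKFold`), so that the score is not inflated by neighbouring cycles of the same engine appearing in both the training and validation splits.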

Discussion

The prime objective of predictive maintenance is to predict when an equipment failure might occur, and then to prevent that failure by taking relevant action. A Predictive Maintenance System (PMS) monitors for future failures and schedules maintenance in advance. It also reduces the frequency of maintenance.

This brings several cost savings.

  • Minimize the time that a piece of equipment spends under maintenance, leaving more time for production.
  • Minimize the cost of maintenance.

A PMS fits best where failures can be prevented cost-effectively and where systems perform critical operational functions.

Pros.

Ideally, a PMS ensures that the system is stopped only shortly before an imminent failure. This increases the up-time of the system by reducing the total time and cost spent on maintaining components.

Cons.

Unlike preventive maintenance, predictive maintenance requires a sophisticated system to monitor the equipment. Accurately interpreting the collected data to decide when to perform maintenance also requires an additional skill set.

Since failure data are costly to collect, it is hard to identify the patterns of failures.

Roshan Alwis

Software Engineer at Sysco Labs. (Computer Science & Engineering Graduand at University of Moratuwa)