Jet Engine Remaining Useful Life (RUL) Prediction

Ali Alhamaly
15 min read · Aug 28, 2019


High-level overview:

The main aim of this post is to document my implementation of a model that can be used to perform predictive maintenance on a commercial turbofan engine. The approach used here is data-driven, meaning that data collected from operational jet engines is used to build the predictive maintenance model.
To be specific, the project aims to build a predictive model that estimates the Remaining Useful Life (RUL) of a jet engine based on run-to-failure data from a fleet of similar jet engines. The algorithm documented here follows [1] closely.

Data-set overview

NASA created the Prognostics and Health Management (PHM08) Challenge Data Set and made it publicly available. The data set is used to predict the failure of jet engines over time, and it was provided by the Prognostics CoE at NASA Ames.

The data set includes time-series measurements of the various pressures, temperatures, and rotating-equipment speeds of the jet engine; these quantities are typically measured in a modern commercial turbofan engine. All engines are of the same type, but each engine starts with a different degree of initial wear and manufacturing variation, which is unknown to the user. There are three operational settings that can be used to change the performance of each engine. Each engine has 21 sensors collecting different measurements related to the engine state at runtime.
Six different flight conditions were simulated, comprising a range of values for three operational conditions: altitude (0–42K ft.), Mach number (0–0.84), and throttle resolver angle (TRA) (20–100).

See the schematic of the engine below, which shows the locations of the various sensor measurements.

The collected data is contaminated with sensor noise. Over time, each engine develops a fault that can be seen in the sensor readings. The data stops for each engine when a failure has occurred for that particular engine; hence the actual RUL is known from the length of the data.
The data is in fact simulated using C-MAPSS (Commercial Modular Aero-Propulsion System Simulation). The simulated data were used as challenge data for the first Prognostics and Health Management (PHM) data competition at PHM'08.

For more details on the data set, see "Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation."

Problem statement

The problem at hand is to come up with a machine learning model that predicts RUL from the time-series sensor measurements typically available from aircraft gas turbine engines.

Solution Strategy:

The main strategy is to use the data set to train a regression model that predicts RUL. Since the data takes the form of time trajectories of many sensors, these sensors will need to be fused into a condition indicator, or health index, that helps identify the approach of a failure.

In testing mode, the model compares how similar (correlated) the fused test signal is to the fused training signals. Based on this similarity comparison, a prediction is made.

Since the training data consists of run-to-failure trajectories, whereas the test data contains trajectories up to an unknown health state, the training process includes training the model on a portion of each trajectory before failure occurs, to simulate the real use of the model in an online prediction mode.

For a detailed strategy, please refer to [1] and [2].

Overview of the RUL estimation strategy. The remaining life of a test unit is estimated based on the actual life of a training unit that has the most similar degradation pattern [1]


Data Exploration:

The data used in the analysis here is part of the 'Simulation_Data', specifically the file 'train_FD001.txt'. This data set contains 100 run-to-failure engine simulations (corresponding to 100 different engines).
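
As a side note, here is a minimal sketch of how the raw text file can be loaded with pandas, assuming the standard C-MAPSS layout of 26 space-separated columns (unit id, cycle, three operational settings, 21 sensors); the op_cond_* and sn_* column names mirror this post's naming but are otherwise my own choice.

import pandas as pd

# Assumed layout: unit id, cycle, 3 operational settings, 21 sensors, no header.
cols = (['unit', 'cycle']
        + [f'op_cond_{i}' for i in range(1, 4)]
        + [f'sn_{i}' for i in range(1, 22)])

df = pd.read_csv('train_FD001.txt', sep=r'\s+', header=None)
df = df.dropna(axis=1, how='all')   # guard against trailing-space columns
df.columns = cols                   # requires exactly 26 data columns

print(df.shape)
print(df['unit'].nunique())         # 100 engines for FD001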

First, let's look at the correlation between the sensors.

At first glance, some sensors are highly correlated with each other. This holds both for all engines together and for single-engine data. It might cause issues during modeling, so I may need to delete or fuse some sensor data.

Also, it seems that some columns should be removed, since they appear all white in the correlation heatmap, which indicates non-changing values.
Let's look at the correlation more closely; one way to list the highly correlated pairs is sketched below.
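
A small sketch of this check with pandas; note that perfectly constant sensors have undefined (NaN) correlations in pandas and are simply skipped by this version, so the exact listing below may have been produced slightly differently.

# List pairs of sensor columns whose absolute Pearson correlation exceeds a threshold.
sensor_cols = [c for c in df.columns if c.startswith('sn_')]
corr = df[sensor_cols].corr().abs()

high_corr_pairs = []
for i, a in enumerate(sensor_cols):
    for b in sensor_cols[i + 1:]:
        if corr.loc[a, b] > 0.9:
            high_corr_pairs.append((a, b, corr.loc[a, b]))

for pair in high_corr_pairs:
    print(pair)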

('sn_1', 'sn_5', 1.0)
('sn_1', 'sn_10', 1.0)
('sn_1', 'sn_16', 1.0)
('sn_5', 'sn_10', 1.0)
('sn_5', 'sn_16', 1.0)
('sn_9', 'sn_14', 0.9631566003059564)
('sn_10', 'sn_16', 1.0)

The sensors above are perfectly correlated with each other, which means that I can remove some columns from the analysis.

Now let's look at the time-series plots and distributions of the various data columns.

From the distribution plots above, we can see that some sensors do not change at all, and hence they can be removed from the analysis.

Other main takeaways:
1. The distribution of almost all variables is a single, skewed Gaussian.
2. op_cond_2 and sn_17 appear to be discrete rather than continuous variables (these may need to be deleted to keep only continuously varying variables for modeling).
3. The observations above hold both when plotting all engines and when plotting a single engine, which means all engines are very similar in their output response.

columns to be removed from analysis since they do not change with time 
['op_cond_3', 'sn_1', 'sn_5', 'sn_6', 'sn_10', 'sn_16', 'sn_18', 'sn_19']
high correlation columns all engines:
('sn_9', 'sn_14', 0.9631566003059564)

After removing the columns that do not change with time, only sensors 9 and 14 have a correlation larger than 0.9.
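
One possible way to flag such non-changing columns is sketched below; the variance threshold is a judgment call, so it may not reproduce the exact list printed above.

# Drop near-constant columns (zero or near-zero variance across all engines).
feature_cols = [c for c in df.columns if c not in ('unit', 'cycle')]
near_constant = [c for c in feature_cols if df[c].std() < 1e-6]
df = df.drop(columns=near_constant)
print('dropped:', near_constant)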

Let's now take a look at the raw sensor data versus operational cycle for all engines, to get an idea of the time variation of the sensors and a glimpse of the engine-to-engine variability:

From the time-series plots of all engines, we can draw the following takeaways:

  1. Columns [op_cond_1, op_cond_2] do not show an apparent trend toward the end of the engine's life; they are just random noise. So, with high confidence, these two columns cannot help a predictive model that relies on trends in the series to extract information for estimating the RUL.
  2. Columns [sn_9, sn_14] show a trend that depends on the specific engine: some engines tend to increase in these two columns at the end of life, while others tend to decrease. What is common to both sensors is that their magnitude gets amplified near the end of life.
  3. It is good to see that all other columns show an apparent trend as the fault propagates through the engine cycles and eventually causes failure. (This helps the model that will try to use the data to predict the RUL.)
sensors remaining for analysis after considering trends in the time series plot 
['sn_2' 'sn_3' 'sn_4' 'sn_7' 'sn_8' 'sn_11' 'sn_12' 'sn_13' 'sn_15'
'sn_17' 'sn_20' 'sn_21']

Implementation:

Linear Trending:

What I want to check now is the trend of the remaining sensors as a function of operating cycles. A stronger trend likely means better predictability of events at the end of the engine's life.

The trend is found simply by fitting a linear model to each sensor. I then order the sensors by the absolute value of their linear slope. Of course, the sensor values are normalized before doing the linear regression. A rough sketch of this step follows.
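
Here is a rough sketch of this ranking step, pooling all engines and regressing each normalized sensor against the operating cycle; the exact procedure behind the listing below (for example, per-engine fits or fits against the RUL) may differ slightly.

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Rank the remaining sensors by the absolute slope of a straight-line fit of the
# normalized sensor value against the operating cycle.
remaining = ['sn_2', 'sn_3', 'sn_4', 'sn_7', 'sn_8', 'sn_11', 'sn_12',
             'sn_13', 'sn_15', 'sn_17', 'sn_20', 'sn_21']
scaled = StandardScaler().fit_transform(df[remaining])
t = df['cycle'].values.reshape(-1, 1).astype(float)

slopes = {col: abs(LinearRegression().fit(t, scaled[:, j]).coef_[0])
          for j, col in enumerate(remaining)}
print(sorted(slopes, key=slopes.get, reverse=True))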

the order of trend slope magnitude is 
['sn_11' 'sn_4' 'sn_12' 'sn_7' 'sn_15' 'sn_21' 'sn_20' 'sn_17' 'sn_2'
'sn_3' 'sn_13' 'sn_8']

Dimensionality reduction (PCA):

I want to reduce the dimensionality of the problem further, so let's check how much of the variability in our sensors is explained by each of the first few principal components (PCs).
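
A minimal sketch of this check with scikit-learn; repeating it on only the top-6 trending sensors gives the second set of numbers further below.

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Fraction of variance explained by the first few principal components of the scaled sensors.
X = StandardScaler().fit_transform(df[remaining])
pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_)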

first PC: 74%

second PC: 4.1%

third PC: 3.5%

Now, using a subset of the data that contains only the 6 sensors with the highest linear-trend slope, let's do PCA again:

first PC: 81.6%

second PC: 5%

third PC: 4.4%

Based on the linear-trend analysis, the top 6 sensors are chosen by the magnitude of their linear regression slope. With these 6 sensors, the first 3 principal components capture about 90% of the data variability, so the further reduction in dimensionality comes at a low loss of information.

Summary of data exploration and dimensionality reduction:

  1. The sensors that do not change with time (no variation with engine operational cycles) are dropped, since they do not offer any information toward predicting the end of life.
  2. The sensors that have no apparent trend (they look like pure noise, or show no trend toward the end of life) are dropped as well. This includes the sensors that behave differently for different engines, since these would confuse the learning algorithm and can cause large testing errors, given that their behavior is not universal across engines.
  3. Based on the linear regression of the remaining sensor data against the RUL, only the 6 sensors with the highest absolute slopes are kept. These sensors change predictably near the end of life of the engines.
  4. The dimensionality is further reduced by taking the first 3 principal components of the data.

The remaining 3 components of the data will be fused to make a Health Index (HI) as a function of RUL for each engine.

Fusing Sensors:

To create a fused health index (HI) sensor, we first need to train on the extreme data (data at the beginning of the engine's cycle life and at the end of it).

The beginning of life gets a value of 1, while the end of life gets a value of 0. The model then takes the sensor values and finds a fused signal that gives the health indication (HI).

I'll call an engine in perfect health one that has RUL_high = 300 or more cycles until failure. Zero health corresponds to the last RUL_low = 5 cycles of each engine's operation.
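
Here is a minimal sketch of this fusing step. It assumes the PCA-reduced columns are stored as pca_1..pca_3 and that an RUL column (total engine life minus current cycle) is available; those names are mine, not the post's.

import numpy as np
from sklearn.linear_model import LinearRegression

RUL_HIGH, RUL_LOW = 300, 5
pca_cols = ['pca_1', 'pca_2', 'pca_3']   # assumed names for the PCA outputs

healthy = df[df['RUL'] >= RUL_HIGH]      # label 1: near-perfect health
failing = df[df['RUL'] <= RUL_LOW]       # label 0: almost no life left

X_extreme = np.vstack([healthy[pca_cols].values, failing[pca_cols].values])
y_extreme = np.concatenate([np.ones(len(healthy)), np.zeros(len(failing))])

fuser = LinearRegression().fit(X_extreme, y_extreme)

# Fuse every cycle of every engine into a single health index.
df['HI'] = fuser.predict(df[pca_cols].values)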

The figure above shows the result of fusing all sensors into one signal, the Health Index (HI), for a representative engine. Note that the HI can be obtained with various fusing techniques; here I'm showing only fusion using linear and logistic regression models.

Notice how logistic regression gives values strictly between [0, 1], while the linear regression model does not guarantee that.

I decided to use the linear model to fuse the signal in the rest of the analysis.

Since the HI is noisy, a smoothing step is applied: a Savitzky-Golay (savgol) filter is used in the remaining analysis.
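
A minimal sketch using scipy's savgol_filter; the window length and polynomial order here are illustrative rather than the exact values used for the plots.

from scipy.signal import savgol_filter

# Smooth each engine's noisy HI. The window must be odd and shorter than the series.
df['HI_smooth'] = (
    df.groupby('unit')['HI']
      .transform(lambda s: savgol_filter(s, window_length=21, polyorder=2))
)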

Raw and smoothed HI

It is worth mentioning that there is engine-to-engine variability in the respective HIs. This is, of course, expected, since the engines vary in their original, unfused sensors. See the figure below.

HI for all engines

Fitting the Model:

The way we use the HI in the model is to fit a simple exponential model, HI = y = a[exp(b*t) - 1], to each engine, and then use this fitted HI to predict the RUL via similarity between the fitted HI curves and the raw HI of a new engine. The plot below shows the exponentially fitted HI curve for a representative engine, along with the raw HI and the filtered HI.
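
Below is a sketch of this per-engine fit with scipy's curve_fit, taking the quoted exponential form at face value; the initial guesses are arbitrary, and depending on how the HI is scaled an added intercept may fit better.

import numpy as np
from scipy.optimize import curve_fit

def hi_model(t, a, b):
    # Exponential degradation form quoted above: HI(t) = a * (exp(b * t) - 1)
    return a * (np.exp(b * t) - 1.0)

hi_library = {}    # unit -> fitted (a, b)
for unit, g in df.groupby('unit'):
    t = g['cycle'].values.astype(float)
    popt, _ = curve_fit(hi_model, t, g['HI_smooth'].values,
                        p0=(-0.001, 0.01), maxfev=20000)
    hi_library[unit] = popt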

Summary of the training model above:

The training part of the algorithm simply finds the health index (HI) of a given engine from its time-series sensors. The training outcome is an exponential model that gives the HI of an engine as a function of its operating cycles (equivalently, its remaining life, RUL). The steps can be summarized as follows:

  1. After removing the sensors that have low variance or no definite trend toward the end of life across all engines, we end up with sensor data x = (x_1, x_2, …, x_m), where m is the number of sensors (columns) remaining after data cleaning.
  2. Normalize x using StandardScaler to get x_norm.
  3. Find the linear trend (slope) of each x_i with respect to the RUL (engine operating cycles).
  4. From x_norm, take the subset of r sensors with the r highest absolute linear slopes; the new subset is x_slope.
  5. Perform PCA on x_slope to reduce the dimensionality to a space of n columns. This last sensor subset is called x_pca.

Using x_pca, a linear model is used to fuse the n sensors together to produce a one-dimensional sensor, y_fused.

The linear model is obtained by taking the subset of x_pca that corresponds to RUL larger than RUL_high or smaller than RUL_min. Sensor values with RUL larger than RUL_high get mapped to 1 (because the engine is in nearly perfect health), whereas sensor values with RUL smaller than RUL_min get mapped to 0, since the engine has almost no remaining life. Note that this subset is taken from all engines, so the fused model is a global model for the whole engine set.

y_fused = θ^T x_pca + θ_0, where θ and θ_0 are the coefficients and intercept of the linear regression model described above.

Now the sensor values of all engines at each engine cycle can be fused and converted to y_fused by applying the linear model.

Finally, the y_fused values obtained for each engine are modeled as an exponential curve, which gives a model of the health index HI as a function of engine cycles for every engine in the data set.

Predicting new engine RUL: the lookahead process

What comes next is the algorithm for predicting the RUL of a new engine based on sensor values covering fewer operating cycles than the full life cycle.

I'll mainly use an interpolation scheme to compare the new engine's y_fused with the library of health index models just obtained from training.

Predicting new engine RUL:

Inputs:

The prediction of a new engine's RUL takes as input:

1. a fused sensor whose number of observations equals the current life cycle of the new engine
2. the fitted exponential HI models of all training engines

Find/compare the similarity
The model then compares the observations of the new engine's fused sensor with the HI models (the exponential fitted curves) in the model library from the training stage. A minimal sketch of this matching step follows the list below.

1. Basically, the model computes the sum of squared differences (SSD) between the fused sensor and each fitted model in the library, and uses this SSD as the similarity measure between the fused sensor and each engine model in the library.
2. Since the new engine's fused sensor normally has a number of observations (M) smaller than the number of observations of a library engine (Ti), the model also shifts the fused sensor curve in time (by a cycle-gap parameter) to obtain several SSD values per engine model. The number of SSD values per training engine depends on the difference between the training and test observation counts (Ti - M).
3. The minimum SSD for each training engine, along with the corresponding estimated RUL for the new engine, is recorded.
4. The model now has N estimated RULs and N associated SSDs, which need to be fused to give the final RUL for the test engine. N here is the total number of engines in the training set.
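
Here is the promised sketch of the matching step. It assumes the hi_library of fitted (a, b) pairs from the training stage and a total_life mapping (unit to total cycles); both names are mine.

import numpy as np

def estimate_rul_candidates(y_test, hi_library, total_life):
    # Slide the test engine's fused signal (cycles 1..M) along each library engine's
    # fitted HI curve, keep the best-matching offset, and record the implied RUL.
    M = len(y_test)
    candidates = []                      # (unit, min SSD, estimated RUL)
    for unit, (a, b) in hi_library.items():
        T = total_life[unit]
        if T < M:
            continue                     # library engine shorter than the test history
        best_ssd, best_rul = np.inf, None
        for shift in range(T - M + 1):
            t = np.arange(shift + 1, shift + M + 1, dtype=float)
            model_hi = a * (np.exp(b * t) - 1.0)
            ssd = float(np.sum((np.asarray(y_test) - model_hi) ** 2))
            if ssd < best_ssd:
                best_ssd, best_rul = ssd, T - (shift + M)
        candidates.append((unit, best_ssd, best_rul))
    return candidates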

RUL fusion and final model output

The N estimated RULs are reduced to N_red RULs. The reduction happens by eliminating RULs predicted from engine models with high SSD (low similarity) and by simple outlier removal based on the statistics of the training set.

The N_red RULs are then fused into one RUL, which is the model's final output.

This is done by taking the median of the RULs (Method 1) or by using a weighted average of the N_red RULs. Three different weightings are used here (a sketch of the fusion step follows this list):

1. a weighted average based on SSD, with higher weights for lower SSD (Method 2)
2. a min-max weighting, where only the maximum and minimum RUL out of the N_red are combined, with a higher weight on the low RUL since it is more conservative and thus preferable (Method 3)
3. a simple arithmetic mean, where all weights are equal (Method 4)
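
A minimal sketch of the fusion step; the trimming rule (keep the most similar half) and the min/max weights (0.75/0.25) are my illustrative choices, not the exact values behind the results below.

import numpy as np

def fuse_rul(candidates, method='ssd_weighted'):
    # 'candidates' is the (unit, ssd, rul) list from the matching step.
    kept = sorted(candidates, key=lambda c: c[1])[: max(1, len(candidates) // 2)]
    ssd = np.array([c[1] for c in kept], dtype=float)
    rul = np.array([c[2] for c in kept], dtype=float)

    if method == 'median':                    # Method 1
        return float(np.median(rul))
    if method == 'ssd_weighted':              # Method 2: higher weight for lower SSD
        w = 1.0 / (ssd + 1e-9)
        return float(np.sum(w * rul) / np.sum(w))
    if method == 'min_max':                   # Method 3: conservative min/max blend
        return float(0.75 * rul.min() + 0.25 * rul.max())
    return float(rul.mean())                  # Method 4: plain arithmetic mean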

Model Evaluation and Validation:

For RUL estimation, an early prediction is preferred over a late one. Therefore, the scoring of the model is asymmetric around the true time of failure, such that late predictions are penalized more heavily than early predictions. In either case, the penalty grows exponentially with increasing error. The asymmetric preference is controlled by the parameters a1 and a2 in the scoring function given below.
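
For reference, the asymmetric score used in the PHM08 challenge penalizes the error d = estimated RUL - true RUL with a1 = 13 and a2 = 10; a minimal sketch:

import numpy as np

def phm08_score(rul_true, rul_pred, a1=13.0, a2=10.0):
    # Late predictions (d >= 0) are penalized more heavily than early ones (d < 0).
    d = np.asarray(rul_pred, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / a1) - 1.0, np.exp(d / a2) - 1.0)))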

It must be noted that predicting farther into the future is more difficult than predicting closer to the end of life. This means the model's accuracy is expected to be worse in cases where the actual engine health is still very high (the engine has a long time before the fault develops into a major failure).

The figures above show the penalty score (lower is better) versus the percentage of observations used out of the total data, for the four different methods. From them, we can make the following comments on the final model's performance on new engine data:

  1. Method 2 (fusing RULs with an SSD-weighted average) is the best in terms of penalty score. The method performs exceptionally well even with a low percentage of used data (i.e., when predicting further into the future).
  2. All methods are very similar when the percentage of used data is high, i.e., when the test engine's current operating cycles are close to its total life. This is expected, since predicting a few cycles into the future is easier than extrapolating many cycles into the future.
  3. Smoothing the test data before feeding it to the model has only a small effect; however, it gives a slightly better score for all methods and percentages.

Summary of the end-to-end problem solution:

Problem:

Given run-to-failure measurements of various sensors on a sample of similar jet engines, estimate the remaining useful life (RUL) of a new jet engine for which measurements of the same sensors are available over a period equal to its current operational time.

Solution:

  1. For each run-to-failure record, take the sensors that have a predictable, strong trend toward the end-of-life cycles.
  2. Using these sensors, build a model that fuses them into a virtual one-dimensional health index (HI) that varies from roughly 1 at the beginning of the cycles down to 0 near the end of life.
  3. Model and store the HI of each available run-to-failure engine, and call these HI curves the model library.
  4. Using the new engine's measurements, fuse them to make a fused HI.
  5. Compare this HI with the HI models that exist in the model library.
  6. Find the RUL of the new engine based on the RULs of the library engines.

Improvement:

Code structure improvement

  1. Create a transform pipeline that takes the input sensors and outputs the fused HI sensor.
  2. Create an estimator that inherits from the sklearn estimator interface, combining the fitting and prediction steps into one estimator for ease of use and for proper cross-validation.
  3. Make the sensor selection an automatic process based on conditions internal to the data.

Algorithm improvement

  1. Use a model other than linear regression to fuse the sensors.
  2. Instead of hand-crafting a one-dimensional health index, use models (for example, neural networks) trained directly to predict the RUL from the input sensor measurements. This option seems promising and should give better accuracy; it requires more computational resources, but it eliminates the feature engineering process and could generalize better.
  3. Use cross-validation to select the various model parameters instead of heuristics. This improvement requires creating a custom estimator and a custom transformer first.

References:

[1] T. Wang, J. Yu, D. Siegel, J. Lee, “A similarity-based prognostics approach for remaining useful life estimation of engineered systems”, Proc. Int. Conf. Prognostics Health Manage., pp. 1–6, Oct. 2008

[2] Chao Hu, Byeng D. Youn, Pingfeng Wang, Joung Taek Yoon,
“Ensemble of data-driven prognostic algorithms for robust prediction of remaining useful life”, Reliability Engineering & System Safety, Volume 103, 2012, Pages 120–135, ISSN 0951–8320, https://doi.org/10.1016/j.ress.2012.03.008.
