Prediction of Remaining Useful Life of an Engine Based on Sensors; Building A Random Forest in Python

Roi Polanitzer
18 min read · Dec 25, 2021


Photo Credit: https://www.researchgate.net/

“The Remaining Useful Life (RUL) is a subjective estimate of the number of remaining years that an item, component, or system is estimated to be able to function in accordance with its intended purpose before warranting replacement.”

Preamble

Can one predict when an engine or device will break down? This seems like a purely engineering question, but nowadays it is also a data science question.

More concretely, it is an important question wherever engines and devices use data to guide maintenance, such as wind-turbine engines and rotating machinery.

With regard to human safety and logistic planning, it is not a good idea to just wait until an engine breaks down. It is essential to plan maintenance proactively to avoid costly downtime.

Proactive planning is enabled by multiple sensor technologies that provide powerful data about the state of machines and devices. Sensors capture data such as temperature, fan and core speed, static pressure, etc.

Can we use these data to predict, within certain margins, how long an engine will keep functioning without failure? And if so, how do we do it?

This is the question the concept of remaining useful life (RUL) attempts to answer. It aims to estimate the remaining time an item, component, or system is able to function in accordance with its intended purpose before warranting replacement.

1. Business Understanding

This article shows how to use machine learning with Python's scikit-learn to predict the RUL. It is meant as an example case study, not an exhaustive or ultimate solution. Real data to answer this question are scarce; however, data simulations have been made and provide a unique resource.

I will use train data that contain sensor-based time-series up to the time point at which the engine breaks down.

In contrast, the test data consist of sensor-based time-series that end a "random" time before the breakdown point.

The key aim is to estimate the RUL of the test set, that is, how much time is left after the last recorded time point.

import os
import numpy as np
np.random.seed(1337)
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from IPython.display import Image
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline

The following settings will be used to avoid exponential values in output or tables and to display 50 rows maximum:

pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.options.display.max_rows = 50

A simplified line plot best illustrates the question at hand. Given a fictive temperature sensor, for which we have 8 cycles, are we able to predict the remaining cycles?

The kind of questions that can be addressed for such a time-series:

  • Can we forecast the temperature of the future time-points?
  • Can we predict how long the time-series continues until a pre-defined event happens?

A = [60, 63, 67, 74, 77, 81, 82, 87, 92]
B = [92, 99, 104, 110, 116, 125]
C = np.append(np.repeat(np.nan, len(A)-1), B)

plt.figure(figsize=(16, 8))
plt.plot(A, color='red', linewidth=3)
plt.plot(C, 'r:', linewidth=3)
plt.axvline(x=len(A)-1, color='grey', linestyle=':', linewidth=4)
plt.axvline(x=len(C)-1, color='grey', linestyle=':', linewidth=4)
plt.title('Example temperature sensor', fontsize=16)
plt.xlabel('# Cycles', fontsize=16)
plt.ylabel('Degrees', fontsize=16)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.show()

2. Data Understanding

At this phase, we’ll do some exploratory analyses and visualizations as a starting point in order to get insights into the characteristics of the data.

2.1. Data Overview

Let’s read the data using Pandas read_csv. I will load a train set and test set, as well as a RUL train set.

train = pd.read_csv('RUL_train.csv')
test = pd.read_csv('RUL_test.csv')
RUL = pd.read_csv('RUL_target.csv')

Let’s inspect the train set

print(train.shape)
print(list(train.columns))

(20631, 26)
['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']

train.isna().sum()

unit 0
cycles 0
op_setting1 0
op_setting2 0
op_setting3 0
s1 0
s2 0
s3 0
s4 0
s5 0
s6 0
s7 0
s8 0
s9 0
s10 0
s11 0
s12 0
s13 0
s14 0
s15 0
s16 0
s17 0
s18 0
s19 0
s20 0
s21 0
dtype: int64

train.head()
train.tail()

Let’s inspect the test set

print(test.shape)
print(list(test.columns))

(13096, 26)
['unit', 'cycles', 'op_setting1', 'op_setting2', 'op_setting3', 's1', 's2', 's3', 's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14', 's15', 's16', 's17', 's18', 's19', 's20', 's21']

test.isna().sum()

unit 0
cycles 0
op_setting1 0
op_setting2 0
op_setting3 0
s1 0
s2 0
s3 0
s4 0
s5 0
s6 0
s7 0
s8 0
s9 0
s10 0
s11 0
s12 0
s13 0
s14 0
s15 0
s16 0
s17 0
s18 0
s19 0
s20 0
s21 0
dtype: int64

test.head()
test.tail()

RUL contains the true values for remaining useful life, to which we will compare the values we predict for the test set.

Let’s inspect the RUL set

print(RUL.shape)
print(list(RUL.columns))

(100, 1)
['RUL']

RUL.isna().sum()

RUL 0
dtype: int64

RUL.head()

In short, this means that the remaining useful life of the first unit is 112 cycles, of the second unit 98 cycles, and so on.

RUL.tail()
# sns.distplot is deprecated in recent seaborn; sns.histplot(RUL['RUL'], kde=True) is the modern equivalent
sns.distplot(RUL['RUL'], kde=True)
RUL['RUL'].describe()

2.2. Checking for Outliers and Flat Lines

To know if there are outliers (extreme values) in the data, we could use descriptive statistics, train.describe().T, and see if the min. and max. values are far away from the central tendency. As we can see below, this is not the case for any of the sensors.

train.describe().T

However, if we look carefully, we can see something else that is quite remarkable: there are several sensors where the min. and max. values are identical, and where the standard deviation (std) is zero. In time-series, this is called a flat line, which means there is no activity, possibly caused by sensor malfunctioning.

test.describe().T

The sensors s1, s5, s10, s16, s18, and s19, as well as op_setting3, will therefore be removed from further analyses:

train.drop(['s1', 's5', 's10', 's16', 's18', 's19', 'op_setting3'], axis=1, inplace=True)
test.drop(['s1', 's5', 's10', 's16', 's18', 's19', 'op_setting3'], axis=1, inplace=True)
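
Rather than hard-coding these column names, one could detect flat lines programmatically; a minimal sketch (run on the raw data, before the drop above):

# Columns whose standard deviation is zero carry no signal; on the raw
# train set this should recover exactly the columns dropped above
# (with noisier sensors one might test std < 1e-8 instead)
flat = [c for c in train.columns if train[c].std() == 0]
print(flat)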

Let’s check out the distribution of the columns in the train set

train.hist(bins=50, figsize=(18,16))
plt.show()

Let’s check out the distribution of the columns in the test set

test.hist(bins=50, figsize=(18,16), color="red")
plt.show()

2.3. Exploratory Analyses of the Maximum Number of Cycles per Unit

Exploratory data analyses provide insight into the engines in action. For example, it would be good to have an idea of the maximum lifetime of the 100 different units.

The bar-plots below show that there is a large variation across units regarding maximum number of cycles, and that, as expected, the number of cycles is shorter for the test set than the train set.

cyclestrain = train.groupby('unit', as_index=False)['cycles'].max()
cyclestest = test.groupby('unit', as_index=False)['cycles'].max()

fig = plt.figure(figsize=(16, 12))
fig.add_subplot(1, 2, 1)
bar_labels = list(cyclestrain['unit'])
bars = plt.bar(list(cyclestrain['unit']), cyclestrain['cycles'], color='blue')
plt.ylim([0, 400])
plt.xlabel('Units', fontsize=16)
plt.ylabel('Max. Cycles', fontsize=16)
plt.title('Max. Cycles per unit in the train set', fontsize=16)
plt.xticks(np.arange(min(bar_labels)-1, max(bar_labels)-1, 5.0), fontsize=12)
plt.yticks(fontsize=12)

fig.add_subplot(1, 2, 2)
bars = plt.bar(list(cyclestest['unit']), cyclestest['cycles'], color='red')
plt.ylim([0, 400])
plt.xlabel('Units', fontsize=16)
plt.ylabel('Max. Cycles', fontsize=16)
plt.title('Max. Cycles per unit in the test set', fontsize=16)
plt.xticks(np.arange(min(bar_labels)-1, max(bar_labels)-1, 5.0), fontsize=12)
plt.yticks(fontsize=12)
plt.show()

In fact, in machine learning it is considered good practice to set the test set apart. So from now on, we do not touch or look at it anymore.

2.4. Visualization of Several Sensors, for a Particular Unit

The following visualizes different sensors of a particular unit. It gives a good impression that the sensors have different ranges of values (as they measure different entities, such as speed and temperature), and that they do not all show an identical pattern or trend. In fact, some increase in amplitude over time, while others decrease.

values = train[train.unit == 1].values
groups = [5, 6, 7, 8, 9, 10, 11, 12, 13]
i = 1
plt.figure(figsize=(10, 20))
for group in groups:
    plt.subplot(len(groups), 1, i)
    plt.plot(values[:, group])
    plt.title(train.columns[group], y=0.5, loc='right')
    i += 1
plt.show()

We can also show a single sensor for different engine units. This nicely illustrates that, across units, amplitudes decrease over time and seem to approach a certain minimum threshold of about 551–552.

By the way, the data seem rather noisy and filtering may help at some point.

plt.figure(figsize=(8, 8))
for i in range(1, 12):
    plt.plot(train[train.unit == i].cycles, train[train.unit == i].s7)
plt.xlabel('# Cycles')
plt.ylabel('Sensor measurements')
plt.show()

So could it be that different units have similar minimum and maximum values for single sensors? This would make sense if sensors started at a low amplitude (for example, a temperature) and went up to a high amplitude over time (or the other way around for other metrics). We can test this for ten units:

minb = train.groupby('unit', as_index=False).min().head(10)
maxb = train.groupby('unit', as_index=False).max().head(10)
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
mmtable = pd.concat([minb, maxb], ignore_index=True)

plt.figure(figsize=(12, 12))
col = np.concatenate((np.repeat('red', 10), np.repeat('blue', 10)), axis=0)
bar_labels = list(mmtable['unit'])
x_pos = list(range(len(bar_labels)))
bars = plt.bar(x_pos, mmtable['s2'], color=col)
plt.ylim([640, 645])
plt.xlabel('Units', fontsize=14)
plt.ylabel('Measure s2', fontsize=14)
plt.xticks(x_pos, bar_labels, fontsize=14)
plt.yticks(fontsize=14)
plt.show()

The plot suggests that sensors follow a similar kind of pattern for different units.

3. Data Preparation

3.1. Establishing Remaining Useful Life (RUL) in Cycles

It is now time to determine the remaining useful life (RUL) for each row of the train set. First, we determine, for each row, the maximum number of cycles reached by that row's unit. We use the groupby command to obtain the maximum number of cycles per unit, and then pd.merge to bring these values into the original train set:

train = pd.merge(train,
                 train.groupby('unit', as_index=False)['cycles'].max(),
                 how='left', on='unit')
train.rename(columns={"cycles_x": "cycles",
                      "cycles_y": "maxcycles"},
             inplace=True)

We then determine the time to failure (TTF) for every row, which is the number of cycles subtracted from the maximum number of cycles in a particular unit.

train['TTF'] = train['maxcycles'] - train['cycles']

sns.distplot(train['TTF'], kde=True)
train['TTF'].describe()
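
As a quick consistency check (a sketch using the columns just created): at each unit's final recorded cycle, the TTF should be exactly 0.

# TTF at every unit's last cycle should be 0
print(train.groupby('unit')['TTF'].min().eq(0).all())  # True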

3.2. Scaling

Another important preparatory step is scaling. We are going to use the MinMaxScaler in Python:

scaler = MinMaxScaler()
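
As a reminder of what MinMaxScaler does, here is a toy illustration (the numbers are made up, not part of the engine data):

# MinMaxScaler maps each column x to (x - min(x)) / (max(x) - min(x)),
# so every scaled column ends up in the range [0, 1]
toy = np.array([[640.0], [642.5], [645.0]])
print(MinMaxScaler().fit_transform(toy).ravel())  # [0.  0.5 1. ]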

Before scaling, let us inspect the original descriptive statistics. They show huge differences between sensors in minimum and maximum values, as expected, since the sensors measure different entities (such as temperature and speed):

train.describe().T

We first make a copy of the data, so that we have a dataset for unscaled and scaled data:

ntrain = train.copy()

And then we select the data that we would like to scale:

ntrain.iloc[:,2:19] = scaler.fit_transform(ntrain.iloc[:,2:19])

To inspect the scaling, let us look at the minimum and maximum value of each column:

ntrain.describe().T

We are going to scale the test data using the scaler settings of the train data.

ntest = test.copy()

It concerns the following columns:

pd.DataFrame(ntest.columns).T
ntest.iloc[:,2:19] = scaler.transform(ntest.iloc[:,2:19])
ntest.describe().T

3.3. Visualize the Scaled Data

It is always a good idea to visualize the scaled data, to ensure that the data look similar after scaling (apart from the numbers on the Y-axis, of course):

fig = plt.figure(figsize=(8, 8))
fig.add_subplot(1, 2, 1)
plt.plot(train[train.unit == 1].s2, color="blue")
plt.plot(test[test.unit == 1].s2, color="red")
plt.legend(['Train', 'Test'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
           mode="expand", borderaxespad=0)
plt.ylabel('Original unit')
fig.add_subplot(1, 2, 2)
plt.plot(ntrain[ntrain.unit == 1].s2, color="blue")
plt.plot(ntest[ntest.unit == 1].s2, color="red")
plt.legend(['Scaled Train', 'Scaled Test'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.ylabel('Scaled unit')
plt.show()

3.4. Fraction Time-to-Failure (FTTF)

Time to failure (TTF) has a different length for each unit, and it is convenient to express it as a fraction of the remaining number of cycles.

For a particular unit, this fraction starts at 1.00 and goes down to 0.00, the point where the engine fails. It is in fact similar to scaling, but applied at the unit level.

In Python, we can express this in a function:

def fractionTTF(dat, q):
    return (dat.TTF[q] - dat.TTF.min()) / float(dat.TTF.max() - dat.TTF.min())

fTTFz = []
fTTF = []
for i in range(train['unit'].min(), train['unit'].max() + 1):
    dat = train[train.unit == i]
    dat = dat.reset_index(drop=True)
    for q in range(len(dat)):
        fTTFz = fractionTTF(dat, q)
        fTTF.append(fTTFz)

ntrain['fTTF'] = fTTF

mx = cyclestrain.iloc[0:4, 1].sum()

fig = plt.figure(figsize=(8, 8))
fig.add_subplot(1, 2, 1)
plt.plot(ntrain.TTF[0:mx])
plt.legend(['Time to failure (in cycles)'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.ylabel('Original unit')
fig.add_subplot(1, 2, 2)
plt.plot(ntrain.fTTF[0:mx])
plt.legend(['Time to failure (fraction)'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.ylabel('Scaled unit')
plt.show()

The resulting plot shows on the left the TTF in cycles for the first 4 units, and on the right the fraction TTF, which starts at 1.00 and ends at 0.00. The number of cycles to failure remains identical; some units simply last longer than others, as can clearly be seen on the X-axis of the zigzag figure.

Finally, these are the columns from which we select the features for training the model.

pd.DataFrame(ntrain.columns).T
X_train = ntrain.values[:,1:19]
X_train
X_train.shape

(20631, 18)

Importantly, for the target variable Y_train we take the fraction of time to failure, fTTF. The features are the scaled values. Note that the data are in numpy.ndarray format, which can be checked with type(X_train).

Y_train = ntrain.values[:, 21]
Y_train
Y_train.shape

(20631,)

X_test = ntest.values[:,1:19]
X_test
X_test.shape

(13096, 18)

4. Modeling

4.1. Random Forest Model Fitting

regressor = RandomForestRegressor()
regressor.fit(X_train, Y_train)

RandomForestRegressor()

4.2. Predict Test Score, Fraction RUL from 1.00–0.00

We can now predict the fraction RUL values for the test set:

score = regressor.predict(X_test)

Recall that the Y_train values were transformed to a fraction of remaining useful life, from 1.00 down to 0.00, in order to cancel out the possible effect of total cycle duration. So score should also contain values in this range:

score[0:10]

The values are roughly in the range 0–1:

print(score.min(), score.max())

0.016046823266308744 1.0

Finally, we would like to convert the fraction of remaining life (0.00–1.00) back to the remaining useful life expressed in number of cycles.

For this, we first need a column with the maximum number of cycles per unit in the test set, which can be obtained with the groupby and merge functions (as we did before for the train set):

test = pd.merge(test,
                test.groupby('unit', as_index=False)['cycles'].max(),
                how='left', on='unit')
test.rename(columns={"cycles_x": "cycles", "cycles_y": "maxcycles"}, inplace=True)
test['score'] = score
test.head()

Recall that test contains only the unscaled values, whereas the scaled features were used for modeling and for predicting score.

Second, knowing the predicted remaining-life fraction, we need to estimate the predicted total number of cycles per unit in the test set. If a fraction score of the life remains, then the cycles observed so far cover a fraction (1 - score) of the total life, so the total is cycles / (1 - score). This can be done with the following function:

def totcycles(data):
    return data['cycles'] / (1 - data['score'])

test[‘maxpredcycles’] = totcycles(test)
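
As a toy sanity check on this formula (the numbers are made up, not from the engine data): an engine observed for 100 cycles with a predicted remaining-life fraction of 0.5 should have an estimated total life of 100 / (1 - 0.5) = 200 cycles.

# half the life remains after 100 cycles -> total life of 200 cycles
toy = pd.DataFrame({'cycles': [100], 'score': [0.5]})
print(totcycles(toy))  # 0    200.0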

Last, we subtract the maximum observed cycles per unit from the predicted total number of cycles in the test set to obtain the RUL, the remaining useful life:

def RULfunction(data):
    return data['maxpredcycles'] - data['maxcycles']

test['RUL'] = RULfunction(test)

From this RUL column in the test set, we can read off the remaining useful life at the point where the maximum recorded cycle is reached in each unit:

test['RUL'].head()

5. Evaluation

5.1. Predict RUL in Cycles

The following will compute the RUL per unit (based on the maximum cycles) from the RUL column that contains predicted values for each row.

t = test.columns == 'RUL'
ind = [i for i, x in enumerate(t) if x]

predictedRUL = []
for i in range(test.unit.min(), test.unit.max()+1):
    npredictedRUL = test[test.unit == i].iloc[test[test.unit == i].cycles.max()-1, ind]
    predictedRUL.append(npredictedRUL)
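
As an aside, the same per-unit values can be obtained more compactly with groupby, assuming the rows are ordered by cycle within each unit (a sketch, not the code used below):

# take the RUL value at each unit's last recorded row
predictedRUL_alt = test.groupby('unit')['RUL'].last()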

We had the so-called "zigzag" figure for the train set. Now we can reconstruct and visualize it for the test set, for both the predicted and the true values. Let us first create a list of values that counts down the RUL for every unit:

xtrueRUL = list(RUL['RUL'])
otrueRUL = []
for i in range(0, len(xtrueRUL)):
    otrueRUL = np.concatenate((otrueRUL, list(reversed(np.arange(int(xtrueRUL[i]))))))

xpredictedRUL = list(round(x) for x in predictedRUL)
opredictedRUL = []
for i in range(0, len(xpredictedRUL)):
    opredictedRUL = np.concatenate((opredictedRUL, list(reversed(np.arange(xpredictedRUL[i].item())))))

xpredictedRUL1 = []
for i in range(0, len(xpredictedRUL)):
    xpredictedRUL1.append(int(xpredictedRUL[i]))
xpredictedRUL2 = np.array(xpredictedRUL1)

The following figure shows the ZigZag pattern (first 1000 rows) for predicted and true RUL, and should show quite similar peak patterns:

mx = 1000

fig = plt.figure(figsize=(12, 8))
fig.add_subplot(1, 2, 1)
plt.plot(opredictedRUL[0:mx], color='blue')
plt.legend(['Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.ylim(0, opredictedRUL[0:mx].max()+10)
plt.ylabel('RUL (cycles)')

fig.add_subplot(1, 2, 2)
plt.plot(otrueRUL[0:mx], color='red')
plt.legend(['True RUL'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.ylabel('RUL (cycles)')
plt.ylim(0, otrueRUL[0:mx].max()+10)
plt.show()

5.2. Comparison of Predicted and True RUL

We can compare the predicted with the true RUL values using the following line plot. Strictly speaking, a line plot is not valid here, since there are no measurements between the units, but it serves visualization purposes: the eye quickly catches that the predicted RULs are often a bit higher than the true values.

plt.figure(figsize=(16, 8))
plt.plot(RUL, color="red")
plt.plot(predictedRUL, color="blue")
plt.xlabel('# Unit', fontsize=16)
plt.xticks(fontsize=16)
plt.ylabel('RUL', fontsize=16)
plt.yticks(fontsize=16)
plt.legend(['True RUL', 'Predicted RUL'], bbox_to_anchor=(0., 1.02, 1., .102),
           loc=3, mode="expand", borderaxespad=0)
plt.show()

Let's inspect the differences in a DataFrame. First, we concatenate the true and predicted RUL:

df1 = pd.concat([pd.Series(RUL['RUL']), pd.Series(xpredictedRUL2)], axis=1)
df1.columns = ['true', 'predicted']

And compute the difference score, which will show us if more values are positive or negative:

df1['diff'] = df1['predicted'] - df1['true']
df1.head()

There is a tendency for the RUL test values to be overestimated, as we can see in the histogram:

plt.hist(df1['diff'], bins=26, color="pink", edgecolor='brown', linewidth=1.2)
plt.axvline(0, color="red", linestyle='dashed', linewidth=1.6)
plt.show()
plt.show()

And the below table confirms this as well:

pd.DataFrame({'Count': [(df1['diff'] < 0).sum(), (df1['diff'] == 0).sum(),
                        (df1['diff'] > 0).sum()]}, columns=['Count'],
             index=['Smaller', 'Zero', 'Larger'])

In many data science projects, slight overestimation is not a real problem.

5.3. Regression Metrics

Calculate R squared:

# compute the R-squared of our model
from sklearn import metrics
print("Random Forest R-squared", "{:.2%}".format(metrics.r2_score(RUL, xpredictedRUL)))

Random Forest R-squared 65.77%

So, in our model, 65.77% of the variability in Y can be explained using X. This is not that exciting.

Calculate the root mean squared error (RMSE):

# compute the RMSE of our predictions
print("The Root Mean Squared Error (RMSE) is:",
      round(np.sqrt(metrics.mean_squared_error(RUL, xpredictedRUL)), 2))

The Root Mean Squared Error (RMSE) is: 24.31

Our model was able to predict the RUL at every "random" time in the test set (before the endpoint) to within 24.31 cycles of the real RUL, on average (RMSE is a Euclidean-style, quadratic error measure).

Calculate mean absolute error (MAE):

# calculate MAE using scikit-learn
print("The Mean Absolute Error (MAE) is:",
      round(metrics.mean_absolute_error(RUL, xpredictedRUL), 2))

The Mean Absolute Error (MAE) is: 18.2

Our model was able to predict the RUL at every "random" time in the test set (before the endpoint) to within 18.2 cycles of the real RUL, on average (MAE is a Manhattan-style, absolute error measure).
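
As a first-principles cross-check of the two error metrics, here is a sketch using the arrays built above:

# recompute RMSE and MAE directly with NumPy; the values should match
# the scikit-learn results above
y_true = RUL['RUL'].to_numpy(dtype=float)
y_pred = xpredictedRUL2.astype(float)
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # quadratic (Euclidean-style) penalty
mae = np.mean(np.abs(y_pred - y_true))           # absolute (Manhattan-style) penalty
print(round(rmse, 2), round(mae, 2))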

5.4. Feature Importance

We have used 18 features (variables) in our model. Let's find out which features are important and which are not.

# labels of the 18 features actually in X_train (columns 1:19 of ntrain);
# note that unit is excluded, and maxcycles/TTF/fTTF are not features
feature_labels = np.array(['cycles', 'op_setting1', 'op_setting2', 's2', 's3',
                           's4', 's6', 's7', 's8', 's9', 's11', 's12', 's13',
                           's14', 's15', 's17', 's20', 's21'])
importance = regressor.feature_importances_
feature_indexes_by_importance = importance.argsort()[::-1]
for index in feature_indexes_by_importance:
    print('{}-{:.2f}%'.format(feature_labels[index], importance[index] * 100.0))

The printout ranks the features from most to least important. Features with near-zero importance barely matter to the predicted RUL, whether they are in the model or not, and are candidates for removal.

6. Deployment

Saving the finalized model with pickle saves us a lot of time, as we don't have to train the model every time we run the application. Once we save the model as a pickle, we can load it later when making predictions.

import pickle

model = regressor

First, let’s open a new file for our finalized model and call it “fw_model3”

f3 = open("fw_model3", "wb")

Then, let's save our Random Forest model into this file:

pickle.dump(model, f3)

And let’s close this file.

f3.close()
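
Equivalently, a with block closes the file automatically; this sketch does the same as the three lines above:

# same save, but the context manager guarantees the file is closed
with open("fw_model3", "wb") as f:
    pickle.dump(model, f)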

Now, let's open a new Python notebook and write:

import pickle

f3 = open("fw_model3", "rb")
model = pickle.load(f3)

Now we can use our trained model to make new predictions.
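
For example, assuming features prepared exactly as in section 3.4 (X_test here is the scaled feature matrix built earlier in this article), a minimal sketch:

# predict the remaining-life fraction for the first few test rows
print(model.predict(X_test[:5]))  # fractions roughly between 0.0 and 1.0
f3.close()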

Summary

So far, the results are quite reasonable for a first Random Forest model, though a lot can still be improved.

Several factors can still improve the result:

  • Hyperparameter tuning was not used and may give better results
  • Cross-validation may help control for overfitting (see the sketch below)
  • Features may be improved, for example by filtering out the noise
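
For instance, here is a minimal cross-validation sketch, assuming X_train, Y_train and ntrain as built above. GroupKFold groups the folds by engine unit, so that no unit's cycles leak between the training and validation folds:

from sklearn.model_selection import GroupKFold

# 5-fold CV grouped by engine unit; scikit-learn returns negated MSE,
# so flip the sign before taking the square root
cv = GroupKFold(n_splits=5)
scores = cross_val_score(RandomForestRegressor(random_state=1337),
                         X_train, Y_train,
                         groups=ntrain['unit'].values,
                         scoring='neg_mean_squared_error', cv=cv)
print(np.sqrt(-scores))  # per-fold RMSE on the fraction-of-life scale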

About the Author

Roi Polanitzer, PDS, ADL, MLS, PDA, CPD

Roi Polanitzer, PDS, ADL, MLS, PDA, CPD, F.IL.A.V.F.A., FRM, is a data scientist with extensive experience in solving machine learning problems, such as: regression, classification, clustering, recommender systems, anomaly detection, text analytics & NLP, and image processing. Mr. Polanitzer is the Owner and Chief Data Scientist of Prediction Consultants — Advanced Analysis and Model Development, a data science firm headquartered in Rishon LeZion, Israel. He is also the Owner and Chief Appraiser of Intrinsic Value — Independent Business Appraisers, a business valuation firm that specializes in corporates, intangible assets and complex financial instruments valuation.

Over more than 16 years, he has performed data science projects such as: regression (e.g., house prices, CLV- customer lifetime value, and time-to-failure), classification (e.g., market targeting, customer churn), probability (e.g., spam filters, employee churn, fraud detection, loan default, and disease diagnostics), clustering (e.g., customer segmentation, and topic modeling), dimensionality reduction (e.g., p-values, itertools Combinations, principal components analysis, and autoencoders), recommender systems (e.g., products for a customer, and advertisements for a surfer), anomaly detection (e.g., supermarkets’ revenue and profits), text analytics (e.g., identifying market trends, web searches), NLP (e.g., sentiment analysis, cosine similarity, and text classification), image processing (e.g., image binary classification of dogs vs. cats, and image multiclass classification of digits in sign language), and signal processing (e.g., audio binary classification of males vs. females, and audio multiclass classification of urban sounds).

Mr. Polanitzer holds various professional designations, such as a global designation called “Financial Risk Manager” (FRM, which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as K-Means, SVM and KNN for credit risk measurement and management) from the Global Association of Risk Professionals (GARP), a designation called “Fellow Actuary” (F.IL.A.V.F.A., which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as GLM, RF and NN for determining premiums in general insurance) from the Israel Association of Valuators and Financial Actuaries (IAVFA), and a designation called “Certified Risk Manager” (CRM, which indicates that its holder is proficient in developing, implementing and validating statistical models and mathematical algorithms such as DT, NB and PCA for operational risk management) from the Israeli Association of Risk Managers (IARM).

Mr. Polanitzer studied actuarial science (i.e., implementation of statistical and data mining techniques for solving time-series analysis, dimensionality reduction, optimization and simulation problems) at the prestigious 250-hour training program of the University of Haifa, financial risk management (i.e., building statistical predictive and probabilistic models for solving regression, classification, clustering and anomaly detection) at the prestigious 250-hour training program of Ariel University, and machine learning and deep learning (i.e., building recommender systems and training neural networks for image processing and NLP) at the prestigious 500-hour training program of the John Bryce College.

He had graduated various professional trainings at the John Bryce College, such as: “Introduction to Machine Learning, AI & Data Visualization for Managers and Architects”, “Professional training in Practical Machine Learning, AI & Deep Learning with Python for Algorithm Developers & Data Scientists”, “Azure Data Fundamentals: Relational Data, Non-Relational Data and Modern Data Warehouse Analytics in Azure”, and “Azure AI Fundamentals: Azure Tools for ML, Automated ML & Visual Tools for ML and Deep Learning”.

Mr. Polanitzer had also graduated various professional trainings at the Professional Data Scientists’ Israel Association, such as: “Neural Networks and Deep Learning”, “Big Data and Cloud Services”, “Natural Language Processing and Text Mining”.


Roi Polanitzer

Chief Data Scientist at Prediction Consultants — Advanced Analysis and Model Development. https://polanitz8.wixsite.com/prediction/english