As Industry 4.0 has led to the increased adoption of sensors and other advanced monitoring tools in manufacturing facilities, companies have been able to apply advanced algorithms to the data these devices generate, improving their manufacturing processes and solving numerous business problems, from poor product quality to the low efficiency and frequent failure of the machines themselves. Indeed, one area that has gained significant traction in recent years is predictive maintenance, whereby data is used to predict the failure of machines well in advance so that proactive corrective actions can be taken.
Making accurate predictions about potential machine failures requires the most accurate data possible, but tabular data isn’t always available. Instead, maintenance personnel across industries can incorporate thermal data, by way of image processing, to make predictions about their facilities’ machines. They can even use it to augment the accuracy of information generated by other predictive maintenance models.
The power of indirect measurement using imaging
As the table below makes clear, there are three main sources of data that can be used in a manufacturing environment:
In manufacturing facilities, it is hard and sometimes even impossible to directly measure the equipment being used, especially the core machines. One reason is accessibility: the working temperature of equipment can often exceed 1000 degrees Celsius, and virtually no measuring device or sensor can function in such temperatures, at least not sustainably. The other reason is economics: production requirements make it almost impossible to stop the equipment and directly measure key parameters at a high frequency. In most process manufacturing, stopping one piece of equipment means stopping the whole production line, especially when it’s a core piece of equipment.
Companies do, of course, perform regular checks on their manufacturing equipment in order to measure key parameters, but the frequency is typically low and, as such, far from sufficient. Moreover, the frequency of checks suggested by equipment manufacturers doesn’t always match the frequency required, especially when the working condition of the equipment in question differs from what the OEM’s operating manuals take into consideration. Without the ability to perform direct measurement, identifying when the equipment needs to be stopped for maintenance so that there are no unplanned disruptions would seem impossible. But in fact, thermal imaging provides a way to perform indirect measurement that, in combination with advanced data analytics, can provide all the predictive maintenance information needed to avoid unnecessary shutdowns.
Imaging is a common measurement tool used for thermal equipment, such as steam boilers, heat pipes, kilns, steel furnaces, etc. The images produced are typically thermal- or heat-related in nature, which means they feature relatively low resolution and a high degree of noise. They cannot be used to diagnose the core equipment’s overall health; rather, they will provide a measurement of the equipment — primarily its core system — through a monitoring system. Moreover, given that no image capture device is able to work in temperatures that average over 1000 degrees Celsius, the acquired image is typically a transferred measurement. For example, thermal imaging of a kiln will measure the temperature of its outside body, which itself may be affected by the temperature of the surrounding air. Another example would be heat pipes, the images of which would be captured across large distances and contain thermal light-scattering noise. All of which makes processing the images and applying advanced data analysis to them to enable predictive maintenance especially challenging.
The role of image processing
Image processing plays an important role in utilizing the equipment’s data for predictive maintenance, as that’s how the heat-related information is extracted. Image processing typically consists of the following tasks:
a) Reading the image and data in terms of their RGB dimensions
b) Extracting line detections and regions of interest (ROIs)
c) Value-mapping temperature and RGB values
d) Validating the processed data
After processing, the RGB image is converted into a 2D matrix that corresponds to the measuring area, such as the surface of a steam boiler or the expanded cylindrical surface of a kiln. This 2D matrix will then be aligned in space (which requires the anchor point) and time (can be per hour or per day) for further analysis.
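As a rough illustration, this pipeline can be sketched in Python. The ROI bounds and the `rgb_to_temp` callback below are placeholders for whatever the actual acquisition setup and color bar provide; the toy image is purely for demonstration:

```python
import numpy as np

def process_thermal_image(rgb: np.ndarray, roi: tuple, rgb_to_temp) -> np.ndarray:
    """Sketch of the processing steps: read the RGB data, crop the ROI,
    map each RGB pixel to a temperature, and return the 2D matrix for
    the measuring area. `roi` is (y0, y1, x0, x1); `rgb_to_temp` maps a
    single RGB pixel to degrees Celsius. Both are illustrative assumptions."""
    y0, y1, x0, x1 = roi
    patch = rgb[y0:y1, x0:x1, :]                       # ROI extraction
    temp = np.apply_along_axis(rgb_to_temp, 2, patch)  # value mapping
    assert temp.shape == patch.shape[:2]               # basic validation
    return temp

# toy usage: a 4x4 "image" whose red channel encodes temperature
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 100
matrix = process_thermal_image(img, (0, 2, 0, 2), lambda px: float(px[0]) * 5.0)
print(matrix.shape)  # (2, 2)
```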
ROI extraction and denoising
Extracting regions of interest, or ROIs, is common in image processing, as there are always numerous areas of an image that are irrelevant and in fact might introduce noise during processing. The ROI can be a predefined area within a given X, Y axis — the most common method, as ROI is captured through a fixed image acquisition device — or one identified using a boundary detection algorithm. Manually inputting the X, Y axis also works well. An example of extracted ROI is shown in the figure below.
Another important step in image processing is denoising, where random variations of brightness or color information in images are treated as noise. There are two common denoising methods:
1. Filtering using the Fourier transform, a 2D Gaussian filter, the wavelet transform, etc. The idea behind filtering is to separate the signal from the noise in different frequency domains, ranging from the Fourier transform to more advanced domains such as the wavelet transform, or even using averaging to remove the impact of the noise (instead of the noise itself) when the noise is white noise. Because filtering can blur the image or introduce artifacts, removing the impact of the noise on an image is often as beneficial as removing the noise itself.
2. Designing a sparse representation of the image in a transform domain, such as with an auto-encoder model. The idea here is to identify the fundamental model of the image by minimizing the difference between the reconstructed image and the original one. This method assumes that the noise is white noise. It’s used in pattern detection and recognition applications, and is less common than the filtering method.
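To make the filtering approach concrete, here is a minimal sketch using SciPy’s 2D Gaussian filter on a synthetic temperature field with additive white noise. The field, the noise level, and the `sigma` value are all illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(400.0, 500.0, 64), np.ones(64))  # smooth gradient
noisy = clean + rng.normal(0.0, 5.0, clean.shape)             # white noise added

denoised = gaussian_filter(noisy, sigma=2.0)

# the filter should bring us closer to the underlying signal
err_noisy = np.abs(noisy - clean).mean()
err_denoised = np.abs(denoised - clean).mean()
print(err_denoised < err_noisy)  # True
```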
Sometimes there are artifacts that need to be removed from the acquired image; for example, the anchor area which allowed for the position of the image to be manually adjusted. That’s because we need the precise RGB value on each pixel in order to convert that value to temperature. We have to not only detect the shape in the image (such as the shape of the high temperature area) but also the absolute value of the temperature in the entire captured image. This cannot be accomplished using denoising or image processing; it requires detecting zones and interpolating the image before denoising or image processing begins.
For example, the anchor area can be a triangle or triangular-shaped box within the image with a highly contrasted RGB value compared to the neighboring area, for high visibility. The area will be movable, meaning that the anchor can be in a different position in the image during each instance of image capture. A shape-detection algorithm with manual support will be necessary in this case, as we do not want to have to perform too much advanced image processing work. The interpolation step is also important, as it will help us identify the “right” RGB value of the pixel and the temperature in the “noisy” area. Nearest-neighbor, linear, or quadratic interpolation can then be applied depending on the shape and size of that area. For example, a solid triangular area might need linear interpolation, whereas a triangular outline with only a thin border might not need any.
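A sketch of this zone-detection-plus-interpolation step, assuming the anchor pixels have already been located as a boolean mask; SciPy’s `griddata` then fills the masked zone from the surrounding known pixels. The matrix, mask shape, and values are hypothetical:

```python
import numpy as np
from scipy.interpolate import griddata

# hypothetical 20x20 temperature matrix: temperature rises linearly by row
temp = np.outer(np.linspace(420.0, 500.0, 20), np.ones(20))

# triangular anchor zone whose pixels carry no real temperature
mask = np.zeros(temp.shape, dtype=bool)
for r in range(5):
    mask[5 + r, 5:5 + r + 1] = True
corrupted = temp.copy()
corrupted[mask] = 999.0  # the high-contrast anchor overwrote these pixels

# interpolate the masked zone from the surrounding known pixels
rows, cols = np.indices(temp.shape)
known = ~mask
filled = corrupted.copy()
filled[mask] = griddata(
    (rows[known], cols[known]), corrupted[known],
    (rows[mask], cols[mask]), method="linear",
)
# because the underlying field is linear, it is recovered almost exactly
```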
Temperature extraction by vector mapping
We’ve discussed thermal image creation, ROI extraction, and denoising, all of which falls into the category of classic image processing work. But in order to identify changes in temperature and how they vary across the image, we need to go a step further and identify the vector mapping between the RGB value and the corresponding temperature. This is no trivial task, as the change in RGB value is not proportional to the change in temperature. A typical vector would be the temperature from high to low with the RGB color ranging from yellow to dark red, as shown in the figure below.
The value in each of the three channels (red, green, and blue) varies from 0 to 255, and the RGB vector determines the color displayed. Classic color interpolation methods like image scaling do not work here, however, as interpolating across the full range of the mapped vector is incorrect. You can see this by comparing the reference temperature value with the corresponding color; the cyan in the color bar corresponds to 420 degrees Celsius, for example. Instead of using interpolation across the full range of colors, a more practical way is to apply it within each temperature range and its corresponding color segment, such as where the temperature ranges from 420 to 495 degrees Celsius and the corresponding color goes from cyan to dark and then to light yellow. Linear interpolation should be fine for most thermal images, depending on the image resolution and the value granularity.
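A sketch of such a piecewise mapping, assuming a hypothetical set of color-bar anchor points; the real anchors must be read from the camera’s own color bar, so the temperatures and colors below are illustrative only:

```python
import numpy as np

# hypothetical color-bar anchors: (temperature degC, RGB)
anchors = [
    (420.0, (0, 255, 255)),   # cyan
    (495.0, (255, 255, 0)),   # yellow
    (570.0, (255, 0, 0)),     # red
]

def rgb_to_temp(px):
    """Piecewise-linear mapping: project the pixel onto each color segment
    of the color bar and interpolate temperature within the closest one."""
    px = np.asarray(px, dtype=float)
    best_t, best_d = None, np.inf
    for (t0, c0), (t1, c1) in zip(anchors, anchors[1:]):
        c0, c1 = np.asarray(c0, float), np.asarray(c1, float)
        seg = c1 - c0
        # position of the pixel along this segment, clamped to [0, 1]
        u = np.clip(np.dot(px - c0, seg) / np.dot(seg, seg), 0.0, 1.0)
        d = np.linalg.norm(px - (c0 + u * seg))
        if d < best_d:
            best_d, best_t = d, t0 + u * (t1 - t0)
    return best_t

print(rgb_to_temp((0, 255, 255)))  # 420.0 (exactly cyan)
```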
Validation and large-scale data processing
By taking the image processing and value extraction steps outlined above, we are able to process all of the acquired images and build the historical dataset. While the proposed method can be used for thermal data extraction, its results will be more reliable and robust if there is reference temperature data that can be used to validate them. Validation data can include the calibration image and the extracted temperature, or the discrete values (minimum, maximum, mean) from the thermal image.
When processing historical images, such as to obtain training data, it’s important to check for any degradation or recalibration of the image acquisition device. Both degradation and recalibration will shift RGB values, rendering them inconsistent. In that case, we might need to update the temperature vector for the whole image according to the change in RGB value.
Feature engineering plays an important role in image-based detection, recognition, and classification. When deep learning and convolutional neural networks became popular, the feature engineering traditionally used in imaging was largely replaced by learned convolution kernels and network design. That does not, however, mean that feature engineering is no longer important in image-based predictive maintenance. In fact, well-designed features can be a critical component of predictive maintenance. Some typical image features include:
a) General shape, such as edges, corners, blobs, and ridges
b) Deformable, parameterized shapes
c) Color and texture
d) Local features using the Scale-invariant feature transform (SIFT) algorithm (“SIFT features”)
In thermal image processing-based predictive maintenance, the image correlates strongly with the physical phenomena and rules behind the image. For example, a particular shape seen in a high temperature area might correspond to the failure of a core part. In addition to the features of the image, features based purely on extracted temperature such as temperature trend can provide insight into the degradation of equipment (see table below for more details).
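As one example of a purely temperature-based feature, the trend can be computed as the slope of a least-squares line through the daily maximum temperature of an ROI. The values below are made up for illustration:

```python
import numpy as np

def temp_trend(daily_max: np.ndarray) -> float:
    """Slope (degC per day) of a least-squares line through the daily
    maximum temperature of an ROI, a simple degradation indicator."""
    days = np.arange(len(daily_max), dtype=float)
    slope, _ = np.polyfit(days, daily_max, 1)
    return slope

healthy = np.array([431.0, 430.5, 431.2, 430.8, 431.1])
degrading = np.array([431.0, 433.5, 436.2, 438.9, 441.6])  # steady upward creep
print(temp_trend(degrading) > temp_trend(healthy))  # True
```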
Augmenting features with operations data
More data is always beneficial, especially when it provides additional descriptive information. In thermal imaging-based predictive maintenance analytics, augmenting the temperature feature with operations data is always a good choice. The basic idea is that the operations data can be linked directly with measurement data (e.g., temperature data) by increasing the data dimension or using it as a conditional parameter to set up different operational conditions. The operations data provides a unified view of the entire piece of equipment or even the system as a whole, where the local measurements can be information-rich. Combining these two kinds of data requires careful design and knowledge of the physical system as well as of the daily operations.
During data augmentation, we typically augment measurement data (e.g., temperature or other features) with operations data rather than vice versa. That’s because the operations data is at the equipment level, whereas the temperature feature data includes value ranges and failure behaviors from various parts of a piece of equipment or system, such as spatial and temporal-spatial features. If we’re using a Bayesian model for diagnostics and classification, we want to use operations data as the conditioning information. If we’re using a decision tree model such as random forest or GBDT, we can feed the model both measurement and operations data.
Data integration and utilization during modeling
Different operations data will include varying time granularity, which means we might need to aggregate the data when integrating it with measurement data. The time granularity provides another way to filter out irrelevant parameters.
Different stages of data integration typically include the following operations parameters:
a) Feeding stage: speed, volume/capacity, patterns (continuous or periodic) and time intervals, material weight/density, key material ingredients levels (may come from offline test), etc.
b) Processing/interacting stage: pre-heat temperature and heating time (if pre-heating stage is applicable), rotating speed and motor current (for rotation equipment), fluid speed and pressure (for liquid-related equipment); also heating supply (fuel, gas, heavy oil, hydrogen, and oxygen) volume and speed, which indicate the potential for overheating during daily operations, and vibration, which can be deduced from correlated parameters such as the difference in current/speed between a pair of motor drivers
c) Pumping out/extraction stage: feeding and rotation speeds typically determine the speed of product output in a kiln or blast furnace; supporting operations such as cooling with a heat air exhaust fan, using a dust exhaust filter and fan or a negative pressure pump in the guiding groove can happen at this stage, the parameters of which can be important; for example, molten metal kept in torpedo cars at a high temperature over a long period of time may damage the equipment, significantly reducing its lifespan
Parameters sampled at a coarser rate than the measurement data are straightforward to process during integration. The challenge comes from fast-changing parameters, especially control parameters such as the feeding speed, pressure, etc. Small changes to the pattern of operations can have a significant impact on the status of the equipment, and there is no general rule that guarantees alignment between a high-frequency signal and a slow-moving measurement. System degradation is not a fast process, but outright failure can happen quickly. Using the average value of the high-frequency signal during the sample measurement period is one option. Paying attention to any incidents that impact the operation is also important, especially when using the operation parameters as a condition.
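A minimal sketch of this kind of alignment with pandas, assuming a hypothetical 1-minute feeding-speed signal and hourly temperature measurements; the fast signal is averaged over each measurement period before the join:

```python
import numpy as np
import pandas as pd

# hypothetical 1-minute feeding-speed signal (the fast control parameter)
minutes = pd.date_range("2024-01-01", periods=180, freq="min")
ops = pd.DataFrame({"feed_speed": np.linspace(10.0, 13.0, 180)}, index=minutes)

# hypothetical hourly temperature measurements (the slow measurement)
hours = pd.date_range("2024-01-01", periods=3, freq="h")
temps = pd.DataFrame({"max_temp": [430.0, 432.0, 435.0]}, index=hours)

# average the fast signal over each measurement period, then join
ops_hourly = ops.resample("h").mean()
merged = temps.join(ops_hourly)
print(merged.shape)  # (3, 2)
```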
Accuracy of the models
In general, predictive maintenance utilizes machine learning technology by applying the analytic model to evaluate the condition of equipment using offline or online monitoring. Whether to choose the offline or online monitoring method depends on the system response time and maintenance requirements. For example, in most thermal-related field scenarios, an offline model might satisfy the maintenance scheduling requirement. The ultimate goal is either to cost-effectively perform the proper maintenance at a scheduled point in time or be warned of a system failure far enough ahead of time to be able to respond effectively.
In process manufacturing, it is critically important to have a predictive model that warns of potential system failure ahead of time. This ensures that the manufacturing plan can be adjusted and any necessary maintenance can be properly conducted. But choosing the right predictive model can be a challenge.
Model design and implementation
A diagnostic challenge in predictive maintenance is separating abnormal data from normal data, or precisely identifying which data is abnormal based on solid assumptions. The degradation of mechanical or electrical equipment is an evolving process, which suggests that failure does not happen in an instant. We can then assume that the measurement of key parameters in N days before the system failure — which typically appears as a sudden incident — yields abnormal data. The selection of N could be based on domain knowledge or inferred from predictive maintenance requirements using feasibility adjustments.
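The labeling assumption above can be sketched as follows, with a hypothetical daily measurement table and failure date; every row within N days before a recorded failure is tagged as abnormal:

```python
import pandas as pd

def label_abnormal(df: pd.DataFrame, failure_dates, n_days: int = 7) -> pd.Series:
    """Mark every row within n_days before a recorded failure as abnormal (1).
    `df` is assumed to carry a DatetimeIndex of daily measurements."""
    labels = pd.Series(0, index=df.index)
    for fail in pd.to_datetime(failure_dates):
        window = (df.index > fail - pd.Timedelta(days=n_days)) & (df.index <= fail)
        labels[window] = 1
    return labels

days = pd.date_range("2024-03-01", periods=20, freq="D")
df = pd.DataFrame({"max_temp": range(20)}, index=days)
y = label_abnormal(df, ["2024-03-15"], n_days=7)
print(int(y.sum()))  # 7
```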
We’ll then want to train and test a classification model, which can be binary or multi-class. Because failure events are rare, rich abnormal data is typically lacking, so binary classification will offer an acceptable level of diagnostic accuracy and robustness. There are many binary classification models that can be used; our suggestion is that if the parameters aren’t highly correlated, a gradient boosting decision tree (GBDT) such as XGBoost is a good option. If the failure data is sparse and the parameters are high-dimensional and correlated, a support vector machine (SVM) might work well. When the data is rich, a recurrent neural network can be very effective in identifying the failure pattern and yielding accurate diagnostics.
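A hedged sketch of such a binary classifier using scikit-learn’s GradientBoostingClassifier on synthetic, well-separated toy data; real data will be far messier, and XGBoost or an SVM could be substituted along the lines discussed above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
# toy features: [temperature trend, max temperature]; failures are the
# minority class and sit at higher values of both (an assumption)
X_normal = rng.normal([0.0, 430.0], [0.3, 2.0], size=(n, 2))
X_fail = rng.normal([2.0, 440.0], [0.5, 3.0], size=(40, 2))
X = np.vstack([X_normal, X_fail])
y = np.array([0] * n + [1] * 40)

# stratified split keeps the rare failure class in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(round(acc, 2))
```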
In thermal imaging — especially when it comes to temperature-related measurements — we need to be on the lookout for false alarms created by the temperature radiation of neighboring or shared areas. Such radiation will render the temperature feature misleading, which can jeopardize the feature engineering and any related training. It will impact all temperature-related features.
We can manually tag the data points affected by any joint impact from neighboring areas during training to avoid introducing noise into the normal dataset. But since we’re not always aware that such joint impact has taken place, a practical approach is to apply a moving window to the spatial feature (one-half or one-third of the width of the ROI) and compute the ensemble result for each block to generate the feature. This smooths high-variance data points while preserving the prediction of potential system failure. It also reduces the sensitivity of the model, although the model can be fine-tuned during training to boost the robustness of its accuracy.
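One way to sketch the moving-window idea on a 1D spatial temperature profile, using the block mean as the ensemble statistic; the profile, window size, and statistic are illustrative assumptions:

```python
import numpy as np

def block_means(profile: np.ndarray, win: int) -> np.ndarray:
    """Ensemble (mean) value of each moving-window block along a 1D
    spatial temperature profile taken from the ROI."""
    starts = np.arange(0, len(profile) - win + 1, win)
    return np.array([profile[i:i + win].mean() for i in starts])

profile = np.full(90, 430.0)           # ROI that is 90 pixels wide
profile[40] = 600.0                    # one hot pixel radiated from a neighbor
blocks = block_means(profile, win=30)  # window = one-third of the ROI width

# the outlier is damped to its block mean (~435.7) instead of dominating
# the feature, yet that block still stands out from its 430-degree neighbors
```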
Performance gains using augmented data
While introducing operations data can add misleading dimensions to modeling, primarily in diagnostics, various operating conditions may exaggerate the degradation of the system, which can enhance the trend feature and improve the accuracy of the diagnostics. For example, the concentration of heat or pressure in the leak gap of a steam pipe may yield an increasing temperature and a larger regression coefficient.
Augmented data will improve accuracy, with the gain for each ROI depending on how sensitive it is to various operational conditions. When the baseline accuracy in detecting failure without augmented data is roughly 65%, adding augmented data typically improves that performance by 10% to 15%. Generally speaking, diagnostic accuracy for predicting system failure within N days (a typical N in process manufacturing is seven or 14 days) is about 90%-95% when detecting normal behavior and 80% when detecting abnormal behavior.
Moreover, utilizing augmented data might be helpful when it comes to identifying operational parameters for failure such as rotation speed, motor driver current, pipeline pressure, etc. This information can be useful for performing subsequent maintenance as well.
Use Case Examples
Combining thermal data with other sources of data can increase system accuracy, thus providing more accurate and timely information on potential equipment failures. In our experience, the approach and methodology highlighted in this article can be applied across industries. Two use cases where combining thermal imaging with operations data provides a more comprehensive view of the probability of failure are:
Torpedo ladle car in steel manufacturing: During the steel manufacturing process, torpedo cars are used to transfer molten metal from a blast furnace to the steel-making area. The inner section of the ladles is refractory-lined to protect the outer surface of the ladle cars. Ladle cars are also used for desulphurization, where sulfur reagents are added to molten iron so that the sulfur is converted to slag and removed. The violent chemical reactions that occur as part of this process have a significant impact on the refractory lining of the ladle, and damage to a ladle car can have serious consequences, from the loss of materials to the loss of human life. As such, maintenance teams monitor the health of the refractory lining using thermal imaging cameras, which provide continuous information on the outside temperature of the ladles. The thermal images can be used to extract features that help predict leakage of the torpedo ladle car well in advance so that maintenance can be carried out proactively. Features from the thermal images can also be combined with ladle operations data such as grade sequence, grade chemistry, and retention time to predict the amount of refractory lifespan that remains.
Rotary kiln in cement and pulp manufacturing: Lime kilns in cement and pulp manufacturing support the chemical recovery process by converting calcium carbonate into calcium oxide using heated air and rotation. The outer metallic tube is insulated with special brick layers, but sudden peeling of the brick face can expose the tube to extreme temperatures and cause it to rupture, resulting in significant losses. As such, manufacturing organizations typically deploy an IR imaging system, which provides a 360-degree profile of the lime kiln’s temperature. However, such imaging systems have two main limitations. First, shift operators visually monitor the temperature profile to detect any abrupt shifts, but gradual upward shifts in temperature cannot be identified visually, which delays any subsequent actions taken by maintenance personnel. Second, these systems are usually deployed as standalone systems that do not interact with other Level-2 systems. That means the effect of other process parameters is not fully understood, such as whether brick peeling is likely to happen whenever the feed rate of calcium carbonate is increased above a certain value. Applying image processing techniques to the thermal images to generate temperature features, and combining these features with Level-2 data before feeding them into machine learning models, can help prevent such peeling of the brick face and any resulting losses.
Predictive maintenance has quickly become an indispensable tool in manufacturing facilities, enabling organizations to avoid what otherwise would be costly machine failures and related manufacturing disruptions. Leveraging tabular data to predict potential failures isn’t always an option, however. In such cases, thermal imaging data can provide the timely, accurate information that allows maintenance personnel to avoid unnecessary machine failures, disruptions, and related costs.