Part 1: Predictive Maintenance with Machine Learning

As Industry 4.0 marches into production facilities, some of its ideas are spreading to other areas and could influence us all more directly in the future. When we think of cyber-physical systems, it's usually the automated mechanical arm working in the factory that comes to mind; however, many other types of systems can be built. This article focuses on predictive maintenance systems and attempts to answer what you need and how to get started.

In order to quantify machine health in any way, you need the machine to be generating data about its running conditions. In short, you need to put sensors in your machines to measure everything critical to the machine's performance. This is no trivial task; however, in many places it has been the norm for a while, since many control systems depend on these inputs to function. With the machine generating data, you can start saving these logs or streams in a database or a data lake, and you create the foundation for analytics, since you will have the actual data at hand.
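As a minimal sketch of what such logging could look like, the snippet below polls a hypothetical read_sensors() function (standing in for your PLC, SCADA, or historian client) and appends timestamped readings to a local SQLite table. The sensor names and the one-second polling interval are illustrative assumptions, not part of any particular system:

```python
import sqlite3
import time

# Hypothetical sensor interface: in practice this would be your PLC,
# SCADA, or historian client returning the machine's latest values.
def read_sensors():
    return {"spindle_temp_c": 61.2, "vibration_rms": 0.84, "motor_current_a": 12.5}

conn = sqlite3.connect("machine_logs.db")
conn.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, sensor TEXT, value REAL)")

# Poll once per second and persist each reading with a timestamp,
# so the table accumulates into a historical dataset over time.
while True:
    ts = time.time()
    for sensor, value in read_sensors().items():
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (ts, sensor, value))
    conn.commit()
    time.sleep(1)
```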

Exploration of data

The data collection will over time become a historical dataset, which can be used to derive insights into the various running states of the equipment. Lots of graphs can be generated, providing an overview of machine processes and showing various sensor values and how they change during those processes. This work can be thought of as the exploratory part of the analysis, where the groundwork for understanding the data is done. Here lots of questions will probably appear, and it is crucial to keep the data researcher in close contact with people knowledgeable about the equipment in order to answer them. At this point there will probably also be a lot of ‘this shouldn’t happen’ moments, and these insights can be fed back to R&D.
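A first pass at this exploration might look like the sketch below: load the historical logs, print summary statistics, and plot each sensor over time. It assumes the logs have been exported to a CSV with a timestamp column and one column per sensor; the file and column names are illustrative:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Assumes a CSV export of the historical logs with a timestamp column
# and one column per sensor; names here are illustrative assumptions.
df = pd.read_csv("machine_logs.csv", parse_dates=["ts"], index_col="ts")

# Summary statistics per sensor: a quick sanity check that often
# surfaces the 'this shouldn't happen' values mentioned above.
print(df.describe())

# Plot each sensor over time to see how it behaves across process runs.
df.plot(subplots=True, figsize=(10, 2 * len(df.columns)))
plt.tight_layout()
plt.savefig("sensor_overview.png")
```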

This data exploration stage is often a complicated one, because the data is usually stored for a purpose other than analytics. So everything will first have to be cleaned and put into appropriate tables; you may need big data tooling here if the dataset is too large. There will, however, often be lots of redundant and irrelevant data, so basic exploration plots and similar work can typically be done on a single computer (and/or on a subset of the data).
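Below is a sketch of typical first-pass cleaning steps under the same assumed CSV layout as above: deduplicating timestamps, resampling sensors onto a common interval, dropping dead columns, and carving out a subset for exploration. The specific intervals and the date range are illustrative choices, not recommendations:

```python
import pandas as pd

# Same assumed wide CSV layout as before; adapt names to your schema.
df = pd.read_csv("machine_logs.csv", parse_dates=["ts"], index_col="ts")

df = df[~df.index.duplicated(keep="first")]  # drop duplicate timestamps
df = df.sort_index()

# Sensors rarely sample in lockstep; resampling to a common interval
# gives a tidy table and shrinks the data for single-machine work.
df = df.resample("10s").mean()
df = df.interpolate(limit=6)                 # bridge short gaps only

# Drop columns that never change; they carry no information.
df = df.loc[:, df.std() > 0]

# Explore a subset first if the full dataset is too large
# (the date range here is purely illustrative).
sample = df.loc["2023-01"]
print(sample.info())
```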

This data exploration and understanding phase forms the foundation of the next part of the analysis: choosing inputs (features) for the algorithms. One can select features based on correlations, statistical tests, or simply by shotgunning a bunch of features into the algorithm to see what sticks.
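The two non-shotgun approaches can be sketched in a few lines. The example below assumes a cleaned feature table with a binary failure label derived from maintenance records; the file and column names are hypothetical:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Assumes a cleaned frame of sensor features plus a binary `failure`
# label column derived from maintenance records (names illustrative).
df = pd.read_csv("labeled_data.csv")
X, y = df.drop(columns=["failure"]), df["failure"]

# 1) Correlation with the target: rank features by absolute correlation.
corr = X.corrwith(y).abs().sort_values(ascending=False)
print(corr.head(10))

# 2) Univariate statistical test (ANOVA F-test) via scikit-learn,
#    keeping the five features that best separate the two classes.
selector = SelectKBest(f_classif, k=5).fit(X, y)
print(list(X.columns[selector.get_support()]))
```

Correlation ranking is cheap but only captures linear, one-feature-at-a-time relationships, which is why it is usually paired with domain knowledge from the exploration phase.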

Thanks for reading. In the next part, we'll look at the missing pieces for building the model.