Why labelling will help us launch successful predictive maintenance projects
During my university years, we often used predictive maintenance concepts in case studies because the value to industry seemed so obvious. After I started working at Widget Brain, I quickly realised that predictive maintenance in the real world doesn’t achieve anywhere near the level of adoption we all thought was possible a couple of years ago. In fact, I’ve seen fewer than a handful of projects turn out successfully.
Why do predictive maintenance projects fail?
Now, we can get philosophical about this problem and conclude that algorithms cannot replace human decision-making. Or we can blame a lack of data. Personally, I don’t think either of these is the cause. There is sufficient business value in taking human decision-making out of the loop in predictive maintenance, and there is plenty of machine data available, growing exponentially every day. I think the real problem is the lack of useful data.
What is useful data?
Useful data, in the context of predictive maintenance, is data leading up to and including a fault or failure. The industry has, understandably, worked hard to make machine faults and failures (which we call events in data science) very rare. Thanks to redundancy and engineering marvels, we now have plenty of big machine data but almost no fault or failure data. On top of that, machines are operated in dynamic environments and for different purposes, which introduces even more noise and makes recording useful data difficult and expensive.
You need sufficient useful data to train an algorithm to the required accuracy level. And in predictive maintenance, as we all know, accuracy is extremely important because the cost of wrong predictions is high.
How to get useful data
Obtaining sufficient useful data, however, is very expensive, both in time and money. You need to connect a lot of machines and monitor them over a relatively long period to record these events. You also need a log of all events (faults, failures and service jobs), which requires service engineers to log everything in a standardised way, adding the right timestamps along the way. All of this is a prerequisite for starting the actual proof of concept. As you can imagine, it is a big upfront investment that companies are often unwilling to make while the business case is still uncertain. That unwillingness is why so few projects succeed.
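To make "logging everything in a standardised way" concrete, here is a minimal sketch of what one standardised event-log entry could look like. The field names and the `MaintenanceEvent` type are my own illustration, not a prescribed schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class MaintenanceEvent:
    """One standardised entry in the event log (illustrative schema)."""
    machine_id: str
    event_type: str       # "fault", "failure" or "service"
    started_at: datetime  # always recorded with an explicit timezone
    ended_at: datetime
    description: str

event = MaintenanceEvent(
    machine_id="pump-017",
    event_type="failure",
    started_at=datetime(2019, 3, 4, 9, 30, tzinfo=timezone.utc),
    ended_at=datetime(2019, 3, 4, 11, 0, tzinfo=timezone.utc),
    description="Bearing seized; replaced bearing and refilled lubricant",
)
print(asdict(event)["event_type"])  # prints "failure"
```

The point is not the exact fields but their consistency: if every engineer records the same fields with proper timestamps, the events can later be joined against the sensor data automatically.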
What is labelling?
To make these projects successful, we need a method to create useful data cheaply and quickly. That method is labelling. Labelling is an active process of adding context and interpretation to machine data. The goal is to build a database of useful data that allows predictive maintenance and performance optimisation algorithms to be trained properly and make accurate predictions.
How does labelling work?
Labelling is a continuous process in which a machine expert looks at incoming data and gives context to specific parts of it, called patterns. A pattern is a demarcated time-series dataset spanning one or more sequential observations over one or more variables. The starting procedure of a machine can be a pattern, and so can a rising temperature caused by excess friction from insufficient lubricant. Each pattern occurs for a reason, such as a fault or failure, and labelling a pattern allows you to store it and recognise it, along with its root cause, in data the algorithm has never seen before.
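As a sketch of the idea, a labelled pattern can be as simple as a slice of a time series plus the expert's interpretation. The `Pattern` type, the sensor readings and the label text below are all hypothetical, just to show the shape of the data:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Pattern:
    """A demarcated slice of time-series data with an expert label (illustrative)."""
    machine_id: str
    variables: List[str]  # e.g. ["temperature", "vibration"]
    start_index: int      # position of the first observation in the series
    end_index: int        # position of the last observation (inclusive)
    label: str            # root cause assigned by the machine expert

# Made-up sensor readings: a temperature rise labelled by an expert.
temperature = [61.0, 61.2, 63.5, 67.9, 74.4, 82.0]
pattern = Pattern(
    machine_id="pump-017",
    variables=["temperature"],
    start_index=2,
    end_index=5,
    label="overheating: high friction, insufficient lubricant",
)
segment = temperature[pattern.start_index : pattern.end_index + 1]
print(len(segment))  # prints 4: the observations that make up this pattern
```

A database full of such records is what lets an algorithm match a freshly observed pattern against known root causes.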
Of course, more labels need to be added initially, so the time investment will be higher in the beginning. Over time, the database becomes more and more valuable, essentially storing all expert knowledge in one centralised place, and fewer new labels will be needed. The world changes, however. That’s why you should occasionally verify that patterns still match their root causes if you want the dataset and knowledge base to remain relevant and up to date.
I get why successful predictive maintenance projects are scarce. A big investment with little certainty of high returns is a big hurdle for companies to overcome. To get high returns, you need accurate predictions. To get accurate predictions, you need useful data. And useful data is scarce too, simply because obtaining it takes a lot of time and money that nobody wants to invest while the return isn’t clear, yet.
This cycle is vicious. Luckily, labelling lets you break it cheaply and quickly. It allows you to scale up to other use cases and machines faster once your initial proof of concept succeeds, and to easily update the knowledge base when needed. Perhaps the best part is that your service engineers get a say in what is useful and what isn’t. The data will be highly relevant, and with the right context in place, highly accurate predictions will never be a data problem.
If you have any questions or comments, feel free to leave a comment down below or connect with me on LinkedIn.