The New Role of Maintenance

The role of maintenance is an evolving story.

Initially it was looked at as a “necessary evil.”

So maintenance was done only when things broke. The mentality was “fix it when it breaks,” and the strategy was aptly called “Breakdown Maintenance.”

Then, during World War II, the US Army found that its tanks broke down at crucial moments in battle, and it began looking for ways to improve maintenance.

They came up with a maintenance strategy called “Preventive Maintenance,” more popularly known as PM.

That is, maintain the machine (replace parts) while it is still running fine, so that it does not give way at a crucial time and stop productive activity.

Though it was helpful, people did not know when to schedule a PM or which parts to change. Moreover, changing parts on a fairly arbitrary basis burned a deep hole in the pockets of industrialists, which they did not like much.

With this in mind, Condition Based Maintenance (CBM) was born. The simple idea was to maintain a machine when it needs attention. The way to find that out was to sense what was going wrong by means of various sensors and human inspection. This approach went by different names: CBM, Predictive Maintenance, or PdM.

That helped people a lot. They could cut down on costs in terms of spare parts and labor and also prevent secondary failures. People also found that CBM proved to be a rather effective method to come to grips with failures that were random in nature.

Then people had another idea. They thought that “wear” was the primary reason parts failed, so they wanted to stop accelerated deterioration. Thus TPM (Total Productive Maintenance) was born. This was the first attempt to connect maintenance to business goals, by monitoring OEE (Overall Equipment Effectiveness), which links equipment availability, production capacity and quality of goods produced. The goal was to achieve an OEE of 85% or above through maintenance. As a result, an elaborate people-oriented system developed over time.
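To make the OEE target concrete, here is a minimal sketch in Python. It assumes the conventional definition of OEE as the product of three ratios (availability, performance and quality), with "performance" standing in for what is called production capacity above. The function names and the shift figures are hypothetical and used only for illustration.

```python
# Assumption: OEE = availability x performance x quality, each a ratio in [0, 1].
TARGET_OEE = 0.85  # the 85% goal mentioned in the text


def oee(availability: float, performance: float, quality: float) -> float:
    """Return OEE as the product of the three ratios."""
    factors = {
        "availability": availability,
        "performance": performance,
        "quality": quality,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


def meets_target(value: float, target: float = TARGET_OEE) -> bool:
    """Check whether an OEE value reaches the target."""
    return value >= target


# Hypothetical figures: 90% availability, 95% performance, 99% quality.
shift_oee = oee(0.90, 0.95, 0.99)
print(f"OEE = {shift_oee:.1%}, target met: {meets_target(shift_oee)}")
```

Because the three ratios multiply, each of them can look healthy on its own while the combined figure still falls short of 85%, as it does with these illustrative numbers.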

While TPM was being developed, another school of thought was taking shape in maintenance. It was born in the aircraft industry and was called RCM, or Reliability Centered Maintenance. Realizing that more than 68% of failures were random in nature, RCM aimed to detect in time when a failure was about to happen and to eliminate the consequences of such failures. Its goal was not to reduce the number of failures. This is because RCM looks at system failures in a static manner, examining them one by one in complete isolation from each other. In other words, it fails to see the dynamic interactions within and outside the system that create failures in the first place.

These maintenance strategies have helped us to some extent in reducing failures and maintenance costs, but we have yet to reach a situation where we can extend the life of a machine to a desirable extent. That can be done only when we are able to eliminate failures during long production runs. The question is: can we have failure-free operation for, say, one year at a stretch?

But why haven’t we been able to achieve such failure-free operation?

The reasons are as follows:

  1. Randomness: We now understand that in manufacturing industries more than 85% of failures are random in nature.
  2. Driven by process: As Boeing has now discovered, most of the randomness is process-driven.
  3. None of the maintenance methods described above focuses on the causes of failures in terms of the interdependence, interactions and relationships between different parts of the system and its processes.
  4. Along with randomness, another major group of failures may be characterized as “early failures” or “infant failures.” Strangely, this failure pattern is quite pronounced after scheduled overhauls or preventive maintenance (the Waddington Effect).

These reasons point to a new way of maintenance that aims to achieve zero failures in a practical manner.

Briefly the process would be the following:

  1. Understand the failure patterns of the machine or system.
  2. Understand the dynamics of the system.
  3. Find the root causes of failures.
  4. Respond as per the dynamics and the causes to improve operation and maintenance.

NEME can be used as an acronym for this dynamic process, where:

N = Notice the changes

E = Engage with the patterns

M = Mull the interactions and relationships

Ex = Exchange the innovative solutions, which should need minimal intervention and minimal resources, to extend the life or MTBF (Mean Time Between Failures) of the machine or system. Improving MTBF is the heart of reliability improvement.
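Since MTBF is the measure this process tries to improve, a short sketch of how it is commonly computed may help. It assumes the usual definition, total operating time divided by the number of failures in that period. The function name and the hour and failure counts are hypothetical.

```python
# Assumption: MTBF = total operating hours / number of failures in the period.
def mtbf(operating_hours: float, failure_count: int) -> float:
    """Return the mean time between failures, in hours."""
    if operating_hours < 0:
        raise ValueError("operating_hours must not be negative")
    if failure_count <= 0:
        # With zero failures the ratio is undefined; report the run length instead.
        raise ValueError("MTBF needs at least one recorded failure")
    return operating_hours / failure_count


# Hypothetical year of operation: 8000 running hours.
before = mtbf(8000, 4)  # four failures before the improvement
after = mtbf(8000, 2)   # two failures after the improvement
print(f"MTBF before: {before:.0f} h, after: {after:.0f} h")
```

In this illustration, halving the number of failures over the same running hours doubles the MTBF, which is the sense in which extending MTBF and eliminating failures are two views of the same goal.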

This process simultaneously improves Reliability, Availability and Performance of a system and is known as Rapidinnovation.

So the new role of Maintenance can be described as below:

The function of Maintenance would be to eliminate failures, and thereby their consequences, so as to allow maximum failure-free operation for the desired period of time, as per the requirement of the user, at the minimum possible cost.

© Dibyendu De