Computer Vision for Smart Maintenance

Savina Diez
Data & Smart Services
5 min read · Mar 10, 2020


Support during the cleaning process of a paint spray gun

Complex tools and their maintenance procedures can be challenging for end users. Think of the work of a craftsman: he must know how to use his tools, but not necessarily how to repair them.

Maintenance, however, covers not only repairs, but also cleaning and inspection. Instructions, for example in the form of a video tutorial, can support the user with these procedures, but they always follow a linear sequence and cannot address specific problems. In many cases, only the advice of a professional can help.

The use of Machine Learning to cope with complex tasks and to automate work has already become established in various areas. Looking at its many possible applications, the following question arises:

Can personal assistance from a professional be replaced by an Artificial Intelligence?

Seeing Machines

A machine with the same qualities as a professional must be able to observe the scene and interpret what is seen. These capabilities are today summarized under the term Computer Vision.

Computer Vision is already in use in a wide variety of areas. At Frankfurt Airport, for example, a face recognition system replaces passport control by officials. This not only automates work that is monotonous for humans, but also eliminates human error.

Even extremely elaborate processes, such as counting cars on a certain section of road, can now be handed over to “seeing” machines. This opens up the possibility of monitoring the traffic flow of an entire city, so that in the future, intelligent infrastructure decisions can be made much more efficiently.

The logical next step is therefore to make even monotonous maintenance procedures smart. Supervising a maintenance process requires a camera that films the user during the process, intelligent processing of the images, and a user interface that displays the appropriate instructions.


In this article, the object to be maintained is referred to as a “tool”. It is assumed that it consists of removable individual parts that are relevant to the maintenance process.

Just like a professional, the system must have the following characteristics:

  • Knowledge of the tool and its components and their correct arrangement
  • Knowledge of the sequence of maintenance steps
  • The ability to give concrete instructions for action, based on observations

Based on these required capabilities, the following core components are defined:

Trained Machine Learning model to recognize the individual parts of the tool (object detector)

The individual parts in the image are to be predicted with the help of Artificial Intelligence. This means that a Machine Learning model is needed that has learned what the components of the tool look like. Since such a model is adapted to one specific object, a new model must be trained for each tool.

Status recognition system to interpret what has been seen

Rarely will it be enough simply to detect the objects in the image; their presence must be interpreted in some way. The complexity of the status detection system depends on the maintenance process. For each process, a process description in table form must be created, which the system uses to inform the user about the next step.
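Such a table-driven status detector can be sketched in a few lines. The following Python sketch is an illustration only; the part names, confidence threshold, and the two-step spray-gun process are invented for this example:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    part: str           # identifier of the detected part
    box: tuple          # (x, y, w, h) position in the frame
    score: float        # detector confidence

class StatusDetector:
    """Tracks the maintenance status using a tabular process description."""

    def __init__(self, process_table):
        # process_table: list of (required_parts, advice_text) per step
        self.table = process_table
        self.step = 0

    def update(self, detections):
        present = {d.part for d in detections if d.score > 0.5}
        required, _ = self.table[self.step]
        # Advance when all parts required for the current step are visible.
        if required <= present and self.step < len(self.table) - 1:
            self.step += 1
        return self.table[self.step][1]

# Invented two-step process description for a spray gun:
process = [
    ({"gun_body"}, "Remove the nozzle."),
    ({"gun_body", "nozzle"}, "Clean the nozzle and reattach it."),
]

status = StatusDetector(process)
frame_detections = [Detection("gun_body", (10, 10, 50, 50), 0.9),
                    Detection("nozzle", (70, 10, 20, 20), 0.8)]
print(status.update(frame_detections))
```

In practice the table would be loaded from a file per tool, so that a new process only requires a new table, not new code.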

User Interface

The user interface of the application should at least contain the instructions that the user needs to perform the process. It is also useful to show feedback about the detected objects. Buttons for navigating through the process and for user feedback (for example, whether a detection was correct) could also be useful.

Flow of a Support Procedure

The minimum procedure consists of the following steps:

  1. The user holds the tool in front of the camera. The image from the stream (one frame at a time) is sent to the object detector.
  2. The object detector identifies the individual parts of the tool. To do this, it uses a trained Machine Learning model that knows what the parts look like.
  3. The existing parts and their positions in the image are forwarded to the status detector.
  4. The status detector knows the current maintenance status. Based on the process description, it checks whether a status change should take place.
  5. The status detector sends the current advice text to the user interface.
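The five steps above can be sketched as a minimal loop. The camera, object detector, and UI are stand-in stubs here; all names are assumptions for illustration, not a real API:

```python
# Step 1: pull one frame at a time from the camera stream.
def get_frame(stream):
    return next(stream)

# Step 2: the trained model would run here; stubbed with a fixed detection.
def detect_parts(frame):
    return [("nozzle", (70, 10, 20, 20), 0.8)]

# Steps 3-4: compare detected parts against the process description
# and advance the maintenance status if appropriate.
def next_advice(detections, state):
    present = {name for name, _, _ in detections}
    if "nozzle" in present:
        state["step"] = max(state["step"], 1)
    return ["Remove the nozzle.", "Clean the nozzle."][state["step"]]

# Step 5: display the current advice text in the UI.
def show(text):
    print(text)

state = {"step": 0}
frames = iter(["frame0"])
detections = detect_parts(get_frame(frames))
show(next_advice(detections, state))
```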
System overview


The most prominent architecture for Machine Learning-based image classification is the Convolutional Neural Network (CNN). CNNs are characterized by their arrangement of special layers, in each of which so-called “feature maps” are created from the input data. The operating principle of a CNN is based on pattern recognition, whereby each deeper layer can recognize more complex patterns than the previous one.
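The creation of a feature map can be illustrated without any ML framework: a small kernel slides over the image, and each output value measures how strongly the local patch matches the kernel's pattern. A minimal NumPy sketch (toy image and hand-picked kernel, not learned weights):

```python
import numpy as np

def conv2d(image, kernel):
    """Compute one feature map by sliding the kernel over the image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # dot product = pattern match
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                  # vertical edge in the middle

# A kernel that responds strongly to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]])

feature_map = conv2d(image, kernel)
print(feature_map)                  # high values where the edge sits
```

In a trained CNN, the kernels are not hand-picked but learned, and each layer stacks many such feature maps on top of each other.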

To make the CNN training process more efficient, Transfer Learning was used. Transfer Learning refers to the adaptation of an already pre-trained artificial neural network to a new task. For this purpose, certain weights of the network can be loaded and trained further or adapted to new objects. To perform Transfer Learning, the new data (images of the parts) together with their annotations (positions of the parts in the image and their identifiers) are required. During training, you can decide whether all weights of the network or only the final classification layers should be adjusted further. For a network that can already distinguish many different objects, adjusting the final layers is usually enough.
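The core idea, freezing the pre-trained layers and training only the final classifier, can be illustrated framework-free. The toy model below is an invented example, not the article's actual setup: a fixed random “feature extractor” stands in for the pre-trained network, and gradient updates touch only the final layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights come from a pre-trained network; they stay frozen.
W_frozen = rng.normal(size=(4, 8))

# New task: binary classification of 4-dimensional inputs (synthetic data).
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w_head = np.zeros(8)                # trainable final layer
b_head = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

features = np.tanh(X @ W_frozen)    # frozen layer: computed once, never updated

losses = []
for step in range(200):
    p = sigmoid(features @ w_head + b_head)
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    losses.append(loss)
    # Gradients flow only into the head; W_frozen is left untouched.
    grad = p - y
    w_head -= 0.5 * (features.T @ grad) / len(y)
    b_head -= 0.5 * grad.mean()

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only the small head is optimized, which is why Transfer Learning needs far less data and compute than training the whole network from scratch.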


In principle, the conditions are fulfilled for replacing the professional with the presented Computer Vision-based support system.

It remains to be worked out how complex the customization is in practice. A suitable method is required so that the system can be adapted to one's own tool, even by a Computer Vision layman.

It is crucial that the user interface is well designed. The user must be able to clearly recognize which step to take next. So that he can concentrate on the maintenance procedure, as little interaction as possible should be required of him. Nevertheless, it makes sense to integrate user feedback and the possibility of manual navigation through the process.

Finally, the system could be integrated into an Augmented Reality Head Mounted Device (HMD) to make the repair, cleaning and inspection of complex tools even more convenient.