How to Detect Defects with an Imbalanced Dataset

Published in OpenVINO-toolkit, Feb 9, 2023

About the Authors

Paula Ramos, Intel AI Evangelist, America
Zhuo Wu, Intel AI Evangelist, China
Samet Akcay, Intel AI Research Engineer/Scientist

I am always looking for ways to help developers improve their AI applications and create new solutions that solve real-world problems. Recently, at the 2022 Conference on Computer Vision and Pattern Recognition, I presented a number of different tutorials and exercises to help showcase how AI developers could quickly and easily develop performant models for edge applications.

One of the most impressive libraries I found from that presentation was Anomalib, a deep learning library for benchmarking and developing anomaly detection algorithms that can be exported to the OpenVINO™ Intermediate Representation and deployed on Intel hardware.

In this post, I, along with Samet Akcay (an Intel AI Research Engineer/Scientist and one of the owners of the Anomalib repository) and fellow AI Evangelist Zhuo Wu, will explain exactly what Anomalib is, what makes it special, and how developers can get started with it.

So, let us talk about Anomalib. 🙂 — Paula Ramos

Quality control and quality assurance are crucial components of any business’s reputation and customer experience. For instance, in the manufacturing industry, detecting anomalies on the production line ensures only the best quality products make it out the door. And in the healthcare industry, detecting anomalies early in the medical imaging process helps doctors make accurate diagnoses for their patients.

If something goes wrong in either of these scenarios, it can result in major consequences — which is why many industries are moving away from subjective and error-prone manual inspection and maintenance in favor of automated anomaly detection thanks to advancements in computer vision and deep learning technology.

But for AI to truly enhance the quality control and quality assurance experience, it must be able to leverage balanced datasets. And while good, defect-free samples are plentiful today, they alone are not always enough to make accurate and effective predictions in industrial and medical settings. Additionally, the growth of large-scale manufacturing and industrial automation has made it increasingly difficult for quality inspectors to handle large quantities of products.

Overcoming Dataset Challenges

Typically, supervised learning-based approaches that leverage sufficient annotated abnormal samples are used to achieve adequate anomaly detection results. But what if the industry norm is an unbalanced dataset that lacks representative samples in the anomalous class? And how do you define the boundary of abnormality when the defect can be any type of shape?
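To make the imbalance problem concrete, here is a small sketch with made-up numbers (995 good parts and 5 defective ones, both purely hypothetical). It shows why a model that never flags a defect can still look excellent on accuracy alone.

```python
def accuracy_and_recall(labels, predictions):
    """Return (accuracy, recall on the defect class); 1 = defect, 0 = normal."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    actual_positives = sum(labels)
    accuracy = correct / len(labels)
    recall = true_positives / actual_positives if actual_positives else 0.0
    return accuracy, recall


# Hypothetical production batch: 995 normal parts, 5 defective parts.
labels = [0] * 995 + [1] * 5

# A degenerate "model" that labels every part as normal.
always_normal = [0] * len(labels)

accuracy, recall = accuracy_and_recall(labels, always_normal)
print(f"accuracy={accuracy:.3f}, defect recall={recall:.3f}")
```

The accuracy is very high even though every defect is missed, which is why the defect class needs its own metric and why a handful of defect samples is rarely enough to train a supervised classifier.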

One way to address these issues is through unsupervised anomaly detection, which requires little to no annotation. Unsupervised anomaly detection relies solely on normal samples during the training stage and can identify anomalous samples by comparing them against the learned distribution of normal data.
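The idea of learning from normal samples only can be sketched in a few lines. The example below is an illustration, not one of Anomalib's algorithms: it assumes each image has already been reduced to a short feature vector (the values are hypothetical), fits a per-feature mean and standard deviation on defect-free samples, and scores new samples by their distance from that distribution. The 1.5x threshold margin is also an arbitrary assumption.

```python
import math
import statistics


def fit_normal_model(normal_samples):
    """Learn per-feature mean and standard deviation from normal samples only."""
    columns = list(zip(*normal_samples))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero spread so the score never divides by zero.
    stds = [max(statistics.pstdev(col), 1e-6) for col in columns]
    return means, stds


def anomaly_score(sample, model):
    """Distance from the learned normal distribution, in standard-deviation units."""
    means, stds = model
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(sample, means, stds)))


# Hypothetical feature vectors (e.g. mean R, G, B of a cube image), defect-free only.
normal_samples = [
    [0.90, 0.10, 0.10],
    [0.88, 0.12, 0.09],
    [0.91, 0.11, 0.11],
    [0.89, 0.09, 0.10],
    [0.92, 0.10, 0.12],
]
model = fit_normal_model(normal_samples)

# Threshold derived from normal data alone: a margin above the worst training score.
threshold = 1.5 * max(anomaly_score(s, model) for s in normal_samples)


def is_anomalous(sample):
    """Flag a sample whose score exceeds the threshold learned from normal data."""
    return anomaly_score(sample, model) > threshold


print(is_anomalous([0.90, 0.10, 0.10]))  # a sample close to the training data
print(is_anomalous([0.50, 0.40, 0.10]))  # a sample far from the training data
```

No defective sample is needed at any point during fitting. Real models replace the hand-made features with deep features, but the principle of comparing against a learned notion of "normal" is the same.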

One open-source, end-to-end library for unsupervised anomaly detection and localization is Anomalib, which provides state-of-the-art anomaly detection algorithms that can be customized to specific use cases and requirements.

Anomalib in Manufacturing

Let’s look at a production line with colored cubes as an example (Figure 1).

*Figure 1: Defect detection with Anomalib using an educational robot.*

We want to detect any colored cubes with a defect and prevent them from entering the production line. To do so, a camera is installed to monitor the condition of the colored cubes, and a robotic arm then acts on what the camera detects (Figure 2).

*Figure 2: Educational robot running inference with the Anomalib models.*

For anomaly detection in this scenario, there is no hardware accelerator for us to train the model at the edge. We also cannot assume there are already thousands of images, especially images with defects, collected for training at the edge. Additionally, we do not expect many defects, which is typical of a real-world manufacturing scenario.

Given these initial conditions, one of our goals here is to achieve a faster training process at the edge and perform anomaly detection with high accuracy and efficiency. It’s important to keep in mind that if any external conditions change (such as lighting, camera, or abnormalities), we will have to retrain our model, so a retraining process that does not require a lot of effort will be necessary. Lastly, to make the model useful in real manufacturing use cases, the anomaly detection model must deliver precise results at inference time.

With Anomalib’s extensive library, we were able to meet all our requirements by designing, implementing, and deploying unsupervised anomaly detection models from data collection to the edge.

How Anomalib Works

The Anomalib library provides algorithms that calculate abnormality across the image, along with tools to run these algorithms through training, evaluation, testing, benchmarking, and hyperparameter optimization. These algorithms and tools are built from ready-made modules, which can also be used to design custom algorithms.
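As an illustration of what "abnormality over the image" can mean in practice, the sketch below takes a per-pixel anomaly map and derives an image-level score and a defect mask from it. This is a simplified assumption about post-processing, not Anomalib's own code: the function name, the use of the maximum as the image score, and the fixed threshold are all hypothetical choices.

```python
def summarize_anomaly_map(anomaly_map, threshold):
    """Reduce a per-pixel anomaly map to an image score, a decision, and a mask.

    anomaly_map: 2D list of scores, higher meaning more abnormal.
    threshold:   score above which a pixel (or the image) counts as anomalous.
    """
    # Assumption: the most abnormal pixel decides the image-level score.
    image_score = max(max(row) for row in anomaly_map)
    # Binary localization mask: 1 marks pixels considered defective.
    mask = [[1 if value > threshold else 0 for value in row] for row in anomaly_map]
    return image_score, image_score > threshold, mask


# Hypothetical 3x3 anomaly map with a small defect on the middle row.
anomaly_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.7],
    [0.0, 0.2, 0.1],
]

image_score, has_defect, mask = summarize_anomaly_map(anomaly_map, threshold=0.5)
print(image_score, has_defect)
for row in mask:
    print(row)
```

The image-level decision answers "is this part defective?", while the mask answers "where is the defect?", which is what a downstream step such as a robotic arm or a human reviewer typically needs.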

In Figure 3, we include deployment among the tools and modules to show that the library covers this part as well.

*Figure 3: Tools, components and modules of Anomalib.*

Figure 4 gives a high-level overview of the training-to-deployment workflow. We used PyTorch Lightning for training and testing, and ONNX and OpenVINO™ for optimization; TensorFlow, PyTorch, and OpenVINO can be used for deployment.

*Figure 4: High-level overview of the training-to-deployment workflow process*

Want to learn more?

In a follow-up post, Hands-On Lab: How to Perform Automated Defect Detection Using Anomalib, we walk through an example of how to use the Anomalib library with a custom dataset and use case. To try Anomalib for yourself, download and install it, and check out this getting started example.

To get started with Anomalib in your own AI applications, you can clone the OpenVINO notebooks repository and use over 270 deep learning models from Open Model Zoo.

About us:

Paula Ramos has been developing novel integrated engineering technologies (mainly in computer vision, robotics, and machine learning applied to agriculture) since the early 2000s in Colombia. During her PhD and postgrad research, she deployed multiple low-cost, smart edge and IoT computing technologies that can be operated by people without expertise in computer vision systems, such as farmers. Her inventions run in rugged and critical conditions, such as farming and outdoor environments without lighting control, under full-sun radiation, or even at extreme high temperatures. Currently, she’s an AI Evangelist at Intel, developing intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs.

Samet Akcay is an AI Research Engineer/Scientist. His primary research interests are real-time image classification, detection, anomaly detection, and unsupervised feature learning via deep/machine learning algorithms. He recently co-authored and open-sourced Anomalib, one of the largest anomaly detection libraries in the field. Samet holds a PhD from the Department of Computer Science at Durham University, UK, and received his MSc degree from the Robust Machine Intelligence Lab at the Department of Electrical Engineering at Penn State University, USA. He has over 30 academic papers published in top-tier computer vision and machine/deep learning conferences and journals.

Zhuo Wu is an AI evangelist at Intel focusing on the OpenVINO™ toolkit. Her work ranges from deep learning technologies to 5G wireless communication technologies. She has made contributions in computer vision, machine learning, edge computing, IoT systems, and wireless communication physical layer algorithms. She has delivered end-to-end machine learning and deep learning-based solutions to business customers in different industries, such as automobile, banking, and insurance. She also carried out extensive research in 4G-LTE and 5G wireless communication systems and filed multiple patents while working as a research scientist at Bell Labs in China. She led several research projects as the principal investigator when she was an associate professor at Shanghai University.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.