My Google Summer of Code Experience on Its 20th Anniversary

Bepitic · Published in OpenVINO-toolkit · 4 min read · Sep 13, 2024

Integrating Vision Large Language Models into Anomalib

(This can also be found in my portfolio/CV.)

During my time as a Software Engineer — Machine Learning in the Google Summer of Code program (May 2024 — Aug 2024), I worked on an exciting project with OpenVINO™’s Anomalib. My main focus was integrating Vision Large Language Models (VLLMs) into the Anomalib framework to enable Zero-Shot and Few-Shot anomaly detection.

Project Goals

  • Implement VLLM integration within the Anomalib framework.
  • Develop Zero-Shot and Few-Shot learning capabilities for anomaly detection.
  • Optimize the performance of the integrated models.
  • Create comprehensive documentation and examples for future users.
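
The first two goals essentially mean exposing the VLM-backed detector through Anomalib’s usual datamodule/engine workflow. The sketch below illustrates that idea; the `VlmAd` class name and its arguments are assumptions for illustration, so check the Anomalib documentation for the released API.

```python
# Minimal sketch of driving a VLM-based detector through Anomalib's
# standard Engine/datamodule workflow. The class name `VlmAd` and its
# arguments are assumptions; the released API may differ.
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import VlmAd  # assumed class name

# MVTec AD category used as an example dataset
datamodule = MVTec(category="bottle")

# k_shot=0 -> zero-shot: only the query image is sent to the model.
# k_shot=2 -> few-shot: two normal reference images are added to the prompt.
model = VlmAd(model="gpt-4o", k_shot=0)  # assumed arguments

engine = Engine()
engine.test(model=model, datamodule=datamodule)
```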

Results and Conclusions

After integrating this model into the library, we ran several experiments comparing its performance to the baseline (WinClip). Below are some prompt outputs generated by the model, rendered seamlessly through the library’s existing visualization system.

Figure: prompt outputs for 0-shot GPT-4o and 2-shot GPT-4o.
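
To make these prompt outputs more concrete, the snippet below sketches the kind of zero-shot request one can send to GPT-4o through the OpenAI Python SDK. The prompt wording and image handling are illustrative assumptions, not the exact prompt used in the project.

```python
# Rough sketch of a zero-shot anomaly query to GPT-4o via the OpenAI SDK.
# The prompt text is an assumption, not the project's actual prompt.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_b64 = encode_image("test_image.png")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is there an anomaly or defect in this image? "
                         "Answer YES or NO and describe it briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```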

The results, as shown in the table further down, are quite promising, demonstrating significant performance improvements in several areas.

However, these results should be viewed with a healthy dose of skepticism. Since ChatGPT is trained on a non-disclosed dataset, it’s impossible to fully rule out the possibility that the evaluation dataset might overlap with its training data. This could lead to inflated performance results that don’t reflect true generalizability.

While the initial findings are exciting, further experiments on a carefully controlled dataset will be necessary to ensure the model’s robustness in real-world applications.

Table: comparison of GPT vs. WinClip.

Key Achievements

  • Successfully integrated state-of-the-art VLLMs into Anomalib.
  • Developed a flexible architecture that supports both Zero-Shot and Few-Shot learning paradigms.
  • Achieved significant improvements in anomaly detection accuracy compared to traditional methods.
  • Contributed to the open-source community by making the integration publicly available.
  • Created extensive documentation, including tutorials and code examples.

Challenges and Learning

Throughout the project, I encountered specific challenges related to model performance. The OpenAI ChatGPT model worked effectively out of the box, delivering satisfactory results. However, the open-source models presented significant difficulties — they were not trained to handle multiple images effectively or did not perform well in the tasks required. These challenges highlighted the limitations of current open-source models in comparison to proprietary solutions and underscored the importance of continued development and training for such models.

Code and Documentation

You can find the code I worked on during this project at the following repositories:

Code Not Merged

During the project, several models were explored but ultimately not merged due to performance issues. Below are the details of these models and the reasons they were not integrated into the main branch:

  • LLaVA Model — This model demonstrated poor performance and was unable to handle multi-shot scenarios effectively.
  • LLaVA Next Model — Although this model performed better in zero-shot scenarios, it still struggled with multi-shot tasks, leading to its exclusion from the final integration.
  • Ollama Wrapper — A wrapper for the Ollama model zoo, which similarly lacked the capability to handle multi-shot scenarios, resulting in suboptimal performance.
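
For reference, querying a local LLaVA model through the Ollama Python client looks roughly like the snippet below. This is a hedged sketch of the kind of call such a wrapper would issue, not the wrapper’s actual implementation.

```python
# Rough sketch of a zero-shot query against a local LLaVA model via the
# Ollama Python client; not the wrapper's actual code.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Is there an anomaly or defect in this image? "
                       "Answer YES or NO and describe it briefly.",
            "images": ["test_image.png"],  # path to the query image
        }
    ],
)
print(response["message"]["content"])
```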

What’s Left to Do

While significant progress has been made, there are still several tasks left to complete:

  • Real-Time Inference Enhancements: Additional work is required to reduce latency and improve the speed of the anomaly detection models in real-time applications.
  • Extended Testing and Validation: Comprehensive testing across a broader range of datasets and anomaly types is needed to validate the robustness and generalizability of the models.
  • User Documentation and Tutorials: Although initial documentation has been created, more in-depth tutorials and guides are needed to help users fully leverage the new features.
  • Community Feedback and Iteration: Engage with the open-source community to gather feedback, address issues, and iteratively improve the integration based on real-world use cases.

Notices & Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
