VisionGuard: My Google Summer of Code 2024 Experience with OpenVINO

Inbasekaran Perumal
Published in OpenVINO-toolkit · 9 min read
VisionGuard Final Demo Slides

As my 16-week journey with Google Summer of Code (GSoC) 2024 comes to a close, I’m excited to share VisionGuard — a desktop application designed to reduce eye strain and promote healthier screen usage. Developed under the OpenVINO Toolkit, VisionGuard is a personal and technical milestone that applies advanced computer vision to solve a common, everyday issue.

The Journey to GSoC

My name is Inbasekaran Perumal, and I recently graduated from the National Institute of Technology, Surathkal (class of 2024), specializing in Electronics & Communication Engineering. Throughout my academic journey, I’ve been deeply interested in the intersection of hardware and software, particularly when it comes to integrating AI and machine learning into real-world applications. Technologies like computer vision and image processing have always captivated me, which is why OpenVINO became the perfect open-source project to align with my interests.

My introduction to OpenVINO happened during a technical session led by Intel experts. The workshop on neural network inference using Intel FPGAs left a lasting impression on me. That’s when I realized that contributing to OpenVINO could offer the perfect balance of learning and practical application.

I first heard about Google Summer of Code in my freshman year, when seniors and professors often spoke about their involvement in open-source projects. While the idea of contributing to such a large global initiative initially seemed overwhelming, my professors and seniors assured me that GSoC was an excellent opportunity to grow both technically and personally. Their support helped me overcome my doubts, and I finally decided to take the leap.

I hope to one day write a detailed guide for beginners interested in contributing to C++-based open-source projects, as it can be challenging at first. Such a guide could help others avoid the same struggles I faced while getting started.

The Inspiration Behind VisionGuard

The idea for VisionGuard arose from a common challenge many of us faced during the COVID-19 pandemic: increased screen time and its associated health issues. Like many others, I experienced eye strain, headaches, and disrupted sleep patterns due to prolonged computer use during remote learning. This personal experience motivated me to create a tool that not only monitors screen time but also actively encourages healthier screen habits.

Project Overview

VisionGuard is a cross-platform desktop application designed to help users manage screen time and reduce eye strain. Developed as part of Google Summer of Code 2024 under the OpenVINO Toolkit, it leverages real-time gaze tracking, customizable break reminders, and screen time analytics — all with a focus on privacy through on-device processing.

Built using C++ and integrating OpenVINO, Qt, and OpenCV, VisionGuard optimizes for CPU, GPU, and NPU platforms, ensuring smooth performance while keeping the user experience seamless and intuitive.

More information can be found in the VisionGuard GitHub Repository.

Key Features

Gaze Tracking and Calibration

  • Utilizes models from the OpenVINO Model Zoo for accurate, real-time gaze estimation
  • Edge inference for responsive feedback

Customizable Break Reminders

VisionGuard main screen with notification alert
  • Based on the 20–20–20 rule for eye strain reduction
  • Allows users to set custom break durations and intervals

Screen Time Analytics

  • Provides detailed daily and weekly usage reports
  • Offers insights into screen usage patterns

Multi-Hardware and Multi-Camera Support

  • Optimized for various hardware configurations
  • Supports up to five cameras for enhanced tracking accuracy

User-Friendly Interface

  • Clean, intuitive design with light/dark themes
  • System resource monitoring and frame rate optimization
  • Ability to switch between model precisions (FP32, FP16, INT8-FP16) and inference devices

System Tray Integration

  • Runs unobtrusively in the background
  • Easy access via system tray

User Data Privacy

  • All processing occurs locally on the user’s device
  • Ensures user data remains secure and private

Technical Architecture

VisionGuard’s high-level architecture diagram

VisionGuard’s architecture comprises three main components:

  1. Client Interface: Handles user interactions and displays information
  2. Backend Processing: Manages gaze tracking, break reminders, and data analysis
  3. Data Storage: Stores user preferences and usage statistics

For a more detailed explanation of VisionGuard’s architecture, please refer to our comprehensive documentation.

VisionGuard: A Closer Look Under the Hood

As we delve deeper into the inner workings of VisionGuard, I’d like to share some of the fascinating technical details that make this application tick. While there’s a lot going on behind the scenes, today we’ll focus on three key components: gaze detection, calibration, and screen time tracking.

Gaze Detection

The gaze detection system in VisionGuard integrates multiple neural networks from the OpenVINO Model Zoo. Each model in the pipeline contributes to determining the user’s gaze direction:

  1. Face Detection: Identifies faces within the video feed.
  2. Head Pose Estimation: Calculates the orientation of the detected face, outputting yaw, pitch, and roll angles.
  3. Facial Landmark Detection: Locates key facial points, particularly around the eyes, to precisely define eye regions.
  4. Eye State Estimation: Determines whether the user’s eyes are open or closed.
  5. Gaze Estimation: Combines the outputs from previous models to estimate the user’s gaze direction.

This pipeline operates multiple times per second, enabling real-time gaze tracking.
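To make the final stage concrete, here is a minimal sketch of how yaw and pitch angles (in degrees, following the Tait-Bryan convention the head-pose model uses) can be converted into a unit 3D gaze vector. The struct and function names are my own for illustration; this is not VisionGuard’s actual code.

```cpp
#include <cmath>

// Illustrative helper (not taken from the VisionGuard sources): turn
// gaze angles reported as yaw/pitch in degrees into a unit direction.
struct Vec3 { double x, y, z; };

Vec3 gazeVectorFromAngles(double yawDeg, double pitchDeg) {
    const double deg2rad = std::acos(-1.0) / 180.0;
    const double yaw = yawDeg * deg2rad;
    const double pitch = pitchDeg * deg2rad;
    // Looking straight at the camera is (0, 0, -1);
    // yaw rotates the vector left/right, pitch up/down.
    return Vec3{
        std::sin(yaw) * std::cos(pitch),
        std::sin(pitch),
        -std::cos(yaw) * std::cos(pitch)
    };
}
```

A downstream consumer can then intersect this vector with the screen plane to obtain a 2D gaze point, which is what the calibration and tracking stages below operate on.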

Screen Calibration

To account for variations in user setups, VisionGuard implements a calibration process:

  1. Initialization: The process begins by displaying four points at the corners of the screen.
  2. Data Collection: As the user looks at each point for approximately 1.2 seconds, the system captures multiple gaze data points.
  3. Boundary Calculation: Using a convex hull algorithm, the system creates a polygon encompassing all captured gaze points.
  4. Margin Addition: To account for potential inaccuracies, the system extends the boundary slightly beyond the calculated convex hull.

This calibration helps define the screen’s boundaries in the user’s visual field, which is crucial for accurate screen time tracking.
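Steps 3 and 4 can be sketched with standard geometry: Andrew’s monotone chain builds the convex hull of the captured gaze points, and the margin is added by scaling the hull slightly outward from its centroid. The names and the centroid-scaling approach are illustrative assumptions, not VisionGuard’s exact implementation.

```cpp
#include <algorithm>
#include <vector>

struct Point { double x, y; };

// 2D cross product of (o->a) and (o->b): positive for a left turn.
static double cross(const Point& o, const Point& a, const Point& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: returns the hull in counter-clockwise order.
std::vector<Point> convexHull(std::vector<Point> pts) {
    std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Point> hull;
    // Build the lower hull left-to-right, then the upper hull
    // right-to-left; collinear points are dropped.
    for (int pass = 0; pass < 2; ++pass) {
        const size_t start = hull.size();
        for (const Point& p : pts) {
            while (hull.size() >= start + 2 &&
                   cross(hull[hull.size() - 2], hull.back(), p) <= 0)
                hull.pop_back();
            hull.push_back(p);
        }
        hull.pop_back();                       // endpoint repeats next pass
        std::reverse(pts.begin(), pts.end());  // second pass goes right-to-left
    }
    return hull;
}

// Margin step: push every vertex outward from the hull's centroid,
// e.g. scale = 1.1 for a 10% safety margin.
std::vector<Point> addMargin(const std::vector<Point>& hull, double scale) {
    Point c{0, 0};
    for (const Point& p : hull) { c.x += p.x; c.y += p.y; }
    c.x /= hull.size();
    c.y /= hull.size();
    std::vector<Point> out;
    for (const Point& p : hull)
        out.push_back({c.x + (p.x - c.x) * scale, c.y + (p.y - c.y) * scale});
    return out;
}
```

In practice the input points come from the four-corner collection step above; interior points (e.g. noise near the screen center) fall away automatically because the hull keeps only the outermost vertices.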

Screen Time Tracking

Once calibrated, VisionGuard tracks screen time using the following process:

  1. Gaze Vector Projection: The 3D gaze vector from the gaze estimation model is projected onto a 2D plane representing the screen.
  2. Boundary Check: A ray-casting algorithm determines whether the projected gaze point falls within the calibrated screen boundary.
  3. Time Calculation: When the gaze point is within the boundary, the system increments the screen time counter. The counter pauses when the gaze point is outside the boundary.

This process repeats frequently, allowing for precise tracking of screen time, including brief glances away from the screen. However, it’s worth noting that the system’s effectiveness can vary based on factors such as lighting conditions, user movement, and hardware capabilities.

Benchmarking Results

To assess VisionGuard’s performance across different hardware configurations, we conducted extensive testing on various platforms, including an M3 Pro, Intel AI-PC, and Asus TUF A-15. For this discussion, we’ll focus on the Neural Processing Unit (NPU) results, which provide some interesting insights.

NPU Performance (Intel AI PC)

Key Takeaways

Our benchmarking results suggest that:

  • NPU configurations offer an optimal balance of performance and efficiency.
  • While iGPU setups provide high performance, they come at the cost of increased resource utilization.
  • INT8-FP16 quantization proves beneficial across the board, improving performance and reducing resource usage.
  • Higher FPS limits showcase the capabilities of each device but at the expense of increased resource utilization.

For a more detailed analysis of our benchmarking results, please refer to our comprehensive benchmarking document.

Current Status and Future Work

VisionGuard has reached a fully operational state, with core features such as real-time gaze tracking, break notifications, and screen time analytics functioning effectively. Pre-built binaries are available for Windows and macOS, which can be found on our GitHub Releases Page.

While we’ve made significant progress, there’s always room for improvement. Some areas we’re considering for future development include:

  1. Expanding support to various Linux distributions (Debian, Fedora, Arch).
  2. Implementing comprehensive unit testing to enhance reliability.
  3. Developing robust GitHub workflows for continuous integration and deployment (CI/CD).
  4. Adding support for multi-monitor setups and multi-user environments.

Development Challenges and Learnings

My GSoC experience came with plenty of challenges, each one playing a huge role in helping me grow as a developer:

  • Cross-Platform C++ Development & Deployment: Ensuring VisionGuard worked smoothly across macOS, Windows, and Linux wasn’t easy. From quirky compiler errors to troubleshooting OpenVINO’s Model Zoo demo with MSVC 2022, there were plenty of hurdles that required serious debugging sessions.
  • Mastering CMake: Managing a project that spans multiple platforms meant getting comfortable with CMake. This tool became essential in handling dependencies and building configurations, and mastering it was a key milestone for me.
  • Low-Level C++ Design: Working with C++ on a deeper level was a whole new world. Implementing efficient object-oriented patterns and navigating memory management was challenging, but it gave me a solid foundation in how C++ really works under the hood.
  • Screen Calibration Accuracy: Making the screen calibration both accurate and user-friendly across different screen sizes was tougher than it sounds! Balancing technical precision with usability pushed me to think creatively about solving this problem.
  • Modern C++ Standards: Sticking to current C++ standards meant I had to rethink how I managed permissions and data storage. Moving away from simple file storage to more secure and efficient resource handling was another valuable learning experience.
  • Cross-Platform Debugging & WinDbg: One of the trickiest moments came when my application, which ran fine on Mac, suddenly crashed on Windows for no apparent reason. This pushed me to dive into tools like WinDbg to analyze crash dumps and debug the GUI on Windows. It was frustrating at first, but by the end, I had sharpened my cross-platform debugging skills.

Before GSoC, most of my knowledge was theoretical. Throughout this journey, I picked up practical C++ skills, got hands-on experience with cross-platform development, learned to figure things out on my own, and became adept at troubleshooting tricky compiler issues. Working on VisionGuard gave me a deeper understanding of C++ design, memory management, and even some computer vision concepts.

Conclusion

Participating in GSoC with OpenVINO and working on VisionGuard has been a deeply rewarding experience. It allowed me to work with advanced technology, collaborate with amazingly skilled mentors, and create a tool that can positively impact users’ digital well-being.

One of the key takeaways from GSoC is that the experience goes beyond just coding. Building strong relationships with mentors and the community has been invaluable. I learned that staying proactive, asking questions, and embracing challenges can lead to significant personal growth.

For future participants, I’d recommend choosing a project that genuinely excites you — it’ll keep you motivated. Don’t hesitate to reach out to your mentors and the community; their guidance can make a big difference. Overall, GSoC has been an incredible opportunity, and I’m truly grateful for everything I’ve learned along the way.

Acknowledgments

I extend my sincere gratitude to my mentors, Dmitriy Pastushenkov and Ria Cheruvu, for their guidance throughout this journey. Thanks also to Abhishek Nandy for providing the AI-PC for benchmarking, and the OpenVINO Org Admins Adrian Boguszewski and Zhuo Wu for their support in coordinating the GSoC experience. Lastly, I’m grateful to my friends Vaishali S, Pranav M Koundinya, and Samhita R for their assistance in testing VisionGuard.

Project Resources

For those interested in exploring VisionGuard further or contributing to its development, we’ve compiled a list of relevant resources:

Presentations

Throughout the GSoC program, we presented our progress at various stages:

Project Planning

For insights into our project planning and proposal:

References

The development of VisionGuard relied on numerous resources and technologies. Here are the key references:

  1. Gaze Estimation Demo (OpenVINO Model Zoo)
  2. OpenVINO Model Zoo
  3. OpenVINO Toolkit
  4. OpenCV
  5. Qt6
  6. CMake
  7. 20–20–20 Rule for Eye Strain
  8. Point in Polygon Algorithm
  9. Convex Hull Algorithm
  10. Tait-Bryan Angles
  11. NSIS
  12. Google Summer of Code
  13. Microsoft Visual C++
