The Final Chapter: Visualizing Predictions and Analyzing the Results

Ayush Raj
9 min read · Feb 21, 2024


Okay, so we’re heading towards the last part of our Project.

This is the final installment to wrap up our Object Detection Series. For the best comprehension and a seamless reading experience, I'd advise following the series chronologically. The structure is mapped out here: Series Introductory Blog.

In this article, I’ll be delving into the fascinating realm of visualizing predictions and analyzing the results of our first two projects. In this edition, we’ll shine a spotlight on the precision, recall, and mAP (mean Average Precision) metrics, which offer valuable insight into the performance of both projects. We’ll explore how these metrics quantify the accuracy and effectiveness of our object detection models, from visually inspecting predictions to conducting in-depth analyses, and uncover the successes and challenges encountered along the way. Get ready for a data-driven exploration that will deepen your understanding of our projects and the exciting world of computer vision! Okay, so let’s see them one by one:

Project 1

We’ve used a pre-trained Faster R-CNN model with a ResNet-50 backbone, fine-tuned on this dataset’s train split.

Then we evaluated the model on the test split. The results look like this:

This is the Result obtained on a random image of the test dataset.

Green and yellow boxes represent the ground-truth and predicted boxes, respectively.

As you can see, the predictions are quite good, although there are some redundant boxes on both sides of the image. But that is okay; after all, you can’t expect the model to be 100% correct. And that is synonymous with life, isn’t it? Everything can’t be perfect. Sorry for the small detour into philosophy, haha. :) Coming back to the point, what matters most is that the remaining boxes should have a high IoU with the corresponding ground-truth boxes.
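If you’d like to reproduce this kind of overlay on your own images, here is a minimal sketch using OpenCV; the image path and box coordinates are placeholders for illustration, not values from our pipeline.

```python
# Sketch: overlaying ground-truth (green) and predicted (yellow) boxes on an image.
# The image path and box coordinates are placeholders for illustration.
import cv2

image = cv2.imread("test_image.jpg")

ground_truth = [(50, 60, 150, 180)]  # (x1, y1, x2, y2) actual boxes
predictions = [(55, 58, 148, 182)]   # (x1, y1, x2, y2) predicted boxes

for x1, y1, x2, y2 in ground_truth:
    cv2.rectangle(image, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=2)    # green (BGR)
for x1, y1, x2, y2 in predictions:
    cv2.rectangle(image, (x1, y1), (x2, y2), color=(0, 255, 255), thickness=2)  # yellow (BGR)

cv2.imwrite("overlay.jpg", image)
```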

Project 2

We’ve used the SOTA model YOLOv8 (medium version), fine-tuned on the same dataset.

And the results, as well as the inference speed, were quite promising. Have a look:

Results obtained on an image of the Test Dataset.

Here, not only was the model able to predict the helmet class accurately with good confidence, it was also able to predict the head class well. Again, there are some redundant boxes. The numbers in the output are confidence scores for the classes mentioned next to them.

Comparing the two Models

For evaluating the performance of both the models, we turn to key metrics such as precision, recall, and mean Average Precision (mAP). Precision measures the accuracy of our model’s predictions by assessing the proportion of true positive detections among all positive predictions. On the other hand, recall evaluates the model’s ability to correctly identify all relevant instances, capturing the proportion of true positive detections among all actual positive instances. These metrics provide valuable insights into the trade-off between precision and recall, allowing us to fine-tune our models for optimal performance.
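To make these definitions concrete, here is a tiny sketch that computes precision and recall from raw detection counts; the helmet counts below are made up purely for illustration.

```python
# Minimal sketch: precision and recall from detection counts.
# The counts below are placeholders for illustration, not our actual results.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example: 90 correct helmet detections, 10 false alarms, 20 missed helmets
p, r = precision_recall(tp=90, fp=10, fn=20)
print(f"Precision: {p:.2f}, Recall: {r:.2f}")  # Precision: 0.90, Recall: 0.82
```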

Furthermore, mean Average Precision (mAP) offers a comprehensive assessment of our model’s overall accuracy across different object classes. For each class, the Average Precision (AP) is the area under that class’s precision-recall curve; mAP then averages the AP over all classes, providing a single numerical value that summarizes the model’s performance across the entire dataset. This metric serves as a powerful tool for comparing the effectiveness of different models and tracking improvements over time.
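If you want to compute mAP yourself rather than read it from a framework’s report, libraries such as torchmetrics expose a detection metric. Below is a minimal sketch with a single dummy prediction and ground-truth box; all tensors and the class mapping are illustrative, not from our dataset.

```python
# Minimal sketch: mAP@0.5 with torchmetrics (dummy boxes for illustration only).
import torch
from torchmetrics.detection import MeanAveragePrecision

metric = MeanAveragePrecision(iou_thresholds=[0.5])

# One image: a single predicted "helmet" box vs. its ground-truth box
preds = [{
    "boxes": torch.tensor([[50.0, 50.0, 150.0, 150.0]]),  # xyxy format
    "scores": torch.tensor([0.91]),
    "labels": torch.tensor([0]),  # class 0 = helmet (example mapping)
}]
targets = [{
    "boxes": torch.tensor([[55.0, 48.0, 152.0, 149.0]]),
    "labels": torch.tensor([0]),
}]

metric.update(preds, targets)
print(metric.compute()["map_50"])  # mAP at an IoU threshold of 0.5
```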

By analyzing these metrics, we gain valuable insights into the strengths and weaknesses of our models, enabling us to make informed decisions for future enhancements and optimizations.

Evaluation on Precision and Recall Metrics

Just to reiterate, we’ve used YOLOv8 Medium version everywhere (yolov8m).

Here, we’ll present the results that we got during our evaluation.

We’ve calculated the precision and recall per class for all 3 classes, but our point of interest since the beginning has been the helmet class, so I’ve shown only that class. Have a look:

Image by Author
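For context, per-class precision and recall for a fine-tuned YOLOv8 model can be pulled from Ultralytics’ validation API. The sketch below shows one way to do it, with placeholder paths for the weights and dataset YAML; it is illustrative, not our exact evaluation script.

```python
# Sketch: evaluating a fine-tuned YOLOv8m model with Ultralytics' val API.
# Paths are placeholders; point them at your own weights and dataset YAML.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # fine-tuned yolov8m weights
metrics = model.val(data="data.yaml", split="test")

print("Mean precision:", metrics.box.mp)
print("Mean recall:   ", metrics.box.mr)
print("mAP@0.5:       ", metrics.box.map50)

# Per-class precision/recall arrays align with metrics.box.ap_class_index,
# so the helmet class can be picked out from there.
for cls_idx, p, r in zip(metrics.box.ap_class_index, metrics.box.p, metrics.box.r):
    print(model.names[cls_idx], f"precision={p:.3f}", f"recall={r:.3f}")
```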

After examining the precision and recall values for both YOLO and Faster R-CNN, it’s evident that there is a significant disparity in performance, particularly with Faster R-CNN exhibiting notably lower values. This discrepancy highlights potential shortcomings in the Faster R-CNN model’s ability to accurately detect objects compared to YOLO.

Possible reasons for the lower precision and recall values with Faster R-CNN could include issues such as:

  1. Difficulty in capturing intricate object details: Faster R-CNN may struggle to accurately detect objects with complex shapes or textures, leading to missed detections or false positives.
  2. Insufficient training data: The Faster R-CNN model may not have been adequately trained on diverse and representative data, limiting its ability to generalize to new or unseen scenarios. You already knew this: as mentioned earlier, our dataset was heavily imbalanced.
  3. Suboptimal model architecture: The design and configuration of the Faster R-CNN model architecture may not be optimized for the specific object detection task at hand, resulting in subpar performance compared to YOLO.

Moving forward, it may be beneficial to explore strategies for improving the performance of the Faster R-CNN model, such as fine-tuning model hyperparameters, augmenting training data, or considering alternative model architectures. By addressing these challenges, we can work towards achieving higher precision and recall values, ultimately enhancing the overall effectiveness of our object detection systems using Faster R-CNN.
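To make that concrete, here is one illustrative way such tweaks could look with torchvision’s Faster R-CNN; the anchor sizes, augmentations, and learning rate below are examples to experiment with, not our actual training configuration.

```python
# Illustrative sketch of possible Faster R-CNN tweaks (not our actual training setup):
# smaller anchors, basic augmentation, and a lower learning rate.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.transforms import v2 as T

NUM_CLASSES = 4  # 3 object classes + background (example)

# Smaller anchor sizes may help with small objects such as helmets.
anchor_generator = AnchorGenerator(
    sizes=((16,), (32,), (64,), (128,), (256,)),       # one size per FPN level
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights="DEFAULT", rpn_anchor_generator=anchor_generator
)

# Replace the box predictor head for our number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

# Simple augmentation pipeline for training images.
train_transforms = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.ToDtype(torch.float32, scale=True),
])

# A smaller learning rate with momentum and weight decay as a tuning starting point.
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9, weight_decay=5e-4)
```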

Evaluation on Mean Average Precision (mAP) at an IoU Threshold of 0.5

Image by Author

Upon examining the mean Average Precision (mAP) values at an Intersection over Union (IoU) threshold of 0.5 for both YOLO and Faster R-CNN, it’s notable that the performance of the Faster R-CNN model appears to be reasonably decent. However, the YOLO model demonstrates significantly superior performance in comparison.

While the Faster R-CNN model exhibits a seemingly decent mAP score, it’s evident that the YOLO model outperforms it by a considerable margin. This observation underscores the effectiveness and efficiency of the YOLO approach in object detection tasks, highlighting its potential to deliver more accurate and reliable results across diverse datasets and scenarios.

The gap between the two models’ scores emphasizes the need for careful consideration when selecting an object detection approach. Factors such as model complexity, computational efficiency, and task-specific requirements should be taken into account to ensure optimal performance and effectiveness in real-world applications.

FPS (Frames per Second)

Frames per second (FPS) serves as a crucial measure of a model’s real-time performance, directly impacting its usability in dynamic environments. Achieving high FPS rates signifies the model’s ability to process video streams swiftly, facilitating timely decision-making and enhancing overall efficiency. Monitoring and optimizing FPS is essential for ensuring seamless integration of object detection systems into applications such as surveillance, autonomous vehicles, and augmented reality, where real-time responsiveness is paramount. Mathematically, it is calculated as 1 divided by the time taken to process a single frame (image), from pre-processing through inference to post-processing.
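As a quick illustration of that formula, the sketch below times a single YOLOv8 prediction end to end and converts the latency to FPS; the image path is a placeholder, and in practice you would warm the model up and average over many frames.

```python
# Sketch: estimating FPS from per-frame latency (image path is a placeholder).
import time
from ultralytics import YOLO

model = YOLO("yolov8m.pt")

start = time.perf_counter()
results = model("frame.jpg")  # pre-processing + inference + post-processing
elapsed_ms = (time.perf_counter() - start) * 1000

fps = 1000 / elapsed_ms
print(f"Latency: {elapsed_ms:.1f} ms -> ~{fps:.0f} FPS")

# Ultralytics also reports the per-stage timings (in ms) directly:
print(results[0].speed)  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}
```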

Image by Author

YOLO, thanks to its rapid processing speed, took only 12 ms on average per frame across our test datasets and video files, so the FPS comes out to 1000/12, i.e., roughly 83 FPS.

Faster R-CNN, for the same task, took 142 ms on average, so the FPS comes out to 1000/142, i.e., roughly 7 FPS, which is very low, making it unfit for real-time object detection tasks.

Note: Keep in mind that FPS can vary based on factors like input resolution, system load, and specific hardware capabilities.

Here, we’ve calculated this using a Tesla T4 GPU.

Conclusion

  • The YOLO model consistently demonstrates superior FPS rates, precision, and recall values over Faster R-CNN.
  • While the Faster R-CNN model may offer relatively decent performance in terms of precision and recall, it lags behind YOLO.
  • The significant difference in frames per second (FPS) between Faster R-CNN (7 FPS) and YOLO (83 FPS) is critical in the context of real-time object detection. For a model to run on video files in real time, it must sustain at least 24 FPS, as this is the standard frame rate for videos.

With a higher FPS, YOLO demonstrates a remarkable ability to process video streams swiftly, facilitating real-time object detection in dynamic environments. This rapid processing capability is particularly advantageous in applications such as surveillance, autonomous vehicles, and augmented reality, where timely decision-making is essential.

In contrast, the lower FPS of Faster R-CNN may limit its suitability for real-time scenarios, potentially causing delays in detection and response. Therefore, the superior FPS of YOLO positions it as a more viable solution for applications requiring efficient and responsive object detection in real-time.

So, yeah! Now, we’ve successfully covered all the concepts required for Object Detection in particular for this Project and obviously their implementation in the Project itself!

Our Learning Experience

Throughout this project journey, we’ve embarked on a transformative learning experience that has enriched our understanding of object detection and deepened our expertise in computer vision. From grappling with complex algorithms to navigating the nuances of model implementation, every step has been a testament to perseverance and growth. We’ve embraced challenges as opportunities for learning, honing our problem-solving skills and expanding our technical repertoire. Moreover, collaborating with peers and mentors has fostered a sense of community and shared knowledge, fueling our passion for innovation and discovery. As we reflect on this journey, we emerge with newfound insights, ready to apply them to future endeavors and continue our pursuit of excellence in the field of AI.

Wrapping Up!

Yeah, now we end this series of blogs where we’ve meticulously explored every aspect of our projects, from understanding the fundamentals to delving into the intricacies of implementation. With concepts clarified and implementations dissected, we’ve embarked on a journey of discovery, uncovering insights and pushing boundaries in the realm of object detection. As the curtain falls on this series, we stand poised at the intersection of theory and practice, armed with newfound knowledge and ready to tackle the challenges that lie ahead in the ever-evolving field of Computer Vision.

We’re thrilled to have shared our journey with Object Detection, especially its application in Real-World Problems. If you’re interested in learning more about our project or have any queries regarding the topics or anything in general, feel free to reach out! We’ll be giving our emails at the end.

This Blog Series is a product of our dedicated teamwork. We strive for accuracy, but mistakes can happen. If you notice any errors or typos, please point them out and let us know in the comments below.

If this Series helped, we’ll be very happy to know that! I also invite you to reflect on your own experiences and insights gained from our exploration on Object Detection. Consider how these insights might influence your approach to future projects or spark new ideas for applications. Feel free to share any challenges you faced or breakthroughs you experienced while working with this technology. Your contributions to our collective understanding are invaluable, and we look forward to hearing from you. Thank you for being part of our learning journey on Object Detection!

Till then, Goodbye and Happy Learning! :)

Do Follow for more such content!!!

About the Authors

Ayush Raj: Reach out to me!

Vikash Kumar Thakur: Reach out to me!



Ayush Raj

A passionate learner who loves to break complex concepts into simpler explanations. Research Interests include Deep Learning and Computer Vision.