How to run the YOLOv7 Object Detection model on Windows?

Ragavan Arul
4 min read · Jan 14, 2023


“By the end of this article, you will have a basic understanding of the YOLOv7 object detection model architecture and how to run it on Windows 10 or 11 on your own images. You will also learn how to write a simple script that runs detection over many images or videos (e.g. 100) in one go.”

YOLOv7 Object detection result-1

Now, without further delay, let's jump into the YOLOv7 model.

YOLOv7 is a real-time object detector that is currently making waves in the computer vision industry. The official YOLOv7 delivers a significant jump in speed and accuracy over previous versions. Its weights are trained from scratch on Microsoft's COCO dataset, with no pre-trained weights used. The following image compares the computational time of a few object detection models.

Computational time comparison of object detection models

The YOLO Architecture in General

The YOLO architecture is based on a Fully Convolutional Neural Network (FCNN). Transformer-based versions have recently been added to the YOLO family as well; we will discuss Transformer-based detectors in a separate post. For now, let's focus on FCNN-based YOLO object detectors, which have three main components:

  • Backbone
  • Neck
  • Head

The Backbone extracts the essential features of an image and feeds them to the Head through the Neck. The Neck collects the feature maps produced by the Backbone and builds feature pyramids from them. Finally, the Head consists of the output layers that produce the final detections.
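To make the data flow concrete, here is a minimal, illustrative PyTorch sketch of the Backbone → Neck → Head hand-off (the layer sizes and module contents are invented for illustration; this is not the real YOLOv7 architecture):

import torch
import torch.nn as nn

class TinyDetector(nn.Module):
    """Toy sketch of the Backbone -> Neck -> Head data flow, not the real YOLOv7."""
    def __init__(self, num_classes=80, num_anchors=3):
        super().__init__()
        # Backbone: extracts feature maps from the input image
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Neck: aggregates backbone features (real YOLO necks build feature pyramids)
        self.neck = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Head: predicts box (4), objectness (1) and class scores for each anchor
        self.head = nn.Conv2d(128, num_anchors * (5 + num_classes), 1)

    def forward(self, x):
        features = self.backbone(x)   # Backbone extracts essential features
        pyramid = self.neck(features) # Neck collects and refines feature maps
        return self.head(pyramid)     # Head produces the final detections

# Example: a 640x640 RGB image yields an 80x80 grid of raw predictions
out = TinyDetector()(torch.zeros(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 255, 80, 80])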

Extended Efficient Layer Aggregation

In YOLOv7, the authors build on prior research into layer aggregation, keeping in mind both the memory needed to hold layers and the distance a gradient has to travel to back-propagate through them: the shorter the gradient path, the more effectively the network can learn. The layer aggregation they settle on is E-ELAN, an extended version of the ELAN computational block. It was designed by analyzing the following factors that impact speed and accuracy:

  1. Memory access cost
  2. I/O channel ratio
  3. Element-wise operations
  4. Activations
  5. Gradient path

The proposed E-ELAN uses expand, shuffle, and merge cardinality to keep enhancing the learning ability of the network without destroying the original gradient path.
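The real block is best studied in the paper and the official code, but the loose PyTorch sketch below isolates the expand, shuffle, and merge-cardinality steps (the group count, channel sizes, and branch contents are invented for illustration; this is not the official E-ELAN implementation):

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    """Interleave channels across groups (the 'shuffle' step)."""
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class EELANLikeBlock(nn.Module):
    """Toy expand -> shuffle -> merge-cardinality block, not the official E-ELAN."""
    def __init__(self, channels, groups=2):
        super().__init__()
        self.groups = groups
        # 'expand': group convolution widens the features by the group count
        self.expand = nn.Conv2d(channels, channels * groups, 3, padding=1, groups=groups)
        # one computational branch per group, all sharing the same structure
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(groups)
        )

    def forward(self, x):
        expanded = channel_shuffle(self.expand(x), self.groups)   # expand + shuffle
        chunks = torch.chunk(expanded, self.groups, dim=1)        # one chunk per branch
        merged = sum(b(c) for b, c in zip(self.branches, chunks)) # merge cardinality
        return x + merged  # residual connection keeps the original gradient path intact

x = torch.zeros(1, 64, 32, 32)
print(EELANLikeBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])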

Auxiliary Head Coarse-to-Fine

The YOLO network head makes the final predictions for the network, but since it sits so far downstream, it can be advantageous to add an auxiliary head somewhere in the middle of the network. During training, you supervise this auxiliary detection head as well as the lead head that actually makes the predictions.

The auxiliary head does not train as efficiently as the final head because there is less network between it and the prediction — so the YOLOv7 authors experiment with different levels of supervision for this head, settling on a coarse-to-fine definition where supervision is passed back from the lead head at different granularities.
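In code, the idea boils down to combining the two heads' losses during training, with the auxiliary ("coarse") head contributing a down-weighted term. A minimal sketch, assuming lead_head_loss and aux_head_loss have already been computed by the usual YOLO loss function, and using an illustrative weight that is not taken from the paper:

def total_training_loss(lead_head_loss, aux_head_loss, aux_weight=0.25):
    # The lead head receives full ("fine") supervision; the auxiliary head adds
    # a down-weighted ("coarse") supervision signal from the middle of the network.
    # aux_weight is an illustrative value, not the one used by the YOLOv7 authors.
    return lead_head_loss + aux_weight * aux_head_loss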

If you want more detail on the YOLOv7 model, see the official YOLOv7 paper and repository.

How to run YOLOv7 on Windows

  1. Clone the YOLOv7 repository
git clone https://github.com/WongKinYiu/yolov7.git

2. Create a new environment with Python 3.9. To do that, open the Anaconda Prompt and run:

conda create -n yolov7 python=3.9

3. Activate the Conda environment


conda activate yolov7

4. Install PyTorch with CUDA support

pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
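To confirm that the CUDA build of PyTorch was installed and can see your GPU, you can run this quick check inside the same environment:

import torch

print(torch.__version__)          # should end in +cu113 for the command above
print(torch.cuda.is_available())  # True means PyTorch can use your GPU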

5. Change directory to the folder where you cloned the YOLOv7 repository

cd path
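Note: besides PyTorch, detect.py needs a few more packages (OpenCV, Matplotlib, and so on). The repository ships a requirements.txt, so running pip install -r requirements.txt from the repository root inside the same environment should take care of them. If yolov7.pt is not already present in the folder, it can be downloaded from the releases page of the YOLOv7 GitHub repository.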

6. Run detect.py (by default, the annotated results are saved under runs/detect/)

python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg

7. How to detect only particular classes in the images? detect.py takes a --classes argument with space-separated class indices, which follow the COCO ordering (e.g. 0 = person, 1 = bicycle, 3 = motorcycle):

python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg --classes 0 1 3

8. How to run more than one image at a time using a Bash script? (On Windows, the commands below assume Git Bash or WSL, since touch, vim, and bash are not available in the plain Command Prompt.)

touch detect.sh

vim detect.sh

#!/bin/bash
# Run YOLOv7 detection on every .jpg image in the folder
for file in inference/images/*.jpg
do
    python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source "$file"
done
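The loop above assumes a Bash environment such as Git Bash or WSL. If you prefer to stay in the Anaconda Prompt, the following small Python script (run_detect_batch.py is a made-up file name for this illustration) does the same job; note also that detect.py accepts a folder as --source, so pointing it at inference/images processes every image in a single call.

# run_detect_batch.py -- illustrative alternative to the Bash loop above
from pathlib import Path
import subprocess
import sys

IMAGE_DIR = Path("inference/images")  # change this to your own image folder

for image in sorted(IMAGE_DIR.glob("*.jpg")):
    # call detect.py once per image, exactly like the Bash loop does
    subprocess.run(
        [sys.executable, "detect.py",
         "--weights", "yolov7.pt",
         "--conf", "0.25",
         "--img-size", "640",
         "--source", str(image)],
        check=True,  # stop immediately if a detection run fails
    )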

Conclusion

YOLOv7 is a real-time object detection model that is fast and accurate. It has a computational time that is suitable for real-time applications and its accuracy is comparable to other state-of-the-art models. The architecture of YOLOv7 has been improved to increase its efficiency and reduce the number of parameters needed. This results in faster inference times and improved performance on a variety of object detection tasks. Overall, YOLOv7 is a powerful and efficient model that is well-suited for real-time object detection applications.


Ragavan Arul

ML Engineer with more than 2.5 years of experience in distributed training, CI/CD pipelines, MLOps, and building ML and DL models with TensorFlow.