Custom Object Detection with YOLOv7: A Step-by-Step Guide

Sachinsoni
8 min read · Jun 23, 2024


YOLOv7 is a powerful tool for real-time object detection, known for its speed and accuracy. However, what if you need to detect objects that aren’t included in the default model? This guide will show you how to train YOLOv7 on your own custom dataset. You’ll learn how to prepare your data, set up the model, and train it to recognize the specific objects you need. Whether you’re working on a unique project or tackling a specialized task, this easy-to-follow tutorial will help you harness the full potential of YOLOv7 for custom object detection.


YOLOv7 was developed by a team of researchers led by Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. The team released the model in 2022, continuing the tradition of the YOLO (You Only Look Once) family of real-time object detection models. This version aimed to improve both the speed and accuracy of object detection tasks compared to its predecessors.

Fig-1: YOLOv7 benchmarks [https://github.com/WongKinYiu/yolov7]

Figure 1 shows that YOLOv7 surpasses YOLOR, PP-YOLOE, YOLOX, and YOLOv5 in both accuracy and speed. YOLOv7 is implemented entirely in PyTorch, which makes it compatible with a wide range of deep learning tools and libraries and easy to integrate into existing machine learning workflows.

In this article, we will focus on “Training YOLOv7 on Custom Data.” You can follow the steps below to train YOLOv7 with your own data. These steps have been tested on Ubuntu 18.04 and 20.04 with CUDA 10.x/11.x.

  1. Installing Required Modules
  2. Using Pretrained Models for Object Detection
  3. Training YOLOv7 on Your Own Data
  4. Performing Inference with Custom Trained Weights

1. Installing Required Modules

  • Create a folder named “YOLOv7” (you can use any name you like).
  • Open the terminal (Linux/macOS) or command prompt (Windows).
  • Navigate to the “YOLOv7” folder in the terminal/command prompt.
  • Use these commands to set up a virtual environment named “myenv” (you can choose a different name):
# In Linux
python3 -m venv myenv
source myenv/bin/activate
pip install --upgrade pip

# In Windows
python -m venv myenv
myenv\Scripts\activate
pip install --upgrade pip

Note: The step above is optional, but it is recommended so that you don’t disturb your system’s Python packages.

  • Clone the YOLOv7 repository from this link, then move into the cloned folder with the following commands:
git clone https://github.com/WongKinYiu/yolov7.git
cd yolov7
  • Now install all of the libraries needed to train YOLOv7:
pip install -r requirements.txt

2. Using Pretrained Models for Object Detection

Now that we’ve installed all the necessary modules, let’s verify that everything is working correctly by testing object detection with pre-trained weights. Use the following command in your terminal/cmd to detect objects:

python detect.py --weights yolov7.pt --conf 0.5 --img-size 640 --source path/to/your/image/or/video

Note: Make sure the YOLOv7 weights file (yolov7.pt) is placed in the "yolov7" folder. If you haven’t downloaded the pre-trained weights yet, you can get them from this link and move the downloaded file to your current working directory (yolov7).
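Alternatively, the weights can be fetched from the repository’s releases page on the command line (assuming the v0.1 release asset is still hosted at this URL):

wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt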

If everything is working correctly, you should see the results in the following directory:

Results Directory: yolov7/runs/detect/exp/horses.jpg

Fig-2: YOLOv7 official repository

3. Training YOLOv7 on Your Own Data

Follow these steps to train YOLOv7 with your own dataset:

Step 1: Prepare Your Dataset

If you don’t have your own data, you can download a dataset from the Open Images database. YOLOv7 expects one .txt label file per image, where each line describes one object as: class_id center_x center_y width height, with all coordinates normalized to the range 0–1 (a concrete example is shown in Step 2 below).

Step 2: Label Your Data

For labeling, you can use labelImg, but first you need to install it on your system. Follow these steps:

Create a folder named labelImg_app.
Open a terminal inside that folder and run:
pip install --upgrade pip
pip install labelImg
labelImg

When the labelImg app opens, click Open Dir, select the directory containing your images, draw a rectangular box around each object, and save the annotation.

After saving, you will get two files: classes.txt, which lists your class names in the order they were created, and one .txt file per image containing the bounding-box coordinates in YOLO format (class center_x center_y width height). In the example below, class id 0 indicates the cat class and 1 indicates the dog class, because the cat was labeled first.
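The coordinate values here are invented purely for illustration:

classes.txt
cat
dog

image1.txt (one line per labeled object)
0 0.512 0.433 0.210 0.356
1 0.275 0.640 0.180 0.290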

In this way you can annotate your entire dataset and get it ready for training.

Step 3: Split Your Data

Split your data into training and validation sets (optionally reserving a separate test set as well). A common split is 80% for training and 20% for validation; a minimal helper script for this is sketched at the end of this step.

Folder Structure:

├── yolov7
│ ├──images
│ │ ├── train (train images present here)
│ │ ├── val (validation images present here)
│ │ ├── test (test images present here)
│ │
│ ├── labels
│ │ ├── train (train labels txt files are here)
│ │ ├── val (val labels txt files are here)
│ │
│ ├── train.txt (In this file your train image location is saved)
│ ├── val.txt (In this file your val image location is saved)

The train.txt and val.txt files list the paths to all of your training and validation images, one path per line (for example, ./images/train/image1.jpg). You need to specify the locations of these two files in your custom.yaml file (see the example configuration file below).
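If you prefer not to split files by hand, a minimal Python sketch like the one below can create the split and write the two .txt files. It is not part of the YOLOv7 repo, and the source folders all_images/ and all_labels/ (and the .jpg extension) are assumptions you should adapt to your own layout:

import random
import shutil
from pathlib import Path

random.seed(42)  # make the split reproducible

root = Path("yolov7")
src_images = root / "all_images"   # assumed: all images collected here
src_labels = root / "all_labels"   # assumed: the matching .txt labels here

images = sorted(src_images.glob("*.jpg"))
random.shuffle(images)
n_train = int(0.8 * len(images))   # 80/20 train/val split

for split, subset in (("train", images[:n_train]), ("val", images[n_train:])):
    (root / "images" / split).mkdir(parents=True, exist_ok=True)
    (root / "labels" / split).mkdir(parents=True, exist_ok=True)
    with open(root / f"{split}.txt", "w") as f:
        for img in subset:
            shutil.copy(img, root / "images" / split / img.name)
            label = src_labels / (img.stem + ".txt")
            if label.exists():
                shutil.copy(label, root / "labels" / split / label.name)
            # paths in train.txt/val.txt are relative to the yolov7 folder
            f.write(f"./images/{split}/{img.name}\n")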

Step 4: Create a Custom Configuration File

Create a file named “custom.yaml” in yolov7/data/ and configure it:

# I assume you are currently in yolov7 folder
train: ./train.txt
val: ./val.txt  # YOLOv7's train.py reads the 'val' key, not 'valid'
nc: 1 # Number of classes
names: ['your_class_name'] # List of class names
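If you followed the cat-and-dog example from Step 2, the same file would have two classes, with the names listed in the same order as the ids in classes.txt:

# cat = class 0, dog = class 1 (same order as classes.txt)
train: ./train.txt
val: ./val.txt
nc: 2
names: ['cat', 'dog']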

Step 5: Start Training

Open terminal in yolov7, activate the virtual environment, and run:

python train.py --weights yolov7.pt --data "data/custom.yaml" --workers 4 --batch-size 4 --img 416 --cfg cfg/training/yolov7.yaml --name yolov7 --hyp data/hyp.scratch.p5.yaml --epochs 50
  • --img: Size of images for training (default: 640).
  • --batch-size: Batch size used in training.
  • --epochs: Number of training epochs.
  • --data: Path to custom configuration file.
  • --weights: Pre-trained weights (yolov7.pt, yolov7x.pt, etc.).

It’s important to select the appropriate pre-trained weights for your task. When you visit the official YOLOv7 GitHub repository, you’ll find six different pre-trained weights files.

Choose the one that best fits your requirements. Refer to the table below for guidance:
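The original table here was an image; the overview below is reconstructed from the repository’s README (the test sizes are as documented there; check the repo for the exact accuracy figures):

Model          Test size   Typical use
yolov7.pt      640         base model, best speed/accuracy trade-off
yolov7x.pt     640         larger variant, more accurate, slower
yolov7-w6.pt   1280        for high-resolution images
yolov7-e6.pt   1280        larger high-resolution variant
yolov7-d6.pt   1280        still larger high-resolution variant
yolov7-e6e.pt  1280        largest variant, highest accuracy, slowest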

Note: Training will not start if any image file is corrupted. A corrupted label file, on the other hand, does not stop training: YOLOv7 simply ignores that image-label pair.

Check Training Progress

Monitor the training logs in the terminal; once training finishes, the path to the trained weights is printed. By default they are saved to runs/train/yolov7/weights/ (best.pt and last.pt).

Evaluating the model during training:

Intersection over Union (IoU) is a fundamental metric used in object detection tasks to measure the overlap between a predicted bounding box and a ground truth bounding box. It is calculated as the ratio of the area of intersection between the two boxes to the area of their union:

IoU = Area of Intersection / Area of Union
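As a quick illustration (this helper is not from the YOLOv7 codebase; it assumes boxes given in (x1, y1, x2, y2) corner format):

def iou(box_a, box_b):
    """Compute IoU of two boxes given as (x1, y1, x2, y2)."""
    # coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # intersection area is zero if the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.1428...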

Mean Average Precision is a crucial metric for evaluating object detection models. It combines precision and recall to provide a single performance measure:

  • Precision: Precision is the proportion of true positive detections out of all positive detections made by the model. For example, suppose you have two images of dogs and your model predicts a total of 5 bounding boxes across both images. If only 3 of these 5 bounding boxes have an IoU greater than or equal to the chosen threshold (e.g., 0.5) with the ground truth boxes, then your precision will be:
    Precision = 3/5 = 0.6
  • Recall: Recall is the proportion of true positive detections out of all actual positives in the dataset. For example, suppose you have 4 ground truth bounding boxes in two images, and your model predicts 5 bounding boxes. If only 3 of these predicted boxes have an IoU greater than or equal to the chosen threshold (e.g., 0.5) with the ground truth boxes, then your recall will be:
    Recall = 3/4 = 0.75
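Plugging the numbers from the bullets above into code (a trivial sketch; the IoU matching is assumed to be already done, i.e. we already know that 3 predictions cleared the threshold):

true_positives = 3    # predictions with IoU >= 0.5 against a ground-truth box
num_predictions = 5   # all boxes predicted across both images
num_ground_truth = 4  # all actual boxes in the dataset

precision = true_positives / num_predictions   # 3/5 = 0.6
recall = true_positives / num_ground_truth     # 3/4 = 0.75
print(f"precision={precision}, recall={recall}")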

Average Precision (AP) is calculated for each class and represents the area under the precision-recall curve. Mean Average Precision (mAP) is the average of AP values across all classes. YOLOv7 typically reports mAP at different IoU thresholds. For example:

  • mAP@0.5: This means a predicted box is considered correct if its IoU with the ground truth box is at least 50%.
  • mAP@0.5:0.95: This is a more comprehensive metric that averages the mAP across multiple IoU thresholds (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95). This provides a balanced view of the model’s performance from moderate to high overlap requirements.

In summary, higher mAP@0.5 and mAP@0.5:0.95 values indicate that your model has successfully learned to detect objects with good accuracy across varying degrees of overlap with the ground truth boxes in the dataset.

Step 6: Inference with Custom Weights

Once training finishes, perform detection using:

python detect.py --weights runs/train/yolov7/weights/best.pt --source "path to your testing image"

This completes the guide for training YOLOv7 on your custom dataset. Experiment with your own data and enjoy exploring its capabilities!

I trust this blog has enriched your understanding of training YOLOv7 on a custom dataset. If you found value in this content, I invite you to stay connected for more insightful posts. Your time and interest are greatly appreciated. Thank you for reading!
