Object Detection Using YOLOv5 From Scratch With Python | Computer Vision

Kazi Mushfiqur Rahman
9 min read · Oct 13, 2023


Complete project in GitHub

Install necessary packages

Open the ultralytics/yolov5 repository on GitHub.

Click on the requirements.txt file.

Click the Raw button at the top right corner of the file view. This opens the raw text version of the file.

Copy the URL of that page and paste it after pip install -r (use a plain hyphen before the r):

pip install -r https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt

Run this command in a terminal, inside an activated virtual environment, to install the packages.
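Before writing the detection script, it can help to confirm that the main packages are importable from the active virtual environment. The following is a minimal sketch and not part of the original project: the helper name missing_packages is hypothetical, and the list of modules checked (torch, cv2, pandas) is an assumption based on what the scripts in this article import or rely on.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed module list: the packages this article imports or relies on
    missing = missing_packages(["torch", "cv2", "pandas"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, rerunning the pip command above inside the same virtual environment is the first thing to try.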

Now that the required packages are installed, we can start writing code.

detect.py

# Import PyTorch module
import torch
import cv2

# Download model from github
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')

img = cv2.imread('car.jpg')
img = cv2.resize(img,(1000, 650))

# Perform detection on image
result = model(img)
print('result: ', result)

Explanation of the above code

This Python code uses PyTorch and OpenCV (cv2) to perform object detection on an image using the YOLOv5 model. Here’s a simple explanation of what each part of the code does:

1. Import PyTorch and OpenCV:

  • import torch: Imports the PyTorch library for deep learning.
  • import cv2: Imports OpenCV, a library for image processing.

2. Download YOLOv5 Model:

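  • model = torch.hub.load('ultralytics/yolov5', 'yolov5n'): Fetches the ultralytics/yolov5 repository from GitHub through PyTorch Hub and loads the pretrained 'yolov5n' (nano) model, the smallest YOLOv5 variant. The repository is cached locally after the first run. This model is used for object detection.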

3. Load and Resize an Image:

  • img = cv2.imread('car.jpg'): Reads an image called 'car.jpg' using OpenCV.
  • img = cv2.resize(img, (1000, 650)): Resizes the image to a specific width and height (1000 pixels wide and 650 pixels tall).

4. Perform Object Detection:

  • result = model(img): Uses the downloaded YOLOv5 model to perform object detection on the resized image. The results of the detection are stored in the result variable.

5. Print the Results:

  • print('result: ', result): Prints a short summary of the detections, such as the image size and how many objects of each class were found. The bounding box locations and confidence scores are stored in the result object and are extracted in the next step.

In summary, this code loads a YOLOv5 object detection model, resizes an image, and then runs the model on the image to identify objects and print the detection results.
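One detail worth noting about the resize step: a fixed target of (1000, 650) stretches any image whose aspect ratio differs from 1000:650. The sketch below shows one way to compute a size that fits inside the same bounds while keeping the original proportions. The helper name fit_within and its default bounds are assumptions made for illustration, not part of the original script.

```python
def fit_within(width, height, max_width=1000, max_height=650):
    """Return (new_width, new_height) scaled to fit the bounds, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Use the smaller scale factor so both dimensions stay inside the bounds
    scale = min(max_width / width, max_height / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    return new_w, new_h


# Usage sketch with OpenCV (img.shape is (height, width, channels)):
#   img = cv2.imread('car.jpg')
#   if img is None:  # cv2.imread returns None when the file cannot be read
#       raise FileNotFoundError('car.jpg')
#   h, w = img.shape[:2]
#   img = cv2.resize(img, fit_within(w, h))  # cv2.resize takes (width, height)
print(fit_within(2000, 1000))  # prints (1000, 500)
```

Keeping the aspect ratio is a design choice rather than a requirement. The model still runs on a stretched image, but the displayed result looks more natural when the proportions are preserved.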

Show detected result in terminal

python detect.py

Convert the detection result to a pandas DataFrame to inspect it

detect.py

# Import PyTorch module
import torch
import cv2

# Download model from github
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')

img = cv2.imread('car.jpg')
img = cv2.resize(img,(1000, 650))

# Perform detection on image
result = model(img)
print('result: ', result)

# Convert detected result to pandas data frame
data_frame = result.pandas().xyxy[0]
print('data_frame:')
print(data_frame)

Explanation of the above code

In this code, you are converting the detection results from the YOLOv5 model into a Pandas DataFrame and then printing this DataFrame. Let’s break down the code step by step:

1. data_frame = result.pandas().xyxy[0]:

  • result.pandas(): A method of the YOLOv5 results object that returns the detections as Pandas DataFrames, one per input image.
  • .xyxy[0]: Selects the DataFrame for the first image in the batch, with each box given by its xmin, ymin, xmax and ymax pixel coordinates. The [0] is the index of the image in the results; here only one image was passed to the model.

2. print('data_frame:'):

  • This line prints the text “data_frame:” to the console to mark that the output that follows is the contents of the DataFrame.

3. print(data_frame):

  • This line prints the actual Pandas DataFrame containing the detection results.

The Pandas DataFrame data_frame should contain information about the detected objects in the image, such as their class labels, confidence scores, and bounding box coordinates. By extracting and displaying this information, you can easily analyze and work with the detection results using the powerful data manipulation capabilities provided by Pandas.
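As a small illustration of working with these results, the sketch below keeps only the detections above a confidence threshold. It assumes the rows have been converted to plain dictionaries, for example with data_frame.to_dict('records'), and that they carry the 'confidence' and 'name' keys described above. The helper name filter_detections, the 0.5 threshold and the sample rows are all hypothetical; the sample values are hand-written, not real model output.

```python
def filter_detections(records, min_confidence=0.5, names=None):
    """Keep detections at or above min_confidence, optionally limited to given class names."""
    kept = []
    for row in records:
        if row["confidence"] < min_confidence:
            continue
        if names is not None and row["name"] not in names:
            continue
        kept.append(row)
    return kept


# Hand-written sample rows with the same keys as the DataFrame columns
# (illustrative values only, not real model output)
sample = [
    {"xmin": 10.0, "ymin": 20.0, "xmax": 110.0, "ymax": 220.0,
     "confidence": 0.91, "class": 2, "name": "car"},
    {"xmin": 50.0, "ymin": 60.0, "xmax": 90.0, "ymax": 160.0,
     "confidence": 0.32, "class": 0, "name": "person"},
]

confident = filter_detections(sample, min_confidence=0.5)
print([row["name"] for row in confident])  # prints ['car']
```

The same filter can be written directly in Pandas with boolean indexing, for example data_frame[data_frame['confidence'] >= 0.5].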

Show data frame in terminal

python detect.py

Draw bounding boxes and confidence scores on the image

detect.py

# Import PyTorch module
import torch
import cv2

# Download model from github
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')

img = cv2.imread('car.jpg')
img = cv2.resize(img,(1000, 650))

# Perform detection on image
result = model(img)
print('result: ', result)

# Convert detected result to pandas data frame
data_frame = result.pandas().xyxy[0]
print('data_frame:')
print(data_frame)

# Get indexes of all of the rows
indexes = data_frame.index
for index in indexes:
    # Find the coordinates of the top-left corner of the bounding box
    x1 = int(data_frame['xmin'][index])
    y1 = int(data_frame['ymin'][index])
    # Find the coordinates of the bottom-right corner of the bounding box
    x2 = int(data_frame['xmax'][index])
    y2 = int(data_frame['ymax'][index])

    # Find label name
    label = data_frame['name'][index]
    # Find confidence score of the model
    conf = data_frame['confidence'][index]
    text = label + ' ' + str(conf.round(decimals=2))

    # Draw the box and its label (OpenCV colors are in BGR order)
    cv2.rectangle(img, (x1, y1), (x2, y2), (255, 255, 0), 2)
    cv2.putText(img, text, (x1, y1 - 5), cv2.FONT_HERSHEY_PLAIN, 2,
                (255, 255, 0), 2)

# Show the annotated image once all boxes are drawn
cv2.imshow('IMAGE', img)
cv2.waitKey(0)

Explanation of the above code

This code snippet takes the Pandas DataFrame data_frame that contains object detection results and uses it to draw bounding boxes and labels on the original image. Here's a step-by-step explanation:

1. indexes = data_frame.index:

  • This line retrieves the index values of all the rows in the Pandas DataFrame data_frame. These indexes correspond to different detected objects in the image.

2. for index in indexes:

  • This for loop iterates through each index (i.e., each detected object) in the DataFrame.

3. Inside the loop, the code extracts various information about each detected object:

  • x1 and y1 are the coordinates of the top-left corner of the bounding box, extracted from the 'xmin' and 'ymin' columns in the DataFrame, respectively.
  • x2 and y2 are the coordinates of the bottom-right corner of the bounding box, extracted from the 'xmax' and 'ymax' columns.
  • label is the class label of the detected object, obtained from the 'name' column.
  • conf is the confidence score of the model for that detection, extracted from the 'confidence' column.

4. text = label + ' ' + str(conf.round(decimals=2)):

  • This line builds the label text: the class name, a space, and the confidence score rounded to two decimal places.

5. cv2.rectangle(img, (x1,y1), (x2,y2), (255,255,0), 2):

  • This code draws a rectangle (bounding box) on the image using OpenCV. It uses the coordinates x1, y1 (top-left corner) and x2, y2 (bottom-right corner) to define the box. The color is set to (255, 255, 0). OpenCV reads colors in BGR order, so this is full blue and full green with no red, which gives cyan rather than yellow. The line thickness is 2 pixels.

6. cv2.putText(img, text, (x1,y1-5), cv2.FONT_HERSHEY_PLAIN, 2, (255,255,0), 2):

  • This line adds text to the image using OpenCV. It places the text string created earlier 5 pixels above the top-left corner of the bounding box. The font is cv2.FONT_HERSHEY_PLAIN, the font scale is 2, and the text color is again (255, 255, 0) (cyan in BGR order), with a line thickness of 2 pixels.

7. cv2.imshow('IMAGE', img):

  • This line displays the modified image with bounding boxes and labels using OpenCV.

8. cv2.waitKey(0):

  • This line waits indefinitely for a key press while the image window has focus. The window stays open until a key is pressed. After that the script reaches its end and the window closes.

In summary, this code processes the object detection results to draw bounding boxes and labels on the original image, making it easier to visualize and understand what objects the YOLOv5 model detected in the image.
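Two details of the drawing step are easy to get wrong: OpenCV expects colors in BGR order, and a label placed at y1-5 can fall above the top edge of the image when a box touches the top. The helpers below are a minimal sketch of how both could be handled. All three function names are hypothetical, and the min_y default of 15 pixels is an assumed value, not something measured from the font.

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into the (B, G, R) order OpenCV expects."""
    r, g, b = color
    return (b, g, r)


def label_text(name, confidence):
    """Build a label from the class name and a confidence with two decimals."""
    return f"{name} {confidence:.2f}"


def label_origin(x1, y1, offset=5, min_y=15):
    """Place the label just above the box, but keep it inside the image."""
    return (x1, max(y1 - offset, min_y))


# Yellow in RGB is (255, 255, 0); OpenCV needs it as (0, 255, 255)
print(rgb_to_bgr((255, 255, 0)))   # prints (0, 255, 255)
print(label_text("car", 0.8734))   # prints car 0.87
print(label_origin(40, 3))         # prints (40, 15)
```

With these helpers, the drawing call could look like cv2.putText(img, label_text(label, conf), label_origin(x1, y1), cv2.FONT_HERSHEY_PLAIN, 2, rgb_to_bgr((255, 255, 0)), 2), which would draw the label in yellow.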

Run the script

python detect.py

Detected image

In the above image, a car and a person have been detected, each shown with its confidence score.

Detect objects in a video, drawing bounding boxes and confidence scores

detect2.py

import torch
import cv2

# Download model from github
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')

# model = torch.hub.load('yolov5', 'yolov5n', source= 'local')

cap = cv2.VideoCapture('cars.mp4')

while True:
    # cap.read() returns (success flag, frame); the frame is None at the end of the video
    img = cap.read()[1]
    if img is None:
        break

    # Perform detection on image
    result = model(img)
    print('result: ', result)

    # Convert detected result to pandas data frame
    data_frame = result.pandas().xyxy[0]
    print('data_frame:')
    print(data_frame)

    # Get indexes of all of the rows
    indexes = data_frame.index
    for index in indexes:
        # Find the coordinates of the top-left corner of the bounding box
        x1 = int(data_frame['xmin'][index])
        y1 = int(data_frame['ymin'][index])
        # Find the coordinates of the bottom-right corner of the bounding box
        x2 = int(data_frame['xmax'][index])
        y2 = int(data_frame['ymax'][index])

        # Find label name
        label = data_frame['name'][index]
        # Find confidence score of the model
        conf = data_frame['confidence'][index]
        text = label + ' ' + str(conf.round(decimals=2))

        # Draw the box and its label (OpenCV colors are in BGR order)
        cv2.rectangle(img, (x1, y1), (x2, y2), (255, 255, 0), 2)
        cv2.putText(img, text, (x1, y1 - 5), cv2.FONT_HERSHEY_PLAIN, 2,
                    (255, 255, 0), 2)

    # Show the annotated frame; press 'q' to stop early
    cv2.imshow('IMAGE', img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video file and close the display window
cap.release()
cv2.destroyAllWindows()

Explanation of the above code

This code is a Python script that uses OpenCV to perform real-time object detection on a video file (‘cars.mp4’) using the YOLOv5 model. Let’s break down how it works:

1. cap = cv2.VideoCapture('cars.mp4'):

  • This line opens the video file ‘cars.mp4’ for reading using OpenCV’s VideoCapture class. cap is a video capture object.

2. while True:

  • This starts an infinite loop, which will process the video frame by frame until the video ends or until the ‘q’ key is pressed to exit the loop.

3. img = cap.read()[1]:

  • This line reads the next frame from the video using the cap.read() method, which returns a tuple of a success flag and the frame. [1] selects the second element of the tuple, which is the frame itself. When there are no more frames to read, the frame is None, and the if img is None check that follows breaks out of the loop.

4. result = model(img):

  • This line uses the YOLOv5 model to perform object detection on the current frame (img), just like in the previous explanation.

5. data_frame = result.pandas().xyxy[0]:

  • This line converts the detection results for the current frame into a Pandas DataFrame, similar to the previous explanation.

6. The following code block is the same as the one explained earlier:

  • It iterates over the rows of the DataFrame, extracts bounding box coordinates, label names, and confidence scores, and draws bounding boxes and labels on the frame using OpenCV.

7. cv2.imshow('IMAGE', img):

  • This line displays the modified frame with bounding boxes and labels in a window titled ‘IMAGE’.

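8. if cv2.waitKey(1) & 0xFF == ord('q'):

  • This code waits for 1 millisecond for a key press. The & 0xFF keeps only the lowest 8 bits of the returned key code so that it can be compared with ord('q'). If the pressed key is ‘q’ (quit), it breaks out of the loop, allowing you to stop the video processing.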

In summary, this code continuously processes frames from the ‘cars.mp4’ video, performs object detection on each frame, and displays the frames with bounding boxes and labels in a window. You can exit the processing loop by pressing the ‘q’ key. This allows you to visually inspect object detections in the video in real time.
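How close the playback gets to real time depends on how fast the model runs on your machine. If detection is slower than the video's frame rate, one common workaround is to run the model on only every n-th frame. The sketch below illustrates the idea under stated assumptions: every_nth and read_frames are hypothetical helper names, and the stride of 3 is an arbitrary example value.

```python
def every_nth(items, n):
    """Yield every n-th item from an iterable, starting with the first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for i, item in enumerate(items):
        if i % n == 0:
            yield item


def read_frames(path):
    """Yield frames from a video file and release the capture when done."""
    import cv2  # imported lazily so every_nth stays usable without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # no more frames, or the file could not be read
                break
            yield frame
    finally:
        cap.release()


# Usage sketch (assumes the model and drawing code from this article):
#   for frame in every_nth(read_frames('cars.mp4'), 3):
#       result = model(frame)
print(list(every_nth(range(10), 3)))  # prints [0, 3, 6, 9]
```

Skipped frames are still decoded by cap.read(), so this saves inference time only. The trade-off is that objects are detected on a third of the frames in this example.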

Input video for detecting objects

Objects detected in the above video

Kazi Mushfiqur Rahman

B.Sc. in CSE, Software Engineer, Techneous | Python | Django | DRF | Computer Vision | OpenVINO - AI Framework | JS | Ajax | DevOps | Technical writer