How to use YOLOv9 for Object Detection

Mert
4 min read · Mar 13, 2024

Introduction

In a previous blog post, we explored object detection with YOLOv8. Now we turn to the latest iteration, YOLOv9. This new version promises advances in accuracy and efficiency, making it a strong candidate for a wide range of computer vision tasks.

YOLOv9, like its predecessor, identifies and localizes objects in images and videos. Applications such as self-driving cars, security systems, and advanced image search rely heavily on this capability. On the architecture side, the YOLOv9 paper introduces Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN).

How to use YOLOv9 for images and videos

Step 1: Installing the necessary libraries

pip install opencv-python ultralytics

Step 2: Importing libraries

import cv2
from ultralytics import YOLO

Step 3: Choose your model

model = YOLO("yolov9c.pt")

On this website, you can compare different models and weigh up their respective advantages and disadvantages. In this case we have chosen yolov9c.pt.

Step 4: Write a function to predict and detect objects in images and videos

def predict(chosen_model, img, classes=None, conf=0.5):
    # restrict predictions to the given class IDs only if a non-empty list is passed
    if classes:
        results = chosen_model.predict(img, classes=classes, conf=conf)
    else:
        results = chosen_model.predict(img, conf=conf)

    return results

def predict_and_detect(chosen_model, img, classes=None, conf=0.5, rectangle_thickness=2, text_thickness=1):
    results = predict(chosen_model, img, classes, conf=conf)
    for result in results:
        for box in result.boxes:
            # box.xyxy[0] holds the corner coordinates (x1, y1, x2, y2)
            x1, y1, x2, y2 = (int(v) for v in box.xyxy[0])
            # draw the bounding box directly onto img
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), rectangle_thickness)
            # write the class name 10 pixels above the top-left corner
            cv2.putText(img, f"{result.names[int(box.cls[0])]}",
                        (x1, y1 - 10),
                        cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 0), text_thickness)
    return img, results

predict() function

This function takes four arguments:

  • chosen_model: The trained model to use for prediction
  • img: The image to make a prediction on
  • classes: (Optional) A list of class IDs to restrict predictions to
  • conf: (Optional) The minimum confidence threshold for a prediction to be kept

The function first checks whether a non-empty classes argument was provided. If so, chosen_model.predict() is called with the classes argument, which restricts the predictions to those classes. Otherwise, chosen_model.predict() is called without it and returns predictions for all classes.

The conf argument discards predictions whose confidence score is below the specified threshold, which helps reduce false positives.

The function returns a list of result objects, one per input image. The fields used in this tutorial are:

  • names: A mapping from class ID to class name
  • boxes: The detected objects, where each box carries its class ID (cls), its confidence score (conf), and its bounding box coordinates (xyxy)
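To make the structure of the returned results more concrete, here is a minimal sketch that flattens them into plain dictionaries. It relies only on the attributes already used in predict_and_detect() (result.names, result.boxes, box.cls, box.xyxy) plus box.conf for the confidence score. The helper name summarize_results is made up for illustration and is not part of ultralytics.

```python
def summarize_results(results):
    """Flatten prediction results into a list of plain dictionaries.

    Hypothetical helper for illustration; not part of ultralytics.
    """
    detections = []
    for result in results:
        for box in result.boxes:
            class_id = int(box.cls[0])
            detections.append({
                "name": result.names[class_id],        # class ID -> class name
                "conf": float(box.conf[0]),            # confidence score
                "box": [int(v) for v in box.xyxy[0]],  # x1, y1, x2, y2 in pixels
            })
    return detections
```

Calling summarize_results(predict(model, image)) should then give one dictionary per detected object, which is easier to print or log than the raw result objects.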

predict_and_detect() function

This function takes the same arguments as the predict() function, plus rectangle_thickness and text_thickness, which control the line thickness of the drawn boxes and labels. In addition to the prediction results, it returns the annotated image.

The function first calls the predict() function to get the prediction results. Then, it iterates over the prediction results and draws a bounding box around each predicted object, directly onto the image that was passed in. The name of the predicted class is also written above the bounding box.

The function returns a tuple containing the annotated image and the prediction results.

Here is a summary of the differences between the two functions:

  • The predict() function only returns the prediction results, while the predict_and_detect() function also returns the annotated image.
  • The predict_and_detect() function is a wrapper around the predict() function, which means that it calls the predict() function internally.
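One detail worth knowing about the label placement: cv2.putText() treats the given point as the bottom-left corner of the text, so a label placed 10 pixels above a box that touches the top of the image ends up partly or fully outside the frame. The sketch below shows one possible way to handle this by moving the label inside the box when there is no room above it. The function name label_origin and the min_y value of 12 pixels are assumptions chosen for illustration.

```python
def label_origin(x1, y1, offset=10, min_y=12):
    """Return the (x, y) point at which to anchor the class label.

    Hypothetical helper: x1, y1 is the top-left corner of the bounding box.
    min_y is an assumed minimum height needed to fit one line of text.
    """
    y = y1 - offset
    if y < min_y:
        # not enough room above the box: place the label just inside its top edge
        y = y1 + min_y
    return x1, y
```

Inside predict_and_detect(), the result of label_origin() could be passed to cv2.putText() as the position argument in place of the fixed 10-pixel offset.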

Step 5: Detecting Objects in Images with YOLOv9

# read the image
image = cv2.imread("YourImagePath")
result_img, _ = predict_and_detect(model, image, classes=[], conf=0.5)

If you want to detect only specific classes, which you can find here, add their ID numbers to the classes list. With the COCO-trained yolov9c.pt weights, for example, classes=[0] keeps only the person class.
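If you would rather work with class names than with ID numbers, a small lookup can translate one into the other. The following is a sketch under the assumption that the model exposes the same ID-to-name mapping as result.names (in ultralytics this is available as model.names). The helper class_ids_for is hypothetical.

```python
def class_ids_for(names, wanted):
    """Return the sorted class IDs whose names appear in `wanted`.

    `names` is an ID -> name mapping such as model.names.
    Hypothetical helper; raises ValueError for names the model does not know.
    """
    name_to_id = {name: class_id for class_id, name in names.items()}
    missing = [name for name in wanted if name not in name_to_id]
    if missing:
        raise ValueError(f"Unknown class names: {missing}")
    return sorted(name_to_id[name] for name in wanted)
```

A call could then look like predict_and_detect(model, image, classes=class_ids_for(model.names, ["person", "car"]), conf=0.5).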

Step 6: Save and Plot the result Image

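# show the annotated image in a window
cv2.imshow("Image", result_img)
# save the annotated image to disk
cv2.imwrite("YourSavePath", result_img)
# keep the window open until a key is pressed
cv2.waitKey(0)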
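Note that cv2.imwrite() picks the image format from the file extension, so the save path needs one, such as .jpg or .png. As a small convenience, the sketch below derives an output path from the input path while keeping its extension. The function name annotated_path and the "_detected" suffix are arbitrary choices for illustration.

```python
from pathlib import Path

def annotated_path(input_path, suffix="_detected"):
    """Build an output path next to the input file, keeping its extension.

    Hypothetical helper: "street.jpg" becomes "street_detected.jpg".
    """
    p = Path(input_path)
    return str(p.with_name(p.stem + suffix + p.suffix))
```

The result can be passed to cv2.imwrite() as the first argument instead of a hand-written save path.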

Step 7: Detecting Objects in Videos with YOLOv9

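video_path = r"YourVideoPath"
cap = cv2.VideoCapture(video_path)
while True:
    # read the next frame; success is False once the video has ended
    success, img = cap.read()
    if not success:
        break
    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    cv2.imshow("Image", result_img)

    # wait 1 ms so the window can refresh
    cv2.waitKey(1)

# free the video file and close the display window
cap.release()
cv2.destroyAllWindows()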
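The loop above runs until the video ends. A common addition is to let the user stop playback early with a key press, using the value that cv2.waitKey() returns. The sketch below isolates that check in a small function. The name is_quit_key and the choice of "q" as the quit key are assumptions for illustration.

```python
def is_quit_key(key_code, quit_char="q"):
    """Return True if the code returned by cv2.waitKey() matches quit_char.

    Hypothetical helper for illustration.
    """
    if key_code < 0:
        # cv2.waitKey() returns -1 when no key was pressed within the delay
        return False
    # keep only the low byte; some platforms set higher bits in the return value
    return (key_code & 0xFF) == ord(quit_char)
```

In the video loop, the bare cv2.waitKey(1) call could then become: if is_quit_key(cv2.waitKey(1)): break.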

Step 8: Save the result Video

# defining function for creating a writer (for mp4 videos)
def create_video_writer(video_cap, output_filename):
    # grab the width, height, and fps of the frames in the video stream.
    frame_width = int(video_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))
    # initialize the FourCC (lowercase 'mp4v' is the tag the mp4 container expects)
    # and a video writer object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    writer = cv2.VideoWriter(output_filename, fourcc, fps,
                             (frame_width, frame_height))
    return writer
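Two things can go wrong with the frame rate read in create_video_writer(): int() truncates fractional rates such as 29.97, and some sources (certain webcams or streams, for example) may report 0 for CAP_PROP_FPS, which would produce an unusable writer. The sketch below shows one way to guard against both. The helper name safe_fps and the fallback of 30 frames per second are assumptions, not values taken from OpenCV.

```python
import math

def safe_fps(reported_fps, default=30.0):
    """Return a usable frame rate for cv2.VideoWriter.

    Hypothetical helper: falls back to `default` when the reported
    value is missing, NaN, zero, or negative.
    """
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return default
    # keep fractional rates such as 29.97 instead of truncating them
    return float(reported_fps)
```

Inside create_video_writer(), the fps line could then read fps = safe_fps(video_cap.get(cv2.CAP_PROP_FPS)), since cv2.VideoWriter accepts a floating-point frame rate.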

Now use create_video_writer() together with the video loop from Step 7:

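# the output filename should end in .mp4 to match the mp4v codec
output_filename = "YourFilename"

video_path = r"YourVideoPath"
cap = cv2.VideoCapture(video_path)
# create the writer only after the video has been opened,
# so it can read the frame size and fps from cap
writer = create_video_writer(cap, output_filename)
while True:
    success, img = cap.read()
    if not success:
        break
    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    # append the annotated frame to the output video
    writer.write(result_img)
    cv2.imshow("Image", result_img)

    cv2.waitKey(1)

# finalize the output file and free resources
writer.release()
cap.release()
cv2.destroyAllWindows()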

Conclusion

In this tutorial we have learned how to detect objects with YOLOv9 in images and videos. If you found this code helpful, please clap your hands and comment on this post! I would also love for you to follow me to learn more about Data Science and other related topics. Thanks for reading!

References

YOLOv9 paper: https://arxiv.org/abs/2402.13616

YOLOv9 GitHub repository: https://github.com/WongKinYiu/yolov9

Mert

Bioinformatics grad, now Master's in Informatics. Passionate about Computer Vision & Deep Learning.