Introduction
In a previous blog post, we explored object detection with YOLOv8. Now, we’re thrilled to delve into the latest iteration — YOLOv9! This new version promises significant advancements in accuracy, efficiency, and applicability, making it a powerful tool for various computer vision tasks.
YOLOv9, like its predecessor, focuses on identifying and localizing objects within images and videos. Applications such as self-driving cars, security systems, and advanced image search rely heavily on this capability. YOLOv9 builds on YOLOv8 with architectural innovations such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN), described in the paper linked in the references.
How to use YOLOv9 for images and videos
Step 1: Installing the necessary libraries
pip install opencv-python ultralytics
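If you want to verify that the installation works, the ultralytics package ships a small environment check you can run (this step is optional):
import ultralytics
ultralytics.checks()  # prints the ultralytics version, Python/torch versions and detected hardware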
Step 2: Importing libraries
import cv2
from ultralytics import YOLO
Step 3: Choose your model
model = YOLO("yolov9c.pt")
The Ultralytics documentation compares the different YOLOv9 model variants (for example yolov9c.pt and the larger yolov9e.pt) so you can weigh up their respective trade-offs in speed and accuracy. In this case we have chosen yolov9c.pt.
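A quick way to inspect the model you just loaded, for example to check its size and the classes it can detect (model.info() and model.names are standard Ultralytics attributes):
model.info()        # prints a summary with layer and parameter counts
print(model.names)  # dictionary mapping class IDs to class names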
Step 4: Write a function to predict and detect objects in images and videos
def predict(chosen_model, img, classes=[], conf=0.5):
    # run inference, optionally restricted to the given class IDs
    if classes:
        results = chosen_model.predict(img, classes=classes, conf=conf)
    else:
        results = chosen_model.predict(img, conf=conf)
    return results

def predict_and_detect(chosen_model, img, classes=[], conf=0.5, rectangle_thickness=2, text_thickness=1):
    results = predict(chosen_model, img, classes, conf=conf)
    # draw a bounding box and class label for every detection
    for result in results:
        for box in result.boxes:
            cv2.rectangle(img, (int(box.xyxy[0][0]), int(box.xyxy[0][1])),
                          (int(box.xyxy[0][2]), int(box.xyxy[0][3])), (255, 0, 0), rectangle_thickness)
            cv2.putText(img, f"{result.names[int(box.cls[0])]}",
                        (int(box.xyxy[0][0]), int(box.xyxy[0][1]) - 10),
                        cv2.FONT_HERSHEY_PLAIN, 1, (255, 0, 0), text_thickness)
    return img, results
The predict() function
This function takes four arguments:
- chosen_model: The trained model to use for prediction
- img: The image to make a prediction on
- classes: (Optional) A list of class IDs to filter predictions to
- conf: (Optional) The minimum confidence threshold for a prediction to be considered
The function first checks whether the classes argument is provided. If it is, chosen_model.predict() is called with the classes argument, which filters the predictions to only those classes. Otherwise, chosen_model.predict() is called without the classes argument and returns all predictions.
The conf argument filters out predictions with a confidence score lower than the specified threshold. This is useful for removing false positives.
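To see the effect of the threshold, you can run the same image once with a loose and once with a strict value (the image path is just a placeholder):
loose_results = predict(model, cv2.imread("YourImagePath"), conf=0.25)
strict_results = predict(model, cv2.imread("YourImagePath"), conf=0.7)
# the stricter threshold usually keeps fewer, but more reliable, detections
print(len(loose_results[0].boxes), len(strict_results[0].boxes))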
The function returns a list of prediction results. Each result exposes the detected boxes, and each box provides:
- cls: The ID of the predicted class (its name can be looked up in result.names)
- conf: The confidence score of the prediction
- xyxy: The bounding-box coordinates of the predicted object
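As a small sketch of how to read these fields from the raw results (again with a placeholder image path):
results = predict(model, cv2.imread("YourImagePath"), conf=0.5)
for result in results:
    for box in result.boxes:
        class_id = int(box.cls[0])
        print(result.names[class_id], float(box.conf[0]), box.xyxy[0].tolist())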
The predict_and_detect() function
This function takes the same arguments as the predict() function, plus rectangle_thickness and text_thickness to control the drawing, and it returns the annotated image in addition to the prediction results.
The function first calls the predict() function to get the prediction results. Then it iterates over the prediction results and draws a bounding box around each predicted object. The name of the predicted class is also written above the bounding box.
The function returns a tuple containing the annotated image and the prediction results.
Here is a summary of the differences between the two functions:
- The predict() function only returns the prediction results, while the predict_and_detect() function also returns the annotated image.
- The predict_and_detect() function is a wrapper around the predict() function, which means that it calls the predict() function internally.
Step 5: Detecting Objects in Images with YOLOv9
# read the image
image = cv2.imread("YourImagePath")
result_img, _ = predict_and_detect(model, image, classes=[], conf=0.5)
If you want to detect only specific classes, pass their class IDs in the classes list, as shown in the example below. The IDs follow the COCO class list that the pretrained weights were trained on.
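For instance, class 0 is person and class 2 is car in the COCO class list, so the following call only detects people and cars:
result_img, _ = predict_and_detect(model, image, classes=[0, 2], conf=0.5)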
Step 6: Save and Display the Result Image
cv2.imshow("Image", result_img)
cv2.imwrite("YourSavePath", result_img)
cv2.waitKey(0)
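As an aside, if you do not need the custom drawing from predict_and_detect(), the Ultralytics result objects also provide a plot() method that returns an annotated copy of the frame:
results = model.predict(image, conf=0.5)
annotated = results[0].plot()  # BGR image with boxes and labels drawn by ultralytics
cv2.imshow("Image", annotated)
cv2.waitKey(0)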
Step 7: Detecting Objects in Videos with YOLOv9
video_path = r"YourVideoPath"
cap = cv2.VideoCapture(video_path)
while True:
    success, img = cap.read()
    if not success:
        break
    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    cv2.imshow("Image", result_img)
    cv2.waitKey(1)
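Note that this loop only ends when the video runs out of frames, and it never closes the preview window or releases the capture. Here is a sketch of the same loop with an early-exit key and explicit cleanup (the "q" key is an arbitrary choice):
while True:
    success, img = cap.read()
    if not success:
        break
    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    cv2.imshow("Image", result_img)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()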
Step 8: Save the Result Video
# defining a function for creating a writer (for mp4 videos)
def create_video_writer(video_cap, output_filename):
    # grab the width, height, and fps of the frames in the video stream
    frame_width = int(video_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(video_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(video_cap.get(cv2.CAP_PROP_FPS))
    # initialize the FourCC and a video writer object
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    writer = cv2.VideoWriter(output_filename, fourcc, fps,
                             (frame_width, frame_height))
    return writer
Just use the function and the code above. Note that the video capture must be opened before the writer is created, because create_video_writer() reads the frame size and FPS from the capture:
video_path = r"YourVideoPath"
cap = cv2.VideoCapture(video_path)
output_filename = "YourFilename"
writer = create_video_writer(cap, output_filename)
while True:
    success, img = cap.read()
    if not success:
        break
    result_img, _ = predict_and_detect(model, img, classes=[], conf=0.5)
    writer.write(result_img)
    cv2.imshow("Image", result_img)
    cv2.waitKey(1)
writer.release()
cap.release()
cv2.destroyAllWindows()
Conclusion
In this tutorial, we learned how to detect objects in images and videos with YOLOv9. If you found this code helpful, please clap and leave a comment on this post! I would also love for you to follow me to learn more about Data Science and other related topics. Thanks for reading!
References
YOLOv9 paper: https://arxiv.org/abs/2402.13616
YOLOv9 GitHub page: https://github.com/WongKinYiu/yolov9