Day 104 (DL) — Implementing an Object Tracker for German Shepherd — Part 2
Citation: All the videos/images used in this post are taken from https://www.pexels.com/search/videos/german%20shepherd/
In the Part 1 blog, we saw how to set up the data pipeline, i.e., converting the downloaded videos to frames and annotating the images (bounding boxes) using CVAT. Now we are all set to implement the object detection model.
Step 1: Gathered 13 videos from pexels.com. One point to take care of here: when downloading the videos, there should be some similarity between the train, validation, and test set videos. Since an object detection model can only recognize new data based on prior knowledge, choosing a completely different video (different angles, backgrounds, and sizes) for the test set may not be effective.
Step 2: Converted all the videos to frames and resized each frame. As these are high-quality videos with large width and height, using the original resolution might consume a lot of time while loading images into CVAT as well as while setting up the folder structure for training. To keep things simple, I reduced the width and height to one-third of the original.
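Step 2 can be sketched with OpenCV as below. This is a minimal illustration, not the exact script used in the post: the video path, output folder, and function names are my own placeholders, and `opencv-python` is assumed to be installed.

```python
# Sketch of Step 2: extract frames from a video and shrink each frame to
# one-third of its original width/height before saving it for CVAT.
import os


def third_size(width, height, factor=3):
    """Target (width, height) after dividing each side by `factor`."""
    return max(1, width // factor), max(1, height // factor)


def video_to_frames(video_path, out_dir, factor=3):
    """Write every frame of `video_path` into `out_dir`, resized down.

    Returns the number of frames written. Paths are placeholders.
    """
    import cv2  # requires opencv-python; imported lazily on purpose

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or unreadable file)
            break
        h, w = frame.shape[:2]
        small = cv2.resize(frame, third_size(w, h, factor))
        cv2.imwrite(os.path.join(out_dir, f"frame_{idx:05d}.jpg"), small)
        idx += 1
    cap.release()
    return idx
```

For example, a 1920x1080 video frame comes out at 640x360, which is far lighter to upload and annotate.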
Step 3: Selected around 15 frames from each video, with a sufficient interval gap between them, so that the selected frames carry some individual identity rather than being near-duplicates. Uploaded each image into CVAT for annotation.
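One simple way to get that interval gap is to pick evenly spaced frame indices across the video. The helper below is my own sketch of this idea, not code from the post:

```python
def spaced_indices(total_frames, n_select=15):
    """Pick ~n_select frame indices spread evenly across the video,
    so consecutive picks are far enough apart to look distinct."""
    if total_frames <= n_select:
        return list(range(total_frames))
    step = total_frames / n_select  # gap between consecutive picks
    return [int(i * step) for i in range(n_select)]
```

For a 300-frame clip this yields one frame every 20 frames, which at 30 fps means roughly one frame every two-thirds of a second.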
Step 4: Since we are employing YOLOv5 to create the object detection model, the annotated images are split into three groups: train, valid, and test. We can refer to the link for setting up the YOLOv5 folder structure. The labels are downloaded from CVAT in the YOLO format.
Step 5: Let’s start the model building by referring to the previous blog. The only update needed here is the number of classes in the data.yaml file. Since our objective is to identify German Shepherds, we have only one target label, and data.yaml can be updated accordingly.
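A minimal data.yaml for this single-class setup might look like the following. The relative paths and the class name `german_shepherd` are assumptions; they should match your own folder layout and the label used during annotation in CVAT.

```yaml
# Paths to the image folders of each split (relative to the yolov5 repo)
train: ../train/images
val: ../valid/images
test: ../test/images

# Single target class for this project
nc: 1
names: ['german_shepherd']
```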
Step 6: Once we train our model using train.py, the next step is to run detection on the test video/images. Predicting on videos is similar to images; we just need to place the video under test -> images (the labels folder is not required for the test, so dropping it will not harm the process).
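Both steps are usually driven from the command line inside the yolov5 repo. The small helpers below just assemble the typical train.py / detect.py invocations; the epoch count, image size, and weight paths are placeholder values, not the exact settings from this post.

```python
# Sketch of Step 6: build the usual YOLOv5 command lines for training and
# detection. Run them with subprocess from inside the yolov5 repository.
def train_cmd(data="data.yaml", epochs=100, img=640, weights="yolov5s.pt"):
    """Command list for train.py; starts from pretrained yolov5s weights."""
    return ["python", "train.py", "--img", str(img), "--epochs", str(epochs),
            "--data", data, "--weights", weights]


def detect_cmd(weights="runs/train/exp/weights/best.pt",
               source="test/images"):
    """Command list for detect.py; `source` may be an image folder or a
    video file, since detect.py handles both the same way."""
    return ["python", "detect.py", "--weights", weights, "--source", source]


# To actually run them (paths assume you are inside the yolov5 repo):
#   import subprocess
#   subprocess.run(train_cmd(), check=True)
#   subprocess.run(detect_cmd(source="test_video.mp4"), check=True)
```

Passing a video file as `--source` is what makes video prediction work just like image prediction.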
Step 7: When the prediction is completed, we can see the annotated video/image under runs -> detect -> exp.
Since the dataset used for training is only a small subset of around 100 images, the accuracy level is moderate (with some glitches). As we expand the training samples, the predictive power of the model generally improves.
Now we’ve built a simple object detector that can identify a German Shepherd. The same logic can be applied to other images/videos; the key point is that the training images should be changed according to the need.