My model performs badly and I don’t know why (YOLO computer vision)
What do I want to achieve?
I started a new project about two months ago, trying to identify patterns in the delivery of my mail. For that, I record the time when the postman delivers my mail.
The data includes the postman, vehicle, time, date, and weather (temperature, wind, air pressure, etc.). With this data I try to recognize and, maybe, predict patterns in the delivery.
Examples:
- If it rains on a Monday, my mail won’t arrive before 12 noon.
- Whenever vehicle XY has been used, the mail never arrives before 2 p.m.
- Postman X always takes the longest.
In the beginning, I sat in front of the window around noon to look for the postman. As soon as he showed up, I wrote down the time, the vehicle, and the postman.
This was time-consuming, and I never knew whether the postman had already been there (and I had missed him) or whether he was still to come.
Automate boring stuff
Since this is insanely inefficient, I installed a vibration sensor* in my mailbox that automatically sends a notification (to my phone) as soon as the mailbox is opened.
If you want to try this at home, here are a few tips:
- In order to use the vibration sensor, you will need an Aqara hub*.
- If your mailbox is made of metal, try to install the hub as close as possible to the sensor.
With this solution, I no longer had to sit glued to my window; I could relax and wait for the message on my cell phone.
But this approach has its own problems.
- What happens if I don’t get any mail at all?
- How do I track the postman and the vehicle? (I don’t see them anymore)
In order to capture the vehicle and the postman, I ran to the window every time I received a push notification.
That made my life a little easier, but there had to be something better.
(Further) automate boring stuff
So I bought a CCTV camera* and installed it in front of my door.
Every time I receive a notification, I look into the CCTV camera app to see who is at the door.
Therefore I don’t have to be at home to collect the data. That works pretty well, but I still have no information about days when I don’t get any mail.
Let’s go!
My plan was to train a model that automatically recognizes the postman’s car. That way, I can review the footage to see when he passed my house.
However, in order to be able to recognize the postman’s car, I need training data.
My first Model
I used the YOLOv5 framework because it is quick and easy to work with. For my first model, I took screenshots of the postman’s car. The postman only has two different cars so far: a white van and a yellow electric car.
Train Data (First Model)
I used these pictures (approx. 30 different) to train my model on two classes (white van and electric car). I used makesense.ai to label the data.
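For context, makesense.ai exports labels in the YOLO format: one line per box, with a class index followed by the box center and size, all normalized to the image dimensions. A minimal sketch of that conversion (the helper name and the example numbers are my own):

```python
# YOLO label format: "class x_center y_center width height", with all
# coordinates normalized by the image size. Helper name is hypothetical.
def to_yolo_label(cls, x1, y1, x2, y2, img_w, img_h):
    xc = (x1 + x2) / 2 / img_w   # box center x, normalized
    yc = (y1 + y2) / 2 / img_h   # box center y, normalized
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a white van (class 0) at pixels (100, 200)-(300, 400) in a 640x480 frame
print(to_yolo_label(0, 100, 200, 300, 400, 640, 480))
# -> 0 0.312500 0.625000 0.312500 0.416667
```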
Parameters
- img: 640 (I didn’t change this parameter because I didn’t know how it affects my model)
- batch size: 64 (that’s the biggest number I could use; I trained all models locally on my machine. I tried Google Colab multiple times, but most of the time it crashed and I had to start over)
- epochs: 80 (I later found out that the YOLO documentation suggests 300 epochs at first)
python train.py --img 640 --batch 64 --epochs 80 --data custom.yaml --weights yolov5s.pt --cache
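The custom.yaml referenced above is a small data config; mine looked roughly like this (paths are placeholders for my local setup):

```yaml
# YOLOv5 data config (paths are placeholders)
train: ../dataset/images/train
val: ../dataset/images/val
nc: 2                                 # number of classes
names: ['white van', 'electric car']  # class 0, class 1
```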
Results
I took a video (that the model hadn’t seen before) to test how well it performs.
To detect the car, I used the best.pt weights. But how do I know which epoch these come from? Can I use the weights of a specific epoch (e.g. epoch 2)?
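My understanding (an assumption, not something I found spelled out in the docs I quoted) is that YOLOv5 writes best.pt whenever a per-epoch validation “fitness” score improves, where fitness weights mAP@0.5 at 10% and mAP@0.5:0.95 at 90%, with precision and recall weighted at 0. A toy sketch with made-up metrics:

```python
# Rough sketch of how YOLOv5 seems to pick "best.pt": each epoch gets a
# fitness score (a weighted sum of validation metrics), and the weights of
# the highest-scoring epoch are kept. The weight values are my assumption.
def fitness(p, r, map50, map50_95):
    weights = (0.0, 0.0, 0.1, 0.9)  # precision, recall, mAP@0.5, mAP@0.5:0.95
    return (weights[0] * p + weights[1] * r
            + weights[2] * map50 + weights[3] * map50_95)

# made-up per-epoch validation metrics: (p, r, mAP@0.5, mAP@0.5:0.95)
epoch_metrics = [
    (0.60, 0.55, 0.50, 0.30),  # epoch 0
    (0.70, 0.65, 0.62, 0.41),  # epoch 1
    (0.68, 0.66, 0.61, 0.44),  # epoch 2
]
best_epoch = max(range(len(epoch_metrics)), key=lambda i: fitness(*epoch_metrics[i]))
print(best_epoch)  # the epoch whose weights would end up in best.pt -> 2
```

Note that epoch 1 has the higher mAP@0.5, but epoch 2 still wins because mAP@0.5:0.95 dominates the score.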
Bounding Boxes
(How is this helpful?) Did I use it wrong?
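As I understand it, predicted bounding boxes are scored against the ground-truth boxes by IoU (intersection over union), which is what makes the box overlays more than cosmetic. A self-contained sketch of that metric (the helper name is mine):

```python
# IoU (intersection over union): overlap area divided by union area of two
# boxes given as (x1, y1, x2, y2). 1.0 = identical boxes, 0.0 = no overlap.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 3))  # -> 0.143
```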
Metrics
How can I use this data to improve my model?
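To make sense of the metrics plots, precision and recall boil down to simple counts of true positives, false positives, and false negatives. A minimal sketch with invented numbers:

```python
# Precision: of the boxes the model predicted, how many were right?
# Recall: of the real post cars, how many did the model find?
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# invented example: 8 correct detections, 2 false alarms, 4 missed cars
print(precision(8, 2))  # -> 0.8
print(recall(8, 4))     # -> 0.666...
```

Roughly speaking, low precision means the model raises false alarms on other cars; low recall means it misses the post car entirely.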
Model Version 2
This time, I added more data to the model, because the number of pictures of the post car is limited and fairly small. I created a new class, “other cars”: if the model doesn’t know what a post car looks like, I can at least teach it what a post car doesn’t look like. So I took around 500 pictures of other cars from my footage and labeled them as “others”.
Additionally, I created screenshots of every single position of the post car.
And I added a lot of external pictures to have even more data to train on.
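One thing worth noting: with roughly 30 post-car images against 500 “others”, the classes are heavily imbalanced, which can bias training toward the majority class. A small sketch (the label lines are invented) for checking the class distribution from YOLO label files:

```python
from collections import Counter

# Each YOLO label line starts with the class index; counting the first token
# across all label files reveals the class distribution of the dataset.
label_lines = [
    "0 0.50 0.50 0.20 0.30",  # white van
    "1 0.40 0.60 0.10 0.20",  # electric car
    "2 0.70 0.20 0.30 0.30",  # other
    "2 0.30 0.40 0.20 0.25",  # other
]
counts = Counter(int(line.split()[0]) for line in label_lines)
print(dict(counts))  # -> {0: 1, 1: 1, 2: 2}
```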
Parameters
- img: 640 (same as above)
- batchsize: 64 (same as above)
- epochs: 300
Results
The results are bad. Once again, I used the best.pt weights to detect the cars.
Bounding Boxes
I think the bounding boxes worked better this time.
Metrics
Things that I tried:
- reducing the number of epochs (150, 80, 50, 20, and 10)
- reducing the “other cars” pictures from 500 to 200, then 100, and eventually 0
- removing pictures of external post cars
- removing pictures of post cars that are almost identical or where you can barely see the car
Unfortunately, none of the above improved my model. What did I do wrong?
*I get commissions for purchases made through links in this post.