My model performs badly and I don’t know why (YOLO computer vision)

Daniel Guo
7 min read · Aug 17, 2021

What do I want to achieve?

I started a new project about 2 months ago, trying to identify patterns in the delivery of my mail. For that, I record the time when the postman delivers my mail.

The data includes the postman, the vehicle, the time, the date, and the weather (temperature, wind, air pressure, etc.). With this data I try to recognize and predict delivery patterns. Maybe?

Examples:

  • If it rains on a Monday, my mail won’t arrive before 12 noon?
  • Whenever vehicle XY has been used, the mail never arrives before 2 p.m.
  • Postman X always takes the longest.

In the beginning, I sat in front of the window around noon to look for the postman. As soon as he showed up, I wrote down the time, the vehicle, and the postman.

This was time consuming and I never knew whether the postman was already there (and I had missed him) or whether he would still come.

Automate boring stuff

Since this is insanely inefficient, I installed a vibration sensor* in my mailbox that automatically sends a notification (to my phone) as soon as the mailbox is opened.

If you want to try this at home, here are a few tips:

  • In order to use the vibration sensor, you will need an Aqara hub*.
  • If your mailbox is made of metal, try to install the hub as close as possible to the sensor.

With this solution, I no longer had to stick to my window and could relax and wait for the message on my cell phone.

But this approach has problems of its own.

  • What happens if I don’t get any mail at all?
  • How do I track the postman and the vehicle? (I don’t see them anymore)

In order to capture the vehicle and the postman, I ran to the window every time I received a push notification.

That made my life a little easier, but there had to be something better.

(Further) automate boring stuff

So I bought a CCTV camera* and installed it in front of my door.

Look at that cute hedgehog exploring my garden

Every time I receive a notification, I look into the CCTV camera app to see who is at the door.

Therefore I don’t have to be at home to collect the data. That works pretty well, but I still have no information on days when I don’t get any mail.

I got lucky once and saw the postcar driving by

Let’s go!

My plan was to train a model that automatically recognizes the postman’s car, so I can review the footage to see when he passed my house.

However, in order to be able to recognize the postman’s car, I need training data.

My first Model

I used the YOLOv5 framework because it is quick and easy to work with. For my first model, I took screenshots of the postman’s car. The postman only has two different cars so far: a white van and a yellow electric car.

Electric car
White van

Train Data (First Model)

Download Train data

I used these pictures (approx. 30 different) to train my model on two classes (white van and electric car). I used makesense.ai to label the data.
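For reference, the dataset config that YOLOv5 expects (the `custom.yaml` passed to `train.py` below) can be generated like this. This is a minimal sketch: the `dataset/images/...` paths are hypothetical placeholders, not my actual folder layout, and YOLOv5 looks for a matching `labels/` directory next to each `images/` directory.

```python
from pathlib import Path

# Minimal YOLOv5 dataset config for the two classes used here.
# Paths are placeholder examples; adjust them to your own layout.
CUSTOM_YAML = """\
train: dataset/images/train
val: dataset/images/val

nc: 2
names: ['white van', 'electric car']
"""

config_path = Path("custom.yaml")
config_path.write_text(CUSTOM_YAML)
print(config_path.read_text())
```

The class indices in the label files (0 for white van, 1 for electric car) must match the order of `names`, which is exactly what makesense.ai exports when you label in that order.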

Parameters

  • img: 640 (I didn’t change this parameter because I didn’t know how it affects my model)
  • batch size: 64 (that’s the biggest batch size I could use; I trained all models locally on my machine. I tried Google Colab multiple times, but it usually crashed and I had to start over)
  • epochs: 80 (I later found out that the YOLOv5 documentation suggests starting with 300 epochs)

python train.py --img 640 --batch 64 --epochs 80 --data custom.yaml --weights yolov5s.pt --cache

Results

I took a video (that the model hadn’t seen before) to test how well it performs.

Only a confidence score of 0.11

To detect the car I used the best.pt model. But how do I know which epoch this is? Can I use the weights of a specific epoch (e.g. epoch 2)?

For all models I always used the best.pt weights to detect the cars
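A few things worth knowing here. YOLOv5 saves two checkpoints: last.pt (the final epoch) and best.pt (the epoch with the highest fitness, a weighted mix of mAP@0.5 and mAP@0.5:0.95), so best.pt is not tied to a fixed epoch number. To keep weights from every Nth epoch, recent YOLOv5 versions accept a `--save-period N` flag on train.py. Also, a 0.11 score matters because detect.py discards everything below its default `--conf-thres` of 0.25, so a box like that only shows up at all if the threshold is lowered. The thresholding step itself is trivial to sketch:

```python
def filter_detections(detections, conf_thres=0.25):
    """Keep only detections at or above the confidence threshold.

    Each detection is (x1, y1, x2, y2, confidence, class_id),
    the same layout YOLOv5 uses for its prediction rows.
    """
    return [d for d in detections if d[4] >= conf_thres]

# Toy detections; the 0.11 mirrors the score seen in my test video.
dets = [
    (10, 20, 110, 90, 0.11, 0),    # white van, below default threshold
    (200, 40, 320, 130, 0.62, 1),  # electric car, kept
]
print(filter_detections(dets))
```

With the default threshold, only the 0.62 detection survives; lowering `conf_thres` to 0.1 would keep both, which is how low-confidence boxes end up visible in the output video.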

Bounding Boxes

(How is this helpful?) Did I use it wrong?

Bounding Boxes at Epoch 79
Bounding Box at Epoch 7

Metrics

How can I use this data to improve my model?

I don’t even know what some of these plots mean.

[Training plots at 80 epochs: results, confusion matrix, F1 score, precision-recall curve, precision curve, recall curve]

Model Version 2

This time, I added more data to the model. Because the number of pictures of the postcar is limited and fairly small, I created a new class, “other cars”. If the model doesn’t know what a postcar looks like, I can at least teach it what a postcar doesn’t look like. So I took around 500 pictures of other cars from my footage and labeled them as “others”.
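One side effect of this: with roughly 30 postcar images against 500 “others”, the dataset becomes heavily imbalanced, which on its own can depress confidence on the rare classes. A quick way to check the balance is to count class IDs across the YOLO-format label files (each line is `class x y w h`). A sketch, assuming the labels sit together in one directory (the demo files below are synthetic, just to show the output shape):

```python
from collections import Counter
from pathlib import Path

def count_classes(label_dir):
    """Count bounding boxes per class across YOLO-format .txt label files."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1  # first field is the class id
    return counts

# Demo with two tiny synthetic label files.
demo = Path("labels_demo")
demo.mkdir(exist_ok=True)
(demo / "img1.txt").write_text("0 0.5 0.5 0.2 0.2\n2 0.1 0.1 0.05 0.05\n")
(demo / "img2.txt").write_text("2 0.7 0.3 0.1 0.1\n")
print(count_classes(demo))
```

If the counts are as lopsided as 30 vs. 500, it is worth either capping the majority class or oversampling the rare one before blaming the model.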

Additionally I created screenshots of every single position of the postcar.

And I added a lot of external pictures to have even more training data.

Parameters

  • img: 640 (same as above)
  • batchsize: 64 (same as above)
  • epochs: 300

Results

The results are bad. Once again, I used the best.pt weights to detect the cars.

It’s hard to see in the video, but both detections have a confidence score below 0.10

Bounding Boxes

I think this time, the bounding boxes worked better.

Metrics

[Training plots for Model V2 at 300 epochs: metrics, training losses, results, confusion matrix, F1 score, precision-recall curve, precision curve, recall curve]

Things that I tried:

  • reduced the number of epochs (150, 80, 50, 20, and 10)
  • reduced the “other cars” pictures from 500 to 200, to 100, and eventually to 0
  • removed the pictures of external postcars
  • removed pictures of postcars that are nearly identical or where you can barely see the car

Unfortunately, none of the above improved my model. What did I do wrong?

*I get commissions for purchases made through links in this post.
