YOLOv4 Training Tutorial

Published in 謦伊的閱讀筆記

14 min read · Aug 6, 2020

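YOLO (You Only Look Once) is a one-stage object detection algorithm. It feeds the whole image through a single CNN and predicts the locations and classes of multiple objects in one pass. This end-to-end approach speeds up inference, which makes real-time detection possible while keeping accuracy high.

Before we start training, here's a quick introduction to YOLO.

YOLO divides the input image into an SxS grid. If the center of an object falls inside a grid cell, that cell is responsible for detecting the object. Each cell predicts B bounding boxes (bndBox; by design, YOLOv1: B=2, YOLOv2: B=5, YOLOv3: B=3) along with the probability of each class (assume there are C classes). For each bndBox, the prediction consists of 5 values: x, y, w, h, and confidence.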

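- x, y are the center coordinates of the bndBox divided by the image width and height, i.e. the normalized center of the bndBox
- w, h are the width and height of the bndBox divided by the width and height of the input image, i.e. the normalized size of the bndBox
- confidence is the IOU between the bndBox and the Ground Truth (scaled by the probability that the box contains an object at all)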
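To make the normalization above concrete, here is a minimal sketch that turns a pixel-space box into the four normalized values. The function name `to_yolo_box` and the corner-based input format are my own assumptions for illustration, not something defined by Darknet.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space corner box into normalized (x, y, w, h).

    Hypothetical helper: x, y are the box center and w, h the box size,
    each divided by the image width or height so they fall in [0, 1].
    """
    x = (xmin + xmax) / 2.0 / img_w  # normalized center x
    y = (ymin + ymax) / 2.0 / img_h  # normalized center y
    w = (xmax - xmin) / img_w        # normalized width
    h = (ymax - ymin) / img_h        # normalized height
    return x, y, w, h


# A 200x200 px box inside a 400x500 px image
print(to_yolo_box(100, 50, 300, 250, img_w=400, img_h=500))
```

Because every value is a ratio, the same label stays valid if the image is later resized to the network input size.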

For a more detailed introduction to YOLO, see the following articles:

YOLOv1, YOLOv2 📝

YOLOv3 📝

YOLOv4 📝

Evaluation metrics

YOLO is evaluated mainly with IOU and mAP.

IOU (Intersection over Union)

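IOU is the intersection of two bndBoxes divided by their union, that is, the intersection of the predicted bndBox and the Ground Truth bndBox divided by their union. A score > 0.5 is usually considered a decent result.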
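The definition above fits in a few lines of code. This is a minimal sketch that assumes boxes are given as (xmin, ymin, xmax, ymax) corners; the function name `iou` and that box format are choices made here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get an empty intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```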

mAP (mean Average Precision)

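mAP is the mean of the AP over all classes, where AP is the area under the PR curve (Precision-Recall curve; area under curve, AUC).

The PR curve plots Recall on the X axis and Precision on the Y axis. Higher Precision and Recall mean a better model, so the closer the curve gets to the top-right corner, the better.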
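As a rough illustration of "area under the PR curve", the sketch below computes AP from a list of (recall, precision) points and then averages the per-class APs. It assumes the all-point interpolation used in later PASCAL VOC evaluations; the exact interpolation Darknet applies may differ, and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under a PR curve; recalls must be sorted in ascending order.

    Assumption: all-point interpolation, where precision at each recall
    level is replaced by the best precision achieved at any higher recall.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]

    # Make the precision envelope non-increasing, scanning right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum the rectangles between consecutive recall values
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))


def mean_average_precision(ap_per_class):
    """mAP is simply the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(ap)  # 0.75
print(mean_average_precision([ap, 1.0]))  # 0.875
```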

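Precision: TP / (TP + FP)
Recall: TP / (TP + FN)

Next, let's go through the four counts of the Confusion Matrix: TP, TN, FP, FN

TP (True Positive): the sample is the target object and is correctly predicted as the target object, e.g. a photo of a cat is correctly predicted as a cat
TN (True Negative): the sample is not the target object and is correctly predicted as not the target object, e.g. a photo of a dog is correctly predicted as not a cat
FP (False Positive): the sample is not the target object but is wrongly predicted as the target object, e.g. a photo of a dog is wrongly predicted as a cat; also called a Type 1 Error
FN (False Negative): the sample is the target object but is wrongly predicted as not the target object (or is a positive sample that was missed entirely), e.g. a photo of a cat is wrongly predicted as not a cat; also called a Type 2 Error
In object detection, each predicted box is compared with the Ground Truth using IOU. A detection counts as a TP if its IOU is greater than the threshold (usually set to 0.5), and each Ground Truth can be counted only once. A detection with IoU <= threshold, or a redundant detection of a Ground Truth that has already been matched, counts as an FP. Ground Truths that are never detected count as FN.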
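The counting rule above can be sketched as follows. This is an illustrative implementation under my own assumptions: detections are processed from highest to lowest confidence (a common convention the post does not spell out), boxes are (xmin, ymin, xmax, ymax), and all function names are hypothetical.

```python
def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (confidence, box); ground_truths: list of boxes."""
    matched = set()  # indices of Ground Truths already claimed by a TP
    tp = fp = 0
    # Assumption: higher-confidence detections get first pick of a Ground Truth
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truths):
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_iou > iou_threshold and best_idx not in matched:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1  # IOU too low, or a redundant hit on a matched Ground Truth
    fn = len(ground_truths) - len(matched)  # Ground Truths nobody detected
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.7, (50, 50, 60, 60))]
counts = count_tp_fp_fn(preds, truths)
print(counts)  # (1, 2, 1)
print(precision_recall(*counts))
```

In this toy example the second prediction overlaps the first Ground Truth well, but that Ground Truth is already taken, so it is counted as an FP.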

YOLO training

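Next, I'll show how to train YOLO with Darknet.

Darknet is the deep learning framework written by YOLO's original author. The original author has stopped maintaining it for various reasons, and AlexeyAB from Russia has taken over. Here is the Darknet GitHub repo: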

AlexeyAB/darknet


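On the Darknet GitHub page you can see that YOLOv4 beats YOLOv3 in both AP and FPS! It was developed by AlexeyAB together with two researchers from the Institute of Information Science at Academia Sinica in Taiwan. Now let's train with the YOLOv4 tiny model. Everything below was run on Linux.

First, git clone the Darknet GitHub repo: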

git clone https://github.com/AlexeyAB/darknet

Edit the Makefile in the freshly cloned darknet directory: change GPU, CUDNN, CUDNN_HALF, and OPENCV from their default value of 0 to 1.

sed -i "s/GPU=0/GPU=1/g" darknet/Makefile
sed -i "s/CUDNN=0/CUDNN=1/g" darknet/Makefile
sed -i "s/CUDNN_HALF=0/CUDNN_HALF=1/g" darknet/Makefile
sed -i "s/OPENCV=0/OPENCV=1/g" darknet/Makefile

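After the edit, these four Makefile settings should read GPU=1, CUDNN=1, CUDNN_HALF=1, and OPENCV=1.

Once that's done, compile!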

cd darknet; make

Next, create the folders that will hold our files.

➔ Create the Face_detection folder

cd ..; mkdir Face_detection
cd Face_detection

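➔ Create the config folder (cfg) and the weights folder (weights) to hold the files needed for training and the weights it produces. Then copy coco.data and coco.names from the darknet folder into cfg, renaming them to face.data and face.names.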

import os
import shutil

# Run from the directory that contains both darknet/ and Face_detection/.
# Create the project, config (cfg), and weights folders in one go.
os.makedirs("Face_detection/cfg/weights", exist_ok=True)

# Copy Darknet's COCO templates and rename them for the face dataset
if not os.path.exists("Face_detection/cfg/face.data"):
    shutil.copyfile("darknet/cfg/coco.data", "Face_detection/cfg/face.data")
if not os.path.exists("Face_detection/cfg/face.names"):
    shutil.copyfile("darknet/cfg/coco.names", "Face_detection/cfg/face.names")

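Prepare the dataset. The bndBox labels must be converted to txt files.

🔹 I'm using the WIDER FACE dataset for training, so its bndBox labels need to be converted to txt format separately.

After downloading, unzip the archives to get wider_face_split, WIDER_train, and WIDER_val.

unzip WIDER_train.zip
unzip WIDER_val.zip
unzip wider_face_split.zip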

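The file layout at this point is shown below.

To convert the bndBox labels to txt files, see the following article:

How to convert to YOLO txt format

YOLO training uses txt files for bndBox labels, so if you train with the WIDER FACE dataset from the previous post or with PASCAL VOC xml, you need to convert the format separately.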

medium.com

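Write the paths of the training and validation images into txt files and put those files in the cfg folder (because the train and valid paths in my face.data point to the cfg folder). Also put the images and labels of the training and validation sets together in the same folder, as shown in the figure below.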
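One possible way to produce those path lists is sketched below. The helper name `write_image_list`, the accepted extensions, and the folder and file names in the commented usage lines are all assumptions for illustration; adjust them to your own layout.

```python
from pathlib import Path


def write_image_list(image_dir, list_path, extensions=(".jpg", ".jpeg", ".png")):
    """Write one absolute image path per line and return how many were written.

    Hypothetical helper: it only lists images; each image is expected to
    have its label txt file sitting next to it in the same folder.
    """
    image_dir = Path(image_dir)
    paths = sorted(
        p.resolve() for p in image_dir.iterdir() if p.suffix.lower() in extensions
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)


# Example usage (hypothetical folder and file names):
# write_image_list("Face_detection/train", "Face_detection/cfg/train.txt")
# write_image_list("Face_detection/val", "Face_detection/cfg/val.txt")
```

Absolute paths are used here so the lists keep working no matter which directory the darknet command is launched from.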

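Now edit the contents of the face.data and face.names files we just copied from darknet.

❗ The names file lists the labels to predict. The data file holds the number of classes, the path to the train list txt, the path to the valid list txt, the path to the names file, and the backup path (where the weights files go).

I'm training face detection with only one class, so the data and names files look like the following.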
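For reference, here is a small sketch that builds the text of both files for the one-class case. The keys (classes, train, valid, names, backup) follow the description above; the list file names train.txt and val.txt and the relative paths are assumptions, so swap in your own.

```python
def build_data_file(classes, train_list, valid_list, names_file, backup_dir):
    """Return the text of a Darknet .data file as key=value lines."""
    lines = [
        f"classes={classes}",    # number of classes
        f"train={train_list}",   # txt file listing the training images
        f"valid={valid_list}",   # txt file listing the validation images
        f"names={names_file}",   # file with one label per line
        f"backup={backup_dir}",  # folder where trained weights are saved
    ]
    return "\n".join(lines) + "\n"


def build_names_file(labels):
    """Return the text of a .names file: one label per line."""
    return "\n".join(labels) + "\n"


# Paths are written relative to the darknet folder (hypothetical file names)
data_text = build_data_file(
    classes=1,
    train_list="../Face_detection/cfg/train.txt",
    valid_list="../Face_detection/cfg/val.txt",
    names_file="../Face_detection/cfg/face.names",
    backup_dir="../Face_detection/cfg/weights",
)
names_text = build_names_file(["face"])
print(data_text)
print(names_text)
```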

❗ Paths inside face.data can be absolute or relative.

Modify yolov4-tiny.cfg

Copy darknet/cfg/yolov4-tiny-custom.cfg into the cfg folder and rename it to yolov4-tiny-obj.cfg.

If you want to use a different cfg file, look in darknet/cfg!

cp ../darknet/cfg/yolov4-tiny-custom.cfg cfg/yolov4-tiny-obj.cfg
# Check the parameters
sed -n -e 8p -e 9p -e 212p -e 220p -e 263p -e 269p cfg/yolov4-tiny-obj.cfg
# The output looks like this
width=416
height=416
filters=255
classes=80
filters=255
classes=80

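❗ Before editing yolov4-tiny-obj.cfg, work out what filters and classes should be changed to.

The number of filters for YOLOv4 detection is (C+5)*B
- B is the number of bndBoxes each feature map can detect, set to 3 here
- 5 is the 5 values predicted for each bndBox: x, y, w, h, and confidence
- C is the number of classes
filters=(classes + 5)*3  # one class, so filters becomes 18
classes=1  # face detection has only one class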
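The same arithmetic as a tiny function, in case you want to check other class counts. The function name `yolo_filters` is hypothetical.

```python
def yolo_filters(num_classes, boxes_per_scale=3):
    """filters = (C + 5) * B for the layer feeding each detection head."""
    return (num_classes + 5) * boxes_per_scale


print(yolo_filters(80))  # 255, the default for the 80 COCO classes
print(yolo_filters(1))   # 18, for single-class face detection
```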

Change filters and classes in yolov4-tiny-obj.cfg. You can also change the input image size here.

sed -i '212s/255/18/' cfg/yolov4-tiny-obj.cfg
sed -i '220s/80/1/' cfg/yolov4-tiny-obj.cfg
sed -i '263s/255/18/' cfg/yolov4-tiny-obj.cfg
sed -i '269s/80/1/' cfg/yolov4-tiny-obj.cfg
# Check the parameters again
sed -n -e 212p -e 220p -e 263p -e 269p cfg/yolov4-tiny-obj.cfg
# The output looks like this
filters=18
classes=1
filters=18
classes=1
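The sed commands above depend on exact line numbers, which can shift between Darknet versions. As an alternative, here is a rough sketch that finds the right lines by section instead: it sets classes inside every [yolo] section and filters in the [convolutional] section right before it. The function name `patch_yolo_cfg` is hypothetical, and it assumes each [yolo] section directly follows its output [convolutional] section.

```python
def patch_yolo_cfg(cfg_text, num_classes, boxes_per_scale=3):
    """Return cfg text with classes/filters updated for every [yolo] head."""
    filters = (num_classes + 5) * boxes_per_scale
    lines = cfg_text.splitlines()
    section = None
    last_conv_filters = None  # index of filters= in the latest [convolutional]

    for i, line in enumerate(lines):
        stripped = line.strip()
        compact = stripped.replace(" ", "")
        if stripped.startswith("["):
            section = stripped
            # Entering a [yolo] head: patch the conv layer that feeds it
            if section == "[yolo]" and last_conv_filters is not None:
                lines[last_conv_filters] = f"filters={filters}"
        elif section == "[convolutional]" and compact.startswith("filters="):
            last_conv_filters = i
        elif section == "[yolo]" and compact.startswith("classes="):
            lines[i] = f"classes={num_classes}"
    return "\n".join(lines) + "\n"


# Example usage (path taken from this tutorial):
# path = "cfg/yolov4-tiny-obj.cfg"
# with open(path) as f:
#     patched = patch_yolo_cfg(f.read(), num_classes=1)
# with open(path, "w") as f:
#     f.write(patched)
```

Other [convolutional] layers keep their filters value, because only the one immediately before a [yolo] section is touched.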

To replace the default anchors, use the command below (remember to adjust cfg/face.data, num_of_clusters, width, and height). Darknet provides it to compute the anchors automatically.

cd ../darknet
./darknet detector calc_anchors ../Face_detection/cfg/face.data -num_of_clusters 6 -width 416 -height 416 -showpause

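Change the anchors on lines 219 and 268 of yolov4-tiny-obj.cfg to the values printed by the command. Since num_of_clusters is set to 6, there will be 6 pairs of values (see the figure below).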
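Before pasting the anchors into the cfg, it can help to sanity-check that the line really contains one (width, height) pair per cluster. The sketch below does that; the helper name `parse_anchors` is hypothetical and the numbers are illustrative only, not the result for WIDER FACE.

```python
def parse_anchors(anchor_line, num_of_clusters):
    """Parse 'anchors = w1,h1, w2,h2, ...' into a list of (w, h) pairs."""
    values = [float(v) for v in anchor_line.split("=", 1)[1].split(",")]
    if len(values) != 2 * num_of_clusters:
        raise ValueError(
            f"expected {num_of_clusters} pairs, got {len(values) / 2:g}"
        )
    # Even positions are widths, odd positions are heights
    return list(zip(values[0::2], values[1::2]))


# Illustrative values only; use the numbers calc_anchors prints for your data
line = "anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319"
for width, height in parse_anchors(line, num_of_clusters=6):
    print(width, height)
```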

yolov4-tiny-obj.cfg

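Download the pretrained yolov4-tiny weights provided by the official Darknet repo (yolov4-tiny.conv.29) and put the file in Face_detection/cfg, and you're ready to train!
If you use a different cfg file, download the matching weights from the official Darknet GitHub. Search for weight ➔ 🔎

The figure below shows where I placed my files.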

Train the model

./darknet detector train ../Face_detection/cfg/face.data ../Face_detection/cfg/yolov4-tiny-obj.cfg ../Face_detection/cfg/yolov4-tiny.conv.29 -dont_show

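Test the model

Once training finishes, you can use the trained weights for prediction! The trained weights are saved in cfg/weights. Open yolov4-tiny-obj.cfg, and in the [net] section uncomment the batch and subdivisions lines under Testing and comment out the ones under Training (see the figure below).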
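If you would rather not edit the file by hand, the sketch below gets the same effective settings a different way: instead of toggling comments, it rewrites the active batch and subdivisions lines in the [net] section to 1. This is my own shortcut rather than the procedure described above, and the function name `set_test_batch` is hypothetical.

```python
def set_test_batch(cfg_text):
    """Return cfg text with the active batch/subdivisions in [net] set to 1."""
    out = []
    section = None
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            section = stripped
        # Only touch uncommented lines inside [net]
        if section == "[net]" and not stripped.startswith("#"):
            key = stripped.split("=")[0].strip()
            if key in ("batch", "subdivisions"):
                line = f"{key}=1"
        out.append(line)
    return "\n".join(out) + "\n"


# Example usage (path taken from this tutorial):
# path = "../Face_detection/cfg/yolov4-tiny-obj.cfg"
# with open(path) as f:
#     patched = set_test_batch(f.read())
# with open(path, "w") as f:
#     f.write(patched)
```

Keys such as batch_normalize in later sections are left alone, since only exact batch and subdivisions keys inside [net] are rewritten.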

./darknet detector test ../Face_detection/cfg/face.data ../Face_detection/cfg/yolov4-tiny-obj.cfg ../Face_detection/cfg/weights/yolov4-tiny-obj_final.weights ../Face_detection/IMG_001.jpg

After prediction, open the image to check whether the faces were detected. The result image is saved in the darknet folder.

from PIL import Image
Image.open("predictions.jpg")

Compute mAP

./darknet detector map ../Face_detection/cfg/face.data ../Face_detection/cfg/yolov4-tiny-obj.cfg ../Face_detection/cfg/weights/yolov4-tiny-obj_final.weights

Compute recall

./darknet detector recall ../Face_detection/cfg/face.data ../Face_detection/cfg/yolov4-tiny-obj.cfg ../Face_detection/cfg/weights/yolov4-tiny-obj_final.weights

And that's it for YOLO training! You can train an object detector of your own the same way!!

I've also put all of the code on my GitHub, feel free to take a look.