How to Convert Labels to YOLO txt Format

李謦伊

Published in

謦伊的閱讀筆記

Aug 8, 2020

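YOLO expects its bounding-box (bndBox) labels as txt files. If you train with the WIDER FACE dataset from the previous post, or with PASCAL VOC xml annotations, you first have to convert the labels to this format.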

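First, the YOLO format. Each line of a label file holds a class id, the normalized x, y coordinates of the box center, and the normalized w, h, as shown in the figure below.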

To recap the YOLO x, y, w, h from the previous post:

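- x, y: the bndBox center coordinates divided by the image width and height respectively, i.e. the normalized center of the box
- w, h: the bndBox width and height divided by the input image width and height respectively, i.e. the normalized box size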
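A minimal sketch of writing and reading a single YOLO label line, assuming the layout described above: space-separated `class_id x y w h`, with the last four values normalized to [0, 1]. The names `YoloBox`, `format_yolo_line` and `parse_yolo_line` are hypothetical helpers for illustration and are not part of any YOLO codebase.

```python
from typing import NamedTuple


class YoloBox(NamedTuple):
    """One YOLO label entry; x, y, w, h are normalized to [0, 1]."""
    class_id: int
    x: float  # box center x / image width
    y: float  # box center y / image height
    w: float  # box width / image width
    h: float  # box height / image height


def format_yolo_line(box: YoloBox, digits: int = 6) -> str:
    """Serialize a box as 'class_id x y w h' with a fixed number of decimals."""
    coords = " ".join(f"{v:.{digits}f}" for v in (box.x, box.y, box.w, box.h))
    return f"{box.class_id} {coords}"


def parse_yolo_line(line: str) -> YoloBox:
    """Parse one label line and check that every coordinate is normalized."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    x, y, w, h = (float(p) for p in parts[1:])
    for name, v in zip("xywh", (x, y, w, h)):
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"{name}={v} is not normalized to [0, 1]")
    return YoloBox(class_id, x, y, w, h)


if __name__ == "__main__":
    line = format_yolo_line(YoloBox(0, 0.5, 0.4, 0.25, 0.3))
    print(line)  # 0 0.500000 0.400000 0.250000 0.300000
    print(parse_yolo_line(line))
```

The range check is a cheap way to catch labels that were written in pixels by mistake, since any pixel value above 1 is rejected.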

Coordinate definitions in the image

xmin, ymin (x1, y1) is the top-left corner of the bndBox. Coordinates increase to the right (x) and downward (y). bw, bh are the width and height of the bndBox.

The normalized YOLO values are computed as follows, where w, h are the width and height of the image itself:
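The original formula image is not included here. Reconstructed from the definitions above, the normalization should read as below. Here x1, y1 is the top-left corner, bw, bh is the box size, and w, h is the image size. The "yolo" subscript is added only to keep the YOLO values apart from the image size w, h. For VOC boxes, x1 = xmin, y1 = ymin, bw = xmax − xmin and bh = ymax − ymin.

```latex
\begin{aligned}
x_{\text{yolo}} &= \frac{x_1 + b_w/2}{w}, & y_{\text{yolo}} &= \frac{y_1 + b_h/2}{h},\\
w_{\text{yolo}} &= \frac{b_w}{w}, & h_{\text{yolo}} &= \frac{b_h}{h}
\end{aligned}
```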

Converting WIDER FACE labels

Next, the WIDER FACE label format. Each entry consists of the image path, the number of bounding boxes, and the attributes of each box: x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose.

Dataset: http://shuoyang1213.me/WIDERFACE/

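- x1, y1 are the top-left corner of the bndBox; w, h are its width and height
- blur: degree of blur of the face: 0 clear, 1 normal, 2 heavy
- expression: facial expression: 0 typical, 1 exaggerated
- illumination: lighting/exposure: 0 normal, 1 extreme
- invalid: whether the box is marked invalid: 0 no, 1 yes
- occlusion: how much of the face is occluded: 0 none, 1 partial, 2 heavy
- pose: 0 typical, 1 atypical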
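A minimal parser for this format, assuming the layout of the annotation files in wider_face_split (for example wider_face_train_bbx_gt.txt). Each block has an image path line, then a box-count line, then one line of 10 integers per box. The sketch also assumes that an image with 0 boxes is still followed by one placeholder line of zeros. For that reason it always reads `max(count, 1)` box lines. The names `WiderFace` and `parse_wider_annotations` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class WiderFace(NamedTuple):
    """One face box from a WIDER FACE annotation file (pixel units)."""
    x1: int
    y1: int
    w: int
    h: int
    blur: int
    expression: int
    illumination: int
    invalid: int
    occlusion: int
    pose: int


def parse_wider_annotations(
        raw_lines: Iterable[str]) -> Iterator[Tuple[str, List[WiderFace]]]:
    """Yield (image_path, faces) for each image block.

    raw_lines can be an open file or any iterable of text lines.
    Assumption: a block with count 0 still has one placeholder line of zeros.
    """
    lines = (ln.strip() for ln in raw_lines)
    for path in lines:
        if not path:
            continue  # tolerate blank lines between blocks
        count = int(next(lines))
        faces = []
        for _ in range(max(count, 1)):
            values = [int(v) for v in next(lines).split()]
            if count > 0:  # skip the placeholder row of an empty image
                faces.append(WiderFace(*values[:10]))
        yield path, faces
```

Usage, with a path assumed to match the folder layout described in this post: `with open("wider_face_split/wider_face_train_bbx_gt.txt") as f: for path, faces in parse_wider_annotations(f): ...`.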

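Once you understand the format, you can write the conversion yourself or use the code I provide below. My files are laid out as shown in the figure below. cfg is a folder I created to hold the generated train.txt and val.txt, which list the image paths of the training and validation sets. wider_face_split, WIDER_train and WIDER_val are the WIDER FACE label files, training set and validation set, respectively.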

❗️ My dataset has only one class, so the first value written on line 80, f.write('0 %s %s %s %s\n' % (x, y, w, h)), is the class id 0.
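The per-box conversion might look like the sketch below. It is not the author's script. It takes WIDER FACE rows of 10 integers and applies the normalization formula to them. Like the note above, it writes class id 0 for every box. Two choices here are assumptions, because the article does not say how its own script behaves. First, boxes flagged `invalid` or with non-positive width or height are skipped. Second, WIDER FACE annotations do not store the image size, so `img_w` and `img_h` have to come from the image file itself, for example from Pillow's `Image.open(path).size`, which returns (width, height). `to_yolo` and `wider_faces_to_yolo_lines` are hypothetical names.

```python
from typing import Iterable, List, Sequence, Tuple


def to_yolo(x1: float, y1: float, bw: float, bh: float,
            img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a top-left-corner box in pixels to normalized YOLO x, y, w, h."""
    x = (x1 + bw / 2) / img_w
    y = (y1 + bh / 2) / img_h
    return x, y, bw / img_w, bh / img_h


def wider_faces_to_yolo_lines(faces: Iterable[Sequence[int]],
                              img_w: int, img_h: int,
                              class_id: int = 0) -> List[str]:
    """Build YOLO label lines from WIDER FACE rows of 10 integers.

    Row order: x1, y1, w, h, blur, expression, illumination, invalid,
    occlusion, pose. Assumption: invalid or zero-sized boxes are dropped.
    """
    lines = []
    for row in faces:
        x1, y1, bw, bh = row[:4]
        invalid = row[7]
        if invalid == 1 or bw <= 0 or bh <= 0:
            continue
        x, y, w, h = to_yolo(x1, y1, bw, bh, img_w, img_h)
        lines.append("%d %.6f %.6f %.6f %.6f" % (class_id, x, y, w, h))
    return lines
```

Formatting with `%.6f` keeps the label files compact. Writing each value with `%s`, as in the line quoted above, also works, but it produces full-precision floats.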

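The converted labels are written to the yolo_train and yolo_val folders. The txt files listing every training and validation image path go into the cfg folder (train.txt, val.txt). Once the conversion finishes, you can start training!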
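Generating train.txt and val.txt amounts to writing one image path per line. The sketch below is one way to do it. It assumes absolute paths and a recursive search, because WIDER FACE stores its images in per-event subfolders such as 0--Parade. The function name `write_image_list` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def write_image_list(image_dir: Path, list_file: Path,
                     exts: Iterable[str] = (".jpg", ".jpeg", ".png")) -> int:
    """Write the absolute path of every image under image_dir, one per line.

    Subfolders are searched recursively. Returns the number of paths written.
    """
    suffixes = {e.lower() for e in exts}
    paths = sorted(p.resolve() for p in image_dir.rglob("*")
                   if p.is_file() and p.suffix.lower() in suffixes)
    list_file.parent.mkdir(parents=True, exist_ok=True)
    list_file.write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

With the folder layout above, the calls would be something like `write_image_list(Path("WIDER_train/images"), Path("cfg/train.txt"))` and the same call for WIDER_val and cfg/val.txt. These paths are assumptions and should be adjusted to wherever the images were extracted.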

Converting PASCAL VOC xml

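Below is the VOC xml format. Inside <annotation>, <size> holds the image information (width, height, depth). Each <object> describes one detected object: its class name and its <bndbox> xmin, xmax, ymin, ymax, i.e. the minimum and maximum x and y coordinates of the box.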

<annotation>
 <folder>VOC2012</folder>
 <filename>2007_000027.jpg</filename>
 <source>
  <database>The VOC2007 Database</database>
  <annotation>PASCAL VOC2007</annotation>
  <image>flickr</image>
 </source>
 <size>
  <width>486</width>
  <height>500</height>
  <depth>3</depth>
 </size>
 <segmented>0</segmented>
 <object>
  <name>person</name>
  <pose>Unspecified</pose>
  <truncated>0</truncated>
  <difficult>0</difficult>
  <bndbox>
   <xmin>174</xmin>
   <ymin>101</ymin>
   <xmax>349</xmax>
   <ymax>351</ymax>
  </bndbox>
  <part>
   <name>head</name>
   <bndbox>
    <xmin>169</xmin>
    <ymin>104</ymin>
    <xmax>209</xmax>
    <ymax>146</ymax>
   </bndbox>
  </part>
  <part>
   <name>hand</name>
   <bndbox>
    <xmin>278</xmin>
    <ymin>210</ymin>
    <xmax>297</xmax>
    <ymax>233</ymax>
   </bndbox>
  </part>
  <part>
   <name>foot</name>
   <bndbox>
    <xmin>273</xmin>
    <ymin>333</ymin>
    <xmax>297</xmax>
    <ymax>354</ymax>
   </bndbox>
  </part>
  <part>
   <name>foot</name>
   <bndbox>
    <xmin>319</xmin>
    <ymin>307</ymin>
    <xmax>340</xmax>
    <ymax>326</ymax>
   </bndbox>
  </part>
 </object>
</annotation>
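A minimal converter for a file like this one, using the standard library's `xml.etree.ElementTree`. The `class_ids` mapping from VOC class names to YOLO ids is an assumption that you supply yourself, and objects whose name is not in it are skipped. `find()` only searches direct children, so only each <object>'s own <bndbox> is read. The <part> boxes (head, hand, foot) are ignored. The coordinates are used as they appear in the file. `voc_xml_to_yolo_lines` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def voc_xml_to_yolo_lines(xml_text: str, class_ids: Dict[str, int]) -> List[str]:
    """Convert one PASCAL VOC annotation to YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.findtext("width"))
    img_h = float(size.findtext("height"))
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")  # direct child, so not a <part> name
        if name not in class_ids:
            continue
        box = obj.find("bndbox")  # direct child, so <part> boxes are skipped
        xmin, ymin, xmax, ymax = (float(box.findtext(k))
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        # Box center and size in pixels, normalized by the image size.
        x = (xmin + xmax) / 2 / img_w
        y = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append("%d %.6f %.6f %.6f %.6f" % (class_ids[name], x, y, w, h))
    return lines
```

For the 2007_000027.jpg annotation above with `{"person": 0}`, this returns `["0 0.538066 0.452000 0.360082 0.500000"]`. The box center (261.5, 226) is divided by the image size 486 × 500, and the box size 175 × 250 is divided the same way.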