Training YOLOv3 Object Detection API with your own dataset

efkan duraklı
Sep 13, 2019 · 7 min read

Hi everyone,

In this article, I will explain how to train YOLOv3 with your own data set. Before starting, I want to say a few words about why I am writing this article, what object detection is, and which object detection APIs are well known. Then I will show how to train YOLOv3 with your own data set.

Why am I writing this article?

A Turkish proverb says that “spoken words fly away, written words remain”.

What is object detection?

Object detection is a task in computer vision that involves identifying the presence, location and type of one or more objects in a given image. Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks: it localizes and classifies one or more objects in an image.

Famous Object Detection APIs

Deep learning is a great method for solving object detection tasks. In this section, I will briefly describe the well-known deep learning based object detection models.

R-CNN Model Family

The R-CNN family of methods is named after R-CNN, which may stand for “Regions with CNN Features” or “Region-Based Convolutional Neural Network”, developed by Ross Girshick et al.

  • R-CNN : Bounding boxes are proposed by the “selective search” algorithm. Each proposed region is stretched to a fixed size and its features are extracted by a deep convolutional neural network, such as AlexNet, before a final set of object classifications is made with linear SVMs.
  • Fast R-CNN : Simplified design with a single model. Bounding boxes are still specified as input, but a region-of-interest pooling layer is used after the deep CNN to consolidate regions, and the model predicts both class labels and regions of interest directly.
  • Faster R-CNN : Adds a Region Proposal Network that interprets features extracted from the deep CNN and learns to propose regions of interest directly.
  • Mask R-CNN : Extension of Faster R-CNN that adds an output model for predicting a mask for each detected object.

SSD (Single Shot Detector)

SSD is designed for object detection in real time. Faster R-CNN uses a region proposal network to create bounding boxes and uses those boxes to classify objects. While it is considered state-of-the-art in accuracy, the whole process runs at 7 frames per second, far below what real-time processing needs. SSD speeds up the process by eliminating the need for the region proposal network. To recover the drop in accuracy, SSD applies a few improvements, including multi-scale features and default boxes. These improvements allow SSD to match Faster R-CNN’s accuracy using lower-resolution images, which pushes the speed even higher.

YOLO (You Only Look Once) Model Family

The YOLO family of models is a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon et al. and first described in the 2015 paper titled “You Only Look Once: Unified, Real-Time Object Detection.”

The approach involves a single deep convolutional neural network (originally GoogLeNet, later updated and called DarkNet based on VGG) that splits the input into a grid of cells and each cell directly predicts a bounding box and object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.
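The post-processing step mentioned above is usually non-maximum suppression (NMS). The following is only a minimal pure-Python sketch of the idea, not Darknet's implementation. The box format (x_min, y_min, x_max, y_max), the function names and the 0.5 threshold are assumptions made for this example, and real detectors normally run this step separately for each class.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two almost identical candidates and one separate candidate (made-up numbers).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first one almost completely, so only the higher-scoring of the two is kept.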

There are three main variations of the approach; they are YOLOv1, YOLOv2 and YOLOv3. The first version proposed the general architecture, whereas the second version refined the design and made use of predefined anchor boxes to improve bounding box proposal, and version three further refined the model architecture and training process.

Although the accuracy of these models is close to, but not as good as, that of Region-Based Convolutional Neural Networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real time on video or camera feed input.

How to train YOLOv3 with your own data set

Before starting training, you must install and compile Darknet, an open source neural network library written in C.

Installing Darknet

Darknet is easy to install with only two optional dependencies:

  • OpenCV : if you want a wider variety of supported image types
  • CUDA : if you want GPU computation

First clone the Darknet git repository.

git clone
cd darknet

Compiling With CUDA

Darknet on the CPU is fast but it’s like 500 times faster on GPU! You’ll have to have an Nvidia GPU and you’ll have to install CUDA. I won’t cover CUDA installation in this article.

Once you have CUDA installed, change the first line of the Makefile in the base directory:


Compiling With OpenCV

By default, Darknet uses stb_image.h for image loading. If you want more support for weird formats (like CMYK jpegs) you can use OpenCV instead! OpenCV also allows you to view images and detections without having to save them to disk.

First install OpenCV and change the second line of the Makefile.
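The two Makefile edits described above can also be made by flag name instead of by line number. This is a small sketch, assuming your Makefile starts with the switches GPU, CUDNN, OPENCV, OPENMP and DEBUG as the upstream Darknet Makefile does. The helper name set_makefile_flags and the sample text are made up for this example.

```python
import re


def set_makefile_flags(makefile_text, **settings):
    """Return makefile_text with lines such as GPU=0 rewritten to GPU=1."""
    for name, value in settings.items():
        pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
        makefile_text, count = pattern.subn(f"{name}={value}", makefile_text, count=1)
        if count == 0:
            # Fail loudly if this copy of the Makefile has no such switch.
            raise KeyError(f"{name} not found in Makefile")
    return makefile_text


# Sample text that imitates the top of the Makefile (an assumption, see above).
sample = "GPU=0\nCUDNN=0\nOPENCV=0\nOPENMP=0\nDEBUG=0\n"
patched = set_makefile_flags(sample, GPU=1, OPENCV=1)
print(patched)
```

To use it on the real file, read the Makefile into a string, pass it through the function, write it back and then rebuild Darknet with make.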


Preparing your data set

Before starting to train, you must prepare your data for object detection. To prepare your data, you can use the LabelImg tool. This tool can save your annotations in two different formats: xml and txt. YOLOv3 requires the txt format. If you prepared your data in xml format, don’t worry. You can easily convert it to txt format with the following Python code.

import xml.etree.ElementTree as ET
import argparse
import os

# write the class names used in your xml files
classes = ["Giriş olmayan yol", "Dur", "Durak", "Park etmek yasaktır", "Azami hız sınırlaması (30 km/saat)",
           "Sağa dönülmez", "Sola dönülmez", "Park yeri", "İleri ve sağa mecburi yön", "Azami hız sınırlaması (20 km/saat)",
           "Taşıt trafiğine kapalı yol", "İlerden sola mecburi yön", "İleriden sağa mecburi yön", "İleri ve sola mecburi yön", "Hız sınırlaması sonu (20 km/saat)"]

def convert(size, box):
    # size = (width, height) of the image
    # box = (xmin, xmax, ymin, ymax) in pixels
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    # normalize center and size to the range 0..1
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def main(args):

    dataset_path = args.dataset_path

    annot_path = os.path.join(dataset_path, "annots")

    filenames = os.listdir(annot_path)

    if not os.path.exists(os.path.join(dataset_path, "labels")):
        os.makedirs(os.path.join(dataset_path, "labels"))

    for filename in filenames:
        img_id = filename[:-4]
        file_path = os.path.join(annot_path, filename)
        # parse the xml annotation of one image
        tree = ET.parse(file_path)
        root = tree.getroot()
        size = root.find('size')
        w = int(size.find('width').text)
        h = int(size.find('height').text)

        with open(os.path.join(dataset_path, "labels", img_id + ".txt"), "w") as out_file:
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text

                # skip unknown classes and objects marked as difficult
                if cls not in classes or int(difficult) == 1:
                    continue
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                bb = convert((w,h), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

if __name__ == '__main__':

    argparser = argparse.ArgumentParser()

    argparser.add_argument(
        '-d', '--dataset_path',
        help = 'The path of dataset',
        required = True)

    args = argparser.parse_args()
    main(args)
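To make the conversion concrete, here is a small worked example of the same arithmetic on one bounding box. It is only an illustration: the function name voc_to_yolo, the image size, the box coordinates and the class id 1 are made up for this example.

```python
def voc_to_yolo(image_size, box):
    """image_size = (width, height); box = (xmin, xmax, ymin, ymax) in pixels."""
    width, height = image_size
    xmin, xmax, ymin, ymax = box
    x_center = (xmin + xmax) / 2.0 / width
    y_center = (ymin + ymax) / 2.0 / height
    box_width = (xmax - xmin) / width
    box_height = (ymax - ymin) / height
    return (x_center, y_center, box_width, box_height)


# A 200x240 pixel box inside a 640x480 image.
label = voc_to_yolo((640, 480), (100, 300, 120, 360))
line = "1 " + " ".join(str(v) for v in label)
print(line)  # 1 0.3125 0.5 0.3125 0.5
```

Each line of a label file has this layout: the class id, then the box center (x, y) and the box size (width, height), all divided by the image size so that they fall between 0 and 1.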


Splitting your data into train and test sets

You must create two files named train.txt and test.txt. These files must contain the paths of the train and test images in your data set, one per line. If you want to split your data, you can use the following scikit-learn function in Python.

from sklearn.model_selection import train_test_split
# image_paths: a list with the path of every image in your data set
train_paths, test_paths = train_test_split(image_paths, test_size=0.33, random_state=42)
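The split still has to be written out as train.txt and test.txt. Below is a minimal sketch that does the whole job with only the Python standard library instead of scikit-learn. The helper names split_paths and write_list and the example image paths are hypothetical, so adjust them to your own folder layout.

```python
import random


def split_paths(image_paths, test_fraction=0.33, seed=42):
    """Shuffle a copy of image_paths and return (train_paths, test_paths)."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)  # a fixed seed keeps the split repeatable
    n_test = int(round(len(paths) * test_fraction))
    return paths[n_test:], paths[:n_test]


def write_list(file_path, paths):
    """Write one image path per line."""
    with open(file_path, "w") as f:
        f.write("\n".join(paths) + "\n")


# Hypothetical image paths, only for demonstration.
image_paths = ["data/images/%d.png" % i for i in range(100)]
train_paths, test_paths = split_paths(image_paths)
print(len(train_paths), len(test_paths))  # 67 33
```

Calling write_list("train.txt", train_paths) and write_list("test.txt", test_paths) then produces the two files.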

Preparing YOLOv3 configuration files

YOLOv3 needs certain specific files to know how and what to train. You must create three files (.data, .names and .cfg). For the .cfg file, I will explain both yolov3.cfg and yolov3-tiny.cfg.

  • cfg/
  • cfg/traffic-sign.names

Let’s prepare the .data and .names files. Let’s start with the .data file: create it and fill it with the following content.

classes = 15
train = train.txt
valid = test.txt
names = traffic-sign.names
backup = backup/

  • classes : number of classes in your data set
  • train : train file path
  • valid : test file path
  • names : class names file
  • backup : where you want to store the YOLO weights files

The traffic-sign.names file looks as follows. Every category must be on its own line, and its position in the file (counting from zero) must match the class number in the .txt label files we created earlier.

Giriş olmayan yol
Dur
Durak
Park etmek yasaktır
Azami hız sınırlaması (30 km/saat)
Sağa dönülmez
Sola dönülmez
Park yeri
İleri ve sağa mecburi yön
Azami hız sınırlaması (20 km/saat)
Taşıt trafiğine kapalı yol
İlerden sola mecburi yön
İleriden sağa mecburi yön
İleri ve sola mecburi yön
Hız sınırlaması sonu (20 km/saat)

Caution! : If you converted your data set from xml format to txt format using the Python code above, the order of the class names in the .names file must be the same as in the class list in the Python code.

Now we will create the .cfg file to choose the YOLO architecture. If you have a modest GPU (less than 2 GB of memory), you can use yolov3-tiny.cfg. If you have a good GPU (more than 4 GB of memory), use yolov3.cfg.
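One way to avoid an ordering mistake is to generate both files from the same Python class list. This is only a sketch: the function names are made up, the default file names are taken from the .data example above, and the three-item class list is shortened for the demonstration.

```python
def names_file_text(classes):
    """One class name per line; the first line belongs to class id 0."""
    return "\n".join(classes) + "\n"


def data_file_text(classes, train="train.txt", valid="test.txt",
                   names="traffic-sign.names", backup="backup/"):
    """Build the content of the .data file for the given class list."""
    return (
        "classes = %d\n" % len(classes)
        + "train = %s\n" % train
        + "valid = %s\n" % valid
        + "names = %s\n" % names
        + "backup = %s\n" % backup
    )


# Shortened class list, for demonstration only.
classes = ["Dur", "Durak", "Park yeri"]
print(names_file_text(classes))
print(data_file_text(classes))
```

When you write these strings to disk, open the files with encoding="utf-8" so that the Turkish characters in the class names are kept.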

Step 1 (If you choose yolov3.cfg)

Copy yolov3.cfg, save the copy as yolov3-traffic-sign.cfg and make the following edits:

  • Line 3: set batch=24. This means we will be using 24 images for every training step.
  • Line 4: set subdivisions=8. The batch will be divided by 8 to decrease GPU VRAM requirements, so 3 images are processed at a time.
  • Line 603: set filters=(classes + 5)*3, in our case filters=60
  • Line 610: set classes=15, the number of categories we want to detect
  • Line 689: set filters=(classes + 5)*3, in our case filters=60
  • Line 696: set classes=15, the number of categories we want to detect
  • Line 776: set filters=(classes + 5)*3, in our case filters=60
  • Line 783: set classes=15, the number of categories we want to detect

Step 2 (If you choose yolov3-tiny.cfg)

Copy yolov3-tiny.cfg, save the copy as yolov3-tiny-traffic-sign.cfg and make the following edits:

  • Line 3: set batch=24. This means we will be using 24 images for every training step.
  • Line 4: set subdivisions=8. The batch will be divided by 8 to decrease GPU VRAM requirements, so 3 images are processed at a time.
  • Line 127: set filters=(classes + 5)*3, in our case filters=60
  • Line 135: set classes=15, the number of categories we want to detect
  • Line 171: set filters=(classes + 5)*3, in our case filters=60
  • Line 177: set classes=15, the number of categories we want to detect
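Line numbers can differ between versions of the .cfg files, so it may be safer to find the lines by section. The sketch below assumes that every [yolo] section is preceded by a [convolutional] section whose filters value must be (classes + 5)*3, which is what the edits above do by hand. The function name patch_cfg and the short sample text are made up for this example and are not a complete network configuration.

```python
def patch_cfg(cfg_text, num_classes):
    """Set classes in every [yolo] section and filters in the layer before it."""
    filters = (num_classes + 5) * 3  # 15 classes give (15 + 5) * 3 = 60
    lines = cfg_text.splitlines()
    section = None
    last_conv_filters = None  # index of the filters line in the latest [convolutional]
    for i, line in enumerate(lines):
        stripped = line.strip()
        compact = stripped.replace(" ", "")
        if stripped.startswith("["):
            section = stripped
            if section == "[yolo]" and last_conv_filters is not None:
                lines[last_conv_filters] = "filters=%d" % filters
        elif section == "[convolutional]" and compact.startswith("filters="):
            last_conv_filters = i
        elif section == "[yolo]" and compact.startswith("classes="):
            lines[i] = "classes=%d" % num_classes
    return "\n".join(lines) + "\n"


# A tiny made-up excerpt in the style of the .cfg files.
sample = """[convolutional]
filters=256
activation=leaky

[convolutional]
size=1
filters=255
activation=linear

[yolo]
mask = 0,1,2
classes=80
num=9
"""
patched = patch_cfg(sample, 15)
print(patched)
```

Only the convolutional layer directly before the [yolo] section is changed. The earlier layer keeps filters=256. The batch and subdivisions values in the [net] section still have to be set as described above.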

OK, if you have successfully completed all the steps, you can start training.

Before you start training, you must download the pre-trained weights (darknet53.conv.74 in the command below) from here.

Enter the following command into your terminal to start training.

./darknet detector train cfg/ cfg/yolov3-traffic-sign.cfg darknet53.conv.74

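After training, you can easily test your model by typing the following in your terminal. To test your own model, replace yolov3.weights with the weights file that training saved in your backup directory.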

./darknet detector test cfg/ cfg/yolov3-traffic-sign.cfg yolov3.weights data/1.png

If you want to use YOLOv3 in your Python code, you can use the file in the darknet/python folder. This code is written for Python 2. If you want to use it with Python 3, you must make two minor changes:

  • add parentheses to all print calls
  • put the letter b at the beginning of all strings given as parameters to functions. For example:

net = load_net(b"cfg/tiny-yolo.cfg", b"tiny-yolo.weights", 0)
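The letter b is needed because, as far as I can tell, the wrapper passes these strings to the C library through ctypes, and in Python 3 a C string argument must be bytes, not str. The small sketch below shows this with ctypes alone, without Darknet. The helper name to_c_string is made up for this example.

```python
import ctypes


def to_c_string(text):
    """Encode a Python 3 str to bytes, which is what ctypes.c_char_p accepts."""
    return text if isinstance(text, bytes) else text.encode("utf-8")


# bytes are accepted
cfg_arg = ctypes.c_char_p(to_c_string("cfg/tiny-yolo.cfg"))
print(cfg_arg.value)

# a plain str is rejected in Python 3
try:
    ctypes.c_char_p("cfg/tiny-yolo.cfg")
    accepted_str = True
except TypeError:
    accepted_str = False
print(accepted_str)  # False
```

A helper like this can be handy when the paths come from variables and you cannot simply type b in front of a string literal.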

