Reproducing training performance of YOLOv3 in PyTorch (Part 0)

Hiroto Honda
3 min read · Dec 20, 2018


Part 0 : Introduction

Hi, I’m Hiroto Honda, an AI R&D engineer at DeNA Co., Ltd. Japan.

On Dec. 6th, DeNA open-sourced a PyTorch implementation of the YOLOv3 object detector. Our implementation reproduces the training performance of the original, which has been far more difficult than reproducing the test phase.

Why is it a big deal?

When you wish to train a state-of-the-art detector, you need a training system that brings out the detector's full performance. If an implementation trains to an accuracy even a few percent below what the paper reports, it is no longer state-of-the-art. Although many repositories reproduce the inference of object detectors, it is hard to find one that reproduces training.

Why PyTorch?

Re-implementing YOLO (originally written in C) in PyTorch is worthwhile because the framework offers both flexibility and performance. PyTorch is rapidly gaining popularity, and the recently released version 1.0 enables a seamless move from research to production.

In this article, I would like to share what I know about YOLOv3, especially how to train the detector with reproduced accuracy.

YOLO object detector

YOLO (You Only Look Once) is one of the state-of-the-art detectors capable of localizing and classifying multiple objects in an image (Fig. 1). What makes YOLO popular is its speed compared with two-stage detectors such as Faster R-CNN: while two-stage detectors first propose object regions and then examine those regions for localization and classification, YOLO combines the two stages into a single neural network. This simple architecture makes real-time detection possible. The details of the network will be shown in Part 1.

Fig. 1 Object detection result using our PyTorch_YOLOv3.
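To make the "one network, one pass" idea concrete, here is a minimal sketch of how a single-stage detector maps an image to a dense grid of predictions in one forward pass. The tiny two-layer backbone is a placeholder of my own, not the actual YOLOv3 network (which is covered in Part 1); only the channel arithmetic reflects YOLOv3's output layout.

```python
import torch

# Hypothetical stand-in backbone, NOT the real YOLOv3 network:
# any conv net that maps an image to a prediction grid in one pass.
num_classes = 80                             # COCO classes
num_anchors = 3                              # anchors per grid cell per scale
channels = num_anchors * (5 + num_classes)   # 5 = (x, y, w, h, objectness)

backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, 3, stride=2, padding=1),  # downsample to a grid
    torch.nn.LeakyReLU(0.1),
    torch.nn.Conv2d(32, channels, 1),                # per-cell predictions
)

img = torch.randn(1, 3, 416, 416)   # one 416x416 RGB image
pred = backbone(img)                # single forward pass
print(tuple(pred.shape))            # (1, 255, 208, 208)
```

Every grid cell predicts boxes, objectness, and class scores at once; there is no separate region-proposal stage to run.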

Training the detector is more complex than testing it: we have to optimize multiple tasks simultaneously, namely object localization, classification, and 'objectness' (confidence). If the loss functions for these tasks are not applied correctly, the accuracy of the detector drops significantly below the reported average precision.

I analyzed the official implementation and figured out what must be satisfied for quantitative reproduction. Our PyTorch re-implementation achieves it: the COCO validation accuracy during training is comparable to that of the original YOLOv3 implementation (darknet), as shown in Fig. 2.

Fig. 2 Comparison of validation average precision during training between darknet and PyTorch_YOLOv3.

In the following parts, I would like to share the secrets for training YOLOv3:

Part 1: Network Architecture and channel elements of YOLO layers

Part 2: How to assign ground-truth targets

Part 3: What are the actual loss functions?

(coming soon!)

Check out our PyTorch implementation of YOLOv3!!

Author’s project page / Original Implementation / Paper

Thank you for reading, see you again in Part 1.
