Published in


YOLOv3 : A Machine Learning Model to Detect the Position and Type of an Object

This is an introduction to「YOLOv3」, a machine learning model that can be used with ailia SDK. You can easily use this model to create AI applications using ailia SDK as well as many other ready-to-use ailia MODELS.


YOLOv3 is an deep learning model for detecting the position and the type of an object from the input image. It can classify objects in one of the 80 categories available (eg. car, person, motorbike…), and compute bounding boxes for those objects from a single input image.

Below is a sample video of YOLOv3 recognition. It is able to detect cars, trucks, people, handbacks, and more.

Example applications of this model include counting people entering a store, monitoring occupancy ratio in a restaurant or road traffic, detecting abandoned bicycles, and detecting access to dangerous areas.

YOLOv3 was release in April 2018.

Recognizable categories

YOLOv3 has been trained on the MS-COCO dataset and can identify the following 80 categories

coco_category=[ “person”, “bicycle”, “car”, “motorcycle”, “airplane”, “bus”, “train”, “truck”, “boat”, “traffic light”, “fire hydrant”, “stop sign”, “parking meter”, “bench”, “bird”, “cat”, “dog”, “horse”, “sheep”, “cow”, “elephant”, “bear”, “zebra”, “giraffe”, “backpack”, “umbrella”, “handbag”, “tie”, “suitcase”, “frisbee”, “skis”, “snowboard”, “sports ball”, “kite”, “baseball bat”, “baseball glove”, “skateboard”, “surfboard”, “tennis racket”, “bottle”, “wine glass”, “cup”, “fork”, “knife”, “spoon”, “bowl”, “banana”, “apple”, “sandwich”, “orange”, “broccoli”, “carrot”, “hot dog”, “pizza”, “donut”, “cake”, “chair”, “couch”, “potted plant”, “bed”, “dining table”, “toilet”, “tv”, “laptop”, “mouse”, “remote”, “keyboard”, “cell phone”, “microwave”, “oven”, “toaster”, “sink”, “refrigerator”, “book”, “clock”, “vase”, “scissors”, “teddy bear”, “hair drier”, “toothbrush”]

YOLOv3 “Tiny” model

There are two types of YOLOv3 models: the standard model, which has a high recognition accuracy, and the tiny model, which has a slightly lower recognition accuracy, but runs faster.

The mAP (accuracy) of the standard model YOLOv3–416 is 55.3 and the mAP of the tiny model is 33.1. The FLOPS (computational power) are 65.86 Bn and 5.56 Bn, respectively.

Using YOLOv3 with ailia SDK

The ailia SDK supports both models with the Detector API since version 1.2.1. ailia SDK makes it possible to use YOLOv3 from Python and Unity on Windows, Mac, iOS, Android and Linux.

This is a sample of running YOLOv3 using ailia SDK and Python.

This is a sample of running YOLOv3 using ailia SDK and Unity.

In case you are using YOLOv3 with the ailia SDK in Python, you need to first load the model using the ailia.Detector API, feed in an image using the compute API, then simply get the count of detected objects using the get_object_count API, and the bounding box and categories using the get_object API.

When using YOLOv3 from Unity, use the AiliaDetectorModel class to load the model, then use the ComputeFromImage API to feed it an image, and finally get a list of detected objects, bounding boxes and categories.

The Unity Package of the ailia SDK includes scenes that use YOLOv3 and can be used out-of-the-box on Windows, Mac, Linux, iOS and Android.

Training the model on your own dataset

Now we will discuss how to run YOLOv3 trained on our own dataset using the ailia SDK.

YOLOv3 was developed using Darknet.

A Keras implementation is available in the following repositories and can be used to convert Darknet models into a form that can be used in the ailia SDK or re-trained on your own dataset.

Models trained with Darknet can be converted to hdf5 files using keras-yolo3 with the following command. hdf5 to ONNX can be found in the next section.

python yolov3.cfg yolov3.weights model_data/yolo.h5
python [OPTIONS…] — image, for image detection mode, OR
python [video_path] [output_path (optional)]

To retrain the model on your own dataset, use the script from the keras-yolo3 repository. The format of the annotated data needed to train keras-yolo3 is: one line per image, each line containing the path to the image file, the coordinates of the bounding box (x1,y1)-(x2,y2), followed by the category index. If there are multiple bounding boxes per image, separated each one of them with a space. The coordinates will be in pixel coordinates.

image_path x1,y1,x2,y2,category x1,y1,x2,y2,category

Next, create a text file containing the categories. Categories are listed in order starting at index 0. For example if one category is for face of people, you can simply list the category as below.


With those files ready you can now start the training.


The paths of those input files being hardcoded in the script, make sure to update them before running the script.

annotation_path = ‘train.txt’ # update here
log_dir = ‘logs/000/’
classes_path = ‘model_data/voc_classes.txt’ # update here
anchors_path = ‘model_data/yolo_anchors.txt’

Next you can create an hdf5 file.

Conversion from hdf5 to ONNX

To be used with ailia SDK, you need to convert hdf5 files to ONNX, using the following code from keras2onnx.

See the following repository for examples of face recognition using YOLOv3 with FDDB.

Using the repository above, you can convert in ONNX with the following command. Please use Keras 2.2.4, Tensorflow 1.13.2, and keras2onnx 1.5.1 for the conversion.

ax Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store