Cardboard Box Detection and Localization using RetinaNet (Keras)

Jul 3, 2019

Keras RetinaNet is a Keras implementation of the RetinaNet object detector described in the paper Focal Loss for Dense Object Detection by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár (https://arxiv.org/abs/1708.02002).

We are going to train a model with Keras RetinaNet to detect and localize a custom object, a cardboard box, in an image.

Step 1. Dataset Preparation

To detect cardboard boxes in an image, I prepared a set of cardboard box images. The dataset can be downloaded from:

https://jfids.jasaplus.com/repo/cardboard.tar.bz2

https://jfids.jasaplus.com/repo/cardboard2.tar.bz2

Keras RetinaNet resizes every input image before it reaches the input layer. With the default settings, the image is scaled so that its shorter side becomes 800 px, provided the longer side does not exceed 1333 px. For example, a 640×480 input is resized to 1067×800, and an 800×600 input is also resized to 1067×800.
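The resize arithmetic can be checked with a small sketch. This is not the library's own code: the function name resized_shape is hypothetical, and the defaults min_side=800 and max_side=1333 are my understanding of the Keras RetinaNet defaults, so treat them as assumptions.

```python
def resized_shape(width, height, min_side=800, max_side=1333):
    """Return (new_width, new_height, scale) for an aspect-preserving resize.

    Assumed rule: scale the shorter side to min_side, unless that would push
    the longer side past max_side, in which case scale the longer side to
    max_side instead.
    """
    smallest = min(width, height)
    largest = max(width, height)
    scale = min_side / smallest
    if largest * scale > max_side:
        scale = max_side / largest
    return round(width * scale), round(height * scale), scale


for w, h in [(640, 480), (800, 600)]:
    new_w, new_h, scale = resized_shape(w, h)
    print(f"{w}x{h} -> {new_w}x{new_h} (scale {scale:.3f})")
```

Both webcam-sized inputs have a 4:3 aspect ratio, which is why they end up at the same 1067×800 size.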

This dataset was created using GIMP and labelImg. It contains 640×480 images. We are going to train a new model on this dataset to detect objects in webcam images with 640×480 resolution.

GIMP was used to extract only the relevant part of each image with the lasso tool.

labelImg was used to label the specific areas within each image that contain our object. Do not leave too much margin around the object when labelling, since the quality of our detection depends on the quality of our dataset.

Step 2. Converting the Dataset from Pascal VOC Format into CSV Annotation Format for Keras RetinaNet

Next, we need to convert the dataset from Pascal VOC format to CSV. We will use conv_pascal_to_csv, a utility that can be cloned from GitHub:

https://github.com/akchan/conv_pascal_to_csv.git

$ ruby conv_pascal_to_csv.rb --help
Usage: conv_pascal_to_csv [options]
--annotation-path PATH path to xml annotations directory
--image-path PATH path to jpeg images directory
--val-ratio float sample ratio for validation (0.0-1.0). default=0.1
$ ruby conv_pascal_to_csv.rb --annotation-path /home/ringlayer/Desktop/convnet/keras-retinanet/datasets/cardboard2/voclabels --image-path /home/ringlayer/Desktop/convnet/keras-retinanet/datasets/cardboard2/images

Once the conversion has finished, we get 3 files in the csv directory:

$ ls csv
annotations.csv classes.csv val_annotations.csv
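To make the conversion less of a black box, here is a rough Python sketch of what turning one Pascal VOC annotation into Keras RetinaNet CSV rows (path,x1,y1,x2,y2,class_name) involves. It is not how conv_pascal_to_csv is implemented. The sample XML, the file name box_001.jpg, the coordinates, and the image directory are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical labelImg output for a single image (values are invented).
SAMPLE_XML = """<annotation>
  <filename>box_001.jpg</filename>
  <object>
    <name>cardboard</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>410</xmax><ymax>360</ymax>
    </bndbox>
  </object>
</annotation>"""


def voc_to_rows(xml_text, image_dir):
    """Return one [path, x1, y1, x2, y2, class_name] row per labelled object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(float(box.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([f"{image_dir}/{filename}", *coords, obj.findtext("name")])
    return rows


for row in voc_to_rows(SAMPLE_XML, "/data/cardboard2/images"):
    print(",".join(str(value) for value in row))
```

Each labelled box becomes one line in annotations.csv, so an image with several boxes appears on several lines.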

I replaced the image paths in annotations.csv and val_annotations.csv with the absolute paths of my dataset location.
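The path replacement can also be scripted instead of done by hand. The sketch below assumes that the first column of each row is the image path and that all images live in one directory. The helper names and the example directory are hypothetical.

```python
import csv
import os


def absolutize_rows(rows, image_dir):
    """Point the first column of every row at image_dir, keeping the file name."""
    fixed = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        path = os.path.join(image_dir, os.path.basename(row[0]))
        fixed.append([path] + list(row[1:]))
    return fixed


def rewrite_annotations(csv_path, image_dir):
    """Rewrite an annotation CSV in place with absolute image paths."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(absolutize_rows(rows, image_dir))


example = [["images/box_001.jpg", "120", "80", "410", "360", "cardboard"]]
print(absolutize_rows(example, "/data/cardboard2/images"))
```

Calling rewrite_annotations once for annotations.csv and once for val_annotations.csv would cover both files. Keep a backup copy first, since the files are overwritten.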

Step 3. Start Training New Model

To train a new model, we can use transfer learning, starting from previously trained weights. Here, I use ResNet50 as the backbone:

python3 keras_retinanet/bin/train.py --freeze-backbone --random-transform --weights weights/ResNet-50-model.keras.h5 --batch-size 8 --steps 500 --epochs 15 csv csv/annotations.csv csv/classes.csv

These parameters are needed because the network is not fed all training images at once. Training is instead split into epochs, each epoch consists of 500 steps, and each step processes one batch of images.

--batch-size 8

We use a batch size of 8. The batch size is the number of images propagated through the network in a single training step.

--epochs 15

In general, one epoch means the entire dataset is passed forward and backward through the neural network once. Here, because the number of steps is set explicitly, each epoch ends after 500 batches, so 15 epochs amount to 15 × 500 = 7,500 training iterations over the images listed in annotations.csv.

--steps 500

We use 500 steps on each epoch. This is the number of batch iterations before a training epoch is considered finished.
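How batch size, steps, and epochs combine is simple arithmetic, sketched below. The function name training_budget is hypothetical, and the dataset size of 1000 images in the second call is a made-up figure for illustration only, not the size of the cardboard dataset.

```python
def training_budget(batch_size, steps_per_epoch, epochs, dataset_size=None):
    """Summarize how many batches and image samples a training run processes."""
    images_per_epoch = batch_size * steps_per_epoch
    summary = {
        "iterations": steps_per_epoch * epochs,
        "images_per_epoch": images_per_epoch,
        "total_images": images_per_epoch * epochs,
    }
    if dataset_size:
        # How many times, on average, each image is seen during training.
        summary["passes_over_dataset"] = summary["total_images"] / dataset_size
    return summary


print(training_budget(batch_size=8, steps_per_epoch=500, epochs=15))
print(training_budget(8, 500, 15, dataset_size=1000))  # hypothetical dataset size
```

With the settings used here, each epoch draws 8 × 500 = 4,000 image samples, and the whole run draws 60,000.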

I trained the model on a GTX 1080 with 8 GB of VRAM. This configuration uses about 7 GB of VRAM.

For further fine-tuning of the training, a GPU with 11 GB of VRAM gives more headroom.

The weights trained in each epoch are saved to the snapshots directory. We will use resnet50_csv_15.h5 as the final model for detecting cardboard boxes (I renamed resnet50_csv_15.h5 to cb20.h5).
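Picking the last snapshot can be automated with a few lines of Python. This sketch assumes every snapshot follows the resnet50_csv_NN.h5 naming seen above, where NN is the epoch number. The helper name latest_snapshot and the example file list are hypothetical.

```python
import re

# Assumed naming scheme, inferred from the file name resnet50_csv_15.h5.
SNAPSHOT_PATTERN = re.compile(r"^resnet50_csv_(\d+)\.h5$")


def latest_snapshot(filenames):
    """Return the snapshot file name with the highest epoch number, or None."""
    best = None
    for name in filenames:
        match = SNAPSHOT_PATTERN.match(name)
        if match:
            epoch = int(match.group(1))
            if best is None or epoch > best[0]:
                best = (epoch, name)
    return best[1] if best else None


files = ["resnet50_csv_01.h5", "resnet50_csv_15.h5", "resnet50_csv_09.h5", "notes.txt"]
print(latest_snapshot(files))
```

In practice the list would come from os.listdir("snapshots"), and the chosen file could then be copied to cb20.h5 with shutil.copy.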

The source code for testing this model on webcam and video input can be downloaded from:

https://jfids.jasaplus.com/repo/keras-retinanet-custom-object-detection.tar.bz2

The model can then be tested by detecting cardboard boxes in video and images.

Written by Antonius Freenergi