An Introduction to Implementing RetinaNet in Keras for Multi-Object Detection on a Custom Dataset

Abdul Wahab Amin
Jul 6, 2019

With advancements in deep learning, many new approaches to object detection have been introduced, and earlier detectors like the original YOLO (You Only Look Once) no longer represent the state of the art.

Object detection models like SNIPER, RetinaNet, and TridentNet have left Faster R-CNN and YOLO far behind.

A great post to get a basic understanding of how RetinaNet works can be found here.

Code

The repository you need to download is fizyr/keras-retinanet. I have tested this code on both Ubuntu and Windows.

For Windows users, you’ll have to install the Anaconda environment and will need the Microsoft Visual C++ Build Tools.

After downloading the repository, open the Anaconda Prompt on Windows (or a terminal on Ubuntu), navigate to the repository folder, and type:

$ pip install . --user

This line installs the keras-retinanet package locally on your computer.

NOTE: This installation step needs to be repeated whenever any change is made to the package’s source files.
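
One way to avoid repeated reinstalls is an editable install (a workflow suggestion, not from the original post; note that the repository also contains compiled Cython extensions, so changes to those particular files may still require a rebuild):

$ pip install -e . --user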

You’ll have access to the following commands on your terminal after installing the package:

retinanet-train
retinanet-evaluate
retinanet-debug
retinanet-convert-model

Annotation

Two CSV files are needed: the first contains the image path, bounding box, and class name for each annotation; the second maps each class name to an ID.

The CSV file with annotations should contain one annotation per line. Images with multiple bounding boxes should use one row per bounding box. Note that indexing for pixel values starts at 0. The expected format of each line is:

path/to/image.jpg,x1,y1,x2,y2,class_name

Some images may not contain any labeled objects. To add these images to the dataset as negative examples, add an annotation where x1, y1, x2, y2 and class_name are all empty:

path/to/image.jpg,,,,,

A full example would be:

/data/imgs/img_001.jpg,837,346,981,456,cow
/data/imgs/img_002.jpg,215,312,279,391,cat
/data/imgs/img_002.jpg,22,5,89,84,bird
/data/imgs/img_003.jpg,,,,,

The class name to ID mapping file should contain one mapping per line. Each line should use the following format:

class_name,id

Indexing for classes starts at 0. Do not include a background class as it is implicit.

For example:

cow,0
cat,1
bird,2

I used labelImg to do the annotations in Pascal VOC XML format first, and then converted them to txt using this script. Just rename the .txt file to .csv to get it into CSV format.
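
If you would rather write your own converter, a minimal sketch in Python might look like this (it assumes the standard Pascal VOC tags that labelImg produces; the directory names are placeholders):

import csv
import glob
import os
import xml.etree.ElementTree as ET

# Walk every VOC XML file and emit one CSV row per bounding box.
with open('annotations.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for xml_path in glob.glob('annotations_xml/*.xml'):
        root = ET.parse(xml_path).getroot()
        image_path = os.path.join('images', root.findtext('filename'))
        for obj in root.findall('object'):
            box = obj.find('bndbox')
            # VOC coordinates are conventionally 1-indexed; subtract 1
            # here if you need strict 0-indexing.
            writer.writerow([
                image_path,
                box.findtext('xmin'), box.findtext('ymin'),
                box.findtext('xmax'), box.findtext('ymax'),
                obj.findtext('name'),
            ])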

Debugging

Before you start training, you need to check whether your annotations are correct. You can do this by typing the following in the terminal:

$ retinanet-debug csv path/to/annotation.csv path/to/classes.csv

Annotations are colored green when anchors are available and red when no anchors are available. If an annotation doesn’t have any anchors available, it means it won’t contribute to training. It is normal for a small number of annotations to show up in red, but if most or all annotations are red, there is cause for concern. The most common issues are annotations that are too small or too oddly shaped (stretched out).

Training

After the arduous task of labeling and debugging your data, you can finally begin training. Before you start, you should know that you have several options available, including but not limited to:

  1. Backbone architecture (By default this is set to resnet50 but you can also use resnet101, resnet152. You can find all the options you have in keras-retinanet/models directory in the repository)
  2. Weights (By default it will download imagenet weights for training if not specified)
  3. Epochs (By default this is set to 50)
  4. Save directories for weights and TensorBoard logs (set to the home directory by default, but you can change this as well)

The corresponding command-line flags are given below:

--backbone
--weights
--epochs
--snapshot-path
--tensorboard-dir

For more details on how these commands work and what more options you have, you can refer to keras-retinanet/bin/train.py file in the repository.

A basic example of how you would use these with the retinanet-train command:

retinanet-train --backbone resnet101 --weights path/to/weights.h5 --epochs 100 --snapshot-path path/to/save/dir --tensorboard-dir path/to/tensorboard/dir csv path/to/annotations.csv path/to/classes.csv

That looks quite complicated, I guess…

For your first try, just leave everything at its defaults and use this:

retinanet-train csv path/to/annotations.csv path/to/classes.csv

You might want to change the save directories for weights and TensorBoard logs. For this, just use:

retinanet-train --snapshot-path path/to/save/dir --tensorboard-dir path/to/tensorboard/dir csv path/to/annotations.csv path/to/classes.csv

NOTE: There is no callback added for early stopping or a CSV logger. You need to add those yourself in the train.py file.
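
As a rough sketch, you could append the standard Keras callbacks where the other callbacks are created (the callbacks list and the create_callbacks() helper are assumed from the repository’s train.py; check your copy of the file, and treat the monitor and patience values as placeholders):

from keras.callbacks import CSVLogger, EarlyStopping

# Inside create_callbacks() in keras_retinanet/bin/train.py:
callbacks.append(CSVLogger('training_log.csv', append=True))
callbacks.append(EarlyStopping(monitor='loss', patience=5, verbose=1))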

This will first download the ImageNet weights for resnet50 and then start your training.

For better results, it is recommended that you use the weights pre-trained on the COCO dataset with the resnet50 backbone, which are available at this link.
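
After downloading that snapshot, you would point the --weights flag at it, for example (the file name is the one used later in the notebook; adjust the paths to your setup):

retinanet-train --weights path/to/resnet50_coco_best_v2.1.0.h5 csv path/to/annotations.csv path/to/classes.csv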

Inference

The training procedure of keras-retinanet works with training models. These are stripped-down versions of the inference model and contain only the outputs necessary for training (the raw regression and classification values). If you wish to run inference with a model (perform object detection on an image), you need to convert the trained model to an inference model. This is done as follows:

retinanet-convert-model /path/to/training/model.h5 /path/to/save/inference/model.h5
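
If you prefer to do the conversion in Python, the package exposes the same functionality (a minimal sketch; the paths are placeholders):

from keras_retinanet import models

# Load the training model, convert it to an inference model
# (this appends the detection layers), and save the result.
model = models.load_model('path/to/training/model.h5', backbone_name='resnet50')
model = models.convert_model(model)
model.save('path/to/save/inference/model.h5')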

After converting the model, navigate to the examples folder in the repository and open the notebook.

Replace model_path with the path to the converted model. If you haven’t used resnet50 as the backbone, be sure to change backbone_name as well.

import os
from keras_retinanet import models

model_path = os.path.join('..', 'snapshots', 'resnet50_coco_best_v2.1.0.h5')

# load retinanet model
model = models.load_model(model_path, backbone_name='resnet50')

There is also the labels_to_names variable, a dictionary that maps class IDs to class names.

labels_to_names = {0: 'person', 1: 'bat'}

Change it to match your own classes.

After this, all that’s left is to give the path to your test image in the Run detection on example section of the notebook.

(Source)
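
Condensed, the detection step in that section boils down to something like the following (a sketch using the package’s image utilities; the image path and the 0.5 score threshold are placeholders):

import numpy as np
from keras_retinanet.utils.image import preprocess_image, read_image_bgr, resize_image

# Load and preprocess the test image the same way the notebook does.
image = read_image_bgr('path/to/test_image.jpg')
image = preprocess_image(image)
image, scale = resize_image(image)

# Run detection; the inference model returns boxes, scores, and labels.
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map the boxes back to the original image's coordinates

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:  # detections come sorted by score, so we can stop early
        break
    print(labels_to_names[label], score, box)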

References

  1. Focal Loss for Dense Object Detection
  2. Keras-retinanet
