Object Detection for JellyFish using small dataset and RetinaNet

yhoztak
Aug 31, 2018


Object Detection

CNNs work great for image recognition, and there are many different object detection architectures such as YOLO, Faster R-CNN, and RetinaNet. There are interesting applications, such as detecting objects in satellite imagery, so I decided to try one out.

There are a lot of tutorials for dog/cat image classifiers, including on Kaggle, and there are very cool visualization frameworks that show how a CNN works in depth, like DeepVis. But what I thought was missing is a simple tutorial on using these frameworks with your own custom data, so here it goes.

Why RetinaNet?

Simply put, I read a performance benchmark on this blog, and RetinaNet performed the highest in accuracy. In my use case, speed doesn’t matter too much but accuracy does, so I picked it. It’s a CNN architecture that uses a Feature Pyramid Network with a ResNet backbone, as shown in its white paper.

Setting it up

After going through the keras-retinanet example, I see that it works quite well with the pre-trained model on the COCO dataset. With the COCO explorer, you can see which objects this model supports. Any of the labels listed there already works quite well. You’ll notice that there’s Motorcycle, but there’s no Jellyfish.

As an example, if I try to detect a motorcycle, it shows

both person and motorcycle are detected

but if I try to detect a jellyfish, it shows…

nothing…
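
For reference, running detection follows the keras-retinanet example notebook pretty closely. Here is a minimal sketch, assuming the released COCO model was downloaded locally as resnet50_coco_best_v2.1.0.h5 and using a hypothetical test image name:

import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

# Assumption: the pre-trained COCO model released by fizyr/keras-retinanet,
# downloaded as resnet50_coco_best_v2.1.0.h5.
model = models.load_model('resnet50_coco_best_v2.1.0.h5', backbone_name='resnet50')

image = read_image_bgr('motorcycle.jpg')  # hypothetical test image
image = preprocess_image(image)
image, scale = resize_image(image)

boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map boxes back to the original image's coordinates

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:  # detections are sorted by score, so stop at the threshold
        break
    print(label, score, box)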

Why jellyfish? They’re such mystical creatures, each one looking different from the next. Some of them are transparent, some of them light up, and they all look interesting.

Data preparation

Open Images Dataset label hierarchy

The Open Images Dataset is maintained by Google and already has a huge collection of images with annotations (as of today, it says 15,440,132 boxes on 600 categories and 30,113,078 image-level labels on 19,794 categories), and it does have Jellyfish!

Now the second problem: it’s too much data if I download the whole dataset (~6 TB). I don’t have that much space on my laptop, and I also want something small and simple, like just detecting jellyfish.

The metadata, about 1 GB from here, is enough to find the different labels, etc.

I found out that someone wrote a nice script, and you can query a specific list of Open Images using BigQuery. My query looks like this and can also be found here. The class information can be found here, and /m/0d8zb is the label for Jellyfish.

#standardsql
SELECT
  i.image_id AS image_id,
  original_url,
  thumbnail_300k_url,
  confidence
FROM
  `bigquery-public-data.open_images.labels` l
INNER JOIN
  `bigquery-public-data.open_images.images` i
ON
  l.image_id = i.image_id
WHERE
  label_name = '/m/0d8zb'
  AND confidence >= 0.8
  AND Subset = 'train'
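
The query returns URLs rather than images, so the images need downloading before anything else. A minimal sketch, where the export file name jellyfish_images.csv and the target directory are my own assumptions:

import csv
import os
import urllib.request

IMAGE_DIR = '/home/ubuntu/object_detection/images'  # hypothetical location
os.makedirs(IMAGE_DIR, exist_ok=True)

with open('jellyfish_images.csv') as f:  # BigQuery result exported as CSV
    for row in csv.DictReader(f):
        path = os.path.join(IMAGE_DIR, row['image_id'] + '.jpg')
        if os.path.exists(path):
            continue
        try:
            # thumbnail_300k_url is smaller and faster to fetch than original_url
            urllib.request.urlretrieve(row['thumbnail_300k_url'], path)
        except Exception as e:
            print('skipping', row['image_id'], e)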

Now it’s time to preprocess the data so that I can train with keras-retinanet as a start.

https://github.com/fizyr/keras-retinanet#usage shows how to train using a custom dataset. The --weights option can be used to build on top of an existing model; by default, training starts from pretrained ImageNet weights with ResNet50 as the backbone.

classes.csv is pretty simple since I have only one class to detect (note that keras-retinanet class indices start at 0):

JellyFish,0

Now the annotations.csv.

keras-retinanet expects absolute pixel positions for bounding boxes in custom data, whereas Open Images uses ratios, probably due to the variety of image sizes, so we need to convert them. annotations.csv needs a bit of work:

I wrote a script (to be linked) along these lines, sketched below:

  • based on the BigQuery result, create a lookup table of downloaded image IDs
  • from annotations-bbox.csv, keep only the rows with the Jellyfish label and, based on the image ID, construct the full file path
  • create a CSV with each row like path/to/image.jpg,x1,y1,x2,y2,class_name
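
A minimal sketch of that conversion, assuming the standard Open Images annotations-bbox.csv columns (ImageID, LabelName, XMin, XMax, YMin, YMax) and the same hypothetical file locations as above; Pillow reads each image’s size to turn the ratio coordinates into pixels:

import csv
from PIL import Image  # assumption: Pillow is available to read image sizes

BIGQUERY_CSV = 'jellyfish_images.csv'    # hypothetical BigQuery export
BBOX_CSV = 'train-annotations-bbox.csv'  # Open Images box annotations
IMAGE_DIR = '/home/ubuntu/object_detection/images'
JELLYFISH_LABEL = '/m/0d8zb'

# 1. Lookup table of the image IDs we downloaded via BigQuery.
with open(BIGQUERY_CSV) as f:
    image_ids = {row['image_id'] for row in csv.DictReader(f)}

# 2. Keep jellyfish boxes on those images and convert the Open Images
#    ratio coordinates into absolute pixels for keras-retinanet.
with open(BBOX_CSV) as f_in, open('annotations.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    for row in csv.DictReader(f_in):
        if row['LabelName'] != JELLYFISH_LABEL or row['ImageID'] not in image_ids:
            continue
        path = f"{IMAGE_DIR}/{row['ImageID']}.jpg"
        width, height = Image.open(path).size
        x1 = int(float(row['XMin']) * width)
        x2 = int(float(row['XMax']) * width)
        y1 = int(float(row['YMin']) * height)
        y2 = int(float(row['YMax']) * height)
        writer.writerow([path, x1, y1, x2, y2, 'JellyFish'])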

Training a model

Now I run training from the keras-retinanet folder like this:

keras_retinanet/bin/train.py csv ~/object_detection/keras_format/annotations.csv ~/object_detection/keras_format/classes.csv

As you can see in train.py here, the default is 50 epochs with 10,000 steps each.

Since my dataset is so small (1,500 lines), I can probably get away with far fewer steps and epochs.

nohup keras_retinanet/bin/train.py --tensorboard-dir=/home/ubuntu/projects/keras-retinanet/tensorboard_logs --epochs=10 --steps=1500 csv /tmp/annotations_on_ubuntu.csv /tmp/classes.csv > train_gpu.log &

Using a GPU, it finishes in only 4 hours. With a CPU, the ETA showed up as roughly 45 times longer…

From TensorBoard, both the classification loss and the total loss look good. I ran it a few times with different steps and epochs to see what happens. It was obvious that using more steps shows earlier improvement in both metrics. I could probably keep training until the classification loss stops dropping, but for this quick experiment, a total loss of 0.08 looks good enough.

By the way, what is a loss function? I found a good explanation here:

Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the model’s prediction was on a single example. If the model’s prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples.
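
For RetinaNet specifically, the classification loss isn’t plain cross-entropy: it’s the focal loss introduced in the RetinaNet paper, which down-weights already-well-classified examples so training focuses on the hard ones:

FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)

where p_t is the model’s predicted probability for the true class; the paper’s defaults are \gamma = 2 and \alpha = 0.25.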

Can it detect JellyFish now?

Let’s see if it can detect the jellyfish now.
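
One detail from the keras-retinanet README: a training snapshot has to be converted into an inference model before it can be used for detection. A minimal sketch, where the snapshot file name is an assumption (it depends on your run):

from keras_retinanet import models

# Assumption: snapshots/resnet50_csv_10.h5 is the last snapshot written by
# train.py; the actual name depends on the backbone and number of epochs.
model = models.load_model('snapshots/resnet50_csv_10.h5', backbone_name='resnet50')
model = models.convert_model(model)  # appends box-decoding and NMS layers

# Detection then works exactly as with the COCO model above, except that
# label 0 now maps to JellyFish from classes.csv.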

Great! It misses a bunch of them, but at least it can detect these confusing objects with high confidence.

I tried a different image, and this one looks great. It actually captures most of the obvious ones.

It’s great that even though the training data is so small, the object detection framework works quite well.

What’s next?

Maybe one of the following would be interesting to try:

  • Detect a wider variety of sea animals
  • Make it work with video
  • Try it with aerial images
