Spot Skeletons in your Closet (using Deep Learning CV)

Exploration of pre-trained and custom-trained Computer Vision models to create a fun “Spooky Object Detector.”

Bhawana Mishra
Slalom Build
6 min read · Oct 28, 2020


Let me be clear: I don’t intend to spy on you or reveal your inner demons this Halloween. Instead, we are going to try to detect some real “skeletons” or “monsters” in an absolutely literal, terrifying kind of way. Read on to see what I mean.

In the spirit of Halloween, I sought to detect spooky entities in an image using deep learning Computer Vision (CV). And like a character in a typical horror movie, I underestimated the travails down this path. With so many pre-trained models out there, this should have been a treat. In hindsight, parts of this exploration felt much more like a trick!

We will walk through the use of existing pre-trained and custom-built models for detecting some scary demonic creatures, so no one needs to be too scared this Halloween.

Exploring Pre-trained models

What are pre-trained models? In simple terms, any model trained on data pertinent to a specific problem domain can be used as-is, or as a starting point, for a related problem. This saves the time, computational resources, and technical expertise needed to build a model from scratch. Just like how you are planning to revamp your Batman costume into a Black Panther costume this year. Gotcha!
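To make the “starting point” idea concrete, here is a minimal sketch of reusing a pre-trained network for a new task (transfer learning): freeze an ImageNet-trained base and train only a small new classification head on top. MobileNetV2 and the class count are illustrative choices, not a specific recommendation.

```python
import tensorflow as tf

# Load an ImageNet-trained base without its classification head and
# freeze it, so only the new head's weights will be trained.
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False

num_classes = 4  # placeholder: e.g. ghost, witch, vampire, skeleton
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

From here, `model.fit` on your own labeled images trains just the new head, which typically needs far fewer images than training the whole network from scratch.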

I started by exploring the different pre-trained image classification models in Keras. I used Keras because it is a fairly user-friendly library that runs on top of TensorFlow, CNTK, and Theano, which makes it great for beginners and for building fast prototypes with deep neural nets. Keras also ships a number of image classification models; some of the popular ones are Xception, VGG16, VGG19, ResNet50, InceptionV3, and MobileNet.

So I installed Keras and TensorFlow to set up my environment and got started with hunting some beasts! I began with Google’s InceptionV3 model for a few reasons:

  1. It has proven to be computationally efficient.
  2. It has one of the lowest error rates among the pre-trained Keras models.
  3. It can attain greater than 78.1% accuracy on the ImageNet dataset.

I was able to start getting label predictions for my test images with just a few lines of code. The results were humerus (wink). The model classified most of the images as either a “mask” or a “ski mask”. Well, that’s not entirely bat-crazy though, is it? It makes intuitive sense why the model would think they are all wearing masks because, who are we kidding, ghosts rarely pose, and the ones that do probably pretend to be humans with masks. For my digital P.K.E. meter, this just wasn’t good enough. I wanted to creep it real.
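As an illustration, the “few lines of code” look roughly like this. The random array below stands in for a real test photo, which you would instead load and resize to 299x299, the input size InceptionV3 expects.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, decode_predictions, preprocess_input)

# Load InceptionV3 with ImageNet weights (downloaded on first use).
model = InceptionV3(weights="imagenet")

# Placeholder input: a random 299x299 RGB image in a batch of one.
# Swap in your own image array here.
img = np.random.rand(1, 299, 299, 3) * 255.0
preds = model.predict(preprocess_input(img))

# Print the top-5 ImageNet labels with their scores.
for _, label, score in decode_predictions(preds, top=5)[0]:
    print(f"{label}: {score:.3f}")
```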

Top 5 predictions of the InceptionV3 model for the test images

Further technical investigation revealed that most of the pre-trained models had very practical (read: boring) labels of object, animal, or person. Turns out no one had been looking for more spook-tacular objects like ghosts, monsters, witches, or vampires like I was! And although I didn’t want to go the route of building my own image classification model from scratch, it seemed like I was being gently led down this haunted alley after all.

Training Your Own Model

While reviewing a number of articles and papers on building a custom model, I came across YOLOv3 — a popular object detection model (YOLO stands for You Only Look Once). An image needs to be forwarded through the network only once, and the model can identify objects from 80 different classes in it. It is extremely fast and accurate, comparable to a Single Shot MultiBox Detector (SSD). I decided to go for it.

So, how does YOLO work?

  1. The image is divided into a 13x13 grid (169 cells).
  2. Each grid cell predicts a number of possible bounding boxes.
  3. The network predicts two confidence scores for each bounding box:
    a. Object location — the probability that the bounding box actually encloses an object.
    b. Object recognition — the probability that the object in the bounding box belongs to a particular class.
  4. Using non-maximum suppression, the model discards bounding boxes with low confidence and, among overlapping boxes, those with the lower confidence.
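The filtering in step 4 can be sketched in plain Python. The score and IoU thresholds below are illustrative, not YOLO’s exact defaults, and boxes are simple `(x1, y1, x2, y2)` tuples.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then suppress overlapping duplicates,
    keeping the highest-scoring box in each overlapping group."""
    keep = []
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

For example, given two heavily overlapping boxes and one distant box, `nms` keeps the higher-scoring box of the overlapping pair plus the distant one.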

Before investing time into training my own model, I tried YOLO as-is just to check whether it did a good job of detecting objects in a picture, which it did. It was able to accurately detect a person, a car, a bird, etc. But that still didn’t tickle my bone. I can’t have the model identify a ghost or monster as a person. The pre-trained YOLOv3 model’s labels cover 80 everyday object classes, but not my genre of things.

The time was upon me to tailor YOLO into my very own spooky object detector, and because I am all about reusing and recycling, I closely followed Anton Mu’s GitHub repository for training your own YOLO from scratch. His YOLO was trained to detect cats, and while cats can be a fang-tastically cute way to spend my time, I decided to limit that indulgence to insta-videos and move on.

This was a four-step process:

Step 1: Image Collection and Annotation

Image collection. I had 10–12 test images on hand but not enough to train a model. Taking the repository’s advice, I used a bulk image downloader extension to download at least 120 images and split them into 80–20 training and testing sets.
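The 80–20 split described above can be done with a few lines of plain Python; the filenames here are placeholders, and the seed just makes the shuffle reproducible.

```python
import random

def split_dataset(filenames, train_frac=0.8, seed=42):
    """Shuffle the files and return (train, test) lists split 80-20."""
    files = list(filenames)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * train_frac)
    return files[:cut], files[cut:]

# Placeholder filenames standing in for the downloaded images.
train_files, test_files = split_dataset([f"img{i}.jpg" for i in range(120)])
```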

For YOLO, the minimum is one image per object class, but if you are not using any data augmentation techniques, it is better to have as many images per class as possible to fine-tune the network and improve prediction accuracy.

Image annotation. Since the freshly downloaded images were not labeled, I used Microsoft’s VoTT (Visual Object Tagging Tool) to manually tag/label the training images. You should tag as many objects in each image as you can. The VoTT-labeled images then need to be translated into YOLO format for model training.
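As a sketch, translating tagged boxes into the one-line-per-image annotation format used by keras-yolo3-style training scripts might look like the following. The exact field layout (`x_min,y_min,x_max,y_max,class_id` per box) is an assumption here, so check what your training repo expects.

```python
def to_yolo_line(image_path, boxes):
    """Build one annotation line: the image path followed by each box as
    "x_min,y_min,x_max,y_max,class_id", space-separated.

    boxes: list of (x_min, y_min, x_max, y_max, class_id) tuples in pixels.
    """
    parts = [",".join(str(int(v)) for v in box) for box in boxes]
    return " ".join([image_path] + parts)

# Hypothetical example: one image with two tagged objects.
line = to_yolo_line("ghost.jpg", [(10, 20, 110, 220, 0), (5, 5, 50, 60, 2)])
```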

Step 2: Download and Convert Weights

You need to download the pre-trained YOLOv3 weights and the config file from Darknet. The YOLO weights then need to be converted into a Keras model.

Step 3: Train Your Model

I ran the model training script to train the model and collect the final weights. This step can take anywhere from a few minutes to several hours depending on your GPU/CPU setup; even better, use a cloud service like AWS or Azure.

Step 4: Detect Images

Using the final weights, the newly trained Keras YOLOv3 model was able to detect the spooky creatures in the test images, drawing labeled bounding boxes around the objects with the highest confidence scores.

Spooky Object Detector Results

Based on the results above, you can tell that the trained YOLO model does a great job of locating the area containing an object and predicting the most probable label for it. This got me curious, and I wanted to test whether it fared as well when there were multiple objects or real people in the image.
I collected more images containing multiple objects and humans, labelled them using VoTT, and re-trained my model with the updated set of labels. Take a look at some of the results:

Results from re-trained YOLO model

The re-trained model could locate multiple entities within an image for a large subset of pictures but failed to do so in the rest. However, the model’s label predictions for the entities it did locate are quite accurate. Increasing the variety of images per label and training on images containing multiple labels definitely improved the model. I’m really happy with these promising results, but it’s still not ideal.
It would be interesting to see if the detection and prediction accuracies improve, and by how much, through the following explorations:

  • Increasing the training set to include more labelled images per class and, if possible, more classes.
  • Exploring data augmentation capabilities with YOLOv3.
  • Tuning hyperparameters.

So this holiday, collect your photos or videos and go through this exorcise to find any photobombing paranormal entities in them or check if any of your friends in pictures are from the netherworld. Witching you a Happy Howl-o-ween! Boo!


Senior Machine Learning Engineer @ Slalom Build, Seattle.