Is Santa Claus Real?
My team is very serious about Christmas and the gifts we get, so we wanted to track Santa Claus and know exactly when he delivers our presents. We decided to put cameras in our chimneys and find him. Recently we saw this post about how to train TensorFlow's object-detection API on your own dataset, which persuaded us to get our hands on this cool stuff and try to find Santa.
The code is available at this GitHub repo. The model generated from this code can be extended to find other characters, whether animated or real-life.
Here is the Santa Finder in action. 🎅🏻
Collecting the Data
As with any machine learning model, the most important ingredient is the data. Since we wanted to find many kinds of Santa (animations, claymations, people dressed as Santa), our training data had to be diverse. To gather it, we wrote a stream processor that uses VLC to stream videos from any online source and capture frames from them. The stream processor grabs frames without waiting for the video to play through; if playback is currently at the 2-second mark, the stream processor may already have captured frames from the 4- or 5-second mark. As a bonus, you can watch the videos in ASCII, which is the coolest way of watching videos 🤓. Instructions for using the stream processor are available here.
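If you want a feel for what the frame-capture step does, here is a minimal sketch. Our stream processor is built on VLC; this version uses OpenCV instead, purely to illustrate the idea, and the file names and sampling rate are placeholders.

import cv2

def capture_frames(stream_url, out_dir, every_n=30):
    # Works with local files and many network stream URLs.
    cap = cv2.VideoCapture(stream_url)
    count, saved = 0, 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if count % every_n == 0:  # keep one frame out of every `every_n`
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        count += 1
    cap.release()
    return saved

# e.g. capture_frames("santa_clip.mp4", "frames")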
Here is a small collection of the different types of Santa images we collected, all of them from YouTube. As you can see, there are many kinds of animated and live-action Santas.
Labeling the Data
The next step was to label the data, i.e. draw a bounding box around the face of Santa Claus. A common choice for labeling images is the tool labelImg, but we used a custom script adapted from this post.
To label an image, click on the top-left corner of the character's face and then on the bottom-right corner. If no character is present in the image, double-click on the same spot to delete the image. The code for the script is available here.
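The core of such a script fits in a few lines. Below is a minimal sketch of the two-click workflow, assuming OpenCV for display and mouse handling; the window name, file names, and CSV layout are illustrative rather than taken from our actual script.

import csv
import os
import cv2

clicks = []

def on_mouse(event, x, y, flags, param):
    if event == cv2.EVENT_LBUTTONDOWN:
        clicks.append((x, y))

def label_image(path, writer):
    clicks.clear()
    img = cv2.imread(path)
    cv2.imshow("label", img)
    cv2.setMouseCallback("label", on_mouse)
    while len(clicks) < 2:  # wait for the top-left and bottom-right clicks
        cv2.waitKey(20)
    (x1, y1), (x2, y2) = clicks
    if (x1, y1) == (x2, y2):  # a double-click on one spot means "no Santa here"
        os.remove(path)
        return
    writer.writerow([path, x1, y1, x2, y2, "santa"])

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    label_image("frames/frame_000000.jpg", writer)
cv2.destroyAllWindows()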
Creating the TensorFlow Record File
Once the bounding-box information is stored in a CSV file, the next step was to convert the CSV file and the images into a TFRecord file, the file format used by TensorFlow's object-detection API. The script to convert the CSV file to a TFRecord can be found here.
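The conversion itself boils down to packing each image and its boxes into a tf.train.Example with the feature keys the object-detection API expects. A minimal sketch (TensorFlow 1.x era; the image size, box coordinates, and file names are placeholders, and our actual script linked above handles the full CSV):

import tensorflow as tf

def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))
def _int64(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
def _int64s(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))
def _bytes_list(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))

def make_example(filename, width, height, x1, y1, x2, y2):
    with tf.gfile.GFile(filename, "rb") as f:
        encoded_jpg = f.read()
    feature = {
        "image/height": _int64(height),
        "image/width": _int64(width),
        "image/filename": _bytes(filename.encode()),
        "image/source_id": _bytes(filename.encode()),
        "image/encoded": _bytes(encoded_jpg),
        "image/format": _bytes(b"jpeg"),
        # Box coordinates are normalized to [0, 1].
        "image/object/bbox/xmin": _floats([x1 / width]),
        "image/object/bbox/xmax": _floats([x2 / width]),
        "image/object/bbox/ymin": _floats([y1 / height]),
        "image/object/bbox/ymax": _floats([y2 / height]),
        "image/object/class/text": _bytes_list([b"santa"]),
        "image/object/class/label": _int64s([1]),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

writer = tf.python_io.TFRecordWriter("santa_train.record")
example = make_example("frames/frame_000000.jpg", 640, 480, 100, 80, 220, 200)
writer.write(example.SerializeToString())
writer.close()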
A protobuf text file, the label map, is also needed; it maps each label name to a numeric ID. In our case there was just one class:
item {
  id: 1
  name: 'santa'
}
Creating the Config File
For training, we used the faster_rcnn_inception_resnet config file as the basis. We changed the num_classes parameter in the config file to 1, since we have only one class ('Santa'), and changed the input_path parameter to point to the TFRecord we created in the previous step. We used the pre-trained checkpoint for faster_rcnn_inception_resnet. We chose this model because accuracy mattered more to us than training speed. Other models that trade off speed against accuracy can be found here.
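The relevant parts of the config file end up looking roughly like this (the paths are placeholders; everything else in the sample config stays as shipped):

model {
  faster_rcnn {
    num_classes: 1
    # ... rest of the model settings unchanged
  }
}
train_config: {
  fine_tune_checkpoint: "path/to/model.ckpt"
  # ... optimizer and augmentation settings unchanged
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "path/to/santa_train.record"
  }
  label_map_path: "path/to/santa_label_map.pbtxt"
}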
Training
The training code was first run on our local machines to check that everything was working, and once it was, the job was deployed to Google Cloud Platform's ML Engine. The model was trained for more than 100,000 steps.
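For reference, submitting a training job to ML Engine with the object-detection API of that era looked roughly like this; the job name, bucket, and paths are placeholders, and this follows the API's documented cloud instructions rather than being our exact command.

gcloud ml-engine jobs submit training santa_detector_train \
    --job-dir=gs://YOUR_BUCKET/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    -- \
    --train_dir=gs://YOUR_BUCKET/train \
    --pipeline_config_path=gs://YOUR_BUCKET/data/santa.config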
The model does pretty well for both animated and real-life pictures.
Exporting the Model
After training finished, we exported the model so we could test it on different images. To export the model, we took the latest checkpoint from the training job and converted it to a frozen inference graph. The script to convert a checkpoint to a frozen inference graph can be found here.
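The export step amounts to a call like the following, using the API's export_inference_graph.py script; the config path, checkpoint number, and output directory are placeholders.

python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/santa.config \
    --trained_checkpoint_prefix path/to/model.ckpt-100000 \
    --output_directory santa_inference_graph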
We also built a webpage for our model, which pulls back images from Google search and tries to find Santa in them. The results on the webpage were filtered to show only bounding boxes drawn with a confidence of more than 60%. This is a snapshot of the webpage.
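Under the hood, that 60% filter is just a threshold on the scores the frozen graph returns. A minimal sketch of running inference on one image (TensorFlow 1.x; the file names are placeholders, while the tensor names are the standard ones the export script produces):

import numpy as np
import tensorflow as tf
from PIL import Image

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile("santa_inference_graph/frozen_inference_graph.pb", "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    image = np.array(Image.open("test.jpg"))[None, ...]  # shape (1, h, w, 3)
    boxes, scores = sess.run(
        ["detection_boxes:0", "detection_scores:0"],
        feed_dict={"image_tensor:0": image},
    )
    keep = scores[0] > 0.6  # the 60% confidence threshold used on the webpage
    for box, score in zip(boxes[0][keep], scores[0][keep]):
        print("santa at", box, "confidence", score)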
Next Steps
While the training job was running, we noticed that the TotalLoss fell below 1 very quickly, which suggested the model was learning to find Santa well.
We knew our model was not going to be perfect. While it found Santa fairly accurately, we were also getting false positives: images in which there was no Santa Claus, but for which the model predicted there was.
There is plenty of room for improvement in making the predictions more accurate and reducing the number of false positives.
The next step is to learn more about the different parameters in the config files and better understand how they affect the model's training and its predictions.
Conclusion
We hope that you can now train an object detector on your own dataset. At this point, I would like to thank Shivangi Shroff and Josh Kurz, who played an equal part in building this project.
Thank you for reading; we hope you liked this post. If you have any questions or suggestions, feel free to reach out to us on LinkedIn at Varun Vohra and Shivangi Shroff. We'd be more than happy to receive any feedback.