How to Build a Real-time Hand-Detector using Neural Networks (SSD) on Tensorflow

Detection on live video from a webcam. (a) Detection works well even with partial occlusion. (b) There were some misses when motion was fast or hands were seen from an unlikely egocentric viewpoint. (c) Detection worked fairly well even with overlapping objects.

Motivation — Why Track/Detect hands with Neural Networks?

If you are not interested in the process of training the detector, you can skip straight to the section on applying the model to detect hands.

Data preparation and network training in Tensorflow

Bambach, Sven, et al. “Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions.” Proceedings of the IEEE International Conference on Computer Vision. 2015.

The EgoHands dataset provides a polygon (the white dots) around each hand. We need to generate bounding boxes from these polygons and then generate tfrecords to train a TensorFlow model; a sketch of this conversion follows the directory layout below.
```
LOCATION_X
— frame_1.jpg
— frame_2.jpg
...
— frame_100.jpg
— polygons.mat // contains annotations
LOCATION_Y
— frame_1.jpg
— frame_2.jpg
...
— frame_100.jpg
— polygons.mat // contains annotations
```
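To make that conversion concrete, here is a minimal sketch, assuming TensorFlow 1.x, SciPy, and the feature keys the Object Detection API conventionally expects in its tfrecords. The way polygons.mat is unpacked, the 1280x720 frame size, and the frame_N.jpg naming are assumptions based on the EgoHands layout above, not a drop-in script.

```python
import numpy as np
import scipy.io as sio
import tensorflow as tf


def polygon_to_bbox(polygon):
    """Collapse one hand polygon (an N x 2 array of x, y vertices)
    into a tight axis-aligned box (xmin, ymin, xmax, ymax)."""
    xs, ys = polygon[:, 0], polygon[:, 1]
    return float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())


def make_tf_example(jpg_path, boxes, width=1280, height=720):
    """Pack one frame and its hand boxes into a tf.train.Example,
    normalizing coordinates to [0, 1] as the Object Detection API
    expects. EgoHands frames are assumed to be 720p here."""
    with open(jpg_path, 'rb') as f:
        encoded_jpg = f.read()

    def floats(v):
        return tf.train.Feature(float_list=tf.train.FloatList(value=v))

    def byteslist(v):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))

    def ints(v):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=v))

    feature = {
        'image/encoded': byteslist([encoded_jpg]),
        'image/format': byteslist([b'jpeg']),
        'image/width': ints([width]),
        'image/height': ints([height]),
        'image/object/bbox/xmin': floats([b[0] / width for b in boxes]),
        'image/object/bbox/ymin': floats([b[1] / height for b in boxes]),
        'image/object/bbox/xmax': floats([b[2] / width for b in boxes]),
        'image/object/bbox/ymax': floats([b[3] / height for b in boxes]),
        # A single "hand" class, so every box gets the same label.
        'image/object/class/text': byteslist([b'hand'] * len(boxes)),
        'image/object/class/label': ints([1] * len(boxes)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# polygons.mat holds one entry per frame, each a set of hand polygons
# (empty arrays where a hand is absent); this unpacking is an assumption
# about the .mat structure, so inspect your copy of the file first.
polygons = sio.loadmat('LOCATION_X/polygons.mat')['polygons']
writer = tf.python_io.TFRecordWriter('egohands_train.record')
for i, frame_polys in enumerate(polygons[0]):
    # Skip empty/degenerate entries (a real polygon needs >= 3 vertices).
    boxes = [polygon_to_bbox(p) for p in frame_polys if p.size >= 6]
    if boxes:
        example = make_tf_example('LOCATION_X/frame_%d.jpg' % (i + 1), boxes)
        writer.write(example.SerializeToString())
writer.close()
```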
Training converged to a final total loss of 2.575 and an mAP of 0.9686.

Using the Detector to Detect/Track Hands

Applying the detector to a YouTube video.
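With a trained model exported, the detection loop itself is short. Below is a minimal sketch assuming TensorFlow 1.x, OpenCV, and a detector exported with the Object Detection API, whose export step produces a frozen_inference_graph.pb and whose graphs expose the image_tensor and detection_* tensors used here; the hand_inference_graph directory and the 0.5 score threshold are placeholder choices.

```python
import cv2
import numpy as np
import tensorflow as tf

PATH_TO_FROZEN_GRAPH = 'hand_inference_graph/frozen_inference_graph.pb'
SCORE_THRESHOLD = 0.5

# Load the exported detector into a TF1-style graph; the tensor names
# used below are the ones the Object Detection API export produces.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    boxes_t = graph.get_tensor_by_name('detection_boxes:0')
    scores_t = graph.get_tensor_by_name('detection_scores:0')

    cap = cv2.VideoCapture(0)  # 0 = default webcam; a file path also works
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # The model expects RGB input; OpenCV delivers BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        boxes, scores = sess.run(
            [boxes_t, scores_t],
            feed_dict={image_tensor: np.expand_dims(rgb, axis=0)})
        h, w = frame.shape[:2]
        for box, score in zip(boxes[0], scores[0]):
            if score < SCORE_THRESHOLD:
                continue
            # Boxes come back normalized as (ymin, xmin, ymax, xmax).
            ymin, xmin, ymax, xmax = box
            cv2.rectangle(frame, (int(xmin * w), int(ymin * h)),
                          (int(xmax * w), int(ymax * h)), (77, 255, 9), 2)
        cv2.imshow('hand detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()
```

Pointing cv2.VideoCapture at a file path instead of device 0 is all it takes to run the same loop over a downloaded YouTube clip.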

Thoughts on Optimization
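In a naive loop like the one above, the camera read and the model inference run serially, so each blocks the other. A common fix, an assumption here rather than anything specific to this detector, is to move frame grabbing onto a background thread so that inference always works on the freshest frame. A minimal sketch:

```python
import threading

import cv2


class ThreadedVideoStream:
    """Grab frames on a background thread so that (slow) detection on
    the main thread never waits on camera I/O; the main thread always
    reads the most recent frame instead of a queued, stale one."""

    def __init__(self, src=0):
        self.cap = cv2.VideoCapture(src)
        self.ok, self.frame = self.cap.read()
        self.stopped = False
        self.lock = threading.Lock()
        threading.Thread(target=self._update, daemon=True).start()

    def _update(self):
        # Continuously overwrite the latest frame until stopped.
        while not self.stopped:
            ok, frame = self.cap.read()
            with self.lock:
                self.ok, self.frame = ok, frame

    def read(self):
        with self.lock:
            return self.frame

    def stop(self):
        self.stopped = True
        self.cap.release()
```

The detection loop would then call stream.read() instead of cap.read(), dropping stale frames rather than queueing them, which tends to raise the effective frames per second of the whole pipeline.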

Performance on random images with hands shows some limitations of the detector.

Given the above, a potential alternative strategy is to explore structures that let us efficiently interleave the outputs of multiple pretrained models, each trained for a different object class, so that together they detect multiple kinds of objects in a single image; a naive version of this idea is sketched below.
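As a point of reference, that naive version, simply running several single-class frozen graphs over the same frame and pooling their outputs, might look like the sketch below. load_detector, the hypothetical second model path, and the class-id offsetting are all illustrative assumptions; the open question the paragraph above raises is how to do this efficiently, e.g. by sharing the feature extractor, rather than paying for a full forward pass per model.

```python
import numpy as np
import tensorflow as tf


def load_detector(pb_path):
    """Load one exported frozen graph; returns a (session, tensors)
    pair using the same tensor names as the webcam sketch above."""
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    tensors = {name: graph.get_tensor_by_name(name + ':0')
               for name in ('image_tensor', 'detection_boxes',
                            'detection_scores', 'detection_classes')}
    return tf.Session(graph=graph), tensors


def detect_all(detectors, rgb_image):
    """Run several single-class detectors on one frame and pool their
    boxes/scores, offsetting class ids so labels stay distinct. This is
    the naive, sequential version; an efficient design would share the
    feature-extraction layers instead of running whole models in turn."""
    batch = np.expand_dims(rgb_image, axis=0)
    all_boxes, all_scores, all_classes = [], [], []
    for offset, (sess, t) in enumerate(detectors):
        boxes, scores, classes = sess.run(
            [t['detection_boxes'], t['detection_scores'],
             t['detection_classes']],
            feed_dict={t['image_tensor']: batch})
        all_boxes.append(boxes[0])
        all_scores.append(scores[0])
        all_classes.append(classes[0] + offset * 100)  # keep ids disjoint
    return (np.concatenate(all_boxes), np.concatenate(all_scores),
            np.concatenate(all_classes))


# Hypothetical usage with two single-class models:
# detectors = [load_detector('hand_inference_graph/frozen_inference_graph.pb'),
#              load_detector('face_inference_graph/frozen_inference_graph.pb')]
# boxes, scores, classes = detect_all(detectors, rgb_frame)
```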

Acknowledgements
