Bringing Puppets To (Augmented) Reality

2020CV Inc
May 31, 2017


YoPuppet Demo

Having been born a child of the '80s, I grew up in the golden age of puppets. My generation was raised on The Muppets, Sesame Street, Star Wars, The Dark Crystal, Labyrinth, The NeverEnding Story, E.T., Gremlins, ALF, and the list goes on.

My brothers and I would crowd around a small TV and VCR, our imaginations captured by the creations of Jim Henson et al. Puppets have been around for two and a half millennia, and they are still alive and well today (though used more sparingly in modern cinema). There is an undeniable absurdity to making a moving hand come to life, but it has never ceased to amuse us. Yet, after having a daughter of my own, I can see how the magic I once experienced as a kid might be lost in translation to the hard-to-impress, mobile-phone generation. Suffice it to say that your '80s console of choice (in my case, a Commodore 64) required quite a bit of imagination to suspend one's disbelief.

A few of those iconic puppets…

As much of a boon as computer-generated content has been for the entertainment industry, it has also sucked much of the life out of it (personally, I strongly prefer the Yoda confined to Luke’s back over the one flipping around a green screen like a circus monkey). Maybe I’m just being nostalgic, but I think that part of what brings me back to the films of my childhood is that the hand-crafted special effects served a purpose, and were performed extemporaneously—unlike the lifeless, lazy, and premeditated renderings of modern cinema.

So, in an attempt to reclaim some of that magic for my daughter's generation (and the ones that preceded it), I've gone about creating a mobile app that brings both worlds together. YoPuppet is still at an early stage of development, and our main goal here is simply to get some feedback from the internet. Right now the app is geared toward families and content creators, with the ability to record and share if desired. I'm confident that with a little help I can take this app to a production-ready state on iOS and Android devices (and desktops, as well). Up to this point we have been self-funded, and we are debating how best to fund this project going forward.

For anybody interested in the technical details, you can read on below. Otherwise, I appreciate your reading this far, and thank you in advance for sharing or donating!

This app was developed using OpenCV, Cocos2d-x, and Spine, with most of the code written in C++. I haven't tested on anything other than an iPhone 7 Plus, but it achieves around 60 FPS with little in the way of optimization. It works with both the front and back cameras of mobile devices, adapts to any skin color (even with something like a blue latex glove over the hand), and will work with either hand (eventually) and for anybody with at least two fingers (in theory, even with prosthetic limbs).

At a high level, the process is: segment the skin from the background (HSV back-projection), then segment the hand contour from the result (thresholding, erosion, dilation). Segmentation works reasonably well in most lighting conditions and backgrounds with a little manual tweaking of thresholds, but it will need to be improved for consumer use (probably by adapting to the environment dynamically). I then attempt to find landmarks around the hand and arm using defects in the contour's convex hull, and that is really where the difficulty ends. From there, I'm simply rigging skeletons to the landmarks and rotating and scaling as necessary.

As you might be able to discern, I'm not an artist, and the puppets you see rigged to the skeleton are just placeholders for far more creative art than my own. I chose one particular planar hand orientation, and 2D over 3D, simply to make something that works reliably on a mobile device. In theory, it could be expanded to support other orientations and 3D puppets, but that is a challenge for a team far bigger than just me. For those who are unfamiliar, hand gesture recognition is far more complicated than facial feature detection, due to the many degrees of freedom of the hand and fingers, and it remains a largely unsolved problem with 2D cameras.
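To make the back-projection step concrete, here is a minimal pure-Python sketch of the idea (the app itself uses OpenCV's histogram back-projection in C++; the function names, bin count, and sample values below are illustrative, not taken from the app):

```python
# Sketch of hue-histogram back-projection: build a histogram from a sampled
# skin patch, then score every frame pixel by how "skin-like" its hue is.
# OpenCV-style hue range 0-179 and 18 bins are assumed here for illustration.

def hue_histogram(hues, bins=18, max_hue=180):
    """Build a peak-normalized hue histogram from sampled skin pixels."""
    hist = [0.0] * bins
    for h in hues:
        hist[min(h * bins // max_hue, bins - 1)] += 1.0
    peak = max(hist) or 1.0
    return [v / peak for v in hist]

def back_project(image_hues, hist, bins=18, max_hue=180):
    """Score each pixel (0..1) by looking up its hue in the histogram."""
    return [hist[min(h * bins // max_hue, bins - 1)] for h in image_hues]

# Sampled skin patch: hues clustered around ~10 (a typical skin-tone region).
skin_sample = [8, 9, 10, 11, 12, 10, 9]
hist = hue_histogram(skin_sample)

# A tiny "frame": two skin-like hues and one blue background hue (~120).
frame = [10, 120, 11]
scores = back_project(frame, hist)
mask = [1 if s > 0.5 else 0 for s in scores]
print(mask)  # → [1, 0, 1]: skin pixels survive, the background pixel does not
```

Because the histogram is built from whatever patch is sampled, the same mechanism adapts to any skin color (or a blue latex glove), as described above.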
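The erosion/dilation cleanup mentioned above (an "opening", which removes specks of noise from the thresholded mask) can be sketched in pure Python like this; the real app uses OpenCV's erode/dilate, and the 3x3 square structuring element here is an assumption for illustration:

```python
# Morphological opening: erode (shrink blobs, killing isolated noise pixels)
# then dilate (grow the surviving blobs back to roughly their original size).

def erode(mask):
    """A pixel survives only if its entire 3x3 neighborhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def dilate(mask):
    """A pixel is set if anything in its 3x3 neighborhood is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(0 <= y + dy < h and 0 <= x + dx < w
                                and mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

# A solid 4x4 "hand" blob plus one speck of noise in the top-left corner:
mask = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
opened = dilate(erode(mask))
# The speck is gone, while the blob survives intact.
```

Erosion kills the isolated pixel outright; dilation then restores the main blob, which is what makes the subsequent contour extraction stable.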
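Finally, the landmark step rests on a simple observation: fingertips tend to lie on the contour's convex hull, while the valleys between fingers are "convexity defects", i.e. contour points far inside a hull edge. A toy pure-Python sketch of that intuition (the app uses OpenCV's hull/defect routines; the five-point "two-finger" contour below is made up for illustration):

```python
# Convex hull via Andrew's monotone chain, plus the deepest convexity defect
# along one hull edge (the point farthest inside the hull between two tips).

def convex_hull(points):
    """Return hull vertices of a 2D point set (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0]) * (b[1]-o[1]) - (a[1]-o[1]) * (b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def deepest_defect(segment, a, b):
    """Point of a contour segment farthest from the hull edge a-b."""
    ax, ay = a
    bx, by = b
    norm = ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    def dist(p):
        return abs((bx - ax) * (ay - p[1]) - (ax - p[0]) * (by - ay)) / norm
    return max(segment, key=dist)

# Toy "two-finger" contour: two peaks (fingertips) with a valley between.
contour = [(0, 0), (1, 5), (2, 1), (3, 5), (4, 0)]
hull = convex_hull(contour)          # the valley (2, 1) is not on the hull
valley = deepest_defect(contour[1:4], (1, 5), (3, 5))
print(valley)  # → (2, 1): the between-finger valley
```

Each fingertip/valley pair found this way becomes a landmark, and the puppet skeleton is rigged, rotated, and scaled against those landmarks as described above.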