Augmediated reality system based on 3D camera selfgesture sensing

Raymond Lo, Alexander Chen, Valmiki Rampersad, Jason Huang, Han Wu, Steve Mann. IEEE ISTAS’13: Proceedings of the IEEE International Symposium on Technology and Society.

Date: June 28, 2013

Publication: Augmediated reality system based on 3D camera selfgesture sensing.pdf


This project was inspired by Raymond Lo, a PhD student working under Professor Steve Mann. I developed the gesture recognition software, which uses a neural network, along with some simple UI elements. The motivation for this project is to bring the novel 3D interaction seen in the movie Minority Report to real life.


This project has four parts: hardware implementation, hand segmentation, gesture classification, and user interface.

Hardware Implementation

The main idea of this project is to connect a PrimeSense depth camera (e.g. a Kinect) to a pair of Epson Moverio glasses. The Moverio glasses are essentially a screen that projects video in front of your eyes. We use this screen to display what the camera sees and so achieve augmented reality. In the first prototype, I combined a Kinect with an RGB camera in order to stream the RGB, infrared, and depth map feeds. With this basic setup, I developed the gesture recognition system in Python with OpenCV. The system includes a hand segmentation algorithm, a gesture recognition algorithm, and a simple user interface. Once this was complete, the other co-authors ported the same algorithms to a more advanced prototype (built by Raymond Lo). The second prototype uses a smaller PrimeSense camera along with a smaller Android-based processor. All the algorithms I developed for the first prototype were rewritten in C in order to perform low-latency gesture recognition.

Hand Segmentation

Hand segmentation works by merging the depth map feed and the infrared feed. For gesture recognition in a first-person view, we are always dealing with close-range objects. We therefore build a simple close-range binary filter that uses both the infrared feed and the depth map feed. This turns out to remove most of the noise and gives us fairly good results in indoor environments.
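
To make the close-range filter more concrete, here is a minimal sketch in Python with NumPy. It is not the project's actual code: the function name `close_range_mask`, the thresholds `max_depth_mm` and `min_ir`, and the convention that a depth of 0 means "no reading" are all assumptions chosen for illustration.

```python
import numpy as np

def close_range_mask(depth_mm, infrared, max_depth_mm=600, min_ir=80):
    """Return a binary mask (uint8, 0 or 255) of pixels that appear close to the camera.

    Both thresholds are illustrative assumptions, not values from the project.
    """
    depth_mm = np.asarray(depth_mm)
    infrared = np.asarray(infrared)

    # Assumption: the depth sensor reports 0 for pixels with no valid reading.
    valid = depth_mm > 0
    near = valid & (depth_mm <= max_depth_mm)

    # Assumption: nearby surfaces return a stronger infrared signal.
    bright = infrared >= min_ir

    # A pixel is kept only when both feeds agree that it is close.
    return np.where(near & bright, 255, 0).astype(np.uint8)
```

Requiring both feeds to agree is what suppresses noise here: a pixel with a spurious depth reading but weak infrared return (or the reverse) is dropped from the mask.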

Gesture Classification

In this project, I use a neural network for gesture recognition. To train it, I collected more than 20,000 training samples across four different gestures and trained the network with a hidden layer of 100 units. Once the network was trained, I incorporated it into our gesture recognition pipeline. It works as follows: once the segmentation algorithm has segmented out the hand, it resizes the image into a 20-by-20 patch. I feed this patch into the trained network, which tells us which gesture the patch most likely is. I obtained fairly good accuracy on our training and test data, but more work is needed before this can be used in real life.
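
The classification step can be sketched roughly as follows. This is an illustration under stated assumptions, not the trained model from the project: it assumes a single hidden layer of 100 units with sigmoid activations and a softmax output over four gestures, and the weights are random placeholders. The names `resize_nearest`, `make_placeholder_weights`, and `classify` are hypothetical.

```python
import numpy as np

PATCH_SIZE = 20      # 20-by-20 patch, as described above
NUM_HIDDEN = 100     # assumption: one hidden layer of 100 units
NUM_GESTURES = 4     # four gestures, as described above

def resize_nearest(mask, size=PATCH_SIZE):
    """Nearest-neighbour resize of a 2D array to size x size."""
    mask = np.asarray(mask, dtype=np.float64)
    rows = np.arange(size) * mask.shape[0] // size
    cols = np.arange(size) * mask.shape[1] // size
    return mask[np.ix_(rows, cols)]

def make_placeholder_weights(seed=0):
    """Random weights standing in for a trained network (illustration only)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.1, (NUM_HIDDEN, PATCH_SIZE * PATCH_SIZE))
    b1 = np.zeros(NUM_HIDDEN)
    w2 = rng.normal(0.0, 0.1, (NUM_GESTURES, NUM_HIDDEN))
    b2 = np.zeros(NUM_GESTURES)
    return w1, b1, w2, b2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(mask, w1, b1, w2, b2):
    """Return (gesture index, class probabilities) for a segmented hand mask."""
    patch = resize_nearest(mask)
    x = patch.reshape(-1) / 255.0          # flatten to 400 inputs in [0, 1]
    hidden = sigmoid(w1 @ x + b1)          # hidden layer activations
    scores = w2 @ hidden + b2
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```

With real trained weights in place of the placeholders, the returned index would identify the most likely of the four gestures and the probabilities would give a rough confidence for each.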

User Interface

To demonstrate how this system works, I put together a set of simple user interface elements so the user can see the possibilities of such a system. One of them is the select button, which you can see in the image on the right. Once this button is triggered, it gives the user a draggable window. This window can stand in for any desktop application window that displays any kind of information. The user can interact with it by dragging it around using only gestures. They can also close the window by clicking or hovering over the select button again.
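
The select button and draggable window can be modelled as a small per-frame state machine. The sketch below is one possible way to do it, not the project's implementation: the class names `Rect` and `GestureUI`, the `dwell_frames` hover threshold, and the boolean `grabbing` flag (standing in for whichever gesture means "drag") are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

class GestureUI:
    """Select button that toggles a draggable window (hypothetical sketch)."""

    def __init__(self, button, window, dwell_frames=15):
        self.button = button
        self.window = window
        self.dwell_frames = dwell_frames  # assumed hover time, in frames
        self.window_open = False
        self._hover = 0       # consecutive frames the hand has been over the button
        self._armed = True    # the hand must leave the button before it toggles again
        self._grab = None     # offset from window origin to hand while dragging

    def update(self, hand_x, hand_y, grabbing):
        """Advance the UI by one frame given the hand position and drag gesture."""
        if self.button.contains(hand_x, hand_y):
            self._hover += 1
            if self._armed and self._hover >= self.dwell_frames:
                self.window_open = not self.window_open
                self._armed = False
                self._grab = None
        else:
            self._hover = 0
            self._armed = True

        if self.window_open and grabbing:
            # Start a drag only when the hand is over the window.
            if self._grab is None and self.window.contains(hand_x, hand_y):
                self._grab = (hand_x - self.window.x, hand_y - self.window.y)
            if self._grab is not None:
                self.window.x = hand_x - self._grab[0]
                self.window.y = hand_y - self._grab[1]
        else:
            self._grab = None
```

Calling `update` once per camera frame with the tracked hand position is enough to drive it: hovering over the button for `dwell_frames` frames opens the window, a drag gesture over the window moves it, and hovering over the button again closes it.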
