Gesture Commands with Kinect Gesture Recognition

Leyi Sun
Nov 6, 2018


Kinect, a motion-sensing device created for Xbox games, enables users to control and interact with their computers using gestures and spoken commands through the Windows Kinect software development kit (SDK). In this tutorial, I'll show you how to set up the Kinect on Windows 10 and how to define a custom gesture as a gesture command that asks Windows Media Player to play an audio file, using a machine learning approach.

Introduction

Kinect can recognize both discrete and continuous gestures. A discrete gesture captures a single pose, while a continuous gesture captures the whole motion of reaching that pose. This tutorial focuses on discrete gesture recognition since it's much easier for beginners. Because the recognition follows a machine learning workflow, I will record a gesture several times with Kinect Studio to produce training and test data. Then, I will build the machine learning model in Visual Gesture Builder and store it in a .gbd file. Finally, this .gbd file will be used to implement the gesture command.

Machine Learning vs Heuristic Gesture Recognition

Machine learning gesture recognition learns to recognize a gesture from training data, while the heuristic approach directly compares the x and y coordinates of joints. The heuristic approach is obviously much easier to implement for a simple problem such as detecting a hand over the head, but it can be hard to implement for a complex gesture such as a baseball swing. In addition, results from the machine learning approach can provide clues for designing a heuristic one.
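To make the contrast concrete, here is a minimal sketch of the heuristic style using the Kinect SDK's Body API (the 0.1 m margin is a threshold I made up for illustration):

```
using Microsoft.Kinect;

// Heuristic check: is the right hand raised above the head?
// Pure coordinate comparison in camera space; no training data involved.
private static bool IsHandOverHead(Body body)
{
    CameraSpacePoint hand = body.Joints[JointType.HandRight].Position;
    CameraSpacePoint head = body.Joints[JointType.Head].Position;

    // Add a small margin (assumed value) so the result doesn't flicker
    // when the hand hovers right at head height.
    return hand.Y > head.Y + 0.1f;
}
```

A baseball swing, by contrast, would need many such conditions tracked over time, which is exactly where the trained model pays off.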

Setup

The Kinect sensor and adapter

First, you need to connect the sensor and the Kinect adapter by following the setup guide in the adapter box. You will also need to download the Windows Kinect SDK 2.0 (do not download v1.8, which is not compatible with Windows 10). To install the SDK correctly, please check the system requirements and follow the install instructions on the download page. After installing the SDK successfully, three apps should be installed: Kinect Studio v2.0, SDK Browser v2.0, and Visual Gesture Builder.

Record gestures

The process of recording gestures is shown in the following video.

Click on the Record tab, then on the connect button in the upper left to connect to the sensor. To start recording the gesture, click on the red circle next to the connect button; you will see your skeleton on the screen. As shown in the video, I did the dab pose repeatedly in one recording to collect more training data. After recording the gesture, the file is stored in the directory shown under File -> Settings -> recording file path.

Analyze the gesture

The process of analyzing gestures is shown in the following video.

A solution called play audio is created, and inside it a project called play_audio is created with the wizard. The gesture ignores the lower body and the fingers and focuses on both arms. The play_audio project shown in the video holds the training data, while play_audio.a holds the test data. One recording is added to play_audio as a clip.

A tag marks the interval in which the target gesture is true. An interval can be selected by holding the Shift key and dragging the mouse; the true value can then be entered in the value column, or set by pressing Enter after selecting the interval. After all the tags are added, building the solution creates the model for this gesture and stores it in a .gbd file. The test clip added to play_audio.a can then be analyzed against that file. The level of confidence is shown numerically in the confidence column and graphically below the table. Now, using the .gbd file, you can start implementing the command.

Implement the gesture command

The command implementation is based on the BodyBasics sample in the SDK Browser. Due to the amount of initialization, the code provided here only includes the changes I made. To initialize the gesture builder, the Microsoft.Kinect.VisualGestureBuilder package needs to be installed with NuGet.

Declaration for gesture builder
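Since the code screenshots aren't reproduced here, a minimal sketch of the declarations, with field names of my own choosing and a placeholder path, might look like this:

```
using Microsoft.Kinect;
using Microsoft.Kinect.VisualGestureBuilder;

// Source and reader for Visual Gesture Builder frames.
private VisualGestureBuilderFrameSource vgbFrameSource = null;
private VisualGestureBuilderFrameReader vgbFrameReader = null;

// Absolute path to the .gbd file built in Visual Gesture Builder
// (placeholder; replace with your own path).
private readonly string databasePath = @"C:\path\to\play_audio.gbd";

// Name of the gesture, matching the Visual Gesture Builder project name.
private readonly string gestureName = "play_audio";
```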

The MainWindow_Loaded function is the main function that starts the recognition. The value of databasePath here is the absolute path to the .gbd file, and the gesture name is the name of the project. All gestures in the .gbd file are loaded; in this case only play_audio is loaded, since it is the only gesture in play_audio.gbd.

MainWindow_Loaded
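A sketch of what the loading code might look like, assuming the standard Visual Gesture Builder pattern (the kinectSensor field comes from the BodyBasics sample):

```
private void MainWindow_Loaded(object sender, RoutedEventArgs e)
{
    // Create the VGB source with an initial tracking id of 0; a real id
    // is assigned later in Reader_FrameArrived once a body is tracked.
    this.vgbFrameSource = new VisualGestureBuilderFrameSource(this.kinectSensor, 0);

    // Open the reader, keep it paused until a body is tracked,
    // and subscribe to the frame-arrived event.
    this.vgbFrameReader = this.vgbFrameSource.OpenReader();
    if (this.vgbFrameReader != null)
    {
        this.vgbFrameReader.IsPaused = true;
        this.vgbFrameReader.FrameArrived += this.vgbFrameReader_FrameArrived;
    }

    // Load every gesture in the database; play_audio.gbd contains only one.
    using (VisualGestureBuilderDatabase database =
        new VisualGestureBuilderDatabase(this.databasePath))
    {
        foreach (Gesture gesture in database.AvailableGestures)
        {
            this.vgbFrameSource.AddGesture(gesture);
        }
    }
}
```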

The content of the command is implemented in the vgbFrameReader_FrameArrived function. In this case, when the model is more than 80% confident that the detected pose is a dab pose, the audio is played with Windows Media Player.

The content of the command
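A sketch of the handler, following the usual discrete-gesture pattern; launching wmplayer.exe is one way to play the audio through Windows Media Player, and the audio path is a placeholder:

```
private void vgbFrameReader_FrameArrived(object sender, VisualGestureBuilderFrameArrivedEventArgs e)
{
    using (VisualGestureBuilderFrame frame = e.FrameReference.AcquireFrame())
    {
        if (frame == null) return;

        // Results for all discrete gestures loaded into the frame source.
        var results = frame.DiscreteGestureResults;
        if (results == null) return;

        foreach (Gesture gesture in this.vgbFrameSource.Gestures)
        {
            DiscreteGestureResult result = null;
            results.TryGetValue(gesture, out result);

            // Fire the command only when the model is over 80% confident.
            if (gesture.Name.Equals(this.gestureName) &&
                result != null && result.Detected && result.Confidence > 0.8f)
            {
                // Launch Windows Media Player with a placeholder audio path.
                System.Diagnostics.Process.Start("wmplayer.exe", @"C:\path\to\audio.mp3");
            }
        }
    }
}
```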

There are also changes in the Reader_FrameArrived function that pass the body's TrackingId to the gesture source, which activates the vgbFrameReader_FrameArrived function.

Small changes on lines 15 & 16
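The change boils down to a couple of lines inside the body-tracking loop of Reader_FrameArrived (a sketch; the loop itself comes from the BodyBasics sample):

```
// Inside Reader_FrameArrived, once a tracked body is found:
if (body.IsTracked)
{
    // Route this body's frames to the gesture source and unpause the reader,
    // so vgbFrameReader_FrameArrived starts firing for this skeleton.
    this.vgbFrameSource.TrackingId = body.TrackingId;
    this.vgbFrameReader.IsPaused = false;
}
```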

Results

The play_audio gesture command is implemented!! Check this out!

More in-depth videos:

https://channel9.msdn.com/series/Programming-Kinect-for-Windows-v2/01
