Hand gesture recognition technology brings real hands into virtual reality

Estelle Jiang
Dec 16, 2019 · 9 min read


In recent years, the volume of research conducted on gesture interaction and recognition mirrors the popularity of the underlying technologies. With the help of machine learning, researchers and developers can now teach a machine not only our body gestures but also our hand gestures, using sensors and the data they produce. The movements and gestures of hands can therefore be recognized, interpreted, and shared across networks. The ability of computers to recognize hand gestures visually is essential for progress in human–computer interaction. Researchers in the field of Human Computer Interaction (HCI) have long recognized the benefits of using human gestures to control specific applications. As the ability to perceive and track the shape and motion of hands is constantly enhanced, these cutting-edge technologies will become vital components in improving the user experience across a variety of technological platforms and domains.

https://en.ryte.com/wiki/Human_Computer_Interaction

As we know, HCI is concerned with “the design, evaluation, and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them”, especially “in the context of the user’s task and work” (Chakraborty, Sarma, Bhuyan, & MacDorman, 2018). HCI focuses not only on enhancing the usability and reliability of present interfaces but also on developing innovative, user-friendly interfaces that support and promote a more natural and frictionless experience. In this respect, hand gesture interaction, a hands-free and non-touch form of interaction, can take the field of HCI one step further and rise to prominence. Looking closely at its applications, hand gesture recognition and interaction can range from forming the basis of sign language understanding to medical assistance and virtual reality. It can also enable the overlay of digital content and information on top of the physical world in augmented reality.

Photo by Lux Interaction on Unsplash

When it comes to virtual reality, people are exploring natural, efficient, and intuitive interaction methods that can further improve the VR experience. Currently, the most common interaction method in the virtual world is through electronic devices, such as handheld controllers and similar products, which temporarily solve the interaction problem but also pull the user out of the virtual world. This style of interaction still largely diminishes the sense of immersion. I believe the combination of hand gesture recognition and virtual environments can bring controller-free interaction into people’s daily lives. The VR experience can become more immersive with the help of cutting-edge AI hand gesture tracking techniques.

What is virtual reality?

Virtual reality refers to a virtual environment created by a computer simulation system. It simulates a virtual space in which people can be immersed and offered different types of interactive activities involving visual and auditory perception. Users can interact with three-dimensional objects within this three-dimensional world, which enables direct communication between people and the illusory world. Within this world, designers and developers can realize more opportunities and creative ideas, such as varied games and live scenes. People who want to experience the virtual world need a headset to sense and control objects or follow commands.

Li clearly summarizes three distinct characteristics of VR: interaction, immersion, and imagination. Interaction is the most straightforward. Immersion means that users feel they are part of the environment and the world, as if they are immersed in it. Imagination refers to the use of multi-dimensional perceptual information provided by VR scenes to acquire the same feelings as the real world, as well as feelings that are not available in the real world (Li, Huang, Tian, Wang, & Dai, 2019).

What is gesture interaction technology in virtual reality, and how are humans involved?

In daily life, people use gestures to express their ideas and interaction intentions, and gestures convey a range of information and content through physical movements. In a VR environment, gesture-based interfaces and technology let users control devices and move objects using their hands and other body parts. For the purposes of this article, I focus specifically on hand gesture recognition techniques. Usually, developers associate a set of hand gestures with certain commands to perform operations. Li claims that gestures in the interaction process can be distinguished according to varied spatiotemporal operation behaviors, different semantics, different interaction modes, and different interaction ranges (Li, Huang, Tian, Wang, & Dai, 2019). According to the research titled Hand Gesture Recognition Using Computer Vision, approaches to recognizing hand gestures fall into two categories: data-glove sensor devices that transform hand and finger motions into digital data, and computer vision, which uses a camera. Data gloves were mainly used in early interaction scenarios. With the development of touch screens, gesture signal acquisition has shifted to visual acquisition from computer cameras. At this point, the user’s gestures are treated as input, and users can navigate the system with their hand movements and gestures, such as selecting, moving, scrolling, grabbing, and deleting. In the VR world, users wear a headset that can capture their hands and track their gestures and movements, which is why hand gesture recognition techniques can step in and be applied to many aspects of VR.
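To make the idea of associating gestures with commands concrete, here is a minimal sketch. All the gesture labels, action names, and the dictionary-based dispatch are my own invented illustration, not taken from any real VR SDK:

```python
# Toy sketch: mapping recognized hand gesture labels to VR operations.
# Gesture names and actions are hypothetical examples.

def select(target):
    return f"selected {target}"

def grab(target):
    return f"grabbed {target}"

def delete(target):
    return f"deleted {target}"

# A recognizer would emit a gesture label; the app dispatches it to a command.
GESTURE_COMMANDS = {
    "pinch": select,
    "fist": grab,
    "swipe_left": delete,
}

def handle_gesture(gesture, target):
    """Dispatch a recognized gesture label to its associated operation."""
    command = GESTURE_COMMANDS.get(gesture)
    if command is None:
        return "ignored"  # unrecognized gesture: do nothing
    return command(target)

print(handle_gesture("pinch", "menu"))  # -> selected menu
```

A real system would replace the string labels with the output of a trained recognizer, but the dispatch pattern is the same.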

How does the AI technology work?

To dig deeper into how computer-vision-based hand gesture recognition works, we should consider the three layers of a hand-interactive system: detection, tracking, and recognition. Following these steps, many breakthrough open-source projects and platforms for recognizing hand gestures have been successfully developed. The primary step in gesture recognition is to successfully detect hands and then segment the corresponding images. Detection extracts the visual data produced by the hand within the camera’s view. The Hand Gesture Recognition Using Computer Vision paper notes that several features influence the accuracy and performance of image segmentation, such as color, shape, and motion. With detailed consideration of these factors, the accuracy of hand localization can be greatly improved. After the hand is located, tracking monitors the segmented hand regions frame by frame, which helps the system better understand the hand’s movements. This tracking mechanism also ensures that each movement is captured, making data analysis and interpretation more accurate. The last step in this process is recognition. The goal of hand gesture recognition is to interpret the semantics that the hand’s location, posture, or gesture conveys. The logic of this last step is to group the extracted data in order to find patterns. Through training, the algorithm can find a match and determine which gesture has just been performed, so the system can carry out the corresponding operation and action once the gesture is identified. However, research shows that uncertainties remain in hand gesture recognition techniques due to the complexity of parsing, or segmenting, the continuous signal into constituent elements.
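The three layers above can be sketched in heavily simplified form. In the toy Python example below, the function names, the dictionary-based “frames”, and the direction-based matching rule are all my own illustrative assumptions, standing in for real camera frames, segmentation, and trained models:

```python
# Illustrative detection -> tracking -> recognition pipeline with toy data.

def detect_hand(frame):
    """Detection: segment the hand region from a frame.
    Here a 'frame' is just a dict with a precomputed hand position."""
    return frame.get("hand")  # None if no hand is visible

def track(frames):
    """Tracking: collect the segmented hand position frame by frame."""
    trajectory = []
    for frame in frames:
        hand = detect_hand(frame)
        if hand is not None:
            trajectory.append(hand)
    return trajectory

def recognize(trajectory):
    """Recognition: match the motion pattern to a gesture label."""
    if len(trajectory) < 2:
        return "none"
    dx = trajectory[-1][0] - trajectory[0][0]
    dy = trajectory[-1][1] - trajectory[0][1]
    if abs(dx) > abs(dy):
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"

frames = [{"hand": (0, 0)}, {"hand": (5, 1)}, {"hand": (10, 2)}]
print(recognize(track(frames)))  # -> swipe_right
```

In a real system, detection would be a segmentation model over pixels and recognition a trained classifier, but the layered structure is the same.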

The explorations of hand gesture recognition technology in the field of VR

Luckily, the future of gesture recognition technology has arrived, and there are many existing real-world applications and cases proving that hand gesture recognition technology does create more immersive VR experiences.

Recently, researchers and engineers from Facebook Reality Labs and Oculus have rolled out native hand tracking to the Quest, their standalone virtual reality headset. Users just need to update their software to fully experience this new feature on the platform. Oculus notes that our hands play an important role in how we interact with the world, from gesturing and communicating with others to picking up and moving objects. The articulated hand-tracking system on the Quest allows users who are not familiar with game or electronic controllers to enter the virtual world without barriers. The system does not use active depth-sensing technology or any additional equipment (such as instrumented gloves). Instead, the new approach uses the Quest’s four cameras in conjunction with new techniques in deep learning and model-based tracking. Because of this invention, users no longer need to pair controllers, charge them, or learn how to use their buttons. It brings people’s hands into the VR world and provides a more natural way to interact with the platform.

To briefly summarize how hand tracking works on the Oculus platform: deep neural networks accurately predict the location of a person’s hands as well as landmarks, such as the joints of the hands. The system then reconstructs a 26-degree-of-freedom pose of the hands and fingers from the collected landmarks. Developers can use the resulting 3D models, which contain the configuration and surface geometry of the hand, to build new interaction mechanics in their apps.
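To give a feel for what turning landmarks into pose information involves, here is a minimal sketch that computes the bend angle at a single finger joint from three 3D landmark positions. The coordinates and the single-joint calculation are my own simplified assumptions; the actual Oculus pipeline fits a full 26-degree-of-freedom hand model:

```python
import math

# Toy illustration: the bend angle at a finger joint, computed from three
# tracked 3D landmarks (e.g. knuckle, middle joint, fingertip).

def joint_angle(a, b, c):
    """Angle in degrees at landmark b, formed by segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

# A straight finger: three collinear landmarks give roughly 180 degrees.
print(joint_angle((0, 0, 0), (0, 1, 0), (0, 2, 0)))  # -> 180.0
```

Repeating this kind of calculation across all joints, plus estimating wrist position and orientation, is roughly where the 26 degrees of freedom come from.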

Shifting our attention to Google: in June, they also released a new approach to hand perception. The approach they implemented provides high-fidelity hand and finger tracking, employing machine learning to infer 21 3D keypoints of a hand from just a single frame. They hope the real-time performance of the hand-tracking approach they released can push boundaries and encourage more researchers to create more use cases and stimulate new applications.
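As a sketch of how an application might consume those 21 keypoints, the example below detects a pinch by thresholding the distance between the thumb tip and the index fingertip. The landmark indices (4 for thumb tip, 8 for index fingertip) follow MediaPipe’s published hand-landmark layout; the threshold and the sample coordinates are invented for illustration:

```python
import math

# Toy use of 21 hand keypoints: pinch detection by thumb/index distance.
# Indices 4 (thumb tip) and 8 (index fingertip) follow MediaPipe's layout.

def is_pinch(landmarks, threshold=0.05):
    """landmarks: list of 21 (x, y, z) tuples in normalized coordinates."""
    tx, ty, tz = landmarks[4]
    ix, iy, iz = landmarks[8]
    dist = math.sqrt((tx - ix) ** 2 + (ty - iy) ** 2 + (tz - iz) ** 2)
    return dist < threshold

# Fake hand where the thumb and index fingertips nearly touch.
hand = [(0.0, 0.0, 0.0)] * 21
hand[4] = (0.50, 0.50, 0.0)
hand[8] = (0.51, 0.50, 0.0)
print(is_pinch(hand))  # -> True
```

A real application would feed the model’s per-frame keypoint output into checks like this to drive selection or grabbing.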

(For more information about the specific models, see https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html)

resource from Google AI Blog

Possible issues and future direction

Even though many new models for hand tracking and recognition have been released, the application of hand tracking in the field of VR can still be improved, given the open issues surrounding this state-of-the-art technique. As a new user with no prior experience of hand gesture control in the virtual world, I might wonder what the learning curve of this technique looks like. When developers can recognize many hand gestures and implement them by associating each with specific actions and operations, the learning curve for users to align each gesture with its action may rise steeply. Li mentions: “although gesture interaction simplifies the interactive input method, there is no standardized operation specification.” In other words, gestures and tasks do not have a one-to-one correspondence, and it is hard for engineers to build a consistent operation platform given the variety and diversity of hand gestures. Unlike interaction on a mobile phone, people still need to spend considerable time building common conventions in this field.

Also, when users interact in the VR world with an electronic controller, the operation is precise and point-to-point. It is harder to achieve that accuracy with hand gesture interaction. Cabral conducted several experiments testing a 2D computer-vision-based gesture recognition system under several different scenarios. The experiments show that the time to complete simple pointing tasks is considerably slower than with a mouse, and that even short periods of use cause fatigue (Cabral, Morimoto, & Zuffo, 2005). Moreover, hand gesture interaction is affected by factors such as the recognition method and the interaction device, and the recognition process varies, so the user experience and fluency of hand gesture interaction cannot be fully guaranteed.

As Oculus has announced, there is not yet much support on the VR platform, since hand tracking has only just launched. They are still in the process of releasing developer toolkits for further VR app and game development.

It is true that some usability issues still need to be adjusted and balanced. Nevertheless, the implementation and development of hand gesture recognition techniques can definitely unlock a more expressive and immersive experience for virtual reality and reduce the current friction of physical devices.

Citations:

  1. Chakraborty, B. K., Sarma, D., Bhuyan, M. K., & MacDorman, K. F. (2017). Review of constraints on vision-based gesture recognition for human–computer interaction. IET Computer Vision, 12(1), 3–15.
  2. Li, Y., Huang, J., Tian, F., Wang, H.-A., & Dai, G.-Z. (2019). Gesture interaction in virtual reality. Virtual Reality & Intelligent Hardware, 1(1), 84–112.
  3. Samantaray, A., Nayak, S. K., & Mishra, A. K. (2013). Hand gesture recognition using computer vision. International Journal of Scientific & Engineering Research, 4(6), 1602–1608.
  4. Cabral, M. C., Morimoto, C. H., & Zuffo, M. K. (2005, October). On the usability of gesture interfaces in virtual reality environments. In Proceedings of the 2005 Latin American conference on Human-computer interaction (pp. 100–108). ACM.
  5. Using deep neural networks for accurate hand-tracking on Oculus Quest. (n.d.). Retrieved from https://ai.facebook.com/blog/hand-tracking-deep-neural-networks.
  6. On-Device, Real-Time Hand Tracking with MediaPipe. (2019, August 19). Retrieved from https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html.
