Summary of Eye-Hand HCI Research in the Past 10 Years for Apple’s visionOS

Eye-Hand Interaction in visionOS

V2XR
Aug 27, 2023

Apple Vision Pro demonstrates a “semi-new” interaction method: the gaze point 👀 serves as the cue that targets an interaction, while simple pinch 🤏🏻 and drag ✊🏻 gestures trigger the commands. Eye tracking and hand tracking have appeared in other consumer head-mounted displays such as HoloLens 2, PSVR 2, and Meta Quest Pro; Apple’s “incremental innovation” is to combine the two, achieving higher tracking accuracy and a wider recognition range through its powerful onboard compute.

HoloLens 2
Meta Quest’s direct & indirect hand interaction
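
On visionOS itself, this model reaches developers almost invisibly: the system resolves gaze to a view and delivers the pinch as ordinary indirect input, so standard SwiftUI gesture code is all an app needs. A minimal sketch (the view and its layout are illustrative, not from Apple’s demos):

```swift
import SwiftUI

// Gaze + pinch arrives as ordinary indirect input on visionOS:
// looking at a view targets it, and pinching commits the gesture.
struct PinchTargetView: View {
    @State private var offset: CGSize = .zero

    var body: some View {
        RoundedRectangle(cornerRadius: 16)
            .fill(.blue)
            .frame(width: 120, height: 120)
            .hoverEffect()          // subtle highlight while gaze rests on the view
            .offset(offset)
            .onTapGesture {         // look at the view, then pinch
                print("selected via gaze + pinch")
            }
            .gesture(
                DragGesture()       // pinch, hold, and move the hand to drag
                    .onChanged { offset = $0.translation }
            )
    }
}
```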

Gaze and gesture are only two of the modalities explored in VR/AR human-computer interaction research, alongside voice, haptics, and even taste and smell. Ken Pfeuffer of Aarhus University in Denmark has pursued eye-hand symbiotic HCI for nearly a decade, starting in 2014 with eye-hand interaction on touch tablets and evolving toward eye-hand collaboration in VR/AR. His Gaze + Pinch research began in 2017. Below is a summary of his 10 papers and 1 PhD thesis.

Eye-hand symbiotic interaction research over the past decade.

To systematically categorize interaction styles, the author borrowed the concept of Instrumental Interaction from HCI theory. Its core idea is that interaction design should focus on helping users accomplish tasks rather than merely polishing the user experience: designers should first understand the user’s task, then design a series of interconnected steps that turn the system into an effective tool.

This theory categorizes interaction styles by their degree of indirection in time and space, defined as the offset or distance that separates the user’s action from the task in each dimension. For example (a code sketch follows the list):

  • Mouse interaction with computers has spatial indirection (the hand acts on a 2D desk while the pointer acts on a 2D screen) and temporal indirection (the pointer must first travel to the target).
  • Touch screens are direct in both space and time: the finger lands on the target itself, with no intermediate step.
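
As a rough illustration, these two axes can be captured in a tiny data model. The classifications below follow the examples above plus the Vision Pro discussion later in this article; the type and names are hypothetical:

```swift
// Hypothetical model of Instrumental Interaction's two axes of indirection.
struct Indirection {
    let spatial: Bool    // is the acting body part offset from the target?
    let temporal: Bool   // is an extra acquisition step needed before acting?
}

let styles: [String: Indirection] = [
    "mouse":        .init(spatial: true,  temporal: true),   // desk vs screen; move pointer first
    "touchscreen":  .init(spatial: false, temporal: false),  // finger lands on the target itself
    "VR raycast":   .init(spatial: true,  temporal: true),   // aim the ray, then click
    "gaze + pinch": .init(spatial: true,  temporal: false),  // eyes are already there; pinch acts at once
]

for (name, i) in styles.sorted(by: { $0.key < $1.key }) {
    print("\(name): spatial indirection = \(i.spatial), temporal indirection = \(i.temporal)")
}
```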

Direct interaction is not always better than indirect interaction. Research shows that a mouse can outperform touch for simple pointing on tablets, because screen and icon sizes are small relative to the finger. The expanded target areas and snap-to-target effect in iPadOS’s mouse support improve precision over bare fingers.

iPadOS
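
The two pointer aids just mentioned, expanded targets and snapping, are simple to state in code. A hedged sketch of the idea (not Apple’s implementation; the names and the expansion radius are invented):

```swift
// Hypothetical pointer-resolution pass: every target gets an expanded hit
// area, and the effective cursor snaps to the centre of the nearest target
// whose expanded area contains the raw pointer position.
struct Target { let center: SIMD2<Double>; let radius: Double }

func distance(_ a: SIMD2<Double>, _ b: SIMD2<Double>) -> Double {
    let d = a - b
    return (d.x * d.x + d.y * d.y).squareRoot()
}

func resolvePointer(_ raw: SIMD2<Double>, targets: [Target],
                    expansion: Double = 12) -> SIMD2<Double> {
    let candidates = targets
        .map { (target: $0, d: distance(raw, $0.center)) }
        .filter { $0.d <= $0.target.radius + expansion }
    // Snap to the nearest qualifying target, else keep the raw position.
    return candidates.min(by: { $0.d < $1.d })?.target.center ?? raw
}
```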

Back to eye-hand interaction: the author categorizes the styles by their directness in space and time:

Apple Vision Pro’s eye-hand interaction is direct in time, unlike common VR raycasting, which requires first aiming the hand at the target. The hands, however, remain spatially offset from the targets.

In direct interaction, hands/eyes map 1:1 to targets; in indirect interaction, the mapping can be N:N.
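
The difference can be made concrete by sketching the two trigger paths (all types and names here are hypothetical). The hit test is the same; what differs is that the hand ray must first be steered onto the target, while the pinch fires on whatever the gaze ray already intersects:

```swift
struct Ray { var origin: SIMD3<Double>; var direction: SIMD3<Double> }

protocol Selectable {
    func isHit(by ray: Ray) -> Bool
}

// Raycast selection: temporally indirect, because the hand must first be
// aimed at the target before the click can confirm.
func onHandClick(handRay: Ray, scene: [Selectable]) -> Selectable? {
    scene.first { $0.isHit(by: handRay) }
}

// Gaze + pinch: temporally direct, because the eyes usually land on the
// target before the hand moves; the hand stays spatially offset and can
// rest anywhere comfortable while the pinch commits the selection.
func onPinch(gazeRay: Ray, scene: [Selectable]) -> Selectable? {
    scene.first { $0.isHit(by: gazeRay) }
}
```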

The 2014 research had already explored Gaze + Touch interaction on touch tablets, a design much like Apple’s, except that the hands were tracked by the touch screen rather than by head-worn cameras.
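
The core trick in that work is decoupling the touch point from the target: the finger can land anywhere on the tablet, and at touch-down the manipulation is redirected to whatever the eyes are fixating. A hedged sketch of that redirection (class and method names are invented, not from the paper):

```swift
import CoreGraphics

// "Gaze selects, touch manipulates": the touch position is ignored for
// targeting; relative finger motion drives the gazed-at object instead.
final class GazeTouchController {
    var gazePoint: CGPoint = .zero       // latest sample from the eye tracker
    private var manipulated: CGPoint?    // target acquired at touch-down

    func touchDown(at fingerPoint: CGPoint) {
        manipulated = gazePoint          // acquire the fixated target, wherever the finger lands
    }

    func touchMoved(by delta: CGVector) {
        guard var p = manipulated else { return }
        p.x += delta.dx                  // relative motion, applied remotely
        p.y += delta.dy
        manipulated = p
    }

    func touchUp() { manipulated = nil }
}
```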

Later research combined the stylus with gestures: pinch to zoom and pan, stylus for precision work. This addresses two interaction issues that matter for productivity: switching easily between objects, and separating zoom/pan from object manipulation.
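
A sketch of that division of labour (all names hypothetical): pen input always draws, while a pinch from the free hand always navigates, so the two never compete for a mode switch:

```swift
import CoreGraphics

// Stylus and free-hand gestures get disjoint roles, so zoom/pan can never
// be confused with object manipulation or drawing.
enum CanvasEvent {
    case pen(CGPoint)                // stylus sample in canvas coordinates
    case pinch(scale: CGFloat)       // pinch from the non-dominant hand
}

final class Canvas {
    private(set) var zoom: CGFloat = 1
    private(set) var stroke: [CGPoint] = []

    func handle(_ event: CanvasEvent) {
        switch event {
        case .pen(let point):
            stroke.append(point)     // precision work stays with the stylus
        case .pinch(let scale):
            zoom *= scale            // navigation stays with the free hand
        }
    }
}
```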

Apple Vision Pro has so far demonstrated only simple pinch-based drawing, which seems unsuitable for complex work. The single-finger typing seen in the demos likewise hints at limited hand-tracking fidelity.

Summary

This article summarizes the eye-hand symbiotic HCI research that Apple Vision Pro builds on. While not a complete innovation, it may be the optimal balance of intuitive efficiency and user experience for certain VR/AR tasks. The Apple headset greatly expands intuitive access to information beyond the confinement of 2D computer, tablet, and phone screens, elevating the digital world from 2D to 3D, humanity’s most familiar and natural domain. However, current limitations in optics, displays, battery technology, and so on prevent the same physical interaction experience from being replicated once the digital world gains a dimension. Still, finding the sweet spot between technical constraints and user experience is what Apple excels at.

So from Apple’s perspective, what is XR?

Standing at the intersection of technology and the humanities, it does not push humanity toward a virtual “metaverse”; instead, it pulls our digitally absorbed minds, together with the digital world itself, back into reality.
