History of Eyes and Hands for Computer Control

Ken Pfeuffer
16 min read · Mar 18, 2024


Descartes’ 1644 woodcut in “Principles of Philosophy” depicts his vision theory. It illustrates the seamless interplay of the eyes and hands, a fundamental principle that affords a powerful class of computer interfaces.

Apple’s Vision Pro provides the first operating system fully designed for our eyes and hands, and is perhaps the first consumer device of this kind. It could be a game-changer for XR, much like multi-touch on the first iPhone transformed the industry.

In 1644, René Descartes, a seminal figure in the emergence of modern science, depicted eye-hand coordination as a fundamental bridge between human consciousness and the external world, hinting at its potential for transformative applications. Fast-forward to 1981, when the computer was starting to gain traction, and the first scientific paper on a user interface (UI) that exploits the unity of the eyes and hands was published.

Since then, many eye-hand techniques have appeared in the Human-Computer Interaction (HCI) literature. But how far ahead was the science, how innovative is Apple’s UI, and what can we learn for the future? See for yourself!

Here’s a rundown of research on interacting with the computer using both eyes and hands.

Quick navigation: Eyes & Hands for PC · Eyes & Hands for Large Displays · Eyes & Hands for Touchscreens · Eyes & Hands for Virtual Reality

Eyes & Hands for PC

1981: Gaze-orchestrated dynamic windows (Bolt, SIGGRAPH’81)

Display: Large projection
Input: Gaze + joystick
Summary: The system showed a set of running TV episodes on a big projection. Eye-tracking glasses detected which episode the user was looking at, and the others were muted. If the user kept focusing on one, the system zoomed in on that episode. Explicit zoom in/out commands could also be given with a joystick.

In 1981, “Gaze-orchestrated dynamic windows” demonstrated eye-based selection and hand input by joystick.

1987: An Evaluation of an Eye Tracker as a Device for Computer Input (Ware, Mikaelian; CHI’87)

Display: Monitor
Input: Gaze + keyboard button
Summary: The user looks at a target on the screen and then presses a physical button to select it. This basic mechanism was evaluated and found to be faster, but less precise, than hands-only control.

1990: What you look at is what you get: eye movement-based interaction techniques (Jacob, CHI’90)

Display: PC monitor
Input: Gaze + mouse and keyboard
Summary: Integrated gaze and mouse/keyboard techniques for GUIs: looking and clicking for button selection, moving the mouse to drag objects, and using gaze to open and browse menus. Coined the “Midas Touch” problem: “Ideally, the interface should act on the user’s eye input when he wants it to and let him just look around when that’s what he wants, but the two cases are impossible to distinguish”.

The “What you look at is what you get” work by Jacob explored gaze+mouse interaction for buttons, menus, and object dragging in 1990.
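
To make the Midas Touch problem concrete, here is a minimal sketch (not Jacob’s original code) of the standard workaround: gaze alone never triggers anything, and an explicit hand action such as a key press acts on whatever the user is looking at. The GazeSample type and widget layout below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GazeSample:
    x: float
    y: float

def widget_under_gaze(sample, widgets):
    """Return the widget currently under the gaze point, if any."""
    for w in widgets:
        if w["x0"] <= sample.x <= w["x1"] and w["y0"] <= sample.y <= w["y1"]:
            return w
    return None

def on_key_press(sample, widgets):
    """Gaze alone never triggers an action; the key press acts on the looked-at widget."""
    target = widget_under_gaze(sample, widgets)
    if target is not None:
        print(f"selected {target['name']}")

widgets = [{"name": "OK", "x0": 0, "y0": 0, "x1": 100, "y1": 40}]
on_key_press(GazeSample(50, 20), widgets)  # selection happens only on the explicit press
```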

1999: Manual and gaze input cascaded (MAGIC) pointing (Zhai, Morimoto, Ihde, CHI’99)

Display: PC monitor
Input: Gaze + mouse
Summary: Here, the mouse cursor is warped to the gaze point, after which the user continues with the mouse as usual. This represents a first, more subtle way of using gaze: to enhance pointing, while the rest is done by the hands alone. The underlying idea is that gaze is used sparingly, to avoid overloading the eyes with motor-control tasks.

A depiction of the MAGIC selection algorithm, where the mouse cursor is warped from its previous position to the gaze position to reduce cursor-dragging effort. Picture from CHI’99
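
As a rough illustration of the MAGIC idea, the sketch below warps the cursor to the gaze point only when the user starts moving the mouse and the cursor lies far from where they look; the threshold value and function names are assumptions, not taken from the paper.

```python
import math

WARP_THRESHOLD_PX = 120  # assumed value, not from the paper

def update_cursor(cursor, mouse_delta, gaze, mouse_just_started_moving):
    """Warp to gaze on mouse-motion onset if the cursor is far away, then apply mouse deltas."""
    cx, cy = cursor
    gx, gy = gaze
    if mouse_just_started_moving and math.hypot(cx - gx, cy - gy) > WARP_THRESHOLD_PX:
        cx, cy = gx, gy  # warp the cursor to where the user is looking
    dx, dy = mouse_delta
    return (cx + dx, cy + dy)

# Cursor sits at the origin, user looks at (800, 600) and nudges the mouse:
print(update_cursor((0, 0), (5, 2), (800, 600), mouse_just_started_moving=True))  # (805, 602)
```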

2000: Intelligent Gaze-Added Interfaces (Salvucci, Anderson; CHI’00)

Display: PC monitor
Input: Gaze + keyboard
Summary: The GUI’s standard mouse and keyboard interface has been enhanced with a ‘gaze button’, enabling users to perform mouse actions like clicking and dragging using their gaze as the pointer. This integration offers users the flexibility to utilize both the traditional UI controls and the added gaze-based control mechanism simultaneously.

Salvucci and Anderson propose a system where a special gaze button on the keyboard allows users to interact with their eyes, in addition to the default mouse control. Gaze pointing targets are highlighted in yellow.

2005: EyeWindows: Evaluation of Eye-Controlled Zooming Windows for Focus Selection (Fono, Vertegaal; CHI’05)

Display: PC monitor
Input: Gaze + keyboard
Summary: The paper proposes using gaze to switch between different windows of a GUI. By directing their gaze at a window and pressing a key, users bring it into focus. Once chosen, the window’s size adjusts automatically to a preset scale.

In EyeWindows, users can rapidly select a window by gaze and a button press. Picture from CHI’05

2007: EyePoint: practical pointing and selection using gaze and keyboard (Kumar, Paepcke, Winograd; CHI’07)

Display: PC monitor
Input: Gaze + keyboard
Summary: Pointing by gaze is inaccurate. The authors propose the “look-press-look-release” technique for enhanced precision: after the user looks at an area and presses a key, the area is magnified. The user then looks at the desired object in the magnified area and releases the key to finalize the selection.

The EyePoint technique uses gaze and keyboard input, where a magnified zoom area provides precise button selections.
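
A hedged sketch of the look-press-look-release flow is shown below; the class and handler names are made up for illustration and this is not the authors’ implementation.

```python
class EyePointSketch:
    """Look-press-look-release: press magnifies the gazed-at area, release selects within it."""

    def __init__(self):
        self.magnified_region = None

    def on_key_down(self, gaze):
        # First look + press: magnify the region around the current gaze point.
        self.magnified_region = gaze
        print("magnify region around", gaze)

    def on_key_up(self, gaze):
        # Second look + release: the gaze point inside the magnified view gives the final selection.
        if self.magnified_region is not None:
            print("select target at", gaze, "inside region around", self.magnified_region)
            self.magnified_region = None

ep = EyePointSketch()
ep.on_key_down((400, 300))  # look at an area, press the key
ep.on_key_up((410, 305))    # look at the target in the zoomed view, release
```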

2008: Improving the accuracy of gaze input for interaction (Kumar, Klingner, Puranik, Winograd, Paepcke; ETRA’08)

Display: PC monitor
Input: Gaze + keypress
Summary: Investigated several methods to improve the accuracy of gaze-and-keypress input. For example, synchronisation errors arise when users press the button before they have looked at the target, or after they have already looked away from it; these can be mitigated through smoothing algorithms.

Kumar et al. study error mitigation of gaze and keypress inputs, such as early (a) and late (d) triggers.
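
One way such synchronisation errors can be mitigated, sketched below under my own assumptions rather than the paper’s exact algorithm, is to keep a short history of gaze samples and, on a key press, use the sample closest in time to the press instead of the instantaneous one.

```python
from collections import deque

HISTORY_MS = 300  # assumed window length

class GazeHistory:
    """Keep recent gaze samples so a key press can be matched to the nearest fixation in time."""

    def __init__(self):
        self.samples = deque()  # entries: (timestamp_ms, x, y)

    def add(self, t, x, y):
        self.samples.append((t, x, y))
        while self.samples and t - self.samples[0][0] > HISTORY_MS:
            self.samples.popleft()

    def sample_for_press(self, press_time):
        # Use the stored sample whose timestamp is nearest to the key press,
        # tolerating presses that come slightly before or after the look.
        return min(self.samples, key=lambda s: abs(s[0] - press_time), default=None)

h = GazeHistory()
for t, x, y in [(0, 100, 100), (100, 102, 99), (200, 400, 380)]:
    h.add(t, x, y)
print(h.sample_for_press(press_time=120))  # -> (100, 102, 99), not the later saccade
```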

2009: The MAGIC Touch: Combining MAGIC-Pointing with a Touch-Sensitive Mouse (Drewes, Schmidt; INTERACT’09)

Display: PC monitor
Input: Gaze + touch-sensitive mouse
Summary: Extends the MAGIC principle with a new way of warping the mouse cursor to the gaze. When users touch the mouse button, before pressing it, the cursor automatically warps to the gaze position. This resolves the original MAGIC’s ambiguity about when the cursor should warp.

The MAGIC Touch technique uses a touch-sensitive button. When touched, the mouse cursor warps to the gaze position; from there, normal mouse actions take over. Picture from paper

Eyes & Hands for Large Displays

2010: 3D User Interface Combining Gaze and Hand Gestures for Large-Scale Display (Yoo, Han, Choi, Yi, Suh, Park, Kim; CHI EA’10)

Display: Large display
Input: Gaze + hand gesture
Summary: The paper proposes control of a large screen by using eye-tracking and hand gestures. The user can change a cursor position by gaze, and zoom by a push-pull gesture of the hands. Several combinations of two hands are used to provide a command set to manipulate an image gallery.

This work considered gaze pointing with hand tracking, where different hand movements support a set of commands. Picture from CHI EA’10

2011: Designing gaze-supported multimodal interactions for the exploration of large image collections (Stellmach, Stober, Nurnberger, Dachselt; NGCA’11)

Display: PC Monitor
Input: Gaze + keyboard + touch
Summary: The work combines gaze support with a fisheye lens, a keyboard, and a tilt-sensitive mobile multitouch device for interaction with a remote display. The user can look at an area, and activate a fisheye lens through keypress or a touch sliding gesture.

Users can look at an image area and use touch gestures to modulate a fisheye lens. Picture from NGCA’11

2012: Look & touch: gaze-supported target acquisition (Stellmach, Dachselt; CHI’12)

Display: PC monitor
Input: Gaze + smartphone touch
Summary: This work considered mobile-device touch gestures with gaze for interacting with content on a remote display. Users utilize eye movements to position a cursor while touch-dragging gestures enable precise adjustments to the cursor’s location. This functionality is facilitated by holding a handheld touchscreen device in one hand, where thumb touches control the mouse cursor on the screen.

2012: Gaze and gesture based object manipulation in virtual worlds (Slambekova, Bailey, Geigel; VRST ‘12)

Display: PC monitor
Input: Gaze + hand gesture
Summary: Investigates gaze pointing to a virtual element on-screen and issuing 3D grabbing gestures in mid-air for manipulation. Includes two-handed 3D manipulations for dragging, rotation and scaling of objects.

2013: Still looking: investigating seamless gaze-supported selection, positioning, and manipulation of distant targets (Stellmach, Dachselt; CHI’13)

Display: Distant large projector
Input: Gaze + touch
Summary: Expands on the Look & Touch technique by supporting dragging, scaling, and rotation. The user can look at a target and use touches to refine the cursor position. During drag & drop, the user can also use their gaze to move a selected object to a new position.

2013: Interacting with Objects in the Environment by Gaze and Hand Gestures (Hales, Mardanbeigi, Rozado; ECEM’13)

Display: PC monitor, IoT objects in environment
Input: Gaze + hand gesture
Summary: Presents techniques to operate physical appliances in the environment. The technique involves 1–5 finger poses and their directional motion, which were mapped to window control on a screen and robot movements in the environment.

2013: Eye Pull, Eye Push: Moving Objects between Large Screens and Personal Devices with Gaze and Touch (Turner, Alexander, Bulling, Schmidt, Gellersen; INTERACT’13)

Display: Large display + handheld tablet
Input: Gaze + touch (tablet)
Summary: Presents techniques to move objects between a large display and a personal device. The user can look at an object and swipe down to move it to their personal device. A similar touch gesture returns the object to the public display.

Example use cases for Eye Pull, Eye Push: users can pull an object by simply looking at it and performing a swipe gesture. Picture from INTERACT’13

2014: Cross-device gaze-supported point-to-point content transfer (Turner, Alexander, Bulling, Gellersen; ETRA’14)

Display: Large display + handheld tablet
Input: Gaze + touch or mouse (tablet)
Summary: Investigates the use of gaze to transfer content between large and handheld displays. The user points their eyes at the object, performs a touch (or mouse) action, then looks down at the tablet and performs another touch (or mouse) action to complete the transfer.

Setup to study how gaze allows users to transfer a target from a large (1) to a small (2) display, supported by touch (left) and mouse (right) actions. Picture from ETRA’14

2014: GaFinC: Gaze and Finger Control interface for 3D model manipulation in CAD application (Song, Choa, Baek, Lee, Bang; CAD’14)

Display: PC monitor
Input: Gaze + hand gesture
Summary: This work focused on a 3D CAD application with hand-gesture controls and employed gaze to specify the zooming location.

The GaFinC system proposed a gesture interface that included gaze-based zooming. Image from CAD’14

Eyes & Hands for Touchscreens

2014: Gaze-touch: combining gaze with multi-touch for interaction on the same surface (Pfeuffer, Alexander, Chong, Gellersen, UIST’14)

Display: Touchscreen
Input: Gaze + Touch
Summary: This work also explored gaze and multi-touch gestures together, following the division of labour “gaze selects, touch manipulates” for simplicity. This offers users the familiar gesture set, simply redirected to the object defined by gaze. The simplicity afforded a variety of applications, as demonstrated below, and supports neat transitions between direct and indirect gestures.
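
The division of labour can be summarised in a few lines of sketch code: the touch-down event is redirected to whatever object the gaze rests on, and subsequent touch movement manipulates that object indirectly. The object and event structures below are illustrative assumptions, not the paper’s code.

```python
def on_touch_down(gaze_point, scene_objects):
    """The touch is redirected to whatever object the gaze currently rests on."""
    for obj in scene_objects:
        if obj["contains"](gaze_point):
            return obj
    return None

def on_touch_drag(selected, drag_delta):
    """Subsequent touch movement manipulates the gaze-selected object indirectly."""
    if selected is not None:
        selected["x"] += drag_delta[0]
        selected["y"] += drag_delta[1]

circle = {"x": 10, "y": 10,
          "contains": lambda p: abs(p[0] - 10) < 20 and abs(p[1] - 10) < 20}
target = on_touch_down(gaze_point=(12, 8), scene_objects=[circle])
on_touch_drag(target, drag_delta=(30, 5))
print(circle["x"], circle["y"])  # 40 15
```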

2015: An Empirical Investigation of Gaze Selection in Mid-Air Gestural 3D Manipulation (Velloso, Turner, Alexander, Bulling, Gellersen; INTERACT’15)

Display: PC monitor
Input: Gaze + hand gesture
Summary: The technique involves users looking at an object and using a pinch gesture to move it. The study showed that acquisition time was significantly reduced compared to hands-only methods.

Velloso et al. studied a system where gaze and pinch manipulate 3D objects on a laptop. Image from INTERACT’15

2015: Gaze+RST: Integrating Gaze and Multitouch for Remote Rotate-Scale-Translate Tasks (Turner, Alexander, Bulling, Gellersen; CHI’15)

Display: Large projector
Input: Gaze + touch (tablet)
Summary: Given gaze to select an object, how can gaze be used to then move an object? The work investigates several variations between touch and gaze based object translation, and provides insights into integrality and separability of the multimodal inputs.

A user interacting with eye-gaze and multi-touch gestures on a wall-sized projection screen. Image from CHI’15

2015: Gaze+Gesture: Expressive, Precise and Targeted Free-Space Interactions (Chatterjee, Xiao, Harrison; ICMI’15)

Display: PC monitor
Input: Gaze + hand gestures
Summary: The paper proposes to combine various hand gestures with gaze pointing for desktop PCs. For precise mouse pointing, a MAGIC-like approach is employed where the eyes point, and dragging gestures refine the cursor point. As well, several other gestures are supported to extend the command set of the user.

2015: The Costs and Benefits of Combining Gaze and Hand Gestures for Remote Interaction (Zhang, Stellmach, Sellen, Blake; INTERACT’15)

Display: Large display
Input: Gaze + hand gesture (grab and flick)
Summary: Presents a study of an interaction concept for a photo-sorting task. The user gazes at an element on-screen and uses hand grab and flick gestures to manipulate the object. The study showed benefits over a gesture-only approach, as users preferred the method, but it also led to a few errors when transitioning from gaze-hover to hand-selection.

Setup and interaction concept of Zhang et al., INTERACT’15

2015: Gaze-Shifting: Direct-Indirect Input with Pen and Touch Modulated by Gaze (Pfeuffer, Alexander, Chong, Zhang, Gellersen; UIST’15)

Display: Pen + touch display
Input: Gaze + pen + touch
Summary: Investigated the coupling between gaze and a stylus for design tasks, such as allowing rapid access to peripheral menus, and also explored tri-modal interactions with gaze, pen, and touch. The central focus is the Gaze-Shifting concept, which allows users to seamlessly shift between direct and indirect modes of touch (and pen) gestures.

2016: Three-Point Interaction: Combining Bi-Manual Direct Touch with Gaze (Simeone, Bulling, Alexander, Gellersen; AVI’16)

Display: Large touchscreen
Input: Gaze + touch
Summary: The technique allows interaction with advanced tasks where users manipulate 3 points in parallel. The points are each defined by the gaze, the right hand’s index finger, and the left hand’s index finger. For example, this can be useful for specifying a cube in all three dimensions.

Three-Point Interaction interprets both hands’ touches and the gaze point for parallel 3 point manipulations. Image from AVI’16.

2016: Partially-indirect Bimanual Input with Gaze, Pen, and Touch for Pan, Zoom, and Ink Interaction (Pfeuffer, Alexander, Gellersen; CHI’16)

Display: Large touchscreen
Input: Gaze + touch + pen
Summary: This study explores asymmetric bimanual interaction using pen and touch, focusing on utilizing gaze to select the pivot for indirect touch-based pinch-to-zoom with the non-dominant hand. Combining direct pen and indirect touch avoids physical interference when both hands manipulate directly. Additionally, continuous gaze at the point of interest enhances zoom accuracy, reducing the need for frequent panning actions.

2016: Gaze and Touch Interaction on Tablets (Pfeuffer, Gellersen; UIST’16)

Display: Tablet
Input: Gaze + touch
Summary: This work explored gaze and touch interaction on a mobile touchscreen device in the form of a tablet. It focuses on a more specific but realistic context: holding the device with one hand. Users mainly interact through brief thumb presses of the gripping hand, and can reach the whole screen through their gaze. The work also proposes Cursor-Shift, a method that allows switching between direct manipulation and precise cursor control, e.g., for browser applications.

2016: GazeArchers: playing with individual and shared attention in a two-player look&shoot tabletop game (Pfeuffer, Alexander, Gellersen; MUM’16)

Display: Large touchscreen
Input: Gaze + Touch
Summary: A tower defense game is presented that uses eye and touch input for look & shoot gameplay. Various enemies run across the tabletop screen and have to be defeated by a look and a tap. Some game units have shields that turn toward the looking user, requiring teamwork to complete the game.

Eyes & Hands for Virtual Reality

2017: Gaze + Pinch Interaction in Virtual Reality (Pfeuffer, Mayer, Mardanbegi, Gellersen; SUI’17)

Display: 3D / VR
Input: Gaze + pinch gesture
Summary: This work proposed the gaze + pinch paradigm for VR, where the eyes select targets and the hands perform pinch gestures. At its core lies a division of labour, gaze selects and hands manipulate, which simplifies interaction through familiar gestures that can be triggered over distance. The work also offers a blueprint for an operating system with buttons, menus, and applications. Numerous example prototypes demonstrate the power of the interaction paradigm, which resembles Apple Vision Pro’s UI.
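
A minimal sketch of the gaze + pinch loop, with hypothetical types and a placeholder raycast, might look as follows; it is meant to convey the division of labour rather than reproduce the SUI’17 system.

```python
def gaze_raycast(gaze_origin, gaze_dir, objects):
    """Placeholder: return the first object hit by the gaze ray."""
    return objects[0] if objects else None

class GazePinchSketch:
    def __init__(self):
        self.held = None

    def on_pinch_start(self, gaze_origin, gaze_dir, objects):
        # The eyes select: whatever the gaze ray hits becomes the manipulated object.
        self.held = gaze_raycast(gaze_origin, gaze_dir, objects)

    def on_hand_move(self, delta):
        # The hands manipulate: hand motion is applied to the gaze-selected object.
        if self.held is not None:
            self.held["position"] = [p + d for p, d in zip(self.held["position"], delta)]

    def on_pinch_end(self):
        self.held = None

cube = {"position": [0.0, 1.0, -2.0]}
gp = GazePinchSketch()
gp.on_pinch_start(gaze_origin=(0, 1.6, 0), gaze_dir=(0, -0.1, -1), objects=[cube])
gp.on_hand_move((0.2, 0.0, 0.0))
gp.on_pinch_end()
print(cube["position"])  # [0.2, 1.0, -2.0]
```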

2017: Understanding the impact of multimodal interaction using gaze informed mid-air gesture control in 3D virtual objects manipulation (Deng, Jiang, Chang, Guo, Zhang; IJHCS’17)

Display: PC monitor
Input: Gaze + hand gesture
Summary: The paper tackles the misperception problem when manipulating gaze-selected objects by hand gesture, i.e., the hand can drift out of range when performing indirect dragging gestures. The authors study the problem with several variations and provide insights into their usability.

This paper investigates the problem of an offset between hand and gaze point of indirect gestures.

2018: Pinpointing: Precise Head- and Eye-Based Target Selection for Augmented Reality (Kyto, Ens, Piumsomboon, Lee, Billinghurst; CHI’18)

Display: 3D / AR
Input: Gaze + controller, gesture
Summary: This work proposes a set of precision selection techniques for systems that require precise eye input. Users can select a button by looking at it, and then using a freehand pinch or controller movement to refine the initial gaze cursor.
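
The refinement step can be illustrated with a simple sketch: the cursor starts at the gaze point, and subsequent hand or controller motion nudges it with a reduced gain. The gain value and function name are assumptions for illustration, not from the paper.

```python
REFINE_GAIN = 0.25  # assumed gain: hand motion is scaled down for precision

def refine_cursor(gaze_point, hand_delta):
    """Coarse cursor from gaze, fine adjustment from subsequent hand/controller motion."""
    gx, gy = gaze_point
    dx, dy = hand_delta
    return (gx + REFINE_GAIN * dx, gy + REFINE_GAIN * dy)

print(refine_cursor(gaze_point=(500, 300), hand_delta=(40, -8)))  # (510.0, 298.0)
```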

2019: GazeButton: enhancing buttons with eye gaze interactions (Rivu, Abdrabou, Mayer, Pfeuffer, Alt; COGAIN@ETRA’19)

Display: Touchscreen
Input: Gaze + touch
Summary: Studied how general (touch) buttons can be extended with eye tracking. A button’s input can be enhanced from 2 states (touch on the button vs. touch somewhere else) to 4, since each touch state can further distinguish whether the user is looking at the button or elsewhere. The concepts are demonstrated with a GazeButton for a keyboard that enables various functionality.
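
The 2-to-4-state idea can be expressed as a tiny truth table over “where is the touch” and “where is the gaze”; the sketch below is an illustration with assumed names, not the paper’s code.

```python
def classify(touch_on_button: bool, gaze_on_button: bool) -> str:
    """Four distinguishable input states instead of two."""
    if touch_on_button and gaze_on_button:
        return "touch on button while looking at it"
    if touch_on_button:
        return "touch on button while looking elsewhere"
    if gaze_on_button:
        return "touch elsewhere while looking at the button"
    return "touch elsewhere while looking elsewhere"

for touch in (True, False):
    for gaze in (True, False):
        print(classify(touch, gaze))
```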

2019: EyeSeeThrough: Unifying Tool Selection and Application in Virtual Environments (Mardanbegi, Mayer, Pfeuffer, Jalaliniya, Gellersen, Perzl; IEEE VR’19)

Display: 3D / VR
Input: Gaze + controller
Summary: EyeSeeThrough is a user interface technique that uses the eyes and controller input in concert to apply modes to objects in the 3D scene, such as assigning a new color to a graphical object. By aligning the hand menu with the target object in the user’s line of sight, EyeSeeThrough lets users intuitively apply modes by “seeing through” the mode toward the target, without a sequential two-step process of mode selection and application. A button click or dwell time confirms the action.

2020: Empirical Evaluation of Gaze-enhanced Menus in Virtual Reality (Pfeuffer, Mecke, Delgado Rodriguez, Hassib, Maier, Alt; VRST’20)

Display: 3D / VR
Input: Gaze + controller
Summary: This work explores gaze in conjunction with a controller for compound drawing and menu tasks. The focus is on asymmetric bimanual interaction: the dominant hand draws with direct manipulation, and the non-dominant hand holds a color palette menu. Gaze is used to implicitly change menu modes when looking at them. In the study, direct manipulation with the controller led to the highest performance, and the gaze techniques were comparable to pointer-based menu selection, with less physical effort.

2021: Gaze-Supported 3D Object Manipulation in Virtual Reality (Yu, Lu, Shi, Liang, Dingler, Velloso, Goncalves; CHI’21)

Display: 3D / VR
Input: Gaze + controller
Summary: Here, the user can look at objects and use the controller to select and drag the object in space. Several translation methods are studied that mix eye and controller inputs to rapidly move objects. No significant differences were revealed in user performance between controller and controller+gaze.

2021: Pinch, Click, or Dwell: Comparing Different Selection Techniques for Eye-Gaze-Based Pointing in Virtual Reality (Mutasim, Batmaz, Stuerzlinger; ETRA’21)

Display: 3D / VR
Input: Gaze + button, pinch gesture
Summary: The work evaluated three ways to confirm a target that has been fixated by the eyes. Results showed that both pinch and button-click confirmation led to higher performance than an eyes-only dwell-time approach.

The Pinch, Click, or Dwell work investigates different confirmation methods for gaze pointing in VR. Picture from ETRA’21.

2022: Gaze-Hand Alignment: Combining Eye Gaze and Mid-Air Pointing for Interacting with Menus in Augmented Reality (Lystbæk, Rosenberg, Pfeuffer, Grønbæk, Gellersen; ETRA’22)

Display: 3D / AR
Input: Gaze + hand alignment, gaze + pinch
Summary: This work proposes a new eye-hand selection mechanism where users spatially align their eye-gaze ray with a ray defined by the hand. With Gaze & Finger, users move their index finger into the line of sight. With Gaze & Hand, a hand-ray cursor is aligned with the gaze position. An evaluation showed that both methods are on par with gaze + pinch, and all multimodal techniques were superior to a hands-only approach (point & pinch).
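
At its core, alignment can be tested by comparing the angular difference between the gaze ray and the hand ray; the sketch below assumes an illustrative threshold and is not the authors’ implementation.

```python
import math

ALIGN_THRESHOLD_DEG = 3.0  # assumed value, not from the paper

def angle_between(v1, v2):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def is_aligned(gaze_dir, hand_dir):
    return angle_between(gaze_dir, hand_dir) < ALIGN_THRESHOLD_DEG

print(is_aligned((0.0, 0.0, -1.0), (0.02, 0.01, -1.0)))  # True: the rays nearly coincide
```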

2022: Exploring Gaze for Assisting Freehand Selection-based Text Entry in AR (Lystbæk, Rosenberg, Pfeuffer, Grønbæk, Gellersen; ETRA’22)

Display: 3D / AR
Input: Gaze + hand alignment
Summary: This work investigated the gaze-hand alignment concept for text entry, a task with a much higher selection frequency. The original techniques were extended to make them work for text entry, using specific timers and constraints. A study compared the technique to direct tapping on the keyboard; results show that it leads to significantly less physical movement without compromising performance.

2022: Look & Turn: One-handed and Expressive Menu Interaction by Gaze and Arm Turns in VR (Reiter, Pfeuffer, Esteves, Mittermeier, Alt; COGAIN@ETRA’22)

Display: 3D / VR
Input: Gaze + pinch, hand turn motion
Summary: This project centers on enhancing expressive drawing tools alongside a sophisticated hand menu interface. Users can seamlessly control the hand menu using both gaze and hand gestures. For precise input, users fix their gaze on a target, pinch, and smoothly adjust the desired value by rotating their arm until reaching the exact setting.

2023: A Fitts’ Law Study of Gaze-Hand Alignment for Selection in 3D User Interfaces (Wagner, Lystbæk, Manakhov, Grønbæk, Pfeuffer, Gellersen, CHI’23)

Display: 3D / AR
Input: Gaze + Pinch, Gaze + Finger, Gaze + Handray
Summary: This work provides empirical evidence on gaze + pinch interaction in contrast to other multimodal and hands-only techniques in 3D. Results show that all eye-hand techniques outperformed the manual techniques (hand-ray, headcrusher).

2023: PalmGazer: Unimanual Eye-hand Menus in Augmented Reality (Pfeuffer, Obernolte, Dietz, Mäkelä, Sidenmark, Manakhov, Pakanen, Alt; SUI’23)

Display: 3D / AR
Input: Gaze + pinch gesture
Summary: Users can interact with a handheld menu using gaze and pinch. The focus is on mobile interaction, particularly one-handed, akin to holding and interacting with a smartphone. Users initiate the UI with a palm-up gesture and subsequently issue commands through gaze and pinch gestures with the same hand.

2023: Compass+Ring: A Multimodal Menu to Improve Interaction Performance and Comfortability in One-handed Scenarios (Chen, Guo, Feng, Chen, Liu; ISMAR’23)

Display: 3D / AR
Input: Gaze + voice + hand rotation
Summary: Compass+Ring is a pie-menu design based on gaze, voice, and hand rotation for efficient and low-effort interaction. The technique involves three steps: 1) the user looks at one of the pie menus, which moves closer; 2) they confirm their choice by saying “select”; and 3) they rotate their wrist to specify a mode.

The Compass+Ring menu allows users to summon and select menu items through multimodal gaze, speech, and hand rotation inputs.

2023: GazeHand: A Gaze-Driven Virtual Hand Interface (Jeong, Kim, Yang, Lee, Kim; ACCESS’23)

Display: 3D / AR
Input: Gaze + pinch gesture
Summary: This work presents a technique where gaze indicates a distant target, to which the virtual hands are then warped. The idea is to add precision in far space, in contrast to gaze-only selection of far targets. The study found a head-pointing variant to be more stable.

2023: Exploring gaze-assisted and hand-based region selection in augmented reality (Shi, Wei, Qin, Hui, Liang; Proc. HCI’23)

Display: 3D / AR
Input: Gaze + finger, pinch
Summary: This work inspects region selection, a two-step approach to define two corners to specify a rectangular area. In this specific evaluation, the unimodal techniques outperformed the multimodal ones.

For the task of region selection, this paper studied four different techniques based on eyes & hands.

The end is the beginning

The quest for eye-hand interaction went on for decades across PCs, touch devices, and XR environments, culminating in its inclusion in the Vision Pro. But it may be just the beginning of a new era in human interaction, where our eyes take a major role in many of the actions that have so far been reserved for our hands.

Would you like to know more? Check out my recent articles on Design Principles and Issues for Gaze + Pinch, the PalmGazer work, and the Eye-Hand Symbiosis project.

Note: I do not claim any rights to the videos and images in this article. All videos are linked from YouTube, and all images have been taken from the respective research papers.
