
A Large-scale Gaze Tracking Dataset, Method, and Application for Robust 3D Gaze Estimation
Towards Robust Gaze Estimation
Gaze direction is an important cue for guiding conversations and other social interactions. It helps reveal people's intents, desires, states of mind, interest, and attention in social settings.
The ability to accurately estimate human gaze direction also has many applications in assistive technologies for people with physical impairments, human-computer interaction, augmented reality, virtual reality, consumer behavior research, visual attention analysis, and more.
In the past, gaze estimation required specialized hardware. Thanks to deep learning-based techniques, however, significant steps toward fully unconstrained gaze estimation have been made: recent methods are robust to variations in gaze, head pose, and image quality. Challenges remain, chief among them collecting gaze data that is both highly accurate and highly varied.
Gaze360: Physically Unconstrained Gaze Estimation in the Wild
In this newly published paper, researchers present an approach to the gaze estimation task that narrows the existing performance gap. First, they describe a method for efficiently collecting annotated 3D gaze data in arbitrary environments. They then use this method to build one of the largest 3D gaze datasets to date, which they call Gaze360: a large-scale gaze-tracking dataset and method for robust 3D gaze estimation in unconstrained images.

It comprises video of 238 subjects in indoor and outdoor environments, with labeled 3D gaze across a wide range of head poses and distances. According to the authors, it is the largest publicly available dataset of its kind in both subject count and variety.
Gaze Estimation Models
The researchers also train a variety of 3D gaze estimation models on the dataset, ultimately selecting a model that takes a multi-frame input and employs a pinball regression loss for error quantile regression, yielding an estimate of gaze uncertainty alongside the gaze direction itself.
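To make the uncertainty mechanism concrete, here is a minimal sketch of a pinball (quantile) loss in PyTorch. The function name, the symmetric band of width sigma around the predicted gaze, and the 0.9 quantile are illustrative assumptions for this sketch, not the authors' released code.

```python
import torch

def pinball_loss(y_true, gaze_pred, sigma, tau=0.9):
    """Pinball (quantile) loss over a symmetric uncertainty band (sketch).

    gaze_pred: predicted gaze angles, e.g. shape (batch, 2) for yaw/pitch
    sigma:     predicted half-width of the error quantile band
    tau:       upper quantile to regress (the lower quantile is 1 - tau)
    """
    # Residuals of the upper (gaze + sigma) and lower (gaze - sigma)
    # quantile estimates.
    err_hi = y_true - (gaze_pred + sigma)
    err_lo = y_true - (gaze_pred - sigma)

    # Standard pinball loss: under- and over-shooting a quantile are
    # weighted asymmetrically by tau and (tau - 1), respectively.
    loss_hi = torch.max(tau * err_hi, (tau - 1.0) * err_hi)
    loss_lo = torch.max((1.0 - tau) * err_lo, -tau * err_lo)

    return (loss_hi + loss_lo).mean()
```

Because the loss penalizes over- and under-estimates of each quantile asymmetrically, the network is encouraged to learn sigma as a calibrated error band: a larger sigma signals lower confidence in the predicted gaze.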
Gaze360 was evaluated against conventional datasets through a cross-dataset model performance comparison. The researchers also show how the model can be applied to real-world use cases, including estimating a customer's focus of attention in a supermarket.
Why Does this Matter?
This work demonstrates a methodology for collecting annotated gaze data at scale and uses it to generate a large, diverse dataset suitable for deep learning of 3D gaze from images and videos. Its value is shown through a cross-dataset performance comparison against three existing 3D gaze datasets, as well as through application to unconstrained, unseen imagery from YouTube videos.
Both quantitative and qualitative evaluation results show that the proposed approach achieves higher accuracy than state-of-the-art methods and is robust to variation in gaze, head pose, and image quality.
The researchers hope that applying the model and dataset across a range of fields will help practitioners better leverage gaze as a cue for vision-based understanding of human behavior. I think the work goes a long way toward improving the existing gaze estimation literature and models, and it has significant potential to enable robust 3D gaze estimation.
Dataset can be accessed here: http://gaze360.csail.mit.edu/
Read more: Gaze360: Physically Unconstrained Gaze Estimation in the Wild

