An Intro to Fovea: Using Eye Tracking for Interactive Content Delivery

Sharath Koorathota
Published in Fovea · Jul 3, 2020 · 5 min read

This is part 1 of a series about Fovea, a new system that uses eye tracking for interactive content delivery.

Contributed by Patrick Adelman and Sharath Koorathota — July 03, 2020

We’re sharing updates as we head toward our launch. In this post, we present an overview of our platform, describe the technology behind it, and discuss our current and future strategy.

Intro

Fovea is a cloud-hosted platform that enables content creators to upload content, deliver it interactively to viewers, and analyze aggregated viewing patterns.

Content creators could be employers conducting remote training, news outlets putting together curated content, or retailers displaying inventory in-store. Fovea can be used by anyone who creates content and cares about how viewers see and interact with it.

Platform: How It Works

  1. Content is uploaded to the cloud by the user. This could be video, training documents, graphics, photos, or any other form of media. State-of-the-art processing algorithms are used to analyze the content and prepare it for viewing.
  2. Viewers enable the standard webcam on their laptop or stand in front of a display with a simple camera. No special software is needed; Fovea runs entirely in a web browser. If desired, Fovea can verify which user is currently watching, which is useful for remote training or online education.
  3. While the viewer watches the content, Fovea tracks and processes their eye movements in real time, in connection with the analysis from step 1. This allows Fovea to understand exactly what is drawing a viewer’s attention and serve content based on that.
  4. Analytics are generated for the user to determine which content was effective and how to better design viewer experiences going forward.
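
To make the pipeline concrete, here is a minimal sketch of the kind of data that might flow between these steps. The type names and fields are our illustration, not Fovea’s actual schema.

```typescript
// Hypothetical data shapes for the four steps above (illustrative only).

// Step 1: metadata extracted from uploaded content.
interface ContentRegion {
  label: string;            // e.g. "face", "stop sign", "paragraph of text"
  box: { x: number; y: number; width: number; height: number };
  startMs?: number;         // for video: when the region appears...
  endMs?: number;           // ...and when it disappears
}

// Step 3: one gaze sample streamed while the viewer watches.
interface GazeSample {
  x: number;                // gaze position on screen, in pixels
  y: number;
  timestampMs: number;
}

// Step 4: aggregated analytics returned to the content creator.
interface RegionReport {
  label: string;
  totalDwellMs: number;     // how long viewers looked at this region
  viewerCount: number;
}
```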

Pre-Processed Content Data

We use machine learning algorithms such as object detection and natural language processing to identify the objects, faces, text, and other metadata contained within content. When we apply our eye-tracking algorithms during viewing, this information is key to determining what on the screen a viewer is paying attention to.

At a high level, we’re able to parse text, faces, object information, and much more.

Facial detection, object recognition, and natural language processing are all used to extract metadata from content.
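
The post doesn’t name the specific models involved, but as an illustration, here is how a generic open-source detector (TensorFlow.js’s coco-ssd) could turn a frame into labeled candidate areas of interest, using the ContentRegion shape sketched above. coco-ssd is purely a stand-in here.

```typescript
import '@tensorflow/tfjs';
import * as cocoSsd from '@tensorflow-models/coco-ssd';

// Run a generic object detector over a frame and keep labeled bounding
// boxes as candidate areas of interest (AOIs). The model is a stand-in;
// the models Fovea actually uses are not specified in the post.
async function extractRegions(frame: HTMLImageElement): Promise<ContentRegion[]> {
  const model = await cocoSsd.load();
  const predictions = await model.detect(frame);
  return predictions
    .filter((p) => p.score > 0.5) // drop low-confidence detections
    .map((p) => ({
      label: p.class,
      box: { x: p.bbox[0], y: p.bbox[1], width: p.bbox[2], height: p.bbox[3] },
    }));
}
```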

Eye Tracking Data

To collect eye movement data, the viewer must enable their webcam and keep their face visible to the camera. Because the webcam is enabled, requiring a great deal of trust on the part of the viewer, we are vigilant about privacy. The only information we collect from the viewer’s webcam is the X and Y coordinates of where their gaze lands on the screen, as demonstrated below.

The software that processes face data to determine the viewer’s gaze location runs locally (i.e., client-side). The raw images are never transmitted to Fovea’s servers.

The red dot represents the X and Y position of a viewer’s gaze.
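
As a rough sketch of what client-side collection can look like, here is how the open-source WebGazer.js library exposes webcam-based gaze estimates in the browser. We use it purely as a stand-in, since the post doesn’t say what Fovea’s tracker is built on, and the `/api/gaze` endpoint is hypothetical. Note that only predicted coordinates are shipped; camera frames never leave the page.

```typescript
// Assumes the WebGazer.js script is loaded on the page
// (https://webgazer.cs.brown.edu), exposing a global `webgazer` object.
declare const webgazer: any;

// Buffer of gaze points to send; raw webcam frames stay in the browser.
const samples: GazeSample[] = [];

webgazer
  .setGazeListener((data: { x: number; y: number } | null, elapsedMs: number) => {
    if (data === null) return; // no face found in this frame
    samples.push({ x: data.x, y: data.y, timestampMs: elapsedMs });
  })
  .begin(); // prompts for webcam permission and starts tracking

// Periodically ship only the (x, y, t) tuples to the server.
setInterval(() => {
  if (samples.length === 0) return;
  navigator.sendBeacon('/api/gaze', JSON.stringify(samples.splice(0)));
}, 2000);
```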

Data is sampled at a variable rate depending on the viewer’s computer setup, camera, and lighting conditions, typically in the range of 5–10 Hz. While research-grade eye trackers can sample at 100+ Hz, the benefits of tracking eye movements with no special hardware far outweigh the cost of dealing with less data. Even with this sparser signal, we’re able to make fairly accurate assessments of where a viewer is looking, and the algorithms can only improve over time.
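
At 5–10 Hz the signal is sparse and noisy, so some smoothing helps before mapping gaze onto on-screen regions. A short moving average is one common approach; the sketch below is our illustration, not necessarily what Fovea does.

```typescript
// Smooth a gaze trace with a moving average over the last `windowSize`
// samples. Cheap and effective at low sampling rates, at the cost of
// a slight lag behind fast eye movements.
function smoothGaze(samples: GazeSample[], windowSize = 3): GazeSample[] {
  return samples.map((s, i) => {
    const slice = samples.slice(Math.max(0, i - windowSize + 1), i + 1);
    return {
      x: slice.reduce((sum, p) => sum + p.x, 0) / slice.length,
      y: slice.reduce((sum, p) => sum + p.y, 0) / slice.length,
      timestampMs: s.timestampMs,
    };
  });
}
```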

How Are We Testing?

We’re currently testing our platform prototype using Amazon’s Mechanical Turk. In addition to being economical, MTurk lets us test our platform with completely anonymous users under a variety of conditions.

We are currently testing the use case of online safety training: the viewer watches a sample OSHA safety video about workplace signage and answers quiz questions throughout, which are queued up based on which areas of interest they did not focus on. At the end, they are tested on their knowledge of the whole video, and afterwards they fill out a survey about the experience.
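
To show the idea behind gaze-driven quizzing, here is a sketch of the kind of check involved: accumulate dwell time per area of interest, then queue a question for any region the viewer mostly ignored. The threshold and function names are ours, for illustration only.

```typescript
// Accumulate per-region dwell time from a gaze trace, treating each
// sample as covering a fixed interval (e.g. 100-200 ms at 5-10 Hz).
function dwellByRegion(
  samples: GazeSample[],
  regions: ContentRegion[],
  sampleIntervalMs: number,
): Map<string, number> {
  const dwell = new Map<string, number>();
  for (const s of samples) {
    for (const r of regions) {
      const { x, y, width, height } = r.box;
      if (s.x >= x && s.x <= x + width && s.y >= y && s.y <= y + height) {
        dwell.set(r.label, (dwell.get(r.label) ?? 0) + sampleIntervalMs);
      }
    }
  }
  return dwell;
}

// Queue a quiz question for every AOI whose dwell time is below a threshold.
function regionsToQuiz(
  regions: ContentRegion[],
  dwell: Map<string, number>,
  minDwellMs = 1500,
): string[] {
  return regions
    .filter((r) => (dwell.get(r.label) ?? 0) < minDwellMs)
    .map((r) => r.label);
}
```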

Our first round of testing, done with a sample of 10–15 participants, is designed to calibrate future testing. We are seeking feedback on how easy our user-facing experience is to use, which technical problems occur during eye movement tracking, and how well participants actually do on the quizzes and final test given the new eye-tracking feature.

So far, the core functionality of our prototype has been stable. You can read about our results here.

What Are We Learning?

Eye tracking is technically challenging to implement, and the analysis of eye movement patterns is a vast, deep field to navigate. Doing it with a standard webcam, with a viewer thousands of miles away in a non-lab setting, makes the challenges even tougher.

Perhaps most importantly at this point, we feel confident that webcam-based eye tracking can be used, and improved, in a commercially viable product. In the relatively short time we’ve been working on Fovea, we have already made improvements to the eye-tracking algorithms, with a noticeable gain in accuracy relative to other groups.

Major developments in machine learning, computer vision, and eye tracking that are sure to happen in the coming years will only help to improve Fovea.

We will continue to post our progress and studies here on Medium and on our website (https://www.foveainsights.com).
