Week 2: Eye Tracking and Prior Knowledge

Alper Özöner
AIN311 Fall 2022 Projects
3 min read · Nov 27, 2022

by Alper Özöner and Ali Utku Aydın

This week, we made progress on model selection as well as on the representation of the eye-tracking data we obtained from our EyeTribe eye tracker. First off, we finally solved the problem of reading and formatting the raw data. Amusingly, the raw .txt file turned out to be JSON all along, so we count the input formatting as done. This means we can now focus on more interesting things. Let’s go!
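Since the raw log is just JSON, parsing it is a one-liner per record. Here is a minimal sketch; the field names (`"x"`, `"y"`) are assumptions for illustration, as the actual keys depend on the EyeTribe output format:

```python
import json

# Hypothetical raw lines from the tracker's .txt log; the real field
# names depend on the EyeTribe output, so "x"/"y" are assumptions.
raw_lines = [
    '{"x": 412.5, "y": 138.2}',
    '{"x": 455.1, "y": 139.0}',
    '{"x": 501.7, "y": 141.3}',
]

# Each line is plain JSON, so parsing is just json.loads per line.
points = [json.loads(line) for line in raw_lines]
xs = [p["x"] for p in points]
ys = [p["y"] for p in points]
print(len(points))  # 3
```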

The first thing we implemented after solving the above problem was automatic histogram creation for each row of the text input. For clarity, you can see a lorem ipsum sample text below:

The general layout of a sample paragraph that the participants will see during the actual experiment. Each paragraph will have roughly 130 words and consist of 14 rows of text.

Because the placement of the text on the screen is fixed, we can quickly determine which data points belong to which of the 14 rows. For that, we used a simple kMeans algorithm with n_clusters = 14 along the y-axis only. Before we look at the classification outcome, let’s see what the raw data tells us. Creating a simple scatter plot, we obtain the following graph:
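Clustering on the y-coordinate alone can be sketched like this with scikit-learn. The row spacing and coordinates below are made up for the example; the real values come from the tracker and the on-screen layout:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic gaze points: 14 text rows with fixed vertical spacing
# (assumed geometry; real coordinates come from the eye tracker).
row_centers = np.arange(14) * 30 + 100            # y-centre of each row
y = np.concatenate([c + rng.normal(0, 4, 25) for c in row_centers])
x = rng.uniform(0, 800, y.size)                   # x is ignored below

# Cluster on the y-coordinate only, as in the post.
km = KMeans(n_clusters=14, n_init=10, random_state=0).fit(y.reshape(-1, 1))
labels = km.labels_
print(len(set(labels)))  # 14
```

Fitting on `y.reshape(-1, 1)` instead of the full (x, y) pairs is what keeps horizontal reading position from influencing the row assignment.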

This is the raw data of me (Alper) reading the above lorem ipsum text in Latin. I don’t know Latin, so it serves as a good baseline: I read the entire paragraph rather slowly and meticulously.

As for the data, there are about 350 foci (fixations of the eye while reading, as opposed to saccades, the rapid eye movements between fixations). The data only has x and y columns in it. The lines on the plot above let us see the flow of the reading, which, unsurprisingly, follows the rows from left to right. Note that this is the entire 45-second recording from the sensor; we did not exclude any part of the data. You can see this in the large cluster of points at the bottom left, which is from when I was looking at the button to stop recording. So the data is a little noisy, but it is good enough for a first test.

Here is my favorite part: the part where we get to see cool representations of the data. Lo and behold, the kMeans classification visualization I absolutely did not spend too much time on:
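That stop-button cluster could be filtered out with a simple bounding-box check on the text area. This is only a sketch; the screen coordinates and the button location are assumptions, not the actual experiment geometry:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic gaze trace: reading fixations inside the text area plus a
# trailing burst near a bottom-left "stop" button (coordinates assumed).
text_pts = np.column_stack([rng.uniform(100, 700, 330),
                            rng.uniform(100, 520, 330)])
button_pts = np.column_stack([rng.uniform(0, 80, 20),
                              rng.uniform(560, 600, 20)])
gaze = np.vstack([text_pts, button_pts])

# Keep only fixations inside the (assumed) text bounding box.
in_text = (gaze[:, 0] >= 90) & (gaze[:, 1] <= 540)
clean = gaze[in_text]
print(len(gaze), len(clean))  # 350 330
```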

Each point on the scatter is automatically mapped to one of the 14 rows. This way, we can automate the training process, since the model will take into account the reading patterns of participants for each distinct row.

Discussion of Possible Techniques with Mr. Erdem

Bag of Words vs. Histogram Representation

I had a chance to meet with Mr. Erdem in his office during the week, and he suggested that we use the bag-of-words representation to keep track of how many fixations there are for a given word. Although I agree with him that this representation is more helpful in atomizing the data (that is, it lets us track individual words instead of analyzing the reading flow of a row), it falls short when it comes to the time aspect of the input data. The current method, the one you see above, has the advantage of retaining the order of the foci, since the data is a time series ordered by readings, where lower indices naturally correspond to earlier readings in the data table.
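The trade-off is easy to see in code. A bag-of-words over fixations is just a counter keyed by word, which discards order entirely. The word labels below are hypothetical; in the real pipeline they would come from mapping each fixation to the known text layout:

```python
from collections import Counter

# Hypothetical sequence of fixated words, in reading order (assumed
# mapping from gaze points to words via the fixed text layout).
fixated_words = ["lorem", "ipsum", "ipsum", "dolor", "sit", "dolor", "dolor"]

# Bag-of-words: fixation counts per word. The temporal order is lost.
bow = Counter(fixated_words)
print(bow["dolor"], bow["ipsum"])  # 3 2

# The time-series view keeps order: index i is the i-th fixation,
# so regressions (re-reading a word) remain visible in the sequence.
print(fixated_words[3])  # dolor
```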

Let us know what you think about the possible techniques we have mentioned. Do you think we should keep the row-by-row representation or switch to bag of words instead? Let us know down in the comment section!
