Week 3 — Eye Tracking and Prior Knowledge

Alper Özöner
AIN311 Fall 2022 Projects
2 min read · Dec 23, 2022

by Alper Özöner and Ali Utku Aydın

It is almost the end of week 4, but better late than never. This week, we finally resolved last week's representation problem (bag of words vs. histogram). Using PyTesseract, an optical character recognition (OCR) tool for Python, we can now automatically extract the coordinates of the bounding boxes for any given screenshot:

Another sample text, taken from Wikipedia's article on the French Revolution. PyTesseract successfully extracted all the text and the corresponding bounding boxes, as seen above.
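For reference, the extraction step boils down to a single PyTesseract call. Below is a minimal sketch of it; the image file name is only a placeholder for one of our screenshots.

import pytesseract
from PIL import Image

# Placeholder path standing in for one of our screenshots
image = Image.open("screenshot.png")

# image_to_data returns, for every detected token, its text and bounding box
ocr = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)

for text, x, y, w, h in zip(ocr["text"], ocr["left"], ocr["top"], ocr["width"], ocr["height"]):
    if text.strip():  # skip empty tokens
        print(text, (x, y, w, h))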

If you remember from last week, we have the coordinates of the fixations as (x, y) tuples, so we wrote the following block of code to automatically store the fixations that fall inside a word's bounding box in a bag-of-words format:

insideCond = (eyeTrackdf["x"] >= x) & (eyeTrackdf["x"] < x + w) & \
             (eyeTrackdf["y"] >= y) & (eyeTrackdf["y"] < y + h)
boundFixations = eyeTrackdf[insideCond]
# Increment this word's count once for every fixation that lands inside its box
for _, fixation in boundFixations.iterrows():
    bag_of_words[str(text)] = bag_of_words.get(str(text), 0) + 1
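To show where the variables above come from, here is a minimal sketch of how the snippet could sit inside a loop over the OCR output; it assumes ocr is the dictionary returned by pytesseract.image_to_data and eyeTrackdf holds the fixation coordinates.

bag_of_words = {}
for i in range(len(ocr["text"])):
    text = ocr["text"][i].strip()
    if not text:  # skip empty OCR tokens
        continue
    # Bounding box of the current word
    x, y, w, h = ocr["left"][i], ocr["top"][i], ocr["width"][i], ocr["height"][i]
    insideCond = (eyeTrackdf["x"] >= x) & (eyeTrackdf["x"] < x + w) & \
                 (eyeTrackdf["y"] >= y) & (eyeTrackdf["y"] < y + h)
    # Count every fixation that falls inside this word's box
    bag_of_words[text] = bag_of_words.get(text, 0) + int(insideCond.sum())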

Because we will represent each experiment run as a single bag of words, we have also implemented a cosine similarity function to calculate the similarity between different bags of words. This will give the model a metric for quantifying the differences in our subjects' reading patterns, as each subject will have a unique bag of words depending on their prior knowledge of the topic at hand:

from math import sqrt

def magnitudeOfVector(values):
    # Euclidean norm of the count vector
    return sqrt(sum(v * v for v in values))

def cosineSimilarity(dictA, dictB):
    # Dot product over the words that both bags share
    intersection = dictA.keys() & dictB.keys()
    dot_product = sum(dictA[word] * dictB[word] for word in intersection)
    return dot_product / (magnitudeOfVector(dictA.values()) * magnitudeOfVector(dictB.values()))

As expected, the cosine similarity of two identical bags of words is 1.0:

bag_of_words_1 = bag_of_words.copy()
cosineSimilarity(bag_of_words, bag_of_words_1)
>> 1.0

After making small changes to the copy, the similarity score goes down:

del bag_of_words_1["of"]
del bag_of_words_1["which"]
del bag_of_words_1["general"]
bag_of_words_1["yeah"] = 10
cosineSimilarity(bag_of_words, bag_of_words_1)
>> 0.8414888911456696

All in all, this week was mostly about fixing the shortcomings of the previous week, but it should be said that we are more than excited to have laid the groundwork for finally starting our proof-of-concept experiments, possibly during the coming week. Stay tuned for more!
