Week 4 — Eye Tracking and Prior Knowledge

Alper Özöner
AIN311 Fall 2022 Projects
2 min read · Jan 8, 2023

Hello all! This week we put our bag-of-words representation to good use by combining it with a Naive Bayes classifier we hand-crafted specifically for our project.

Because our research question is “Is there a discernible relationship between prior knowledge and reading pattern?”, the Bayesian nature of the algorithm is a good fit: the algorithm attempts to classify the bag-of-words representation of each experiment (i.e., each reading of the sample text). In other words, it should be able to discern whether the subset of words that receives the most eye fixations differs between a reader who is already familiar with the material in the text and a reader who knows nothing about its topic.

Another reason we chose Naive Bayes is that our number of data points will be low during the initial phase: we plan to gather data from 20 people, corresponding to 60 data points in total. Accordingly, we plan to use three sample paragraphs (each about 100 words long) taken from Wikipedia. The texts we have chosen cover the French Revolution, the Moai statues, and the World Cup.

Our custom Naive Bayes implementation can be seen below:

import numpy as np

def NaiveBayes(bagOfexperiment, X_train, y_train):
    bagOfBags = createBagofBags(X_train, y_train)
    total_unique_words = getTotalUniqueWords(X_train)  # vocabulary size |V|
    outputDict = dict.fromkeys(bagOfBags.keys())
    N = len(X_train)  # total number of training experiments
    for category in bagOfBags.keys():
        categoryBag = bagOfBags[category]
        total_size_of_class = returnSizeofCategory(category, bagOfBags)  # count(c)
        N_c = returnNc(category, X_train)  # number of experiments labeled with this class
        score = np.log(N_c / N)  # start from the log prior, log P(c)
        for word in bagOfexperiment:
            word_count_in_class = categoryBag.get(word, 0)  # count(w, c)
            # Laplace-smoothed likelihood P(w|c)
            P_w_c = (word_count_in_class + 1) / (total_size_of_class + total_unique_words)
            # add the log-likelihood once per occurrence of the word
            score += int(bagOfexperiment[word]) * np.log(P_w_c)
        outputDict[category] = score
    # return the class with the highest posterior score
    return max(outputDict, key=outputDict.get)
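
For context, here is a minimal sketch of how the classifier is called. The helper functions (createBagofBags, getTotalUniqueWords, returnSizeofCategory, returnNc) are our own project-internal helpers, and the inputs below are invented stand-ins rather than our actual experiment data:

# Invented example inputs for illustration only (not our real experiment data).
# Each training "experiment" is a bag of words mapping word -> fixation count,
# and each label is the reader's self-reported prior knowledge level (1-5).
X_train = [
    {"revolution": 4, "france": 2, "monarchy": 1},
    {"revolution": 1, "france": 1, "king": 3},
]
y_train = [5, 1]

# Bag of words for a new, unlabeled reading of the sample text
bagOfexperiment = {"revolution": 3, "monarchy": 2}

predicted_level = NaiveBayes(bagOfexperiment, X_train, y_train)
print(predicted_level)  # the knowledge level with the highest posterior score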

What makes this implementation unique is the “bag of bags of words” representation we came up with. It is a nested Python dictionary in which the bag of words of each experiment (see Week 3 for reference) is stored under its respective class, namely the reader’s prior knowledge level, ranging from 1 to 5. We collect this label after each subject finishes reading by asking them to rate their knowledge of the topic of the text they have just read.
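
To make that structure concrete, here is a small illustrative sketch of what such a nested dictionary could look like; the words and counts are invented, and only three of the five levels are shown:

# Illustrative "bag of bags of words" (invented values):
# outer keys are prior knowledge levels,
# inner dicts map word -> total fixation count for that class
bagOfBags = {
    1: {"revolution": 12, "france": 9, "king": 7},        # low prior knowledge
    3: {"revolution": 8, "monarchy": 10, "estates": 4},
    5: {"estates": 15, "jacobins": 11, "revolution": 6},  # high prior knowledge
}

# Looking up count(w, c): total fixations on "revolution" among level-5 readers
print(bagOfBags[5]["revolution"])  # 6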

That’s it for now. See you next week, when we talk about the other ML models we will use in our project!

Edit: Weekly blog posts will be shared by Ali Utku Aydin from this point on. Here is his Week 5 post: Week 5 — Eye Tracking and Prior Knowledge.
