Week 5 — Eye Tracking and Prior Knowledge


by Alper Özöner and Ali Utku Aydın

This week we gathered data from fellow students on campus and preprocessed it for the Support Vector Machine (SVM) algorithm, the second algorithm in our project.

At first, we tried fitting an SVM model from the scikit-learn library, but it failed because our 1-D vector format was incompatible with it. The problem was that the vectors had different sizes for each experiment: since each vector was converted from a bag of words, words with 0 fixations were simply not represented in it. After realizing this issue, we quickly went to work.

We decided that we needed a fixed-size vector representation. The mismatch arose in situations where subject “A” had a fixation on a word, for example “revolution”, while subject “B” had no fixations on “revolution” at all. To solve this problem, we made sure that each 1-D vector covered the entire vocabulary of its respective sample text.
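To make the mismatch concrete, here is a small toy example (the words and fixation counts are made up):

# two subjects reading the same text (made-up fixation counts)
subject_a = {"revolution": 3, "the": 5}   # fixated on "revolution"
subject_b = {"the": 4}                    # never fixated on "revolution"

# naive vectors have different lengths, so they cannot be stacked for an SVM
print(list(subject_a.values()))  # [3, 5] -> length 2
print(list(subject_b.values()))  # [4]    -> length 1

# fixing the vocabulary gives every subject a vector of the same length
vocab = sorted(set(subject_a) | set(subject_b))      # ['revolution', 'the']
vec_a = [subject_a.get(word, 0) for word in vocab]   # [3, 5]
vec_b = [subject_b.get(word, 0) for word in vocab]   # [0, 4]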

#1
french_list = []
for bag_of_words in french_BoW_list:  # bag of words of each subject
    french_list += list(bag_of_words.keys())  # gather all words across the different experiments' bags of words
french_vocab_set = dict.fromkeys(set(french_list), 0)  # initialize a dictionary with every count set to 0

Afterwards, we rebuild the bag of words over this full vocabulary, so that words without any fixations still appear, with a count of 0.

#2
for bag_of_words in french_BoW_list:  # loop over each BoW for the vector representation
    for word in bag_of_words.keys():
        count = french_vocab_set[str(word)]
        count += 1
        french_vocab_set.update({str(word): count})

With the bag of words ready, we constructed a vectorization function for the SVM model.

#3
def convertBoWtoVector(bag_of_words):
    vector = []
    for count in bag_of_words.values():
        vector.append(count)  # append the count of each word to the vector
    return vector
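As a quick sanity check, the function simply flattens a full-vocabulary dictionary into a plain list of counts (the values here are hypothetical):

full_bow = {"revolution": 0, "the": 4, "guillotine": 2}  # hypothetical full-vocabulary counts
print(convertBoWtoVector(full_bow))  # [0, 4, 2]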

And now we are ready to vectorize the French dataset.

train_frenchNew = []
for idx, bag_of_words in enumerate(french_BoW_list):
    x = convertToAllDict(french_BoW_list, bag_of_words)  # this function is basically a merged version of the two code blocks above (#1, #2)
    x = sum(x)  # sum the per-word counts into a single scalar feature
    train_frenchNew.append([x, train_french[idx][1]])  # train_french[idx][1] is the subject's prior-knowledge rating out of 5
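After building train_frenchNew, we convert it into the pandas dataset used below. A minimal sketch of that conversion, where the rating column name 'rate' is just a placeholder (only the feature column 'X' is referenced by name later):

import pandas as pd

# 'X' holds the summed vector score, 'rate' the prior-knowledge rating (placeholder column name)
dataset = pd.DataFrame(train_frenchNew, columns=['X', 'rate'])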

Now we are ready to try this on SVMs. We should also mention that we only have a small amount of data, which is why we used Leave-One-Out Cross-Validation for our model:


from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn import preprocessing

cv = LeaveOneOut()  # cross-validation strategy
svm = SVC()
X = dataset.loc[:, dataset.columns == 'X']  # X is the vector score, y is the rating of each subject (0-5)
y = dataset.loc[:, dataset.columns != 'X']
label_encoder = preprocessing.LabelEncoder()
X_transformed = label_encoder.fit_transform(X)
X = X_transformed.reshape(-1, 1)

scores_svmacc = cross_val_score(svm, X, y, scoring='accuracy',
                                cv=cv, n_jobs=-1)
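With leave-one-out, each fold scores a single held-out subject (either 0 or 1), so the overall accuracy is the mean across folds:

print(scores_svmacc.mean())  # fraction of held-out subjects predicted correctly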

As we predicted, it gave a relatively low score (0.06), but that only gave us more motivation to get concrete results in the following weeks. We plan to use random forest and the TF-IDF algorithm next week. Stay tuned!
