Natural Language Processing: can a computer solve SAT analogy questions?
I use this space to write about mostly geospatial data and maps. Most of the time I just want to share some PostGIS or Python code snippets that I’ve used that I hope other people might find helpful. But now, it’s time for something completely different. My first foray into the world of Natural Language Processing!
I just watched the first lecture of this Stanford course on natural language processing. I was curious if I could apply what we learned about word vectors to solving SAT analogy questions. These questions disappeared from the SAT in 2005, but I remember learning how to solve them (and hating them). Could a computer learn too?
I searched for some sample questions, and found a source here. This is what a SAT analogy question looks like:
ENTICE: REPEL
A. GERMINATE : SPROUT
B. FLOURISH : FADE
C. OFFICIATE : PRESIDE
D. LUBRICATE : GREASE
E. IMPLORE : ENTREAT
The answer is B. You can approach this analogy question by thinking “entice is to repel as __ is to __”.
For my experiment, I used gensim, a Python library for topic modeling and vector similarity, and GloVe, a collection of word vectors trained at Stanford on giant corpora of Wikipedia, web-crawl, and Twitter text. The lecture had a whole bit on the multivariable calculus and derivatives used to minimize the cost function that produces the vectors (the word2vec algorithm). But I immediately jumped ahead and wondered, “can you solve SAT analogies with this?”
Here’s how I decided to approach the problem. For each question prompt, I cycle through the first word of each of the five option pairs and generate an analogy using gensim. These generated answers are open-ended, and sometimes no analogy is found at all. Of all the options, I choose the one where the generated analogy is most “similar” (again, as scored by the model) to the second word of that option pair.
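The analogy() helper in the loop below wraps the classic word-vector arithmetic: for “a is to b as c is to ?”, compute vec(b) − vec(a) + vec(c) and return the vocabulary word closest to that point by cosine similarity (in gensim this is model.most_similar(positive=[b, c], negative=[a])). A minimal sketch of that idea, with a toy 2-D vocabulary invented for illustration:

```python
import numpy as np

# Toy 2-D word vectors invented for illustration; real GloVe vectors
# have 50-300 dimensions and a vocabulary of hundreds of thousands of words.
vectors = {
    "entice":   np.array([ 1.0,  0.5]),
    "repel":    np.array([-1.0, -0.5]),
    "flourish": np.array([ 0.2,  1.0]),
    "fade":     np.array([-1.8,  0.1]),
    "sprout":   np.array([ 0.3,  1.1]),
    "grease":   np.array([ 0.5, -0.2]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(a, b, c):
    """a is to b as c is to ? -- the nearest word to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# entice : repel :: flourish : ?  (both are antonym pairs)
print(analogy("entice", "repel", "flourish"))  # fade
```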
guesses = []
for i, prompt in enumerate(prompts):
    best_answer = None
    current_score = 0
    for n, option in enumerate(options[5*i:(5*i)+5]):
        possible_answer = analogy(prompt[0], prompt[1], option[0])
        if possible_answer is None:
            continue  # no analogy found for this option
        # is it better than the current best answer?
        score = model.similarity(possible_answer, option[1])
        if score > current_score:
            best_answer = n
            current_score = score
    guesses.append(best_answer)
    print("{} is to {}, as {} is to {}".format(
        prompt[0], prompt[1],
        options[5*i:(5*i)+5][best_answer][0],
        options[5*i:(5*i)+5][best_answer][1]))

drip is to gush, as cry is to laugh
walk is to legs, as dress is to hem
enfranchise is to slavery, as equation is to mathematics
topaz is to yellow, as sapphire is to red
lumen is to brightness, as inches is to length
maceration is to liquid, as evaporation is to humidity
clumsy is to botch, as wicked is to insinuate
fugitive is to flee, as braggart is to boast
chronological is to time, as ordinal is to place
soot is to grimy, as rain is to sodden
morbid is to unfavorable, as reputable is to favorable
sullen is to brood, as lethargic is to cavort
author is to literate, as judge is to impartial
massive is to bulk, as gigantic is to size
entice is to repel, as implore is to entreat
humdrum is to bore, as heartrending is to move
hospitable is to courtesy, as infamous is to honor
reinforce is to stronger, as erode is to weaker
braggart is to modesty, as embezzler is to greed
The answers, taken all together, read almost like a poem. And hey, I got an accuracy of 47.4%! That’s way better than random guessing (20% with five options), even for an algorithm I wasn’t sure made the most sense. I also tried another method that compares the similarity of the two pairs taken as wholes using n_similarity, but the score was not nearly as good (~21%, barely better than chance).
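For reference, n_similarity in gensim scores two word sets by taking the cosine similarity of their mean vectors. A minimal numpy sketch of that scoring rule (toy vectors invented for illustration):

```python
import numpy as np

def n_similarity(words_a, words_b):
    # Cosine similarity between the mean vectors of two word sets,
    # mirroring gensim's KeyedVectors.n_similarity.
    a = np.mean(words_a, axis=0)
    b = np.mean(words_b, axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Both means point along (1, 1), so the two sets score as identical.
print(round(n_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]), 3))  # 1.0
```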
According to this wiki page from the Association for Computational Linguistics, “human voting” is the best-performing approach for solving these kinds of problems (accuracy of 81.5%). The best computer algorithms use corpus-based methods, but none has come close to human-voting accuracy. So I’m pretty happy with my solution, and if I have time I’ll think about how to improve the method (maybe using higher-dimensional word vectors, training on a corpus of SAT prep or vocabulary books, or exploring other similarity scoring methods).
I was also thinking about creating a visualization for these word vectors. For now, here’s a scatterplot of the word vectors from the analogy questions projected onto their first two principal components. It’s kind of hard to read, but it’s cool to see how the colors group together in the top right corner, and you can find different gems (sapphire, amethyst, amber, diamond…) nearby.
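The projection itself only takes a few lines: center the vectors and project onto the top two right singular vectors, which are the first two principal components. A sketch using random stand-in vectors, since the real plot needs the actual GloVe vectors for the analogy words:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 50))   # stand-in for 40 word vectors of dim 50

Xc = X - X.mean(axis=0)             # center each dimension
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T              # (40, 2) points for the scatterplot

print(coords.shape)  # (40, 2)
```

From there, coords[:, 0] and coords[:, 1] go straight into a scatterplot, with each point labeled by its word.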

Code here: https://github.com/michellemho/nlp/blob/master/Gensim%20word%20vectors%20SAT%20analogy.ipynb