Learning Common Sense

It’s hard for me to describe how frustrated I am by my inability to SOLVE the Winograd Schema Challenge. The “Test” is a list of multiple-choice questions that are super easy for us humans to solve, but nearly impossible for state-of-the-art computers to resolve. This question set is often proposed as a replacement for the more famous Turing Test, and for good reason. We want our computers to achieve true intelligence, not fake human behavior.

Here is an example from the test, which is named after this guy: https://en.wikipedia.org/wiki/Terry_Winograd

The town councilors refused to give the demonstrators a permit because they feared violence.
— Who are “they”? The town councilors or the demonstrators?

There is nothing in the WORDS of the sentence to tell us the answer. There is nothing in the STRUCTURE of the sentence that can give us the answer. The answer emerges only when we apply the magic ingredient called “Common Sense”. And that annoys me to the point that I find it hard to focus on other things. Allow me to explain.

I’ve been doing Natural Language Understanding for the past ~8 years. The collection of algorithms falls into the category of Artificial Narrow Intelligence, or “Weak AI”. The system we have built acts intelligently for a specific domain. It has a bunch of built-in “Business Logic” that acts as though the Bot has common sense, and for all practical purposes — it does. We have everything from wild heuristics that just work to state-of-the-art Deep Neural Networks that learn from the data with no feature engineering, but there is nothing in my arsenal that can solve an unseen Winograd.

Here is another example:

The trophy would not fit in the brown suitcase because it was too big
— What was too big? The trophy or the suitcase?

Why is the answer so obvious to us and so hard for computers to resolve?

An interesting characteristic of many of these challenges is that replacing a single word changes the answer to the question. Consider this variant:

The trophy would not fit in the brown suitcase because it was too small

Obviously the answer is now reversed despite the fact that the sentence structure looks the same and I replaced one simple adjective used to describe the size of objects with another.

BTW, this is a very simple sentence. Consider this variant:

Dan tried to shove his new Golf trophy into the old brown suitcase but it would not fit because it was too small

This version has many more words to confuse the natural language processing, yet when I read the sentence, I have no problem VISUALIZING the situation and answering the question. And therein lies the rub: my algorithms do not know how to visualize these mini-stories.

I have some thoughts on how to proceed, but caveat emptor — these have not been validated yet on any data.

The Trophy problem hinges on our understanding of the “fit in” relationship. We all know that this relationship describes an inner object and a container. The object will not fit if the container is too small or if the object itself is too big. Simple, right? (Check out RDF Schema as a knowledge representation data model.)
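To make that rule concrete, here is a deliberately naive sketch of it in Python. Everything here — the function name, the way the rule is encoded — is my own invention for illustration, not any real knowledge base or system:

```python
def resolve_it(inner_object, container, adjective):
    """Toy rule for the 'X would not fit in Y because it was too ADJ' pattern.

    Encodes the common-sense rule: the inner object fails to fit when
    it is too big, or when the container is too small.
    """
    if adjective == "big":
        return inner_object   # "it" was too big -> the inner object
    if adjective == "small":
        return container      # "it" was too small -> the container
    return None               # a sense this toy rule does not cover

# The single-word swap flips the answer, as in the trophy examples:
print(resolve_it("trophy", "suitcase", "big"))    # -> trophy
print(resolve_it("trophy", "suitcase", "small"))  # -> suitcase
```

Of course, this only “works” because I hand-coded exactly one relationship for exactly one sentence pattern — which is where the next idea comes in.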

Ooh — all we need is a big dictionary of objects and relationships! We will then apply this magic thingy to the sentence and be able to resolve it! Well, it turns out we already tried that once and the approach did not work (check out Cyc).

See, nothing in natural language is as simple as it first appears. Consider:

Dan shaved his mustache on the first day of college because he feared he would not fit in

Is college the container? Was the mustache really that big? Obviously the “fit in” expression has hidden a new trick up its sleeve, and there are more where that came from. Here is another one:

Dan finished detailing his master plan. “Where do I fit in?”, I asked.

We are challenged with reasoning about Things and Relationships while faced with noisy ambiguity. Words and expressions have different Senses (read more: https://en.wikipedia.org/wiki/Word_sense) that we need to consider when all we want to apply is simple (to us) logic.

There has been a breakthrough in Natural Language Understanding which, I believe, may assist us, and it is called Word Embedding.

The earliest papers are from 2003, by Yoshua Bengio and friends (“and friends” sounds much nicer than the Latin “et al.”), but the main breakthrough came from a guy called Tomas Mikolov, at Google, who used an obscene amount of text and applied an unsupervised learning algorithm based on a clever neural network to convert words into vectors of numbers. The algorithm learns, by itself (unsupervised machine learning), to translate the word “Computer” into this:

array([-0.00449447, -0.00310097, 0.02421786, …], dtype=float32)

Side note: I’m guessing Tomas Mikolov is not a programmer and his original code looks like crap. Luckily there is an elegant (and efficient!) implementation in Python, here — http://rare-technologies.com/word2vec-tutorial/

These embeddings can magically do math with words. Famous examples are:

KING-MAN+WOMAN~=QUEEN

or

PARIS-FRANCE+POLAND~=WARSAW

~= means that the closest word vector to the resulting math expression is the vector for the given word.
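To show mechanically what that ~= means, here is a toy sketch with made-up 2-D vectors (real word2vec vectors have hundreds of learned dimensions, and a library like the gensim implementation linked above would do this lookup for you — the words and numbers below are purely my own illustration):

```python
import numpy as np

# Hand-made toy "embeddings" on two invented axes: (femininity, royalty).
vecs = {
    "man":   np.array([-1.0, 0.0]),
    "woman": np.array([ 1.0, 0.0]),
    "king":  np.array([-1.0, 1.0]),
    "queen": np.array([ 1.0, 1.0]),
    "apple": np.array([ 0.0, -1.0]),  # a distractor word
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Find the word whose vector is closest to vec(a) - vec(b) + vec(c),
    excluding the three input words (as word2vec tools typically do)."""
    target = vecs[a] - vecs[b] + vecs[c]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(analogy("king", "man", "woman"))  # -> queen
```

Here KING−MAN+WOMAN lands exactly on [1, 1], and the nearest remaining vector is QUEEN — which is all the famous equation is claiming.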

This looks promising. Can we apply word embeddings to the entire sentence? Sentence2vec? And then use some sort of Inference Engine to resolve our questions?
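The crudest possible first stab — and I stress this is an unvalidated sketch with made-up vectors, not a working method — would be to average the word vectors of a sentence:

```python
import numpy as np

# Made-up 3-D word vectors, purely for illustration.
toy_vecs = {
    "the":      np.array([0.1, 0.0, 0.0]),
    "trophy":   np.array([0.9, 0.2, 0.1]),
    "suitcase": np.array([0.8, 0.1, 0.3]),
    "big":      np.array([0.0, 0.9, 0.1]),
}

def sentence2vec(sentence, vectors, dim=3):
    """Naive sentence embedding: the mean of the known word vectors."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)

v = sentence2vec("The trophy was big", toy_vecs)
```

The obvious flaw: averaging throws away word order, so “the trophy did not fit in the suitcase” and “the suitcase did not fit in the trophy” get the exact same vector — which hints at why embedding whole sentences is hard.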

I hope so, because I’m spending my precious few sleeping hours exploring this path.

Next: on the challenges of embedding an entire sentence.