Meaning Matching without Parts of Speech

John Ball
Pat Inc
Jul 19, 2018 · 8 min read


How to do NLU without POS

Real NLU matches the meaning of a word in a sentence, based on the meanings of the other words in the sentence. Today we review how NLU is progressing, using our experience in implementing a model based on linguistics, the science of language.

As usual, you can see the companion video with demonstrations on our YouTube page: https://youtu.be/a14n1b9bd9I.

I view NLP as middleware: a discrete system that interacts with sequences of symbols. If they are meaningful, NLU understands them in context, and NLG similarly responds with symbol sequences. The meaning of the response takes advantage of the known context, but in general the choice of response extends well beyond the scope of language itself.

Why middleware and not a full solution?

First, the full solution is AI. D’uh. Turing illustrated human intelligence with a simple wrong answer. The full solution would be more like HAL 9000, passing aspects of the Turing test while it also plays chess.

And second, language is a system for communicating that learns all the time, even from previously unknown words. As an aside, most machine-learning systems today violate this approach by loading their data first and then operating on that static data.

Lastly, when you talk to an expert — say a doctor, engineer or linguist — you may not be able to follow them because of vocabulary and a deeper knowledge of the predicates they use, but you still speak your own language. A joke here illustrates: Mechanic says: “You have a failing compressor solenoid.” English speaker: “I don’t understand! OMG! I don’t speak English!”

The Dictionary Goal

The dictionary is a set of network associations to enable automatic recognition. When WordNet was created by the legendary psychologist and cognitive scientist, George Miller, the path to NLU was hinted at[i]. Miller, his wife and three others[ii] set out in 1985 to create an “automated dictionary” that was expected to provide educational benefits, but instead it became a resource for NLP that, probably due to losing battles against computational linguistics, is no longer being developed[iii].

Where Miller’s model hints at the future is with its network of associations. One level has the English words and phrases that make up headwords. They connect to a set of meanings (word-senses), semantic-level elements, that in turn connect to a single part of speech (that we factor out of our model completely). We will come back to the model when we have more time, but the key point is that the word sense is associated with the part of speech, not the word itself.
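Miller’s layering can be sketched in a few lines. This is a toy illustration of the structure described above, not the real WordNet API; the sense identifiers and glosses are invented for the example.

```python
# Toy sketch of WordNet-style layering (illustrative data, not the real WordNet API):
# a headword connects to word-senses, and each word-sense carries exactly one part of speech.
lexicon = {"run": ["run.v.01", "run.v.02", "run.n.01"]}  # headword -> sense ids

senses = {
    "run.v.01": {"pos": "verb", "gloss": "move fast using legs"},
    "run.v.02": {"pos": "verb", "gloss": "flow, of liquids"},
    "run.n.01": {"pos": "noun", "gloss": "a score in baseball"},
}

# The part of speech attaches to each sense, not to the headword itself:
pos_of_run = {senses[s]["pos"] for s in lexicon["run"]}
print(sorted(pos_of_run))  # ['noun', 'verb'] -- one headword spans several parts of speech
```

Note that asking “what part of speech is ‘run’?” has no single answer at the headword level, which is exactly why the association belongs on the sense.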

Where we are going with the dictionary is first to remove parts of speech, which duplicate definitions and, through overlap, duplicate and confuse phrase patterns (the lost information reduces the clarity of phrases).

Start with Table 1, representing forms of ‘destroying’.

Table 1. Samples of the predicate ‘destroy’ definition. Containing phrase underlined.

Today, the meanings are 4 separate dictionary definitions. For NLU we convert those to a single definition and associate a set of elements to control phrase matching correctly.

Table 2. Target definition of the predicate ‘destroy’. Its only category is ‘predicate’ (a semantic term meaning it relates arguments).

Here in table 2, there is one definition for destroy, which is a predicate (a semantic category that relates referents as arguments, in contrast to a referent which categorizes to a ‘kind of’). The definition of ‘destroy’ doesn’t need a gloss to describe it; it can generate a definition in a target language if needed based on its semantic associations. The relations that define the predicate will be covered in detail another time as will the attributes that are used in the phrase patterns to complete the meaning matching.

Our goal is to take the various parts of speech for single definitions and collapse them into a single semantically based one. Keep it simple!
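The collapse described above can be sketched as data. The attribute names below are invented for illustration (the article’s Table 2 holds the actual associations); the point is the shape: four POS-split entries become one predicate whose surface forms carry attribute sets that phrase patterns consult.

```python
# Before: four separate dictionary entries, split by part of speech / inflection.
pos_entries = {
    "destroy":    "verb (base form)",
    "destroys":   "verb (3rd person singular present)",
    "destroyed":  "verb (past) / adjective",
    "destroying": "verb (gerund) / noun",
}

# After: one semantic definition. Attribute names here are illustrative only.
destroy = {
    "category": "predicate",          # relates referents as arguments
    "forms": {
        "destroy":    set(),
        "destroys":   {"3rd-person", "singular", "present"},
        "destroyed":  {"past"},
        "destroying": {"progressive"},
    },
}

# Four surface forms, but a single definition to match against.
print(len(pos_entries), "entries ->", 1, "definition,", len(destroy["forms"]), "forms")
```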

Patom theory

Our NLU begins with Patom theory: a brain model in which the smallest representation can only store, match and use hierarchical, bidirectional linkset patterns. It took a long time to come up with that conclusion, in fact most of the 1990s. It is based on the observations of a variety of cognitive scientists, including neuroscientists and psychologists.

Patom theory looks primarily at what the brain does, not how it does it.

Linguistics application

If a brain works the way Patom theory explains, brain emulation should cater to element decomposition with representation as either sets or lists (as we have done in the meaning matcher).

For example, the word ‘cats’ is decomposed as the meanings of cat (referents, including a kind of animal), plus the meaning ‘plural’, plus the meaning ‘3rd person’. ‘Cats’ as a word is therefore like a set.
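The decomposition of ‘cats’ can be written directly as a set, assuming a toy lexicon with illustrative element names:

```python
# Minimal sketch: a surface word decomposes into a set of meaning elements.
# Element names (r:cat, plural, 3rd-person) follow the example in the text.
toy_lexicon = {
    "cat":  {"r:cat", "singular", "3rd-person"},
    "cats": {"r:cat", "plural", "3rd-person"},
}

def decompose(word):
    """Return the set of meaning elements for a surface word."""
    return toy_lexicon[word]

print(decompose("cats"))  # the referent plus 'plural' plus '3rd person', as a set
```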

The phrase ‘the cats’ is composed of two pieces — the operator[iv] first indicating this is a new, accessible element for context and then the referent (we saw last time that there is a different pattern to recognize “the travelling” where the second word is a predicate). Phrases are lists which we will show in detail for acquisition and use over the next couple of articles.

Note that the meaning of a referent like this isn’t some fixed token like NP, but it retains the meanings of the word “cats” after intersection. So “the cats” is replaced in the matcher by the meanings of “cats”.

In modern mainstream linguistics, the problem of decomposition is evident in many places. The symbol NP for example loses the clarity of whether it is a predicate (e.g. the running) or a referent (the cat). It also loses the singular/plural feature (e.g. the cat, the cats).

Designs like the Penn Treebank use arbitrary combinations for English, like VB (verb, base form), VBD (verb, past tense) and VBZ (verb, third person singular present)[v]. For nouns there are several, such as NN (noun, singular or mass), NNS (noun, plural) and NNPS (proper noun, plural), alongside PRP (personal pronoun).

The Slot Grammar[vi] takes a more decompositional approach but retains parts of speech including noun, verb and adj, and then extends them with features like vinf (infinitive), vpast (past tense) and vpers3 (third person). Rather than use the meanings of words, Slot Grammar adopts (for English) roughly 90 ‘basic semantic types’. These types include artf (artefact), cognsa (cognitive state or activity), and cty (named city).

The alternative is to retain the sets of information. If you used noun as an element (we don’t), you could track the set of active elements and avoid language-by-language redesigns.

As the meanings of a word form a set (a word often has multiple meanings), we can use intersection, the set operation that retains only the elements present in both sets. We intersect the set of elements in the phrase with the set of elements in the word, effectively starting the process of Word Sense Disambiguation (WSD).
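The intersection step above is literally the set operation, which a couple of lines can show. The sense names are illustrative:

```python
# Minimal sketch of the intersection that begins WSD (sense names are illustrative).
word_elements = {"p:run36", "p:flow2", "p:manage5"}   # senses an ambiguous word carries
phrase_elements = {"p:run36", "p:flow2"}              # senses this phrase pattern admits

surviving = word_elements & phrase_elements           # set intersection
print(sorted(surviving))  # only senses present in BOTH sets survive
```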

WSD allows ambiguous words to be resolved based on syntactic properties (phrase patterns) or predicate properties (semantic patterns).

Example of Word Sense Disambiguation

Let’s look at an example of linkset intersection in action with some simple sentences because a picture is worth a thousand words. “The water ran” can be compared with “the man ran”.

The meaning of ‘ran’ is synonymous with ‘flow’ in the first case and with ‘move quickly using legs’ in the second case.

Here, the meaning of ran is seen — ‘p:flow2’ (with its gloss, “move along, of liquids”). Intersection compares the actor (water) for this activity with its predicate (ran) as predicates determine their arguments. The concept has also been known as “selectional restrictions”, but without the flexibility we have here.

Here, the intersection of ‘r:man’ with ‘ran’ leaves the motion predicate p:run36. In more complex sentences, meaning is retained with multiple predicates possible when their arguments also overlap. All senses are retained and, as usual, can be resolved in context or questioned.
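The resolution of ‘ran’ against its actor can be sketched as overlapping feature sets. The sense names (p:flow2, p:run36) follow the article; the actor feature sets are invented for illustration, standing in for the richer associations the real matcher uses.

```python
# Hedged sketch: a predicate determines its arguments, so each sense of 'ran'
# lists the actor features it requires (feature names invented for illustration).
senses_of_ran = {
    "p:flow2": {"liquid"},               # "move along, of liquids"
    "p:run36": {"animate", "has-legs"},  # "move quickly using legs"
}

def resolve(actor_features):
    # keep each sense whose required actor features overlap the actor's features
    return {s for s, required in senses_of_ran.items() if required & actor_features}

print(resolve({"liquid"}))               # "the water ran" -> the flow sense
print(resolve({"animate", "has-legs"}))  # "the man ran"   -> the motion sense
```

When the actor’s features overlap the requirements of more than one sense, all the surviving senses are retained, matching the “all senses are retained” behaviour described above.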

In this next sentence, we add an embedded proposition (the man the dog saw) and tense and aspect and some ‘how’ types: “the man the dog saw has been running continuously evidently”.

Note that in a more complex sentence, the meaning is retained, unchanged from the simpler one.

There can be more than one valid interpretation, and so context is needed for the next level of resolution.

In context, the ultimate intersection tool is available — the clarifying question. Think of Mr. Fantastic, Reed Richards, from the Fantastic Four (the elastic man). Now understand “Mr. Fantastic ran”. Does that mean he melted or he moved fast on his legs? As both possibilities can be recognized in the sentence, we can resolve the meaning from context, or validate with something like: “You mean, he melted?” Once context is unambiguous, all parties to the conversation have an unambiguous representation to build on the definitions of those words (learning).

The Cup is Shattered

In Figure 1, the predicate ‘shattered’ shows a variety of semantic representations in different sentences. While the first form and the third are different (the unspecified causer in the achievement case), the underlying meaning is retained: the cup is broken into many pieces. Whether that difference is relevant is a question for context.

Figure 1. Is shattered an adjective or a verb?

Summary

In summary, NLU is based on meanings, and the best human source of word meanings is a dictionary. Storing definitions as associations (the relations between the predicate and its allowable arguments like actor and undergoer) allows automatic recognition based on them.

Real NLU retains the meanings of the words as sets — in line with Patom theory. This allows us to create a dictionary that is automatically disambiguated based on the sentence in context.

(next — we continue this exploration with how Patom theory enables language acquisition without ‘processing’ to achieve the end result: an unambiguous dictionary based on meaning sufficient to pass the Facebook AI Research conversational tests and beyond.)

[i] John Ball, Speaking Artificial Intelligence, Chapter 6, 2016.

[ii] Fellbaum (ed.), WordNet: An Electronic Lexical Database, The MIT Press, 1998, P xvii.

[iii] John Ball, Miller’s WordNet gives A.I. knowledge, 2nd from last paragraph, 2015, https://www.computerworld.com/article/2935578/emerging-technology/miller-s-wordnet-paves-the-way-for-a-i.html

[iv] In RRG, an operator is an element of meaning that affects the sentence but isn’t a constituent of any element. Tense, illocutionary force and aspect elements are examples of operators which simplify the sentence pattern to allow easier disambiguation.

[v] Beatrice Santorini, Part-of-Speech Tagging Guidelines for the Penn Treebank Project, June 1990.

[vi] Michael C. McCord, IBM Research Report: Using Slot Grammar, March 24, 2010, P 12–14.


I'm a cognitive scientist working on NLU (Natural Language Understanding) systems based on RRG (Role and Reference Grammar). A mouthful, I know!