A scientific hypothesis starts the process of scientific enquiry. A false hypothesis can start the path to disaster, as was seen with the geocentric model of the ‘universe’, in which heavenly bodies moved in circular orbits around a stationary earth. It became heresy to suggest otherwise, and epicycles were added to make the observations fit. It’s a good story worth studying in school to appreciate how critical a hypothesis is to validating science.
(You can watch the companion video, if you’d like, here on YouTube)
Here’s an important hypothesis: “The fundamental aim in the linguistic analysis of a language L is to separate the grammatical sequences which are the sentences of L from the ungrammatical sequences which are not sentences of L and to study the structure of the grammatical sequences.”
This hypothesis has polarised the last 61 years of NLP. It comes from Noam Chomsky in Syntactic Structures, 1957, and it suggests that meaning is not part of the fundamental aim.
As Albert Einstein is quoted: “Everything should be made as simple as possible, but not simpler.” The model of language without meaning (or pragmatics) is too simple and has led to an explosion of “epicycles” in linguistics, leading to the field of computational linguistics in particular.
Removing syntax as the focus of human language analysis, and adding in meaning, is the first step needed for NLU. The new hypothesis is: “The fundamental aim in the linguistic analysis of a language L is to study the meaningful words and phrases which comprise the sentences of L, to study their language-independent representation, and their meaningful interactions in conversation.” This somewhat restates the formal model in functional terms to align with the RRG triangle[i] (syntax–semantics–discourse-pragmatics).
We should understand why “me go city” is an English sentence (from a non-native speaker), as well as the words of a two-year-old[ii] like “I got horn” and “That busy bulldozer truck.” If we can understand it, surely it is a part of our language.
Breaking Down the Syntactic Hypothesis
One of my advisers told Professor Noam Chomsky that my company, Pat Inc., was building a language system that didn’t parse or use parts of speech. His answer was along the lines of: “They sound deluded, like Trump supporters.” Our approach is clearly outside the mainstream, which follows Chomsky. Why is there such a difference?
While Chomsky demolished Skinner’s behaviourist theory of language[iii], he inherited Bloomfield’s idea that grammar could be studied independently of meaning. Whether this is a behaviourist compromise or not, leaving out meaning makes linguistics too simple.
It’s now 2018, and the basis for formal linguistics still excludes meaning[iv]. Breaking this dominant paradigm, which traces back to Chomsky’s early hypothesis, will no doubt seem obvious by the end of this series, as even language acquisition becomes a consequence of access to meaning.
To comply with Patom theory, language should be bidirectional. When I say something, it should mean the same thing as when you say it. Similarly, if you have the grammatical representation for an English sentence, it should always generate English. The syntax-only models are not bidirectional because they are too simple.
You can see the comparison when the famous, arguably grammatical sentence “colorless green ideas sleep furiously” is broken down into its pattern and that pattern is then used to generate the non-grammatical “running running runnings run furiously”. Both sentences follow the same phrase pattern: adj adj noun-plural verb adverb. You can’t study the syntax of grammatical sentences on the one hand when, on the other hand, that same syntax creates non-grammatical sentences[v].
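The one-way nature of the phrase pattern above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the lexicon, the tag names and the helper functions `tag` and `generate` are hypothetical and are not taken from any real parser. It simply shows that a pattern recovered from one sentence will, when run in reverse, accept any word carrying the right tag.

```python
from itertools import product

# Hypothetical toy lexicon: each part-of-speech tag lists the words carrying it.
LEXICON = {
    "adj": ["colorless", "green", "running"],
    "noun-plural": ["ideas", "runnings"],
    "verb": ["sleep", "run"],
    "adverb": ["furiously"],
}

# The phrase pattern quoted in the text.
PATTERN = ["adj", "adj", "noun-plural", "verb", "adverb"]


def tag(sentence):
    """Analysis direction: replace each word with the first tag that lists it."""
    tags = []
    for word in sentence.split():
        tags.append(next(t for t, words in LEXICON.items() if word in words))
    return tags


def generate(pattern):
    """Generation direction: yield every word sequence that fits the pattern."""
    for words in product(*(LEXICON[t] for t in pattern)):
        yield " ".join(words)


generated = set(generate(PATTERN))

# The famous sentence analyses to the pattern...
print(tag("colorless green ideas sleep furiously") == PATTERN)   # True
# ...but the same pattern also generates the non-grammatical sentence.
print("running running runnings run furiously" in generated)     # True
```

Nothing in the pattern itself distinguishes the two sentences, which is the point being made: the information needed to reject the second one is not in the syntax.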
What is missing? My hypothesis is that unless the syntax is mapping to a representation of meaning, the result won’t be bidirectional. The part-of-speech model is just too simplistic (I will explain why in detail another time).
Modelling the Combinatorial Language System
Human language is a combinatorial system: it allows things to combine. The syntactic model says we can swap elements with the same grammatical category: meaning may be lost, but grammaticality remains intact. That appears too simplistic as the following table shows:
In this combinatorial system, substituting words of the same grammatical category into meaningful sentences can produce meaningless sentences. Worse, it can produce ungrammatical sentences, which means the system is not bidirectional.
In the table above, some examples that have a meaningless result have predicates (words that relate referents) swapped for referents.
With effort, we can imagine a context in which the meaningless cases become meaningful (even pink elephants fly in cartoons) but that’s a feature of context.
Meaningful sentences that some call ungrammatical, and which easily become grammatical with minor word order changes, highlight the problem of syntax-focussed analysis. Context allows all manner of weird cases. We can’t fight it since that’s what languages allow, driven by a brain that excels in pattern recognition within the scope of our universe.
Given a sequence of words in a language, a native speaker can identify the meanings of the words as things the words refer to (referents) and things that relate referents (predicates). The referent/predicate distinction is important[viii], as it is the fundamental meaning element distinction.
RRG defines predicate relations with roles[ix]. Predicates like ‘eating’ relate the eater (actor) with the eatee (undergoer). Eaters eat eatees. Referents like ‘cats’ refer to a kind of animal (in one sense). So, given a valid sentence, a native speaker breaks the elements into referents and predicates using meaning. In the future we will look at why referents and predicates are attributes of the definition (signified), not of the word (sign).
What if we look at meaning as the test of a language’s phrase, instead of its grammaticality (which fails the bidirectional test)? Can we find a general principle to apply to meaning to retain meaningful sentences with substitutions?
Let’s assume that a referent’s hypernym relation (is-a) always retains meaningful sentences upon substitution. What happens when we substitute along this line?
Figure 1. Hypernym (is-a) example. There is no need for an ‘idealized’ model.
Given the relations between the referents, the predicate relations can now extend the combinations as a dictionary definition (one that includes the displayed hypernym link) shows.
Figure 2. Predicate ‘eat’ simplified to show animals eat food substitutions
With the predicate connecting two levels of referent, valid, meaningful sentences follow by substituting any referents higher in the network. (This network is intentionally simplified as not all animals eat the same food.) In other words, reality is more granular, as a brain learns through experience only. Specific experience effectively defines general patterns over time. The principle remains for all predicates and is our working hypothesis.
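As a minimal sketch of this working hypothesis, the following Python fragment models a tiny hypernym network and a simplified definition of ‘eat’. The referents, the is-a links and the function names are all hypothetical and chosen only for illustration; they are not the contents of the figures, and the role names follow the actor/undergoer terms used earlier.

```python
# Hypothetical is-a links (child -> parent); a deliberately tiny fragment.
HYPERNYM = {
    "terrier": "dog",
    "dog": "animal",
    "biscuit": "snack",
    "snack": "food",
}

# Simplified predicate definition: animals (actor) eat food (undergoer).
PREDICATES = {"eat": {"actor": "animal", "undergoer": "food"}}


def ancestors(referent):
    """Return the referent followed by everything above it on the is-a chain."""
    chain = [referent]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_meaningful(actor, predicate, undergoer):
    """A sentence is meaningful here if each referent is-a the kind its role needs."""
    roles = PREDICATES[predicate]
    return (roles["actor"] in ancestors(actor)
            and roles["undergoer"] in ancestors(undergoer))


def substitutions(actor, predicate, undergoer):
    """Swap each referent for itself or any referent higher in the network."""
    return [(a, predicate, u)
            for a in ancestors(actor)
            for u in ancestors(undergoer)]


variants = substitutions("terrier", "eat", "biscuit")
for a, p, u in variants:
    print(a, p, u, is_meaningful(a, p, u))
```

Every variant produced by moving up the network stays meaningful under this toy definition, while swapping the roles (`is_meaningful("biscuit", "eat", "terrier")`) does not. A real network would need finer-grained links, since not all animals eat the same food.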
In the next table, we take a meaningful sample sentence and then substitute its referents with others from the same hypernym list. Meaning is retained for the sentences this way.
Next, we can substitute the predicate with another that shares the same action but differs in manner. We can also substitute any member of the set of predicates that the sample one entails.
With the action, there can be manner relations: (a) limping is a manner of walking, (b) sprinting is a manner of running. There can also be entailing relations: (a) eating entails biting, chewing and swallowing, (b) talking entails breathing and moving one’s mouth. Let’s substitute here as well to see that meaning is retained.
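The manner and entailing relations can be sketched in the same spirit. The two tables below hold only the examples given in the text; the table names, the function names and the sentence template are hypothetical, and reading “substitute” as “replace with the shared action or with any entailed predicate” is an assumption about what is intended.

```python
# Relations taken from the examples in the text; the table names are hypothetical.
MANNER_OF = {"limping": "walking", "sprinting": "running"}
ENTAILS = {
    "eating": ["biting", "chewing", "swallowing"],
    "talking": ["breathing", "moving one's mouth"],
}


def substitutes(predicate):
    """Predicates assumed to keep a sentence meaningful when swapped in."""
    options = []
    if predicate in MANNER_OF:
        # A manner can be replaced by the action it is a manner of.
        options.append(MANNER_OF[predicate])
    # A predicate can be replaced by anything it entails.
    options.extend(ENTAILS.get(predicate, []))
    return options


def rewrite(subject, predicate):
    """Build the substituted sentences using a fixed, simplified template."""
    return [f"{subject} is {p}" for p in substitutes(predicate)]


print(rewrite("the dog", "limping"))
print(rewrite("the dog", "eating"))
```

A dog that is limping is walking, and a dog that is eating is biting, chewing and swallowing, so each rewritten sentence stays meaningful without any appeal to grammatical category.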
(Next: the grammatical elements underpinning language are exposed as highly ambiguous, and therefore a cause of phrase duplication and inaccuracy. The meaning-based model is compared.)
[i] Robert D. Van Valin, Jr., Exploring the Syntax-Semantics Interface, Cambridge University Press, 2005, P 1–2.
[ii] Steven Pinker, The Language Instinct, HarperPerennial ModernClassics, 1994, P 273.
[iii] Noam Chomsky, A Review of B. F. Skinner’s Verbal Behavior, Language 35, No. 1, 1959, P 26–58.
[iv] Alexander Clark et al. (ed.), The Handbook of Computational Linguistics and Natural Language Processing, 2013, P 29–34, shows grammar rules applying to generate parse trees without meaning as being integral to Formal Language Theory.
[vi] Noam Chomsky, Syntactic Structures, Mouton Publishers, Paris, 1957, P 15–17.
[vii] Daniel Jurafsky, A Probabilistic Model of Lexical and Syntactic Access and Disambiguation, Cognitive Science 20, 1996, P 182. Here Jurafsky, a highly respected computational linguist and professor at Stanford, explains that “The woman fell down that was attractive” is “not at all grammatical”. When looking at grammaticality in isolation, should we block a parser from recognizing such sentences? How would this work for NLU in general?
[viii] Robert D. Van Valin, Jr. (Ed). SLCS Volume 105. Investigations of the Syntax-Semantics-Pragmatics Interface. John Benjamins Publishing Company, 2008, P 165.
[ix] Robert D. Van Valin, Jr., Exploring the Syntax-Semantics Interface, Cambridge University Press, 2005, P 53–67.