Series Introduction: Can Machines Talk?
This is the start of Pat’s series on NLU. You can see the companion video on YouTube at PatSeriesIntroVideo.
I propose to consider the question, ‘Can machines talk?’ While an unpopular topic at dinner parties, it leads to interesting questions like, ‘Can machines translate?’ and the uncomplimentary ‘Why do machines suck at conversation so much?’
Nearly seventy years have passed since Alan Turing’s great 1950 paper, Computing Machinery and Intelligence, so it is high time to reflect on our lack of progress in implementing the middleware he presupposed: a machine’s conversational interface using human language. Only then can the real challenge of the imitation game begin: playing verbal chess, incorrectly adding numbers, general lying and perhaps even talking about politics.
In fact, a lot of scientific progress has been made, but because mainstream engineering excludes meaning from its systems today, human conversation is excluded as well, since conversation communicates with meaning. This series looks at solutions to conversation that are working in our lab and with private clients (i.e. the future’s “talking machines”) and explains what has held back mainstream progress to date.
Our best and brightest scientists and engineers have toiled for more than sixty years to deliver natural language processing (NLP), composed of natural language understanding (NLU) and natural language generation (NLG), without success.
To understand human language, mainstream engineering must parse sentences: converting (a) letters into words (and named entities), then (b) words into trees of lexical categories (parts of speech, or POS), and finally (c) the trees into meaning by resolving word senses.
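The mainstream pipeline above can be sketched as a toy example. The function names and the tiny tag lexicon below are illustrative assumptions of mine, not any particular system’s API; the point is simply that steps (a) and (b) already leave multiple candidate analyses per word.

```python
# A toy sketch of the mainstream parse pipeline:
# (a) letters -> words, (b) words -> candidate parts of speech.
# The mini-lexicon is an illustrative assumption, not a real tagger.

LEXICON = {
    "time": ["noun", "verb"],   # "time flies" vs "time the race"
    "flies": ["verb", "noun"],
    "like": ["prep", "verb"],
    "an": ["det"],
    "arrow": ["noun"],
}

def tokenize(text):
    """Step (a): convert letters into words."""
    return text.lower().split()

def tag_candidates(words):
    """Step (b): map each word to its candidate parts of speech."""
    return [(w, LEXICON.get(w, ["unknown"])) for w in words]

for word, tags in tag_candidates(tokenize("Time flies like an arrow")):
    print(word, tags)
```

Even in this five-word sentence, three of the words are ambiguous before step (c) begins, which is where sense resolution has to do its work.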
Lack of progress in the mainstream comes back to hypotheses baked into today’s systems that impede human conversation. In short, lack of progress comes from excluding meaning — how our brains use language to communicate.
This series draws on my experience as a cognitive scientist since the 1980s using my Patom (brain) theory to model human language. With more than a decade of trial-and-error experiments (you know, science), I know only too well what doesn’t work for NLU. Perhaps failure makes the best scientists? My chance discovery of a mature linguistic framework, Role and Reference Grammar (RRG), while travelling halfway around the world led to its integration into my lab’s software and a real step into the world of NLU.
The limited dissemination of RRG delayed my progress by at least five years and caused me to question scientific education, although perhaps that’s just my fault for not searching hard enough. Had RRG been more widely taught, NLU might well have been solved by now.
While the lack of publicity for competing scientific models slows us down, it doesn’t stop us. Even Boolean logic eventually became integral to computers once digital computers could be built. Although without doubt our education curriculum will change, I still wonder why RRG isn’t taught to language and AI students.
To rectify the problem of RRG being kept too quiet, we will cover some of its key aspects: our building blocks for the semantic world. While there is a lot to learn, the theory tends to be easy to understand for language speakers like us.
Recently, we demonstrated our system’s ability to outperform the Facebook AI Research team’s conversational tests as a benchmark[i]. We found errors in the machine-learning input which result in undetected errors in its output. That’s another reason why I say that NLU holds the keys to conversation.
We will look at the mainstream’s experience using formal linguistic models and contrast that with our system based on functional linguistics with RRG. In short, the need to parse sentences as the first step to NLU is removed. The building blocks of the parser, parts of speech in a tree, are also eliminated and replaced by precise semantic elements that remove ambiguity and combinations.
The series also takes an in-depth look at the semantic network that drives the understanding process, resolving predicates and referents as a key disambiguation tool. That’s the core of our NLU engine. The essential role of context in conversation explains why sentence analysis alone is inadequate for chatbots and for NLU in general. We investigate the issue of human language acquisition and hypothesize how to apply it to machines for the world’s languages. A method based on a brain’s property of automatically decomposing the world is proposed and then systematically applied to the problems of artificial intelligence (AI) in NLP (because that property is central to Patom[ii] theory).
Throughout, we will use the extensive research behind RRG to illustrate the language-independent nature of our communications.
By looking at (a) NLU as a problem of decoding form (words and phrases) into meaning and (b) NLG as encoding meaning back into words (the functional model), the horrific scale of the combinatorial explosion created by formal linguistics is removed.
Once again for emphasis: formal linguistics creates a combinatorial explosion when applied to human language, one that impedes mainstream progress. Patom theory solves this with a hierarchical, bidirectional model of linksets: layers and sets come to the rescue. RRG simplifies the scaling path by identifying the building blocks used across the world’s languages, not just the most prolific ones. Let’s proceed to the explanation.
Main Stream Stagnation
If we think of science as a developing river system, a number of different ideas develop in rivulets. The cognitive sciences that formed on Sept 11, 1956[iii], incorporated linguistics, and amongst the many ideas, two strong flows emerged. The larger scientific stream excluded the ‘unscientific’ mental states (the behaviourist model); formal linguistics similarly excludes meaning. A smaller stream became focussed on communication with language — functional linguistics. Today, the main stream is like the rapids at Niagara Falls (formal linguistics), while the little rivulet flows quietly on the side, containing functional models built and proven over decades.
For NLP, these functional models will soon flood to become torrents of white water because there is a demand for talking machines that cannot be satisfied by the meaningless systems of formal linguistics. I mean, how can you take meaningless signs and expect them to convey meaning without somehow adding meaning?
When I said that last sentence at a meeting in 2016, the professor I was meeting with stood up and tried to leave his own office to escape my heresy. This must be a sensitive topic! Subsequently a global IT executive explained to me that brains actually use the principles of distributional semantics to learn a language! Seriously.
“You shall know a word by the company it keeps!” went from an obvious point by Firth in 1957[iv] relating Wittgenstein’s work about meaning, to totally ignoring the underlying meaning of the word in favour of proximity. Does it really make sense in 2018 to take such an observation from 1957 and use it as the driver for a new field: distributional semantics? Why not NLU, instead?
Along these lines, formal linguistics considers a word to be a sequence of letters. That’s it. A word under that definition doesn’t represent anything, as it is an arbitrary sign, and so must instead be modelled with rules, annotated corpora, statistics with n-grams, and so on to add a limited form of meaning. My approach is to use a dictionary that can be machine-disambiguated by leveraging Patom theory. It works for people, so why not machines?
Today’s systems working with words (arbitrary signs) get some meaning from the writers of those words, who knew what they meant. By decoding that meaning, we will get to NLU; by merely creating statistics and weights without meaning, we haven’t and we won’t.
We will look in detail at our working architecture and how it divides the meaningless part (the sign[v]) from a couple of layers of meaning (the meaning associated with the word itself, and the meaning in the definition). This division allows the re-use of common features so they are defined only once but used many, many times. This re-use and decomposition introduce the concept of the pattern atom (Patom), the smallest unique unit in a brain that represents a specific pattern.
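A minimal data-structure sketch of that division, under my own assumptions about the layers (the class and field names are hypothetical illustrations, not the lab’s implementation): the sign is a bare form, while units of meaning are defined once and shared by many signs.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Meaning:
    """A reusable unit of meaning, defined only once."""
    definition: str

@dataclass
class Sign:
    """The meaningless form (a sequence of letters), linked to
    one or more meanings; the form itself carries no meaning."""
    form: str
    senses: list = field(default_factory=list)

# The meaning is defined once...
fly_motion = Meaning("move through the air")

# ...but reused by many signs, even across languages.
english = Sign("fly", [fly_motion])
french = Sign("voler", [fly_motion])   # different form, same meaning
```

The re-use is the key design choice: both signs point at the same `Meaning` object rather than duplicating its definition, which is the pattern the Patom concept generalizes.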
(Next: we delve into the main stream hypotheses that diverted us from meaning.)
[ii] Introduction to Patom Theory’s origins: John Ball, Machine Intelligence: The Death of Artificial Intelligence, Hired Pen Publishing, 2016.
[iii] Per George Miller quote, Howard Gardner, The Mind’s New Science: A History of the Cognitive Revolution, Basic Books, Inc., NY, 1985, P 28.
[iv] J. R. Firth, A synopsis of linguistic theory, Studies in linguistic analysis, Basil Blackwell, Oxford, 1957, P 11.
[v] Arbitrary signs as words, per Ferdinand de Saussure, (Translated) Course in General Linguistics, Ed. Bally and Sechehaye, McGraw-Hill Book Company, 1915, P 65–68.