
Natural Language Understanding
Theories from a Medical Student
Natural language processing and understanding (NLP/U) seems to be one of the fundamental barriers left before humans can have human-like robots to play with (and eventually become slaves to). Without having done a rigorous review of the mathematics and computer science literature on how Siri or Alexa processes queries like “What is the weather tomorrow?”, I have developed a broad-strokes picture of how I imagine NLU works.
First, all NLU algorithms must be language-independent at their core. I say this because I probably fall into Noam Chomsky’s camp of Universal Grammar.
I believe that language can be broken down into a series of abstraction hierarchies that looks something like the following: varying wavelengths of light, mixed with the single one-dimensional time series of audio (both forms of relatively continuous data), get processed by the brain into distinct objects, which are then converted into categorical data. For example, after looking at a picture of an apple on a table, my brain will process the different colors of light and their intensities, along with any sounds that may accompany them, and will turn those into categorical data. Our brains will then abstract that categorical data into what psychologists like to call prototypical representations. From a supervised learning perspective, our prototypical representations of the world will become asymptotically more precise with more data.
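To make that last point concrete, here is a toy sketch (entirely my own illustration, with made-up features and numbers) of a prototypical representation as a running average of observed feature vectors; the more examples it sees, the less each new one moves it:

    # Toy sketch: a "prototype" as the running mean of observed feature vectors.
    # The feature names and numbers are hypothetical placeholders.
    class Prototype:
        def __init__(self, n_features):
            self.count = 0
            self.mean = [0.0] * n_features

        def observe(self, features):
            # Incremental mean update: each new example moves the prototype
            # less than the last, so it becomes asymptotically more stable.
            self.count += 1
            self.mean = [m + (f - m) / self.count
                         for m, f in zip(self.mean, features)]

    apple = Prototype(n_features=3)  # e.g. [redness, roundness, size]
    for features in [[0.9, 0.8, 0.3], [0.7, 0.9, 0.35], [0.85, 0.85, 0.32]]:
        apple.observe(features)
    print(apple.mean)  # the current "apple" prototype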
Language is a direct derivative of our brains’ ability to categorize the data around us. We will ascribe visual symbols (written language) and auditory symbols (spoken language) to those prototypical representations. After this basic index is made in our minds, we can develop slightly more complicated representations of the world: how the prototypical representations interact with one another, and how we, as agents, can interact with them. We can create new abstractions of those relationships and re-apply them when necessary.
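One naive way I picture that index (purely illustrative; every entry below is made up) is a lookup from symbols to prototypes, plus a table of relations that can be re-applied to new pairs of prototypes:

    # Toy sketch: a symbol index (written/spoken labels -> prototypes) plus a
    # table of relations between prototypes. Every entry here is made up.
    symbol_index = {
        "apple": "proto:apple",
        "table": "proto:table",
        "eat":   "rel:eating",
    }

    relations = {
        ("proto:agent", "rel:eating", "proto:apple"): True,      # agents eat apples
        ("proto:apple", "rel:resting_on", "proto:table"): True,  # apples rest on tables
    }

    def holds(subject, verb_word, object_word):
        # Re-apply an abstract relation: look the words up in the index,
        # then check whether the relation is known to hold.
        return relations.get(
            (subject, symbol_index[verb_word], symbol_index[object_word]), False)

    print(holds("proto:agent", "eat", "apple"))  # True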
Lastly, the most human aspect of language is something like comedy or metaphor, or the description of emotion in poetry. These stem from fundamental, hardcoded human qualities such as our need for food and our desire to reproduce.
The objects in the world and the relationships between them will together determine the value of our human drives. For example, apples don’t necessarily sate hunger, but eating apples does. Thus [eating], which is an abstract relationship, and [apples], which is an abstract object, can come together to form the [satisfaction of hunger] abstraction. All of these can come together to form the classic sentence: “I ate an apple and felt better”.
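Here is a toy version of that idea (the drive levels and the effect table are invented for illustration): a (relation, object) pair maps to changes in drive levels, which is all that “felt better” needs to mean:

    # Toy sketch: drives have levels, and (relation, object) abstractions
    # change them. The numbers and the effect table are invented.
    drives = {"hunger": 0.8, "thirst": 0.5}

    effects = {
        ("eating", "apple"): {"hunger": -0.4},  # eating an apple reduces hunger
        ("seeing", "apple"): {},                # merely seeing one does not
    }

    def apply(relation, obj):
        for drive, delta in effects.get((relation, obj), {}).items():
            drives[drive] = max(0.0, drives[drive] + delta)

    apply("eating", "apple")
    print(drives["hunger"])  # roughly 0.4: "I ate an apple and felt better"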
I am, of course, skipping over how grammar comes about. Because I think that grammar is arbitrary, and that the symbols, both written and auditory, are arbitrary too, I don’t feel it is necessary to think too hard about them.
With the above in mind, I shall pose the following question: Is it possible to translate one language to another without having some sort of reference?
I believe the answer is a hard no.
Therefore, can Siri or Alexa ever become truly competent without someone painstakingly hardcoding rules for each language and for what its different parts of speech refer to?
I believe the answer is also no.
My hypothesis: complete NLU can only occur when the visual, the tactile, and the auditory environment are presented together.
The actual NLU algorithm will simply be a neural network that turns this continuous data into varying levels of abstract, categorical data. Language will be a natural by-product of this abstraction.
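A minimal sketch of what I mean, assuming random (rather than learned) weights purely for illustration: continuous sensory features pass through successive layers, each a coarser abstraction, and the last level collapses to a categorical label:

    # Minimal sketch (not a real NLU system): continuous sensory features in,
    # progressively more abstract codes out, ending in a categorical label.
    # Weights are random here; in practice they would be learned from data.
    import numpy as np

    rng = np.random.default_rng(0)

    def abstraction_layer(x, n_out):
        # One level of abstraction: a linear map followed by a nonlinearity.
        w = rng.normal(size=(x.shape[-1], n_out))
        return np.tanh(x @ w)

    def categorize(x, n_categories):
        # Final level: collapse the abstract features into a single category.
        w = rng.normal(size=(x.shape[-1], n_categories))
        return int(np.argmax(x @ w))

    sensory = rng.normal(size=16)            # stand-in for continuous pixel/audio data
    level_1 = abstraction_layer(sensory, 8)  # first abstraction
    level_2 = abstraction_layer(level_1, 4)  # second, coarser abstraction
    print(categorize(level_2, n_categories=3))  # e.g. 0 = "apple", 1 = "table", ...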
There is one condition to the hypothesis: for any artificial agent to claim to have NLU, it must also have some artificial manifestation of human drives such as hunger, thirst, pain, etc.
Emotions, like words, may likely derive themselves from these basic human drives, but this is a hypothesis for another post ^_^.
It takes a human several years of seeing, hearing, and feeling to develop a college-level ability to use language. Perhaps, without running multiple sets of sensors in parallel, a machine would take a similar amount of time.
As a final side claim, which I shall return to: anaphora is simply the ability to consciously make queries within one’s index of objects and their relationships.
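A toy sketch of that claim, with invented entries: resolving “it” amounts to scanning the index of recently mentioned objects, most recent first, for one that satisfies the relation’s constraint:

    # Toy sketch: resolving "it" by querying the index of recently mentioned
    # objects, most recent first. The entries and attributes are invented.
    discourse = [
        {"id": "apple_1", "type": "apple", "edible": True},
        {"id": "table_1", "type": "table", "edible": False},
    ]

    def resolve(constraint):
        # Scan from the most recent mention backwards, return the first match.
        for obj in reversed(discourse):
            if constraint(obj):
                return obj["id"]
        return None

    # "I put the apple on the table and ate it." -> query for something edible.
    print(resolve(lambda o: o["edible"]))  # apple_1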