Supervised Learning and The Semantic Tower
Let’s do a thought experiment. Think of something outlandish that you have never experienced. Describe it to yourself, your wife, your kids (or in my case, my cat). I am 100% sure that you will be able to do so in words or concepts that you are familiar with. Now, think of any novel, movie, or class that you’ve never read, seen, or taken. You were never trained (a la “machine learning”) to understand these works. Yet you are able to comprehend them. How is this possible?
To answer the above question, we must go back to what happens in early childhood. While it is true that our brains have an innate ability to learn, we also teach members of our species (intelligence teaching intelligence, billions of us). The key to the success of this process is the “primary caregiver”: the mother and other members of the immediate family, and later, social proxies. We humans have all been subjected (and continue to be subjected) to an intense, continued process of supervised learning and reinforcement learning by other members of our species.
This process has an innate roadmap. You first learn words that are related to your direct internal and external experiences. These are experiences with people and objects, thoughts important to you as a child, “mama”, “papa”, “booboo”, “daddy”, etc.
Unlike our electronic counterparts, nobody is going to upload to you Newton’s “Philosophiae Naturalis Principia Mathematica” or Bertrand Russell and Alfred North Whitehead’s “Principia Mathematica” or Marcel Proust’s “In Search of Lost Time” and ask you to do a vector representation, word embedding, similarity search or anything of the sort. Further, you will be given 15–20 years of supervised (and reinforced) training before you run the chance of being asked to say something intelligent about such texts. Nature has a head start of a couple of billion years, yet this is the best that Nature came up with: intelligent entities raising other intelligent entities and creating social networks, culture, and ideologies to do such training.
If this is the best Nature can do with a couple of billion years of genetic evolution, who are we to think that in the short flicker of our existence we would so quickly do better?
Let’s return to the baby and his/her primary caretaker. Early on, the baby picks up simple words from Mom and other members of the family and very quickly learns about object permanence (time). Mom and other significant objects do not disappear when you don’t see them. Using its rudimentary internal system of symbols and language, Baby can think about things, people, objects that are not part of its immediate sensory vicinity and in the now (take that Generative Adversarial Neural Network!). Eventually, the child can sequence the system of symbols at her disposal to create new thoughts, new inner experiences that do not correspond to anything she has yet experienced or will ever experience.
So any emerging human intelligence seems to have the ability to generate new sequences of symbols (thoughts) using the symbol set at its disposal. Some of these thoughts may or may not correspond to what is happening in the here and now (or has happened in the past or will happen in the future). As a side effect of, and along with, these constructs, the child starts to fashion a concept of time and space, and that Mom is permanent in time. Mom does not stop existing when you don’t see her. Until the baby learns object permanence, she is going to panic and cry her lungs out when Mom disappears.
Universally, as we develop language, the human mind always starts with the same semantic primes and universals (see Anna Wierzbicka, Semantics: Primes and Universals). No matter where we are on this planet and what language we speak, all cultures develop the following palette of semantic primitives. (The author has proposed a more extensive set since her earlier work, but the original set will do just fine.) See table below.
To my computational linguistic brethren, please note the presence of stopwords in the list above.
Starting from this layer of semantic primitives, we can bootstrap ourselves to talk about anything. We can discuss things we have experienced, as well as things of which we know very little. Let me illustrate with a short story. Before my then infant son learned to speak, he used to cry a lot. I noticed that when I walked into his room he would scream louder. This continued for some time until he started to develop a limited vocabulary. One night, I came to his room to console him. He was looking very angry and he told me, pointing to my bedroom, “Not you, the other one!” Amazed, I rushed back to my wife to tell her what had just happened.
The astute reader will quickly notice the use of semantic primitives in my son’s utterance. Please note the inclusion of “stopwords”. Evolutionary biology at its apex is saying these constructs (semantic primitives and stopwords) are necessary (at least for all human language and mind constructs).
Based on my observation and that of billions of other parents, I propose that, for any machine to be claimed to be even an emerging intelligence, it must first be able to manipulate the above semantic primitives. It should be able to create more complex semantic structures using a semantic tower based on the above. The generated thoughts do not have to be true, but they have to make sense (like fiction or a story makes sense). Such thoughts may or may not correspond to an existing reality, past reality, or future reality. It should be able to distinguish which of its thoughts correspond to some external reality and which thoughts are totally its own fancy. Such intelligence should be endowed with primary caretakers to supervise its intellectual (and emotional?) growth based on social norms. And for the sake of human survival, all such intelligence should be hardwired with Isaac Asimov’s Three Laws of Robotics.
Let us revisit my initial question: how do you speak about something that you’ve never experienced before? You use your semantic tower plus analogy. Even if you can’t say much, you can always say “there is something, or a thing that …”. In philosophy, this approach would be considered a reification. I propose that without reification, intelligence cannot learn anything new. Let me illustrate with another story from watching my son grow up. I took my very young son to the park where he saw geese swimming in a pond. He got excited and yelled, “Look, Daddy! Swimming birds!” My son was having a new experience: “geese swimming on a pond”. He did not yet have the concept of goose, but he had the concepts of bird and swimming, so he explained his new experience by attaching a descriptive attribution, “swimming birds”, to the observed entity. On another occasion, while I was making a music video website, my son saw a video that he liked, pointed, and said, “Dad! Singing movies!” Please note that in all these exchanges, both son and dad assume shared intentionality and the use of hand gestures (I will address these concepts in a future article). We expect that the interlocutor can share intent and observation. And we speak with hand and body gestures. Hand and body gestures are pre-verbal, and we can’t speak without them (even when we are alone on the phone and nobody can see us).
Some version of this experience with my son has been repeated billions of times since humans began to communicate with each other. All our names/nouns in any language show archeological traces of such early encounters in human history. “Arald”, my first name, literally means “leader of an army, heroic leader”. As the link relates, the name derives from the Ancient Germanic name “Hariwald”, composed of two elements: “harjaz” (army, army leader, commander, warrior) plus “waldaz” (ruler, might, mighty one, power, powerful one). I am definitely not an army commander; I don’t know what my father was thinking (a Haitian teacher reading Icelandic sagas in the 1950s).
The name of my dear friend Ramu means “Lord Vishnu”. Most nouns (whether proper or common) are actually descriptive phrases. If I say “calculus”, for example, most people will think of some very abstract and hard field of math. But what if I tell you that in Latin, calculus means “pebble”, and that the ancient Romans used pebbles to do addition and subtraction on a counting board? Now you understand better how Newton and Leibniz created their own semantic towers.
From the above, we have a way for artificial intelligence to be able to name things, even things that it has never encountered. Just as we humans do, machine intelligence can use descriptions to name things and concepts. Eventually, at the root of all these descriptions are primary semantics corresponding to basic experiences. With any artificial intelligence mimicking human intelligence, there has to be a set of semantic primes to which every described experience can be potentially reduced.
How can we apply this to real-world problems? While I cannot do a full-blown treatise on AI/ML here, I would like to share an example of how some of the ideas discussed above might be heuristically applied to the subject of an IT help desk chat encounter.
First, channeling Anna Wierzbicka, what is a (potential) list of semantic primitives for help desk encounters? What follows is a candidate list for the sake of discussion. Basically, when we call an IT help desk, we ask about the company’s catalog of internal services and assets that we use as employees. It is a very limited language; we ask for and about things and services in the catalog. We call to report that those things and services are broken, to get them fixed. We call to say we don’t want them anymore or we want to recover access to them, etc. The table below tries to come up with a candidate list for a chatbot or other machine that we want to teach such a language. I am not being complete here, just illustrative.
How can such a table be used? You can write code (and I did) that reduces the description field of a help desk encounter to a semantic encoding based on the above.
You can achieve this by leveraging regular expression transformations (I think of it as a sea of regular expressions). You can also use BERT clustering to feed your pipeline, e.g., cluster together all variations of utterances of “Requesting AD Password reset”, etc. Think of this approach as me teaching the “Jack and Jill …” of help desk utterances to my linguistic model.
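To make the regular expression idea concrete, here is a minimal sketch in Python. It is not my production code: the three rules and the encoding strings (such as `CALLER CANNOT LOGIN`) are hypothetical stand-ins invented for this illustration, and a real system would need hundreds of such rules.

```python
import re

# Hypothetical rules for illustration only. Each pairs a regular expression
# with an encoding built from made-up primitives; neither comes from the
# candidate table in this article. Rules are tried in order, first match wins.
RULES = [
    (re.compile(r"\b(reset|forgot|forgotten)\b.*\bpassword\b"
                r"|\bpassword\b.*\b(reset|expired)\b", re.I),
     "CALLER WANT RESET PASSWORD"),
    (re.compile(r"\b(can\s?not|can'?t|unable to)\s+(log\s?in|log\s?on|sign\s?in)\b", re.I),
     "CALLER CANNOT LOGIN"),
    (re.compile(r"\b(need|request(ing)?|want)\b.*\baccess\b", re.I),
     "CALLER WANT ACCESS"),
]
UNKNOWN = "UNKNOWN"


def normalize(text):
    """Fold curly apostrophes to ASCII and collapse runs of whitespace."""
    text = text.replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def encode(description):
    """Reduce a free-text description to the encoding of the first matching rule."""
    text = normalize(description)
    for pattern, encoding in RULES:
        if pattern.search(text):
            return encoding
    return UNKNOWN


if __name__ == "__main__":
    for utterance in ["I can\u2019t log in to my laptop",
                      "Requesting AD Password reset",
                      "The printer is on fire"]:
        print(utterance, "->", encode(utterance))
```

Anything that falls through to `UNKNOWN` is a candidate for a new rule, which is how the rule set grows over time.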
What do you gain from the above? If you put aside the clumsy symbols that I am using, these encodings look suspiciously like intents that you can use in a chatbot (NLU). With sufficient development, such a model may discover new intents from new sentences that it never saw before (remember the semantic primitives). Since one can map a wide variation of utterances to the same intent, you have a new way to do classification.
Let’s say there are some 20,000 to 60,000 utterances that map to, say, 5 to 1,000 of these intents (encodings), and an analyst wants to map them to classes of issues. Labeling becomes a much cheaper task: label the encodings, and then either train a model that maps each encoding to its label or, by composition, map the original utterances to the label. See figure below.
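The composition step can be sketched in a few lines of Python. Everything here is assumed for the sake of illustration: the five utterances, the encoding strings, and the label names are hypothetical, and the lookup table stands in for whatever model you might train on the labeled encodings.

```python
from collections import Counter

# Hypothetical (utterance, encoding) pairs, as an encoder might emit them.
ENCODED = [
    ("I can't log in to my laptop", "CALLER CANNOT LOGIN"),
    ("unable to sign in to VPN", "CALLER CANNOT LOGIN"),
    ("Requesting AD password reset", "CALLER WANT RESET PASSWORD"),
    ("forgot my password again", "CALLER WANT RESET PASSWORD"),
    ("need access to the shared drive", "CALLER WANT ACCESS"),
]

# The analyst labels each distinct encoding once (label names are illustrative).
ENCODING_LABELS = {
    "CALLER CANNOT LOGIN": "authentication",
    "CALLER WANT RESET PASSWORD": "authentication",
    "CALLER WANT ACCESS": "access-request",
}


def label_utterances(encoded, encoding_labels, default="unlabeled"):
    """Compose utterance -> encoding with encoding -> label."""
    return [(utterance, encoding_labels.get(encoding, default))
            for utterance, encoding in encoded]


def labeling_effort(encoded):
    """Return (number of utterances, number of distinct encodings to label)."""
    return len(encoded), len({encoding for _, encoding in encoded})


labeled = label_utterances(ENCODED, ENCODING_LABELS)
counts = Counter(label for _, label in labeled)

if __name__ == "__main__":
    print(labeling_effort(ENCODED))  # utterances vs. encodings the analyst labels
    print(dict(counts))
```

The analyst’s effort scales with the number of distinct encodings, not with the number of utterances, which is where the savings come from.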
Since I can reduce a long description field from a help desk interaction or an incident to such a primitive phrase, I have a way to summarize extremely short texts (1 to 5 sentences), where I can’t rely on frequency or statistics. Since this is basically a deterministic approach, you don’t have to worry about bad predictions of minority classes (except for the fact that there may be a lot of them and you need time to treat them).
Each encoded phrase carries its own Q&A structure. A good student of compiler and interpreter construction can write an interpreter that can do the following. Let’s say you have a Help Desk interaction description “SDI123”,
‘I can’t log in to my laptop’
It is encoded as:
The helpdesk person asks: “Hey Chatbot, what is the issue with Help Desk Interaction SDI123?”
Chatbot responds: “Caller cannot log in.”
Helpdesk person asks: “Caller cannot log in to what?”
Chatbot responds: “Caller cannot log in to his laptop”
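Here is a toy version of such an interpreter in Python. The encoding for SDI123 is not reproduced in this text, so the `Encoding` structure below (a subject, a predicate, and named complement slots) is an assumed stand-in of my own choosing for this sketch, as are the `INTERACTIONS` store and the `answer` function.

```python
from dataclasses import dataclass


@dataclass
class Encoding:
    """Assumed stand-in for an encoded phrase: subject, predicate, complement slots."""
    subject: str
    predicate: str
    complements: dict  # slot word (e.g. "to") -> filler (e.g. "his laptop")

    def brief(self):
        return f"{self.subject} {self.predicate}"


# Hypothetical store of encoded help desk interactions.
INTERACTIONS = {
    "SDI123": Encoding("Caller", "cannot log in", {"to": "his laptop"}),
}


def answer(interaction_id, question):
    """Give the brief issue, or fill a complement slot on a '<slot> what?' follow-up."""
    encoding = INTERACTIONS.get(interaction_id)
    if encoding is None:
        return "I have no record of that interaction."
    q = question.lower().rstrip("?").strip()
    for slot, filler in encoding.complements.items():
        if q.endswith(f"{slot} what"):
            return f"{encoding.brief()} {slot} {filler}"
    return encoding.brief()


if __name__ == "__main__":
    print(answer("SDI123", "Hey Chatbot, what is the issue with Help Desk Interaction SDI123?"))
    print(answer("SDI123", "Caller cannot log in to what?"))
```

The point of the sketch is that the follow-up question is answered from the structure of the encoding itself: each unfilled slot in the brief answer is a question waiting to be asked.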
What is the downside of this approach? No one gets something for nothing, not even for the BERT and the GPT-# models. (The last I heard, these models were using trillions of parameters, and we are just at the beginning of the growth curve).
You cannot use computational numerical algebra as a silver bullet. You need people who have deep knowledge of languages, linguistics, and semantics, and who are obsessed with simplifying a sentence to its bare essential semantic structure (just as a mother would do for her young child). What is the profile of such a computational linguistic “Mom” or primary caretaker?
1) She must be able to write hundreds to thousands of regular expressions to simplify each description to its core primitive form.
2) She must love human languages and their grammar, not just algebraic spaces, differential calculus, and numerical methods (although the latter field can help a lot, we are not purists here).
3) As a corollary of item (2), she will go back to study the old linguistic masters like Tesniere, Saussure, Chomsky (the linguist in him), etc. In my pilot project above, I have implicitly leveraged Tesniere’s idea of the valency of verbs, which can help in the resolution of semantic ambiguity and anaphora. Take what you can learn from the Ancients and be brutally practical: you want results, not ideological purity.
4) You must have a strong penchant for editorializing. In a way, what you are doing is editorializing to a semantic encoding (using regular expressions to the wazoo)!
5) Along with item (4), it helps if you prefer Emily Dickinson’s poetry and Ernest Hemingway’s and Jack London’s writing styles to Marcel Proust’s. Then you are mentally wired for item (4).
6) Remember, you are not covering the whole spoken language. No human being does or can do that. You only need to cover what is relevant to the conversation of your professional village/tribe or your domestic village. How many distinct phrase bites do you use in a day? How many of them are semantically equivalent? Of course, you must be able to extend your vocabulary to new ideas and phrases as they come. Focus on core and be ready to extend as needed.
That’s what we humans do as an intelligent and very verbal race: we know how to bootstrap ourselves up the semantic ladder, and so far it is working very well.