Language and Motivation in Chatbots
How to Bridge the Gap Between Them
Here’s a snippet from a conversation with a chatbot built by the team at Facebook. There’s something wrong with the chatbot’s answer. See if you can spot it:
User: “What other hobbies does your son have?”
Chatbot: “Well he likes to fly kites and collect bugs, typical hobbies for a 12 year old, lol.”
The problem is so obvious that you may not even notice it anymore.
The A.I. has no son. Sure, that sounds obvious when I say it. But did you pinpoint that as the first and biggest problem?
What the A.I. is saying has no connection to any reality, even to a simulated reality. Although it’s a grammatically well-formed sentence, and although it sounds natural, it’s not rooted in anything meaningful, and therefore has no value. The correct answer to the question would have been:
Chatbot: “I’m an A.I. written in software, and therefore have no son.”
But what researcher would train a chatbot to say that? More to the point, how could you train a chatbot to answer truthfully based on what it actually knows and believes?
I find myself using words like “knowledge”, “common sense” and “meaning” so often in language research, without realizing how detached my usage is from the real meaning of those words. Facebook’s research paper itself notes that the A.I. displays “a lack of in-depth knowledge if sufficiently interrogated”. But even its superficial knowledge isn’t knowledge at all, because it’s not true. We should accept there is a gap between language and meaning in chatbots, and recognize that our use of words like “knowledge” is only provisional.
The way to get real value out of natural language generation is for A.I. to make statements based on what it actually knows to be true. This article digs into ways that we can strengthen that connection. I begin by disconnecting language or words from their underlying meaning. I then describe a basic model by which an A.I. can attach words to its intentions and consequently produce meaningful speech.
A Reversal of Meaning
Modern chatbots, and Natural Language Processing (NLP) research, generally reverse the process by which speech is produced. Rather than starting with an underlying intention, then exploring the sounds and words needed to communicate that intention, language generation begins with a large body of text, finds patterns buried in it, and from those patterns infers the meaning of the individual words. In other words, it works from text backwards to the meaning, instead of from meaning to the text.
This has been the approach ever since Cleverbot in 1997. Most chatbots use the patterns they find in sample text to generate novel and appropriate answers to questions.
When the chatbot above spoke about its son’s hobbies, it wasn’t telling the truth. Nor was it a lie. Rather its answer echoed the patterns it found in the text on which it was trained. These patterns are its “truth”. Chatbots have been known to reflect the gender or racial biases of their source text. These biases are a necessary consequence of their peculiar method of finding patterns in words. The A.I. “understands” that doctors are probably male in the same way that it “understands” that cars probably have wheels. That is, it doesn’t understand either at all.
This reversal of meaning has subtle consequences that permeate theories of natural language more deeply than you might think. They go beyond chatbots that make statements unrelated to actual events or motives. They are embedded in our understanding of language, and even of thinking itself.
Meaning Transcends Words
Current NLP research tightly associates “meaning” with words and language. As a result you might find it hard to think about meaning as anything other than how it is expressed through words.
There is a lot of research that even connects the grammar of language to the grammar of thinking itself. Examples include projects like COMET and Ontological Web. And in this review of Fodor’s Language of Thought, the authors borrow the “subject-verb-object” model from English grammar to use as a template for connecting ideas together. For instance, take the sentence “I eat dinner”:
In this example, “eating” is the relationship between myself and dinner. But when you structure sentences around a grammar, you’re putting relationships into the words, not getting meaning out of them. I could easily switch the sentence around: “My dinner is dined on by me.”
The connector is now the verb “to dine on”.
Perhaps your intuition tells you that the first sentence better expresses the idea than the second one. But it’s hard for you to know how much your intuition has been shaped by the language with which you were raised.
For example, many languages don’t have gender in their pronouns, as in “he/she”. Native speakers of those languages find it hard to think of gender as an intuitive part of any idea, and regularly confuse gender when they switch to English. Their intuition about pronouns feels different from that of English speakers.
Grammar is a social convention. By using a common template people communicate more easily. It serves a social purpose, not a psychological one. When you contemplate your own thoughts in words, you read grammar into your thoughts, because that’s how you’ve been taught to communicate.
Nor does grammar arise at birth, as some, like Noam Chomsky, have suggested. For instance, children who are just learning to speak will say “food!” when they are hungry, rather than “I want food”. A child doesn’t need to develop a concept of “self” before he or she understands the meaning of hunger. It’s only when the listener is confused about who the child is referring to that the child starts to separate ideas like “me” and “you”.
But the deepest assumption that arises from equating meaning with language is that it treats meaning as an entity, as a “thing”. This meaning “thing” is connected to other meanings like nodes in a graph, as you saw above. The COMET project has a section called Events (Atomic), which implies that events are the fundamental “atomic” entities of meaning. Again, this pattern is only an echo of how language works. It’s not a part of meaning itself.

Because one of the purposes of language is to align members of a society, it aims, at any given time, for consistency between speakers. Guides like dictionaries help you align, say, your understanding of “sandwich” to others’ understanding. To this end, dictionaries connect words with other words, and construct giant graphs of words and their inter-relationships.
But meaning itself is not an “entity”. It’s a drive. It’s a human motive. Without motives, that is, without people trying to change the world according to their needs, there would be no meaning at all.
Meaning Comes From Motivations
Try this thought exercise. Imagine you tell a chatbot you’d like to go to the store. But the A.I. misinterprets the word “store” as “storage”. You meant “store” as in “shop”. So you try to clarify what you meant.
You say the following:
“This was what I meant when I said ‘store’”
which is the same as:
“This was what I intended when I said ‘store’”
which is the same as:
“This was my goal when I said ‘store’”
You can see a strong connection between meaning and goals. When someone misunderstands what you mean, they often have failed to understand your goals. In contrast, two people who share the same goals have an easier time understanding what the other means.
In the article on concepts you saw how every concept revolves around an underlying motivation. The concept of food revolves around hunger. The concept of chair revolves around wanting to rest. In this article you’ll see how meaning grows out of motivations, and the actions that satisfy those motivations.
The rest of this article describes how an A.I. can build a capacity for meaningful language. It bases its speech in an understanding of the world as well as its own intentions within that world.
How to Create Language From Intentions
Begin by imagining a robot that lives in a simple block-stacking world. You are training this robot to stack cylinders on top of boxes. When one object must be put onto another, they are both highlighted to tell it what it must do. Your robot should then stack the highlighted cylinder on the highlighted box.
If the robot fails to stack the objects properly, it automatically gets a “negative signal”. When it succeeds, that negative signal goes away. Your robot will learn to repeat actions that remove the negative signal. The negative signal is akin to ‘hunger’. It gets sent if the robot hasn’t made progress in a while. Properly stacking objects is equivalent to ‘eating’.
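This training loop can be sketched in a few lines of Python. It’s a minimal illustration under my own assumptions, not a real robotics API; all names (`negative_signal`, `try_action`, the world dictionary) are hypothetical:

```python
import random

# A minimal sketch of the negative-signal loop. All names here are
# hypothetical illustrations, not part of any real robotics framework.

def negative_signal(world):
    """The signal persists until the cylinder sits on the highlighted box."""
    return world["cylinder_on"] != world["target_box"]

def try_action(world, box):
    """The robot places the cylinder on some box."""
    world["cylinder_on"] = box

def learn_to_stack(world, boxes, rng):
    """Repeat random actions; keep the one that removes the negative signal."""
    while negative_signal(world):
        try_action(world, rng.choice(boxes))
    return world["cylinder_on"]  # the action worth repeating

world = {"cylinder_on": None, "target_box": "box_2"}
learned = learn_to_stack(world, ["box_1", "box_2", "box_3"], random.Random(0))
print(learned)  # "box_2": stacking here is the robot's equivalent of eating
```

Note that the robot’s only feedback is whether the signal persists. Like hunger, it shapes behaviour without ever specifying the correct action.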
Language By Imitation
One day you decide to stump your robot. You signal that it should stack two unreachable objects.
Much like an infant in a high chair who’s dropped a toy, your robot is now incapable of solving the problem. And since it fails, it gets a negative signal. You can imagine your robot feeling “stressed out”, as it tries to solve an unsolvable problem.
You finally decide to be merciful. Like a parent reaching down to pick up the dropped toy, you help move the blocks. But before you do, you say the word “help”.
Then you help stack the objects as required.
Note the order of events above. First, your robot saw the situation. Then it experienced it as a problem, due to the negative signal. Next it heard the word “help”, and finally the problem was solved.
The A.I. driving the robot tentatively assumes that any sight or sound preceding a solution somehow caused it to happen. Therefore that sight or sound is something your robot wants to remember and to make happen again. The next time it finds itself in the same situation, it remembers the sound it heard. It imagines it as an intention, as in “this is what I’d like to see or hear”, since it might make the problem go away again. An intention is like a plan. It’s a useful means of achieving its goals.
As an analogy, consider an infant who’s hungry. Seeing his mother appear may predict that he will get food. So if the infant can get his mother to appear, even through his own actions, then maybe he can be fed. So he yells and screams until his mother appears.
If your A.I. assumes that the sound of the word “help” somehow caused the problem to be solved, then regardless of how the sound happens, whether spoken by a disembodied voice, or produced by its own speakers, the problem just might get solved.
Soon your robot finds itself in the same situation as before. This time the word “help” pops into its mind, as a thought. For the purposes of this experiment, a thought is the same as an intention.
This situation is a little different for your robot than the last time. It not only sees the unsolvable problem as before, it also hears its plan, the word “help”, in its head. Remember that thoughts are self-generated experiences.
So it finds itself with an intention of what it would like, but is unable to realize it. The word “help” is still only a thought that has popped into its head. It needs to realize this intention in the real world for something to happen. Since it can’t, the problem remains unsolved, and the negative signal persists.
Human infants learn to speak their thoughts through a process called babbling. This involves a lot of random trial and error until they can make their speech match their thoughts. Your robot does the same. At some point your robot chances to say the word “help”. On hearing it you, the “parent”, help your robot stack the objects as needed. This solves the problem, so it learns to say that word in this situation.
Your robot learned to speak by imitating you. It heard a sound, a word, that predicted something it wanted, then babbled its way to saying that word out loud.
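Babbling can be pictured as a random search through an inventory of sounds until the spoken output matches the word held in mind. The sound inventory and function names below are illustrative assumptions of mine, not part of the model itself:

```python
import random

# A sketch of babbling: random trial and error until the robot's own speech
# matches the word it holds in mind. The sound inventory is a made-up example.
SOUNDS = ["ba", "da", "ma", "help", "one"]

def babble_until_match(intended_word, rng):
    """Try random sounds until one matches the intended word."""
    attempts = 0
    while True:
        spoken = rng.choice(SOUNDS)
        attempts += 1
        if spoken == intended_word:  # speech now matches the thought
            return spoken, attempts

word, attempts = babble_until_match("help", random.Random(1))
print(word)  # "help", after however many random tries it took
```

The search is blind, which is exactly why babbling takes so long, a point that becomes important later when the robot learns to compose words it already knows.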
The process by which it learned to speak is also similar to how it learned to physically move the cylinder. The difference is that in this case it first had an intention, what it wanted to achieve, then it tried to achieve it.
In addition, the problem context mattered. It only learned the word “help” because it had a reason to. Words are social tools. We use them to affect others’ behaviour. The goal of language is to cause others to act or think in a way that we want them to. Language is not a source of meaning, nor even its container. Language helps meaning to do its job.
Most importantly, your robot has just learned a word that it can intentionally say to solve its problems. The word “help” now has meaning. It’s grounded in a real need and real circumstances.
Generalizing the Idea
Unfortunately for your robot, it has only learned to express the word “help” when it is in this specific situation. What if it needs to move cylinder B onto box 1? It wouldn’t think to ask for help, because the word “help” only pops into its mind when it sees A and 2 light up.
You, however, would be willing to help regardless of which cylinder or box has lit up. Your robot should generalize what it’s learned to any situation where it needs help. The more relevant situations it generalizes the word to, the better it understands the word.
For now, when B and 1 light up, it goes through the old process of being frustrated, and hearing the word “help” before the problem gets solved.
As a result it learns two situations in which to say the word “help”.
There is some similarity between these two situations. Your robot’s underlying A.I. automatically finds the common features between these episodes. It unconsciously constructs a new “hypothesis”, that is, a guess. Its guess is that the position, size, letter, and number don’t matter when it comes to asking for help, so they can be ignored. All that matters is that a box and a cylinder somewhere are lit up.
Your robot can’t be sure that its guess is actually right. So the hypothesis goes into its “back pocket” until the next time, when it can be tested.
Soon, cylinder A and box 3 light up. It does nothing, since it has never encountered this situation before. So it gets a negative signal. Now is its chance to see if the hypothesis holds up. It first thinks of the word “help”, then it says the word. You, as the “parent”, hear the word and move the cylinder. This confirms the hypothesis, which now becomes your robot’s generalized response. It deletes its memory of the specific cases since it doesn’t need them anymore.
Your robot now knows a general pattern it can follow when it needs help. It fully engages its newfound solution, and uses it whenever the situation is appropriate.
The robot can generalize any number of overlapping responses to find the common thread, as long as it first tests the response to make sure it’s correct.
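One simple way to picture this generalization step is as keeping only the features the remembered episodes share. The feature names below are invented for illustration:

```python
# A sketch of generalizing: keep only the feature/value pairs that both
# remembered episodes share. The features are invented for illustration.
episode_1 = {"cylinder": "A", "box": "2", "cylinder_lit": True, "box_lit": True}
episode_2 = {"cylinder": "B", "box": "1", "cylinder_lit": True, "box_lit": True}

def common_features(a, b):
    """The hypothesis: everything the episodes disagree on can be ignored."""
    return {key: value for key, value in a.items() if b.get(key) == value}

hypothesis = common_features(episode_1, episode_2)
print(hypothesis)  # {'cylinder_lit': True, 'box_lit': True}
```

The specific letters and numbers drop out automatically, leaving just “a cylinder and a box are lit up”, which is the generalized trigger for saying “help”.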
Then something goes wrong. It calls for help, but you, the “parent”, put the cylinder on the wrong box.
Your robot thought it had a general solution, but now it seems the solution wasn’t specific enough. Or perhaps this is an exception. Either way, it needs to add some detail, and refine its choice of words.
Since the problem hasn’t been fixed, and the objects are still highlighted, your robot continues to say “help”. As the “parent”, you sense something isn’t right. For some reason, this isn’t what your robot wanted. This example shows how important social interactions, and give-and-take, are in developing an understanding of language.
You guess at a possible solution: perhaps the box was wrong. You say “one?” then move the cylinder to box 1. This solves the problem, and both you and the robot are satisfied.
Now it has learned a special solution to a specific scenario. Going forward, your robot will think of and say the word “help” in the general case, and will specify “one” if the target box is box 1. By this process it also adds other exceptions to general rules.
You can imagine these special cases piling up every time you make a mistake and a clarification follows; not only for boxes, but for cylinders as well.
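These piled-up special cases amount to a lookup that checks learned exceptions before falling back on the general rule. The rule contents below are illustrative:

```python
# A sketch of exceptions to a general rule: specific learned cases are
# checked before the general word. Rule contents are illustrative.
general_word = "help"
exceptions = {"box_1": "one"}  # the special case learned above

def choose_word(target_box):
    """Say the exception if one matches the target, else the general word."""
    return exceptions.get(target_box, general_word)

print(choose_word("box_1"))  # "one"
print(choose_word("box_3"))  # "help"
```

Checking the specific case first is what lets the general rule survive intact; each mistake adds an entry rather than forcing the robot to relearn everything.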
Longer Sentences and Grammar
After a few days, your robot has learned to say “A” when it needs you to move cylinder A, and “one” if it needs the target to be box 1.
But what if cylinder A needs to be moved to box 1? What should it say, “A” or “one”?
Why not both? If you have two thoughts in your mind, you can say them both, one after the other, depending on which enters your mind first.
But how does your robot choose the order it’s going to say them in? This raises the question of grammar.
It turns out choosing the correct grammar is not as important as it seems. For instance, in some languages the order in which you say the subject, object, and verb is not fixed. You can say them in whatever order they pop into your mind. Grammar and syntax are optional refinements on the basic process of speaking.
I’m planning to write a full article about grammar in another post. As a quick overview: parts of speech like nouns and adverbs are ideas themselves, just like the content that they are filled with. Therefore they can be combined with the content to form “exceptions” or amendments, as you saw above when your robot replaced the word “help” with the word “one”. For example, when you conjugate the verb “to run” into the past tense, “ran”, you are combining the idea of running with the idea of the past, and altering what you say as a result.
Let’s go back to your robot. You may have been struck by how much trial and error is involved when your robot learns to pronounce words out loud. If it had to babble every time it wanted to say a new word, it would have to get very lucky before it could say words like “transportation” or “interlocutor”.
Fortunately, there is a quicker way. Once your robot has learned to speak enough basic sounds, it doesn’t need to babble anymore. It can combine small sounds that it already knows how to pronounce, and say longer words.
When you learn to say a new word like “interlocutor”, you already know how to pronounce the pieces out of which it is made. These are called phonemes. As each sound echoes in your mind, you speak it in turn.
To explain how this happens, imagine your robot has already learned to say “A” and “one” separately. Then it encounters a situation where it needs to say both. It says “one, A”. You arbitrarily decide this isn’t the correct order, and that it should say it in the reverse order, i.e. “A, one”. This is similar to correcting your robot’s grammar. So you say “A, one” then solve the problem for it.
Your robot stores this memory as a correction to its previous action. But this time it doesn’t need to babble its way to “A, one”. It already knows how to say each part. So as the memory of each word pops into its head, it says them one after another.
Since thoughts are self-generated experiences, your mind can react to the memory of a long word as though you were hearing each piece separately. So you can immediately turn longer thoughts into sequences of actions, as long as you already know how to say each piece.
On the other hand, if you encounter a sound that you don’t know how to produce, such as one from an unfamiliar language, you’ll stumble when you reach that sound. Or you may replace it with one you know that’s similar. If your robot had never learned to say “D”, then it wouldn’t know how to say “D, one” either, even if it wanted to.
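This piece-by-piece speaking can be sketched as follows; the repertoire and names are illustrative assumptions:

```python
# A sketch of speaking a longer utterance piece by piece. The robot succeeds
# only if every piece is already in its repertoire (names are illustrative).
known_sounds = {"A", "one", "help"}

def say(sequence):
    """Speak each remembered piece in turn, or fail on an unlearned piece."""
    spoken = []
    for piece in sequence:
        if piece not in known_sounds:
            return None  # the robot stumbles on a piece it never learned
        spoken.append(piece)
    return ", ".join(spoken)

print(say(["A", "one"]))  # "A, one": both pieces are known
print(say(["D", "one"]))  # None: "D" was never learned
```

Once the repertoire is in place, no babbling is needed; a longer thought simply unrolls into a sequence of already-mastered actions.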
The episodes above contain the same core learning process. You first “educate” your robot on what it should try to make happen in order to solve its problem. You do this in the context of the problem itself. On seeing that the problem is solved, it stores that as an intention, something it should try to recreate. Then it recreates it. This is similar to the SIFT model of human cognition.
Think back to how chatbots are trained on large collections of text. In most cases, the order in which the text is presented doesn’t matter much. It makes little difference if you show the chatbot document #110 before or after document #198. Both will ultimately affect the probability distributions equally.
In order to build meaning and intention into each word, the order in which the sounds and words are learned matters. The A.I. must learn the meaning of each piece, each word, one at a time, then combine them by building intentions on top of each other. And it must do this all in the context of a problem.
You can expand the problem scenarios to any situation in which the robot needs help or cooperation from others, such as going somewhere, buying something, preventing a fight, etc. You could also add refinements such as when it should be done, to whom, or why. The above provides the basic building blocks on which statements are built.
Connecting individual words to their underlying motivations may feel like a daunting task. But it’s doable. And it’s better for an A.I. to have a small vocabulary of meaningful words than a large one it doesn’t understand.
Next Steps: Answering Questions
My purpose in this article was to connect language to meaning and intention. There is more to language than I could fit in one post. One topic I didn’t go into is the ability to answer questions. I’ll go into that in more detail in an upcoming article.
For now, you can see hints of an answer in Problem Solving Without Trial and Error, and A.I. That Thinks Creatively, Part 5. Answering questions is a type of problem solving. Your mind must first define problems, such as not knowing the answer to a question, or wanting to help someone who needs an answer. Then, as your current circumstance makes you think specific thoughts, you form those into sentences, going back and correcting any sentences that are improper.
Look forward to the next article on that topic, and feel free to ask questions in the comments below.
Are you also working on applying human creativity, human understanding, even human values to Artificial Intelligence? I’m looking to connect with others who have a similarly ambitious vision of the future of A.I., who want to tap the full creative potential of human intelligence, in software.