Nuanced Language in AI and the Complexity of Human Communication

Sam Bobo
Speaking Artificially
6 min read · Aug 29, 2023
Imagined by Bing Image Creator — Powered by DALL-E

Communication is difficult, even within one’s own native tongue! Whether in a casual conversation, a formal document, or anything in between, across written and verbal modalities alike (including non-verbal gestures), finding the correct words and phrases can be difficult.

Attorneys spend hours crafting language that is precise, unambiguous, and binding; storytellers use hyperbole and metaphors to paint pictures in your mind and draw similarities to daily life and experiences. Words inherently carry power and meaning. Even synonyms can mean different things depending on the context or on the shades of meaning within the words themselves.

Reality is, language is nuanced and communication among humans can be complex. This is the long-standing challenge facing researchers and engineers in the Conversational AI space.

There is immense hype around Generative AI: the ability to give large models a prompt and get a desired output, whether a blog post, a mortgage preapproval letter, marketing copy, or more. The biggest challenge, however, is the aforementioned nuance in our language. I’ve long argued that, while AI will certainly catalyze productivity with generated content, it’s always a first draft that solves the blank canvas problem. Humans should always curate thereafter because, after all, communication is a craft. Furthermore, this statement does not even begin to address translation among languages, which requires capturing the nuance in one language and carrying it into another (with the target language’s own nuances), along with the accuracies and inaccuracies that arise there.

So, what are some of the nuances of the English language that challenge modern Conversational AI engines today?

  1. Connotation and denotation: Connotation refers to the implied or emotional meaning of a word, while denotation refers to its literal or dictionary definition. For example, the word “home” has a positive connotation of warmth, comfort, and belonging, while the word “house” has a neutral denotation of a building where people live.
    Generative AI engines utilize probability and statistics to predict the next series of words. Depending on the underlying Foundation Model and transfer learning done on top, the model may not take into consideration careful diction and word choice that could have a significant emotional impact on the message being communicated to the intended end-audience.
  2. Register and tone: Register and tone are aspects of language that reflect the level of formality, attitude, and emotion of the speaker or writer. Register can be influenced by the context, purpose, and audience of the communication, while tone can be influenced by the mood, personality, and intention of the speaker or writer. For example, the register and tone of a formal letter to a professor would be different from those of an informal text message to a friend. Weighting systems and modality selections, such as formats (paragraph, email, blog post, ideas) or tones (professional, casual, enthusiastic, etc.) and temperature (creative, precise, etc.) introduced in Generative AI playgrounds help temper the outputs and handle register and tone quite well, which is a positive for the technology.
  3. Idioms and figurative language: Idioms and figurative language are expressions that use words in a non-literal or creative way to convey a message or idea. Idioms are fixed phrases that have a specific meaning in a certain culture or context, while figurative language uses figures of speech such as metaphors, similes, hyperbole, personification, etc. For example, the idiom “break a leg” means “good luck” in English, while the figurative language “her eyes sparkled like stars” means “her eyes were bright and lively”. Machine translation capabilities often fall short in “translating” the idiomatic meaning from one language to another. Beyond one-to-one idiomatic translation, language experts and translators draw on their knowledge of the source and target languages and take creative liberties in shortening, expanding, or elaborating on phrases, which both traditional and Generative AI engines may fail to capture.
  4. Dialects and accents: Dialects and accents are variations of a language that are influenced by the geographic, social, or cultural background of the speakers. Dialects can differ in vocabulary, grammar, syntax, or pronunciation, while accents can differ in the sound, stress, or intonation of speech. For example, American English and British English are dialects of English that have different spellings, words, and expressions for some concepts, while the American accent and the British accent pronounce some letters or words differently. In intent-intelligent era systems such as traditional text-to-speech and speech-to-text systems, regional dialect variants of languages, such as the aforementioned British and American English, are available within language packs or data packs. One may argue, however, that speech-to-text systems may fall short depending on the thickness of an accent (I’ve had funny encounters where my grandpa, who had a thick southern accent, frequently could not use a voice assistant).
  5. Homonyms and homophones: Homonyms and homophones are words that sound alike but have different meanings or spellings. Homonyms are words that have the same spelling and pronunciation but different meanings, while homophones are words that have the same pronunciation but different spellings and meanings. For example, “bat” is a homonym that can mean either a flying mammal or a wooden stick used in sports, while “write” and “right” are homophones that have different spellings and meanings. Statistical models adequately address homonyms and homophones based on the surrounding context they are trained on and utilize when generating output, but they can often use additional tuning to get stressed phonemes correct.
  6. Slang and jargon: Slang is informal language that is used by a specific group of people, while jargon is technical language that is used by a specific profession or field. For example, the slang term “cool” means “impressive” or “fashionable”, while the jargon term “CPU” means “central processing unit”. Typically, in traditional Conversational AI services, additional domain-level training is required to include and/or disambiguate technical slang and jargon within an application. Generative AI, using large foundation models, has a good understanding of technical jargon given its broad training data and can handle generating it fairly well.
  7. Euphemisms and dysphemisms: Euphemisms are mild or indirect expressions that replace harsh or unpleasant ones, while dysphemisms are harsh or offensive expressions that replace mild or pleasant ones. For example, the euphemism “passed away” means “died”, while the dysphemism “kicked the bucket” also means “died”.
  8. Hyperbole and understatement: Hyperbole is an exaggerated statement that is not meant to be taken literally, while understatement is a statement that downplays or minimizes something. For example, the hyperbole “I’m starving” means “I’m very hungry”, while the understatement “It’s a bit chilly” means “It’s very cold”.
  9. Irony and sarcasm: Irony is a contrast between what is expected and what actually happens, while sarcasm is a form of irony that expresses mockery or contempt. For example, saying “What a beautiful day” when it’s raining heavily is irony, while saying “Nice job” when someone makes a mistake is sarcasm. Often, casual bots may be programmed with “small talk” that includes ironic and/or sarcastic language, but this is not common within formal applications.
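
To make the points about next-word prediction and temperature from the list above a little more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular Generative AI engine is implemented: the candidate words and their scores are made up, and the function names are hypothetical. It shows how a lower temperature concentrates probability on the top-scoring word ("precise"), while a higher temperature spreads it across near-synonyms with different connotations ("creative").

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores.values()]
    top = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(scores, exps)}


def sample_next_word(scores, temperature, rng):
    """Draw one candidate word at random, weighted by its temperature-adjusted probability."""
    probs = softmax_with_temperature(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Hypothetical scores for the word that follows "Welcome to your new ..."
next_word_scores = {"home": 2.0, "house": 1.5, "residence": 0.5, "dwelling": 0.1}

precise = softmax_with_temperature(next_word_scores, 0.2)   # strongly favors "home"
creative = softmax_with_temperature(next_word_scores, 2.0)  # much flatter distribution

print(sample_next_word(next_word_scores, 1.0, random.Random()))
```

Nothing in this selection step knows that “home” feels warmer than “house”; any such preference has to already be reflected in the scores, which is why careful diction can get lost.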

In summary, Generative models likely include nuances of the English language that involve phrases, non-literal meanings of words, and underlying meaning within their outputs, but they arguably fall short in handling them given the statistical nature of their text generation. Those outputs should therefore have human oversight and curation (likely a middleware layer prior to exposing the generated text to a user).
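
As a rough sketch of what such a middleware layer could look like, the Python below holds generated text for a human editor when it contains phrases from a watch list. Everything here is an assumption made for illustration: the `NUANCE_MARKERS` list (built from the examples in this post), the `ReviewDecision` type, and the `review_gate` function are hypothetical, and simple phrase matching is a crude stand-in for real review criteria.

```python
from dataclasses import dataclass, field

# Hypothetical watch list; the phrases are taken from the examples in this post.
NUANCE_MARKERS = {
    "break a leg": "idiom",
    "kicked the bucket": "dysphemism",
    "passed away": "euphemism",
    "nice job": "possible sarcasm",
}


@dataclass
class ReviewDecision:
    text: str
    needs_human_review: bool
    reasons: list = field(default_factory=list)


def review_gate(generated_text):
    """Flag generated text for human curation if it contains a watched phrase."""
    lowered = generated_text.lower()
    reasons = [label for phrase, label in NUANCE_MARKERS.items() if phrase in lowered]
    return ReviewDecision(generated_text, bool(reasons), reasons)


draft = "We regret to inform you that your account manager kicked the bucket."
decision = review_gate(draft)
if decision.needs_human_review:
    print("Hold for editor:", ", ".join(decision.reasons))
else:
    print("Release to user")
```

The design choice worth noting is that the gate never rewrites the text itself; it only decides whether a human sees the draft before the user does.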

Humans possess emotion, complex thoughts, and sophisticated communicative vehicles to express those complex thoughts and emotions. Whether crafting a legal agreement that requires exact, unambiguous language, defined words, and clear references, or a social blog post with underlying satirical comments, the originating author must curate the text to ensure the maximum communicative effect of the words and phrases, whether verbally uttered or written. Conversational AI capabilities that model natural language and convert among modalities are extremely valuable technology, but any practitioner should have a clear understanding of their strengths and weaknesses.


Sam Bobo
Speaking Artificially

Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/