The 3 Dimensions of Human Communication for Conversational Systems

Sam Bobo
Speaking Artificially
3 min read · Dec 8, 2022

“55% of communication is body language, 38% is the tone of voice, and 7% is the actual words spoken.” — Is Nonverbal Communication a Numbers Game? | Psychology Today

Image from MidJourney

Nearly 93% of all human communication is considered non-verbal, embedded within our facial expressions, posture, body position, voice intonation, eye contact, gestures, and more. Considering the present-day Conversational AI and cognitive computing markets, which seek to build human-to-machine interactions in a dialog-style manner, there is an argument to be made that technology organizations are simply scratching the surface of interactivity between humans and AI-based systems.

A framework to consider treats communication as a set of dimensions:

(1) First dimension of communication (text) — text-based communication can be considered the most primitive when examining communication between humans in a virtual setting as well as human-to-machine interactions. Imagine a scenario that most people experience daily: texting. Rhetorically speaking, how often is context misinterpreted or assumed incorrectly, sparking unnecessary conflict? Yes, emojis exist, but they are a short-term fix. Text-based communication removes any undertone that contextualizes the diction used and misses a large aspect of dialog-based interactivity.

(2) Second dimension of communication (voice) — voice adds an additional layer above text-based communication, allowing for variations in our pitch, timbre, tone, and vocal speed. As a basic example, ending one's sentence with a higher voice inflection almost always implies a question is being asked, and we as humans innately interpret and recognize that. Built upon that example is vocal tone to empathize with humans, a vital element of communication. Take a scenario whereby someone is delivering unfavorable news to another. In response to receiving such news, one might reply in a more sympathetic tone of voice, characterized as low and somber with a slower rate of speech; in contrast, an excited response is higher pitched and faster paced. Translating that scenario to human-to-machine interaction, the same effect can be achieved through primitive sentiment analysis derived from speech-to-text conversion, then mapped to an appropriate built-in speaking style for an artificially synthesized response via a text-to-speech system. However, I'd argue that few voice bots take that into consideration today. As a Product Manager who manages text-to-speech systems, this is an aspect I push for when consulting with my customers.
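As a rough illustration of that pipeline, the sketch below maps a transcribed user utterance to a speaking style for the synthesized reply. The keyword lists, style names, and prosody values here are illustrative placeholders, not any vendor's real API; a production system would use a trained sentiment model and whichever built-in styles its text-to-speech engine actually supports.

```python
# Sketch: map the sentiment of a transcribed utterance to a TTS speaking style.
# Keyword cues and style names are hypothetical stand-ins for a trained
# sentiment model and a vendor's supported speaking styles.

NEGATIVE_CUES = {"sorry", "unfortunately", "bad", "failed", "cancelled"}
POSITIVE_CUES = {"great", "congratulations", "thanks", "happy", "approved"}

def classify_sentiment(transcript: str) -> str:
    """Naive keyword-based sentiment: negative, positive, or neutral."""
    words = set(transcript.lower().split())
    if words & NEGATIVE_CUES:
        return "negative"
    if words & POSITIVE_CUES:
        return "positive"
    return "neutral"

# Low and somber with slower speech for sympathy; higher pitched and
# faster paced for excitement, mirroring the scenario described above.
STYLE_FOR_SENTIMENT = {
    "negative": {"style": "empathetic", "rate": "slow", "pitch": "low"},
    "positive": {"style": "cheerful", "rate": "fast", "pitch": "high"},
    "neutral":  {"style": "conversational", "rate": "medium", "pitch": "medium"},
}

def style_for_utterance(transcript: str) -> dict:
    """Choose the speaking style the synthesized reply should use."""
    return STYLE_FOR_SENTIMENT[classify_sentiment(transcript)]
```

In practice, the chosen style would be expressed as markup (such as SSML prosody attributes) sent alongside the reply text to the speech synthesizer.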

(3) Third dimension of communication (non-verbal) — the final layer of communication exists in the 55% conveyed through body language. Rhetorically speaking, have you ever made a comment that offended another individual? Did they form a shocked look on their face, increase their distance, and change their posture? Those vital cues are essential for human interaction. Barring the fact that some individuals are unable to interpret non-verbal cues (more in a future post), the inability to analyze such posture in traditional Conversational systems leaves much to be improved upon when dealing with human-to-machine interactions. While traditional mediums today do not account for non-verbal communication — being chat-based systems or voice bots — the proliferation of video-based conferencing and video generation could easily fill this void. What is required is a new segment of Conversational AI that takes video into consideration for human-to-machine interaction, perhaps starting with virtual avatars, to capture this level of complexity. Furthermore, this can only be powered by datasets that allow such systems to be trained to recognize non-verbal communication within videos, likely starting with facial expressions and moving to full-body interpretation, alongside a cultural shift to allow for that type of video calling.
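One way such a video-aware layer might feed a dialog system is sketched below, under the assumption that an upstream vision model has already labeled each video frame with a facial expression; the label names and the majority threshold are illustrative choices, not a standard.

```python
# Sketch: aggregate per-frame facial-expression labels into a single
# non-verbal signal a dialog manager could react to. Assumes a hypothetical
# upstream vision model already emits one expression label per frame.

from collections import Counter

def dominant_expression(frame_labels: list[str], threshold: float = 0.5) -> str:
    """Return the expression seen in more than `threshold` of frames,
    or 'mixed' when no single expression dominates."""
    if not frame_labels:
        return "unknown"
    label, count = Counter(frame_labels).most_common(1)[0]
    return label if count / len(frame_labels) > threshold else "mixed"
```

For example, a run of frames mostly labeled "shocked" after the bot's last utterance could prompt it to soften or rephrase its response — exactly the cue a human speaker would pick up on.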

The above framework is inspired by an organization I recently (organically) discovered — Hume.ai, an organization providing models and raw data to fuel the next generation of human-to-human and human-to-machine interaction, with capabilities such as facial expression analysis, expressive language analysis, and more. I am personally inspired by the work that Hume is undertaking and look forward to the exciting developments that come from Hume's scientific research.

The bottom line is that human interaction is complex and our journey into Conversational AI is only beginning. While many facets of the market are mature, others remain to be discovered and productized, and I personally look forward to watching that unfold.



Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/