Making robots more human-like through nonverbal behavior

Ari Shapiro, Ph.D.
6 min read · Apr 17, 2019


Embody Digital has recently partnered with Hanson Robotics Limited to integrate the Hanson AI conversational platform and custom-designed animations with Embody’s patented automated performance system. This collaboration will enable Hanson robots, and more specifically Sophia the Robot, to automatically generate naturalistic arm gestures based on the content and emotional tone of her speech. Sophia recently demonstrated the Embody Digital arm gesture enhancements at the Veracode booth at the RSA 2019 conference in San Francisco. Embody’s platform helps advance Sophia’s communication skills to the next level, bringing her a step closer to the way humans communicate with each other, both verbally and nonverbally.

Sophia the Robot, created by Hanson Robotics, demonstrating the Embody Digital automated gesturing system. Photo credit: Veracode, https://twitter.com/Veracode/status/1103373512893296640

Why is nonverbal behavior important?

The power of nonverbal behavior is well described in this excerpt from our team’s research [Marsella et al 2013]:

The flip of a hand, a raising of an eyebrow, a gaze shift: the physical, nonverbal behaviors that accompany speech convey a wide variety of information that powerfully influences face-to-face interactions. That hand flip can convey convincingly that an idea should be discarded, a nod can further emphasize a point, and a roll of the eyes can convey the absurdity of an idea.

Nonverbals stand in different, critical relations to the verbal content, providing information that embellishes, substitutes for and even contradicts the information provided verbally (e.g., [Ekman and Friesen 1969; Kendon 2000]). The form of these behaviors is often tied to physical metaphors; discarding an idea, illustrated by a sideways flip of the hand, mimics discarding a physical object [Calbris 2011]. Nonverbal behaviors also serve a variety of rhetorical functions. Shifts in topic can be cued by shifts in posture or shifts in head pose. Comparisons and contrasts between abstract ideas can be emphasized by abstract deictic (pointing) gestures that point at the opposing ideas as if they each had a distinct physical locus in space [McNeill 1992]. A wide range of mental states and character traits can be conveyed: gaze reveals thought processes, blushing suggests shyness, and facial expressions intentionally or unintentionally convey emotions and attitudes. Finally, nonverbal behavior helps manage conversation, for example by signaling the desire to hold onto, get or hand over the dialog turn [Bavelas 1994].

Further, the mapping between these communicative functions and the behaviors that realize them is many-to-many. Parts of the utterance can be emphasized using a hand gesture, a nod or an eyebrow raise. On the other hand, a nod can be used for affirmation, emphasis or to hand over the dialog turn [Kendon 2002; McClave 2000]. The context in which the behavior occurs can transform the interpretation, as can even subtle changes in the dynamics of the behavior: head nods signaling affirmation versus emphasis typically have different dynamics.

Nonverbal behaviors are so pervasive in every moment of the dialog that their absence can also lead an observer to infer that something is wrong, for example, with the physical health or mental state of the person.

How does it work?

The automated performance system is the result of numerous research studies, machine learning and expert knowledge of human behavior and communication. The Embody Digital team has been actively researching human movement and communication behavior for nearly two decades and has distilled that knowledge into a software platform that can drive robots or digital characters. A patent for this work was granted in 2017.

While the Hanson Robotics team chose to integrate only automated arm gestures for Sophia the Robot, Embody’s system is capable of analyzing an utterance (what a robot or person is about to say), then generating a series of gestures, facial expressions, and head movements, in addition to the lip movements needed to match the speech. This schedule of nonverbal behaviors is converted into a motion path that is then communicated to the robot in real time and controls the face, head, mouth, arms, hands and fingers of the robot. Every utterance generates a distinctive set of nonverbal behaviors that can be calculated on the fly. The emotional state of the robot is an additional input into the system and results in varying behaviors; emotionally positive or negative utterances, or complex attitudes such as sarcasm, affect how the nonverbal behaviors are manifested, including which gestures are emphasized, eliminated or magnified. For example, saying the word ‘no’ might result in a soft shake of the head, while saying it with emphasis might result in an energetic sweeping motion with the hand.
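To make that flow concrete, here is a minimal sketch of such an utterance-to-motion pipeline in Python. All of the names (Behavior, schedule_behaviors, to_motion_path) and the toy rules are hypothetical illustrations of the stages described above, not the actual Embody Digital API or models.

```python
# Minimal sketch of an utterance-to-motion pipeline (hypothetical API,
# not the Embody Digital implementation).
import string
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Behavior:
    kind: str         # "gesture", "head", "face" or "viseme"
    name: str         # e.g. "sweep_away", "head_shake", "brow_raise"
    start: float      # seconds from the start of the utterance
    end: float
    intensity: float  # scaled by the emotional state

def schedule_behaviors(utterance: str, emotion: str) -> List[Behavior]:
    """Map an utterance plus an emotional state to a timed behavior schedule.

    A real system combines speech analysis, learned models and expert rules;
    this stub only shows the shape of the output.
    """
    words = [w.strip(string.punctuation) for w in utterance.lower().split()]
    behaviors = [Behavior("viseme", "lip_sync", 0.0, 1.2, 1.0)]
    if "no" in words:
        # Emphatic negation gets an energetic hand sweep; a neutral one
        # gets only a soft shake of the head (the example from the text).
        if emotion == "emphatic":
            behaviors.append(Behavior("gesture", "sweep_away", 0.2, 0.9, 1.0))
        else:
            behaviors.append(Behavior("head", "head_shake", 0.2, 0.8, 0.4))
    return behaviors

def to_motion_path(schedule: List[Behavior]) -> List[Dict]:
    """Convert the behavior schedule into timed targets streamed to the robot."""
    return [{"behavior": b.name, "t0": b.start, "t1": b.end, "gain": b.intensity}
            for b in schedule]

if __name__ == "__main__":
    print(to_motion_path(schedule_behaviors("No, I don't think so.", "emphatic")))
```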

Nonverbal behaviors also need to respect the biomechanics of human movement. For example, people coarticulate their gestures: they bring their arms and hands into gesture space and then perform a series of gestures whose meaning is conveyed during the key phase of each gesture (the stroke phase). Speed constraints must also be respected, particularly when transitioning between different types of gestures, in order to preserve their meaning.
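As one illustration of such a speed constraint, the duration of a transition between two gesture poses can be stretched so that no joint exceeds a plausible angular velocity. The joint names and the 3.0 rad/s cap below are illustrative assumptions, not values from the patented system.

```python
# Sketch: lengthen a gesture transition so no joint exceeds a speed cap.
# The 3.0 rad/s limit and the joint names are illustrative assumptions.
MAX_JOINT_SPEED = 3.0  # radians per second

def transition_time(pose_a: dict, pose_b: dict, requested: float) -> float:
    """Return a transition duration that respects the per-joint speed limit."""
    largest_change = max(abs(pose_b[j] - pose_a[j]) for j in pose_a)
    return max(requested, largest_change / MAX_JOINT_SPEED)

# Moving the elbow by 1.5 rad cannot happen in 0.2 s at 3.0 rad/s,
# so the transition is stretched to 0.5 s.
print(transition_time({"elbow": 0.0}, {"elbow": 1.5}, requested=0.2))
```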

Hanson Robotic’s Sophia robot using the Embody Digital automated performance system through its chat system.

What everyone else does…

Most other humanoid robots generate nonverbal behavior in one of three ways:

  1. Articulating only the mouth to mimic human lip movements when generating the sounds of speech. This method ignores the head movements, facial expressions and other nonverbal behaviors that humans produce alongside speech.
  2. Creating an entire robotic performance ahead of time (i.e., ‘baking’ the performance). This results in either repetitive movements unrelated to the utterance, or a performance that must be hand-tuned for every utterance and is therefore unsuitable for dynamic interaction where the content of the utterance changes constantly, such as when it is driven by a conversational AI. Hand-tuning is also very expensive, as the entire motion must be created by an animator or robotic performance designer ahead of time.
  3. Occasionally triggering a gesture based on simple word spotting (i.e., pointing at a person when saying the word ‘you’, or pointing at yourself when saying the word ‘me’); a minimal sketch of this approach follows the list. This simple method fails to exploit the complex relation between speech and nonverbal behavior, or to prioritize different types of behavior.
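For contrast, the word-spotting approach in item 3 amounts to little more than a keyword lookup. The sketch below uses hypothetical names, not taken from any particular robot platform, and shows why such a method cannot weigh competing behaviors or react to context and emotional tone.

```python
# Sketch of the naive word-spotting approach described above (illustrative only).
WORD_TO_GESTURE = {
    "you": "point_at_listener",
    "me": "point_at_self",
    "no": "head_shake",
}

def spot_gestures(utterance: str) -> list:
    """Trigger a gesture whenever a keyword appears; context and emotion are ignored."""
    return [WORD_TO_GESTURE[w] for w in utterance.lower().split() if w in WORD_TO_GESTURE]

# Both sentences trigger identical gestures even though their meanings differ.
print(spot_gestures("you helped me"))        # ['point_at_listener', 'point_at_self']
print(spot_gestures("you never helped me"))  # same output
```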

Social robots and the use of nonverbal behavior

Human-like robots such as Sophia have the potential to serve as customer service agents and social companions, and to be used for entertainment as well as research and experimentation. The more life-like the robot, the higher the expectations for its performance. By enabling Sophia to communicate nonverbally with her body, driven by her AI, we help advance the dream of creating robots that can communicate with us in a human-like way.

About Embody Digital

Embody Digital was founded in 2016 by a team of researchers and software engineers with over 50 collective years of expertise building lifelike 3D characters for video games, film, VR, AR and the military. The company offers a wide range of products for generating AI-based characters for film, social media, video games, customer service and health care.

For more information, please contact Ari Shapiro at shapiro@embodydigital.com, or reach the company through its website at embodydigital.com.

References

Bavelas, J. B. (1994). Gestures as part of speech: Methodological implications. Research on language and social interaction, 27(3), 201–221.

Calbris, G. (2011). Elements of meaning in gesture (Vol. 5). John Benjamins Publishing.

Ekman, P., & Friesen, W. V. (1969). Nonverbal leakage and clues to deception. Psychiatry, 32(1), 88–106.

Kendon, A. (2000). Language and gesture: Unity or duality. Language and gesture, 2.

Kendon, A. (2002). Some uses of the head shake. Gesture, 2(2), 147–182.

Marsella, S., Xu, Y., Lhommet, M., Feng, A., Scherer, S., & Shapiro, A. (2013, July). Virtual character performance from speech. In Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation (pp. 25–35). ACM.

McClave, E. Z. (2000). Linguistic functions of head movements in the context of speech. Journal of Pragmatics, 32(7), 855–878.

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. University of Chicago Press.


Ari Shapiro, Ph.D.

Ari is a researcher, engineer and scientist, as well as an expert in 3D human modeling. He is an 11-time SIGGRAPH speaker and the CEO of Embody Digital.