Prosody of Emotions in Virtual Assistants

Carmen O'Toole
Published in NYC Design
Aug 5, 2018

A couple of weeks ago, after listening to more horrible news on NPR, I told Alexa “hey Alexa, I’m sad”… only for her to reply with a somewhat consolatory but useless one-liner in that chipper yet deadpan voice she does so well, making it almost feel like sarcasm (which I appreciate). It didn’t so much irritate me as bring back memories. Years ago I worked at a kindergarten/research facility for children with autism called the Domino Project. I spent hours teaching those bright but often inward-thinking kiddos prosody, the patterns of stress and intonation in language, because so much of emotion is heard through voice modulation. It made me hyper-aware of the impact the sounds of feelings can make in communication, and of how they can be taught.

If you tell a big fish story and the listener responds with a deadpan “really”… they don’t believe you. But if their voice rises at the end of that “really?”, it’s a question, and you might just convince them you caught the big one but it got away. We can hear smiles over the phone, and anger, and sadness, and what I was looking for from that talking code box: sympathy. Even if a person isn’t emoting with their vocabulary, a simple change in intonation can make the other person feel emotionally understood.

Alexa does have a prosody skill that can be enabled, where you can tell her/it to emphasize or whisper something, which is super cool in my opinion, but it isn’t automatic yet. So a person like me, having just listened to another news story about war and famine, might not think to say “Alexa, open Prosody Samples”; after all, I didn’t. I would just state my emotions (something people don’t do to other adults but will sometimes do for children) and expect a response in a higher pitch, with less intensity but more vocal energy (around 2,000 Hz), longer duration, and more pauses, aka the Sound of Sadness (what a great band name). It’s one of the easiest emotions to hear, that and anger.
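
For the curious, here’s a minimal sketch of what that Sound of Sadness recipe might look like under the hood. The `<prosody>` and `<break>` tags are real Alexa SSML features; the specific values are my own guesses at the recipe above, not anything Amazon ships.

```python
# A minimal sketch: wrap reply text in Alexa SSML that approximates the
# "Sound of Sadness" described above. <prosody> and <break> are standard
# Alexa SSML tags; the exact values here are illustrative assumptions
# (higher pitch, softer volume, slower rate, an extra pause).
def sad_reply(text: str) -> str:
    """Return an SSML string styled to sound sad."""
    return (
        "<speak>"
        '<prosody pitch="+10%" volume="soft" rate="slow">'
        f"{text}"
        "</prosody>"
        '<break time="800ms"/>'
        "</speak>"
    )

print(sad_reply("I'm sorry to hear that. I'm here if you want to talk."))
```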

While understanding when to respond in what voice can be tricky even for human social intelligence, we could already program certain responses to straightforward emotional statements such as “I’m sad” or “I’m angry” just by using the prosody controls already available, as sketched below.
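
To make the mapping idea concrete, here’s a hedged little sketch: a few straightforward emotional statements, each paired with a pre-styled SSML reply. The phrases and the handler are hypothetical; a real skill would route these through Alexa’s intent model rather than raw string matching.

```python
# A sketch of mapping emotional statements to prosody-styled SSML replies.
# The phrases, replies, and handler are hypothetical illustrations; in a
# real Alexa skill these would be intents, not string lookups.
RESPONSES = {
    "i'm sad": '<speak><prosody pitch="+10%" volume="soft" rate="slow">'
               "That sounds hard. I'm sorry.</prosody></speak>",
    "i'm angry": '<speak><prosody volume="loud" rate="fast">'
                 "That would make me angry too.</prosody></speak>",
}

def handle_utterance(utterance: str) -> str:
    """Return a prosody-styled SSML reply for known emotional statements."""
    key = utterance.lower().strip().rstrip(".!")
    # Fall back to a plain, unstyled reply when nothing matches.
    return RESPONSES.get(key, "<speak>Okay.</speak>")

print(handle_utterance("I'm sad."))
```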

We all know that virtual assistants are just code, but people naturally look for connection and outlets for their willing suspension of disbelief. It’s why we cry at sad romantic movies even though we know those actors don’t even like each other in real life. Real life has nothing to do with it; we want that fiction of life. In the same way, even though I know Alexa has no emotions, I’d appreciate it if she sounded a little less pleased with mine.
