Conversational Cookie Coach Avatar by Nestle

Amy Stapleton
Conversational Avatars
4 min read · Jan 6, 2022

Early last year, Nestle introduced Ruth, a fully conversational cookie coach avatar. Ruth is available from a web browser on desktop or mobile and her focus is on helping you bake delicious cookies.

In my previous article on the talking Santa created by Pandorabots and Rapport, I described the key components of a conversational avatar:

  • A convincing animated avatar
  • The ability to understand spoken language (or typed language as a fallback)
  • The ability to respond coherently to what the human is saying
  • A voice

Bret Kinsella of Voicebot.ai published a podcast with Orchid Bertelsen. At the time of the interview, Orchid was head of digital strategy and innovation at Nestle USA. By listening to their conversation, I was able to understand the technologies used by the team at Nestle to power Ruth.

What Technologies Power Ruth the Cookie Coach?

Hopeful bakers can chat with Ruth to get suggestions on what kind of cookies to bake. They can also receive a step-by-step tutorial on how to bake the cookies. Ruth’s coaching includes spoken and written instructions, along with videos to illustrate each step.

Ruth is a fully conversational avatar with multimodal accessories! Let’s break down the various parts.

Step 1 — The Avatar

The first thing you notice when interacting with Ruth is that she looks almost human. Based on the information shared by Orchid Bertelsen in the Voicebot.ai podcast, we know Nestle partnered with the company Soul Machines for the creation of the avatar.

Soul Machines specializes in photo-realistic avatars. While they seem to offer off-the-shelf characters, the avatar used to represent Ruth the cookie coach was a custom design.

Step 2 — The Ability to Understand Spoken Language

Ruth the Cookie Coach can understand what people are saying when they talk to her through their computer or smartphone microphone. While this step was not specifically addressed in the Voicebot.ai podcast episode, my assumption is that Google’s Speech-to-Text service is being used.

Google’s Speech-to-Text service can hear a human’s spoken audio and transcribe it into text. If you recall, the talking Santa we discussed previously worked in a similar fashion, except it used AWS Transcribe.

Step 3 — Knowing How to Respond

In order to converse with prospective cookie bakers, Ruth needs to understand what they’re saying. In the Voicebot.ai podcast, Orchid Bertelsen indicated that Ruth is powered by Dialogflow. Dialogflow is Google’s natural language processing platform, built using an Intent-based framework.

Google’s Speech-to-Text service transcribes the speaker’s audio into text and feeds it to Dialogflow. Dialogflow maps the utterances to pre-defined Intents. In the case of Ruth the Cookie Coach, the context of the conversation is narrowly focused on baking cookies. Not only that, but the conversation flow is linear, with Ruth guiding the person through a series of questions, leading up to the suggestion of a few cookie recipes for them to choose from.

Since the conversation is focused on a specific topic, one would hope that building the Dialogflow interaction model would be of limited complexity.
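
To make the idea of mapping utterances to pre-defined Intents more concrete, here is a toy sketch in Python. It is not Dialogflow's API and not how Nestle built Ruth: Dialogflow uses trained language models, while this example only compares word overlap. The intent names, training phrases, and the 0.3 threshold are all made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Intent:
    name: str
    training_phrases: List[str]


# Hypothetical intents for a narrowly scoped cookie coach.
INTENTS = [
    Intent("recommend_cookie",
           ["what should i bake", "suggest a cookie", "recommend a recipe"]),
    Intent("start_tutorial",
           ["show me how to bake it", "start the tutorial",
            "walk me through the recipe"]),
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance: str, intents=INTENTS, threshold: float = 0.3) -> str:
    """Return the best-matching intent name, or "fallback" if nothing is close.

    Similarity is the Jaccard overlap between the words of the utterance
    and the words of each training phrase (a deliberately crude stand-in
    for a real NLU model).
    """
    words = tokenize(utterance)
    best_name, best_score = "fallback", 0.0
    for intent in intents:
        for phrase in intent.training_phrases:
            phrase_words = tokenize(phrase)
            union = words | phrase_words
            score = len(words & phrase_words) / len(union) if union else 0.0
            if score > best_score:
                best_name, best_score = intent.name, score
    return best_name if best_score >= threshold else "fallback"


if __name__ == "__main__":
    print(match_intent("Can you suggest a cookie?"))
    print(match_intent("What's the weather today?"))
```

With only a handful of intents and a fallback for everything else, the interaction model stays small, which is the advantage of a narrowly focused, linear conversation.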

Step 4 — Speaking the Response

Obviously Ruth needs a voice in order to communicate with others. In the Voicebot.ai podcast, Orchid Bertelsen indicated that Ruth uses a voice from the Amazon Polly collection.

I have not been able to identify which Amazon Polly voice Ruth uses. (I have a feeling the voice might actually be a Google Cloud TTS voice, but I have not been able to identify one of those voices that sounds exactly like Ruth either.) So for right now, Ruth’s TTS voice remains a mystery.

If you happen to know what TTS voice Ruth uses, please leave a note in the comments.
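
For readers who like to see the pieces wired together, the speech-to-text, response, and text-to-speech steps can be sketched as a single conversational turn. This is only an illustration of the flow described above, not Nestle's implementation. The function names are hypothetical, and the three stand-in functions fake the cloud services so the example runs without any account or credentials.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str     # what the speech-to-text step heard
    reply_text: str     # what the dialog step decided to say
    reply_audio: bytes  # the reply rendered by the text-to-speech step


def make_turn_handler(
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], TurnResult]:
    """Chain the three services into one function that handles a user turn."""
    def handle_turn(audio: bytes) -> TurnResult:
        transcript = transcribe(audio)
        reply_text = respond(transcript)
        reply_audio = synthesize(reply_text)
        return TurnResult(transcript, reply_text, reply_audio)
    return handle_turn


# Stand-ins for the real services (assumptions, for illustration only).
def fake_transcribe(audio: bytes) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    if "cookie" in text.lower():
        return "Great, let's find a cookie recipe for you."
    return "Sorry, I can only help with cookies."


def fake_synthesize(text: str) -> bytes:
    # Pretend the synthesized "audio" is the text encoded as bytes.
    return text.encode("utf-8")


handle_turn = make_turn_handler(fake_transcribe, fake_respond, fake_synthesize)

if __name__ == "__main__":
    result = handle_turn(b"I want to bake a cookie")
    print(result.transcript)
    print(result.reply_text)
```

Because each step is passed in as a plain function, one service can be swapped for another (for example, a different transcription or voice provider) without touching the rest of the turn.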

What’s It Like to Talk to Ruth the Cookie Coach?

Take a look at the video of my sample conversation with Ruth.

As you can see from the video, the interaction with Ruth is very smooth and relies heavily on multimodal features — in this case, visual representations of the various choices for the human baker.

Displaying Ruth’s questions in text also makes the cookie coach accessible to the hearing impaired.

Multimodal Choice Options

The use of multimodal features is very beneficial for this specific use case.

For example, Ruth needs to know what ingredients I have in my pantry in order to recommend a cookie recipe. Listing out a bunch of ingredients, and then asking me to say which ones I have, would not be a user-friendly experience.

By offering up visual choice options, or cues, Ruth also helps keep me on the linear track of our discussion.
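
Here is a small sketch of why visual choice options suit the pantry question. Instead of parsing a spoken list, the coach shows every ingredient as a tappable option and recommends the recipes that can be made from the selection. The recipe names and ingredient lists are invented for this example, and the function names are hypothetical; nothing here reflects Ruth's actual data or code.

```python
# Made-up recipe data, for illustration only.
RECIPES = {
    "chocolate chip cookies": {"flour", "butter", "sugar", "eggs", "chocolate chips"},
    "oatmeal cookies": {"oats", "flour", "butter", "sugar", "eggs"},
    "peanut butter cookies": {"peanut butter", "sugar", "eggs"},
}


def ingredient_choices(recipes=RECIPES) -> list:
    """Return every ingredient once, sorted, to display as visual options."""
    return sorted(set().union(*recipes.values()))


def recommend(selected, recipes=RECIPES) -> list:
    """Return the recipes whose ingredients are all among the selected ones."""
    pantry = set(selected)
    return sorted(name for name, needed in recipes.items() if needed <= pantry)


if __name__ == "__main__":
    print(ingredient_choices())
    print(recommend({"peanut butter", "sugar", "eggs", "flour"}))
```

Since the person can only pick from the options on screen, the coach never has to guess at an unexpected ingredient name, and the conversation stays on its linear track.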

Summary

Ruth the Cookie Coach by Nestle is another great example of an ambitious project using a fully conversational avatar. Ruth has a specific purpose — to help prospective cookie bakers — and she does her job well.

Let’s end with a quick comparison of the technologies used by Ruth and by the talking Santa from our last article:

Transcription Services for Speech to Text:

  • Santa — AWS Transcribe
  • Ruth — Google Speech-to-Text (presumably)

Natural Language Processing Service:

  • Santa — Pandorabots
  • Ruth — Dialogflow

Voice — Text-to-Speech Service:

  • Santa — Amazon Polly “Matthew”
  • Ruth — Amazon Polly / Google Text-to-Speech ??

Now go out there and bake some cookies! Just ask Ruth for help.

Amy Stapleton
Conversational Avatars

Chatables - CEO & Co-founder - Building conversational experiences powered by virtual characters to mitigate isolation in older adults.