AoGProTips: Synchronize animations with the Text-To-Speech

Mandy Chan
Published in Google Developers · Apr 2, 2020


Missed our weekly video? Don’t worry, watch this week’s #AoGProTips 🎥

When building smart display games for the Google Assistant using Interactive Canvas, you can add fun animations to create a fully immersive experience. Did you know you can synchronize your animations with audio, such as making a dinosaur open its mouth at the exact moment the roaring sound plays? In this blog post, you will learn how to use the SSML mark tag with the onTtsMark callback function to synchronize your animations with audio.

SSML stands for Speech Synthesis Markup Language. With SSML, you can make your Action’s spoken responses sound more natural, for example by adding pauses between words or adjusting the pitch and speaking rate. Look at the SSML example below.

```xml
<speak>
  The dinosaur is about to roar
  <mark name="START_ROAR"/>
  <audio src="roar.mp3"/>
  <mark name="STOP_ROAR"/>
</speak>
```

In the SSML, each <mark> tag marks a point in the generated TTS audio; here, the two marks indicate when the dinosaur should start and stop animating. When playback reaches a mark, it generates an event, and your code has a callback that is triggered with that mark’s name. In this example, the two events are named “START_ROAR” and “STOP_ROAR”, and you write code to handle each of them. The code for the “START_ROAR” event can open the dinosaur’s mouth once the spoken prompt “The dinosaur is about to roar” has finished and the roar begins. Similarly, the code for the “STOP_ROAR” event can close the dinosaur’s mouth when the roar ends.
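If you generate responses in code, a small helper can keep each audio clip and its pair of marks together. The TypeScript sketch below uses a hypothetical wrapAudioWithMarks helper (it is not part of any Actions on Google library) that escapes the text and attribute values it inserts and produces the roar SSML on a single line. In a deployed Action the audio src would normally be a full HTTPS URL; roar.mp3 is kept only to match the example.

```typescript
// Escapes characters that are special in XML text and attribute values.
// "&" is replaced first so the entities added afterwards are not re-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: speaks a prompt, then plays an audio clip bracketed
// by a start mark and a stop mark so the web app is notified at both points.
function wrapAudioWithMarks(
  prompt: string,
  audioSrc: string,
  startMark: string,
  stopMark: string,
): string {
  return (
    `<speak>${escapeXml(prompt)} ` +
    `<mark name="${escapeXml(startMark)}"/>` +
    `<audio src="${escapeXml(audioSrc)}"/>` +
    `<mark name="${escapeXml(stopMark)}"/>` +
    `</speak>`
  );
}

// roar.mp3 is a placeholder; a real Action would typically use an HTTPS URL.
const roarSsml = wrapAudioWithMarks(
  "The dinosaur is about to roar",
  "roar.mp3",
  "START_ROAR",
  "STOP_ROAR",
);
console.log(roarSsml);
```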

Now that the mark tags are in place within the SSML, you can write the logic that runs each animation when its mark is reached. Keep in mind that each mark name must be unique within the SSML so that the onTtsMark callback responds to the correct cue.
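Because a repeated name makes it ambiguous which cue the callback is reacting to, it can help to check SSML for duplicate marks before sending it. Here is a minimal sketch of such a check; findDuplicateMarks is a hypothetical helper, and its regular expression only recognizes simple mark tags of the form used in this post rather than fully parsing SSML.

```typescript
// Hypothetical helper: returns each mark name that appears more than once
// in an SSML string. It uses a regular expression, not an XML parser, so it
// only recognizes simple <mark name="..."/> tags.
function findDuplicateMarks(ssml: string): string[] {
  // Accepts single or double quotes and an optional self-closing slash.
  const markPattern = /<mark\s+name\s*=\s*(["'])(.*?)\1\s*\/?>/g;
  // Prototype-free object so names such as "constructor" are counted correctly.
  const counts: Record<string, number> = Object.create(null);
  const duplicates: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = markPattern.exec(ssml)) !== null) {
    const name = match[2];
    counts[name] = (counts[name] || 0) + 1;
    // Report each repeated name once, the first time it repeats.
    if (counts[name] === 2) {
      duplicates.push(name);
    }
  }
  return duplicates;
}

const ssml =
  '<speak>The dinosaur is about to roar <mark name="START_ROAR"/>' +
  '<audio src="roar.mp3"/><mark name="STOP_ROAR"/></speak>';
const duplicates = findDuplicateMarks(ssml);
if (duplicates.length > 0) {
  throw new Error(`Duplicate mark names: ${duplicates.join(", ")}`);
}
```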

You register a callback for onTtsMark. The callback receives the name of each mark as the TTS audio reaches it and, depending on that name, triggers the corresponding logic that animates the dinosaur’s mouth. For the ‘START_ROAR’ mark, it calls the beginRoaring function, and for ‘STOP_ROAR’, it calls the stopRoaring function to stop the animation.
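Here is a minimal TypeScript sketch of that callback. It assumes the Interactive Canvas library exposes a global interactiveCanvas object whose ready() method accepts a callbacks object with an onTtsMark member. The CanvasCallbacks interface, the handleTtsMark function, and the dinosaur state object are hypothetical stand-ins, and beginRoaring and stopRoaring only flip a flag where a real game would start and stop an animation.

```typescript
// Assumed shape of the callbacks object passed to Interactive Canvas;
// only the members relevant to this sketch are declared.
interface CanvasCallbacks {
  onUpdate?: (data: unknown) => void;
  onTtsMark?: (markName: string) => void;
}

// Assumed global provided by the Interactive Canvas library inside the
// Assistant's web view. It is undefined when this file runs elsewhere.
declare const interactiveCanvas:
  | { ready(callbacks: CanvasCallbacks): void }
  | undefined;

// Hypothetical animation state; a real game would drive a sprite or CSS class.
const dinosaur: { mouthOpen: boolean } = { mouthOpen: false };

function beginRoaring(): void {
  dinosaur.mouthOpen = true; // e.g. start the "open mouth" animation
}

function stopRoaring(): void {
  dinosaur.mouthOpen = false; // e.g. return to the idle animation
}

// Called with the name of each <mark> as TTS playback reaches it.
function handleTtsMark(markName: string): void {
  switch (markName) {
    case "START_ROAR":
      beginRoaring();
      break;
    case "STOP_ROAR":
      stopRoaring();
      break;
    default:
      // Mark names this scene does not use are ignored.
      break;
  }
}

const callbacks: CanvasCallbacks = {
  onTtsMark: handleTtsMark,
};

// Register only when actually running inside Interactive Canvas.
if (typeof interactiveCanvas !== "undefined") {
  interactiveCanvas.ready(callbacks);
}
```

Keeping the switch in a named function makes it easy to exercise directly, for example from a quick test, without a smart display.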

Now that you have learned how to synchronize animations using the SSML mark tag and the onTtsMark callback, we hope you apply this tip in your next Action to create a fully immersive gaming experience for your users.

If you have a tip that you think other developers should know about, share your thoughts with us on Twitter using #AoGProTips. Lastly, check out our collection of other pro tips here.

Mandy Chan | ☆ Creator of SSML Builder | Google Assistant Developer Advocate | I write about Actions on Google | Voiceholic on http://bit.ly/gcp-mandy