Say It Like You Mean It! — New Expressive Neural Voices for IBM Text to Speech

Transform your text into a vibrant dialogue with our new expressive voices.

Richie Verma
IBM Watson Speech Services
4 min read · Jun 11, 2024

--

A woman holding a balloon with a happy smiley face in front of her face, and some other balloons
Photo by Lidya Nada on Unsplash

INTRODUCTION

Have you ever experienced the power of natural conversation in a VoiceBot? Introducing IBM Watson’s Expressive Neural Voices — bringing text to life with state-of-the-art natural-sounding speech that is exceptionally clear and crisp. These voices capture the essence of human speech, and are designed to convey emotion and nuance, making every interaction more engaging and natural.

WHAT’S NEW

With over 580 million speakers, Spanish is one of the most widely spoken languages in the world. As the official language of 21 countries, it is a vital tool for communication, business, and cultural exchange. Recognizing its significance, in this release we have added a new female voice for Latin American Spanish to our portfolio of expressive neural voices, which already includes US English and Australian English. You can listen to samples for all these voices in our catalog. Here is a voice sample for the new Spanish voice:

FEATURES

Our expressive neural voices capture text sentiment, producing speech that reflects the mood — gratitude, happiness, empathy, or confusion — naturally and conversationally.

Photo by Nik on Unsplash

You can enhance the Spanish Expressive Voice with SSML to emphasize specific Speaking Styles. Listen to the samples below:

  • cheerful (upbeat and positive): <express-as style="cheerful"> ¡Excelente! ¿Hay algo más en lo que pueda ayudarte? </express-as>
  • empathetic (compassionate and understanding): <express-as style="empathetic"> Lamento escuchar que este asunto aún no se ha resuelto, y le pido disculpas. </express-as>
  • neutral (objective and instructional): <express-as style="neutral"> Durante el otoño, el clima es ventoso. </express-as>
  • uncertain (reflects doubt and confusion): <express-as style="uncertain"> Disculpe, pero ¿le importaría repetir lo que me decía? No logré oírlo bien. </express-as>
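
If you generate this markup programmatically, a small helper can keep the style names consistent. The following is only a sketch: the function name `express_as` and the constant `SPEAKING_STYLES` are made up for illustration, and the only thing taken from the list above is the four style names and the shape of the `<express-as>` element.

```python
from xml.sax.saxutils import escape

# The four speaking styles listed above.
SPEAKING_STYLES = ("cheerful", "empathetic", "neutral", "uncertain")


def express_as(text, style):
    """Wrap text in an <express-as> element (hypothetical helper)."""
    if style not in SPEAKING_STYLES:
        raise ValueError(f"unknown style {style!r}; expected one of {SPEAKING_STYLES}")
    # Escape &, < and > so the text cannot break the surrounding markup.
    return f'<express-as style="{style}">{escape(text.strip())}</express-as>'


ssml = express_as("¡Excelente! ¿Hay algo más en lo que pueda ayudarte?", "cheerful")
print(ssml)
```

Rejecting unknown style names up front is a design choice of this sketch, so that a typo surfaces in your own code rather than in the synthesized audio.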

These voices also detect and accentuate common Interjections — spontaneous expressions of emotions like surprise, joy, or frustration in a conversation — naturally, just like human speech. The new Spanish Expressive Voice currently supports five common interjections that are enabled by default. Hear the difference in our samples below:

  • Aah, ese pago ya fue realizado la semana pasada, y su cuenta quedó saldada.
  • ¡Ajá, ya logré localizar su pedido! Está en camino y llegará dentro de 24 horas.
  • Eh, no sé si podremos enviar un técnico a su residencia antes del miércoles.
  • Oh, claro que podemos guardar su equipaje en consigna hasta la hora de su salida. No hay problemas.
  • Uf, no creo que pueda concertarle una cita hasta que recibamos sus documentos.
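
Because interjections are enabled by default, a sentence like the ones above can be sent to the service as plain text. Below is a rough sketch of preparing such a request with the Python standard library. It assumes the service's `/v1/synthesize` REST endpoint with API-key basic authentication; the service URL and API key are placeholders, and the voice name `es-LA_DanielaExpressive` is an assumption that you should confirm against the voice catalog. The helper `build_synthesize_request` is hypothetical and only builds the request, it does not send it.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_synthesize_request(service_url, api_key, text, voice, audio_format="audio/wav"):
    """Prepare (but do not send) a POST request to /v1/synthesize."""
    query = urllib.parse.urlencode({"voice": voice})
    # Basic auth with the literal user name "apikey" and the key as password.
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{service_url.rstrip('/')}/v1/synthesize?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": audio_format,
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_synthesize_request(
    service_url="https://example.invalid/instances/YOUR_INSTANCE_ID",  # placeholder
    api_key="YOUR_API_KEY",  # placeholder
    text="¡Ajá, ya logré localizar su pedido! Está en camino y llegará dentro de 24 horas.",
    voice="es-LA_DanielaExpressive",  # assumed voice name, check the catalog
)
print(request.get_method(), request.full_url)
```

With real credentials in place, passing the request to `urllib.request.urlopen` and writing the response bytes to a `.wav` file should give you the audio.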

All our expressive voices also allow for Word Emphasis using SSML. Stressing certain words adds clarity and tells the listener what’s important in the sentence. We currently allow for 4 levels of emphasis. Listen to this example, which applies moderate emphasis to the words “Mazda” and “azul”:

  • Efectivamente, es un <emphasis level="moderate"> Mazda </emphasis> CX3 de color <emphasis level="moderate"> azul.</emphasis>
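
Emphasis tags can also be added in code rather than by hand. Here is a minimal sketch, assuming you know in advance which words to stress; the helper name `emphasize` is invented for this example. The level is passed through unchecked: the W3C SSML specification defines `strong`, `moderate`, `none`, and `reduced`, but the values the service accepts should be confirmed in its documentation.

```python
import re
from xml.sax.saxutils import escape


def emphasize(sentence, words, level):
    """Wrap whole-word matches of `words` in <emphasis> tags (hypothetical helper)."""
    escaped = escape(sentence)
    if not words:
        return escaped
    # One pass with an alternation, so tags inserted for one word are never
    # rescanned when looking for the next; longer words are tried first.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(escape(w)) for w in ordered) + r")\b")
    return pattern.sub(
        lambda m: f'<emphasis level="{level}">{m.group(0)}</emphasis>', escaped
    )


ssml = emphasize(
    "Efectivamente, es un Mazda CX3 de color azul.",
    ["Mazda", "azul"],
    "moderate",
)
print(ssml)
```

Matching is case-sensitive and limited to whole words, so stressing "azul" leaves a word such as "azulado" untouched.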

USE CASES

Tailored for customer care, these voices default to a conversational style, but they also support a neutral tone suited to applications such as newscasting, e-learning, and audiobooks. The new Spanish voice was also used for Spanish AI narration during this year’s Masters golf tournament. To check out the Spanish narration:

  • Go to www.masters.com
  • Click “Account” and then “My Group” in the top-right section
  • Select a player
  • Click the navigation bar to find the audio menu
  • Select the preferred narration language

WHAT’S NEXT

Experience the full spectrum of human emotion, revolutionize your interactions, and take your user experience to a whole new level with our new expressive voices. Say it like you mean it and bring your text to life like never before!

Stay tuned for improved and expanded support for Spanish interjections, along with more expressive voices in other languages, coming soon!
