Watson Speech Services: Introducing a Simple Interface to Train STT and TTS Custom Models!

Marco Noel
IBM Watson Speech Services
Dec 21, 2020

By Marco Noel, Offering Manager, Watson Speech Services / Alexander Faisman, Software Engineer, Watson Speech Services

Photo by Dylan McLeod on Unsplash

In previous Medium articles, we discussed how you can train Watson Speech-to-Text (STT) and Text-to-Speech (TTS) with your own data, using curl commands and SDKs. Now we are happy to share a new, simple user interface that lets you customize these services right from your local machine!

This user interface gives you access to our state-of-the-art customization features for STT and TTS, so you can perform tasks such as:

  • Creating custom STT Language Models
  • Adding your own corpora, words and grammars
  • Training and testing your new custom language model
  • Creating custom STT Acoustic Models
  • Adding your own audio files to your custom acoustic model
  • Training and testing your new custom acoustic model
  • Creating custom TTS models
  • Adding your own words and pronunciations
  • Testing your new custom TTS model

The UI simply passes REST API calls through to our STT and TTS services, with no other processing. All you need is an IBM Cloud account, an STT or TTS service instance, and that instance’s API key and URL to connect the UI to it.
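Because the UI only forwards REST calls, it can help to see what one of those calls looks like. Here is a minimal sketch using only Python’s standard library. It assumes IBM Cloud’s HTTP basic authentication convention (the literal username `apikey`, with your API key as the password). The service URL is a placeholder in the IBM Cloud Dallas-region format, so copy the real value from your own instance’s credentials. `build_request` is a hypothetical helper, not part of the UI’s code, and it only builds the request without sending it.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_request(service_url: str, api_key: str, path: str,
                  method: str = "GET", body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) an authenticated REST request to a Watson Speech service."""
    # Assumption: the API key is sent as HTTP basic auth with the literal username "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    # Serialize an optional JSON payload (e.g. when creating a custom model).
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(service_url.rstrip("/") + path, data=data, method=method)
    req.add_header("Authorization", f"Basic {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    # Placeholders: replace with the URL and API key of your own STT instance.
    req = build_request(
        "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/YOUR_INSTANCE_ID",
        "YOUR_API_KEY",
        "/v1/customizations",
    )
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it; `GET /v1/customizations` lists the custom language models on an STT instance.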

In Part 1, we will walk you through how to install the code on your local machine and use it for STT customization.

Let’s get started!

How to download and install this little gem

In this video, Alex shows you how to install and start the UI on a Windows machine:

Go to our public GitHub repository and download the code, either as a ZIP file or with GitHub Desktop:

Unzip the file into a local folder of your choice.

Before running the Speech user interface code, here’s what you need to install on your machine:

  • Install Maven: https://maven.apache.org/install.html
  • Install the latest Java 8 JDK (you can use Oracle Java or OpenJDK)
  • Install Node.js: https://nodejs.org/en/download/
  • Make sure the environment variable JAVA_HOME points to your latest Java 8 JDK:
For Mac OS X:
- run “/usr/libexec/java_home -V” and find the path similar to “/Library/Java/JavaVirtualMachines/jdk1.8.xxxx.jdk/Contents/Home”
- from a Terminal window, run “echo $JAVA_HOME” and make sure it’s not empty
For Windows:
- the path should be similar to “C:\Progra~1\Java\jdk1.8.xxxx”
- from a Command Prompt, run “echo %JAVA_HOME%” and make sure it’s not empty
  • If this is your first installation (OPTIONAL):
- open a terminal window / command prompt
- go to the folder where the code is located
- run the command “mvn clean install”
- once it completes, you can skip this step next time

On a MacBook, the terminal window should look like this; note the “BUILD SUCCESS” message:

To launch the Speech UI code:

  • Launch the server:
- open a terminal window / command prompt
- go to the folder where the code is located
- run the command “mvn spring-boot:run”

By default, the server uses port 8080. If another process is already using that port, you can run the following command to use a different one:

mvn spring-boot:run -Dspring-boot.run.arguments=--server.port=<port_number>

Again, on a MacBook, the terminal window should look like this:

Look for the “Started ServletInitializer…” message at the bottom of the terminal window. Note that it’s normal for the prompt not to return, since the server is now running in that window.

  • Open your browser to get to the user interface:
Go to http://localhost:8080 (or the port you chose above)
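If you are not sure whether port 8080 is already taken, a quick check before launching can save a restart. This is a minimal Python sketch, assuming the UI listens over TCP on localhost. `port_in_use` and `first_free_port` are hypothetical helpers, and a successful connection is only a heuristic: a process could hold a port without accepting connections.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising on refusal.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8080, attempts: int = 20) -> int:
    """Return the first port from `start` upward that nothing is listening on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    port = first_free_port()
    # Print the launch command and UI address for the chosen port.
    print(f"mvn spring-boot:run -Dspring-boot.run.arguments=--server.port={port}")
    print(f"Then open http://localhost:{port}")
```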

How to use the STT Language Model customization features

Now that you have the user interface running, let me share some articles on the basic concepts and methodology for training STT. I encourage you to read them to get familiar with these concepts; they will help you get the best results from your customization efforts:

In those articles, I use curl commands to execute each training step; now you can run them from the new user interface. In the next video, Alex demos how to use the different STT Language Model customization features:
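To connect the UI’s features back to the curl-based articles, here is a sketch of the STT v1 REST calls behind one language-model customization round: create the model, add a corpus, custom words, and a grammar, train, then recognize audio with the model applied. It only builds the calls (no network traffic). `ApiCall`, `create_language_model_call`, and `language_model_plan` are hypothetical names, and the base model, corpus, grammar, and word values are placeholder examples.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import quote


@dataclass
class ApiCall:
    """One REST call: method, path relative to the service URL, and payload type."""
    method: str
    path: str
    content_type: Optional[str] = None
    json_body: Optional[dict] = None


def create_language_model_call(name: str, base_model_name: str) -> ApiCall:
    # The response to this call contains the new model's customization_id.
    return ApiCall("POST", "/v1/customizations", "application/json",
                   {"name": name, "base_model_name": base_model_name})


def language_model_plan(customization_id: str, corpus_name: str,
                        grammar_name: str, words: List[dict]) -> List[ApiCall]:
    """Calls that add data to an existing custom language model, train it, and test it."""
    base = f"/v1/customizations/{quote(customization_id)}"
    return [
        # Corpus: a plain-text file of domain sentences.
        ApiCall("POST", f"{base}/corpora/{quote(corpus_name)}", "text/plain"),
        # Custom words, each with optional "sounds_like" and "display_as" fields.
        ApiCall("POST", f"{base}/words", "application/json", {"words": words}),
        # Grammar in SRGS ABNF format.
        ApiCall("POST", f"{base}/grammars/{quote(grammar_name)}", "application/srgs"),
        # Train only after the corpus and grammar have finished processing.
        ApiCall("POST", f"{base}/train"),
        # Test: transcribe audio with the custom language model applied.
        ApiCall("POST", f"/v1/recognize?language_customization_id={quote(customization_id)}",
                "audio/wav"),
    ]


if __name__ == "__main__":
    print(create_language_model_call("my-domain-model", "en-US_BroadbandModel"))
    words = [{"word": "IBM", "sounds_like": ["I. B. M."], "display_as": "IBM"}]
    for call in language_model_plan("YOUR_CUSTOMIZATION_ID", "domain-corpus", "commands", words):
        print(call.method, call.path)
```

Keeping model creation separate from the rest reflects that the later calls need the customization ID returned by the first one.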

How to use the STT Acoustic Model customization features

The user interface can also help you create a custom acoustic model for STT. In this next video, Alex shows you how to use the Acoustic Model customization feature:
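The acoustic-model flow follows the same pattern. Below is a sketch of the STT v1 calls involved: create the acoustic model, add each audio file as an audio resource, then train, optionally alongside a custom language model that contains transcripts or vocabulary relevant to the audio. Again it only builds the calls. The names are hypothetical, and this sketch handles only `.wav` and `.flac` files even though the service accepts other formats.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Dict, List, Optional
from urllib.parse import quote

# Only two audio formats are handled in this sketch.
AUDIO_CONTENT_TYPES = {".wav": "audio/wav", ".flac": "audio/flac"}


@dataclass
class ApiCall:
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    json_body: Optional[dict] = None


def create_acoustic_model_call(name: str, base_model_name: str) -> ApiCall:
    # The response contains the new acoustic model's customization_id.
    return ApiCall("POST", "/v1/acoustic_customizations",
                   {"Content-Type": "application/json"},
                   {"name": name, "base_model_name": base_model_name})


def acoustic_model_plan(customization_id: str, audio_files: List[str],
                        custom_language_model_id: Optional[str] = None) -> List[ApiCall]:
    """Calls that add audio to an existing custom acoustic model, then train it."""
    base = f"/v1/acoustic_customizations/{quote(customization_id)}"
    calls = []
    for audio_file in audio_files:
        path = PurePath(audio_file)
        content_type = AUDIO_CONTENT_TYPES.get(path.suffix.lower())
        if content_type is None:
            raise ValueError(f"unsupported audio format in this sketch: {audio_file}")
        # Each file becomes an audio resource named after the file's stem.
        calls.append(ApiCall("POST", f"{base}/audio/{quote(path.stem)}",
                             {"Content-Type": content_type}))
    train = f"{base}/train"
    if custom_language_model_id:
        # Optionally train with a custom language model relevant to the audio.
        train += f"?custom_language_model_id={quote(custom_language_model_id)}"
    calls.append(ApiCall("POST", train))
    return calls


if __name__ == "__main__":
    print(create_acoustic_model_call("my-acoustic-model", "en-US_NarrowbandModel"))
    for call in acoustic_model_plan("YOUR_ACOUSTIC_ID", ["calls/sample1.wav"], "YOUR_LM_ID"):
        print(call.method, call.path, call.headers)
```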

Stay tuned for Part 2, where I’ll walk you through TTS customization using the new user interface!

You can learn more about IBM Watson Speech-to-Text and Text-to-Speech in my other Medium articles and in the public documentation. You can also watch the STT Getting Started video.


Marco Noel
IBM Watson Speech Services

Sr Product Manager, IBM Watson Speech / Language Translator. Very enthusiastic and passionate about AI technologies and methodologies. All views are only my own