Watson Speech Services: Introducing a Simple Interface to Train STT and TTS Custom Models!

Published in IBM Watson Speech Services, Dec 21, 2020

By Marco Noel, Offering Manager, Watson Speech Services / Alexander Faisman, Software Engineer, Watson Speech Services

In previous Medium articles, we discussed how you can train Watson Speech-to-Text (STT) and Text-to-Speech (TTS) with your data, using curl commands and SDKs. Now, we are happy to share with you a new, simple user interface to customize these services right on your local machine!

This user interface allows you to use our state-of-the-art customization features for STT and TTS, executing tasks like:

- Creating custom STT Language Models
  - Adding your own corpora, words and grammars
  - Training and testing your new custom language model
- Creating custom STT Acoustic Models
  - Adding your own audio files for your custom acoustic model
  - Training and testing your new custom acoustic model
- Creating custom TTS models
  - Adding your own words and pronunciations
  - Testing your new custom TTS model

This code simply passes REST API calls to our STT and TTS services with no other processing. All you need is an IBM Cloud account, an STT or TTS instance, and that instance's API key and URL so the UI can connect to it.
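To give a feel for the kind of REST call being passed through, here is a minimal sketch that builds (but does not send) a request to create a custom language model. It assumes the public STT endpoint `/v1/customizations`, basic authentication with the literal username `apikey`, and the base model `en-US_BroadbandModel`. The function name and the placeholder URL and key are hypothetical, and this is not the UI's own source code.

```python
import base64
import json
import urllib.request


def build_create_language_model_request(service_url, api_key, name,
                                        base_model="en-US_BroadbandModel"):
    """Build (but do not send) a request that creates a custom language model."""
    # JSON body: the model name and the base model it customizes (assumed default).
    body = json.dumps({"name": name, "base_model_name": base_model}).encode("utf-8")
    # IBM Cloud API keys are sent as basic auth with the username "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/customizations",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder values: replace with the URL and API key of your own instance.
    request = build_create_language_model_request(
        "https://example.invalid/instances/placeholder",
        "PLACEHOLDER_KEY",
        "my-first-model",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it. The sketch stops short of that so it can be read and run without credentials.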

In Part 1, we will walk you through how to install the code on your local machine and use it for STT customization.

Let’s get started!

How to download and install this little gem

In this video, Alex shows you how to install and start the UI on a Windows machine:

Go to our public GitHub repository to download the code as a ZIP file or by using GitHub Desktop:

IBM/speech-customization-ui (https://github.com/IBM/speech-customization-ui)

Unzip the file into a local folder of your choice.

Before running the Speech user interface code, here’s what you need to install on your machine:

- Install Maven: https://maven.apache.org/install.html
- Install the latest Java 8 JDK: you can use Java or OpenJDK
- Install NodeJS: https://nodejs.org/en/download/
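A quick way to confirm the three prerequisites are reachable from your terminal is to look them up on the PATH. The sketch below assumes the usual executable names `mvn`, `java` and `node`. The helper name is hypothetical, and the check only confirms that the commands can be found, not that Java is version 8.

```python
import shutil

# Executable names assumed for Maven, the JDK and NodeJS.
REQUIRED_TOOLS = ("mvn", "java", "node")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Maven, Java and NodeJS were all found on PATH")
```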

Make sure the environment variable JAVA_HOME is set to your latest Java 8 JDK:

For Mac OSX:

- run "/usr/libexec/java_home -V" and find the path similar to "/Library/Java/JavaVirtualMachines/jdk1.8.xxxx.jdk/Contents/Home"
- from a Terminal window, run "echo $JAVA_HOME" and make sure it's not empty

For Windows:

- the path should be similar to "C:\Progra~1\Java\jdk1.8.xxxx"
- from a Command prompt, run "echo %JAVA_HOME%" and make sure it's not empty
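The same JAVA_HOME check can be scripted so it behaves identically on macOS and Windows. This is a minimal sketch, assuming a JDK keeps its launcher at `bin/java` (or `bin/java.exe` on Windows) under JAVA_HOME. The function name is hypothetical, and the script does not verify the Java version.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return (ok, message) describing whether JAVA_HOME looks usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME", "").strip()
    if not java_home:
        return False, "JAVA_HOME is empty or not set"
    bin_dir = Path(java_home) / "bin"
    # The launcher is named java on macOS and java.exe on Windows.
    if not any((bin_dir / name).is_file() for name in ("java", "java.exe")):
        return False, f"no java launcher found under {bin_dir}"
    return True, f"JAVA_HOME points to {java_home}"


if __name__ == "__main__":
    ok, message = check_java_home()
    print("OK:" if ok else "Problem:", message)
```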

If you are installing it for the first time (optional):

- Open a terminal window / command prompt
- Go to the folder where the code is located
- Run the command "mvn clean install"
- Once this has completed, you can skip this step next time

On a MacBook, the terminal window should look like this (note the "BUILD SUCCESS"):

To launch the Speech UI code:

Launch the server

- Open a terminal window / command prompt
- Go to the folder where the code is located
- Run the command "mvn spring-boot:run"

By default, it uses port 8080. If another process is already using that port, run the following command to use a different one:

mvn spring-boot:run -Dspring-boot.run.arguments=--server.port=<port_number>

Again, from a MacBook terminal window, it should look like this:

Look for "Started ServletInitializer…" at the bottom of the terminal window. Note that it’s normal that the prompt does not return, since the program is now running.

Open your browser to get to the user interface

Go to the URL "http://localhost:8080" (or the port you chose when launching the server).
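The launch steps above can be sketched as a small helper that checks whether the default port is taken, builds the matching Maven command, and waits until the UI answers. All function names here are hypothetical, and the fallback port 9090 is only an example. The sketch treats any HTTP response as "up", and the retry counts are arbitrary defaults.

```python
import socket
import time
import urllib.error
import urllib.request

DEFAULT_PORT = 8080


def port_is_free(port, host="127.0.0.1"):
    """Return True when a TCP connection attempt to host:port fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) != 0


def launch_command(port=DEFAULT_PORT):
    """Return the Maven command from the article for the given port."""
    cmd = ["mvn", "spring-boot:run"]
    if port != DEFAULT_PORT:
        cmd.append(f"-Dspring-boot.run.arguments=--server.port={port}")
    return cmd


def ui_is_up(url, timeout=2.0):
    """Return True when the server at `url` sends back any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False


def wait_for_ui(url, probe=ui_is_up, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll `url` until the probe succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    port = DEFAULT_PORT if port_is_free(DEFAULT_PORT) else 9090
    print("Run in another terminal:", " ".join(launch_command(port)))
    print("UI reachable:", wait_for_ui(f"http://localhost:{port}"))
```

Because "mvn spring-boot:run" keeps the terminal busy, the helper only prints the command to run elsewhere and then polls the URL.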

How to use the STT Language Model customization features

Now that you have the user interface, let me share some articles on the basic concepts and methodology for training STT. These resources will help you get the best results from your customization efforts. I encourage you to read the following Medium articles to get familiar with them:

Watson Speech-To-Text: How to Train Your Own Speech “Dragon” — Part 1: Data Collection and…


Watson Speech-To-Text: How to Train Your Own Speech “Dragon” — Part 2: Training with Data


In these previous articles, I used curl commands to execute each training step, but now you can run them using the new user interface. In this next video, Alex demos how to use the different STT Language Model customization features:
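For orientation, the training steps that the curl commands covered can be listed as a sequence of REST operations. The sketch below uses paths from the public STT API as I understand it. The article does not say exactly which calls the UI issues, so treat the list as an assumption. The function name and the example IDs are hypothetical.

```python
from urllib.parse import quote


def language_model_workflow(customization_id, corpus_name):
    """List (method, path) pairs for a typical custom language model workflow.

    The customization ID is returned by the first call; it is passed in
    here only so the remaining paths can be written out in full.
    """
    cid = quote(customization_id, safe="")
    corpus = quote(corpus_name, safe="")
    return [
        ("POST", "/v1/customizations"),                          # create the model
        ("POST", f"/v1/customizations/{cid}/corpora/{corpus}"),  # add a corpus
        ("POST", f"/v1/customizations/{cid}/train"),             # start training
        ("GET", f"/v1/customizations/{cid}"),                    # poll until status is "available"
        ("POST", f"/v1/recognize?language_customization_id={cid}"),  # test with audio
    ]


if __name__ == "__main__":
    for method, path in language_model_workflow("example-id", "my corpus"):
        print(method, path)
```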

How to use the STT Acoustic Model customization features

The user interface can also help you create a custom acoustic model for STT. In this next video, Alex shows you how to use the Acoustic Model customization feature:

Stay tuned for Part 2! I’ll walk you through how to handle TTS customization using the new user interface.

You can learn more about IBM Watson Speech-to-Text and Text-to-Speech in my other Medium articles and public documentation. You can also go through the STT Getting Started video HERE