This is how Watson understands your Speech

Vidyasagar Machupalli · Mar 14, 2017

This post is about injecting Watson Speech-to-Text into a native Android app. Speech-to-Text is available as a service on IBM Cloud, i.e. Bluemix. You will integrate the service into our favourite chatbot, “The WatBOT”, using the Watson Developer Android SDK with minimal lines of code.

Why Watson Speech-to-Text?

The IBM® Speech to Text service provides an Application Programming Interface (API) that lets you add speech transcription capabilities to your applications. To transcribe the human voice accurately, the service leverages machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal. The service continuously returns and retroactively updates the transcription as more speech is heard.

The service’s Overview for developers introduces the three interfaces it provides: a WebSocket interface, an HTTP REST interface, and an asynchronous HTTP interface (beta).
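Before getting to the Android integration, here is a minimal sketch of a one-shot transcription over the HTTP REST interface using the Watson Java SDK of that era. The credentials and file name are placeholders, and the exact class and method names may vary between SDK versions, so treat this as illustrative rather than definitive. The Android walkthrough below uses the WebSocket interface instead, streaming microphone audio and receiving interim results.

import java.io.File;

import com.ibm.watson.developer_cloud.speech_to_text.v1.SpeechToText;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.RecognizeOptions;
import com.ibm.watson.developer_cloud.speech_to_text.v1.model.SpeechResults;

public class OneShotTranscription {
    public static void main(String[] args) {
        // Credentials come from the Speech to Text service instance on Bluemix
        SpeechToText service = new SpeechToText();
        service.setUsernameAndPassword("<username>", "<password>");

        // One-shot delivery: send a complete audio file and wait for the final transcript
        RecognizeOptions options = new RecognizeOptions.Builder()
                .contentType("audio/wav")
                .model("en-US_BroadbandModel")
                .build();

        SpeechResults results = service.recognize(new File("sample.wav"), options).execute();
        System.out.println(results);
    }
}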

Input Features

  • Languages: Supports Brazilian Portuguese, French, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English.
  • Models: For most languages, supports both broadband (for audio that is sampled at a minimum rate of 16 kHz) and narrowband (for audio that is sampled at a minimum rate of 8 kHz) models; a sketch of how a model is selected follows this list.
  • Audio formats: Transcribes Free Lossless Audio Codec (FLAC), Linear 16-bit Pulse-Code Modulation (PCM), Waveform Audio File Format (WAV), Ogg format with the Opus codec, mu-law (or u-law) audio data, or basic audio.
  • Audio transmission: Lets the client pass as much as 100 MB of audio to the service as a continuous stream of data chunks or as a one-shot delivery, passing all of the data at one time. With streaming, the service enforces various timeouts to preserve resources.
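In the Watson Java/Android SDK, the model and the audio format are both expressed through RecognizeOptions. A minimal sketch, assuming the builder’s model() and contentType() methods; the model name and sample rate are illustrative:

// Narrowband model for 8 kHz telephony-style audio, sent as 16-bit linear PCM
private RecognizeOptions getNarrowbandOptions() {
    return new RecognizeOptions.Builder()
            .model("en-US_NarrowbandModel")
            .contentType("audio/l16; rate=8000")
            .build();
}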

Output Features

  • Speaker labels (beta): Recognizes different speakers from narrowband audio in US English, Spanish, or Japanese. This feature provides a transcription that labels each speaker’s contributions to a multi-participant conversation.
  • Keyword spotting (beta): Identifies spoken phrases from the audio that match specified keyword strings with a user-defined level of confidence. This feature is especially useful when individual words or topics from the input are more important than the full transcription. For example, it can be used with a customer support system to determine how to route or categorize a customer request (see the options sketch after this list).
  • Word alternatives (beta), confidence, and timestamps: Reports alternative words that are acoustically similar to the words that it transcribes, confidence levels for each of the words that it transcribes, and timestamps for the start and end of each word.
  • Maximum alternatives and interim results: Returns alternative and interim transcription results. The former provide different possible hypotheses; the latter represent interim hypotheses as the transcription progresses. In both cases, the service indicates final results in which it has the greatest confidence.
  • Profanity filtering: Censors profanity from US English transcriptions by default. You can use the filtering to sanitize the service’s output.
  • Smart formatting (beta): Converts dates, times, numbers, phone numbers, and currency values in final transcripts of US English audio into more readable, conventional forms.
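Several of these output features map onto options in the SDK’s RecognizeOptions builder. The sketch below is illustrative only; the keywords, keywordsThreshold, maxAlternatives, wordAlternativesThreshold and smartFormatting builder methods are assumptions about the SDK version you have, so verify them against the RecognizeOptions documentation for your release.

// Illustrative options exercising the output features described above
private RecognizeOptions getOutputFeatureOptions() {
    return new RecognizeOptions.Builder()
            .contentType("audio/wav")
            .keywords(new String[]{"refund", "cancel"})   // keyword spotting (beta)
            .keywordsThreshold(0.6)                        // minimum confidence for a keyword match
            .maxAlternatives(3)                            // up to three transcription hypotheses
            .wordAlternativesThreshold(0.5)                // word alternatives (beta)
            .interimResults(true)                          // interim hypotheses while audio streams
            .smartFormatting(true)                         // smart formatting (beta)
            .build();
}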

Integrating STT into an existing Android app

  • Add the RECORD_AUDIO permission to AndroidManifest.xml
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
  • Open build.gradle (app) and add the following entries under dependencies
compile 'com.squareup.okhttp3:okhttp-ws:3.4.2'
compile 'com.ibm.watson.developer_cloud:android-sdk:0.2.3'
compile 'com.ibm.watson.developer_cloud:speech-to-text:3.5.3'
  • Add an image (mic) as an asset under res/mipmap
  • Open res/layout/content_chat_room.xml and add the following code
<android.support.v7.widget.AppCompatImageButton
    android:id="@+id/btn_record"
    android:layout_width="wrap_content"
    android:layout_height="match_parent"
    android:layout_marginBottom="10dp"
    android:background="@null"
    android:elevation="0dp"
    android:paddingLeft="10dp"
    android:scaleType="fitCenter"
    android:src="@mipmap/ic_mic" />
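The btnRecord reference used in the later snippets has to be resolved from this layout. A minimal sketch, assuming the button is a field of MainActivity and is looked up in onCreate after setContentView (the field name btnRecord matches the snippets below):

// Field in MainActivity
private ImageButton btnRecord;

// In onCreate(), after setContentView(...)
btnRecord = (ImageButton) findViewById(R.id.btn_record);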
  • Add the following entries to MainActivity.java to request the user’s permission to access the microphone and record audio
// In onCreate(): check whether the RECORD_AUDIO permission has already been granted
int permission = ContextCompat.checkSelfPermission(this,
        Manifest.permission.RECORD_AUDIO);
if (permission != PackageManager.PERMISSION_GRANTED) {
    Log.i(TAG, "Permission to record denied");
    makeRequest();
}

// Speech-to-Text Record Audio permission
@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    switch (requestCode) {
        case REQUEST_RECORD_AUDIO_PERMISSION:
            permissionToRecordAccepted = grantResults[0] == PackageManager.PERMISSION_GRANTED;
            break;
        case RECORD_REQUEST_CODE: {
            if (grantResults.length == 0
                    || grantResults[0] != PackageManager.PERMISSION_GRANTED) {
                Log.i(TAG, "Permission has been denied by user");
            } else {
                Log.i(TAG, "Permission has been granted by user");
            }
            return;
        }
    }
    if (!permissionToRecordAccepted) finish();
}

protected void makeRequest() {
    ActivityCompat.requestPermissions(this,
            new String[]{Manifest.permission.RECORD_AUDIO},
            RECORD_REQUEST_CODE);
}
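The recordMessage() flow in the next step relies on a few fields that the snippets don’t declare. A minimal sketch of how they might look in MainActivity; the STT_username and STT_password values come from your Speech to Text service credentials on Bluemix, and the names are simply chosen to match the code below:

// Watson Speech to Text state used by recordMessage()
private SpeechToText speechService;
private MicrophoneInputStream capture;
private boolean listening = false;

// Service credentials from the Bluemix Speech to Text instance
private static final String STT_username = "<username>";
private static final String STT_password = "<password>";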
  • Add the following code to MainActivity.java; recordMessage() lives outside onCreate, while the click listener that calls it is wired up inside onCreate
//Record a message via Watson Speech to Text
private void recordMessage() {
    //mic.setEnabled(false);
    speechService = new SpeechToText();
    speechService.setUsernameAndPassword(STT_username, STT_password);
    if (!listening) {
        capture = new MicrophoneInputStream(true);
        new Thread(new Runnable() {
            @Override public void run() {
                try {
                    speechService.recognizeUsingWebSocket(capture, getRecognizeOptions(), new MicrophoneRecognizeDelegate());
                } catch (Exception e) {
                    showError(e);
                }
            }
        }).start();
        listening = true;
        Toast.makeText(MainActivity.this, "Listening....Click to Stop", Toast.LENGTH_LONG).show();
    } else {
        try {
            capture.close();
            listening = false;
            Toast.makeText(MainActivity.this, "Stopped Listening....Click to Start", Toast.LENGTH_LONG).show();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

// In onCreate(): call recordMessage() when the mic button is tapped
btnRecord.setOnClickListener(new View.OnClickListener() {
    @Override public void onClick(View v) {
        recordMessage();
    }
});
  • Add these private methods to complete the story
//Private Methods - Speech to Text
private RecognizeOptions getRecognizeOptions() {
    return new RecognizeOptions.Builder()
            .continuous(true)
            .contentType(ContentType.OPUS.toString())
            //.model("en-UK_NarrowbandModel")
            .interimResults(true)
            .inactivityTimeout(2000)
            .build();
}

//Watson Speech to Text Methods.
private class MicrophoneRecognizeDelegate implements RecognizeCallback {
    @Override
    public void onTranscription(SpeechResults speechResults) {
        System.out.println(speechResults);
        if (speechResults.getResults() != null && !speechResults.getResults().isEmpty()) {
            String text = speechResults.getResults().get(0).getAlternatives().get(0).getTranscript();
            showMicText(text);
        }
    }

    @Override public void onConnected() { }

    @Override public void onError(Exception e) {
        showError(e);
        enableMicButton();
    }

    @Override public void onDisconnected() {
        enableMicButton();
    }
}

private void showMicText(final String text) {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            inputMessage.setText(text);
        }
    });
}

private void enableMicButton() {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            btnRecord.setEnabled(true);
        }
    });
}

private void showError(final Exception e) {
    runOnUiThread(new Runnable() {
        @Override public void run() {
            Toast.makeText(MainActivity.this, e.getMessage(), Toast.LENGTH_SHORT).show();
            e.printStackTrace();
        }
    });
}

Here’s the complete code showing where to make the above entries; click here to see MainActivity.java.

Also, check out how to integrate Text-to-Speech here.
