Google Cloud Speech API on Android

This article is about implementing Google’s Cloud Speech Recognition API on Android to perform a speech-to-text operation.

As we may all know, Google is the leading service provider in speech recognition, with support for up to 120 languages and their variants. More about Google Cloud Speech Recognition can be found here.

The API lets us perform recognition in two ways: non-streaming mode and streaming mode.

In non-streaming mode, the user picks an audio file and an API call is made via REST, which processes the audio and returns the transcript. In streaming mode, the user has two options: supply the audio file directly or provide a URL link to the audio; the streaming request is made via an RPC call.

Here we will be using the non-streaming request.

Get an API key and enable the Cloud Speech API in the Google Cloud Console. Declare that API key as a String (keeping an API key in plain sight is not recommended, since it is vulnerable to quota theft).

private final String CLOUD_API_KEY = "API KEY HERE";
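A slightly safer pattern is to keep the key out of the source file. The sketch below is only an illustration: it assumes a buildConfigField named SPEECH_API_KEY is defined in the module's build.gradle from a property in an uncommitted gradle.properties file (both names are placeholders of mine).

// In build.gradle (module), inside defaultConfig, a hypothetical setup:
//     buildConfigField "String", "SPEECH_API_KEY", "\"${speechApiKey}\""
// where speechApiKey lives in a gradle.properties file that is not checked in.

// In the Activity, read the generated constant instead of hard-coding the key:
private final String CLOUD_API_KEY = BuildConfig.SPEECH_API_KEY;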

A variable to hold the selected language code.

String opt;

Create a button to pick a file from the device's file manager.

final Button browse = findViewById(R.id.browse_button);

A Spinner, which is a type of drop-down list, is used for selecting the language.

MaterialSpinner spinner = findViewById(R.id.spinner);
spinner.setItems("en-IN", "kn-IN", "hi-IN", "te-IN", "ta-IN");
spinner.setOnItemSelectedListener((MaterialSpinner.OnItemSelectedListener<String>) (view, position, id, item) -> opt = item);

The code above uses the MaterialSpinner widget from a third-party GitHub library.

Note: The code above uses lambda expressions, which require an up-to-date Android Studio and SDK tools.
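If you prefer not to add the MaterialSpinner dependency, the framework Spinner can do the same job. A rough equivalent is sketched below; it assumes a plain android.widget.Spinner with the id spinner in the layout and needs the android.widget.ArrayAdapter, android.widget.AdapterView and android.view.View imports.

Spinner spinner = findViewById(R.id.spinner);
// Back the drop-down with the same list of language codes
spinner.setAdapter(new ArrayAdapter<>(this,
        android.R.layout.simple_spinner_dropdown_item,
        new String[]{"en-IN", "kn-IN", "hi-IN", "te-IN", "ta-IN"}));
spinner.setOnItemSelectedListener(new AdapterView.OnItemSelectedListener() {
    @Override
    public void onItemSelected(AdapterView<?> parent, View view, int position, long id) {
        opt = (String) parent.getItemAtPosition(position); // selected language code
    }

    @Override
    public void onNothingSelected(AdapterView<?> parent) {
        // keep the previous selection
    }
});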

The languages and language codes supported by Google Cloud Speech are given here.

The filePicker intent is created using the code below (again using a lambda).

browse.setOnClickListener(v -> {
    Intent filePicker = new Intent(Intent.ACTION_GET_CONTENT);
    filePicker.setType("audio/*"); // all audio formats
    MainActivity.this.startActivityForResult(filePicker, 1);
});

The audio file returned by the filePicker is opened as an InputStream using the following code.

protected void onActivityResult(int requestCode, int resultCode,
                                Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (resultCode == RESULT_OK) {
        final Uri soundUri = data.getData(); // audio data

        AsyncTask.execute(() -> {
            InputStream stream = null;
            try {
                stream = getContentResolver() // stream of audio
                        .openInputStream(soundUri);
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }

The audio data is then converted to a byte array:

byte[] audioData = new byte[0]; // byte format of audio
try {
    audioData = toByteArray(stream);
} catch (IOException e) {
    e.printStackTrace();
}

The toByteArray() method is statically imported from Guava's ByteStreams class (see the imports in the full code below).
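If you would rather not pull in a library just for this, a minimal replacement using only java.io looks like the sketch below (readAllBytes is a helper name of my own):

// A drop-in substitute for toByteArray(stream), using only java.io
private static byte[] readAllBytes(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    int read;
    // Copy the stream into an in-memory buffer, 8 KB at a time
    while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
    }
    return out.toByteArray();
}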

The data is then Base64-encoded.

String base64EncodedData =
Base64.encodeBase64String(audioData);
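The Base64 class here is the Apache Commons codec repackaged inside the Google API client library. Android's built-in android.util.Base64 can do the same job, for example:

// NO_WRAP keeps the encoded output on a single line,
// matching what encodeBase64String() produces
String base64EncodedData =
        android.util.Base64.encodeToString(audioData, android.util.Base64.NO_WRAP);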

While the data is being processed in the background, a MediaPlayer plays the same audio in the foreground so that the user is kept engaged.

MediaPlayer player = new MediaPlayer();
try {
    player.setDataSource(MainActivity.this, soundUri);
    player.prepare();
} catch (IOException e) {
    e.printStackTrace();
}
player.start();

// Release the player
player.setOnCompletionListener(MediaPlayer::release);

The configuration of the Speech API is done here.

Speech speechService = new Speech.Builder(
        AndroidHttp.newCompatibleTransport(),
        new AndroidJsonFactory(),
        null
).setSpeechRequestInitializer(
        new SpeechRequestInitializer(CLOUD_API_KEY))
        .build();

RecognitionConfig recognitionConfig = new RecognitionConfig();
recognitionConfig.setLanguageCode(opt);

RecognitionAudio recognitionAudio = new RecognitionAudio();
recognitionAudio.setContent(base64EncodedData);
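As noted at the end of this article, a .wav file worked with only the language code set. For raw or differently encoded audio you will likely also need to set the encoding and sample rate. A sketch, assuming the v1beta1 RecognitionConfig exposes these setters (verify against the generated class and match the values to your actual audio):

// Hypothetical values: match these to the audio you are actually sending
recognitionConfig.setEncoding("LINEAR16"); // uncompressed 16-bit PCM
recognitionConfig.setSampleRate(16000);    // sampling rate of the recording in Hz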

The request is created with the following code.

SyncRecognizeRequest request = new SyncRecognizeRequest();
request.setConfig(recognitionConfig);
request.setAudio(recognitionAudio);

The response and the transcript are obtained using the following code.

// Generate response
SyncRecognizeResponse response = null;
try {
    response = speechService.speech()
            .syncrecognize(request)
            .execute();
} catch (IOException e) {
    e.printStackTrace();
}

// Guard against a failed request or an empty result list
if (response == null || response.getResults() == null
        || response.getResults().isEmpty()) {
    return;
}

// Extract transcript
SpeechRecognitionResult result = response.getResults().get(0);
final String transcript = result.getAlternatives().get(0)
        .getTranscript();

runOnUiThread(() -> {
    EditText speechToTextResult =
            findViewById(R.id.speech_to_text_result);
    speechToTextResult.setText(transcript);
});
});
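The snippet above reads only the first alternative of the first result. Longer audio can come back as several results; a small sketch using the same model classes that stitches them all together, in place of the single-result extraction, would look like this:

// Concatenate the top alternative of every result into one transcript
StringBuilder fullTranscript = new StringBuilder();
for (SpeechRecognitionResult r : response.getResults()) {
    fullTranscript.append(r.getAlternatives().get(0).getTranscript()).append(' ');
}
final String transcript = fullTranscript.toString().trim();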

The code worked fine, returning the transcript of a .wav file with the configuration mentioned by the Speech API; support and specifications for other formats can be found here.

Here is the full code:

import android.content.Intent;
import android.media.MediaPlayer;
import android.net.Uri;
import android.os.AsyncTask;
import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.widget.Button;
import android.widget.EditText;

import com.google.api.client.extensions.android.http.AndroidHttp;
import com.google.api.client.extensions.android.json.AndroidJsonFactory;
import com.google.api.client.repackaged.org.apache.commons.codec.binary.Base64;
import com.google.api.services.speech.v1beta1.Speech;
import com.google.api.services.speech.v1beta1.SpeechRequestInitializer;
import com.google.api.services.speech.v1beta1.model.RecognitionAudio;
import com.google.api.services.speech.v1beta1.model.RecognitionConfig;
import com.google.api.services.speech.v1beta1.model.SpeechRecognitionResult;
import com.google.api.services.speech.v1beta1.model.SyncRecognizeRequest;
import com.google.api.services.speech.v1beta1.model.SyncRecognizeResponse;
import com.jaredrummler.materialspinner.MaterialSpinner;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import static com.google.common.io.ByteStreams.toByteArray;



public class MainActivity extends AppCompatActivity {

    private final String CLOUD_API_KEY = "API KEY HERE";
    String opt;    // option for language
    Button button; // talk -> record intent

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        final Button browse = findViewById(R.id.browse_button);

        button = findViewById(R.id.talk);
        button.setOnClickListener(view -> {
            // Start the recording activity
            Intent myIntent = new Intent(MainActivity.this, AudRec.class);
            startActivity(myIntent);
        });

        MaterialSpinner spinner = findViewById(R.id.spinner);
        spinner.setItems("en-IN", "kn-IN", "hi-IN", "te-IN", "ta-IN");
        spinner.setOnItemSelectedListener((MaterialSpinner.OnItemSelectedListener<String>) (view, position, id, item) -> opt = item);

        browse.setOnClickListener(v -> {
            Intent filePicker = new Intent(Intent.ACTION_GET_CONTENT);
            filePicker.setType("audio/*"); // all audio formats
            MainActivity.this.startActivityForResult(filePicker, 1);
        });
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode,
                                    Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (resultCode == RESULT_OK) {
            final Uri soundUri = data.getData(); // audio data

            AsyncTask.execute(() -> {
                InputStream stream = null;
                try {
                    stream = getContentResolver() // stream of audio
                            .openInputStream(soundUri);
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }

                byte[] audioData = new byte[0]; // byte format of audio
                try {
                    audioData = toByteArray(stream);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                try {
                    stream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }

                String base64EncodedData =
                        Base64.encodeBase64String(audioData);

                MediaPlayer player = new MediaPlayer();
                try {
                    player.setDataSource(MainActivity.this, soundUri);
                    player.prepare();
                } catch (IOException e) {
                    e.printStackTrace();
                }
                player.start();

                // Release the player
                player.setOnCompletionListener(MediaPlayer::release);

                Speech speechService = new Speech.Builder(
                        AndroidHttp.newCompatibleTransport(),
                        new AndroidJsonFactory(),
                        null
                ).setSpeechRequestInitializer(
                        new SpeechRequestInitializer(CLOUD_API_KEY))
                        .build();

                RecognitionConfig recognitionConfig = new RecognitionConfig();
                recognitionConfig.setLanguageCode(opt);
                RecognitionAudio recognitionAudio = new RecognitionAudio();
                recognitionAudio.setContent(base64EncodedData);

                // Create request
                SyncRecognizeRequest request = new SyncRecognizeRequest();
                request.setConfig(recognitionConfig);
                request.setAudio(recognitionAudio);

                // Generate response
                SyncRecognizeResponse response = null;
                try {
                    response = speechService.speech()
                            .syncrecognize(request)
                            .execute();
                } catch (IOException e) {
                    e.printStackTrace();
                }

                // Guard against a failed request or an empty result list
                if (response == null || response.getResults() == null
                        || response.getResults().isEmpty()) {
                    return;
                }

                // Extract transcript
                SpeechRecognitionResult result = response.getResults().get(0);
                final String transcript = result.getAlternatives().get(0)
                        .getTranscript();

                runOnUiThread(() -> {
                    EditText speechToTextResult =
                            findViewById(R.id.speech_to_text_result);
                    speechToTextResult.setText(transcript);
                });
            });
        }
    }
}

Here is the XML layout:

<?xml version="1.0" encoding="utf-8"?>

<android.support.constraint.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <EditText
        android:id="@+id/speech_to_text_result"
        android:layout_width="308dp"
        android:layout_height="121dp"
        android:layout_alignParentTop="true"
        android:layout_marginBottom="252dp"
        android:layout_marginEnd="8dp"
        android:layout_marginStart="8dp"
        android:hint="wait"
        android:inputType="textMultiLine"
        android:text=""
        android:textSize="25sp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent" />

    <Button
        android:id="@+id/browse_button"
        style="@style/Widget.AppCompat.Button.Colored"
        android:theme="@style/MyButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_margin="20dp"
        android:layout_marginBottom="8dp"
        android:layout_marginLeft="20dp"
        android:layout_marginTop="8dp"
        android:text="Browse"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.128"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/speech_to_text_result"
        app:layout_constraintVertical_bias="0.808" />

    <Button
        android:id="@+id/talk"
        style="@style/Widget.AppCompat.Button.Colored"
        android:theme="@style/MyButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_alignParentBottom="true"
        android:layout_margin="20dp"
        android:layout_marginBottom="8dp"
        android:layout_marginEnd="148dp"
        android:layout_marginStart="8dp"
        android:layout_marginTop="8dp"
        android:text="talk"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.87"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/speech_to_text_result"
        app:layout_constraintVertical_bias="0.819" />

    <LinearLayout
        android:layout_width="wrap_content"
        android:layout_height="wrap_content">

        <com.jaredrummler.materialspinner.MaterialSpinner
            android:id="@+id/spinner"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            app:ms_background_color="#008B8B"
            app:ms_text_color="#ffffff" />
    </LinearLayout>

</android.support.constraint.ConstraintLayout>

Download The Project Here