Android Speech to Text Tutorial

Abhinav Singh · Published in Voice Tech Podcast · 4 min read · May 28, 2020

In this article, we will learn how to implement speech-to-text functionality in Android. To enable our app to convert speech to text, we use the SpeechRecognizer class.

Image from https://predictivehacks.com/simple-example-of-speech-to-text/

Let’s see what is written in the official documentation:

This class provides access to the speech recognition service. This service allows access to the speech recognizer. Do not instantiate this class directly, instead, call SpeechRecognizer#createSpeechRecognizer(Context). This class's methods must be invoked only from the main application thread.

The implementation of this API is likely to stream audio to remote servers to perform speech recognition. As such this API is not intended to be used for continuous recognition, which would consume a significant amount of battery and bandwidth.
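Because the recognizer is provided by a separate service app, it may not exist on every device (for example, on emulators without Google services). Before creating the recognizer it is worth checking availability with the framework's SpeechRecognizer.isRecognitionAvailable() method; a minimal sketch:

```java
// Sketch: guard against devices that have no speech recognition service.
if (SpeechRecognizer.isRecognitionAvailable(this)) {
    speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);
} else {
    Toast.makeText(this, "Speech recognition is not available on this device",
            Toast.LENGTH_SHORT).show();
}
```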

Let’s implement this to understand it better.

First of all, create a new Android Studio project and add the following permissions in the manifest file:

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.texttospeech">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <!-- application element omitted for brevity -->
</manifest>
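One caveat worth noting: on Android 11 (API 30) and above, package-visibility rules can prevent the app from binding to the device's recognition service. If recognition silently fails on newer devices, declaring the service in a `<queries>` element (inside `<manifest>`) may be needed:

```xml
<!-- Lets the app discover the speech recognition service on API 30+ -->
<queries>
    <intent>
        <action android:name="android.speech.RecognitionService" />
    </intent>
</queries>
```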

Now, in the activity_main.xml file, we will add an EditText and an ImageView. I will not explain the layout file in detail, since the focus here is speech to text; you can see the code below.

<?xml version="1.0" encoding="utf-8"?>
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <RelativeLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_centerInParent="true"
        android:layout_marginLeft="10dp"
        android:layout_marginRight="10dp">

        <EditText
            android:id="@+id/text"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:layout_toLeftOf="@id/button"
            android:layout_marginRight="15dp"
            android:layout_centerInParent="true"
            android:padding="10dp"
            android:hint="Tap to Speak" />

        <ImageView
            android:id="@+id/button"
            android:layout_width="40dp"
            android:layout_height="40dp"
            android:backgroundTint="#F9F8FA"
            android:paddingRight="10dp"
            android:src="@drawable/ic_mic_black_off"
            android:layout_alignParentRight="true" />
    </RelativeLayout>

</RelativeLayout>

Let’s dive into the MainActivity.java file.

First, we will check for the permissions:

if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
        != PackageManager.PERMISSION_GRANTED) {
    checkPermission();
}

If the permission is not granted then we will call the checkPermission method.

private void checkPermission() {
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.RECORD_AUDIO}, RecordAudioRequestCode);
    }
}

Now comes the important part: first we initialize the SpeechRecognizer object, and then we create the intent for recognizing speech.

speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);

final Intent speechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());

As you can see, we have added some extras; let me explain what they are.

The constant ACTION_RECOGNIZE_SPEECH starts an activity that will prompt the user for speech and send it through a speech recognizer.

EXTRA_LANGUAGE_MODEL: Informs the recognizer which speech model to prefer when performing ACTION_RECOGNIZE_SPEECH.

LANGUAGE_MODEL_FREE_FORM: Use a language model based on free-form speech recognition.

EXTRA_LANGUAGE: Optional IETF language tag (as defined by BCP 47), for example, “en-US”.
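By passing Locale.getDefault() we ask for the device's language. To force a specific language you could instead pass an explicit BCP 47 tag (here "en-US", purely as an example); EXTRA_MAX_RESULTS is another standard extra you may find useful:

```java
// Sketch: request US English explicitly instead of the device default.
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
// Optionally ask the recognizer for up to three alternative transcriptions.
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);
```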

Now we will set a RecognitionListener on our speechRecognizer object using the setRecognitionListener() method.

You can see that after setting the listener there are several callback methods to implement. We will go to the onResults() method and add the following code:

@Override
public void onResults(Bundle bundle) {
    micButton.setImageResource(R.drawable.ic_mic_black_off);
    ArrayList<String> data = bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    editText.setText(data.get(0));
}

We get an ArrayList of Strings as the result, ordered from most to least likely, and we use the first element to update the UI (the EditText here).
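The results bundle can also carry a parallel float array of confidence scores under SpeechRecognizer.CONFIDENCE_SCORES. The selection logic itself is plain Java; here is a small sketch of picking the highest-scoring hypothesis (the class and method names are my own, not part of the Android API):

```java
import java.util.Arrays;
import java.util.List;

public class RecognitionUtils {

    // Returns the hypothesis with the highest confidence score.
    // Assumes scores is parallel to hypotheses; falls back to the first
    // hypothesis when scores are missing, mirroring data.get(0) above.
    public static String bestHypothesis(List<String> hypotheses, float[] scores) {
        if (hypotheses == null || hypotheses.isEmpty()) return null;
        if (scores == null || scores.length != hypotheses.size()) {
            return hypotheses.get(0);
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) best = i;
        }
        return hypotheses.get(best);
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("recognise speech", "wreck a nice beach");
        float[] confidence = {0.55f, 0.87f};
        System.out.println(bestHypothesis(data, confidence)); // prints "wreck a nice beach"
    }
}
```

Inside onResults() you would read both lists from the bundle and pass them to this helper instead of taking data.get(0) blindly.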


In the onBeginningOfSpeech() method we will add the following code to tell the user that their voice is being recognized.

@Override
public void onBeginningOfSpeech() {
    editText.setText("Listening...");
}

Now, let's set up the ImageView. We will add a touch listener to the ImageView to know when the user has pressed it.

micButton.setOnTouchListener(new View.OnTouchListener() {
    @Override
    public boolean onTouch(View view, MotionEvent motionEvent) {
        if (motionEvent.getAction() == MotionEvent.ACTION_UP) {
            speechRecognizer.stopListening();
        }
        if (motionEvent.getAction() == MotionEvent.ACTION_DOWN) {
            micButton.setImageResource(R.drawable.ic_mic_black_24dp);
            speechRecognizer.startListening(speechRecognizerIntent);
        }
        return false;
    }
});

When the user presses the ImageView, the recognizer starts listening, and the ImageView's source image changes to show the user that their voice is being captured; releasing it stops the recognition.
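One detail worth handling: if recognition fails (for example, no speech was detected), onResults() is never called and the mic icon would stay stuck in the "listening" state. A small sketch for the onError() callback, reusing the error constants that SpeechRecognizer defines:

```java
@Override
public void onError(int error) {
    // Restore the idle mic icon so the UI does not stay stuck on "listening".
    micButton.setImageResource(R.drawable.ic_mic_black_off);
    if (error == SpeechRecognizer.ERROR_NO_MATCH
            || error == SpeechRecognizer.ERROR_SPEECH_TIMEOUT) {
        editText.setHint("Didn't catch that, try again");
    }
}
```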

You can find the full source code below:

package com.example.texttospeech;

import androidx.annotation.NonNull;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;

import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.os.Build;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.MotionEvent;
import android.view.View;
import android.widget.EditText;
import android.widget.ImageView;
import android.widget.Toast;

import java.util.ArrayList;
import java.util.Locale;

public class MainActivity extends AppCompatActivity {

    public static final int RecordAudioRequestCode = 1;
    private SpeechRecognizer speechRecognizer;
    private EditText editText;
    private ImageView micButton;

    @Override
    protected void onCreate(final Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {
            checkPermission();
        }

        editText = findViewById(R.id.text);
        micButton = findViewById(R.id.button);
        speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this);

        final Intent speechRecognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault());

        speechRecognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onReadyForSpeech(Bundle bundle) {
            }

            @Override
            public void onBeginningOfSpeech() {
                editText.setText("");
                editText.setHint("Listening...");
            }

            @Override
            public void onRmsChanged(float v) {
            }

            @Override
            public void onBufferReceived(byte[] bytes) {
            }

            @Override
            public void onEndOfSpeech() {
            }

            @Override
            public void onError(int i) {
            }

            @Override
            public void onResults(Bundle bundle) {
                micButton.setImageResource(R.drawable.ic_mic_black_off);
                ArrayList<String> data = bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (data != null && !data.isEmpty()) {
                    editText.setText(data.get(0));
                }
            }

            @Override
            public void onPartialResults(Bundle bundle) {
            }

            @Override
            public void onEvent(int i, Bundle bundle) {
            }
        });

        micButton.setOnTouchListener(new View.OnTouchListener() {
            @Override
            public boolean onTouch(View view, MotionEvent motionEvent) {
                if (motionEvent.getAction() == MotionEvent.ACTION_UP) {
                    speechRecognizer.stopListening();
                }
                if (motionEvent.getAction() == MotionEvent.ACTION_DOWN) {
                    micButton.setImageResource(R.drawable.ic_mic_black_24dp);
                    speechRecognizer.startListening(speechRecognizerIntent);
                }
                return false;
            }
        });
    }

    @Override
    protected void onDestroy() {
        super.onDestroy();
        speechRecognizer.destroy();
    }

    private void checkPermission() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
            ActivityCompat.requestPermissions(this,
                    new String[]{Manifest.permission.RECORD_AUDIO}, RecordAudioRequestCode);
        }
    }

    @Override
    public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
                                           @NonNull int[] grantResults) {
        super.onRequestPermissionsResult(requestCode, permissions, grantResults);
        if (requestCode == RecordAudioRequestCode && grantResults.length > 0) {
            if (grantResults[0] == PackageManager.PERMISSION_GRANTED) {
                Toast.makeText(this, "Permission Granted", Toast.LENGTH_SHORT).show();
            }
        }
    }
}

Let's run the app.

Now you are capable of adding a cool feature to your app. What are you waiting for? Start coding!

You can get the full source code from my GitHub.

Let’s connect on LinkedIn, Github, Twitter.
