HMS ML Kit Using Text To Speech And Automatic Speech Recognition Together

Ataberk Çeliktaş · Huawei Developers
Apr 5, 2023 · 7 min read

HMS ML Kit Automatic Speech Recognition And Text To Speech Integration

Introduction

Hello, dear Kotlin lovers! Welcome to my new article. Today I'm going to explain how to implement HMS ML Kit ASR (Automatic Speech Recognition) and TTS (Text To Speech) in your project. I hope you will like it.

How To Implement ASR And TTS?

The following steps are common to both services:

· You need to register a developer account in AppGallery Connect.

· You must create an application and enable ML Kit from AppGallery Connect.

· When you finish creating the project, download the agconnect-services.json configuration file from AppGallery Connect and add it to your application under the app folder.

· After that, we need to add the dependencies to the project-level build.gradle file.


buildscript {
    repositories {
        google()
        jcenter()
        maven { url 'https://developer.huawei.com/repo/' }
    }
    dependencies {
        classpath "com.android.tools.build:gradle:4.0.0"
        classpath 'com.huawei.agconnect:agcp:1.3.1.300'

        // NOTE: Do not place your application dependencies here; they belong
        // in the individual module build.gradle files
    }
}
allprojects {
    repositories {
        google()
        jcenter()
        maven { url 'https://developer.huawei.com/repo/' }
    }
}
  • Then, we need to add the dependencies to the app-level build.gradle file.

...
apply plugin: 'com.huawei.agconnect'

android {
    ...
}

dependencies {
    ...
    // Import the ML Kit TTS SDK and the ASR plugin SDK.
    implementation 'com.huawei.hms:ml-computer-voice-tts:3.7.0.303'
    implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:3.7.0.301'
}
  • Let's set the API key:
  MLApplication.initialize(this)
MLApplication.getInstance().apiKey = API_KEY

Here we set the API key so that ML Kit can reach Huawei's cloud services. Avoid hard-coding a real key in source control; a common approach is to load it from local.properties into BuildConfig, as sketched below.
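A minimal sketch of that approach in the app-level build.gradle (the API_KEY property name and the localProps variable are assumptions for illustration, not part of the official ML Kit setup):

// app-level build.gradle — assumes local.properties contains a line like API_KEY=your_key
def localProps = new Properties()
localProps.load(project.rootProject.file('local.properties').newDataInputStream())

android {
    defaultConfig {
        // Exposed to Kotlin code as BuildConfig.API_KEY (used in MainActivity below)
        buildConfigField "String", "API_KEY", "\"${localProps.getProperty('API_KEY')}\""
    }
}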

  • Let's create the layout file:
<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextView
        android:id="@+id/outputTextView"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Hello World!"
        android:gravity="center"
        android:textSize="32sp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/asrButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="ASR"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/outputTextView" />

    <Button
        android:id="@+id/ttsButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="TTS"
        app:layout_constraintBottom_toTopOf="@+id/outputTextView"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>

Here we created two buttons (ASR and TTS) and a TextView to display the recognized text.

  • Now let's start integrating TTS by creating a helper class:
class TextToSpeechHelper(
private val isPlayingCallback: (TTSPlayingStatus) -> Unit
) {

private var mlTtsConfig: MLTtsConfig? = null
private var mlTtsEngine: MLTtsEngine? = null

private var ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED

/**
* ML Kit TTS has a 500-character limit per request, so we split the text into sentences at punctuation marks.
*/
fun startTTS(text: String) {
mlTtsEngine?.let {

when (ttsPlayingStatus) {
TTSPlayingStatus.NOT_STARTED -> {
it.stop()
// Split on newlines and sentence-ending periods (ignoring decimal points).
val sentences = text.split(Regex("\n|\\.(?!\\d)|(?<!\\d)\\.")).toTypedArray()
for (sentence in sentences)
if (sentence.length < 500) {
it.speak(sentence, MLTtsEngine.QUEUE_APPEND)
}
}

TTSPlayingStatus.PLAYING -> {
pauseTTS()
}
TTSPlayingStatus.PAUSED -> {
resumeTTS()
}
}
}
}

fun pauseTTS() {
if (ttsPlayingStatus == TTSPlayingStatus.PLAYING) {
mlTtsEngine?.pause()
}
}

fun resumeTTS() {
if (ttsPlayingStatus == TTSPlayingStatus.PAUSED) {
mlTtsEngine?.resume()
}
}


fun stopTTS() {
mlTtsEngine?.stop()
}

fun destroyTTS() {
mlTtsEngine?.shutdown()
mlTtsEngine = null
}

init {
isPlayingCallback.invoke(TTSPlayingStatus.NOT_STARTED)
mlTtsConfig = MLTtsConfig().apply {
language = MLTtsConstants.TTS_EN_US
person = MLTtsConstants.TTS_SPEAKER_FEMALE_EN
speed = 1.0f
volume = 1.0f
}

mlTtsEngine = MLTtsEngine(mlTtsConfig)

val callback: MLTtsCallback = object : MLTtsCallback {
override fun onError(taskId: String?, error: MLTtsError?) {
ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED
}

override fun onWarn(taskId: String?, warn: MLTtsWarn?) {

}

override fun onRangeStart(taskId: String?, start: Int, end: Int) {

}

override fun onAudioAvailable(
p0: String?,
p1: MLTtsAudioFragment?,
p2: Int,
p3: Pair<Int, Int>?,
p4: Bundle?
) {

}


override fun onEvent(taskId: String?, eventId: Int, bundle: Bundle?) {

when (eventId) {
MLTtsConstants.EVENT_PLAY_START -> {
ttsPlayingStatus = TTSPlayingStatus.PLAYING
isPlayingCallback.invoke(ttsPlayingStatus)
}
MLTtsConstants.EVENT_PLAY_STOP -> {
ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED
isPlayingCallback.invoke(ttsPlayingStatus)
}
MLTtsConstants.EVENT_PLAY_RESUME -> {
ttsPlayingStatus = TTSPlayingStatus.PLAYING
isPlayingCallback.invoke(ttsPlayingStatus)
}
MLTtsConstants.EVENT_PLAY_PAUSE -> {
ttsPlayingStatus = TTSPlayingStatus.PAUSED
isPlayingCallback.invoke(ttsPlayingStatus)
}

}
}

}

mlTtsEngine?.setTtsCallback(callback)

}

}

enum class TTSPlayingStatus { NOT_STARTED, PLAYING, PAUSED }

Here we create a TTS engine from an MLTtsConfig and a TTS callback that processes the audio synthesis result and playback events. The callback is passed to the engine with setTtsCallback before any synthesis is performed. startTTS, pauseTTS and resumeTTS control playback (valid only when the internal player is used); stopTTS stops the current TTS task and clears all queued tasks (both playing and not yet played) while keeping the engine; destroyTTS shuts down the engine and releases its resources.
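As a quick illustration (not from the original project code) of how this helper might be driven from an Activity, assuming a status callback that just logs and a sample string to read:

// Hypothetical usage sketch of the TextToSpeechHelper defined above.
val ttsHelper = TextToSpeechHelper { status ->
    // React to playback status changes, e.g. toggle a play/pause icon.
    Log.d("TTS", "Playback status: $status")
}

ttsHelper.startTTS("Hello from HMS ML Kit.")  // starts, or pauses/resumes if already playing
ttsHelper.stopTTS()                           // stops and clears the queue, engine is kept
ttsHelper.destroyTTS()                        // call when the screen is destroyed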

  • Now it is time to initialize TTS in the Activity:
 private fun initTTS() {
textToSpeechHelper = TextToSpeechHelper { ttsPlayingStatus ->
when (ttsPlayingStatus) {
TTSPlayingStatus.PLAYING -> {

}

TTSPlayingStatus.PAUSED -> {

}

TTSPlayingStatus.NOT_STARTED -> {

}
}
}
}
override fun onDestroy() {
textToSpeechHelper?.destroyTTS()
super.onDestroy()
}
  • Before starting ASR, we check the RECORD_AUDIO permission and request it if it has not been granted yet (the permission must also be declared in the manifest, as shown after this snippet):
   when {
ContextCompat.checkSelfPermission(
this,
Manifest.permission.RECORD_AUDIO
) == PackageManager.PERMISSION_GRANTED -> {
startASRWithSpeechPickupUI()
}
else -> {
requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
}
}
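For the runtime request above to work, the permission also has to be declared in AndroidManifest.xml. Since ASR is a cloud service, the app needs network access as well:

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />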

  • Now integrate ASR in MainActivity. After that, you can tap the ASR or TTS button:
class MainActivity : AppCompatActivity() {
lateinit var asrButton: Button
lateinit var ttsButton: Button
lateinit var outputTextView: TextView
private var textToSpeechHelper: TextToSpeechHelper? = null
val REQUEST_CODE_ASR: Int = 100
var textGlobal = ""
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
setContentView(R.layout.activity_main)
MLApplication.initialize(this)
MLApplication.getInstance().apiKey = BuildConfig.API_KEY
asrButton = findViewById(R.id.asrButton)
ttsButton = findViewById(R.id.ttsButton)
outputTextView = findViewById(R.id.outputTextView)
initTTS()
textGlobal = outputTextView.text.toString()
asrButton.setOnClickListener {
when {
ContextCompat.checkSelfPermission(
this,
Manifest.permission.RECORD_AUDIO
) == PackageManager.PERMISSION_GRANTED -> {
startASRWithSpeechPickupUI()
}
else -> {
requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
}
}
}

ttsButton.setOnClickListener {
textToSpeechHelper?.startTTS(textGlobal)
}
}

val requestPermissionLauncher =
registerForActivityResult(ActivityResultContracts.RequestPermission()) { isGranted ->
if (isGranted) {
startASRWithSpeechPickupUI()
} else {
Toast.makeText(this, "Permission Error", Toast.LENGTH_SHORT).show()
}
}

fun startASRWithSpeechPickupUI() {
// Use Intent for recognition settings.
val intent = Intent(this, MLAsrCaptureActivity::class.java)
.putExtra(MLAsrCaptureConstants.LANGUAGE, "en-US")
// Set whether to display the recognition result on the speech pickup UI. MLAsrCaptureConstants.FEATURE_ALLINONE: no; MLAsrCaptureConstants.FEATURE_WORDFLUX: yes.
.putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX)


// REQUEST_CODE_ASR: request code between the current activity and speech pickup UI activity. You can use this code to obtain the processing result of the speech pickup UI.
startActivityForResult(intent, REQUEST_CODE_ASR)
}

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
super.onActivityResult(requestCode, resultCode, data)
var text = ""
// REQUEST_CODE_ASR: request code between the current activity and speech pickup UI activity defined in step 3.
if (requestCode == REQUEST_CODE_ASR) {
when (resultCode) {
MLAsrCaptureConstants.ASR_SUCCESS -> if (data != null) {
val bundle = data.extras
// Obtain the text information recognized from speech.
if (bundle!!.containsKey(MLAsrCaptureConstants.ASR_RESULT)) {
text = bundle.getString(MLAsrCaptureConstants.ASR_RESULT).toString()
Log.d(TAG, "onActivityResult: $text")
outputTextView.text = text
// Process the recognized text information.
textGlobal = text
}
}
MLAsrCaptureConstants.ASR_FAILURE -> // Processing logic for recognition failure.
if (data != null) {
val bundle = data.extras
// Check whether a result code is contained.
if (bundle!!.containsKey(MLAsrCaptureConstants.ASR_ERROR_CODE)) {
val errorCode = bundle.getInt(MLAsrCaptureConstants.ASR_ERROR_CODE)
// Perform troubleshooting based on the result code.
}
// Check whether error information is contained.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)) {
val errorMsg = bundle.getString(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)
// Perform troubleshooting based on the error information.
}
// Check whether a sub-result code is contained.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)) {
val subErrorCode =
bundle.getInt(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)
// Process the sub-result code.
}
}
else -> {
}
}
}
}

// Implement the MLAsrListener callback API. This listener is for ASR without the speech pickup UI; see the sketch after MainActivity for how it can be attached to an MLAsrRecognizer.
internal inner class SpeechRecognitionListener : MLAsrListener {
override fun onStartListening() {
// The recorder starts to receive speech.
Log.d(TAG, "onStartListening: ")
}

override fun onStartingOfSpeech() {
// The user starts to speak, that is, the speech recognizer detects that the user starts to speak.
Log.d(TAG, "onStartingOfSpeech: ")
}

override fun onVoiceDataReceived(data: ByteArray, energy: Float, bundle: Bundle) {
// Return the original PCM stream and audio power to the user. This API is not running in the main thread, and the return result is processed in the sub-thread.
}

override fun onRecognizingResults(partialResults: Bundle) {
// Receive the recognized text from MLAsrRecognizer. This API is not running in the main thread, and the return result is processed in the sub-thread.
val text = partialResults.getString(MLAsrRecognizer.RESULTS_RECOGNIZING) ?: ""
Log.d(TAG, "onRecognizingResults: $text")
}

override fun onResults(results: Bundle) {
// Text data of ASR. This API is not running in the main thread, and the return result is processed in the sub-thread.
val text = results.getString(MLAsrRecognizer.RESULTS_RECOGNIZED) ?: ""
Log.d(TAG, "onResults: $text")
runOnUiThread {
outputTextView.text = text
}
}

override fun onError(error: Int, errorMessage: String) {
// Called when an error occurs in recognition. This API is not running in the main thread, and the return result is processed in the sub-thread.
Log.d(TAG, "onError: ErrorCode:$error, errorMessage:$errorMessage")
}

override fun onState(state: Int, params: Bundle) {
// Notify the app status change. This API is not running in the main thread, and the return result is processed in the sub-thread.
val message = when (state) {
MLAsrConstants.STATE_LISTENING -> "Listening..."
MLAsrConstants.STATE_NO_NETWORK -> "No network"
MLAsrConstants.STATE_NO_SOUND -> "No sound"
MLAsrConstants.STATE_NO_SOUND_TIMES_EXCEED -> "Silence"
MLAsrConstants.STATE_NO_UNDERSTAND -> "Not recognized"
else -> "Else"
}
Log.d(TAG, "onState: $message")
}
}

companion object {
const val TAG = "MainActivity"
}


// TTS

private fun initTTS() {
textToSpeechHelper = TextToSpeechHelper { ttsPlayingStatus ->
when (ttsPlayingStatus) {
TTSPlayingStatus.PLAYING -> {

}

TTSPlayingStatus.PAUSED -> {

}

TTSPlayingStatus.NOT_STARTED -> {

}
}
}
}

override fun onDestroy() {
textToSpeechHelper?.destroyTTS()
super.onDestroy()
}

}
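The SpeechRecognitionListener defined inside MainActivity is only needed if you want to run ASR without the speech pickup UI. For completeness, here is a minimal sketch of how it could be wired up, based on the documented MLAsrRecognizer API; the base com.huawei.hms:ml-computer-voice-asr SDK may also be required, so treat this as an outline rather than tested code from the article:

// Sketch only: ASR without the pickup UI, placed inside MainActivity.
private fun startASRWithoutUI() {
    val recognizer = MLAsrRecognizer.createAsrRecognizer(this)
    recognizer.setAsrListener(SpeechRecognitionListener())

    val intent = Intent(MLAsrConstants.ACTION_HMS_ASR_SPEECH)
        .putExtra(MLAsrConstants.LANGUAGE, "en-US")
        // FEATURE_WORDFLUX returns intermediate results while the user is speaking.
        .putExtra(MLAsrConstants.FEATURE, MLAsrConstants.FEATURE_WORDFLUX)

    recognizer.startRecognizing(intent)
    // Call recognizer.destroy() when recognition is no longer needed.
}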


Start ASR With Speech Pickup UI

  1. Set authentication information for your app. For details, please refer to Notes on Using Cloud Authentication Information.
  2. Create an Intent for the recognition settings, choosing one of the supported languages listed under LANGUAGE.
  3. Start the speech pickup UI activity with the Intent created in step 2 so that the result is sent back to the current activity. Speech no longer than 60 seconds can be recognized in real time.
  4. Override the onActivityResult method to process the result returned by ASR.
First screen: if you tap the TTS button, the app reads out "Hello World!".
After tapping the ASR button, you can speak; whatever you say is displayed in place of "Hello World!" and can then be read back with the TTS button.

Conclusion

In this article, we learned how to implement TTS and ASR together by building a small project that uses both, and along the way we saw how ASR and TTS work. I hope it helps.
