Using the Text To Speech and Automatic Speech Recognition Features of HMS ML Kit Together

Published in Huawei Developers - Türkiye

Apr 18, 2023

HMS ML Kit Automatic Speech Recognition and Text To Speech

Introduction

Hello, dear Kotlin lovers! Welcome to my new article. Today I will explain how to add HMS ML Kit ASR (Automatic Speech Recognition) and TTS (Text To Speech) to your project. I hope you enjoy it.

How to Integrate ASR and TTS into Your App

This part is common to both.

· You need to register for a developer account on AppGallery Connect.

· You need to create an app and enable ML Kit in AppGallery Connect.

· When you have finished creating the project, download the agconnect-services.json configuration file from AppGallery Connect. Then add it to the app folder of your project (the app module level).

· After that, we need to add the repositories and dependencies to the project-level build.gradle file.

buildscript {
    repositories {
        google()
        jcenter()
        maven {url 'https://developer.huawei.com/repo/'}
    }
    dependencies {
        classpath "com.android.tools.build:gradle:4.0.0"
        classpath 'com.huawei.agconnect:agcp:1.3.1.300'

        // NOTE: Do not place your application dependencies here; they belong
        // in the individual module build.gradle files
    }
}

allprojects {
    repositories {
        google()
        jcenter()
        maven { url 'https://developer.huawei.com/repo/' }
    }
}

· Next, we need to add the plugin and dependencies to the app-level build.gradle file.

...
apply plugin: 'com.huawei.agconnect'

android {
...
}

dependencies { 
    ...
    // ML Kit Text To Speech SDK
    implementation 'com.huawei.hms:ml-computer-voice-tts:3.7.0.303'
    // ML Kit ASR SDK with the built-in speech pickup UI (MLAsrCaptureActivity)
    implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:3.7.0.301'
}

· Let's set the API key

 MLApplication.initialize(this)
 MLApplication.getInstance().apiKey = API_KEY

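Here we initialize ML Kit and set its API key, which you can copy from your project settings in AppGallery Connect. In the MainActivity below, the key is read from BuildConfig.API_KEY instead of being hard-coded.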

· Let's create the layout file

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".MainActivity">

    <TextView
        android:id="@+id/outputTextView"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Hello World!"
        android:gravity="center"
        android:textSize="32sp"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/asrButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="ASR"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/outputTextView" />

    <Button
        android:id="@+id/ttsButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="TTS"
        app:layout_constraintBottom_toTopOf="@+id/outputTextView"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

</androidx.constraintlayout.widget.ConstraintLayout>

The layout has two buttons, one for ASR and one for TTS, with a TextView between them that displays the text recognized by ASR. The TTS button reads the TextView's text aloud.

· Now, shall we start integrating TTS with a helper class?

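import android.os.Bundle
import android.util.Pair
import com.huawei.hms.mlsdk.tts.MLTtsAudioFragment
import com.huawei.hms.mlsdk.tts.MLTtsCallback
import com.huawei.hms.mlsdk.tts.MLTtsConfig
import com.huawei.hms.mlsdk.tts.MLTtsConstants
import com.huawei.hms.mlsdk.tts.MLTtsEngine
import com.huawei.hms.mlsdk.tts.MLTtsError
import com.huawei.hms.mlsdk.tts.MLTtsWarn

class TextToSpeechHelper(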
    private val isPlayingCallback: (TTSPlayingStatus) -> Unit
) {

    private var mlTtsConfig: MLTtsConfig? = null
    private var mlTtsEngine: MLTtsEngine? = null

    private var ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED

    /**
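     * ML Kit TTS accepts at most 500 characters per speak() call, so the text is split
     * into sentences at newlines and at dots that are not decimal points (e.g. "3.14").
     */
    fun startTTS(text: String) {
        mlTtsEngine?.let {

            when (ttsPlayingStatus) {
                TTSPlayingStatus.NOT_STARTED -> {
                    it.stop()
                    // The pattern must be wrapped in Regex(): String.split(String) splits on literal text.
                    val sentences = text.split(Regex("\n|\\.(?!\\d)|(?<!\\d)\\."))
                    for (sentence in sentences) {
                        // Skip empty pieces; sentences longer than 500 characters are skipped too.
                        if (sentence.isNotBlank() && sentence.length <= 500) {
                            it.speak(sentence, MLTtsEngine.QUEUE_APPEND)
                        }
                    }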
                }

                TTSPlayingStatus.PLAYING -> {
                    pauseTTS()
                }
                TTSPlayingStatus.PAUSED -> {
                    resumeTTS()
                }
            }
        }
    }

    fun pauseTTS() {
        if (ttsPlayingStatus == TTSPlayingStatus.PLAYING) {
            mlTtsEngine?.pause()
        }
    }

    fun resumeTTS() {
        if (ttsPlayingStatus == TTSPlayingStatus.PAUSED) {
            mlTtsEngine?.resume()
        }
    }

    fun stopTTS() {
        mlTtsEngine?.stop()
    }

    fun destroyTTS() {
        mlTtsEngine?.shutdown()
        mlTtsEngine = null
    }

    init {
        isPlayingCallback.invoke(TTSPlayingStatus.NOT_STARTED)
        mlTtsConfig = MLTtsConfig().apply {
            language = MLTtsConstants.TTS_EN_US
            person = MLTtsConstants.TTS_SPEAKER_FEMALE_EN
            speed = 1.0f
            volume = 1.0f
        }

        mlTtsEngine = MLTtsEngine(mlTtsConfig)

        val callback: MLTtsCallback = object : MLTtsCallback {
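            override fun onError(taskId: String?, error: MLTtsError?) {
                // Reset the state and notify the caller so the UI does not stay in "playing".
                ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED
                isPlayingCallback.invoke(ttsPlayingStatus)
            }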

            override fun onWarn(taskId: String?, warn: MLTtsWarn?) {

            }

            override fun onRangeStart(taskId: String?, start: Int, end: Int) {

            }

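            override fun onAudioAvailable(
                taskId: String?,
                audioFragment: MLTtsAudioFragment?,
                offset: Int,
                range: Pair<Int, Int>?,
                bundle: Bundle?
            ) {
                // Synthesized audio fragments; not needed here because the built-in player is used.
            }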

            override fun onEvent(taskId: String?, eventId: Int, bundle: Bundle?) {

                when (eventId) {
                    MLTtsConstants.EVENT_PLAY_START -> {
                        ttsPlayingStatus = TTSPlayingStatus.PLAYING
                        isPlayingCallback.invoke(ttsPlayingStatus)
                    }
                    MLTtsConstants.EVENT_PLAY_STOP -> {
                        ttsPlayingStatus = TTSPlayingStatus.NOT_STARTED
                        isPlayingCallback.invoke(ttsPlayingStatus)
                    }
                    MLTtsConstants.EVENT_PLAY_RESUME -> {
                        ttsPlayingStatus = TTSPlayingStatus.PLAYING
                        isPlayingCallback.invoke(ttsPlayingStatus)
                    }
                    MLTtsConstants.EVENT_PLAY_PAUSE -> {
                        ttsPlayingStatus = TTSPlayingStatus.PAUSED
                        isPlayingCallback.invoke(ttsPlayingStatus)
                    }

                }
            }

        }

        mlTtsEngine?.setTtsCallback(callback)

    }

}

enum class TTSPlayingStatus { NOT_STARTED, PLAYING, PAUSED }

Here we create the TTS engine. First, MLTtsConfig sets the language, speaker, speed and volume, and the MLTtsEngine is created from that configuration. Then we create an MLTtsCallback to handle the synthesis results and pass it to the engine with setTtsCallback(). In onEvent(), the playback events (start, pause, resume, stop) update ttsPlayingStatus and report it through isPlayingCallback. Playback is controlled with pause() and resume() (this applies only when the built-in player is used). stop() stops the current TTS task and clears all tasks in the queue (both playing and not yet played) while keeping the TTS engine alive. shutdown() destroys the TTS engine and releases its resources.
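startTTS() skips any sentence that exceeds the 500-character limit, so parts of a long text are never spoken. Below is a minimal sketch, assuming you would rather break long sentences than drop them. splitForTts is a hypothetical helper (not an ML Kit API): it reuses the same sentence pattern, wrapped in Regex() because a plain String argument would make split() look for that literal text, and then cuts oversized sentences at the last space within the limit. It is plain Kotlin, so it runs on the JVM without the HMS SDK.

```kotlin
// Hypothetical helper (not part of ML Kit): prepares text for MLTtsEngine.speak(),
// which accepts at most 500 characters per call.
fun splitForTts(text: String, maxChars: Int = 500): List<String> {
    require(maxChars > 0) { "maxChars must be positive" }
    // Same pattern as startTTS(): newlines, plus dots that are not decimal points.
    // It must be wrapped in Regex(); String.split(String) splits on literal text.
    val sentenceBreak = Regex("\n|\\.(?!\\d)|(?<!\\d)\\.")
    val pieces = mutableListOf<String>()
    for (raw in text.split(sentenceBreak)) {
        var sentence = raw.trim()
        // Cut oversized sentences at the last space within the limit,
        // or hard-cut at maxChars when there is no such space.
        while (sentence.length > maxChars) {
            val cut = sentence.lastIndexOf(' ', maxChars).takeIf { it > 0 } ?: maxChars
            pieces += sentence.substring(0, cut).trim()
            sentence = sentence.substring(cut).trim()
        }
        // Drop the empty pieces produced by consecutive separators.
        if (sentence.isNotEmpty()) pieces += sentence
    }
    return pieces
}

fun main() {
    // Prints [Hello World, Pi is 3.14, Bye]
    println(splitForTts("Hello World. Pi is 3.14.\nBye"))
}
```

Inside startTTS(), the loop could then become `for (piece in splitForTts(text)) it.speak(piece, MLTtsEngine.QUEUE_APPEND)`, so every part of the text is queued.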

· Now it's time to initialize TTS in the activity and release it in onDestroy()

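private fun initTTS() {
    textToSpeechHelper = TextToSpeechHelper { ttsPlayingStatus ->
        when (ttsPlayingStatus) {
            TTSPlayingStatus.PLAYING -> {
                // Speech is playing (e.g. update the UI here).
            }

            TTSPlayingStatus.PAUSED -> {
                // Speech is paused.
            }

            TTSPlayingStatus.NOT_STARTED -> {
                // Idle: nothing is playing.
            }
        }
    }
}

override fun onDestroy() {
    // Release the TTS engine together with the activity.
    textToSpeechHelper?.destroyTTS()
    super.onDestroy()
}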
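The status branches in initTTS() are left empty. As a minimal sketch, assuming you want the TTS button to reflect the current state, a pure function can map each TTSPlayingStatus to a label. ttsButtonLabel and the label strings are hypothetical; the enum is copied from the helper class so the sketch compiles on its own.

```kotlin
// Copy of the article's enum so this sketch compiles on its own.
enum class TTSPlayingStatus { NOT_STARTED, PLAYING, PAUSED }

// Hypothetical helper: the label the TTS button could show for each status.
// It describes what startTTS() does on the next tap: speak, pause or resume.
fun ttsButtonLabel(status: TTSPlayingStatus): String = when (status) {
    TTSPlayingStatus.NOT_STARTED -> "TTS"
    TTSPlayingStatus.PLAYING -> "Pause"
    TTSPlayingStatus.PAUSED -> "Resume"
}

fun main() {
    // Prints one line per status, e.g. "PLAYING -> Pause".
    for (status in TTSPlayingStatus.values()) {
        println("$status -> ${ttsButtonLabel(status)}")
    }
}
```

Inside the callback you could then call `runOnUiThread { ttsButton.text = ttsButtonLabel(ttsPlayingStatus) }`. runOnUiThread is a precaution here, since this sketch does not assume which thread ML Kit delivers TTS events on.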

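· Shall we start integrating ASR? Before opening the speech pickup UI, check that the RECORD_AUDIO permission has been granted, and request it if not: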

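when {
    ContextCompat.checkSelfPermission(
        this,
        Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED -> {
        // Permission already granted: open the speech pickup UI.
        startASRWithSpeechPickupUI()
    }
    else -> {
        // Ask for the permission; the launcher's callback starts ASR if it is granted.
        requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
    }
}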

· Integrate ASR into MainActivity. After that, you can tap the ASR button or the TTS button.

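import android.Manifest
import android.content.Intent
import android.content.pm.PackageManager
import android.os.Bundle
import android.util.Log
import android.widget.Button
import android.widget.TextView
import android.widget.Toast
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat
import com.huawei.hms.mlplugin.asr.MLAsrCaptureActivity
import com.huawei.hms.mlplugin.asr.MLAsrCaptureConstants
import com.huawei.hms.mlsdk.asr.MLAsrConstants
import com.huawei.hms.mlsdk.asr.MLAsrListener
import com.huawei.hms.mlsdk.asr.MLAsrRecognizer
import com.huawei.hms.mlsdk.common.MLApplication

class MainActivity : AppCompatActivity() {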
    lateinit var asrButton: Button
    lateinit var ttsButton: Button
    lateinit var outputTextView: TextView
    private var textToSpeechHelper: TextToSpeechHelper? = null
    val REQUEST_CODE_ASR: Int = 100
    var textGlobal = ""
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)
        MLApplication.initialize(this)
        MLApplication.getInstance().apiKey = BuildConfig.API_KEY
        asrButton = findViewById(R.id.asrButton)
        ttsButton = findViewById(R.id.ttsButton)
        outputTextView = findViewById(R.id.outputTextView)
        initTTS()
        textGlobal = outputTextView.text.toString()
        asrButton.setOnClickListener {
            when {
                ContextCompat.checkSelfPermission(
                    this,
                    Manifest.permission.RECORD_AUDIO
                ) == PackageManager.PERMISSION_GRANTED -> {
                    startASRWithSpeechPickupUI()
                }
                else -> {
                    requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
                }
            }
        }

        ttsButton.setOnClickListener {
            textToSpeechHelper?.startTTS(textGlobal)
        }
    }

    val requestPermissionLauncher =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { isGranted ->
            if (isGranted) {
                startASRWithSpeechPickupUI()
            } else {
                Toast.makeText(this, "Permission Error", Toast.LENGTH_SHORT).show()
            }
        }

    fun startASRWithSpeechPickupUI() {
        // Use Intent for recognition settings.
        val intent = Intent(this, MLAsrCaptureActivity::class.java)
            .putExtra(MLAsrCaptureConstants.LANGUAGE, "en-US")
            // Set whether to display the recognition result on the speech pickup UI. MLAsrCaptureConstants.FEATURE_ALLINONE: no; MLAsrCaptureConstants.FEATURE_WORDFLUX: yes.
            .putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX)

        // REQUEST_CODE_ASR: request code between the current activity and speech pickup UI activity. You can use this code to obtain the processing result of the speech pickup UI.
        startActivityForResult(intent, REQUEST_CODE_ASR)
    }

    override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
        super.onActivityResult(requestCode, resultCode, data)
        // REQUEST_CODE_ASR: request code used with startActivityForResult() in startASRWithSpeechPickupUI().
        if (requestCode != REQUEST_CODE_ASR) return
        // data and its extras can be null, so return early instead of using !!.
        val bundle = data?.extras ?: return
        when (resultCode) {
            MLAsrCaptureConstants.ASR_SUCCESS -> {
                // Obtain the text information recognized from speech.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_RESULT)) {
                    val text = bundle.getString(MLAsrCaptureConstants.ASR_RESULT).orEmpty()
                    Log.d(TAG, "onActivityResult: $text")
                    outputTextView.text = text
                    // Keep the recognized text so that the TTS button reads it aloud.
                    textGlobal = text
                }
            }
            MLAsrCaptureConstants.ASR_FAILURE -> {
                // Recognition failed: log the result code, error message and sub-result code if present.
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_CODE)) {
                    val errorCode = bundle.getInt(MLAsrCaptureConstants.ASR_ERROR_CODE)
                    Log.e(TAG, "ASR error code: $errorCode")
                }
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)) {
                    val errorMsg = bundle.getString(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)
                    Log.e(TAG, "ASR error message: $errorMsg")
                }
                if (bundle.containsKey(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)) {
                    val subErrorCode = bundle.getInt(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)
                    Log.e(TAG, "ASR sub-error code: $subErrorCode")
                }
            }
            else -> {
                // Other result codes are not handled in this demo.
            }
        }
    }

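    // MLAsrListener callbacks. This listener is not used by the speech pickup UI flow above;
    // it is only needed when running recognition through MLAsrRecognizer without the pickup UI.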
    internal inner class SpeechRecognitionListener : MLAsrListener {
        override fun onStartListening() {
            // The recorder starts to receive speech.
            Log.d(TAG, "onStartListening: ")
        }

        override fun onStartingOfSpeech() {
            // The user starts to speak, that is, the speech recognizer detects that the user starts to speak.
            Log.d(TAG, "onStartingOfSpeech: ")
        }

        override fun onVoiceDataReceived(data: ByteArray, energy: Float, bundle: Bundle) {
            // Return the original PCM stream and audio power to the user. This API is not running in the main thread, and the return result is processed in the sub-thread.
        }

        override fun onRecognizingResults(partialResults: Bundle) {
            // Receive the recognized text from MLAsrRecognizer. This API is not running in the main thread, and the return result is processed in the sub-thread.
            val text = partialResults.getString(MLAsrRecognizer.RESULTS_RECOGNIZING) ?: ""
            Log.d(TAG, "onRecognizingResults: $text")
        }

        override fun onResults(results: Bundle) {
            // Text data of ASR. This API is not running in the main thread, and the return result is processed in the sub-thread.
            val text = results.getString(MLAsrRecognizer.RESULTS_RECOGNIZED) ?: ""
            Log.d(TAG, "onResults: $text")
            runOnUiThread {
                outputTextView.text = text
            }
        }

        override fun onError(error: Int, errorMessage: String) {
            // Called when an error occurs in recognition. This API is not running in the main thread, and the return result is processed in the sub-thread.
            Log.d(TAG, "onError: ErrorCode:$error, errorMessage:$errorMessage")
        }

        override fun onState(state: Int, params: Bundle) {
            // Notify the app status change. This API is not running in the main thread, and the return result is processed in the sub-thread.
            val message = when (state) {
                MLAsrConstants.STATE_LISTENING -> "Listening..."
                MLAsrConstants.STATE_NO_NETWORK -> "No network"
                MLAsrConstants.STATE_NO_SOUND -> "No sound"
                MLAsrConstants.STATE_NO_SOUND_TIMES_EXCEED -> "Silence"
                MLAsrConstants.STATE_NO_UNDERSTAND -> "Not recognized"
                else -> "Else"
            }
            Log.d(TAG, "onState: $message")
        }
    }

    companion object {
        const val TAG = "MainActivity"
    }

    // TTS

    private fun initTTS() {
        textToSpeechHelper = TextToSpeechHelper { ttsPlayingStatus ->
            when (ttsPlayingStatus) {
                TTSPlayingStatus.PLAYING -> {

                }

                TTSPlayingStatus.PAUSED -> {

                }

                TTSPlayingStatus.NOT_STARTED -> {

                }
            }
        }
    }

    override fun onDestroy() {
        textToSpeechHelper?.destroyTTS()
        super.onDestroy()
    }

}
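The result handling in onActivityResult boils down to a small decision: on success take the recognized text, on failure collect the error details, and ignore everything else, without crashing when the extras are missing. The JVM-only sketch below isolates that decision. A Map stands in for the Android Bundle, and AsrKeys, AsrResultCode, AsrOutcome and parseAsrResult are hypothetical names; on a device the keys would be MLAsrCaptureConstants.ASR_RESULT, ASR_ERROR_CODE and ASR_ERROR_MESSAGE, whose actual string values are not assumed here.

```kotlin
// Hypothetical keys standing in for MLAsrCaptureConstants.ASR_RESULT,
// ASR_ERROR_CODE and ASR_ERROR_MESSAGE (their real values are not assumed).
object AsrKeys {
    const val RESULT = "result"
    const val ERROR_CODE = "errorCode"
    const val ERROR_MESSAGE = "errorMessage"
}

// Hypothetical stand-in for the ASR_SUCCESS / ASR_FAILURE result codes.
enum class AsrResultCode { SUCCESS, FAILURE, OTHER }

sealed class AsrOutcome {
    data class Recognized(val text: String) : AsrOutcome()
    data class Failed(val errorCode: Int?, val message: String?) : AsrOutcome()
    object Ignored : AsrOutcome()
}

fun parseAsrResult(code: AsrResultCode, extras: Map<String, Any?>?): AsrOutcome {
    // Missing extras mean "nothing to do" instead of a crash with !!.
    if (extras == null) return AsrOutcome.Ignored
    return when (code) {
        AsrResultCode.SUCCESS -> {
            val text = extras[AsrKeys.RESULT] as? String
            // Blank recognition results are not shown or spoken.
            if (text.isNullOrBlank()) AsrOutcome.Ignored else AsrOutcome.Recognized(text)
        }
        AsrResultCode.FAILURE -> AsrOutcome.Failed(
            errorCode = extras[AsrKeys.ERROR_CODE] as? Int,
            message = extras[AsrKeys.ERROR_MESSAGE] as? String
        )
        AsrResultCode.OTHER -> AsrOutcome.Ignored
    }
}

fun main() {
    // Prints Recognized(text=hello world)
    println(parseAsrResult(AsrResultCode.SUCCESS, mapOf(AsrKeys.RESULT to "hello world")))
}
```

In the activity, a Recognized outcome would update outputTextView and textGlobal, and a Failed outcome would be logged; keeping the decision in one pure function makes it easy to test without a device.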

Start ASR with startASRWithSpeechPickupUI

· Set the authentication information for your app. For details, see "Notes on Using Cloud Authentication Information" in the ML Kit documentation; in this project, this is done by setting the API key in onCreate().
· Create an intent for the recognition settings, referring to the supported languages listed for MLAsrCaptureConstants.LANGUAGE.
· Start the speech pickup UI activity (MLAsrCaptureActivity) with the intent created in the previous step, so that the result is returned to the current activity. Speech up to 60 seconds long can be recognized in real time.
· Override the onActivityResult method to process the result returned by ASR.
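The demo hard-codes "en-US" as the recognition language. A minimal sketch, assuming you want to follow the device locale instead: pickAsrLanguage is a hypothetical helper that returns a BCP 47 tag to pass as MLAsrCaptureConstants.LANGUAGE. The supported set is an assumption you would fill in from the ML Kit ASR language list; by default it only contains "en-US", the language this article uses.

```kotlin
import java.util.Locale

// Hypothetical helper: picks the value to pass as MLAsrCaptureConstants.LANGUAGE.
// `supported` is an assumption to be filled from the ML Kit ASR language list.
fun pickAsrLanguage(
    locale: Locale,
    supported: Set<String> = setOf("en-US"),
    fallback: String = "en-US"
): String {
    // BCP 47 tag of the locale, e.g. "en-US" or "tr-TR".
    val tag = locale.toLanguageTag()
    return when {
        tag in supported -> tag
        // Same language, different region (e.g. en-GB -> en-US), else the fallback.
        else -> supported.firstOrNull { Locale.forLanguageTag(it).language == locale.language }
            ?: fallback
    }
}

fun main() {
    println(pickAsrLanguage(Locale.getDefault()))
}
```

In startASRWithSpeechPickupUI(), `.putExtra(MLAsrCaptureConstants.LANGUAGE, pickAsrLanguage(Locale.getDefault()))` would replace the hard-coded value. Keep in mind that the TTS helper is configured for English only (MLTtsConstants.TTS_EN_US).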

On the first screen, if you tap the TTS button, the app reads "Hello World!" aloud.

After tapping the ASR button, you can speak. Instead of "Hello World!", you will see whatever you said, and the TTS button will then read it aloud.

Conclusion

In this article, we built a demo project that uses both TTS and ASR from Huawei ML Kit, showing how to integrate them in code and use them together. I hope it was helpful.
