Text to Speech & Language Detection | HMS ML Kit

Mustafa Sürücü
Published in Huawei Developers
4 min read · Oct 19, 2020

Machine learning plays a vital role in many mobile applications that affect our daily lives. As a wide variety of use cases shapes the technology and its future, the impact of this tremendous technology on human life seems set to grow with every passing day.

HMS ML Kit offers many features that can make great contributions to your mobile applications, thanks to its easy-to-use structure. One of the most common features pioneering the machine learning era and offered by HMS is "Text to Speech". In this article, I will explain the development process for the Language Detection and Text to Speech features of HMS ML Kit. Our main aim will be to detect the language of any document picked from the device's file system and to convert its content to human voice.

If you have any questions about how to integrate HMS Core into your project, please take a look at the post below before beginning.

Note: Please do not forget to activate ML Kit in AppGallery Connect (Project Settings → Manage APIs) and to add the build dependencies to the app-level build.gradle file.
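
For reference, a minimal dependency block might look like the following; the artifact versions are assumptions based on the releases available around the time of writing, so please check the ML Kit documentation for current ones:

```groovy
dependencies {
    // HMS ML Kit language detection service (version is an assumption)
    implementation 'com.huawei.hms:ml-computer-language-detection:2.0.1.300'
    // HMS ML Kit text-to-speech engine (version is an assumption)
    implementation 'com.huawei.hms:ml-computer-voice-tts:2.0.1.300'
}
```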

Note: Please do not forget to add the permissions below to the AndroidManifest.xml file.
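
Since the app reads documents from device storage and the detection service runs on the cloud, the permissions in question are presumably these:

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
```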

Note: The iText and Apache POI libraries have been used to read the content of .pdf and .docx documents in this project. You can download the POI jar files from the link below and add them to the libs directory.
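
Assuming the jars are copied into app/libs, the following picks them up; the iText coordinate is an assumption (use the edition whose license suits your project):

```groovy
dependencies {
    // Load the Apache POI jars placed in app/libs
    implementation fileTree(dir: 'libs', include: ['*.jar'])
    // iText (Android port) for reading PDF content — coordinate/version assumed
    implementation 'com.itextpdf:itextg:5.5.10'
}
```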

You can use the activity layout shared below as a reference.
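
The original gist is not embedded here; a minimal stand-in with the TextView for the document content and a play button might look like this (the view IDs are hypothetical):

```xml
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <ScrollView
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="1">

        <TextView
            android:id="@+id/tv_content"
            android:layout_width="match_parent"
            android:layout_height="wrap_content" />
    </ScrollView>

    <ImageButton
        android:id="@+id/btn_play"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:contentDescription="Play"
        android:src="@android:drawable/ic_media_play" />
</LinearLayout>
```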

In our project structure, there will be three classes corresponding to the interface, the presenter and the activity, respectively.

  1. The View interface methods will be overridden in the TTS.kt (Activity) class. Our process starts by selecting a document from the file system. When the user clicks the document icon on the action bar, the openFile() method is triggered and an intent is launched for the file picker. After selection, the URI of the selected file is obtained in onActivityResult(), and the readDocument() method is called with the URI as a parameter.
  2. First, the path of the URI is checked to determine the real path of the file. After this initial check, the full path is generated by prepending the internal or external storage path according to the device's file system.
  3. The extension of the file is checked, and the document is read with the help of the pdf and docx readers. A sketch of this flow follows this list.
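
The original gist is not embedded here, so the Kotlin skeleton below is a minimal sketch of steps 1–3; the request code, the view ID and the reader helpers (resolveRealPath(), readPdf(), readDocx()) are hypothetical names:

```kotlin
private val REQUEST_PICK_DOCUMENT = 1001  // hypothetical request code

// Step 1: open the file picker for .pdf and .docx documents
private fun openFile() {
    val intent = Intent(Intent.ACTION_GET_CONTENT).apply {
        type = "*/*"
        putExtra(Intent.EXTRA_MIME_TYPES, arrayOf(
            "application/pdf",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document"))
        addCategory(Intent.CATEGORY_OPENABLE)
    }
    startActivityForResult(intent, REQUEST_PICK_DOCUMENT)
}

override fun onActivityResult(requestCode: Int, resultCode: Int, data: Intent?) {
    super.onActivityResult(requestCode, resultCode, data)
    if (requestCode == REQUEST_PICK_DOCUMENT && resultCode == Activity.RESULT_OK) {
        data?.data?.let { uri -> readDocument(uri) }
    }
}

// Steps 2-3: resolve the real path, then dispatch to the matching reader
private fun readDocument(uri: Uri) {
    val fullPath = resolveRealPath(uri)  // step 2 (hypothetical helper)
    val content = when {
        fullPath.endsWith(".pdf", ignoreCase = true) -> readPdf(fullPath)    // iText
        fullPath.endsWith(".docx", ignoreCase = true) -> readDocx(fullPath)  // Apache POI
        else -> ""
    }
    findViewById<TextView>(R.id.tv_content).text = content  // hypothetical view ID
}
```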

After these steps, the user is able to see the content in the TextView and click the play button to convert it to speech. On click, the source text is sent to the giveText() method via the presenter object created in the TTS class.
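
A minimal click handler under the same assumptions (view IDs and the presenter field are hypothetical):

```kotlin
findViewById<ImageButton>(R.id.btn_play).setOnClickListener {
    // hand the visible document text to the presenter for detection and synthesis
    presenter.giveText(findViewById<TextView>(R.id.tv_content).text.toString())
}
```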

In the Presenter class, the presenter methods mentioned in the interface class have been defined. These are the giveText(), detectLanguage() and init() methods.
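
The contract itself is not embedded here; a plausible minimal version with hypothetical names could be:

```kotlin
interface TTSContract {

    interface View {
        fun showContent(text: String)   // hypothetical callbacks back to the activity
        fun showError(message: String)
    }

    interface Presenter {
        fun giveText(text: String)      // split the text and start detection
        fun detectLanguage(text: String)
        fun init()                      // build the TTS engine from the config
    }
}
```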

In the giveText() method, detectLanguage() is called to identify the language of the selected document after the source text has been separated into sentences with an iterator. Each sentence is added to an ArrayList called "sentences".
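
The article does not name the iterator; java.text.BreakIterator is the standard choice on Android, so the sketch below assumes it:

```kotlin
import java.text.BreakIterator

private val sentences = ArrayList<String>()

fun giveText(text: String) {
    sentences.clear()
    val iterator = BreakIterator.getSentenceInstance()  // assumption: the iterator in question
    iterator.setText(text)
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        val sentence = text.substring(start, end).trim()
        if (sentence.isNotEmpty()) sentences.add(sentence)
        start = end
        end = iterator.next()
    }
    detectLanguage(text)  // identify the language of the whole document
}
```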

Note: The language detection service can detect the language of text. Both single-language text and multi-language text are supported. ML Kit detects languages in text and returns the language codes and their respective confidences or the language code with the highest confidence. Currently, 52 languages can be detected on the cloud and 51 languages can be detected on the device.

Note: Text to speech (TTS) can convert text information into audio output. The timbres are available for 6 languages (Chinese, English, French, Spanish, German and Italian).

To identify the language, a language detector object is generated using the Factory and Setting classes. This object offers two methods: firstBestDetect and probabilityDetect. In our case, firstBestDetect is used to obtain the code of the language with the highest confidence.
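
Based on the ML Kit remote language detection API, the detector can be created and queried as follows (the threshold value is an assumption):

```kotlin
val setting = MLRemoteLangDetectorSetting.Factory()
    .setTrustedThreshold(0.01f)  // minimum confidence to report; value assumed
    .create()
val langDetector = MLLangDetectorFactory.getInstance().getRemoteLangDetector(setting)

langDetector.firstBestDetect(sourceText)
    .addOnSuccessListener { languageCode ->
        // languageCode is e.g. "en"; use it to choose the TTS language and speaker
    }
    .addOnFailureListener {
        // processing logic for detection failure
    }
```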

To convert our content to speech, we need to create a TTS engine. The first step is to create a configuration object to specify the engine settings. The mlConfigs object is initialized from the MLTtsConfig class inside the detectLanguage() method, setting the speed and volume along with the language and speaker according to the detected language of the document.
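
For an English document, for example, the configuration might look like this (the speed and volume values are assumptions):

```kotlin
val mlConfigs = MLTtsConfig()
    .setLanguage(MLTtsConstants.TTS_EN_US)            // from the detected language code
    .setPerson(MLTtsConstants.TTS_SPEAKER_FEMALE_EN)  // a speaker matching that language
    .setSpeed(1.0f)                                   // normal speed (assumed)
    .setVolume(1.0f)                                  // normal volume (assumed)
```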

After the configurations are set, the init() method is called, where we provide the configurations to generate mlTtsEngine. We should also specify a TTS callback for our engine (a sketch follows the notes below). It consists of callback methods to manage the engine:

  • onError(): Processing logic for TTS failure.
  • onWarn(): Alarm handling without affecting service logic.
  • onRangeStart(): Returns the mapping between the currently played segment and text.
  • onEvent(): Callback method for audio synthesis events.

Note: You can manage events like Start, Stop, Pause and Resume by checking the eventId against the constants in the MLTtsConstants class.
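
Putting this together, init() might read as follows. onAudioAvailable is included because newer SDK versions declare it on MLTtsCallback; verify the exact signatures against your SDK version:

```kotlin
private lateinit var mlTtsEngine: MLTtsEngine

fun init() {
    mlTtsEngine = MLTtsEngine(mlConfigs)  // engine built from the config above
    mlTtsEngine.setTtsCallback(object : MLTtsCallback {
        override fun onError(taskId: String, err: MLTtsError) {
            // processing logic for TTS failure
        }
        override fun onWarn(taskId: String, warn: MLTtsWarn) {
            // alarm handling without affecting service logic
        }
        override fun onRangeStart(taskId: String, start: Int, end: Int) {
            // mapping between the currently played segment and the text
        }
        override fun onAudioAvailable(taskId: String, audioFragment: MLTtsAudioFragment,
                                      offset: Int, range: android.util.Pair<Int, Int>,
                                      bundle: Bundle) {
            // synthesized audio stream; not needed when the engine plays audio itself
        }
        override fun onEvent(taskId: String, eventId: Int, bundle: Bundle?) {
            when (eventId) {
                MLTtsConstants.EVENT_PLAY_START -> { /* playback started */ }
                MLTtsConstants.EVENT_PLAY_STOP -> { /* playback stopped */ }
                MLTtsConstants.EVENT_PLAY_PAUSE -> { /* playback paused */ }
                MLTtsConstants.EVENT_PLAY_RESUME -> { /* playback resumed */ }
            }
        }
    })
}
```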

Now that our engine is ready, we can convert each sentence into human speech in turn.

MLTtsEngine.QUEUE_APPEND has been used to build a queue, and each sentence is converted to speech one by one: when the first sentence finishes, the next one is handled. For the other modes, please check the links shared at the end of the post.
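
A minimal playback loop over the sentences list filled in giveText():

```kotlin
for (sentence in sentences) {
    // QUEUE_APPEND enqueues each task so sentences play one after another
    mlTtsEngine.speak(sentence, MLTtsEngine.QUEUE_APPEND)
}
```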

Here is a short video showing how our app works.

In this article, we have explored the Language Detection and Text to Speech features of HMS ML Kit. I hope this post will be a good reference for the ideas you want to put into practice.

Thank you for reading!

References
