👨🏼‍💻Custom UI Audio Player & Text Tracking — Audio Editor Kit & AI Dubbing

Ahmet Yunus Sevim
Published in Huawei Developers
Apr 19, 2022

Introduction

In this article, we will develop an audio player app together. The application has two main features: converting text to an audio file, and playing and controlling that audio file. We will implement playback control with the Audio Editor Kit and audio file creation with its AI dubbing feature. While the audio plays, the text will also be tracked and highlighted simultaneously. We will implement this with ML Kit file transcription, which detects the words in an audio file along with their positions on its timeline.

What we will learn

  1. Transforming text into an audio file with AI dubbing.
  2. Playing an audio file with a custom UI built on the Audio Editor Kit.
  3. Using ML Kit file transcription.
  4. Highlighting text simultaneously during playback.

AI Dubbing vs. Audio Editor Kit

If you are new to these concepts, it can be difficult to understand the differences between them and their main purposes. They have similar functionalities, but in a nutshell, AI dubbing is a feature of the Audio Editor Kit. Its main functionality is converting text to an audio file. During this conversion, the speed, volume, and speaker type (male, female) can be set, but these parameters can't be changed after the audio file is created. Therefore, AI dubbing is not suitable for building a player to control audio files, even though it can play the converted audio. The Audio Editor Kit, on the other hand, has a wide range of functionalities for managing audio files. We will use its features for tracking audio progress, pausing/playing, and changing speed.

Requirements

  • JDK version: 1.8 or later
  • Android Studio version: 3.X or later
  • minSdkVersion: 21 or later
  • Huawei Phone with EMUI 5.0 or later | Non-Huawei phones with Android 5.0 or later

HMS Core Integration

First of all, we need to create a project in Android Studio and AppGallery Connect, and then enable the Audio Editor Kit. If you are new to these processes, you can follow these steps:

  1. Creating a Project and App in AppGallery Connect
  2. Creating an Android Studio Project
  3. Generating a Signing Certificate
  4. Generating a Signing Certificate Fingerprint
  5. Configuring the Signing Certificate Fingerprint in AppGallery Connect
  6. Enabling Required Services in AppGallery Connect
  7. Adding Configurations
  8. Configuring the Signing Information for Your Project
  9. Synchronizing the Project

Dependencies and Preparations

We need to add the required dependencies to the app-level build.gradle file.
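A minimal sketch of the Gradle setup; the artifact versions below are placeholders, so check the integration docs for the latest ones:

```groovy
// Project-level build.gradle must already include Huawei's Maven repo and the AGC plugin:
// repositories { maven { url 'https://developer.huawei.com/repo/' } }

// app-level build.gradle
apply plugin: 'com.huawei.agconnect'

dependencies {
    // Audio Editor Kit file SDK (includes the AI dubbing capability)
    implementation 'com.huawei.hms:audio-editor:1.1.0.300'
    // ML Kit audio file transcription
    implementation 'com.huawei.hms:ml-computer-voice-aft:2.2.0.300'
}
```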

Then the required permissions must be added to the AndroidManifest.xml file. In this case, to save the converted audio file and play it from phone storage, we need storage access permissions. Internet connection permissions are also required.
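The relevant AndroidManifest.xml entries look like this (standard Android permissions, nothing kit-specific):

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
```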

Permission Request

The next step is requesting storage permissions from the user.
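A minimal sketch of the runtime request (the sample project wraps this logic in a ManagePermissions class; the request code below is an arbitrary value):

```kotlin
import android.Manifest
import android.app.Activity
import android.content.pm.PackageManager
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

const val STORAGE_PERMISSION_CODE = 1 // arbitrary request code

fun requestStoragePermissions(activity: Activity) {
    val permissions = arrayOf(
        Manifest.permission.READ_EXTERNAL_STORAGE,
        Manifest.permission.WRITE_EXTERNAL_STORAGE
    )
    // Only ask for the permissions that have not been granted yet.
    val notGranted = permissions.filter {
        ContextCompat.checkSelfPermission(activity, it) != PackageManager.PERMISSION_GRANTED
    }
    if (notGranted.isNotEmpty()) {
        ActivityCompat.requestPermissions(activity, notGranted.toTypedArray(), STORAGE_PERMISSION_CODE)
    }
}
```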

You can check the ManagePermissions class in the sample project I shared for a more complete permission-request flow, or if you have any problems with this step.

API Key

Setting the API key is also essential. You can get it from the agconnect-services.json file.
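A sketch, assuming the api_key value from agconnect-services.json is set once before either kit is used, for example in Application.onCreate():

```kotlin
// apiKey: the "api_key" value from agconnect-services.json
HAEApplication.getInstance().setApiKey(apiKey)  // Audio Editor Kit / AI dubbing
MLApplication.getInstance().setApiKey(apiKey)   // ML Kit file transcription
```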

AI Dubbing — Text to Audio File

We will create a simple page for this section. The page contains speaker type selection and volume adjustment for creating the audio file. You can see the layout code below.
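A trimmed sketch of such a layout (the view IDs are hypothetical):

```xml
<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="16dp">

    <EditText
        android:id="@+id/et_dubbing_text"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:hint="Enter the text to convert" />

    <RadioGroup
        android:id="@+id/rg_speaker"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:orientation="horizontal">

        <RadioButton
            android:id="@+id/rb_male"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Male" />

        <RadioButton
            android:id="@+id/rb_female"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Female" />
    </RadioGroup>

    <SeekBar
        android:id="@+id/sb_volume"
        android:layout_width="match_parent"
        android:layout_height="wrap_content" />

    <Button
        android:id="@+id/btn_convert"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="Convert" />
</LinearLayout>
```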

Now we can start integrating AI dubbing. The first step is to create the HAEAiDubbingEngine object and assign config and callback objects to it.
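A sketch of that setup; the language code and speaker type here are the static values discussed in the next paragraph:

```kotlin
// Build the dubbing config. Volume and speed come from the UI controls.
val config = HAEAiDubbingConfig()
    .setVolume(volume)
    .setSpeed(speed)
    .setType(speakerType)      // e.g. 18 (male) or 19 (female); see the note below
    .setLanguage(languageCode) // e.g. a code returned by engine.getLanguages()

val engine = HAEAiDubbingEngine(config)
engine.setAiDubbingCallback(callback) // callback is defined two steps below
```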

Parameters such as speed, volume, speaker type, and language are added to the config object in this step. In this project, we will use only English and its male and female speakers, and I will add these parameters as static values. However, the AI dubbing engine provides methods that return the list of languages it supports and the speaker types for each language. It is highly recommended to gather languages and speaker types with these methods, since the speaker type and language constants may change in future versions. I am using static speaker type values (18 for male, 19 for female) in the example project, as you can see in the code above; please avoid static values and use the recommended methods instead.
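A sketch using the engine's query methods (HAEAiDubbingSpeaker's exact getters may vary slightly between SDK versions):

```kotlin
// Query supported languages and the speakers for each, instead of hard-coding values.
val languages: List<String> = engine.languages
for (language in languages) {
    val speakers: List<HAEAiDubbingSpeaker> = engine.getSpeaker(language)
    for (speaker in speakers) {
        // Pick the speaker you want and feed its value into HAEAiDubbingConfig.setType().
        Log.d("AiDubbing", "language=$language, speaker=${speaker.name}")
    }
}
```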

The processes of saving the audio file to the phone and changing its format are handled in the callback object.
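A trimmed sketch of the callback, assuming the synthesized PCM chunks are appended to a file and converted to .wav when synthesis completes (writePcmChunk is a hypothetical helper, and the exact callback signatures and event constants should be verified against your SDK version):

```kotlin
val callback = object : HAEAiDubbingCallback {
    override fun onError(taskId: String, err: HAEAiDubbingError) {
        Log.e("AiDubbing", "Synthesis failed for task $taskId")
    }

    override fun onWarn(taskId: String, warn: HAEAiDubbingWarn) {}

    override fun onRangeStart(taskId: String, start: Int, end: Int) {}

    override fun onAudioAvailable(
        taskId: String,
        audioInfo: HAEAiDubbingAudioInfo,
        audioLength: Int,
        range: android.util.Pair<Int, Int>,
        bundle: Bundle
    ) {
        // Each chunk of synthesized PCM audio arrives here; append it to the output file.
        writePcmChunk(audioInfo.audioData) // hypothetical helper from the utils folder
    }

    override fun onEvent(taskId: String, eventId: Int, bundle: Bundle?) {
        // When synthesis is complete, convert the finished .pcm file to .wav
        // (check the completion-event constant in HAEAiDubbingConstants for your version).
    }
}
```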

Now we can run AI dubbing to convert the text to an audio file.
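A sketch of kicking off the synthesis; the mode flag names below are assumptions based on the SDK docs, so verify them in your version:

```kotlin
// Queue the text for synthesis; the returned task ID matches the one in the callback.
val mode = HAEAiDubbingEngine.EXTERNAL_PLAYBACK or HAEAiDubbingEngine.OPEN_STREAM // assumed flags
val taskId = engine.speak(text, mode)
```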

Note: AI dubbing converts text to .pcm format. To convert it to a .wav file, you can use the code in the sample project I shared. The audio file path also needs to be processed and reformatted. For these processes, and for creating the .wav file, you can use the code in the utils folder.

Audio Editor Kit — Player

Let's add a new page with a simple UI: a play/pause button, a slider bar to control the audio, a text view that shows the remaining time, and another button to change the speed.

Layout code and view assignments:
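Rather than the full layout, here is a sketch of the view assignments (the IDs are hypothetical; adjust them to match your own layout file):

```kotlin
val playButton: Button = view.findViewById(R.id.btn_play_pause)
val speedButton: Button = view.findViewById(R.id.btn_speed)
val seekBar: SeekBar = view.findViewById(R.id.sb_progress)
val remainingTimeText: TextView = view.findViewById(R.id.tv_remaining_time)
```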

First of all, we need to create the HuaweiAudioEditor object and assign it an audio file path.
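A sketch of the setup, assuming filePath points to the .wav file produced in the previous section:

```kotlin
// Create the editor, initialize its environment, and place the dubbed file
// on an audio lane of the timeline.
val editor = HuaweiAudioEditor.create(requireContext())
editor.initEnvironment()

val timeline = editor.timeLine
val audioLane = timeline.appendAudioLane()
val audioAsset = audioLane.appendAudioAsset(filePath, timeline.currentTime)
```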

Now we will add play/pause functionality.
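A minimal sketch, toggling between playTimeLine and pauseTimeLine:

```kotlin
var isPlaying = false

playButton.setOnClickListener {
    if (isPlaying) {
        editor.pauseTimeLine()
    } else {
        // Play from the current position to the end of the timeline.
        editor.playTimeLine(timeline.currentTime, timeline.endTime)
    }
    isPlaying = !isPlaying
}
```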

Now we will add functionality for dragging the slider bar and changing audio progress.
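A sketch of the seek logic, mapping the slider's progress range onto the timeline:

```kotlin
seekBar.setOnSeekBarChangeListener(object : SeekBar.OnSeekBarChangeListener {
    override fun onProgressChanged(bar: SeekBar, progress: Int, fromUser: Boolean) {
        if (fromUser) {
            // Map slider progress (0..max) to a position on the timeline in milliseconds.
            val position = timeline.endTime * progress / bar.max
            editor.seekTimeLine(position)
        }
    }

    override fun onStartTrackingTouch(bar: SeekBar) {}

    override fun onStopTrackingTouch(bar: SeekBar) {
        // Resume playback from the new position.
        editor.playTimeLine(timeline.currentTime, timeline.endTime)
    }
})
```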

As you can see in the code above, the editor has a timeline structure. The timeline is an object that holds the length of the audio and the current position during playback, both in milliseconds. It is essential when you want to change the audio's start point, drag the slider bar to another point on the timeline, or track the remaining time, as in this example.

If you only want to show the remaining time and the slider bar, without the speed-changing option, it is easy: you can simply use the timeline's endTime and currentTime. Since they are in milliseconds, you only need to convert them to minutes and seconds.
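A small helper for that conversion:

```kotlin
// Convert a millisecond value to a "mm:ss" string.
fun formatMillis(ms: Long): String {
    val totalSeconds = ms / 1000
    return String.format("%02d:%02d", totalSeconds / 60, totalSeconds % 60)
}

// Inside the play progress callback:
// remainingTimeText.text = formatMillis(timeline.endTime - currentTime)
```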

If you want to change the speed of the audio alongside these features, things get tricky. Changing the speed rescales the editor's endTime and duration values according to the speed value. Let us assume the duration of the file is 32 seconds and you change the speed to 2x: duration and endTime will change from 32,000 to 16,000 milliseconds. This makes it difficult to track the remaining time and audio progress with these variables. Because of that, we need to change the code a little bit: instead of tracking the remaining time with currentTime as shown before, I derive it from the slider bar progress with a small change in the code.
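A sketch of that workaround, driving both the slider and the remaining-time label from the progress ratio, which survives speed changes because currentTime and endTime are rescaled together; realDurationMs is a hypothetical field holding the file's original 1x duration:

```kotlin
editor.setPlayCallback(object : HuaweiAudioEditor.PlayCallback {
    override fun onPlayProgress(currentTime: Long) {
        requireActivity().runOnUiThread {
            // Speed-independent ratio of how far we are through the audio.
            val ratio = currentTime.toDouble() / timeline.endTime
            seekBar.progress = (ratio * seekBar.max).toInt()
            // Remaining time relative to the file's original (1x) duration.
            remainingTimeText.text = formatMillis(((1 - ratio) * realDurationMs).toLong())
        }
    }

    override fun onPlayStopped() {}
    override fun onPlayFinished() { isPlaying = false }
    override fun onPlayFailed() {}
})
```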

Finally, let's add the speed-changing functionality.
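A sketch that cycles through a few playback speeds; changeSpeedAndPitch re-renders the asset, which also rescales the timeline as described above:

```kotlin
val speeds = floatArrayOf(1.0f, 1.5f, 2.0f)
var speedIndex = 0

speedButton.setOnClickListener {
    speedIndex = (speedIndex + 1) % speeds.size
    val speed = speeds[speedIndex]
    audioAsset.changeSpeedAndPitch(speed, 1.0f) // pitch 1.0 keeps the original pitch
    speedButton.text = "${speed}x"
}
```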

ML Kit File Transcription — Text Tracking

First of all, we need to transcribe the audio file. Since we create the audio file in DubbingFragment, we will run this function in DubbingFragment once the audio file is created.
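A sketch of the transcription call, assuming a short (under one minute) English audio file; for longer files, the long-recognition API is used instead:

```kotlin
val aftEngine = MLRemoteAftEngine.getInstance()
aftEngine.init(requireContext())
aftEngine.setAftListener(aftListener) // defined in the next step

val setting = MLRemoteAftSetting.Factory()
    .setLanguageCode("en-US")
    .enablePunctuation(true)
    .enableWordTimeOffset(true) // we need per-word timestamps for highlighting
    .create()

// shortRecognize handles files up to one minute; use longRecognize otherwise.
val taskId = aftEngine.shortRecognize(Uri.fromFile(File(wavFilePath)), setting)
```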

As you can see above, we need to declare an AftListener object to catch the ML Kit result. We will store the results in a list that is reachable from other pages, and we will create a data model class to objectify them.
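A sketch of the listener and data model; TranscriptWord and wordList are hypothetical names for the model class and the shared list:

```kotlin
// One recognized word and its time range (in milliseconds) within the audio.
data class TranscriptWord(val text: String, val startTime: Int, val endTime: Int)

// Shared with PlayerFragment, e.g. via a companion object or a ViewModel.
val wordList = mutableListOf<TranscriptWord>()

val aftListener = object : MLRemoteAftListener {
    override fun onResult(taskId: String, result: MLRemoteAftResult?, ext: Any?) {
        if (result != null && result.isComplete) {
            // getWords() is populated because enableWordTimeOffset(true) was set.
            result.words?.forEach { word ->
                wordList.add(TranscriptWord(word.text, word.startTime, word.endTime))
            }
        }
    }

    override fun onError(taskId: String, errorCode: Int, message: String) {}
    override fun onInitComplete(taskId: String, ext: Any?) {}
    override fun onUploadProgress(taskId: String, progress: Double, ext: Any?) {}
    override fun onEvent(taskId: String, eventId: Int, ext: Any?) {}
}
```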

The last thing to do is highlight the detected words while the audio is playing. We will add this functionality to PlayerFragment and call it from onPlayProgress. Since ML Kit returns each word's timeline, we will use the Audio Editor Kit's timeline to detect which word is currently being read.
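A sketch of the highlighting, assuming the transcript sits in a single TextView (transcriptTextView is a hypothetical view), the text was joined from wordList with single spaces, and playback is at 1x speed (scale the position by the speed factor otherwise). Call it from onPlayProgress on the UI thread, alongside the slider update above:

```kotlin
import android.graphics.Color
import android.text.Spannable
import android.text.SpannableString
import android.text.style.BackgroundColorSpan

// Highlight the word whose time range contains the current timeline position.
fun highlightWord(currentTimeMs: Long) {
    val index = wordList.indexOfFirst {
        currentTimeMs >= it.startTime && currentTimeMs < it.endTime
    }
    if (index == -1) return

    // Character offset of the active word inside the space-joined text.
    var start = 0
    for (i in 0 until index) start += wordList[i].text.length + 1

    val spannable = SpannableString(wordList.joinToString(" ") { it.text })
    spannable.setSpan(
        BackgroundColorSpan(Color.YELLOW),
        start, start + wordList[index].text.length,
        Spannable.SPAN_EXCLUSIVE_EXCLUSIVE
    )
    transcriptTextView.text = spannable
}
```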

So everything is set, and our player is complete.

Tips & Tricks

  1. Don't forget to enable the Audio Editor Kit in AppGallery Connect, add your SHA-256 certificate fingerprint to AppGallery Connect, and add agconnect-services.json to the app folder of your project.
  2. Use the getLanguages and getSpeaker functions of the AI dubbing engine to retrieve the current constant values.
  3. If you want to see all features of the Audio Editor Kit, including AI dubbing, you can use its UI SDK. This SDK is easy to implement, and it launches a page with all the Audio Editor Kit's features.

Conclusion

We developed a simple audio player app and used just a few of the Audio Editor Kit's features. You can check the links below to see its full range of features. I'm sharing the project repo in the references; if you have any problems, you can refer to it.

Thank you for reading. See you in my other articles…;)

References
