Audio Recognition in Android with OpenVINO
This article shows a natural language processing example using Intel OpenVINO on an Android phone with an ARM CPU. The application records audio samples from the microphone and predicts one of the commands: {Yes, No, Up, Down, Left, Right, On, Off, Stop, Go}.
The OpenVINO toolkit was initially designed for computer vision deep learning networks. However, it is mature enough to efficiently process data of any dimensionality, such as audio, text, 3D data, or time series.
OpenVINO for Android
The official distribution of OpenVINO provides installers for Windows, Linux, macOS, and Raspbian (Raspberry Pi OS), so this article is based on my own build for ARM64 Android (arm64-v8a ABI). Make sure your device has a 64-bit CPU; otherwise, you can repeat the build steps for the 32-bit version (armeabi-v7a ABI).
After creating an Empty Project in Android Studio, add the following block to build.gradle:
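The exact block depends on how the native libraries were built. A plausible sketch, assuming the prebuilt arm64-v8a `.so` files were copied into `libs/openvino/jniLibs` (a hypothetical path — adjust it to your own build output):

```groovy
android {
    // Package the prebuilt OpenVINO ARM64 native libraries into the APK.
    // The directory below is an assumption; it must contain arm64-v8a/*.so
    sourceSets {
        main {
            jniLibs.srcDirs += ['libs/openvino/jniLibs']
        }
    }
}
```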
And the following line into the dependencies:
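A sketch of that dependency, assuming the Java bindings were packaged as `libs/openvino/java_api.jar` (a hypothetical path and file name):

```groovy
dependencies {
    // OpenVINO Java API bindings; path and JAR name are assumptions
    implementation files('libs/openvino/java_api.jar')
}
```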
Audio Recognition model
For this demo we use a model trained for keyword spotting. Because OpenVINO uses its own model representation, you need to download the ov_model.xml
and ov_model.bin
files from the Hugging Face Hub.
Copy both files to the app/src/main/assets
folder (create it if it does not exist).
Application code
The application consists of two major parts: audio recording and deep learning inference. Audio recording can be implemented easily with the help of the public Android tutorials:
For inference, we need 1 second of the raw audio stream as floating-point values.
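The recorder delivers 16-bit PCM samples, so they have to be normalized into the [-1, 1] float range before inference. A minimal sketch of that conversion (the 16 kHz sample rate and the helper name are assumptions, not taken from the original code):

```java
public class AudioUtils {
    // One second of mono audio at the assumed 16 kHz sample rate
    public static final int SAMPLE_RATE = 16000;

    // Convert 16-bit PCM samples (as produced by AudioRecord) to floats in [-1, 1]
    public static float[] pcmToFloat(short[] pcm) {
        float[] out = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            out[i] = pcm[i] / 32768.0f;  // 32768 = 2^15, the short value range
        }
        return out;
    }
}
```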
If you are already familiar with OpenVINO in C++ or Python, you can pick up the Java API with no extra effort:
- Initialize OpenVINO
- Create the OpenVINO Core object. This step is a bit different from non-Android applications because of resource management: we need to create a temporary copy of the
plugins.xml
config file to make it visible to the native libraries:
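Making the temporary copy boils down to a plain stream-to-file copy. A sketch with a hypothetical helper (on Android the stream would come from `context.getAssets().open("plugins.xml")`):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResourceUtils {
    // Copy a stream (on Android: an asset opened via AssetManager) into a
    // temporary file and return its absolute path, so the native OpenVINO
    // libraries can read it from a real filesystem location.
    public static String copyToTempFile(InputStream in, String name) throws IOException {
        File tmp = File.createTempFile(name, null);
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        return tmp.getAbsolutePath();
    }
}
```

The resulting path can then be passed to the Core constructor; whether your build of the Java bindings accepts a config path there depends on the bindings version, so check the API you compiled.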
- Read and compile the model (same as in C++). We also need to create temporary copies of
ov_model.xml
and ov_model.bin
, similar to plugins.xml:
- Create an inference request instance, put the input data into a Tensor, run inference, and get the prediction:
That’s it! Check out the complete code on GitHub. Contributions are welcome.