Audio Recognition in Android with OpenVINO
This article shows a natural language processing example using Intel OpenVINO on an Android phone with an ARM CPU. The application records audio samples from the microphone and predicts one of the commands: {Yes, No, Up, Down, Left, Right, On, Off, Stop, Go}.
The OpenVINO toolkit was initially designed for computer vision deep learning networks. However, it is mature enough to efficiently process data of any dimensionality, such as audio, text, 3D data, or time series.
OpenVINO for Android
The official distribution of OpenVINO provides installers for Windows, Linux, macOS, and Raspbian (Raspberry Pi OS), so this article is based on my own build for ARM64 Android (arm64-v8a ABI). Make sure your device has a 64-bit CPU; otherwise, you can repeat the build steps for the 32-bit version (armeabi-v7a ABI).
After creating an Empty Project in Android Studio, add the following block to build.gradle:
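The exact block depends on how the native libraries were built. A plausible sketch, assuming the prebuilt arm64-v8a `.so` files were copied into `libs/openvino/jniLibs` (a hypothetical path — adjust it to your own build output):

```groovy
android {
    // Package the prebuilt OpenVINO ARM64 native libraries into the APK.
    // The directory below is an assumption; it must contain arm64-v8a/*.so
    sourceSets {
        main {
            jniLibs.srcDirs += ['libs/openvino/jniLibs']
        }
    }
}
```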
And the following line into the dependencies:
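A sketch of that dependency, assuming the Java bindings were packaged as `libs/openvino/java_api.jar` (a hypothetical path and file name):

```groovy
dependencies {
    // OpenVINO Java API bindings; path and JAR name are assumptions
    implementation files('libs/openvino/java_api.jar')
}
```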
Audio Recognition model
For this demo we use a model trained for keyword spotting. Because OpenVINO uses its own model representation, you need to download the ov_model.xml
and ov_model.bin
files from the Hugging Face Hub.
Copy both files to the app/src/main/assets
folder (create it if it does not exist).
Application code
The application consists of two major parts: audio recording and deep learning inference. Audio recording can be implemented easily with the help of the public Android tutorials:
For inference, we need 1 second of the raw audio stream as floating-point values.
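The recorder delivers 16-bit PCM samples, so they have to be normalized into the [-1, 1] float range before inference. A minimal sketch of that conversion (the 16 kHz sample rate and the helper name are assumptions, not taken from the original code):

```java
public class AudioUtils {
    // One second of mono audio at the assumed 16 kHz sample rate
    public static final int SAMPLE_RATE = 16000;

    // Convert 16-bit PCM samples (as produced by AudioRecord) to floats in [-1, 1]
    public static float[] pcmToFloat(short[] pcm) {
        float[] out = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            out[i] = pcm[i] / 32768.0f;  // 32768 = 2^15, the short value range
        }
        return out;
    }
}
```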
If you are already familiar with OpenVINO in C++ or Python, you can pick up the Java API with no extra effort:
- Initialize OpenVINO
- Create the OpenVINO Core object. This step is a bit different from non-Android applications because of resource management: we need to create a temporary copy of the
plugins.xml
config file to make it visible to the native libraries:
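Making the temporary copy boils down to a plain stream-to-file copy. A sketch with a hypothetical helper (on Android the stream would come from `context.getAssets().open("plugins.xml")`):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResourceUtils {
    // Copy a stream (on Android: an asset opened via AssetManager) into a
    // temporary file and return its absolute path, so the native OpenVINO
    // libraries can read it from a real filesystem location.
    public static String copyToTempFile(InputStream in, String name) throws IOException {
        File tmp = File.createTempFile(name, null);
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        return tmp.getAbsolutePath();
    }
}
```

The resulting path can then be passed to the Core constructor; whether your build of the Java bindings accepts a config path there depends on the bindings version, so check the API you compiled.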
- Read and compile the model (same as in C++). We also need to create temporary copies of
ov_model.xml
and ov_model.bin
, similar to plugins.xml:
- Create an inference request instance, put the input data into a Tensor, run inference, and get the prediction:
That’s it! Check out the complete code on GitHub. Contributions are welcome.