Author: David Reiss, Software Engineer, AI Mobile Platform, Facebook
Today, we are announcing a prototype feature in PyTorch: support for Android’s Neural Networks API (NNAPI). PyTorch Mobile aims to combine a best-in-class experience for ML developers with high-performance execution on all mobile hardware. Support for NNAPI is essential to meeting that goal because it expands the set of hardware that we can use to execute models quickly. This initial release includes fully functional support for a small but powerful set of features and operators, and we will be expanding support in the coming months.
NNAPI allows Android apps to run computationally intensive neural networks on the most powerful and efficient parts of the chips that power mobile phones, including GPUs (Graphics Processing Units) and NPUs (specialized Neural Processing Units). It was introduced in Android 8 (Oreo) and significantly expanded in Android 10 and 11 to support a richer set of AI models. While NNAPI provides a conveniently unified interface to various hardware drivers, it is a low-level API that requires significant integration work. Higher-level frameworks like PyTorch make these benefits accessible to more application developers.
NNAPI doesn’t have a native on-disk model format, so we have chosen to encapsulate the model definition within an ordinary TorchScript model. Developers prepare their model for execution on NNAPI after training. The saved model can then be packaged in an Android app (or delivered over the network) and loaded and run using PyTorch Mobile’s Java API or the libtorch C++ API. Applications already using PyTorch Mobile require no code changes: developers can simply replace their CPU model with an NNAPI model.
For PyTorch developers, access to Android’s NNAPI is especially appealing for always-on, real-time models, such as on-device computer vision. These models tend to be compute-intensive, latency-sensitive, and power-hungry. Satisfying all three requirements at once is challenging, which makes these models great candidates for hardware acceleration. This is one of the reasons that Facebook is interested in NNAPI. The AI model that enables the virtual background experience on our Portal devices is now being tested using NNAPI within the Messenger application to enable the immersive 360 backgrounds feature.
As seen in Table 1 below, running this model through Android’s NNAPI on a Pixel 3 delivers performance between that of a single-core CPU and a 2-core CPU, with the benefit of freeing up the CPU for non-ML application code. On newer devices like the Pixel 4 and Pixel 5, NNAPI enables higher levels of performance, which could be harnessed by a more complex ML model.
Similarly, many real-time audio models are being introduced in mobile applications. With similar concurrency, latency, and power requirements, features such as background noise reduction would also benefit from NNAPI-based hardware acceleration.
Converting machine learning models between frameworks or APIs is always tricky, and this was no exception. While PyTorch and NNAPI were both developed to run the same types of neural networks, lots of small differences in semantics need to be bridged when converting from one to the other. For example:
- NNAPI uses an integer bias for quantized convolution operations, while PyTorch uses floating point.
- PyTorch and NNAPI expect different memory ordering for weight tensors in convolutions.
- PyTorch previously had a complicated internal representation of upsampling operations, which had to be simplified for easier conversion to NNAPI.
- PyTorch and NNAPI have different representations of NHWC tensors. NNAPI only supports contiguous tensors, so an explicit NHWC representation is required. PyTorch supports strided tensors, so the convention is to always use NCHW semantics, but optionally combine with channels-last memory format to get NHWC behavior under the hood.
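Three of the differences above can be made concrete with a small, framework-free Python sketch. It is illustrative only and is not the converter's actual code; the function names are hypothetical. It assumes the documented NNAPI conventions that a quantized convolution's bias is a 32-bit integer with scale equal to input scale times filter scale (zero point 0) and that convolution filters are laid out as (out, height, width, in), while PyTorch stores convolution weights as (out, in, height, width).

```python
def quantize_bias(bias, input_scale, weight_scale):
    """Turn floating-point biases into the integer form NNAPI expects.

    Assumes bias_scale = input_scale * weight_scale with a zero point of 0.
    """
    bias_scale = input_scale * weight_scale
    return [round(b / bias_scale) for b in bias]


def oihw_to_ohwi(weight):
    """Reorder a nested-list conv weight from (out, in, kH, kW) to (out, kH, kW, in)."""
    return [
        [
            [[in_plane[h][w] for in_plane in out_ch] for w in range(len(out_ch[0][0]))]
            for h in range(len(out_ch[0]))
        ]
        for out_ch in weight
    ]


def contiguous_strides(shape):
    """Element strides of a densely packed tensor whose last dimension is innermost."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_strides(n, c, h, w):
    """Strides, reported in NCHW index order, of a tensor stored in NHWC order.

    The shape is still (n, c, h, w); only the strides reveal the NHWC layout.
    """
    return (h * w * c, 1, w * c, c)


if __name__ == "__main__":
    # Scales are powers of two so the division is exact.
    print(quantize_bias([0.5, -0.25], input_scale=0.5, weight_scale=0.25))  # [4, -2]

    # One output channel, two input channels, 1x2 kernel.
    print(oihw_to_ohwi([[[[1, 2]], [[3, 4]]]]))  # [[[[1, 3], [2, 4]]]]

    # Same logical NCHW shape, two different memory layouts.
    print(contiguous_strides((2, 3, 4, 5)))   # (60, 20, 5, 1)
    print(channels_last_strides(2, 3, 4, 5))  # (60, 1, 15, 3)
```

The last two lines show why an explicit conversion is needed: a strided PyTorch tensor can keep NCHW indexing while storing data in NHWC order, whereas a contiguous-only API has to be told the layout through the shape itself.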
This first prototype release of NNAPI support in PyTorch enables well-known linear convolutional and MLP models when deployed on devices running Android 10 and above. Upcoming releases of PyTorch will add further features, such as:
- Support for additional operators to unblock additional model types.
- Support for accelerated models based on the Mask R-CNN architecture. Learn more about Mask R-CNN on mobile at https://research.fb.com/blog/2018/01/enabling-full-body-ar-with-mask-r-cnn2go/.
- Support for the earlier Android versions, 8 (Oreo) and 9 (Pie).
- Support for models that use control flow semantics.
- Support for models that use NNAPI when it is available on a user’s Android phone but automatically fall back to execution on the CPU otherwise.
Table 2 shows benchmarks of the open-source MobileNetV2 model on the same phones used in Table 1. The results show similar CPU offload and even more significant performance benefits than those seen with the 360 background-enabling model.
For more information on how to use PyTorch and Android NNAPI in your app, and to reproduce the same MobileNetV2-based benchmark, please see our tutorial.
We’d like to thank the Android NNAPI team, without whom this work would not have been possible. Throughout our partnership, they have demonstrated a strong commitment to bringing accelerated neural networks to millions of Android users. We appreciate the work they have done to help make it possible for PyTorch to provide a path to NNAPI for our developers. We’d specifically like to thank Jean-Luc Brouillet, Miao Wang, Xusong Wang, Andy Dyer-Smith, and Oli Gaymond.