ML in Android - 1: Introduction to TensorFlow Lite

In this article we'll go through the concepts of TensorFlow Lite's converter, interpreter, delegates, and optimization techniques.

Ayush Raj · The STEM · Nov 1, 2021


ML in Android Part 1 (made in an online editor)

Prerequisites for AI on mobile

  • Lightweight — If your model is very accurate but very large, say 100 MB, you won't easily fit a batch of such models on a phone. The model size should be small.
  • Low latency — The model should be able to process data with minimal delay.
  • Privacy & offline — These go together: since your data never leaves the device, on-device inference provides a high degree of privacy and keeps working without a network connection.
  • Power efficient — A mobile phone has a limited battery and cannot match a desktop, let alone a supercomputer, so the model should not consume much power while running.
  • Pretrained models — You can convert your pretrained TensorFlow models from multiple formats into TensorFlow Lite.

Components in TensorFlow Lite

There are two major components in TensorFlow Lite: the Converter and the Interpreter. The converter runs on your development machine and turns a trained TensorFlow model into the compact .tflite format; the interpreter runs on the device and executes that model.

Difference between the two (drawn in diagram software)
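
To make the split concrete, here is a minimal sketch of both halves in Python. The SavedModel path, file names, and dummy input are illustrative assumptions, not part of this article:

```python
import numpy as np
import tensorflow as tf

# --- Converter: runs on your development machine ---
# Assumes a trained model exported to "saved_model/" (illustrative path).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# --- Interpreter: runs on the device (simulated here in Python) ---
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
```

On Android the same interpreter is exposed through the org.tensorflow.lite Java/Kotlin API; the Python version above is handy for verifying a converted model before shipping it.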

How does TensorFlow Lite boost performance?

Running inference on compute-intensive machine learning models is expensive on mobile devices because of their limited processing power and battery. To keep overhead low and enable real-time applications, inference on these devices must run very fast. For this reason, TensorFlow Lite can use hardware acceleration libraries or APIs on compatible devices:

  • On Android, the Neural Networks API (NNAPI) is a software-level way to improve inference.
  • Edge TPUs, which are purpose-built for running deep learning models, can speed up inference. They are high-performance and have a tiny, low-power footprint.
  • Another option is TensorFlow Lite delegate acceleration, which hands your graph execution off to hardware that is optimized for inference; the GPU delegate is a common example.

TensorFlow Lite Delegates

Delegates enable hardware acceleration of TensorFlow Lite models by utilizing on-device accelerators such as the GPU and the Digital Signal Processor (DSP).

By default, TensorFlow Lite uses CPU kernels tuned for the ARM Neon instruction set. The CPU, however, is a general-purpose processor that isn't always suited to the heavy arithmetic found in machine learning models.
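
On Android a delegate is attached through the Interpreter.Options API in Java or Kotlin; from Python the equivalent is tf.lite.experimental.load_delegate. Below is a minimal sketch, assuming a delegate shared library is installed on the machine (the Edge TPU library name is only an example):

```python
import tensorflow as tf

# Load a delegate from its shared library. The Edge TPU runtime is used
# as an example here; other TFLite delegate libraries load the same way.
delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")

# Hand the delegate to the interpreter: ops the delegate supports run on
# the accelerator, and everything else falls back to the CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="model.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()
```

Because unsupported ops fall back to the CPU, attaching a delegate changes where parts of the graph execute, not whether the model runs.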

Choosing a Delegate

There will usually be several delegates appropriate to your use case, based on two basic criteria: the platform you're targeting (Android or iOS?) and the model type (floating-point or quantized?).

  • Cross-platform => GPU delegate (32-bit and 16-bit float models, on both Android and iOS)
  • Android => (1) NNAPI delegate (Android API 27+ with a GPU/DSP on the device) (2) Hexagon delegate (devices with a Qualcomm Hexagon DSP)
  • iOS => (1) Core ML delegate (32-bit or 16-bit floating-point models on newer iPhones and iPads)
Delegates by model type (table made in Google Sheets)

Optimization Techniques

Mobile, embedded, and edge devices have limited resources, so the machine learning models you deploy must have the right model size, latency, and power consumption. There are several ways to get there:

  • Quantization: Reduces the precision of the values used to represent a model's parameters (weights and biases), which shrinks the model and speeds up computation. Parameters are 32-bit floating-point numbers by default, so quantizing them to 8-bit integers, for example, cuts the model size roughly 4×. A conversion sketch follows this list.
Quantization methods (table made in Google Sheets)
Choosing the best optimization technique
  • Weight pruning: Reduces the overall number of parameters, typically by zeroing out weights that contribute little to the predictions.
  • Model topology: Switch to a more efficient model architecture to get a smaller, faster model to begin with.
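
As a preview of the next part, here is a minimal sketch of post-training quantization using the converter. The SavedModel path and output file name are illustrative assumptions:

```python
import tensorflow as tf

# Assumes a trained model exported to "saved_model/" (illustrative path).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")

# Post-training dynamic-range quantization: weights are stored as 8-bit
# integers instead of 32-bit floats, cutting model size roughly 4x.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

quantized_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(quantized_model)
```

With only `optimizations` set, this is dynamic-range quantization; supplying a representative dataset to the converter enables full integer quantization instead.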

In the next part we'll see how to save models, convert them, and optimize them. The coding begins there.

References

  1. https://www.tensorflow.org/lite/performance/gpu
