Deep learning for Embedded Linux Series: Part 1 — Model Optimization

Ibrahim Essam · Published in YonoHub
5 min read · Apr 25, 2020

Deep Learning is a type of machine learning based on artificial neural networks. It has been around since the 1980s, but in the last decade it has achieved impressive results across many fields.
Deep Learning models can reach higher accuracy than other machine learning approaches, but this comes at a price: they require large amounts of training data and computation power.

Why Deep Learning at the edge?

That’s why it’s hard to deploy every deep learning model on resource-limited hardware such as embedded systems and mobile devices.
Many applications use cloud infrastructure for deep learning inference, but this is not always possible, especially in areas with poor internet connectivity and limited bandwidth, and it also comes with disadvantages such as latency, privacy concerns, and power consumption.
For these reasons, there is a strong demand for running deep learning models at the edge (on mobile devices or embedded systems) instead of in the cloud.

TensorFlow Lite

TensorFlow Lite is a set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size.

TF Lite consists of two main components:

  1. The TensorFlow Lite interpreter: It runs optimized TF Lite models on different types of hardware, such as embedded systems and mobile devices. Click here for all the supported platforms.
  2. The TensorFlow Lite converter: A great tool that converts TF models and generates a TensorFlow Lite FlatBuffer file (.tflite). The converter supports SavedModel directories, tf.keras models, and concrete functions. A minimal conversion example follows the workflow figure below.
TF Lite Workflow (image source: https://www.tensorflow.org/lite/convert/index)
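As a quick illustration of the converter component, here is a minimal sketch of converting a model to a .tflite file, assuming TensorFlow 2.x and a hypothetical SavedModel directory called my_saved_model (from_keras_model or from_concrete_functions would be used for the other source types):

    import tensorflow as tf

    # Load the source model; a SavedModel directory is assumed here.
    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")

    # Generate the TensorFlow Lite FlatBuffer.
    tflite_model = converter.convert()

    # Write the .tflite file so the TF Lite interpreter can load it on-device.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)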

TensorFlow offers a great collection of pretrained TFLite models, ranging from image classification to question answering, which can be found here. It’s always a good idea to explore the ready-made models before building your own.

Model optimization

Embedded systems and mobile devices have limited hardware resources like memory, storage, and computation power. TFLite offers a great tool for model optimization to reduce the model size and decrease the inference time by reducing the model complexity and making use of some hardware accelerators.

  • Smaller models mean less storage on users’ devices, a smaller app and download size, and less memory usage.
  • Latency reduction, on the other hand, lowers the device’s power consumption and the processing time per inference.

But everything comes at a price: model optimization can reduce the model accuracy by a small amount. So it’s important to test the model after optimization and make sure the accuracy loss is not significant.
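As a rough sketch of such a check (not part of the article’s own tooling), the snippet below runs a converted float model with the tf.lite.Interpreter and measures accuracy; the model path and the test arrays are placeholders to replace with your own data:

    import numpy as np
    import tensorflow as tf

    # Placeholder test set: replace with your real evaluation data.
    x_test = np.random.rand(10, 224, 224, 3).astype(np.float32)
    y_test = np.random.randint(0, 1000, size=10)

    # Load the converted model and allocate its tensors.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    correct = 0
    for x, y in zip(x_test, y_test):
        # Add the batch dimension expected by the model.
        interpreter.set_tensor(input_index, np.expand_dims(x, 0))
        interpreter.invoke()
        pred = np.argmax(interpreter.get_tensor(output_index))
        correct += int(pred == y)

    print("TFLite accuracy:", correct / len(y_test))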

Optimization Techniques

TFLite supports two optimization techniques: quantization and pruning.

Quantization

TF models use float32 for their parameters by default. Quantization works by reducing the precision of the model parameters from float32 to a lower precision such as float16 or int8.
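For instance, a minimal float16 post-training quantization sketch (assuming TensorFlow 2.x and the same hypothetical SavedModel directory as above) roughly halves the stored weight size:

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Store the weights as float16 instead of float32.
    converter.target_spec.supported_types = [tf.float16]
    tflite_fp16_model = converter.convert()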

Post-training quantization

Post-training quantization quantizes the model parameters after training. TFLite offers several options for post-training quantization.

  • OPTIMIZE_FOR_SIZE: This option quantizes only the weights of the model and uses hybrid operations to carry out the computations. Hybrid operations perform certain operations in lower precision to increase speed and others in floating-point precision to reduce the loss of accuracy. This option is not suitable for all hardware, especially hardware without floating-point support such as Edge TPUs.
  • DEFAULT: This option quantizes the weights and activation outputs to 8-bit signed integers, which makes the model suitable for integer-only hardware accelerators. However, this technique can suffer from a larger accuracy loss, which can be addressed with quantization-aware training. A sketch of both options follows this list.
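In converter code, the two options map roughly to the sketch below (again assuming TensorFlow 2.x; the representative dataset generator and the 224x224x3 input shape are placeholders to adapt to your own model):

    import numpy as np
    import tensorflow as tf

    # Weight-only (hybrid) quantization, matching the OPTIMIZE_FOR_SIZE option.
    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
    converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
    hybrid_model = converter.convert()

    # Full-integer quantization, matching the DEFAULT option. A small
    # representative dataset is needed to calibrate activation ranges.
    def representative_dataset():
        for _ in range(100):
            # Placeholder samples: yield real inputs with your model's shape.
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    int8_model = converter.convert()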

This decision tree can help determine which post-training quantization method is best for your use case:

Post-training quantization decision tree: https://www.tensorflow.org/lite/performance/post_training_quantization

YonoArc TFLite Converter Block

Building a TFLite Converter block is necessary for our series, and it is an important part of our goal pipeline. This block takes a TF model and converts it to a TFLite model with the chosen optimization techniques applied. You can get it from YonoStore for free.

Let’s first explain each property and the expected values.

  • Model Path: If we are converting from a TensorFlow SavedModel or a saved Keras model, here we specify where the saved_model.pb is located.
  • Model Type: Choose the source model type.
  • Optimization: Which post-training optimization technique to apply to the TF Lite model.
  • TFHub Model Path: The block supports converting models from TFHub; here we specify the model URL, for example https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/4. A rough sketch of the equivalent Python follows this list.
  • TFHub Model’s Input Shape: In case of converting from a TFHub model, specify the model’s input shape.
  • TFLITE_BUILTINS / TFLITE_BUILTINS_INT8 / SELECT_TF_OPS: Select which TensorFlow operators to use in TensorFlow Lite; check here for more details.
  • Save To: Where to save the converted TF Lite model, e.g. MyDrive/tf_model.tflite.
  • Convert To Concrete Function?: Save models as graphs with a single signature.
  • Export C Array: Generate a C source file that contains the TensorFlow Lite model as a char array.
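Under the hood, the TFHub path is roughly equivalent to wrapping the hub module in a Keras model and converting it. The sketch below is an assumption about what that looks like (using tensorflow_hub and the MobileNet URL from the example above), not the block’s actual implementation:

    import tensorflow as tf
    import tensorflow_hub as hub

    # Wrap the TFHub module in a Keras model with the user-supplied input shape.
    model = tf.keras.Sequential([
        hub.KerasLayer(
            "https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/4",
            input_shape=(224, 224, 3),
        )
    ])

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # the Optimization property
    tflite_model = converter.convert()

    # Save To: write the FlatBuffer to the chosen path.
    with open("tf_model.tflite", "wb") as f:
        f.write(tflite_model)

    # Export C Array is typically done with a tool such as:
    #   xxd -i tf_model.tflite > tf_model.cc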

Finally, pressing Convert Now! converts the model for you. The block also accepts a triggering message, so it can convert the model automatically whenever a message is received.

This concludes the first part of the Deep Learning for Embedded Linux series. See you in Part 2, where we will cover the TFLite C++ API and how to use the converted model from C++.

About YonoHub:

Yonohub is a web-based cloud system for development, evaluation, integration, and deployment of complex systems, including Artificial Intelligence, Autonomous Driving, and Robotics. Yonohub features a drag-and-drop tool to build complex systems, a marketplace to share and monetize blocks, a builder for custom development environments, and much more. YonoHub can be deployed on-premises and on-cloud.

Get $25 free credits when you sign up now. For researchers and labs, contact us to learn more about Yonohub sponsorship options. Yonohub: A Cloud Collaboration Platform for Autonomous Vehicles, Robotics, and AI Development. www.yonohub.com

If you liked this article, please consider following us on Twitter at @yonohub, emailing us directly, or finding us on LinkedIn. I’d love to hear from you if I can help you or your team with how to use YonoHub.

