What is TensorFlow?
TensorFlow is a platform for building and training neural networks, which can detect and decipher patterns and correlations, analogous to the way humans learn and reason.
TensorFlow’s flexible architecture enables developers to deploy computation to one or more CPUs or GPUs on desktops, servers, or mobile devices with a single API. It was originally developed by researchers and engineers working in the Google Brain Team, within the Machine Intelligence research department, for the purpose of conducting machine learning and deep neural network research.
TensorFlow Lite is the official framework for running TensorFlow model inference on edge devices. It runs on more than 4 billion active devices globally, on various platforms, including Android, iOS, and Linux-based IoT devices, and on bare metal microcontrollers.
What are the advantages of the “Lite” version?
TensorFlow Lite is designed to be lightweight, with a small binary size and fast initialization. It runs on a variety of platforms, including Android and iOS, and to enhance the mobile experience it is optimized for mobile devices with improved load times and support for hardware acceleration.
How does it work?
1- Choose a model: The TensorFlow Lite team provides a set of pre-trained models that solve a variety of machine learning problems. These models have been converted to work with TensorFlow Lite and are ready to use in your applications.
2- Convert the model: TensorFlow Lite is designed to run models efficiently on mobile and other embedded devices with limited memory and compute resources. Part of this efficiency comes from using a special format to store models. TensorFlow models must be converted to this format before TensorFlow Lite can use them.
3- Run inference with the model: Inference is the process of running data through a model to obtain predictions. It requires a model, an interpreter, and input data.
4- Optimize your model: TensorFlow Lite provides tools to optimize the size and performance of your models, often with minimal impact on accuracy. Optimized models may require slightly more complex training, conversion, or integration.
If you want to know more about the capabilities of TensorFlow Lite and start implementing your own models, I recommend visiting this post.
MobileNet models perform image classification — they take images as input and classify the major object in the image into a set of predefined classes. These models are also very efficient in terms of speed and size and hence are ideal for embedded and mobile applications.
About MobileNet efficiency
The MobileNet architecture is based on factoring the traditional convolution into two layers: a “depthwise” convolutional layer followed by a 1x1 “pointwise” convolutional layer. This factorization reduces both the computational cost and the size of the model.
A standard convolutional layer has a computational cost between 8 and 9 times greater than the combined cost of the depthwise and pointwise layers. In addition, two hyperparameters, the width multiplier and the resolution multiplier, allow trading a small amount of accuracy for further reductions in model size and latency.
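The 8–9x figure follows directly from the cost formulas in the MobileNet paper. For a D_F x D_F feature map with M input channels, N output channels, and a D_K x D_K kernel, the multiply-add counts are:

```latex
% Cost of a standard convolution
C_{\text{std}} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F

% Cost of the depthwise + pointwise factorization
C_{\text{sep}} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F

% Their ratio
\frac{C_{\text{sep}}}{C_{\text{std}}} = \frac{1}{N} + \frac{1}{D_K^2}
```

With the 3x3 kernels MobileNet uses (D_K = 3) and a typical number of output channels N, the ratio is close to 1/9, which is where the 8 to 9 times reduction comes from.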
They are trained on the ImageNet dataset, which contains images from 1,000 classes, detailed in this gist.
For more information you can visit this paper.
MobileNet in Flutter for real-time image recognition
In this project I am going to implement the MobileNet model using the tflite library, a Flutter plugin for accessing the TensorFlow Lite API.
In your pubspec.yaml, add:
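The snippet itself is missing from the post; based on the tflite plugin’s README, the dependency entry looks like this (the version number is an assumption — check the package page for the latest release):

```yaml
dependencies:
  tflite: ^1.1.2
```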
Android: In android/app/build.gradle, add the following setting in the android block.
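The original snippet is not shown here; the setting the tflite README asks for stops Gradle from compressing the model file when packaging the APK:

```gradle
android {
    aaptOptions {
        noCompress 'tflite'
        noCompress 'lite'
    }
}
```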
iOS: Solutions to build errors on iOS
- ‘vector’ file not found:
Open ios/Runner.xcworkspace in Xcode, click Runner > Targets > Runner > Build Settings, search for Compile Sources As, and change the value to Objective-C++.
- ‘tensorflow/lite/kernels/register.h’ file not found:
The plugin assumes the TensorFlow header files are located under tensorflow/lite/kernels. However, for early versions of TensorFlow the header path is tensorflow/contrib/lite/kernels. The plugin uses the CONTRIB_PATH define to toggle between the two; uncomment //#define CONTRIB_PATH in the plugin source here to use the older path.
We will import the model into our application by downloading the files labels.txt and mobilenet_v1_1.0_224_quant.tflite from here, then creating an assets folder and placing the label and model files in it.
Once Step 1 is completed, in pubspec.yaml add:
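The snippet is missing from the post; matching the file names downloaded in Step 1, the assets section would look like:

```yaml
flutter:
  assets:
    - assets/labels.txt
    - assets/mobilenet_v1_1.0_224_quant.tflite
```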
Then we will run flutter pub get in the terminal.
Make a simple app that uses the camera. You can use the Flutter documentation as a guide, at the following link.
We create a service for TensorFlow that will be used to manage the model.
When the application is initialized we will load the model so that it can start predicting, with the following code:
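The code block itself is missing from the post; a minimal sketch of the service using the tflite plugin’s loadModel call (the class name is an assumption, the asset paths match Step 1) would be:

```dart
import 'package:tflite/tflite.dart';

class TensorFlowService {
  // Load the model and its labels from the assets folder.
  // Call this once when the application is initialized.
  Future<void> loadModel() async {
    await Tflite.loadModel(
      model: 'assets/mobilenet_v1_1.0_224_quant.tflite',
      labels: 'assets/labels.txt',
    );
  }

  // Release native resources when the model is no longer needed.
  Future<void> close() async {
    await Tflite.close();
  }
}
```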
We use the camera controller to capture each frame, then run each frame through the model to generate a recognition and voilà! The model gives us the result of the recognition for each frame.
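A sketch of that loop, assuming the tflite plugin’s runModelOnFrame API and the camera plugin’s image stream (the isDetecting flag and numResults value are illustrative choices, not from the original post):

```dart
controller.startImageStream((CameraImage img) async {
  if (isDetecting) return; // skip frames while a recognition is in flight
  isDetecting = true;

  // Feed the raw camera planes straight into the model.
  final recognitions = await Tflite.runModelOnFrame(
    bytesList: img.planes.map((plane) => plane.bytes).toList(),
    imageHeight: img.height,
    imageWidth: img.width,
    numResults: 2,
  );

  // ...update the UI with the recognitions here...

  isDetecting = false;
});
```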
1- The result is a list ordered by confidence with the following structure:
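The structure itself is missing from the post; per the tflite plugin’s README, each entry is a map with the class index, label, and confidence (the values below are illustrative):

```dart
[
  {
    "index": 0,
    "label": "person",
    "confidence": 0.629,
  }
]
```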
2- When implementing it, I decided to leave a 1-second delay between recognitions to improve performance.
Then UI improvements can be made to display the data in a nicer-looking way 😄.
You can visit this github repository to see the full implementation.
MobileNet SSD: Also known as the “Single Shot MultiBox Detector”, this model detects multiple objects in the same image, assigning a confidence to each one.
Tiny YOLOv2: This model takes a different approach: it applies a single neural network to the full image. The network divides the image into regions and predicts bounding boxes and probabilities for each region; these bounding boxes are weighted by the predicted probabilities.
The Tiny-YOLO architecture is approximately 442% faster than its larger counterparts, achieving upwards of 244 FPS on a single GPU.
The small model size (<50MB) and fast inference speed make the Tiny-YOLO object detector naturally suited for embedded computer vision / deep learning devices.
Pix2Pix: These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations.
Deeplab: DeepLab is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat and so on) to every pixel in the input image.