Part II: TensorFlow model training of a LEGO bricks image classifier using MobileNetV2

In the previous post, we got the dataset ready to be used by a deep learning algorithm to create an image classifier. Now we need to choose the framework and the deep learning method to use.

For the framework, we need it to be capable of running on devices with limited resources and to have GPU support, because we will run the inference model on a Jetson Nano.

As for the method, since this is an image application we will use a Convolutional Neural Network (CNN); in this case, we also need a CNN that can give good accuracy without demanding hardware.

TensorFlow

As the project is to be deployed on a Jetson Nano, we had to consider some aspects of the framework to be used.

TensorFlow is an end-to-end open-source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.

TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google’s Machine Intelligence Research organization to conduct machine learning and deep neural networks research. The system is general enough to be applicable in a wide variety of other domains, as well.

TensorFlow provides stable Python and C++ APIs, as well as non-guaranteed backward-compatible API for other languages.

TensorFlow also has GPU support, a very important feature since the Jetson Nano has a GPU and we want to use it to get better performance from the deep learning application.

GPU operation is enabled through the NVIDIA drivers and the CUDA Toolkit, and in recent versions we can also use TensorRT, which is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications can perform up to 40x faster than CPU-only platforms during inference.

MobileNetV2

From a deep learning inference benchmark done for the Jetson Nano, we found that the best performance is given by MobileNetV2.

Performance of various deep learning inference networks with Jetson Nano and TensorRT, using FP16 precision and batch size 1

MobileNetV2 is a Convolutional Neural Network (CNN) developed by Google with the intention of creating a high-performance, high-accuracy neural network for vision applications on devices with limited resources, such as mobile phones.

MobileNetV2 is an improvement over MobileNetV1 and, like its predecessor, uses depthwise separable convolutions. V2 also introduces two new features to the architecture: linear bottlenecks between the layers and shortcut connections between the bottlenecks. The basic structure is shown below.

MobileNetV2 basic structure

Depthwise convolution is a variation of the normal convolution. A regular 2D convolution is performed over multiple input channels: the filter is as deep as the input, which lets us freely mix channels to generate each element of the output. In a depthwise convolution, on the other hand, each channel is kept separate.

In a depthwise separable convolution, an additional step is performed after the depthwise convolution: a 1x1 convolution across channels. This step is repeated for each output channel; every output channel takes the output of the depthwise step and mixes it up with a different 1x1 convolution.

Depthwise separable convolution
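
As a concrete illustration, here is a minimal sketch of the two steps using tf.keras layers (the input shape and the 32 output channels are arbitrary values chosen only for the example):

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 3))

# Depthwise step: one 3x3 filter per input channel, channels kept separate
x = layers.DepthwiseConv2D(kernel_size=3, padding="same")(inputs)

# Pointwise step: a 1x1 convolution that mixes the channels across all of them
x = layers.Conv2D(filters=32, kernel_size=1, padding="same")(x)

model = tf.keras.Model(inputs, x)
model.summary()  # far fewer parameters than a plain Conv2D(32, 3)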

Residual blocks connect the beginning and end of a convolutional block with a skip connection. By adding these two states, the network has the opportunity to access earlier activations that weren't modified in the convolutional block. This approach turned out to be essential for building networks of great depth.

The following figure shows the inverted residual block. The diagonally hatched texture indicates layers that do not contain non-linearities. The design provides a natural separation between the input/output domains of the building blocks (the bottleneck layers) and the layer transformation, which is a non-linear function that converts the input to the output. The former can be seen as the capacity of the network at each layer, and the latter as its expressiveness.

The reason we use non-linear activation functions in neural networks is that multiple matrix multiplications cannot then be reduced to a single linear operation; this is what makes building networks with multiple layers worthwhile. At the same time, the ReLU activation commonly used in neural networks discards values smaller than 0 (for example, ReLU maps [-2, 0.5, 3] to [0, 0.5, 3]). This loss of information can be tackled by increasing the number of channels in order to increase the capacity of the network.

With inverted residual blocks, we do the opposite and squeeze the layers that the skip connections link together. Applying a non-linearity in these narrow layers hurts the performance of the network, so the authors introduced the idea of a linear bottleneck, where the last convolution of a residual block has a linear output before it is added to the initial activations. Putting this into code is very simple: we just discard the last activation function of the convolutional block.
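
As a sketch of that idea, an inverted residual block could be written in tf.keras roughly as follows (this helper is illustrative, not the actual library implementation; the expansion factor of 6 follows the paper's default):

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, filters, stride=1, expansion=6):
    in_channels = x.shape[-1]
    # 1x1 expansion: widen the representation, with a ReLU6 non-linearity
    y = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 3x3 depthwise convolution on the expanded representation
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 1x1 projection back down: no activation here, this is the linear bottleneck
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Shortcut connection between bottlenecks, only when shapes match
    if stride == 1 and in_channels == filters:
        y = layers.Add()([x, y])
    return y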

The structure of MobileNetV2 is shown in the next image, where each line describes a sequence of 1 or more identical (modulo stride) layers, repeated n times. All layers in the same sequence have the same number c of output channels. The first layer of each sequence has stride s and all the others use stride 1. All spatial convolutions use 3 × 3 kernels.

MobileNetV2 structure
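
For a quick look at the full architecture, the pretrained MobileNetV2 shipped with tf.keras.applications can be instantiated and summarized (note this is the Keras implementation, not the TF Hub module we use for retraining below):

import tensorflow as tf

# Load MobileNetV2 with ImageNet weights and print its layer structure
model = tf.keras.applications.MobileNetV2(weights="imagenet")
model.summary()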

Retraining MobileNetV2 for a LEGO bricks image classifier

Two of the constraints we had for implementing the model were time and hardware resources. Since we tried to implement the project in a short time (~2 weeks) and did not have a PC with hardware for high-performance training, we pursued an implementation based on transfer learning with MobileNetV2.

Modern image recognition models have millions of parameters. Training them from scratch requires a lot of labeled training data and a lot of computing power (hundreds of GPU-hours or more). Transfer learning is a technique that shortcuts much of this by taking a piece of a model that has already been trained on a related task and reusing it in a new model. In our case, we will reuse the feature extraction capabilities of a powerful image classifier trained on ImageNet and simply train a new classification layer on top.

Though it’s not as good as training the full model, this is surprisingly effective for many applications, works with moderate amounts of training data (thousands, not millions of labeled images), and can be run in as little as thirty minutes on a laptop without a GPU.

For the implementation of the retrain on MobilenetV2 we used TensorFlow Hub. TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a TensorFlow graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning can:

  • Train a model with a smaller dataset,
  • Improve generalization, and
  • Speed up training.

To install TensorFlow Hub, we just need to run the command below.

$ pip install tensorflow-hub
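
The retrain.py script used below drives the whole process for us, but as a rough sketch of what a TF Hub feature extractor looks like in code, a module can be loaded as a frozen layer and topped with a new classification head (this sketch assumes TensorFlow 2, the TF2-compatible /4 version of the module, and a placeholder class count):

import tensorflow as tf
import tensorflow_hub as hub

num_classes = 10  # placeholder: number of LEGO brick classes

# Reuse the MobileNetV2 feature extractor with its pretrained weights frozen
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    input_shape=(224, 224, 3),
    trainable=False)

# Only the new classification layer on top gets trained
model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])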

Retraining MobileNetV2 for LEGO bricks

To retrain the network we used this script on GitHub, which takes the images and runs the whole process to create the frozen graph and label file for the inference model.

Using this script, we run the command below.

python retrain.py \
--image_dir ~/lego_bricks \
--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/3 \
--random_crop=5 \
--random_brightness=5 \
--random_scale=5 \
--flip_left_right \
--summaries_dir=mobilenetv2 \
--learning_rate=0.02 \
--how_many_training_steps=4000

Here we choose the type of neural network with the tfhub_module switch. The other switches are used to add "noise" to the images through random cropping, brightness changes, random scaling and flipping. We also adjusted the learning rate to 0.02, increasing it from the default value of 0.01, and ran the training for 4000 steps.
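
The retrained graph can then be sanity-checked on a single image with TensorFlow's label_image.py example, along the lines of the command below (this assumes retrain.py's default output paths under /tmp and the Placeholder/final_result layer names from the TF Hub retraining tutorial; the image path is a placeholder):

python label_image.py \
--graph=/tmp/output_graph.pb \
--labels=/tmp/output_labels.txt \
--input_layer=Placeholder \
--output_layer=final_result \
--input_height=224 \
--input_width=224 \
--image=/path/to/a_lego_brick.jpg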

After the process we got the results below for the trained model.

Accuracy for the trained model
Loss variation for the trained model

Once we got this model, we proceeded with the implementation on the Jetson Nano. This will be covered in Part III.

References

Geitgey, A. (2017). Simple Door Camera with Face Recognition in Python 3.6 · GitHub. Retrieved August 23, 2019, from https://gist.github.com/ageitgey/84943a12dd0d9f54e90f824b94e4c2a9

Geitgey, A. (2019). Build a Hardware-based Face Recognition System for $150 with the Nvidia Jetson Nano and Python. Retrieved August 25, 2019, from https://medium.com/@ageitgey/build-a-hardware-based-face-recognition-system-for-150-with-the-nvidia-jetson-nano-and-python-a25cb8c891fd

Nvidia Accelerated Computing. (2019). Deep Learning Frameworks Documentation. Retrieved August 25, 2019, from https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#using-frozengraph

He, Q. (2018). Optimize frozen tensorflow graph using TensorRT · GitHub. Retrieved August 25, 2019, from https://gist.github.com/qinyao-he/28ddedb7f561bb3cb4ba880833f14a89

Punnen, A. (2019). Optimizing any TensorFlow model using TensorFlow Transform Tools and using TensorRT. Retrieved August 25, 2019, from https://towardsdatascience.com/optimizing-any-tensorflow-model-using-tensorflow-transform-tools-and-using-tensorrt-1cc190cafe1f

Mc.ai. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Retrieved August 25, 2019, from https://mc.ai/mobilenetv2-inverted-residuals-and-linear-bottlenecks/

NVIDIA Developer. (n.d.). Jetson Nano: Deep Learning Inference Benchmarks. Retrieved August 25, 2019, from https://developer.nvidia.com/embedded/jetson-nano-dl-inference-benchmarks

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Retrieved from http://arxiv.org/abs/1801.04381

Bailey, P. (2019). Feature: TF Hub compatibility with TF 2.0. · Issue #25362 · tensorflow/tensorflow · GitHub. Retrieved August 25, 2019, from https://github.com/tensorflow/tensorflow/issues/25362

TensorFlow. (2019). How to Retrain an Image Classifier for New Categories. TensorFlow Hub. Retrieved August 25, 2019, from https://www.tensorflow.org/hub/tutorials/image_retraining
