Image Classification Models on Arduino Nano 33 BLE Sense
--
In this blog post, you will learn the basics of deep learning for image classification. Afterwards, you will learn how to deploy your custom models on an embedded device. This article uses an Arduino Nano 33 BLE Sense, but you can follow along with any other supported device.
In my first blog post, I covered the basics of 2D convolution, which is quite useful for extracting features from images. Let’s quickly revisit that concept.
We can apply padding to extend the borders of the input, so the kernel can also cover the edge pixels.
We need to flip the kernel; otherwise, the operation would be correlation. This comment on ResearchGate explains the difference between the two:
The basic difference between Correlation and convolution is:
Correlation is a measurement of the similarity between two signals/sequences.
Convolution is a measurement of the effect of one signal on the other signal.
The mathematical calculation of correlation is the same as convolution in the time domain, except that the signal is not reversed before the multiplication process. If the filter is symmetric, then the output of both expressions would be the same.
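To make the difference concrete, here is a small NumPy/SciPy sketch. It compares the two operations with a deliberately asymmetric kernel, where the results differ:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# An asymmetric kernel, so convolution and correlation give different results.
kernel = np.array([[1, 0],
                   [0, -1]])

corr = correlate2d(image, kernel, mode="valid")  # kernel slides as-is
conv = convolve2d(image, kernel, mode="valid")   # kernel is flipped first

# Convolution equals correlation with a kernel flipped along both axes.
flipped = kernel[::-1, ::-1]
assert np.array_equal(conv, correlate2d(image, flipped, mode="valid"))
print(corr)
print(conv)
```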
Convolutional layers are quite powerful at learning local patterns.
A Quick Introduction to Convolutional Neural Networks
We use the Keras API to build and train our models. Let's take a look at a Keras Sequential model.
Sequential models act like stacks: each layer feeds only into the next one, so we can't, for example, connect a layer to another one that sits two layers below it. This structure can also create bottlenecks and vanishing-gradient problems, and real-world models mostly have graph-like structures, which is why the functional API is widely used to build models. Since we must use small-scale models for TinyML projects, we don't need complicated structures like ResNet, so we can use sequential models for simplicity.
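Here is a minimal sketch of such a Sequential model. The input size, layer counts, and filter numbers are illustrative, not the exact architecture used in this project:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(96, 96, 3)),            # RGB input (illustrative size)
    layers.Conv2D(8, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),          # downsample the feature maps
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                          # flatten for the Dense layer
    layers.Dense(4, activation="softmax"),     # one unit per class
])
model.summary()
```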
As you can see, the 2D convolution layers are defined with the Conv2D class. For more information, you can take a look at the official Keras documentation.
The first parameter, filters, determines the number of kernels that will convolve with the input volume. Each of these operations produces a 2D activation map.
The second parameter is kernel_size, which defines the size of the kernel matrix. A smaller kernel can “learn” smaller objects or patterns in the image. 3x3 and 5x5 kernels are widely used.
The MaxPooling2D layer downsamples feature maps. If you don't use it, the final feature map will be too large, and the model will struggle to learn a spatial hierarchy of features.
Dense layers are also called fully connected layers. Dense layers work with flattened inputs, and they are located at the end of CNN architectures to produce the final outputs, such as class scores. They don't preserve the locations of objects in the image, so Dense layers are not very useful for tasks that require object locations.
Building and Training Image Classification Models with TensorFlow
Now that you know some of the fundamentals of CNNs, I would like to explain my first GSoC project: an image classification project for the Dermnet dataset. The original dataset consists of 23 classes. Since we have limited SRAM (256 KB) and eFlash (1 MB), 23 units for the last Dense layer could increase the size of the model. Thus, we reduced the number of classes to 4. Those are:
1- Acne and Rosacea
2- Eczema
3- Nail Fungus and other Nail Disease
4- Tinea Ringworm Candidiasis and other Fungal Infections
Also, the original dataset is not balanced. The number of samples (images) in each class (folder) must be the same. To ensure that, we use data augmentation techniques. We could use a Keras layer to apply data augmentation; however, since it's not a layer supported by TensorFlow Lite Micro, we prefer not to define and use one. Here is an example script for manual image augmentation.
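A sketch of such a script might look like the following. The augment_class_folder helper, the folder names, and the chosen transformations are illustrative assumptions, not the exact script:

```python
import os
import tensorflow as tf

def augment_class_folder(folder, target_count):
    """Write augmented copies of images until the folder holds target_count images."""
    paths = [os.path.join(folder, name) for name in os.listdir(folder)]
    i = 0
    while len(paths) + i < target_count:
        source = paths[i % len(paths)]
        image = tf.io.decode_image(tf.io.read_file(source), channels=3)
        # Random transformations keep the class label valid while adding variety.
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
        output = os.path.join(folder, f"aug_{i}.jpg")
        tf.io.write_file(output, tf.io.encode_jpeg(tf.cast(image, tf.uint8)))
        i += 1

# Hypothetical usage: top up every class folder to 1,000 images.
for class_folder in ["Acne", "Eczema", "Nail_Fungus", "Fungal_Infections"]:
    augment_class_folder(os.path.join("dermnet", class_folder), target_count=1000)
```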
Once you have a balanced dataset, you are ready to train a classifier. One thing to note here is that our dataset isn't very large, which may cause overfitting. You can address this by using a K-fold cross-validation approach.
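For illustration, a K-fold loop could look like the sketch below; build_model is a hypothetical helper that returns a freshly compiled model, and images and labels are assumed to be NumPy arrays holding the whole dataset:

```python
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kf.split(images):
    model = build_model()  # hypothetical helper returning a fresh, compiled model
    model.fit(images[train_idx], labels[train_idx], epochs=50, verbose=0)
    _, accuracy = model.evaluate(images[val_idx], labels[val_idx], verbose=0)
    scores.append(accuracy)
print("Mean validation accuracy:", np.mean(scores))
```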
We can use the image_dataset_from_directory() function from Keras to create a dataset from the image folders, with specific settings such as batch size and image size.
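For example, the call might look like this (the directory name, image size, and batch size are illustrative):

```python
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "dermnet",                 # hypothetical path to the balanced dataset
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(96, 96),       # every image is resized on load
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dermnet",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(96, 96),
    batch_size=32,
)
```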
Next, it's time to build the model. Since the Arduino Nano doesn't include much memory and could potentially run off a battery, you will need to reduce the memory footprint and power consumption. That limits the capabilities of the model, since you can't use many layers, filters, and units, leading to the possibility of underfitting.
In my previous blog post, you can find basic information about depthwise convolutions. A depthwise-separable convolution is basically a drop-in replacement for a 2D convolutional layer, and it helps to reduce the model size and the amount of computation. However, “Separable convolution assumes that feature channels are largely independent. This assumption does not work for RGB images! Red, green, and blue color channels are actually highly correlated in natural images.” [1] So, we use a regular Conv2D layer as the first layer of the model.
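A sketch of such a model might look like this, with a regular convolution first and separable convolutions afterwards (layer sizes are illustrative, not the exact architecture):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(96, 96, 3)),
    # Regular convolution first: the RGB channels are highly correlated.
    layers.Conv2D(8, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    # Depthwise-separable convolutions keep the rest of the model small.
    layers.SeparableConv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.SeparableConv2D(32, kernel_size=3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),
])
```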
The Rescaling layer is another layer that isn't supported by TensorFlow Lite Micro. That said, rescaling is necessary for model training, as it maps all pixel values into the [0, 1] range.
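One possible workaround, sketched below, is to rescale in the input pipeline instead of inside the model, so the deployed graph contains no Rescaling layer (train_ds and val_ds are assumed to come from image_dataset_from_directory above):

```python
import tensorflow as tf

def rescale(images, labels):
    # Map pixel values from [0, 255] to [0, 1] outside of the model graph.
    return tf.cast(images, tf.float32) / 255.0, labels

train_ds = train_ds.map(rescale)
val_ds = val_ds.map(rescale)
```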
Here are the loss and accuracy graphs after 50 epochs.
Figure 4 shows that our model overfits by the 49th epoch.
Saving the Trained Model for TensorFlow Lite Micro
There are a few more steps necessary for model optimization after training the model. Before applying these techniques, let's reduce the model size by half.
1- Weight Quantization
The weights that are “learned” by the model are stored as float32 values; however, you can store these values as int8 instead. This is accomplished through quantization, which helps us to reduce the model size. 8-bit quantization approximates floating-point values using the following formula:

real_value = (int8_value - zero_point) * scale
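Here is a sketch of how post-training int8 quantization can be applied with the TFLiteConverter; model and train_ds are assumed to be the trained Keras model and dataset from earlier, and 100 calibration batches is an illustrative choice:

```python
import tensorflow as tf

def representative_dataset():
    # Yield calibration samples so the converter can estimate the
    # scale and zero-point for every tensor in the graph.
    for images, _ in train_ds.take(100):
        yield [tf.cast(images[:1], tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```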
2- Pruning
Weight pruning involves removing some of the connections between layers, reducing both the memory footprint and the number of computations.
You can find more about weight pruning here.
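As a sketch, magnitude-based pruning with the TensorFlow Model Optimization Toolkit might look like this; the sparsity schedule and epoch counts are illustrative, and model and train_ds are assumed from earlier:

```python
import tensorflow_model_optimization as tfmot

# Gradually increase sparsity from 0% to 50% over the fine-tuning steps.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,
    begin_step=0,
    end_step=1000,
)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
# The UpdatePruningStep callback is required while fine-tuning a pruned model.
pruned_model.fit(train_ds, epochs=2,
                 callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before converting to TensorFlow Lite.
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
```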
Saving TensorFlow Lite Models as C Headers
Since we use C++ with Arduino, we need to convert our models to C arrays. This step might still seem difficult for someone with little to no experience.
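On Linux or macOS, running xxd -i model.tflite > model.h does this in one line. A small Python equivalent might look like the following sketch (the tflite_to_c_array helper and variable names are illustrative):

```python
def tflite_to_c_array(tflite_path, header_path, var_name="g_model"):
    """Write a .tflite flatbuffer as a C byte array, similar to `xxd -i`."""
    with open(tflite_path, "rb") as f:
        data = f.read()
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{byte:02x}" for byte in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(data)};")
    with open(header_path, "w") as f:
        f.write("\n".join(lines) + "\n")

tflite_to_c_array("model.tflite", "model.h")
```

Note that TensorFlow Lite Micro examples usually declare the model array with an alignment attribute (for example, alignas(8)), so it's worth matching whatever the sketch you're modifying already uses.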
Deploying Our Model to Arduino
Instead of starting a new project from scratch, I'm going to modify the Alarm project that comes with the Harvard_TinyMLx examples.
First, I'm going to replace the model array file with the one I trained and converted. You can find the .cpp file here.
Since we have 4 classes, we need to modify model_settings.cpp. If we take a look at the model_settings.h file, we can see that kNumChannels equals 1. Since we have 3 channels (red, green, and blue), we must change it to 3, and kCategoryCount must be 4.
After modifying these files, we are ready to deploy the project to the Arduino board. You can find the whole project and the Arduino files here.
References
[1] François Chollet, Deep Learning with Python, Second Edition, p. 260.