Machine Learning vs. Deep Learning Insect Classifiers

Witenberg S R Souza
deeplearningbrasilia
8 min read · Nov 19, 2018
Where is my cats and dogs folder?

Classical machine learning and deep learning both have fantastic applications. One of them is multiclass classification, where the last layer may have more than one predictor node (or neuron). For instance, the most popular multiclass classification task in machine learning is the MNIST digit classifier, while for deep learning there is the must-try counterpart of MNIST: the dogs vs. cats classifier.

Now, which one is better at predicting other types of objects, insects for example? To find out, I took seven classes of insects (actually six, because spiders are not insects but arachnids. Sorry, my biologist friends). The seven chosen classes are: beetles, cockroaches, dragonflies, flies, spiders, termites, and thrips.

Seven lovely animals, aren’t they?

The dataset has a total of 1,445 images for the training set with an unbalanced distribution, which means that some classes have more images than others. The validation set has 600 images, which gives roughly a 70 to 30 percent split between training set and validation set. All the images were obtained from the Insect Images website, so feel free to check more classes there. In the past, I had tried a multiresolution feature extraction preprocessing step with a support vector machine (SVM) classifier, using the one-vs-all scheme. However, the SVM with the multiresolution approach achieved a poor result of only 24% accuracy in the test phase. The thrips were not included in that process, so only six classes were trained.

The release of new libraries for AI practitioners, such as Fastai, opened new possibilities for training models on unstructured data like the one presented here. Fastai has both machine learning and deep learning libraries built on top of PyTorch with CUDA support. So, with all that capability, I tried to classify the dataset again, comparing a classical neural network with no more than six layers against a transfer learning approach based on ResNet-34.

Preprocessing

Images can be in either grayscale or in some color scheme, such as RGB. When an image is in grayscale, it has only one channel, that is, it consists of a single N×M matrix. An image with a color scheme, on the other hand, has at least three channels (red, green, and blue in the RGB case), which can be combined into a single three-dimensional matrix, also called a tensor, holding width, height, and color channel information. For clarification: a one-dimensional array is a rank-1 tensor, a 2-D array or matrix is a rank-2 tensor (our grayscale images, for example), and a 3-D array is a rank-3 tensor.

In the machine learning phase, all the images, which are rank-3 tensors, were serialized: each one was stretched into a single row, a rank-1 tensor. Thus, each pixel became a feature feeding the neural net, and the folder structure provided the labels of the classifier, with class names encoded as integers from 0 to 6, one for each class folder. The dataset was also normalized with its mean and standard deviation.

Preprocessing in machine learning. Serialization of the data.
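For illustration, here is a minimal sketch of what this serialization and normalization could look like with NumPy and PIL; the folder layout and the 64×64 image size are assumptions, not the exact code used in this work:

```python
import numpy as np
from pathlib import Path
from PIL import Image

PATH = Path('data/insects/train')  # assumed layout: one subfolder per class
classes = sorted(p.name for p in PATH.iterdir() if p.is_dir())

x, y = [], []
for label, cls in enumerate(classes):          # class names become integers 0..6
    for f in sorted((PATH / cls).glob('*.jpg')):
        img = np.asarray(Image.open(f).convert('RGB').resize((64, 64)))
        x.append(img.ravel())                  # rank-3 tensor stretched into a rank-1 row
        y.append(label)

x = np.stack(x).astype(np.float32)
y = np.array(y)
x = (x - x.mean()) / x.std()                   # normalize with mean and standard deviation
```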

The normalization causes a visual distortion to the dataset, but it makes the weight calculation easier, as it removes outliers and fits the values into a controllable range.

Normalized images

The neural net implementation itself does not take more code than the preprocessing step. We set the architecture, define the model data and metrics, and, with a single line, start the training phase.

Neural network implementation.
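For reference, a minimal sketch of such a net in the Fastai 0.7 ML style, following the MNIST lesson; the hidden layer size and learning rate are assumptions, and x, y, x_val, y_val are the serialized arrays from the preprocessing step:

```python
from fastai.dataset import ImageClassifierData
from fastai.model import fit
from fastai.metrics import accuracy
import torch.nn as nn
import torch.optim as optim

# a simple fully connected net: serialized pixels in, 7 class scores out
net = nn.Sequential(
    nn.Linear(64 * 64 * 3, 100),  # input size matches the flattened images
    nn.ReLU(),
    nn.Linear(100, 7),
    nn.LogSoftmax(dim=1)
).cuda()

md = ImageClassifierData.from_arrays('data/insects/', (x, y), (x_val, y_val))
opt = optim.SGD(net.parameters(), lr=1e-2)     # the optimizer used here was SGD
fit(net, md, n_epochs=5, crit=nn.NLLLoss(), opt=opt, metrics=[accuracy])
```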

However, the deep learning (convolutional) neural net uses a built-in Fastai preprocessing step called transforms. So, for the deep learning approach, the images are fed directly as they are, and some augmentation may be included, such as shearing, zooming, and cropping, simply by passing optional parameters to the model data through tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1).

Deep learning: only three lines make up the whole training process.
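Those lines look roughly as they do in the Fastai 0.7 course notebooks; PATH and sz here are assumed placeholders for the data folder and input image size:

```python
from fastai.conv_learner import *   # fastai 0.7: brings in resnet34, tfms, ConvLearner

PATH, sz = 'data/insects/', 224     # assumed data folder and input image size

tfms = tfms_from_model(resnet34, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(resnet34, data, precompute=True)
learn.fit(0.01, 3)                  # first pass: only the new last layer is trained
```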

Results

The machine learning model, with an input layer and a linear layer followed by a LogSoftmax function, was able to reach 45% accuracy in the validation phase. Although it seems a poor result, it is close to double the accuracy of my first attempt with the SVM, which yielded only 24%. It is important to recall that this is a very simple neural net classifying unstructured, complex data. Moreover, the Fastai library has many more optimization options to fit the model to the desired target.

ML neural net results: a maximum of 5 epochs and 47% accuracy.
ML predictions after 64 iterations. The optimizer was SGD.

To check the results, we can compare them with the actual classes:

Actual classes which should be predicted.

Also, we can check the confusion matrix to see how the neural net performed overall on the 600 validation images (a sketch of how to plot it follows the list below). Most of its correct predictions were beetles (122/138: 89%), while for the other classes the neural network performed as follows:

Cockroaches — 23/72: 32%

Dragonflies — 41/96: 42%

Flies — 16/55: 29%

Spiders — 22/87: 25%

Termites — 61/121: 50%

Thrips — 1/32: 3%
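The per-class numbers above come from the confusion matrix. A minimal sketch of how to compute and plot it with scikit-learn and Fastai 0.7's plotting helper, assuming preds and y hold the predicted and actual class indices for the validation set:

```python
from sklearn.metrics import confusion_matrix
from fastai.plots import plot_confusion_matrix

cm = confusion_matrix(y, preds)           # rows: actual classes, columns: predictions
plot_confusion_matrix(cm, data.classes)   # class names label both axes
```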

The outcomes showed a typical overfitting pattern during the training session, as the loss approached values of 8e-9 (a local minimum) and then gradually increased up to 6e-5. Still, the result remains very satisfactory given the few minutes the attempt took. Therefore, more fine-tuning can be applied to make the model more effective, such as an embedding matrix to expand hidden features we have not explored in our data, but that is an idea for a further experiment.

Overall results for the 600 validation images for the machine learning model.

Not surprisingly, the deep learning model achieved a much better result, even with a dataset as small as this one. After tuning the weights for parts of the ResNet, it reached 77.7% accuracy. The base model is the pretrained ResNet-34 architecture, with 34 layers. In the first pass, all activations (weights) of the network are kept as originally trained and only the last layer is modified for our purpose; this is done by adding the parameter precompute=True to the learn object (the neural network in Fastai). After the first pass, which has 3 epochs, the tuning phase starts by changing the learning rate from 0.1 to 0.01, so the second pass can be done with 3 more epochs.

The criterion for that came from previous experience: on this type of dataset, the learning rate should decay at least tenfold. However, thanks to Fastai tools, one does not need to rely on gut feeling. The method learn.lr_find() and its plot with learn.sched.plot() help to define a much more appropriate learning rate for a target dataset. The new rate can be tested over a few epochs, from 1 to 5, to verify whether there is convergence through the decay of the training and validation losses. The learning rate is chosen to be about one order of magnitude lower than the one that yields the minimum training loss.

If the described strategy works, we can move one step further and use the method learn.unfreeze() to modify all activations (weights) by training different parts of the neural net with different learning rates. Thus, instead of passing a single value, we can set an array of learning rates to train the model (lr=np.array([5e-4,1e-3,1e-2])). Finally, the test time augmentation technique learn.TTA(), combined with stochastic gradient descent with restarts (cycle_len and cycle_mult), helps to improve the model's performance by adding various types of transformation to the data, improving the generalization of the model. Although it may sound very technical, each of these steps is generally written in a single line of code, which makes it easy to follow the tuning process step by step.
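Put together, the tuning recipe looks roughly like this in Fastai 0.7, continuing from the learn object above; the exact epoch counts are illustrative:

```python
learn.lr_find()                    # sweep learning rates while recording the loss
learn.sched.plot()                 # pick a rate ~10x below the loss minimum

learn.precompute = False           # train on augmented images, not cached activations
learn.fit(1e-2, 3, cycle_len=1)    # SGD with restarts: one cycle per epoch

learn.unfreeze()                   # make every layer trainable
lr = np.array([5e-4, 1e-3, 1e-2])  # lower rates for the earlier layers of the ResNet
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

log_preds, y = learn.TTA()                 # test time augmentation on validation data
probs = np.mean(np.exp(log_preds), 0)      # average the augmented predictions
accuracy_np(probs, y)
```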

Similarly to the machine learning net, we can check the confusion matrix for our deep learning model on the 600 validation images. The results below are much more evenly distributed and accurate across all tested classes compared to the previous attempt:

Beetles — 130/138: 94%

Cockroaches — 52/72: 72%

Dragonflies — 88/96: 91%

Flies — 22/55: 40%

Spiders — 22/87: 81%

Termites — 86/121: 71%

Thrips — 17/32: 53%
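Per-class accuracies like the ones listed above are simply the diagonal of the confusion matrix divided by the row totals; for instance, with the cm computed earlier:

```python
per_class = cm.diagonal() / cm.sum(axis=1)   # correct predictions / actual class counts
for name, acc in zip(data.classes, per_class):
    print(f'{name}: {acc:.0%}')
```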

Final step of the deep learning network. The ResNet-34 architecture was used with a transfer learning approach.
Overall results for the 600 validation images for the deep learning model.
Predictions for the deep learning model.

These models therefore presented great results with a very short amount of training and little tuning, so both of them can still be improved. Also, all the training, validation, and testing happened on raw data: although preprocessing was applied, the images were fed as they were acquired, with no background or foreground segmentation. Finally, the Fastai library is on its second release; in this work, the pre-alpha version (0.7.x) was applied. The second version of Fastai (1.0.x) has proven to be more effective.

This experiment was done on top of two notebooks from the Fastai courses: the dogs vs. cats Jupyter notebook (DL1 lesson 1) and MNIST (ML1 lesson 4). If you would like to test the original files, please try the Fastai courses mentioned above.

Please refer to the Python code used in this work below:

GitHub

You can try the Fastai library locally or online:

Fastai Installation Threads

Explore other insect classes at Insect Images:

Insect Images

Witenberg S R Souza,

MS in Mechatronic Systems at University of Brasilia

https://www.linkedin.com/in/witenberg/
