DeepSnakes — Part II

Hermes Ribeiro Sant' Anna
The Artificial Neuron
6 min read · Jun 17, 2018

· Can an AI distinguish venomous from non-venomous snakes?

· We improve upon logistic regression by using shallow neural networks

· After hyperparameter tuning, we increase the dev set accuracy to about 71%

How do we build deep learning architectures to tackle computer vision problems? The DeepSnakes series will cover the pipeline of using deep learning techniques to solve this one problem:

Can an artificial intelligence tell a venomous snake from a non-venomous snake by only seeing pictures of serpents?

This is a report on the experiences and experiments of using increasingly complex neural network models, starting from the simplest logistic regression and working up to the latest deep learning architectures.

In the previous post DeepSnakes — Part I, we described the construction and training of a logistic regression (LR) model to distinguish between pythons and rattlesnakes. In order to increase the model performance and achieve better prediction accuracy, we increase the model complexity by adding a hidden layer between the input and output layers of LR.

You can find the code behind this task on GitHub.

Architecture

The single hidden layer neural network model was very popular in the 1990s and brought about considerable technological and industrial progress. Its overall architecture contains an input layer, a hidden layer and an output layer. As the picture below demonstrates, the information flows through the network in a feed-forward path. Moreover, each neuron in a given layer is connected to all neurons in the previous layer. Therefore, the long name for this type of architecture is feed-forward fully-connected single hidden layer artificial neural network. Since modern neural networks are deeper, meaning they have many more hidden layers, we will call the architecture presented here a Shallow Neural Network, or Shallow NN for short.

The computation inside each hidden neuron follows the same structure as the output layer computation in LR. First, one multiplies each pixel value by an individual weight, sums the results and adds a bias to calculate Z (an affine transformation). Then, one applies an activation function to Z to compute the neuron’s activation. The output neuron performs the same computation, taking the previous layer’s activation values as inputs to compute the probability of the picture displaying a python or a rattlesnake. Therefore, this model has one set of parameters to transform the input pixels into hidden layer activations and another set of parameters to transform the hidden layer activations into an output probability for each class. We used ReLU activations in the hidden neurons and a sigmoid activation in the output neuron.
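
To make this concrete, here is a minimal NumPy sketch of the forward pass. The names (forward, W1, b1, W2, b2) are ours for illustration and not necessarily those used in the repository:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(X, W1, b1, W2, b2):
        # X: (12288, m) batch of flattened images; m = number of images
        Z1 = W1 @ X + b1      # affine transformation into the hidden layer
        A1 = relu(Z1)         # ReLU activations of the hidden neurons
        Z2 = W2 @ A1 + b2     # affine transformation into the output neuron
        A2 = sigmoid(Z2)      # probability of one class, e.g. rattlesnake
        return A2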

We carried out the training using stochastic gradient descent over N epochs with a minibatch size of four. In other words, we trained the model by showing it four images at a time over a number N of epochs, where each epoch comprises showing every image to the model once.
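
In code, the training loop amounts to something like the sketch below; grad_fn is a hypothetical stand-in for the backpropagation step (not shown), returning the gradient of the cost with respect to each parameter:

    import numpy as np

    def sgd_train(X, Y, params, grad_fn, lr, n_epochs, batch_size=4):
        # params: dict of arrays (W1, b1, W2, b2); grad_fn returns a matching dict
        m = X.shape[1]                              # total number of images
        for epoch in range(n_epochs):
            perm = np.random.permutation(m)         # reshuffle every epoch
            for start in range(0, m, batch_size):   # four images at a time
                idx = perm[start:start + batch_size]
                grads = grad_fn(X[:, idx], Y[:, idx], params)
                for name in params:                 # one update per minibatch
                    params[name] -= lr * grads[name]
        return params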

We also included a regularization term to mitigate overfitting. Without going into further detail, regularization keeps the model from over-specializing in the training set and reduces the gap between the train and dev set errors (the variance). In other words, we tried to develop a model with better generalization capacity.
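
A common way to do this, and the form we assume was used here, is to add an L2 penalty on the weights to the cross-entropy cost, with lambd playing the role of the regularization constant:

    import numpy as np

    def cost(A2, Y, W1, W2, lambd):
        # A2: predicted probabilities, Y: true labels, both of shape (1, m)
        m = Y.shape[1]
        cross_entropy = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
        # L2 penalty: punishes large weights, discouraging over-specialization
        l2 = (lambd / (2 * m)) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        return cross_entropy + l2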

Results

We ran our model with an initial set of hyperparameters displayed in the table below.

In the picture below, although the learning curves oscillate a little, the training was smoother than with the previous logistic regression (LR) model. Compared to LR, the train error was higher by about 0.1, while the dev error was slightly lower (by around 0.01). However, this model renders a higher dev set accuracy: about 70% of the images in the dev set were correctly classified, instead of the previous 65%. This shows that the Shallow NN model was able to learn distinguishing features in the snake images. We can validate these results on another set of unused snake images, and we will cover validation in a later post.

We later conducted a hyperparameter tuning procedure. We selected 20 random sets of values for the number of hidden neurons, the learning rate and the regularization constant, then trained 20 different networks on them to determine which set of hyperparameters could improve the model even further. By the end of this procedure, the best set was: 420 neurons in the hidden layer, a learning rate of 1.5e-5 and a regularization constant of 2.0. This model provided a dev error around 0.59 and a dev set accuracy of 71%.

A drawback of this architecture is the total number of parameters. There were 12,288 weights and 1 bias in LR, while there are 5,161,380 weights and 421 biases in the Shallow NN. In other words, adding 420 neurons in the hidden layer increased the number of parameters by a factor of about 420: each fully connected hidden unit carries one weight per input pixel, so the parameter count grows with the product of adjacent layer sizes. Fortunately, as we shall see in the next post, deep learning schemes like convolutional neural networks not only increase the model’s predictive capacity, but also do so with a far smaller number of parameters (compared to fully connected networks).
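
Those counts follow directly from the layer sizes, as a quick back-of-the-envelope check shows:

    n_x, n_h, n_y = 12_288, 420, 1    # input pixels, hidden units, outputs
    weights = n_x * n_h + n_h * n_y   # 5,160,960 + 420 = 5,161,380
    biases = n_h + n_y                # 420 + 1 = 421
    print(weights, biases)            # -> 5161380 421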

Error analysis

So, where do we go from here? There are many possibilities to improve our model, such as gathering more data, using artificial data augmentation, increasing the resolution of the images or diving straight into deep learning. To support our decision, we used a valuable technique called error analysis. Although not fancy, it is very useful to draw some samples from the dev set and try to understand why each image is misclassified. Since we are working with a very small dataset, we were able to diagnose all images without wasting too much time. Here are the possible reasons for misclassification in the dev set.

  • 58% — Nothing — There seems to be nothing wrong with the image.
  • 25% — Low-res — Too pixelated given the image complexity.
  • 17% — Snake head — Pictures without or with only a small snake head.
  • 11% — Unclear — It is difficult for a human to tell which snake it is.

Since there can be more than one problem with each image, the percentages add up to over 100%. For most misclassified images, there seems to be no visible qualitative problem. Low-resolution images came up as the second most prevalent cause of errors. This narrows our possible strategies to increasing the model complexity or increasing the image resolution. Since low resolution is only the second most prevalent cause, trailing the first by a large margin, we will stick to improving the model before improving the dataset. Therefore, in the next post, we will report the improvements of using deep learning for this task. (Spoiler: huge improvement!)

Conclusion

In this second iteration, aiming to distinguish pythons from rattlesnakes, we were able to achieve a 5% higher dev set accuracy compared to the logistic regression model. This information, combined with the fact that most misclassified images showed no apparent quality problems, points us towards adding more hidden layers as well as using deep learning schemes to increase the model’s predictive power.
