Beware of the Lions: Build a Simple Image Classifier using Google’s Colab — Part 2

Alessandro Scoccia Pappagallo · Unkempt Thoughts · Aug 5, 2018

You can find the first part of this two-part post here.

Okay, we left off with a decent model that can tell with 82.5% accuracy whether something is a real lion or not.

Now, one of the issues (out of many) with our previous model is that we chose our hyper-parameters pretty much at random, including the optimizer. This time I wanted to be more thorough in my approach to big-predator identification (everybody’s favorite branch of ML), so I decided to use Hyperopt to find optimal hyper-parameters for the model.

Have you ever noticed how amazing sklearn’s documentation is? Every class and parameter is explained in painstaking detail, to the point where most of the time your questions are answered before you even think of them. Well, forget all that, because Hyperopt’s documentation is not like that. In fact, the documentation is essentially this one-pager here.

Anyway, the first thing to do is to define the hyper-parameter space:
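As a rough sketch (the specific hyper-parameters, ranges and values below are illustrative assumptions, not the exact ones from the notebook), the space could look something like this:

```python
from hyperopt import hp

# Illustrative hyper-parameter space: the dictionary keys are the "names"
# used in the training code, while the first argument of each hp.* call is
# the "label" Hyperopt reports results under.
space = {
    'lr': hp.uniform('lr', 1e-4, 1e-2),
    'dropout': hp.uniform('dropout', 0.0, 0.5),
    'dense_units': hp.choice('dense_units', [64, 128, 256]),
    # Nested choice: 'type' selects the optimizer, and only the SGD branch
    # carries an extra 'momentum' hyper-parameter (Adam doesn't use one).
    'optimizer': hp.choice('optimizer', [
        {'type': 'adam'},
        {'type': 'sgd', 'momentum': hp.uniform('momentum', 0.0, 0.9)},
    ]),
}
```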

A few things to note:

  • Every element of the space that maps to a hyperopt.hp function needs both a name and a label. Names are used to access a specific element in the code (see the code below), while labels are used at the end of training to retrieve the various hyper-parameters. In other words, every label maps to a hyper-parameter, while the same is not necessarily true for names (see space['optimizer']['type']). Labels need to be unique across the whole space.
  • Hyperopt offers many different functions to select hyper-parameters. As you probably imagined, hyperopt.hp.uniform draws values from a uniform distribution while hyperopt.hp.choice draws values from a user-provided list.
  • Some hyper-parameters are only instantiated in certain cases; for example, space['optimizer']['momentum'] is only used when the optimizer is SGD (Adam doesn’t have a momentum hyper-parameter).

We can then use the various space-placeholders in the main code this way:
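As before, this is a sketch rather than the exact notebook code: the architecture, the input shape and the x_train/y_train/x_val/y_val arrays from part 1 are all assumptions.

```python
from hyperopt import STATUS_OK
from tensorflow import keras

def build_model(params):
    """Compile a small CNN from one draw of the hyper-parameter space."""
    model = keras.Sequential([
        keras.layers.Conv2D(32, (3, 3), activation='relu',
                            input_shape=(128, 128, 3)),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(params['dense_units'], activation='relu'),
        keras.layers.Dropout(params['dropout']),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    # The nested 'optimizer' choice: only the SGD branch has a momentum value.
    if params['optimizer']['type'] == 'sgd':
        opt = keras.optimizers.SGD(learning_rate=params['lr'],
                                   momentum=params['optimizer']['momentum'])
    else:
        opt = keras.optimizers.Adam(learning_rate=params['lr'])
    model.compile(optimizer=opt, loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

def create_model(params):
    """Objective for Hyperopt: train once, return the (negated) validation accuracy."""
    model = build_model(params)
    model.fit(x_train, y_train, epochs=20, batch_size=32, verbose=0)
    _, val_acc = model.evaluate(x_val, y_val, verbose=0)
    # fmin minimises the loss, so higher accuracy must map to a lower value;
    # the parameter count is logged so we can plot it against accuracy later.
    return {'loss': -val_acc, 'status': STATUS_OK,
            'n_params': model.count_params()}
```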

We can then pass the hyper-parameter space and our newly created model-building function to Hyperopt’s main function, fmin:
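A minimal call, assuming the space and create_model sketched above (max_evals is an arbitrary choice), looks like this:

```python
from hyperopt import Trials, fmin, tpe, space_eval

trials = Trials()
best = fmin(
    fn=create_model,   # the objective defined above
    space=space,       # the hyper-parameter space defined earlier
    algo=tpe.suggest,  # Tree-structured Parzen Estimator
    max_evals=50,      # number of hyper-parameter combinations to try
    trials=trials,     # keeps per-trial results for later analysis
)

# fmin returns indices for hp.choice entries; space_eval maps them back
# to the actual hyper-parameter values.
print(space_eval(space, best))
```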

If you run this code on Colab you may face some resource-related errors. Normally, restarting the runtime a few times solves them. If for any reason that doesn’t work, you can always disable GPU acceleration: training will take longer, but you won’t hit resource-related issues, although the runtime will still stop after 10+ hours. If you realize you need more resources, you’ll likely have to use one of the many cloud ML services available (find comparisons here and here).

At the end of the various trials, the best model reached a validation accuracy of 82.5%, no better than what we had in the first experiment. In total, only three models reached ~80% or better, which seems to indicate we were extremely lucky in our first experiment, especially as SGD appears to perform much better than Adam (the optimizer we used before). The trials were not fruitless though, as we collected plenty of information about the best-performing hyper-parameters.

Accuracy for the various hyper-parameters.

We can also plot the total number of parameters versus accuracy to get a feel for whether more complex models tend to perform better:
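Assuming the objective logs the parameter count as in the sketch above, this plot can be produced straight from the trials object:

```python
import matplotlib.pyplot as plt

# Each entry in trials.results is the dict returned by create_model, so the
# accuracy and parameter count of every trial can be read back directly.
accuracies = [-r['loss'] for r in trials.results]
n_params = [r['n_params'] for r in trials.results]

plt.scatter(n_params, accuracies)
plt.xlabel('Total number of parameters')
plt.ylabel('Validation accuracy')
plt.show()
```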

Accuracy doesn’t seem to increase for more complex models.

A few notes about these plots:

  • In the first plot we can see which values tend to perform better. It’s important to note, however, that these charts are only reliable if we assume the hyper-parameters are completely independent, which is a strong assumption. That said, one thing appears clear: SGD performs better than Adam.
  • In the second chart we’re plotting the number of parameters vs. accuracy; however, there are many ways a model can be “complicated”, and the total number of parameters is only one way to measure that.

Now that we have a better understanding of the hyper-parameters that may work best, we can train one last model. This time we are going to use both the training and validation sets for training (as we have already tuned the hyper-parameters).
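As a sketch of this last step (the callback settings, batch size and array names are assumptions), the final run could reuse build_model from above together with the keras.callbacks.ReduceLROnPlateau callback mentioned in the appendix:

```python
import numpy as np
from hyperopt import space_eval
from tensorflow import keras

# Rebuild the network with the best hyper-parameters found by fmin.
best_params = space_eval(space, best)
final_model = build_model(best_params)

# Merge the training and validation data now that the hyper-parameters are fixed.
x_full = np.concatenate([x_train, x_val])
y_full = np.concatenate([y_train, y_val])

# Lower the learning rate when the training loss stops improving.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5,
                                              patience=10, min_lr=1e-5)

final_model.fit(x_full, y_full, epochs=300, batch_size=32,
                callbacks=[reduce_lr], verbose=0)
_, test_acc = final_model.evaluate(x_test, y_test, verbose=0)
print(f'Test accuracy: {test_acc:.3f}')
```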

After 300 epochs, the final accuracy on test is 87.5%, which is a 5pp improvement compared to our initial model.

Let’s have a look at some of the classified pictures:

The text specifies the outcome of the model, the color whether the prediction was correct or not.

Well, that cat on the left would have confused me too, while the lion on the right may have been missed because of the snow (neural networks are not very interpretable, so in many cases one can only guess). Luckily for us, of the 5 pictures misclassified by the model in the test set, only one was actually a lion (the one above).

The two Colab notebooks with the code can be found here and here.

That’s it for today! (:

Appendix

What could we have done to make the model better?

  • More data!
  • Have Hyperopt search for an optimal _BATCH_SIZE value (see [1], p. 73, and [2] on why that could be so important). In general, we could have searched over a larger hyper-parameter space.
  • In the last experiment we made some additional changes (including the use of keras.callbacks.ReduceLROnPlateau), which we could have considered for experiments #1 and #2 as well.
  • More data. Like really, I can’t stress this enough.
  • We could have experimented with different network architectures.
  • It goes without saying that had we used one of the many pre-trained CNNs we would have had much better results (but not nearly as much fun!).
  • A lot more data.

Bibliography

[1] https://arxiv.org/pdf/1707.09725.pdf#page=73

[2] https://arxiv.org/abs/1609.04836
