This article has some code and graphs but is not highly technical.
This is the first of a two-part series exploring what it’s like to train a neural network in 2019 as someone who didn’t believe they’d ever train one in 2018.
One of Kaggle’s machine learning competitions, sponsored by PetFinder.my, asks you to predict how long it will take for a given pet to be adopted.
They give you a ton of useful data, like how old the animal is, whether it’s a cat or dog, its location, color, and so on.
There are also adoption photos for many of the pets, and included in the data are a bunch of features extracted from Google’s Vision API for each of the images.
Our own Neural Network
The general strategy for the Kaggle competition seems to be: combine all the tabular data with the features from the images and the features from the descriptions, and run it all through some model.
I was curious if we could predict adoption speed from the photos themselves. That is, without knowing anything about the age of the animal, where it is, how much adoption fees are, etc, can we predict how long it will take an animal to be adopted simply from the image associated with the animal?
Intuitively, this seems like an unsolvable problem, and it is definitely silly to attempt without all the additional information, but the goal was to extract the features this network learns and combine them with the best model we could otherwise figure out how to build.
For those curious, the work for part 1 (and some of part 2) is contained in this Python Jupyter notebook: https://github.com/gdoteof/neuralnet_stuff/blob/master/adoption_pictures_neural_nets.ipynb
So what did it find?
At first we were doing a little better than chance, which is roughly what was expected. We are getting the answer right about a third of the time and getting 0.2 on a quadratic weighted kappa score. But it’s not really getting any better, at least not quickly or convincingly. (We want the blue line to go down.)
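For readers unfamiliar with the metric, quadratic weighted kappa can be computed with scikit-learn. The labels below are made up purely to show the mechanics; they are not from the competition data:

```python
from sklearn.metrics import cohen_kappa_score

# Made-up true vs. predicted adoption-speed classes (0-4), for illustration only
y_true = [0, 1, 2, 3, 4, 2, 1]
y_pred = [0, 1, 2, 2, 4, 3, 0]

# "quadratic" weighting penalizes a prediction more the further its bin is
# from the true bin, so being off by one class costs far less than by four
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
```

This is why kappa can improve even while raw accuracy barely moves: the model can keep getting “less wrong” by landing closer to the true bin.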
This network is pre-trained on ImageNet and knows how to differentiate between cats and dogs (and many other things), but judging whether a particular dog or cat is super primed for getting adopted, or unlikely to ever be picked up, is nothing close to what ImageNet was created for.
So, we allow the network to adjust the deeper parts of its structure, hoping it will specialize in these types of images, even though this ruins its ability to recognize all the ImageNet categories.
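In a typical transfer-learning workflow this step is called “unfreezing.” A minimal sketch in plain PyTorch with a toy stand-in model (the real architecture lives in the notebook; the layer sizes here are illustrative, not from the post):

```python
import torch.nn as nn

# Toy stand-in for a pretrained network: a "body" of early layers plus a
# new classification "head" (sizes are illustrative only)
body = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())
head = nn.Linear(16, 5)  # 5 adoption-speed classes
model = nn.Sequential(body, head)

# Stage 1: freeze the pretrained body and train only the new head
for p in body.parameters():
    p.requires_grad = False

# Stage 2, "unfreezing": let the deeper layers adapt to adoption photos,
# at the cost of forgetting the original ImageNet categories
for p in body.parameters():
    p.requires_grad = True
```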
This is much better! Our error rate is going down ever so slightly, but our kappa score is improving meaningfully, which means the errors our model makes are at least getting smaller. However, toward the end it looks like it is flattening out, and the improvement in kappa score is only ~3%.
Our validation loss is still well below our training loss, which means we have plenty of room to juice our model before we risk over-fitting.
Research breakthrough of unprecedented relevancy.
Perhaps the learning rate is too high for the earlier parts of the network, so we throttle it heavily for the most primitive layers while still allowing the later layers to learn at the original rate.
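One common way to implement this throttling is discriminative learning rates: log-spaced rates from the earliest layer group up to the head. A minimal sketch; the function name and exact spacing scheme are my own, not taken from the notebook:

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    # Log-spaced learning rates: the earliest (most primitive) layer group
    # trains at lr_min, the final layers at lr_max
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1.0 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

# e.g. three layer groups spanning 1e-5 .. 1e-3
lrs = discriminative_lrs(1e-5, 1e-3, 3)
```

Each rate would then be assigned to its own optimizer parameter group, so the early layers change slowly while the head keeps learning quickly.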
If you’ve been following along, you know that this is great. Look how steep that blue line is. Our kappa score skyrockets, actual error rate drops significantly and we are starting to overfit, finally.
Our network has definitely learned something. It’s now getting the correct answer 36% of the time, a 10%+ increase, and the kappa score (which is the Kaggle contest’s evaluation metric) has gone up 30%+ as well.
Best of all, there is no sign of it stopping.
I was honestly surprised by this. Pretty quickly (I’ll go over this in part 2) I was able to get a score almost as good as using the tabular data only, which should be much, much richer.
So what did our network learn?
We trained the network to infer adoption speed from a photo. Specifically one of these categories:
0 — Pet was adopted on the same day as it was listed.
1 — Pet was adopted between 1 and 7 days (1st week) after being listed.
2 — Pet was adopted between 8 and 30 days (1st month) after being listed.
3 — Pet was adopted between 31 and 90 days (2nd & 3rd month) after being listed.
4 — No adoption after 100 days of being listed. (There are no pets in this dataset that waited between 90 and 100 days).
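The bins above can be made concrete with a small helper; this is a hypothetical function of my own, not code from the notebook:

```python
def adoption_speed_class(days):
    # Map days-until-adoption to the competition's class label; pets still
    # unadopted after 100 days are class 4 (no pet in the dataset waited
    # between 91 and 99 days)
    if days == 0:
        return 0
    if days <= 7:
        return 1
    if days <= 30:
        return 2
    if days <= 90:
        return 3
    return 4
```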
Taking a look at the confusion matrix, we can see our network guessed class 0 for only a single photo, and that guess was correct.
Meaning, of all the images the network saw (over 10k) it only thought one of them was of an animal that would be adopted the same day it was listed.
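A confusion matrix like this can be read off with scikit-learn. The labels below are made up for illustration, not real competition data:

```python
from sklearn.metrics import confusion_matrix

# Made-up true vs. predicted adoption-speed classes
y_true = [0, 1, 2, 3, 4, 2, 1]
y_pred = [0, 1, 2, 2, 4, 3, 0]

# cm[i][j] counts photos of true class i that the network predicted as
# class j; column 0 holds every photo the model called class 0, and the
# diagonal holds the correct guesses
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3, 4])
```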
I can’t even tell you how excited I was when I found this. So, without further ado:
The photos most likely to be adopted the same day (class 0):
These are the photos of animals least likely to be adopted within 100 days (class 4):
And finally, the most likely to be adopted within 2–4 weeks (class 2):
In part 2, we’ll take a look at how we further improve the network’s performance, along with a broader and more detailed view of how the network classifies these images.
Additionally, we’ll talk about some of the implications for anyone who deals with pet adoptions and their photos.