I Trained a Neural Network on Pet Adoption Photos, and It Was A Little Sad But Mostly Cute. (Pt 1/2)

geoff golder
Mar 22 · 6 min read
Header image: actually the least likely of all pets. Could be dogs or cats.

This article has some code and graphs but is not highly technical.
This is the first of a two part series exploring what it’s like to train a neural network in 2019 as someone who didn’t believe they’d ever train one in 2018.

One of Kaggle’s machine learning competitions asks you to predict how long it will take for a given pet to be adopted. It’s sponsored by PetFinder.my.

They give you a ton of useful data, like how old the animal is, whether it’s a cat or dog, its location, color, and so on.

Sample of adoption photos

There are also adoption photos for many of the pets, and the data includes a bunch of features extracted from Google’s Vision API for each image.

There is a JSON file like this for every image.

Our own Neural Network

I was curious if we could predict adoption speed from the photos themselves. That is, without knowing anything about the age of the animal, where it is, how much adoption fees are, etc, can we predict how long it will take an animal to be adopted simply from the image associated with the animal?

Intuitively, this seems like an unsolvable problem, and it is definitely silly to try to solve it without all the additional information, but the goal was to extract the features from this network and combine them with the best model we had previously figured out how to work with.

For those curious, the work for part 1 (and some of part 2) is contained in this python jupyter notebook: https://github.com/gdoteof/neuralnet_stuff/blob/master/adoption_pictures_neural_nets.ipynb

So what did it find?

40 minutes of training on a GPU with the kappa score just fluctuating means we aren’t improving at all in a way that matters for the contest, and barely improving in the most general sense.
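For context, the kappa score here is the contest’s evaluation metric: a quadratic weighted kappa, which penalizes a prediction more the further it lands from the true adoption-speed class. A minimal sketch with scikit-learn (the labels below are made up for illustration, not taken from the contest data):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical true and predicted adoption-speed classes (0-4).
y_true = [0, 1, 2, 3, 4, 2, 1, 3]
y_pred = [0, 1, 2, 4, 4, 2, 2, 1]

# Quadratic weighting: being off by two classes costs four times
# as much as being off by one.
kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(kappa)
```

A score of 1.0 means perfect agreement, 0 means no better than chance, which is why a fluctuating kappa near its starting value is bad news.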

This network is pre-trained on ImageNet, so it already knows how to differentiate between cats and dogs (and many other things). But the subtlety of knowing that a particular dog or cat is super primed for getting adopted, or unlikely to ever be picked up, is nothing close to what ImageNet was created for.

So, we allow the network to retrain the deeper parts of its structure, hoping it will specialize in these types of images, ruining its ability to recognize all the ImageNet categories in the process.

This is much better! Our error rate is going down ever so slightly, but our kappa score is improving meaningfully, which means the errors our model makes are at least getting smaller. However, toward the end it looks like the curve is flattening out, and the improvement in kappa score is only ~3%.

Our validation loss is still way below our training loss, which means we have plenty of room to juice our model before we risk over-fitting.

Research breakthrough of unprecedented relevancy.

Perhaps the learning rate is too high for the earlier parts of the network, so we throttle it quite a lot for the most primitive parts of the network, while still allowing the later parts to learn at the same rate.
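Giving the earliest layers a much smaller learning rate than the later ones is often called discriminative learning rates. Here is a framework-free sketch of the usual recipe (the function name and group count are mine, for illustration): interpolate geometrically between a low rate for the most primitive layers and a high rate for the head.

```python
def discriminative_lrs(lr_min, lr_max, n_groups):
    """Geometrically spaced learning rates, one per layer group,
    from lr_min (earliest layers) to lr_max (the head)."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]

# e.g. throttle the primitive layers 100x relative to the head
lrs = discriminative_lrs(1e-5, 1e-3, 3)
print(lrs)  # roughly [1e-05, 1e-04, 1e-03], up to float rounding
```

The earliest layers detect generic edges and textures that transfer well from ImageNet, so they only need gentle nudges, while the head has to be rebuilt for adoption-speed classes.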

This was an hour and a half of full usage of a Google Colab GPU instance.

If you’ve been following along, you know that this is great. Look how steep that blue line is. Our kappa score skyrockets, the actual error rate drops significantly, and we are finally starting to overfit.

Our network has definitely learned something. It’s now getting the correct answer 36% of the time, a 10+ percentage point increase, and the kappa score (which is the Kaggle contest’s evaluation metric) has gone up 30%+ as well.

Best of all, there is no sign of it stopping.

I was honestly surprised by this. Pretty quickly (I’ll go over this in part 2) I was able to get a score almost as good as using the tabular data only, which should be much, much richer.

preview for part 2

So what did our network learn?

0 — Pet was adopted on the same day as it was listed.
1 — Pet was adopted between 1 and 7 days (1st week) after being listed.
2 — Pet was adopted between 8 and 30 days (1st month) after being listed.
3 — Pet was adopted between 31 and 90 days (2nd & 3rd month) after being listed.
4 — No adoption after 100 days of being listed. (There are no pets in this dataset that waited between 90 and 100 days).
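These AdoptionSpeed classes come straight from the competition data; a small lookup like the one below (the names are mine) keeps plots and confusion matrices readable:

```python
# AdoptionSpeed labels from the PetFinder.my competition data.
ADOPTION_SPEED = {
    0: "same day",
    1: "1-7 days (1st week)",
    2: "8-30 days (1st month)",
    3: "31-90 days (2nd & 3rd month)",
    4: "no adoption after 100 days",
}

def describe(speed_class):
    """Human-readable description of an AdoptionSpeed class."""
    return ADOPTION_SPEED[speed_class]

print(describe(0))  # same day
```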

Taking a look at the confusion matrix, we can see our network guessed only a single photo was class 0. It was also correct.

Reading the matrix is pretty straightforward: each cell of the table represents the number of times the network guessed PREDICTED when the answer was ACTUAL. For example, there were 494 times that the network predicted an image was class 1 but the image was actually class 2.
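To make that reading rule concrete, here is a tiny confusion matrix built with scikit-learn on toy labels (not the contest data). In scikit-learn’s convention, rows are actual classes and columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

# Toy example with three classes; not the contest data.
y_actual = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred   = [0, 1, 1, 1, 2, 2, 2, 1]

cm = confusion_matrix(y_actual, y_pred, labels=[0, 1, 2])

# cm[actual][predicted]: how often class 1 was mistaken for class 2
print(cm[1][2])  # 1
# ...and how often class 2 was mistaken for class 1
print(cm[2][1])  # 1
```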

Meaning, of all the images the network saw (over 10k) it only thought one of them was of an animal that would be adopted the same day it was listed.


I can’t even tell you how excited I was when I found this. So, without further ado:

The photos most likely to be adopted the same day (class 0):

This is the one. The network predicted this one was adopted the same day it was listed, and it was correct.
This one had the second highest probability of being adopted the same day (but was more likely to be adopted within first week)
Same as previous, but third.

These are the photos of animals least likely to be adopted within 100 days (class 4):

And this is the saddest thing the network knows about. The least likely animal, purely by looking at it, to be adopted within the 100 day limit.
The second saddest

And finally, the most likely to be adopted within the first month (class 2):

Part 2

In part 2, we’ll combine these image features with the tabular data. Additionally, we’ll talk about some of the implications for anyone who deals with pet adoptions and their photos.