My first failure in real-world deep learning

Our first attempt to apply deep learning to a Kaggle competition was a mess.

I want to break down the failure and figure out what went wrong, and I invite you to join the ride.

The data set was from the Right Whale Recognition competition: 4,538 pictures of 447 different whales. The pictures were taken over the course of 10 years and hundreds of helicopter trips, which means they vary in quality, lighting conditions, angle of the whale, visibility of the whale's head and callosity pattern, and more. The head and the callosity pattern are the crucial parts for identifying an individual whale.

[Image: a whale whose callosity is not so visible]
[Image: a whale whose callosity is clearly visible]

I'll try to explain why this is a bad starting point. Just by looking at the average number of pictures per whale (10.1), we can see that we have few samples from which to learn each whale. But the average doesn't show the whole picture: 24 whales have only 1 (!) picture. The data set isn't balanced at all (a quick way to check this is sketched below). Adding insult to injury, as we can see in those two pictures, most of each picture is just noise (the sea).

In the winning team's blog post they manually annotated the whales' heads to train the model; other teams trained a neural network to extract the head position. So we took the head positions from Anil Thomas, cropped the images, and started to work (see the cropping sketch below). Due to the data imbalance we decided to use stratified k-fold validation, and we added data augmentation to get more samples (also sketched below).

The results were mediocre at best: a validation log loss of 4.5 (the winning team got 0.5). After consulting our course instructor @Nathaniel Shimoni, we decided to predict only 10 whales, the ones with the most pictures, and we got roughly the same results. We managed to get a bit of improvement by changing the cropped image resolution, but it wasn't significant.
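To see the imbalance for yourself, a few lines of pandas are enough. This is a minimal sketch assuming the competition's label file maps each image to a whale ID; the file and column names here are assumptions, not necessarily the exact ones in the data.

```python
# Sketch: inspect the class imbalance.
# Assumes a train.csv with columns like "Image" and "whaleID";
# the exact names may differ in the competition data.
import pandas as pd

labels = pd.read_csv("train.csv")
counts = labels["whaleID"].value_counts()

print(f"whales: {counts.size}, images: {counts.sum()}")
print(f"average images per whale: {counts.mean():.1f}")
print(f"whales with a single image: {(counts == 1).sum()}")
```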
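The cropping step looked roughly like this. A sketch only: it assumes the head annotations come as a CSV of bounding boxes (filename, x, y, width, height), which may not match the exact format Anil Thomas published.

```python
# Sketch: crop each image to the annotated head bounding box.
# head_boxes.csv and its columns are assumed, not the actual format.
import os

import pandas as pd
from PIL import Image

os.makedirs("cropped", exist_ok=True)
boxes = pd.read_csv("head_boxes.csv")  # assumed columns: filename, x, y, width, height

for row in boxes.itertuples():
    img = Image.open(os.path.join("imgs", row.filename))
    # PIL crop takes (left, upper, right, lower)
    head = img.crop((row.x, row.y, row.x + row.width, row.y + row.height))
    head.save(os.path.join("cropped", row.filename))
```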
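And here is what the validation split plus augmentation can look like with scikit-learn and Keras. This is a generic sketch, not our exact code: the augmentation parameters are illustrative, and X and y are assumed to be the array of cropped images and the whale labels.

```python
# Sketch: stratified k-fold splits with augmented training batches.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from keras.preprocessing.image import ImageDataGenerator

def make_folds(X, y, n_splits=5):
    """Yield (augmented training generator, validation set) per fold.

    Note: whales with a single image trigger a sklearn warning here;
    stratification cannot balance a class that appears only once.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    augmenter = ImageDataGenerator(rotation_range=20,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True)
    for train_idx, val_idx in skf.split(X, y):
        flow = augmenter.flow(X[train_idx], y[train_idx], batch_size=32)
        yield flow, (X[val_idx], y[val_idx])
```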

What went wrong? Honestly, I don't know yet. We might improve the data pre-processing with image normalization (a sketch of what that could look like is below).
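For example, one common normalization scheme is to scale pixels to [0, 1] and standardize each color channel against the training-set statistics. A minimal sketch of that idea, not something we have tried yet:

```python
# Sketch: per-channel normalization of an image batch
# with shape (num_images, height, width, 3).
import numpy as np

def normalize(images):
    """Scale pixels to [0, 1], then standardize each RGB channel."""
    images = images.astype("float32") / 255.0
    mean = images.mean(axis=(0, 1, 2))        # one mean per channel
    std = images.std(axis=(0, 1, 2))          # one std per channel
    return (images - mean) / (std + 1e-7)     # epsilon avoids divide-by-zero
```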

If you have any suggestions or insights I would like to hear them.
The code is in this Git repository.