Self-Supervised Learning for Image Classification

Lars Vagnes · Analytics Vidhya · Aug 15, 2020
[Figure: CIFAR-10 example data]

In this post we explore the benefits of applying self-supervised learning to the image classification problem in computer vision.

If you don’t have a clear idea of what self-supervised learning is, see my short introduction to the concept here.

If you just want to access the (dirty) code, you can find the Jupyter notebook here.

“Experiment” setup

Procedure

We create an augmented version of the CIFAR-10 dataset in which every image is randomly rotated by 0, 90, 180, or 270 degrees. Using this rotated dataset, we train a ResNet-based neural network to classify each image into its rotation class. Below are the graphs of our loss and classification accuracy.
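For reference, here is a minimal PyTorch-style sketch of what such a rotation pretext setup might look like. The names (RotationDataset, model) are illustrative and the actual notebook may differ in its details:

```python
import torch
from torch.utils.data import Dataset
from torchvision import datasets, transforms
from torchvision.models import resnet18

class RotationDataset(Dataset):
    """Wraps CIFAR-10 and replaces each class label with a random rotation label."""
    def __init__(self, root, train=True):
        self.base = datasets.CIFAR10(root, train=train, download=True,
                                     transform=transforms.ToTensor())

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, _ = self.base[idx]               # discard the original class label
        k = torch.randint(0, 4, (1,)).item()  # 0 -> 0°, 1 -> 90°, 2 -> 180°, 3 -> 270°
        img = torch.rot90(img, k, dims=(1, 2))
        return img, k                         # the rotation class (0..3) is the target

# A ResNet with a 4-way head, one output per rotation class
model = resnet18(num_classes=4)
```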

Notably, we don’t achieve a very high classification accuracy on this pretext task, which is curious: predicting rotations appears to be harder than our downstream task.

Having created our pre-trained model using self-supervision, we now re-purpose it to solve our downstream task of image classification. We do this by replacing the last layer of the model with a new task-specific layer of 10 nodes, since we are now predicting one of 10 image classes rather than one of four rotation variants as we did previously.
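In code, the head swap might look something like the following, assuming a torchvision-style ResNet whose final layer is model.fc:

```python
import torch.nn as nn

# Re-purpose the pretext model for the downstream task: swap the 4-way rotation
# head for a new 10-way classification head (one node per CIFAR-10 class).
in_features = model.fc.in_features    # model: the ResNet trained on rotations above
model.fc = nn.Linear(in_features, 10)
```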

Then we train ResNet models with and without the self-supervised feature extractor on different portions of the data, to see how the potential performance improvement depends on the amount of labeled data available. Below are the graphs of our loss and classification accuracy versus the number of labeled samples used.
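The comparison can be run with a loop roughly like the one below, where build_model and train_and_evaluate are hypothetical placeholders standing in for the notebook's actual training code:

```python
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

cifar_train = datasets.CIFAR10('./data', train=True, download=True,
                               transform=transforms.ToTensor())

results = {}
for n_labeled in (1000, 2000, 5000, 10000):
    # Take a small labeled subset (here simply the first n_labeled samples)
    subset = Subset(cifar_train, range(n_labeled))
    loader = DataLoader(subset, batch_size=128, shuffle=True)
    for use_ssl_features in (True, False):
        net = build_model(pretrained=use_ssl_features)    # hypothetical helper
        acc = train_and_evaluate(net, loader)             # hypothetical helper
        results[(n_labeled, use_ssl_features)] = acc
```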

What we find is that the less labeled data we use, the larger the performance gain from self-supervision. When using only 1,000 labeled images we see a large gain in classification accuracy of roughly 19% (a relative gain of 37%); however, as we approach 10,000 labeled samples, or 1,000 labeled samples per class, the gain shrinks to only about 2%.

Although the diminishing returns of self-supervision are a bit disappointing, this trend is expected. The purpose of self-supervision is to learn general features from a large distribution of data, features that cannot be learned from a small, non-representative labeled subset of that data. As that subset grows larger, however, it becomes more and more representative, and thus more conducive on its own to learning general features that boost performance on unseen data.

One final point: we have used a vanilla self-supervision setup, and despite our notable performance gains, far larger improvements have been reported with more advanced methods, in particular contrastive learning. If you want to read more about this, I suggest looking at this paper.

Thanks!

And here’s the Jupyter notebook.
