
Learnings from SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

The framework is simpler AND outperforms everything


Machine Learning is a diverse field with many different aspects. One of its chief concerns is learning visual representations: feature encodings of images that downstream models can build on. This has applications in all kinds of problems, from Computer Vision and Object Detection to more futuristic applications like learning from and developing schematics. Recently, people have started looking into Contrastive Learning as an alternative to supervised and unsupervised learning. It teaches a model the general features of a dataset without labels by showing it which data points are similar and which are different. The focus thus shifts to learning what makes two samples different. I will be going into more detail about this technique, so be sure to follow my content across the platforms to stay updated.

Contrastive learning is a machine learning technique used to learn the general features of a dataset without labels by teaching the model which data points are similar or different.
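To make that definition concrete, here is a minimal numpy sketch of the NT-Xent (normalized temperature-scaled cross entropy) loss that SimCLR trains with: each image is augmented twice, the two views are positives for each other, and every other view in the batch is a negative. The function name and the toy embeddings are my own; this is an illustration of the idea, not the paper's exact implementation.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent loss on a batch of embeddings.

    z: array of shape (2N, d) holding embeddings of N images, each
       augmented twice; rows 2k and 2k+1 are the two views of image k.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = z.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    pos = np.arange(n) ^ 1                             # index of each row's positive view
    # cross-entropy: -log softmax of the positive similarity over all others
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(n), pos] - logsumexp)
    return loss.mean()

# tiny example: 2 images -> 4 embeddings of dimension 8
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
print(nt_xent_loss(z))
```

Minimizing this loss pulls the two views of the same image together in embedding space while pushing all other views away.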

The authors of “A Simple Framework for Contrastive Learning of Visual Representations” decided to look into the use of contrastive learning for learning visual representations. And naturally, I was excited to learn more. Self-supervised learning is a promising field with lots of potential applications. In this article, I will be sharing what I found most interesting about the research and what the authors discovered. As always, an annotated version of the paper will be shared below so that you can take a look and read the paper yourself (along with my comments). If you think I missed something, or have any other comments, be sure to share them with me. It’s always a pleasure hearing from you guys.

Takeaway 1: Simple Data Augmentation Works

It’s no secret I’m a huge fan of data augmentation. Both my Deepfake Detection research and a lot of my content often talk about how useful it is. However, according to the authors, “While data augmentation has been widely used in both supervised and unsupervised representation learning (Krizhevsky et al., 2012; Hénaff et al., 2019; Bachman et al., 2019), it has not been considered as a systematic way to define the contrastive prediction task.” This paper shows how useful data augmentation can be in contrastive learning.

The kinds of Augmentation functions used

The above picture lists the different transformations that could be applied to an original image for data augmentation (in the paper).

To understand the significance of the paper’s approach, we should understand the two kinds of views used in prior work (global-to-local and adjacent). Both involve predicting part of an image by focusing on a particular area (called the receptive field), which can be quite complex to set up. But here’s the good news. “We show that this complexity can be avoided by performing simple random cropping (with resizing) of target images, which creates a family of predictive tasks subsuming the above mentioned two, as shown in Figure 3.” This “decouples the predictive task from other components such as the neural network architecture”. In simple words, we are able to create a wide diversity of predictive tasks just by applying simple random cropping.
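To see how random cropping with resizing yields two different views of the same image, here is a bare-bones numpy sketch. The function name, the square-crop simplification, and the nearest-neighbour resize are my own choices for brevity; real pipelines (e.g. torchvision’s RandomResizedCrop) also jitter the aspect ratio and use better interpolation.

```python
import numpy as np

def random_crop_resize(img, out_size, rng, min_area=0.2):
    """Crop a random sub-region covering 20-100% of the image area,
    then resize it to (out_size, out_size) with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    area = rng.uniform(min_area, 1.0) * h * w
    ch = cw = int(np.sqrt(area))          # square crop for simplicity
    ch, cw = min(ch, h), min(cw, w)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    ys = (np.arange(out_size) * ch / out_size).astype(int)
    xs = (np.arange(out_size) * cw / out_size).astype(int)
    return crop[ys][:, xs]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
# two independently sampled crops of the same image = one positive pair
view1 = random_crop_resize(img, 32, rng)
view2 = random_crop_resize(img, 32, rng)
print(view1.shape, view2.shape)  # (32, 32, 3) (32, 32, 3)
```

Depending on where the two crops land, the pair imitates either a global-to-local prediction task (one crop inside the other) or an adjacent-view task (overlapping or neighbouring crops), which is exactly the subsumption the quote describes.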

Color distortion improves SimCLR but hurts Supervised

We also find that contrastive learning requires much stronger augmentation: higher augmentation intensity improves the performance of the contrastive learners. Meanwhile, complex augmentation protocols are actually less effective than simple cropping. “In this context, AutoAugment (Cubuk et al., 2019), a sophisticated augmentation policy found using supervised learning, does not work better than simple cropping + (stronger) color distortion”. We see the superiority of simple cropping (with strong color distortion) over complex protocols in the table above.
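To illustrate what “stronger” augmentation means in practice, here is a crude numpy colour-jitter sketch where a single `strength` knob scales the magnitude of every distortion. The function and the specific jitter ranges are my own simplification; the paper’s actual color distortion also includes hue shifts and random grayscale conversion.

```python
import numpy as np

def color_distort(img, strength, rng):
    """Toy colour jitter: random brightness, contrast, and per-channel
    scaling, each with magnitude proportional to `strength`."""
    x = img.astype(np.float32) / 255.0
    # brightness: scale all pixels
    x = x * rng.uniform(1 - 0.8 * strength, 1 + 0.8 * strength)
    # contrast: scale deviation from the mean
    mean = x.mean()
    x = (x - mean) * rng.uniform(1 - 0.8 * strength, 1 + 0.8 * strength) + mean
    # rough hue/saturation proxy: independent per-channel scaling
    x = x * rng.uniform(1 - 0.2 * strength, 1 + 0.2 * strength, size=(1, 1, 3))
    return np.clip(x, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
weak = color_distort(img, strength=0.2, rng=rng)
strong = color_distort(img, strength=1.0, rng=rng)
print(weak.shape, strong.shape)  # (32, 32, 3) (32, 32, 3)
```

The intuition for why this matters: without colour distortion, two crops of the same image share nearly identical colour histograms, so the network can “cheat” by matching colour statistics instead of learning semantic features.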

Takeaway 2: Bigger models with larger batches lead to better performance

While that statement might seem obvious, Figure 7 tells us an interesting story. We see that contrastive learning starts catching up to supervised models as we increase model size. As the research team pointed out, “Figure 7 shows, perhaps unsurprisingly, that increasing depth and width both improve performance. While similar findings hold for supervised learning (He et al., 2016), we find the gap between supervised models and linear classifiers trained on unsupervised models shrinks as the model size increases, suggesting that unsupervised learning benefits more from bigger models than its supervised counterpart.” Here, unsupervised learning refers to unsupervised contrastive learning.

Alongside this, we see another interesting trend: contrastive learning models benefit more from larger batch sizes and longer training.

“With more training steps/epochs, the gaps between different batch sizes decrease or disappear, provided the batches are randomly resampled. In contrast to supervised learning (Goyal et al., 2017), in contrastive learning, larger batch sizes provide more negative examples, facilitating convergence (i.e. taking fewer epochs and steps for a given accuracy). Training longer also provides more negative examples, improving the results.”
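The arithmetic behind that quote is simple: with a batch of N images, SimCLR creates 2N augmented views, and for each view the remaining 2(N−1) views serve as negatives. A quick sketch (the function name is mine):

```python
# For a batch of N images, SimCLR produces 2N augmented views.
# Each view has exactly one positive (its sibling view) and treats
# the other 2(N - 1) views in the batch as negatives.
def negatives_per_example(batch_size):
    return 2 * (batch_size - 1)

for n in (256, 1024, 4096):
    print(n, negatives_per_example(n))
# 256 -> 510, 1024 -> 2046, 4096 -> 8190
```

So going from a batch of 256 to the paper’s largest batch of 4096 multiplies the number of negatives per example by sixteen, which is why larger batches speed up convergence.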

Takeaway 3: Contrastive Learning has potential

Above is the performance of SimCLR compared to some baseline learners. Section 6 of the paper goes over various tests in various contexts, comparing the performance of SimCLR to state-of-the-art models. I will not go over the full section, since explaining the tests and various tasks would require a series by itself, but I would recommend that you read the paper. If anyone has any doubts, feel free to leave them in the comments (or reach out to me) and we can discuss. However, based on the results shared, we can conclude that Contrastive Learning can be a very powerful tool, and should be researched further.

Final Notes

There are other takeaways that you might find interesting in the paper. I chose to focus on these three, since I found them to be the ones with the most potential to generalize to other kinds of tasks. The performance of Contrastive Learning validates the authors’ claim that it is undervalued. I believe that Contrastive Learning might help us solve problems traditionally difficult for supervised learning (e.g., problems where labelling is very expensive). I would recommend keeping an eye out for more problems being tackled with Contrastive Learning.


As promised, here is the annotated version of the paper

Reach out to me

If that article got you interested in reaching out to me, then this section is for you. You can reach out to me on any of the platforms, or check out any of my other content. If you’d like to discuss something, text me on LinkedIn, IG, or Twitter. To support me, please use my referral link to Robinhood. It’s completely free, and we both get a free stock. Not using it is literally losing out on free money.

Check out my other articles on Medium:

My YouTube:

Reach out to me on LinkedIn. Let’s connect:

My Instagram:

My Twitter:

My Substack:

Live conversations on Twitch:

Get a free stock on Robinhood:


