Vision Transformers (ViT) for Self-Supervised Representation Learning (Part 1)

Ching (Chingis) · Published in Deem.blogs · Apr 11, 2022 · 6 min read

Here I summarize recent work on Vision Transformers (ViT) in Self-Supervised Learning (SSL) and Unsupervised Learning to keep you updated. Vision Transformers have become very popular and are now used in many areas, including object detection, segmentation, and representation learning, so it is worth knowing what has been happening recently. I am summarizing some of the works I found online; however, I cannot include every detail, such as the full experiments, because I am covering multiple related works together. I hope you find it useful.

I also mention other SSL methods, such as MoCo and BYOL. If you are not familiar with them, see my previous article on Self-Supervised Learning below; please take your time with it, since it is closely related to what is covered here.

Related Article

Emerging Properties in Self-Supervised Vision Transformers

DINO (self-distillation with no labels) is proposed by Caron et al.
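At its core, DINO trains a student network to match the output of a momentum (EMA) teacher across different augmented views of the same image, with centering and temperature sharpening of the teacher output to avoid collapse. Below is a minimal PyTorch sketch of that idea, not the authors' official implementation: the module names, hyperparameters (temperatures, momenta, output dimension), and the toy MLP backbones standing in for ViT-plus-projection-head networks are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class DINOLoss(nn.Module):
    """Cross-entropy between the (centered, sharpened) teacher distribution
    and the student distribution, in the spirit of Caron et al."""
    def __init__(self, out_dim, teacher_temp=0.04, student_temp=0.1, center_momentum=0.9):
        super().__init__()
        self.teacher_temp = teacher_temp
        self.student_temp = student_temp
        self.center_momentum = center_momentum
        self.register_buffer("center", torch.zeros(1, out_dim))

    def forward(self, student_out, teacher_out):
        # Sharpen the student output with its own temperature.
        student_logp = F.log_softmax(student_out / self.student_temp, dim=-1)
        # Center and sharpen the teacher output; no gradient flows through it.
        teacher_p = F.softmax((teacher_out - self.center) / self.teacher_temp, dim=-1).detach()
        loss = -(teacher_p * student_logp).sum(dim=-1).mean()
        self.update_center(teacher_out)
        return loss

    @torch.no_grad()
    def update_center(self, teacher_out):
        # Running estimate of the teacher's mean output (helps prevent collapse).
        batch_center = teacher_out.mean(dim=0, keepdim=True)
        self.center.mul_(self.center_momentum).add_(batch_center, alpha=1 - self.center_momentum)

@torch.no_grad()
def ema_update(student, teacher, momentum=0.996):
    # The teacher's weights are an exponential moving average of the student's.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(momentum).add_(ps.detach(), alpha=1 - momentum)

# Toy usage: any ViT backbone plus projection head can play both roles.
out_dim = 4096
student = nn.Sequential(nn.Linear(384, 2048), nn.GELU(), nn.Linear(2048, out_dim))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False  # teacher is never trained by backprop

criterion = DINOLoss(out_dim)
view1, view2 = torch.randn(8, 384), torch.randn(8, 384)  # stand-ins for two augmented crops
loss = criterion(student(view1), teacher(view2))
loss.backward()
ema_update(student, teacher)
```

The key design choice is that the two networks share an architecture but only the student receives gradients; the teacher is updated purely by EMA, and the centering plus sharpening of its outputs keeps the self-distillation from collapsing to a trivial solution.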
