Is BERT the Future of Image Pretraining? ByteDance Team’s BERT-like Pretrained Vision Transformer iBOT Achieves New SOTAs
Masked language modelling (MLM) is a pretraining paradigm in which text is first tokenized into semantically meaningful pieces and a model learns by predicting randomly masked tokens. Although MLM is one of the main contributors to the remarkable performance of transformers on natural language processing tasks, its potential application in the emerging visual…
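Since the lede turns on how MLM works, here is a minimal Python sketch of BERT-style masking for intuition. The `mask_tokens` helper, the sample sentence, and the 15% default rate are illustrative assumptions, not code from the iBOT paper; real MLM pipelines also sometimes keep or randomly replace masked positions rather than always substituting a mask token.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=random):
    """Simplified BERT-style masking: hide a fraction of tokens and
    record the originals as the prediction targets."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must reconstruct this token
            masked.append(MASK_TOKEN)
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)   # e.g. ['the', '[MASK]', 'sat', 'on', 'the', '[MASK]']
print(targets)  # positions and original tokens the model is trained to predict
```

iBOT's insight, by analogy, is to apply the same mask-and-predict recipe to image patches, with an online tokenizer standing in for the discrete word pieces that make this straightforward in language.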