An unofficial Colab walkthrough of the Vision Transformer
Let’s see how it works by running the code!
Dec 12, 2020
Hi, I’m Hiroto Honda, a computer vision researcher. [homepage] [twitter]
Playing with code is often a much more effective way to understand machine learning methods than reading papers.
This time I have created a Colab notebook for a simple walkthrough of the Vision Transformer.
>>>>[colab notebook] <<<<
You can run the cells directly or make a copy of the notebook in your drive.
I hope you will be able to understand how it works by looking at the actual data flow during inference.
Have fun!
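As a taste of what the notebook walks through: the paper’s title, “An Image Is Worth 16x16 Words,” refers to how ViT tokenizes an image before the Transformer ever sees it. The sketch below (my own toy illustration with NumPy, not code from the notebook — timm actually implements this step with a strided Conv2d) splits a 224x224 image into non-overlapping 16x16 patches and flattens each into a vector, yielding the sequence of “words”:

```python
import numpy as np

# A toy image: 224x224 RGB, the input size used by ViT-Base/16.
img = np.random.rand(224, 224, 3)

patch = 16
n = 224 // patch  # 14 patches per side

# Split into non-overlapping 16x16 patches, then flatten each patch,
# turning the image into a sequence of 14*14 = 196 tokens.
patches = img.reshape(n, patch, n, patch, 3).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape(n * n, patch * patch * 3)

print(tokens.shape)  # (196, 768)
```

Each token is 16*16*3 = 768-dimensional here; in the real model a learned linear projection maps each patch to the embedding dimension, a class token is prepended, and position embeddings are added before the Transformer encoder.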
Credits
- Paper: Alexey Dosovitskiy et al., “An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale”, https://arxiv.org/abs/2010.11929
- Model Implementation: the notebook loads (and is inspired by) Ross Wightman (@wightmanr)’s amazing module: https://github.com/rwightman/pytorch-image-models/tree/master/timm . For the implementation details, please refer to the repo.
- The notebook was presented at the paper reading group meeting of DeNA Co., Ltd. and Mobility Technologies Co., Ltd.