ViT: Vision Transformer, a PyTorch implementation

Alessandro Lamberti
Published in Artificialis
Aug 19, 2022

The paper “Attention Is All You Need” revolutionized Natural Language Processing, and Transformer-based architectures became the de facto standard for NLP tasks.

It was only a matter of time before someone tried to reach the state of the art in Computer Vision using attention mechanisms and Transformer architectures.
