Efficiently Training a Swin Transformer Model for Image Classification with GPU Acceleration
Training large Swin Transformer models for image classification without using a ton of resources.
Swin Transformer is a vision transformer architecture designed to address the high-resolution limitations of earlier vision transformer models. This article demonstrates how to fine-tune the model efficiently when access to large amounts of GPU resources is unavailable. It is not a guide to how Swin Transformer works, nor a comparison with other image classification methods.
System
For training I used:
- Windows 10
- Python 3.10
- PyTorch 2.6.0
- CUDA 12.4
- RTX 3090
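Before training, it is worth confirming that PyTorch can see the GPU and that the installed versions match the setup above. A minimal sketch (the version numbers checked are the ones from my setup; adjust for yours):

```python
import torch

# Report the installed PyTorch build and the CUDA version it was compiled against
# (expected here: 2.6.0 and 12.4, per the setup listed above).
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)

# Prefer the GPU (an RTX 3090 in my case) and fall back to CPU if unavailable,
# so the rest of the training code can use `device` unconditionally.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)
```

All tensors and the model are later moved to this `device`, so the same script runs on machines with or without a GPU.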
Obtaining And Formatting The Dataset
For the dataset I used this freely available butterfly dataset, containing images of 75 different butterfly species.
The first step was converting the data into the format required for training. The data was presented as…