Training Larger Models Over Your Average GPU With Gradient Checkpointing in PyTorch
As machine learning practitioners, we often run into situations where we want to train a relatively large model and our poor GPU cannot handle it because it doesn't have enough memory. This problem often arises when we are working in an environment where cloud computing is not allowed for security reasons. In such an…
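To give a flavour of the technique named in the title, here is a minimal sketch of gradient checkpointing with PyTorch's torch.utils.checkpoint.checkpoint: activations inside the checkpointed blocks are dropped during the forward pass and recomputed during the backward pass, trading extra compute for lower peak GPU memory. The toy model, layer sizes, and the use_reentrant=False flag (available in recent PyTorch releases) are illustrative assumptions, not code from this article.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    """Toy model whose middle blocks are gradient-checkpointed."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.head = nn.Linear(1024, 10)

    def forward(self, x):
        # Activations of block1/block2 are not stored; they are
        # recomputed during backward, reducing peak memory usage.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        return self.head(x)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = CheckpointedMLP().to(device)
inp = torch.randn(8, 1024, device=device, requires_grad=True)

out = model(inp)
out.sum().backward()  # checkpointed activations are recomputed here
```

The memory saving comes at the cost of re-running the forward pass of each checkpointed block once during backpropagation, which is usually an acceptable trade when the alternative is not fitting the model at all.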