Training Larger Models Over Your Average GPU With Gradient Checkpointing in PyTorch

Vikas Kumar Ojha
Published in Geek Culture
8 min read · Jan 30

Photo by Sumeet Singh on Unsplash

As machine learning practitioners, we often want to train a relatively large model, only to find that our poor GPU cannot handle it because it doesn’t have enough memory. This problem often arises when we work in an environment where cloud computing is not allowed for security reasons. In such an…

Vikas Kumar Ojha
Deep Learning Engineer @Samsung Electro Mechanics