Microsoft & OpenAI’s µTransfer Zero-Shot Hyperparameter Transfer Method Tunes GPT-3’s Hyperparameters on a Single GPU

Hyperparameter (HP) tuning is a strenuous, time-consuming, and expensive process for today’s deep neural networks (DNNs), which often scale to billions of parameters. The recently proposed Maximal Update Parametrization (µP) addresses this issue by enabling “maximal” feature learning…
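The core idea behind µTransfer is that, under µP, optimal hyperparameters become approximately invariant to model width, so they can be tuned on a small proxy model and reused on the full-size model. A key ingredient is that per-layer learning rates are rescaled with width; for example, under µP the Adam learning rate for hidden-layer weights shrinks in proportion to the widening factor. The following is a minimal illustrative sketch of that scaling rule (a simplification of the paper's full scaling table, not Microsoft's `mup` library API; the function name is ours):

```python
def mup_scaled_lr(base_lr: float, base_width: int, width: int) -> float:
    """Illustrative µP-style learning-rate scaling for hidden-layer
    weights under Adam: the LR tuned at `base_width` is divided by the
    widening factor `width / base_width`. This is a simplification of
    the full µP scaling rules, which differ per parameter type."""
    return base_lr * base_width / width

# A learning rate tuned on a narrow proxy model (width 128) is
# rescaled for a much wider target model (width 1024).
tuned_lr = 1e-3
target_lr = mup_scaled_lr(tuned_lr, base_width=128, width=1024)
print(target_lr)  # 8x narrower base model -> LR divided by 8
```

In practice, this is why a single GPU suffices for tuning: the HP sweep runs on the small proxy model, and only one training run is needed at full scale.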


