Chinchilla: Training Compute-Optimal Large Language Models

Isaac Kargar
Published in AIGuys
5 min read · Jan 14, 2023


The paper investigates the optimal model size and number of training tokens for a transformer language model under a fixed compute budget. The authors find that current large language models are significantly undertrained: by training over 400 language models with varying numbers of parameters and training tokens, they show that for compute-optimal training, model size and the number of training tokens should be scaled in equal proportion. They test this hypothesis by training Chinchilla, a model that uses the same compute budget as Gopher but with 70B parameters and 4× more training data.
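To make the "scale parameters and tokens equally" result concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the standard training-FLOPs approximation C ≈ 6·N·D and the roughly 20-tokens-per-parameter ratio commonly derived from the paper's fits; the function name `chinchilla_optimal` is just illustrative, not from the paper.

```python
import math

def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    """Estimate a compute-optimal parameter count and token count.

    Assumes the common approximations C ≈ 6 * N * D (training FLOPs)
    and D ≈ 20 * N (the ~20-tokens-per-parameter rule of thumb that
    follows from scaling N and D equally with compute).
    """
    # C ≈ 6 * N * (20 * N) = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = math.sqrt(compute_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Roughly Gopher's training budget (~5.8e23 FLOPs) yields about
    # 70B parameters and ~1.4T tokens, i.e. Chinchilla's configuration.
    n, d = chinchilla_optimal(5.8e23)
    print(f"params ≈ {n / 1e9:.0f}B, tokens ≈ {d / 1e12:.2f}T")
```

Plugging in Gopher's approximate compute budget recovers Chinchilla's published configuration, which is exactly the point of the paper: spend the same compute on a smaller model trained on more tokens.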

