Model quality & compute budget

Jaideep Ray · Better ML · Oct 16, 2023

A scaling law is a mathematical relationship that describes how a property of a system changes with the system's scale. Scaling laws for transformer training (e.g., the Chinchilla and OpenAI scaling laws) are still under active research and debate. Nonetheless, they give a practical framework for reasoning about model quality, compute budget (the dollars required to run a training job), and which capabilities an ML platform should prioritize for large-model training.
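As a concrete example, the Chinchilla paper (Hoffmann et al., 2022) fits training loss with a parametric form L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is the number of training tokens. Here is a minimal sketch in Python using the paper's fitted constants (approximate values; treat them as illustrative rather than exact):

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style parametric loss: L(N, D) = E + A/N^alpha + B/D^beta.

    Constants are the (approximate) fitted values reported by
    Hoffmann et al., 2022. Returns predicted training loss in nats.
    """
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: a 70B-parameter model trained on 1.4T tokens
# (roughly Chinchilla's own configuration).
print(chinchilla_loss(70e9, 1.4e12))
```

Note how both the N term and the D term decay as power laws: doubling either one buys progressively smaller loss improvements, which is the diminishing-returns behavior discussed below.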

There are three levers for increasing model quality:

  1. Model size (M): The larger the model, the more capacity it has to learn patterns in the data.
  2. Dataset size (D): Given sufficient model capacity, model quality improves with more training data.
  3. Training time (T): More training time can further improve quality for a fixed model and dataset.
  • The compute budget is a function of training time and resources (CPU/GPU/memory). Increasing model size and dataset size requires more resources and longer training (see the back-of-the-envelope sketch after this list).
  • All three factors (M, D, T) hit diminishing returns beyond some limit. Working out the scaling laws for these parameters on your particular training workload is a worthwhile exercise for improving model quality.
  • The ML platform has its work cut out for it: enable ML developers to determine these scaling laws as efficiently as possible, in terms of both compute and developer time.
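To make the compute-budget bullet concrete, here is a back-of-the-envelope sketch. It uses the widely cited C ≈ 6·N·D FLOPs approximation for dense transformer training and the Chinchilla rule of thumb of roughly 20 tokens per parameter; the GPU throughput and utilization figures are assumptions for illustration, not measurements:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute for a dense transformer: C = 6 * N * D."""
    return 6 * n_params * n_tokens

def training_days(n_params: float, n_tokens: float, n_gpus: int,
                  flops_per_gpu: float = 312e12,
                  utilization: float = 0.4) -> float:
    """Rough wall-clock training time in days for a given cluster size.

    flops_per_gpu: peak throughput per device (312 TFLOP/s is roughly
    A100 BF16 peak; an assumption, adjust for your hardware).
    utilization: assumed model FLOPs utilization (~40% is optimistic
    but plausible for a well-tuned job).
    """
    effective_flops_per_sec = n_gpus * flops_per_gpu * utilization
    return training_flops(n_params, n_tokens) / effective_flops_per_sec / 86_400

# Example: a 7B-parameter model at the Chinchilla-style budget
# of ~20 tokens per parameter.
n = 7e9
d = 20 * n
print(f"Training FLOPs: {training_flops(n, d):.2e}")
print(f"Days on 256 GPUs: {training_days(n, d, n_gpus=256):.1f}")
```

Sweeping N, D, and cluster size through a script like this is a cheap first pass at the trade-offs before committing compute to a full scaling-law study.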
