Model quality & compute budget
Oct 16, 2023
A scaling law is a mathematical formula that describes how properties of a system change with its size or scale. Scaling laws for transformer training (e.g., the Chinchilla and OpenAI scaling laws) are still under active research and debate. Nonetheless, they give a practical framework for thinking about model quality, compute budget (the dollars required to run a training job), and the capabilities an ML platform should prioritize for large model training.
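As a concrete illustration, here is a minimal sketch of the parametric loss form fitted in the Chinchilla paper (Hoffmann et al., 2022): predicted loss decomposes into an irreducible term plus terms that shrink with model size and dataset size. The constants below are the values reported in that paper; treat them as illustrative, since the fitted values depend on the workload.

```python
# Chinchilla-style parametric scaling law: L(N, D) = E + A/N^alpha + B/D^beta
# Constants are the fitted values reported by Hoffmann et al. (2022);
# they are illustrative and will differ for other workloads.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Loss keeps falling with either more parameters or more data,
# but each term shrinks toward the irreducible floor E.
loss_70b = predicted_loss(70e9, 1.4e12)   # roughly Chinchilla's setting
loss_bigger = predicted_loss(140e9, 2.8e12)
```

Note how each term exhibits diminishing returns: doubling N only shrinks the A-term by a factor of 2^0.34, which is the quantitative basis for the points below.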
Three parameters drive model quality:
- Model size (M): The larger the model size, the more its capacity to learn patterns in data.
- Dataset size (D): With sufficient model capacity, model quality improves with increased dataset size.
- Training time (T): More training time can improve model quality for a fixed model size and dataset.
The compute budget is a function of training time and resources (CPU/GPU/memory). Increasing model size or dataset size requires more resources and longer training time.

All three factors (M, D, T) hit diminishing returns beyond a certain point. Understanding the scaling laws for these parameters on a particular ML training workload is a worthwhile exercise for improving model quality.

The ML platform has its job cut out for it: enable ML developers to determine these scaling laws as efficiently as possible, in both compute and developer time.
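To make the budget trade-off concrete, here is a hedged sketch of compute-optimal allocation. It assumes the common approximation that training cost is C ≈ 6·N·D FLOPs (params times tokens), plus the Chinchilla finding that N and D should scale roughly equally with C, which works out to about 20 training tokens per parameter. Both the 6·N·D rule and the 20:1 ratio are rules of thumb, not exact for any given workload.

```python
# Compute-optimal allocation sketch, assuming:
#   - training FLOPs C ~ 6 * N (params) * D (tokens)
#   - Chinchilla's rule of thumb of ~20 tokens per parameter
TOKENS_PER_PARAM = 20  # workload-dependent; Chinchilla's approximate ratio

def optimal_split(flops_budget: float) -> tuple[float, float]:
    """Given a FLOPs budget C, return (N, D) satisfying
    C = 6*N*D with D = TOKENS_PER_PARAM * N."""
    n_params = (flops_budget / (6 * TOKENS_PER_PARAM)) ** 0.5
    return n_params, TOKENS_PER_PARAM * n_params

# Roughly Chinchilla-scale budget (~5.8e23 FLOPs): yields on the order of
# a 70B-parameter model trained on ~1.4T tokens.
n, d = optimal_split(5.76e23)
```

The practical takeaway for a platform: given a fixed dollar (FLOPs) budget, the split between M and D is a modeling decision the platform should make cheap to explore.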