Perf model cards

Jaideep Ray
Published in Better ML · Feb 12, 2024

Model cards are metadata for trained ML models that report benchmarked evaluation and performance characteristics [1]. They are an effective framework for model reporting and add substantial value for the reproducibility and discoverability of a model.

Much has been written on the role of model cards in responsible AI. However, model cards also play a significant role in understanding the performance characteristics of models, especially now that highly capable models are released every week.

The contents of a perf model card can vary depending on the specific model and use case, but the following sections are generally helpful to include for both training and inference.

Training:

  1. Training hardware stack: number of GPUs and type (e.g. 128 x A100-80GB), network interconnects (e.g. NVLink).
  2. Duration: time taken to train the model for the specified number of steps/epochs, typically in GPU hours (e.g. 4000 GPU hours).
  3. Dataset size: number of tokens for language models, number of rows for recsys models (e.g. 2T tokens or 100B rows).
  4. Trainer software stack: e.g. DeepSpeed with fp16.
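The training section above could be captured as a small structured record. A minimal sketch in Python (the class and field names are my own, not a standard schema):

```python
from dataclasses import dataclass


@dataclass
class TrainingPerfCard:
    """Training section of a perf model card (illustrative field names)."""

    hardware: str        # e.g. "128 x A100-80GB, NVLink"
    gpu_hours: float     # total GPU hours for the training run
    dataset_size: str    # e.g. "2T tokens" or "100B rows"
    software_stack: str  # e.g. "DeepSpeed, fp16"


# Example entry using the figures mentioned above.
card = TrainingPerfCard(
    hardware="128 x A100-80GB, NVLink",
    gpu_hours=4000,
    dataset_size="2T tokens",
    software_stack="DeepSpeed, fp16",
)
print(card)
```

Keeping the fields machine-readable (rather than free text) makes it easy to compare runs programmatically later.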

Inference:

  1. Inference hardware stack: number of GPUs and type (e.g. 16 x A10G).
  2. Model optimization/processing: e.g. fp16 quantized.
  3. Serving software stack: e.g. NVIDIA Triton.
  4. Throughput: x examples/second or y tokens/second.
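The inference section can be structured the same way. A sketch with hypothetical field names and a placeholder throughput figure (not a real benchmark):

```python
from dataclasses import dataclass


@dataclass
class InferencePerfCard:
    """Inference section of a perf model card (illustrative field names)."""

    hardware: str               # e.g. "16 x A10G"
    optimization: str           # e.g. "fp16 quantized"
    serving_stack: str          # e.g. "NVIDIA Triton"
    throughput_tokens_s: float  # measured tokens/second under a stated load


card = InferencePerfCard(
    hardware="16 x A10G",
    optimization="fp16 quantized",
    serving_stack="NVIDIA Triton",
    throughput_tokens_s=1500.0,  # placeholder, not a measured number
)
print(card)
```

Recording the serving stack and quantization alongside throughput matters: a throughput number without them is not reproducible.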

Let’s take a concrete example of how model cards help in comparing models on training cost and in identifying techniques that improve efficiency. The following are a few well-known small language models (SLMs) available on the Hugging Face Hub.
Comparing dataset size and GPU hours to train across these SLMs, the Phi-2 (2.7B) model appears to be quite efficient.

* AWS per-hour A100-80GB cost as of 2/11/24: https://instances.vantage.sh/aws/ec2/p4de.24xlarge
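Given GPU hours from a model card and a per-GPU hourly price from a source like the one above, the dollar-cost comparison is a one-line calculation. The rate below is a placeholder; look up current cloud pricing for real numbers:

```python
def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rough training-cost estimate from perf-model-card fields."""
    return gpu_hours * usd_per_gpu_hour


# Hypothetical example: 4000 GPU hours at $5/GPU-hour (placeholder rate).
print(training_cost_usd(4000, 5.0))  # -> 20000.0
```

This ignores storage, networking, and failed/restarted runs, so treat it as a lower bound when comparing models.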

Diving deeper into the model card details, we see several factors that contribute to the training efficiency of Microsoft’s small language model Phi-2.

  1. Phi-2 was trained on textbook-quality data (augmented with synthetic datasets) rather than a standard processed web corpus.
  2. Moreover, scaled knowledge transfer from Phi-1.5 sped up convergence.
  3. Finally, it pairs a large corpus (1.4T tokens) with a relatively small model (2.7B parameters).
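One way to turn the comparison above into a single number is tokens processed per GPU-hour. A sketch with entirely hypothetical figures (the model names and numbers below are placeholders, not values from any real model card):

```python
def tokens_per_gpu_hour(tokens: float, gpu_hours: float) -> float:
    """Simple training-efficiency metric from perf-model-card fields."""
    return tokens / gpu_hours


# Hypothetical SLM entries: (training tokens, total GPU hours).
models = {
    "slm-a": (1.4e12, 30_000),
    "slm-b": (1.0e12, 50_000),
}

for name, (tokens, hours) in models.items():
    print(f"{name}: {tokens_per_gpu_hour(tokens, hours):.2e} tokens/GPU-hour")
```

A higher value means more data processed per unit of compute, though it says nothing about the resulting model quality, so it should be read alongside evaluation results.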

Conclusion:

Model cards are useful both for open source models and inside an enterprise. They speed up informed experimentation and knowledge sharing among developers, and contribute to better ML.

References:

[1] https://modelcards.withgoogle.com/about
