Model Compression: an Introduction to Teacher-Student Knowledge Distillation

Alessandro Lamberti · Published in Artificialis · Oct 28, 2022

There is a common misconception that bigger models are necessarily better. For instance, GPT-3 was trained on 570 GB of text and has 175 billion parameters.

However, whilst training large models helps improve state-of-the-art performance, deploying such cumbersome models (especially on edge devices) is not always feasible.
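This is where teacher-student knowledge distillation comes in: a compact student network is trained to imitate the temperature-softened output distribution of a large teacher, alongside the usual ground-truth labels. As a minimal sketch of the standard objective (Hinton et al., 2015), the PyTorch function below combines the two terms; the function name `distillation_loss` and the values of `T` and `alpha` are illustrative assumptions, not taken from this article.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Illustrative teacher-student distillation loss (Hinton et al., 2015).

    Blends a soft-target term (KL divergence between temperature-softened
    teacher and student distributions) with ordinary cross-entropy on the
    hard labels. T and alpha are example hyperparameters.
    """
    # Soft targets: both distributions are softened by the temperature T.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term's magnitude

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```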
