Knowledge Distillation in a Deep Neural Network
Distilling knowledge from a Teacher to a Student in a Deep Neural Network
In this article, you will learn:
- An easy-to-understand explanation of Teacher-Student knowledge distillation in neural networks
- The benefits of knowledge distillation
- An implementation of knowledge distillation on the CIFAR-10 dataset
Overview of Knowledge Distillation
Large deep neural networks, or ensembles of them, are built for better accuracy but are computationally expensive, resulting in longer inference times that may be unacceptable for the near-real-time inference required at the edge.
The requirements for training a model differ from those for inference. During training, an object-recognition model must extract structure from very large, highly redundant datasets; during inference, it must operate in real time under stringent latency and compute constraints.
To meet the latency, accuracy, and compute requirements at inference time, we can use model compression techniques such as:
- Model quantization: low-precision arithmetic for inference, like converting a float to an unsigned…
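To make the quantization idea above concrete, here is a minimal NumPy sketch of affine quantization, mapping float32 weights to uint8 and back. The scale/zero-point scheme is a generic illustration, not the API of any particular framework:

```python
import numpy as np

def quantize(weights: np.ndarray):
    """Map float32 values onto the uint8 range [0, 255]."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0           # size of one uint8 step in float units
    zero_point = np.round(-w_min / scale)     # uint8 value representing float 0.0
    q = np.clip(np.round(weights / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover approximate float32 values from the uint8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
# Reconstruction error is bounded by about half a quantization step (0.5 * scale),
# while storage drops from 4 bytes to 1 byte per weight.
assert np.max(np.abs(weights - restored)) <= scale
```

The compression comes from storing each weight in 1 byte instead of 4, at the cost of a small, bounded reconstruction error; real deployments typically pair this with calibration data to choose the scale per layer.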