Knowledge Distillation in a Deep Neural Network

Distilling knowledge from a Teacher to a Student in a Deep Neural Network

Renu Khandelwal
Analytics Vidhya

In this article, you will learn:

  • An easy-to-understand explanation of teacher-student knowledge distillation
  • The benefits of knowledge distillation
  • How to implement knowledge distillation on the CIFAR-10 dataset

Overview of Knowledge Distillation

Large deep neural networks, or ensembles of deep neural networks built for better accuracy, are computationally expensive, resulting in longer inference times that may not be acceptable for the near-real-time inference required at the edge.

A model's training requirements differ from its inference requirements. During training, an object-recognition model must extract structure from very large, highly redundant datasets; during inference, it must operate in real time under stringent latency and computational constraints.

To address the latency, accuracy, and computational needs at inference time, we can use model compression techniques such as:

  • Model Quantization: performing inference with low-precision arithmetic, for example converting 32-bit floats to unsigned 8-bit integers (a minimal sketch follows this list)
  • Knowledge Distillation: transferring the knowledge of a large teacher model, or an ensemble of models, into a smaller student model, which is the focus of this article
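
To make the quantization idea concrete, here is a minimal sketch using TensorFlow Lite's post-training dynamic-range quantization. The small stand-in Keras model and the output file name are assumptions chosen only to keep the snippet self-contained; they are not part of the distillation implementation discussed later.

import tensorflow as tf

# Hypothetical stand-in model: any trained Keras model can be used here.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

# Post-training dynamic-range quantization: weights are stored as 8-bit
# integers instead of 32-bit floats, shrinking the model and reducing
# inference cost on edge hardware.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# The file name is illustrative only.
with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_model)

In practice you would convert a fully trained model, and full integer quantization additionally requires a small representative dataset for calibration.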
