Knowledge Distillation in a Deep Neural Network
Distilling knowledge from a Teacher to a Student in a Deep Neural Network

In this article, you will learn:
- An easy-to-understand explanation of Teacher-Student knowledge distillation in neural networks
- Benefits of Knowledge distillation
- Implementation of Knowledge distillation on the CIFAR-10 dataset
Overview of Knowledge Distillation
Large deep neural networks, or ensembles of deep neural networks built for better accuracy, are computationally expensive, resulting in longer inference times that may not be ideal for the near-real-time inference required at the Edge.
The training requirements for a model are different from the requirements at inference time. During training, an object recognition model must extract structure from very large, highly redundant datasets, but during inference it must operate in real time with stringent requirements on latency and computational resources.
To address the latency, accuracy, and computational needs at inference time, we can use model compression techniques such as:
- Model Quantization: using low-precision arithmetic for inference, for example converting float32 values to unsigned 8-bit integers (see the short sketch after this list).
- Pruning: removing weights or activations that are close to zero, resulting in a smaller model.
- Knowledge Distillation: a large, complex model (the Teacher) distills its knowledge and passes it on to train a smaller network (the Student). The Student network is trained to match the larger network's predictions and the distribution of the Teacher's outputs.
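As a quick, illustrative sketch of the quantization idea (a toy example of my own, not tied to any particular framework's quantization API), the snippet below maps a handful of float32 weights onto the uint8 range with a simple affine scale and reconstructs them:

import numpy as np

# Toy post-training quantization: map float32 weights onto the 0-255 uint8 range.
weights = np.array([-0.42, 0.0, 0.13, 0.91, -0.77], dtype=np.float32)
scale = (weights.max() - weights.min()) / 255.0
zero_point = np.round(-weights.min() / scale)

quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

print("uint8 weights :", quantized)      # stored in a quarter of the memory
print("reconstructed :", dequantized)    # close to the original float values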
Knowledge Distillation is a model-agnostic technique that compresses and transfers the knowledge from a computationally expensive, large deep neural network (the Teacher) to a single smaller neural network (the Student) with better inference efficiency.
How is Knowledge distilled in a Neural Network?

Knowledge Distillation consists of two neural networks: Teacher and Student models.
- Teacher Model: A larger, cumbersome model, which can be an ensemble of separately trained models or a single very large model trained with a very strong regularizer such as dropout. The cumbersome Teacher model is trained first.
- Student Model: A smaller model that uses the distilled knowledge from the Teacher network. It uses a different kind of training, referred to as "distillation," to transfer the knowledge learned by the cumbersome model to the smaller Student model. The Student model is more suitable for deployment as it is computationally inexpensive, with the same or better accuracy than the Teacher model.
Knowledge distillation extracts the knowledge from the large, cumbersome Teacher model and passes it on to the smaller Student model.
How is knowledge distilled from the Teacher to the Student model?
Knowledge is distilled from the large, complex Teacher model to the Student model using a distillation loss.
Knowledge distillation improves generalization by using soft targets, which mitigate the over-confidence issue of neural networks and improve model calibration.
One form of distillation loss uses the soft targets to minimize the squared difference between the logits produced by the cumbersome model and the logits produced by the small model.
In knowledge distillation, knowledge is transferred to the distilled model by using the cumbersome model with a high temperature in its softmax to generate a soft target distribution. The same high temperature is used while training the distilled model, but after training, the distilled model uses a temperature of 1.
More generally, knowledge distillation minimizes the KL divergence between the probabilistic outputs of the Teacher and Student networks. The KL divergence term constrains the Student model's outputs to match the soft targets of the large, cumbersome model.
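To make this concrete, here is a minimal NumPy sketch (with made-up logits of my own) showing how temperature-softened teacher and student outputs are compared with the KL divergence that the distillation loss drives toward zero:

import numpy as np

def softmax_with_temperature(logits, T):
    # Convert logits to probabilities, softened by temperature T.
    exp = np.exp(logits / T)
    return exp / exp.sum()

# Made-up logits for a single example over 5 classes.
teacher_logits = np.array([2.0, 8.0, 1.0, 0.5, 0.1])
student_logits = np.array([1.5, 6.0, 2.0, 0.3, 0.2])

T = 5.0
p_teacher = softmax_with_temperature(teacher_logits, T)   # soft targets
q_student = softmax_with_temperature(student_logits, T)   # soft predictions

# KL(teacher || student): small when the student matches the teacher's soft targets.
kl = np.sum(p_teacher * np.log(p_teacher / q_student))
print("soft teacher targets :", p_teacher)
print("KL(teacher || student):", kl)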
What are Soft targets and Hard targets?
Hard targets arise when the softmax probabilities themselves are used as training targets. A well-trained cumbersome model almost always produces the correct answer with very high confidence, so the probabilities of the other classes are very close to zero and have very little influence on the cross-entropy cost function during the transfer of knowledge from Teacher to Student.
Soft targets instead use the logits, the inputs to the final softmax, rather than the softmax's probabilities as the targets for learning the small model.
When the soft targets have high entropy, they provide much more information per training case than hard targets, and they produce less variance in the gradient between training cases.
Hard targets are generated using the softmax activation function, which converts the logit z_i computed for each class into a probability:

q_i = exp(z_i) / Σ_j exp(z_j)

Soft targets q_i are computed by comparing z_i with the other logits, using a temperature T:

q_i = exp(z_i / T) / Σ_j exp(z_j / T)

When T = 1, this is the same as the standard softmax activation function. A higher value of T produces a softer probability distribution over the classes, as shown below.
import numpy as np

logits = np.array([0, 1, 0, 0, 0])
T = [1, 5, 7, 10]

# Standard softmax (equivalent to T = 1)
logits_exp_norm = np.exp(logits) / sum(np.exp(logits))

for t in T:
    # Temperature-scaled softmax: higher T gives a softer distribution
    logits_exp_norm_with_T = np.exp(logits / t) / sum(np.exp(logits / t))
    print("with T =", t, logits_exp_norm_with_T)

print("softmax :", logits_exp_norm)

Advantages of Soft Targets
- Soft targets contain valuable information about the rich similarity structure of the data, i.e., they say which 2's look like 3's and which look like 7's.
- They provide better generalization and less variance in gradients between training examples.
- They allow the smaller Student model to be trained on much less data than the original cumbersome model, and with a much higher learning rate.
Benefits of Knowledge Distillation
- Can be deployed at the Edge: The distilled Student model inherits quality from the Teacher and is more efficient for inference thanks to its compactness, needing fewer computational resources.
- Improves generalization: The Teacher produces "soft targets" for training the Student model. Soft targets have high entropy, providing more information than one-hot-encoded hard targets, and less variance in the gradient between training cases, allowing the Student network to be trained on much less data than the Teacher and with a much higher learning rate (see the short sketch after this list).
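To see why soft targets carry more information, the short sketch below (with a made-up teacher distribution for an image of a "2" that looks a little like a "3" and a "7") compares the entropy of a one-hot hard target with that of a soft target:

import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy in nats; eps avoids log(0) for one-hot targets.
    return -np.sum(p * np.log(p + eps))

# Hypothetical targets for one training image of the digit "2".
hard_target = np.array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0.])
soft_target = np.array([0.01, 0.01, 0.80, 0.09, 0.01, 0.01, 0.01, 0.04, 0.01, 0.01])

print("entropy of hard target:", entropy(hard_target))   # ~0: only the label
print("entropy of soft target:", entropy(soft_target))   # > 0: also encodes class similarity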
Implementation of Knowledge distillation on CIFAR-10
Import required libraries
import tensorflow as tf
from tensorflow import keras
import numpy as np
Creating the CIFAR-10 train and test dataset
# Prepare the train and test dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize data
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 32, 32, 3))
x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 32, 32, 3))

print("Input Train data ", x_train.shape)
print("Train data Labels ", y_train.shape)
print("Input Test data ", x_test.shape)
print("Test data Labels ", y_test.shape)

Create the Cumbersome Teacher model
teacher = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),
        tf.keras.layers.LeakyReLU(alpha=0.2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        tf.keras.layers.Conv2D(512, (3, 3), strides=(2, 2), padding="same"),
        tf.keras.layers.LeakyReLU(alpha=0.2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ],
    name="teacher",
)
teacher.summary()

Create the simple Student model
# Create the student
student = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(64, (3, 3), strides=(2, 2), padding="same"),
        tf.keras.layers.LeakyReLU(alpha=0.2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        tf.keras.layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),
        tf.keras.layers.LeakyReLU(alpha=0.2),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ],
    name="student",
)
student.summary()

You can see that our Teacher model has approximately 1.51M parameters and will be computationally very expensive when compared to the smaller, simpler Student model with just around 313K parameters.
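If you want to verify those parameter counts programmatically, Keras models expose count_params():

# Compare model sizes directly (roughly 1.51M vs 313K parameters here).
print("Teacher parameters:", teacher.count_params())
print("Student parameters:", student.count_params())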
Train the Teacher Model
teacher.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# Train and evaluate teacher on data.
teacher.fit(x_train, y_train, epochs=5)
teacher.evaluate(x_test, y_test)

Distill the Knowledge from the Teacher Model to the Student Model
Create a Distiller class to distill the knowledge using the student loss and the distillation loss.
- The student loss function is the difference between the student predictions and the ground truth, using the standard softmax where T=1.
- The distillation loss function is the difference between the soft student predictions and the soft teacher labels.
Code adapted and inspired by: https://keras.io/examples/vision/knowledge_distillation/
class Distiller(keras.Model):
    def __init__(self, student, teacher):
        super(Distiller, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.1,
        temperature=3,
    ):
        """Configure the distiller.

        student_loss_fn: Loss function of difference between student
            predictions and ground-truth
        distillation_loss_fn: Loss function of difference between soft
            student predictions and soft teacher predictions
        alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn
        temperature: Temperature for softening probability distributions.
            Larger temperature gives softer distributions.
        """
        super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.temperature = temperature
        self.alpha = alpha

    def train_step(self, data):
        x, y = data

        # Forward pass of teacher (its weights are not updated)
        teacher_prediction = self.teacher(x, training=False)

        with tf.GradientTape() as tape:
            # Forward pass of student
            student_prediction = self.student(x, training=True)

            # Compute losses
            student_loss = self.student_loss_fn(y, student_prediction)
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_prediction / self.temperature, axis=1),
                tf.nn.softmax(student_prediction / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss

        # Compute gradients for the student weights only
        trainable_vars = self.student.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)

        # Scale the gradients by temperature**2 so the soft-target gradients
        # keep a comparable magnitude, as recommended by Hinton et al.
        gradients = [gradient * (self.temperature ** 2) for gradient in gradients]

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the metrics configured in `compile()`
        self.compiled_metrics.update_state(y, student_prediction)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss, "distillation_loss": distillation_loss})
        return results

    def test_step(self, data):
        # Unpack the data
        x, y = data

        # Compute predictions
        y_prediction = self.student(x, training=False)

        # Calculate the loss
        student_loss = self.student_loss_fn(y, y_prediction)

        # Update the metrics
        self.compiled_metrics.update_state(y, y_prediction)

        # Return a dict of performance
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results


# Initialize distiller
distiller = Distiller(student=student, teacher=teacher)
In the above code, we override the compile, train_step, and test_step methods of the Model class.
During training, we perform a forward pass on both the Teacher and the Student model, calculate the loss as shown below, and then perform a backward pass.
loss = alpha * student_loss + (1 - alpha) * distillation_loss
where alpha is a factor that weighs the student loss against the distillation loss.
We calculate gradients only for the student weights, since only the student weights are updated during distillation.
During the test step, we evaluate the Student model using the student predictions and the ground truth.
Compiling the distiller
# Compile distiller
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.3,
    temperature=7,
)
The distiller minimizes the distillation loss using the KL divergence between the temperature-softened probabilistic outputs of the Teacher and Student networks.
Distill the knowledge from the Teacher model to Student and Evaluate the distiller
# Distill teacher to student
distiller.fit(x_train, y_train, epochs=5)

# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)
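To quantify what distillation adds, it is also worth training an identical student from scratch, without a teacher, and comparing its test accuracy against the distilled student. A minimal sketch, assuming you simply clone the student architecture defined earlier:

# Train an identical student from scratch (no teacher) for comparison.
student_scratch = tf.keras.models.clone_model(student)
student_scratch.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
student_scratch.fit(x_train, y_train, epochs=5)
student_scratch.evaluate(x_test, y_test)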

We can see that the Teacher model takes about 12 seconds per epoch to train on average, whereas the distiller takes about 9 seconds per epoch.
The accuracy of the Teacher model after 5 epochs is 68.8%, whereas the accuracy of the distilled Student model is 71.1%.
Conclusion:
Knowledge distillation is a model-agnostic compression technique that extracts the knowledge from a large, cumbersome Teacher model and passes it on to a smaller Student model. The distilled Student model uses soft targets and, in this example, needs less training and inference time than the large, cumbersome Teacher model while achieving higher accuracy.
References:
Distilling the Knowledge in a Neural Network: Geoffrey Hinton, Oriol Vinyals, and Jeff Dean
https://keras.io/examples/vision/knowledge_distillation/