Review — Model Distillation: Distilling the Knowledge in a Neural Network (Image Classification)

Smaller Models Are Obtained Using Distillation. Faster Training of Specialist Models on the JFT Dataset.

Sik-Ho Tsang

In this story, Distilling the Knowledge in a Neural Network, by Google Inc., is briefly reviewed. This is a paper by Prof. Geoffrey Hinton, together with Oriol Vinyals and Jeff Dean.

Model ensembling is a simple way to improve model performance. Yet, it can be computationally expensive, especially if the individual models are large neural nets. Distillation transfers the knowledge from a cumbersome ensemble (or a single large model) into a single smaller model that is much cheaper to deploy.

This is a paper presented at the NIPS 2014 Deep Learning Workshop with over 5000 citations. (Sik-Ho Tsang @ Medium)

Outline

  1. Higher Temperature for Model Distillation
  2. Experimental Results

1. Higher Temperature for Model Distillation

1.1. Higher Temperature for Soft Targets

Using a higher value for the temperature T produces a softer probability distribution over classes. This is useful since much of the information about the learned function resides in the ratios of very small probabilities in the soft targets.
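
Concretely, the soft targets are produced by raising the temperature of the final softmax. With logits z_i and temperature T, the class probabilities are (Eq. 1 in the paper):

```latex
q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```

T = 1 gives the standard softmax; higher T softens the distribution. The same high temperature is used when training the distilled model on the transfer set, and T is set back to 1 at test time.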

1.2. The Calculation of Gradients
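
From the paper: each case in the transfer set contributes a cross-entropy gradient with respect to each logit z_i of the distilled model. If the cumbersome model has logits v_i that produce soft targets p_i, and transfer training is done at temperature T, this gradient is:

```latex
\frac{\partial C}{\partial z_i}
  = \frac{1}{T}\left(q_i - p_i\right)
  = \frac{1}{T}\left(\frac{e^{z_i/T}}{\sum_j e^{z_j/T}} - \frac{e^{v_i/T}}{\sum_j e^{v_j/T}}\right)
```

If T is high compared with the magnitude of the logits, and the logits are zero-meaned separately for each transfer case, this simplifies to:

```latex
\frac{\partial C}{\partial z_i} \approx \frac{1}{NT^2}\left(z_i - v_i\right)
```

So, in the high-temperature limit, distillation is equivalent to minimizing the squared difference between the distilled model's logits and the cumbersome model's logits.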

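As a concrete illustration, here is a minimal PyTorch sketch of the distillation objective. This is not code from the paper; the function name and the weighting alpha between the soft and hard losses are illustrative assumptions, though the T² scaling of the soft term follows the paper's recommendation:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of the soft-target loss (at temperature T) and the
    usual cross-entropy with the true labels."""
    # Teacher's soft targets and student's soft predictions at temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_soft_preds = F.log_softmax(student_logits / T, dim=1)
    # KL divergence to the soft targets; multiplied by T^2 so the soft term's
    # gradient magnitude stays comparable to the hard term's as T changes,
    # as noted in the paper.
    soft_loss = F.kl_div(log_soft_preds, soft_targets, reduction="batchmean") * T * T
    # Hard-target loss, computed at T = 1.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```
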
2. Experimental Results

2.1. MNIST

On MNIST, a large net with two hidden layers of 1200 rectified linear units made 67 test errors, while a smaller net with two hidden layers of 800 units made 146. When the smaller net was additionally trained to match the large net's soft targets at T = 20, it made only 74 test errors. This shows that soft targets can transfer a great deal of knowledge to the distilled model.

2.2. Speech Recognition

In terms of frame classification accuracy and Word Error Rate (WER), an ensemble of 10 acoustic models improves frame accuracy from 58.9% (single baseline) to 61.1% and WER from 10.9% to 10.7%. A single model distilled from the ensemble achieves 60.8% frame accuracy and 10.7% WER, capturing most of the ensemble's improvement.

2.3. JFT

On the JFT development set, the baseline model achieves 43.1% top-1 classification accuracy, and adding 61 specialist models raises this to 45.9%.

The idea is that accuracy improves when more specialists cover a particular class. At the same time, training time stays short: whereas the baseline JFT model took months to train, the independent specialist models are very easy to parallelize and can be trained in a few days.
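
How are the specialist classes chosen? The paper clusters the columns of the covariance matrix of the generalist model's predictions, so that classes the generalist often confuses end up in the same specialist's subset. Below is a minimal sketch of that idea; the paper uses an online version of K-means, so the scikit-learn call and the function name here are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def specialist_class_groups(generalist_probs: np.ndarray, n_specialists: int = 61):
    # generalist_probs: (n_examples, n_classes) softmax outputs of the generalist.
    # Covariance of the class predictions: confusable classes, which tend to
    # receive probability together, covary strongly.
    cov = np.cov(generalist_probs, rowvar=False)  # (n_classes, n_classes)
    # Cluster the columns of the covariance matrix; each cluster defines
    # one specialist's set of confusable classes.
    labels = KMeans(n_clusters=n_specialists, n_init=10).fit_predict(cov.T)
    return [np.flatnonzero(labels == k) for k in range(n_specialists)]
```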

2.4. Soft Targets as Regularizers

Measuring frame classification accuracy again: trained on only 3% of the speech data with hard targets, the model overfits severely and reaches only 44.5% test frame accuracy even with early stopping. Trained on the same 3% with soft targets, it reaches 57.0% without early stopping, close to the 58.9% of the baseline trained on the full set.

Soft targets allow a new model to generalize well from only 3% of the training set.

Nerd For Tech

From Confusion to Clarification

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Written by Sik-Ho Tsang

PhD, Researcher. I share what I've learnt and done. :) My LinkedIn: https://www.linkedin.com/in/sh-tsang/, My Paper Reading List: https://bit.ly/33TDhxG
