Published in DAIR.AI

DistilBERT: Half the price, same performance

Introduction

Knowledge distillation

Distillation loss
Temperature softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). Here z_i is the teacher model's output logit for class i and T is the temperature. A larger T produces a smoother output distribution; T = 1 recovers the standard softmax.
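To make the effect of the temperature concrete, here is a minimal sketch of a temperature-scaled softmax in plain Python. The function name `temperature_softmax` and the example logits are illustrative assumptions, not taken from the DistilBERT code base; the only thing it is meant to show is that dividing the logits by a larger T flattens the resulting distribution.

```python
import math


def temperature_softmax(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities.

    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the resulting probabilities.
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up teacher logits for a three-class problem (assumption).
teacher_logits = [4.0, 2.0, 0.5]

sharp = temperature_softmax(teacher_logits, temperature=1.0)
smooth = temperature_softmax(teacher_logits, temperature=4.0)

print("T=1:", [round(p, 3) for p in sharp])
print("T=4:", [round(p, 3) for p in smooth])
```

With the higher temperature, the probability of the top class drops and the remaining classes receive more mass, which is the smoother distribution described above. The ranking of the classes stays the same.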

Architecture

Training

Results

Conclusion

Appendix

Viktor Karlsson

Learning to write and writing to learn. Staying on top of current NLP research through sharing what I find interesting 🤖 www.linkedin.com/in/viktor2k/