The Lottery Ticket Hypothesis
When randomness works in our favour
Viktor in DAIR.AI · Aug 7, 2020

When are contextual embeddings worth using?
Contextual embeddings from BERT are expensive, and might not bring value in all situations.
Viktor in DAIR.AI · Aug 1, 2020

Poor Man’s BERT — Why Pruning is Better than Knowledge Distillation ✂️
Exploring the simple approach to model compression
Viktor in DAIR.AI · Jul 26, 2020

MobileBERT: BERT for Resource-Limited Devices
A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE!
Viktor in Analytics Vidhya · Jul 23, 2020

MobileBERT — A task agnostic BERT for resource-limited devices ☎️
A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE! 🚀
Viktor in DAIR.AI · Jul 19, 2020

The Million-Dollar Matrices 💸
When throwing more money at the problem has become the new norm, how do we progress?
Viktor in Level Up Coding · Jul 12, 2020

Making monolingual sentence embeddings multilingual using knowledge distillation
Aligning token representations across languages for a multilingual transformer model using knowledge distillation of SentenceBERT
Viktor in DAIR.AI · Jul 12, 2020

What does BERT look at?
An in-depth study of the language aspects captured by the attention heads of BERT.
Viktor in DAIR.AI · May 6, 2020

Longformer — The Long-Document Transformer 📝
Processing longer forms of text with BERT-like models requires us to rethink the attention mechanism in more than one way.
Viktor in DAIR.AI · Apr 30, 2020

ELECTRA — Addressing the flaws of BERT’s pre-training process
Achieving higher performance with smaller models and less training. Sounds too good to be true. But is it?
Viktor in DAIR.AI · Apr 17, 2020