Viktorin in DAIR.AI · The Lottery Ticket Hypothesis · When randomness works in our favour · 6 min read · Aug 7, 2020
Viktorin in DAIR.AI · When are contextual embeddings worth using? · Contextual embeddings from BERT are expensive and might not bring value in all situations. · 4 min read · Aug 1, 2020
Viktorin in DAIR.AI · Poor Man’s BERT — Why Pruning is Better than Knowledge Distillation ✂️ · Exploring a simple approach to model compression · 6 min read · Jul 26, 2020
Viktorin in Analytics Vidhya · MobileBERT: BERT for Resource-Limited Devices · A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE! · 12 min read · Jul 23, 2020
Viktorin in DAIR.AI · MobileBERT — A task-agnostic BERT for resource-limited devices ☎️ · A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE! 🚀 · 6 min read · Jul 19, 2020
Viktorin in Level Up Coding · The Million-Dollar Matrices 💸 · When throwing more money at the problem has become the new norm, how do we progress? · 4 min read · Jul 12, 2020
Viktorin in DAIR.AI · Making monolingual sentence embeddings multilingual using knowledge distillation · Aligning token representations across languages for a multilingual transformer model using knowledge distillation of SentenceBERT · 8 min read · Jul 12, 2020
Viktorin in DAIR.AI · What does BERT look at? · An in-depth study of the language aspects captured by the attention heads of BERT. · 8 min read · May 6, 2020
Viktorin in DAIR.AI · Longformer — The Long-Document Transformer 📝 · Processing longer forms of text with BERT-like models requires us to rethink the attention mechanism in more than one way. · 7 min read · Apr 30, 2020
Viktorin in DAIR.AI · ELECTRA — Addressing the flaws of BERT’s pre-training process · Achieving higher performance with smaller models trained less. Sounds too good to be true. But is it? · 5 min read · Apr 17, 2020