Viktorin in DAIR.AI · The Lottery Ticket Hypothesis · When randomness works in our favour · 6 min read · Aug 7, 2020
Viktorin in DAIR.AI · When are contextual embeddings worth using? · Contextual embeddings from BERT are expensive and might not bring value in all situations. · 4 min read · Aug 1, 2020
Viktorin in DAIR.AI · Poor Man’s BERT — Why Pruning is Better than Knowledge Distillation ✂️ · Exploring a simple approach to model compression · 6 min read · Jul 26, 2020
Viktorin in Analytics Vidhya · MobileBERT: BERT for Resource-Limited Devices · A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE! · 12 min read · Jul 23, 2020
Viktorin in DAIR.AI · MobileBERT — A task-agnostic BERT for resource-limited devices ☎️ · A BERT model small enough to run efficiently on a phone while matching BERT-base performance on GLUE! 🚀 · 6 min read · Jul 19, 2020
Viktorin in Level Up Coding · The Million-Dollar Matrices 💸 · When throwing more money at the problem has become the new norm, how do we progress? · 4 min read · Jul 12, 2020
Viktorin in DAIR.AI · Making monolingual sentence embeddings multilingual using knowledge distillation · Aligning token representations across languages for a multilingual transformer model using knowledge distillation of SentenceBERT · 8 min read · Jul 12, 2020
Viktorin in DAIR.AI · What does BERT look at? · An in-depth study of the language aspects captured by the attention heads of BERT. · 8 min read · May 6, 2020
Viktorin in DAIR.AI · Longformer — The Long-Document Transformer 📝 · Processing longer forms of text with BERT-like models requires us to rethink the attention mechanism in more than one way. · 7 min read · Apr 30, 2020
Viktorin in DAIR.AI · ELECTRA — Addressing the flaws of BERT’s pre-training process · Achieving higher performance with smaller models trained less. Sounds too good to be true. But is it? · 5 min read · Apr 17, 2020