A short analysis of the interpretability of BERT contextual word representations.
How we established state-of-the-art noise robustness using a simple loss function.