CMU & Google XLNet Tops BERT; Achieves SOTA Results on 18 NLP Tasks

Synced
Jun 24, 2019 · 4 min read

In 2018 Google released BERT (Bidirectional Encoder Representations from Transformers), a large-scale natural language pretraining model that achieved state-of-the-art performance on 11 NLP tasks and stimulated NLP research across academia and industry. A team of researchers from Carnegie Mellon University and Google Brain has now proposed XLNet, a new language model that outperforms BERT on 20 language tasks including SQuAD, GLUE, and RACE, achieving SOTA results on 18 of them. XLNet’s training code and pretrained model have been open-sourced on GitHub.


The CMU and Google researchers note that pretraining approaches based on denoising auto-encoding, such as BERT, can model bidirectional context better than pretraining methods based on autoregressive (AR) language modeling. BERT, however, corrupts its input by replacing some tokens with artificial [MASK] symbols that never appear in downstream data, which creates a discrepancy between the generic pretrained model and the model fine-tuned on task-specific data.
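
As a rough sketch in the notation of the XLNet paper, BERT’s denoising objective reconstructs only the masked tokens from a corrupted input and, as an approximation, treats those tokens as independent of one another given the unmasked context; the [MASK] symbols used for corruption never appear in fine-tuning data:

```latex
% Sketch of the denoising auto-encoding (BERT-style) objective.
% \hat{x} is the corrupted input; m_t = 1 iff token t was masked.
\max_{\theta} \; \log p_{\theta}(\bar{x} \mid \hat{x})
    \;\approx\; \sum_{t=1}^{T} m_t \, \log p_{\theta}(x_t \mid \hat{x})
```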

XLNet is a generalized autoregressive pretraining method that combines the advantages of autoregressive (AR) language modeling and auto-encoding (AE), the two most successful classes of unsupervised pretraining objectives, while avoiding the shortcomings of both. Instead of the fixed forward or backward factorization order used in conventional AR models, XLNet maximizes the expected log-likelihood of a sequence over all possible permutations of the factorization order. Because the tokens that precede a given position can, under some permutation, come from anywhere in the sequence, every position learns to use contextual information from all other positions, i.e. it captures bidirectional context.
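
The following toy Python sketch (not the authors’ implementation; log_prob is a hypothetical stand-in for the model’s conditional distribution) illustrates the idea: for each sampled factorization order, every token is predicted from the tokens that precede it in that order, so across different orders each position ends up conditioning on words from both its left and its right.

```python
import math
import random

def permutation_lm_log_likelihood(tokens, log_prob, num_orders=4):
    """Monte-Carlo estimate of the permutation-LM objective for one sequence.

    tokens:   list of token ids
    log_prob: callable(target_token, context_tokens) -> log p(target | context);
              a hypothetical stand-in for the model's conditional distribution.
    """
    total = 0.0
    for _ in range(num_orders):
        order = list(range(len(tokens)))
        random.shuffle(order)                 # sample a factorization order z
        ll = 0.0
        for i, pos in enumerate(order):
            context = [tokens[p] for p in order[:i]]   # tokens earlier in z
            ll += log_prob(tokens[pos], context)
        total += ll
    return total / num_orders

# Toy usage: a "model" that assigns uniform probability over a 100-token vocabulary.
if __name__ == "__main__":
    uniform = lambda token, context: math.log(1.0 / 100)
    print(permutation_lm_log_likelihood([5, 17, 42, 8], uniform))
```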


As a generalized AR language model, XLNet does not rely on corrupted input, so it avoids the pretrain-finetune discrepancy that affects BERT. At the same time, the AR objective uses the product rule to factorize the joint probability of the predicted tokens exactly, which removes the independence assumption BERT makes over masked tokens and lets the model capture dependencies among them.
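
Schematically, the AR objective factorizes the joint probability exactly via the chain rule, while BERT’s masked-token reconstruction drops dependencies among the masked tokens; the "New York is a city" illustration below follows the spirit of the example discussed in the paper:

```latex
% Exact chain-rule factorization used by the AR objective (no independence assumption):
\log p_{\theta}(x) \;=\; \sum_{t=1}^{T} \log p_{\theta}(x_t \mid x_{<t})

% BERT, with "New" and "York" both masked in "New York is a city", instead approximates
p(\text{New}, \text{York} \mid \text{is a city})
    \;\approx\; p(\text{New} \mid \text{is a city}) \cdot p(\text{York} \mid \text{is a city})
```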

Furthermore, XLNet improves on existing pretraining architecture design by integrating the relative positional encoding scheme and the segment recurrence mechanism of the SOTA autoregressive model Transformer-XL into pretraining. Experiments show that this substantially improves XLNet’s performance on language tasks involving long text sequences.
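
A minimal numpy sketch of the segment recurrence idea borrowed from Transformer-XL (illustrative only, with toy random weights and the relative-position terms omitted, so it is not XLNet’s actual architecture): hidden states from the previous segment are cached and reused as extra "memory" keys and values when the next segment is processed, which is what lets the model reach across segment boundaries in long documents.

```python
import numpy as np

def attention_with_memory(h, mem, Wq, Wk, Wv):
    """One self-attention pass in which the current segment h attends over
    [cached memory; current segment]. Relative positional terms are omitted;
    this only illustrates the recurrence mechanism."""
    context = h if mem is None else np.concatenate([mem, h], axis=0)
    q, k, v = h @ Wq, context @ Wk, context @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def process_long_text(segments, d=16, seed=0):
    """Process a long sequence segment by segment, carrying a memory cache."""
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
    mem, outputs = None, []
    for seg in segments:          # seg: (segment_length, d) embedding matrix
        outputs.append(attention_with_memory(seg, mem, Wq, Wk, Wv))
        mem = seg                 # cache this segment as memory for the next one
        # (a full model caches every layer's hidden states, not the raw inputs)
    return np.concatenate(outputs, axis=0)

if __name__ == "__main__":
    segments = [np.random.default_rng(i).normal(size=(8, 16)) for i in range(3)]
    print(process_long_text(segments).shape)   # -> (24, 16)
```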

Together, these features allow XLNet to surpass BERT on 20 tasks and achieve SOTA results on 18 of them, including question answering, natural language inference, sentiment analysis, and document ranking.


The researchers say they plan to apply XLNet to a broader range of tasks in the future, such as computer vision and reinforcement learning.

The paper XLNet: Generalized Autoregressive Pretraining for Language Understanding is on arXiv.


Author: Herin Zhao | Editor: Michael Sarazen

