Baidu’s Enhanced Representation through kNowledge IntEgration: Explained

Arjun S
Published in Neuro-labs
Aug 21, 2019

“I know all those words, but that sentence makes no sense to me.”
Matt Groening

Preface

The purpose of writing this article is to usher in a culture of research in yourself, your organization, and your community. It is highly recommended that you still read the actual paper after reading this review.

Why Natural Language Understanding?

We often take for granted all the amazing research and technological advancements behind the products we use every day. Every complex search query, every seemingly natural conversation with a chatbot, and every cluster of relevant Quora answers that makes your life easier is driven by years of cutting-edge research in Natural Language Understanding. NLU is what allows machines to understand the contextual meaning of natural language sentences.

Introduction

Baidu Research recently announced ERNIE 2.0 (Enhanced Representation through kNowledge IntEgration), their brand-new natural language understanding model that outperformed Google's state-of-the-art BERT and the newer XLNet on 16 NLP tasks. Unsupervised NLU models like Google's BERT, XLNet, and ERNIE 1.0 are pre-trained on large text corpora and then fine-tuned for tasks like question answering, sentiment analysis, named entity recognition, semantic similarity, and natural language inference. Pre-training on such simple objectives lets the model learn co-occurrences of words or sentences. Baidu Research's ERNIE 2.0 is an improved approach to NLU that taps into the valuable lexical, syntactic, and semantic information in training corpora beyond just co-occurrences. ERNIE 2.0 incrementally introduces customized tasks into the training process, allowing the model to learn from a diverse set of tasks. Instead of being laser-focused on optimizing the model for a particular metric, ERNIE 2.0 follows a continual pre-training pipeline over multiple tasks (multi-task learning).

Inspired by humans

Imagine a kid learning to read for the first time. She first learns the alphabet: how to spell the letters, how to visually isolate and identify them. She then learns to identify words as a whole, understand relations between words, and comprehend sentences. Every stage of this learning process is facilitated by what she learned previously. Summarizing a process as complex as language comprehension in a few steps does no justice to the many years of evolution that got us here, but my point is that this pattern of continual learning is very common in us humans.

ERNIE 2.0 follows a similar continual learning process wherein the model is sequentially trained on many tasks such that the learnings from one task are remembered while learning from the next task. This helps the same model perform well on new tasks using its “acquired knowledge” from its previous training.

This continual learning is done in two stages: first, constructing a new unsupervised pre-training task from big data and prior knowledge, and second, training the model on the newly constructed task alongside the earlier ones via multi-task learning.
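To make that loop concrete, here is a minimal, hypothetical sketch of sequential multi-task pre-training in Python. The task names, loss functions, and training loop are illustrative placeholders, not the actual PaddlePaddle implementation; the only point is that each newly introduced task is trained together with every task seen so far.

```python
# A minimal sketch of ERNIE 2.0-style continual multi-task pre-training.
# Tasks, losses, and step counts below are made-up placeholders.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PretrainTask:
    name: str
    loss_fn: Callable[[str], float]  # hypothetical per-task loss


def continual_pretrain(tasks: List[PretrainTask], steps_per_stage: int = 3) -> None:
    """Introduce tasks one at a time; at each stage, keep training on
    every task seen so far so earlier "knowledge" is not forgotten."""
    seen: List[PretrainTask] = []
    for new_task in tasks:
        seen.append(new_task)
        for step in range(steps_per_stage):
            for task in seen:  # multi-task learning over all tasks so far
                loss = task.loss_fn(f"step {step}")
                print(f"stage={new_task.name} task={task.name} loss={loss:.3f}")


if __name__ == "__main__":
    tasks = [
        PretrainTask("knowledge_masking", lambda s: 1.0),
        PretrainTask("sentence_reordering", lambda s: 0.8),
        PretrainTask("ir_relevance", lambda s: 0.6),
    ]
    continual_pretrain(tasks)
```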

Pre-training Tasks

This approach constructs 3 types of tasks for capturing different aspects of information:

  1. Word-aware Tasks: to capture the lexical information
  2. Structure-aware Tasks: to capture the syntactic information
  3. Semantic-aware Tasks: to capture the semantic information
(Image source: https://github.com/PaddlePaddle/ERNIE)

While the latter two types of tasks share similar structures, and hence similar results, with fellow models like BERT and XLNet, ERNIE's word and phrase masking yields noticeably better results. ERNIE 1.0 introduced a new and improved strategy for Knowledge Masking tasks. These tasks mask named entities and phrases of contextual importance, and the model tries to predict the entire masked phrase. This helps the model learn dependency information in both local and global contexts.
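As a toy illustration of the idea: instead of masking individual tokens at random, whole entity or phrase spans are masked and the model must recover the full span. The sentence, spans, and helper function below are invented for the example; in ERNIE the spans come from the framework's own chunking and entity recognition, not a hand-written list.

```python
# Toy phrase-level "knowledge masking": mask whole entity/phrase spans
# rather than single tokens. Spans here are hand-supplied for illustration.

import random


def knowledge_mask(tokens, phrase_spans, mask_token="[MASK]", prob=0.5):
    """Mask entire (start, end) spans with some probability and return
    the masked sequence plus the spans the model must predict."""
    masked = list(tokens)
    targets = {}
    for start, end in phrase_spans:
        if random.random() < prob:
            targets[(start, end)] = tokens[start:end]
            for i in range(start, end):
                masked[i] = mask_token
    return masked, targets


sentence = "J. K. Rowling wrote the Harry Potter series".split()
# Spans covering the entities "J. K. Rowling" and "Harry Potter".
spans = [(0, 3), (5, 7)]
print(knowledge_mask(sentence, spans, prob=1.0))
```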

There are many more interesting pre-training tasks that ERNIE 2.0 uses, especially the IR Relevance task, which helps the model learn the relevance of a document to a given short text query (and can be used to build powerful semantic search engines). I will soon be writing about a few of these pre-training tasks in detail. Stay tuned!
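As a rough, hypothetical sketch of what that task's data looks like: query-title pairs drawn from search logs, each labeled with one of three relevance levels. The example pairs and label names below are invented for illustration only.

```python
# Illustrative data shape for an IR relevance objective: (query, title)
# pairs with a 3-way relevance label. Examples and labels are made up.

from typing import List, Tuple

LABELS = {0: "strong relevance", 1: "weak relevance", 2: "irrelevant"}


def build_examples() -> List[Tuple[str, str, int]]:
    return [
        ("ernie 2.0 paper", "ERNIE 2.0: A Continual Pre-training Framework", 0),
        ("ernie 2.0 paper", "A survey of convolutional neural networks", 2),
    ]


for query, title, label in build_examples():
    # In pre-training, each pair would be fed to the encoder and the model
    # trained to predict the relevance class.
    print(f"{query!r} / {title!r} -> {LABELS[label]}")
```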

Conclusion

The Internet holds thousands of petabytes of unstructured data, most of which is text. Natural Language Understanding has already made a significant impact and still holds unprecedented potential in areas like capturing meaningful insights, offering natural experiences through the next generation of chatbots, and answering contextual queries over huge volumes of data. Today's NLU systems are not nearly as accurate as humans and fail in scenarios that call for common sense. These are exciting times to be alive and to be a part of all the amazing strides in research being made to turn this world (and beyond) into a better place for everyone.
