Summary: “Retrospective Reader for Machine Reading Comprehension”

Mintim · Jun 26, 2020

Here is a short summary of “Retrospective Reader for Machine Reading Comprehension”, a recent paper on Machine Reading Comprehension (MRC) by Zhuosheng Zhang, Junjie Yang and Hai Zhao. The authors evaluate their model on two benchmark datasets, SQuAD 2.0 and NewsQA, and achieve SOTA results over strong ALBERT and ELECTRA baselines.

Machine Reading Comprehension (MRC) is a subfield of Natural Language Understanding (NLU) that aims at understanding a passage and then finding a correct answer to a question about it. Deciding whether a question is answerable at all is one of the biggest challenges in this field, and a verification module, called a verifier, can help solve it.

Zhang et al. (2020) present the Retro-Reader (ensemble) model, which focuses on this verifier. The model’s goal is to extract an answer to a question when one is available and to return a null string when the question is unanswerable. It is built from a Pre-trained Language Model (PLM) used as the encoder plus a retrospective reader.

Figure: Reader overview

The figure above presents the model architecture in detail. The retrospective reader consists of two parallel modules, the “Sketchy Reading Module” and the “Intensive Reading Module”. The outputs of these two modules are combined to compute the final result.

First, the input sequence is tokenized, and the subword tokens, denoted T = {t1, . . . , tn}, are fed into the encoder. The encoder then sums the token embeddings, the positional embeddings and the segment embeddings to produce the input embedding, denoted X = {x1, . . . , xn}.
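A minimal PyTorch sketch of this embedding step (the class name, vocabulary size and dimensions below are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Sum of token, positional and segment embeddings -> input embedding X."""
    def __init__(self, vocab_size=30000, max_len=512, num_segments=2, hidden=768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)
        self.position = nn.Embedding(max_len, hidden)
        self.segment = nn.Embedding(num_segments, hidden)

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token(token_ids) + self.position(positions) + self.segment(segment_ids)
        return x  # X = {x1, ..., xn}, shape (batch, seq_len, hidden)
```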

The input embeddings are then fed into the interaction layer, a multi-layer bidirectional Transformer, to extract the contextual representation of the sequence. The model uses the Gaussian Error Linear Unit (GELU) activation function, and the last-layer hidden states are denoted H = {h1, . . . , hn}.
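In the paper this encoder is a pre-trained LM such as ALBERT or ELECTRA; the snippet below is only a rough sketch of the shape of this computation using PyTorch’s built-in Transformer encoder:

```python
import torch.nn as nn

# Multi-layer bidirectional Transformer encoder with GELU activation.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=768, nhead=12, dim_feedforward=3072,
    activation="gelu", batch_first=True)
interaction = nn.TransformerEncoder(encoder_layer, num_layers=12)

# H = interaction(X)  # H = {h1, ..., hn}: last-layer hidden states
```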

The embedding and interaction layers are shared between the Sketchy and the Intensive modules. The Sketchy Reading Module makes a preliminary decision about whether a question is unanswerable. Its reader is an External Front Verifier (E-FV), which passes [CLS] (the first hidden state) through a fully connected layer to obtain classification logits or a regression score, using cross entropy as the training objective.
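A rough sketch of the E-FV head (the module name and label layout are my assumptions):

```python
import torch.nn as nn

class ExternalFrontVerifier(nn.Module):
    """Sketch of the E-FV: the [CLS] state goes through a fully connected
    layer to produce answerable/unanswerable logits."""
    def __init__(self, hidden=768, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden, num_labels)
        self.loss_fn = nn.CrossEntropyLoss()  # cross entropy training objective

    def forward(self, H, labels=None):
        cls_state = H[:, 0]                 # first hidden state ([CLS])
        logits = self.classifier(cls_state)
        loss = self.loss_fn(logits, labels) if labels is not None else None
        return logits, loss
```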

On the other hand, the Intensive Reading Module performs the following:

  • Verifies answerability
  • Produces candidate answer spans
  • Gives the final answer prediction

To build question-aware representations, the paper investigates two matching mechanisms after splitting H into HQ (question) and HP (passage):

  • Transformer-style multi-head cross attention (CA): H′ is obtained by feeding HQ and H into a revised one-layer multi-head attention layer
  • Traditional matching attention (MA):

M = SoftMax(H (Wp · HQ + bp ⊗ eQ)^T)

H′ = M · HQ

All the weights and biases are learnable parameters. H′ is the weighted sum of all the hidden states and is used for the final prediction.
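A hedged PyTorch reading of the MA formula (the class name and the use of a single linear layer for Wp and bp are my assumptions):

```python
import torch
import torch.nn as nn

class MatchingAttention(nn.Module):
    """Sketch of traditional matching attention (MA):
    M = SoftMax(H (Wp·HQ + bp)^T),  H' = M·HQ."""
    def __init__(self, hidden=768):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)  # Wp and bp (bias broadcast over eQ)

    def forward(self, H, H_Q):
        # H: (batch, n, d) whole-sequence states; H_Q: (batch, m, d) question states
        scores = torch.matmul(H, self.proj(H_Q).transpose(1, 2))  # (batch, n, m)
        M = torch.softmax(scores, dim=-1)
        H_prime = torch.matmul(M, H_Q)  # weighted sum over question states
        return H_prime
```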

As in the external verifier, the output of the previous layer is passed to a fully connected layer to obtain classification logits or a regression score, but this time h′1, the first hidden state of H′, is used; this is the Internal Front Verifier (I-FV).

The outputs of both modules then go to the decoder, where rear verification (RV) sums the predicted probabilities of the E-FV and I-FV to make the final answerability decision.
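As a simple sketch of this combination step (the weights and threshold below are illustrative, not the paper’s tuned values):

```python
def rear_verification(score_efv, score_ifv, beta1=1.0, beta2=1.0, threshold=0.0):
    """Sketch of rear verification (RV): combine the two verifier scores
    with a weighted sum and predict 'no answer' above a threshold."""
    v = beta1 * score_efv + beta2 * score_ifv
    return "no answer" if v > threshold else "answer span"
```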

The results show that a well-designed verification mechanism can considerably improve performance on MRC tasks.

I hope you find this post helpful, and as I am new to this field, I appreciate all your feedback.
