Unsupervised Question Answering
How to train a model to answer questions when you have no annotated data
Table of Contents
Introduction
Generating the questions
1. Cloze Generation
- Obtaining the context
- Defining the answers
- Obtaining cloze statements
2. Translating into natural questions
- Identity mapping
- Noisy clozes
- Unsupervised Neural Machine Translation (UNMT)
Training the QA model
1. The XLNet model
2. Results
Introduction
Question Answering
Question Answering models do exactly what the name suggests: given a paragraph of text and a question, the model looks for the answer in the paragraph. A subfield of Question Answering called Reading Comprehension is a rapidly progressing domain of Natural Language Processing. Indeed, several models have already surpassed human performance on the Stanford Question Answering Dataset (SQuAD).
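To make the task concrete, here is a toy sketch of extractive question answering. Real SQuAD models predict start and end token positions with a neural network; the naive word-overlap heuristic below is only an illustration of the task's input/output shape (context in, answer span out), not the method discussed in this article.

```python
import re

def answer(context: str, question: str) -> str:
    """Toy extractive QA: return the context sentence that shares the most
    content words with the question. Real models predict an answer span
    inside the paragraph instead of a whole sentence."""
    stop = {"what", "is", "the", "a", "an", "of", "in", "was", "who", "where"}
    q_words = set(re.findall(r"\w+", question.lower())) - stop
    sentences = re.split(r"(?<=[.!?])\s+", context)
    # Pick the sentence with the largest overlap with the question's words.
    return max(sentences,
               key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))

context = ("SQuAD was released by Stanford in 2016. "
           "It contains over 100,000 question-answer pairs.")
print(answer(context, "Who released SQuAD?"))
# → SQuAD was released by Stanford in 2016.
```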
Challenge of obtaining annotated data
These impressive results are made possible by the large amount of annotated data available in English. SQuAD, for instance, contains over 100,000 context-question-answer triplets. However, assembling such datasets requires significant human effort to determine the correct answers, so organizations often struggle to gather the relevant annotated data they need. What if we want a model to answer questions in another language? Or on a specific domain for which no annotated data exists?
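To show what one such context-question-answer triplet looks like, here is a sketch mirroring SQuAD's JSON layout, where an answer is given as text plus its character offset in the context. The sentence itself is an invented illustration, not an actual SQuAD entry.

```python
# One SQuAD-style training example: the answer must be recoverable as a
# literal character span of the context paragraph.
example = {
    "context": "The Eiffel Tower was completed in 1889 and is located in Paris.",
    "question": "When was the Eiffel Tower completed?",
    "answers": [{"text": "1889", "answer_start": 34}],
}

start = example["answers"][0]["answer_start"]
text = example["answers"][0]["text"]
# Sanity check: slicing the context at the offset yields the answer text.
assert example["context"][start:start + len(text)] == text
```

It is exactly this span annotation, done by hand over 100,000 times, that makes supervised datasets like SQuAD so costly to build.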
Towards an unsupervised approach
Unsupervised and semi-supervised learning methods have led to drastic improvements in many NLP tasks. Language modelling, for instance, contributed to the…