Two minutes NLP — Quick intro to Question Answering

Extractive and Generative QA, Open and Closed QA, SQuAD and SQuAD v2

Fabio Chiusano
NLPlanet
4 min read · Mar 4, 2022

Hello fellow NLP enthusiasts! Today we look at a branch of NLP dedicated to answering questions using knowledge bases, which are often collections of documents. This article is introductory and meant to be a starting point for the field. Enjoy! 😄

What is Question Answering

Question Answering models retrieve the answer to a question from a given text, which is useful for searching for information within a document. Depending on the model used, the answer is either extracted directly from the text or generated from scratch.

Use cases

Question Answering (QA) models are often used to automate responses to frequently asked questions by using a knowledge base (e.g. a set of documents) as context. As such, they are useful for smart virtual assistants employed in customer support, or for enterprise FAQ bots directed at a company's own employees.

Moreover, many search systems augment their search results with instant answers, which provide the user with immediate access to information relevant to their query.

Question Answering variants

QA systems differ in the way answers are created.

  • Extractive QA: The model extracts the answer from a context and provides it directly to the user. It is usually solved with BERT-like models.
  • Generative QA: The model generates free text directly based on the context, leveraging Text Generation models (see the sketch just after this list).
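
As a rough sketch of the generative flavor (the model choice and prompt format below are illustrative assumptions, not a fixed recipe):

```python
from transformers import pipeline

# A text-to-text model writes the answer as free text instead of
# extracting a span from the context.
generator = pipeline("text2text-generation", model="google/flan-t5-base")

context = "The Eiffel Tower was completed in 1889 and is located in Paris."
question = "When was the Eiffel Tower completed?"

output = generator(f"question: {question} context: {context}")
print(output[0]["generated_text"])  # e.g. "1889"
```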

Moreover, QA systems differ in where answers are taken from.

  • Open QA: The answer is taken from a context.
  • Closed QA: No context is provided and the answer is completely generated by a model.

Sample code

You can run inference with QA models through the Hugging Face transformers library using the question-answering pipeline, which by default is initialized with the distilbert-base-cased-distilled-squad model (a model for extractive open QA). The pipeline takes a question and a context from which the answer is extracted and returned.

First, let’s install the transformers library using pip as usual.
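
```bash
pip install transformers
```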

Then, we create a pipeline object with the question-answering task, and use it by providing a question and a context.
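
A minimal sketch, with an illustrative question and context:

```python
from transformers import pipeline

# With no model argument, the pipeline loads the default
# distilbert-base-cased-distilled-squad checkpoint for this task.
qa_model = pipeline("question-answering")

question = "Where is the Eiffel Tower located?"
context = "The Eiffel Tower is located in Paris, France."

result = qa_model(question=question, context=context)
print(result)
```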

The model returns a dictionary containing the keys:

  • answer: The text extracted from the context, which should contain the answer.
  • start: The index of the character in the context that corresponds to the start of the extracted answer.
  • end: The index of the character in the context that corresponds to the end of the extracted answer.
  • score: The confidence of the model in extracting the answer from the context.
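
For the snippet above, the printed dictionary looks roughly like this (the score value is illustrative, not taken from an actual run):

```python
{'score': 0.99, 'start': 31, 'end': 36, 'answer': 'Paris'}
```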

Fast Question Answering over many documents

Running the QA model over many documents can be slow. To speed up the search, you can first use passage ranking models to identify the documents most likely to contain the answer, and then run the QA model only over those top-ranked candidates, as shown in the sketch below.
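
Here is a minimal sketch of this two-stage retriever-reader pattern, assuming the sentence-transformers library as the passage ranker (the model name and the toy documents are illustrative):

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

ranker = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative ranker
qa_model = pipeline("question-answering")

documents = [
    "The Eiffel Tower is located in Paris, France.",
    "The Great Wall of China is more than 13,000 miles long.",
    "Mount Everest is the highest mountain on Earth.",
]
question = "Where is the Eiffel Tower located?"

# Stage 1: score every passage against the question with the fast ranker.
scores = util.cos_sim(ranker.encode(question), ranker.encode(documents))[0]
top_indices = scores.topk(k=2).indices.tolist()

# Stage 2: run the slower QA model only on the top-ranked passages.
for i in top_indices:
    print(qa_model(question=question, context=documents[i]))
```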

Question Answering Datasets

The dataset that is used the most as an academic benchmark for extractive question answering is SQuAD (The Stanford Question Answering Dataset). SQuAD is a reading comprehension dataset, consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding reading passage. It contains 100,000+ question-answer pairs on 500+ articles.
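
To take a look at the data yourself, SQuAD can be loaded with the Hugging Face datasets library:

```python
from datasets import load_dataset

# Each example holds an id, a title, a context passage, a question, and
# the answer text(s) with their character start positions in the context.
squad = load_dataset("squad", split="train")
print(squad[0]["question"])
print(squad[0]["answers"])  # {'text': [...], 'answer_start': [...]}
```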

The current SQuAD leaderboard can be found at https://rajpurkar.github.io/SQuAD-explorer/.

There is also a harder SQuAD v2 benchmark, which includes questions that don’t have an answer. It combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowd-workers to look similar to answerable ones.
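
SQuAD v2 can be loaded the same way; unanswerable questions simply have an empty answers field:

```python
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2", split="validation")

# Unanswerable questions have no gold answer spans.
unanswerable = squad_v2.filter(lambda ex: len(ex["answers"]["text"]) == 0)
print(len(unanswerable), "unanswerable out of", len(squad_v2))
```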

The current SQuAD v2 leaderboard is available on the same page: https://rajpurkar.github.io/SQuAD-explorer/.

Conclusions and next steps

In this article, we learned what the Question Answering branch of NLP is, along with some of its use cases and task variants. We saw a code example where we tested a trained question answering model, and we peeked at the leaderboard of SQuAD, the de-facto benchmark dataset for question answering.

Possible next steps are:
