Question Answering with PyTorch Transformers: Part 1

Introduction

Paton Wongviboonsin
5 min read · Jan 1, 2020

In the first part of this series we’ll look at the problem of question answering and the SQuAD datasets. Then we’ll see how the Transformers pipeline API allows us to easily use pre-trained models to answer questions about specific paragraphs. Later articles in this series will build on that framework to provide a more general and useful system for answering questions from a knowledge base.

Everyone has a different idea of what true artificial intelligence will look like in the future, whether it is ominously menacing, god-like or wryly helpful. Intelligent machines have always been “just around the corner” since we’ve been manufacturing circuits, but we may never see them in our lifetimes. However, technology has been advancing at an ever increasing pace and has made the machines we carry around and access through the Internet vastly more useful than we could have imagined decades ago. Rather than create computers that make decisions for themselves, we use them to make better decisions ourselves.

Since the dawn of computing, one of the great challenges has been how to organize and access the vast amounts of data that can be stored digitally. Over the years many approaches have been tried to structure data in ways that are amenable to simple transformations and reductions: databases, spreadsheets, ontologies, and so on. However, the vast amount of information that exists in the world and is being produced every day is loosely structured or completely unstructured.

In the last decade neural networks have become our most promising tool for working with data that has been difficult for traditional methods. Unfortunately, the many idiosyncrasies and complexities of natural language have limited what we could accomplish. Recurrent networks and their variants handled some tasks well but fell short in others. In the last year Google released the BERT architecture, which has kicked off a flurry of new developments that have helped close the gap.

One such task is reading comprehension. Given a passage of text, we can ask questions about the passage that can be answered by short excerpts from the text. For instance, if we were to ask about this paragraph, “how can a question be answered in a reading comprehension task” a possible answer would be “by short excerpts from the text.”

Stanford has built the SQuAD and SQuAD 2.0 datasets for this task. The latter comprises ~19k paragraphs, each with multiple questions. There are roughly 130k questions in total, some of which, by design, cannot be answered from the given context. Purportedly, this is to allow training of systems that can admit that they don’t know the answer.

For example, here is one passage:

During the period between 1582, when the first countries adopted the Gregorian calendar, and 1923, when the last European country adopted it, it was often necessary to indicate the date of some event in both the Julian calendar and in the Gregorian calendar, for example, "10/21 February 1750/51", where the dual year accounts for some countries…

And a corresponding question:

When did the last country to adopt the Gregorian calendar start using it?

You would expect an algorithm to answer “1923”.
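If you want to poke around the raw data yourself, the structure is easy to navigate: the JSON nests articles, paragraphs, and question/answer pairs, with unanswerable questions flagged by an is_impossible field. A minimal sketch, assuming dev-v2.0.json has been downloaded from the SQuAD explorer site (rajpurkar.github.io/SQuAD-explorer):

import json

# SQuAD 2.0 layout: data -> articles -> paragraphs -> qas
with open('dev-v2.0.json') as f:
    squad = json.load(f)

paragraphs = [p for article in squad['data'] for p in article['paragraphs']]
questions = [qa for p in paragraphs for qa in p['qas']]
unanswerable = [qa for qa in questions if qa['is_impossible']]

print(len(paragraphs), 'paragraphs,', len(questions), 'questions')
print(len(unanswerable), 'questions marked unanswerable')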

A traditional search engine might be able to find this passage based on keyword matching, and some might highlight “Gregorian calendar” and “adopt”. The purpose of question answering algorithms, on the other hand, is to look beyond lexical similarities and ascertain the intent behind the question. Here, we want to know a temporal value associated with the act of adopting the Gregorian calendar. Moreover, the entity doing the adopting should be a country, and more specifically the last country to adopt it.

Previous attempts at question answering have used explicit rules based on linguistic structure, looking at parts of speech, dependencies between nouns and verbs, and so on. While such disciplined approaches take advantage of domain knowledge, dealing with the long tail of special cases in a language limits their usefulness.

The team at Hugging Face has created high quality implementations of BERT-based architectures. BERT itself integrated many of the best ideas to emerge in recent years with respect to network architecture, training objectives and transfer learning. Derivative models offer improved performance on different tasks.

Hugging Face provides pre-trained models that have been fine-tuned on different datasets, like SQuAD. They also provide some high-level wrappers that make integrating these models into your project ridiculously easy. Here’s a little example:

from transformers import pipeline

# Load a default pre-trained model fine-tuned for extractive QA
qapipe = pipeline('question-answering')

# Ask a question against a specific context passage
qapipe({
    'question': """how can question answering service produce answers""",
    'context': """One such task is reading comprehension. Given a passage of text, we can ask questions about the passage that can be answered by referencing short excerpts from the text. For instance, if we were to ask about this paragraph, "how can a question be answered in a reading comprehension task" ..."""
})

output:

{'score': 0.38941961529900837,
'start': 128,
'end': 169,
'answer': 'referencing short excerpts from the text.'}

An import and two lines of code. That’s it.
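One detail worth pointing out: start and end in the result are character offsets into the context string, so slicing the context should give back the same span as the answer field (give or take whitespace handling in some versions of the library). A quick check, assuming the context passed above is stored in a variable named context and the returned dict in result:

# 'start' and 'end' index into the original context string,
# so the answer span can be recovered by slicing
print(context[result['start']:result['end']])
# -> referencing short excerpts from the text.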

While there are a lot of common words between the question and the context, notice how the subject, structure and order of the sentences differ. I’m not saying that shallow learning techniques based on linguistic theory aren’t capable of handling this specific case. However, with all the possible subtleties and variations, previous methods have not been able to perform close to the level of the average human. That has changed in just the last year with advances in BERT-based architectures. Numerous teams around the world have produced models that match or exceed human-level accuracy.

You might be wondering how useful this is for people who aren’t taking a standardized test on reading comprehension.

Let’s build a system that can answer questions without being told which specific passage contains the answer. We will have it treat the 19k paragraphs from SQuAD 2.0 as a knowledge base against which we can pose arbitrary questions like

Plato and Aristotle are known for systematically discussing what?

and receive the answer:

natural philosophy

In the next article we’ll build a simple indexing system to feed contexts into the pipeline, which will then extract answers for us.
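As a preview, the most naive version of that system skips indexing entirely: run the pipeline over every candidate paragraph and keep the highest-scoring answer. It works, but scoring 19k contexts per question is painfully slow, which is exactly why Part 2 introduces an index. A rough sketch, assuming the paragraphs are already loaded into a Python list named contexts and qapipe is the pipeline from above:

# Brute-force baseline: score every context and keep the best answer
def answer_from_kb(question, contexts):
    best = None
    for context in contexts:
        result = qapipe({'question': question, 'context': context})
        if best is None or result['score'] > best['score']:
            best = result
    return best

answer_from_kb('Plato and Aristotle are known for systematically discussing what?', contexts)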

Continue to Part 2

*) Who am I kidding? **These days it’s “tweets by people pretending to be their pets”.

**) I’m going to exercise a little bit of restraint with sardonic humor in the article body, but footnotes are fair game.
