Hot ‘CoQA’ and Marshmallows — An Intro to Stanford’s new QA dataset in 3 min

Khush Jammu
The Artificial Intelligence Journal
3 min read · Aug 26, 2018
Stanford’s new CoQA Dataset c. 2018

What is the CoQA dataset?

CoQA stands for Conversational Question Answering; it was released a couple of days ago by the Stanford NLP Group. I don't usually write bite-sized articles, but when I saw this come up on my Twitter feed I just had to, especially given my past interest in creating a question-generation study app.

Question answering datasets are an essential part of any NLP comprehension task, exposing a model to a wide and diverse range of linguistic and semantic phenomena. The dataset you choose therefore plays a crucial role in how well your model performs.

The CoQA dataset (pronounced "coca") differs from Stanford's other QA dataset, SQuAD, in a couple of ways.

Firstly, the SQuAD dataset is not conversational, and its answers are extractive. That means each answer is taken directly (think copy-paste) from the given passage, rather than an abstract concept being understood and rephrased into an answer. CoQA's answers, on the other hand, are abstractive, with an extractive rationale: the model gives a free-form answer but can also highlight the part of the passage that justifies it, and that rationale can span multiple sentences. Don't worry if this doesn't quite click yet; I'll go through an example later that will help it make sense.
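
If it helps to make that concrete, here is a minimal sketch of how a single conversational turn might be represented. The field names are my own illustrative assumptions, not the dataset's actual JSON schema.

```python
# Minimal sketch of one conversational turn in a CoQA-style dataset.
# The field names are illustrative assumptions, not the official schema.
turn = {
    "question": "Where was the party held?",
    # Abstractive answer: free-form text written by an annotator,
    # not necessarily a verbatim span from the passage.
    "answer": "In the kitchen",
    # Extractive rationale: character offsets into the passage that
    # justify the answer; the span can run across several sentences.
    "rationale_start": 120,
    "rationale_end": 185,
}
```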

Before I go on, here's a quick note on why abstractive datasets are preferable to extractive ones. Extractive summaries (or answers) cannot include words that were not part of the original text, so they are limited both in the complexity they can express and in the ideas they can represent. Abstractive summaries have neither of these drawbacks and allow the model to use a much broader set of vocabulary and concepts. A model capable of abstractive reasoning is therefore considered superior to a purely extractive one. If you're interested, this Quora thread explores the topic in a lot more detail.

The other difference is that CoQA also contains unanswerable questions, which allows a model to learn when there simply isn't enough information to answer. The ability of a model to say "this can't be answered" is a useful one from a UX standpoint. Instead of forcing itself to spew poor-quality garbage, the model can simply recognise that there isn't enough information and tell the user so, ensuring a high-quality experience.
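
As a quick sketch of what this might look like in practice, assuming the illustrative field names from above and assuming unanswerable questions are marked with a literal "unknown" answer (an assumption worth verifying against the released data), you could split the turns like so:

```python
# Sketch: separating answerable from unanswerable turns.
# Assumes unanswerable questions carry the literal answer "unknown"
# and no rationale span (an assumption; check the released data).
turns = [
    {"question": "Where was the party held?", "answer": "In the kitchen",
     "rationale_start": 120, "rationale_end": 185},
    {"question": "What time did it end?", "answer": "unknown",
     "rationale_start": None, "rationale_end": None},
]

answerable = [t for t in turns if t["answer"].lower() != "unknown"]
unanswerable = [t for t in turns if t["answer"].lower() == "unknown"]
print(f"{len(answerable)} answerable, {len(unanswerable)} unanswerable turn(s)")
```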

Before we wrap up, let’s take a quick look at some samples:

Sample entry in CoQA Dataset

As you can see, each answer is not extractive but comes with an extractive rationale. Take the last question in the picture, for example. It asks which of the two candidates is winning, and the answer is "Terry McAuliffe". If you look at the passage itself, it never says which of them is winning in those exact words (the word "winning" never appears), yet the question can be answered by a model that grasps, from the rationale sentence given, the abstract notion of Terry being in the lead.
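
As a rough illustration of the mechanics (reusing the illustrative field names from earlier, with a placeholder passage and placeholder offsets), the abstractive answer sits alongside a rationale that is literally a slice of the passage:

```python
# Rough illustration: the answer is free-form, but the rationale is a
# literal slice of the passage. Passage text and offsets are placeholders.
passage = "... (passage text describing McAuliffe leading his opponent) ..."

turn = {
    "question": "Who is winning?",
    "answer": "Terry McAuliffe",  # the word "winning" never appears in the passage
    "rationale_start": 4,
    "rationale_end": 60,
}

rationale = passage[turn["rationale_start"]:turn["rationale_end"]]
print(f"Answer: {turn['answer']}")
print(f"Rationale: {rationale}")
```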

This dataset looks like it's going to make a big splash in the NLP space. If the SQuAD dataset is anything to judge by, the solutions the deep learning community puts forward for this one will be exciting!

https://arxiv.org/abs/1808.07042
