ACL Best Paper: Tricky Stanford Dataset Adds Questions That Don’t Have Answers

Earlier this week the Association for Computational Linguistics (ACL) 2018 conference announced its two Best Short Papers, neither of which had yet been published. Today the AI community got its first look at one of the winners when “Know What You Don’t Know: Unanswerable Questions for SQuAD” was released on arXiv. The paper, by Stanford University researchers Pranav Rajpurkar, Robin Jia and Percy Liang, shines a spotlight on recent progress in natural language understanding (NLU).

An ongoing AI research target is the creation of a question-answering machine that can understand and respond to complex, nuanced and out-of-context questions in natural language. Today’s state-of-the-art Q&A systems can perform at human level when retrieving answers from a context document, but they struggle with questions for which the correct answer is not stated in the context.

The paper provides an example of how a machine might deal with an unanswerable question. In the example, even a human reader might be tempted by a “plausible” answer that is not actually correct. The machine similarly locates the most relevant-looking answer in the context, but does not know whether that answer is correct.

The paper introduces SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD), a large-scale, open-source reading comprehension dataset. SQuAD 1.1 was created in 2016 and includes more than 100,000 questions on Wikipedia articles, each of which can be answered by extracting a segment of text directly from the article.

SQuAD 2.0 combines the SQuAD 1.1 questions with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. In effect, SQuAD 2.0 tries to trick the machine, raising the bar for natural language AI system performance. A state-of-the-art AI system scoring 86% F1 on SQuAD 1.1 test data achieves only 66% F1 on SQuAD 2.0.
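To make the mix of answerable and unanswerable questions concrete, here is a minimal Python sketch that counts both kinds in a SQuAD 2.0-style file. The field names (`is_impossible`, `plausible_answers`, `answers`, `qas`) follow the publicly released SQuAD 2.0 JSON as we understand it; the sample passage and questions are invented for illustration and are not taken from the dataset, and `count_questions` is a hypothetical helper name.

```python
import json

# Invented sample in the style of the SQuAD 2.0 JSON release (not real data).
SAMPLE = """
{
  "version": "v2.0",
  "data": [{
    "title": "Example",
    "paragraphs": [{
      "context": "The museum opened in 1990 and was renovated in 2005.",
      "qas": [
        {"id": "q1", "question": "When did the museum open?",
         "is_impossible": false,
         "answers": [{"text": "1990", "answer_start": 21}]},
        {"id": "q2", "question": "When did the museum close?",
         "is_impossible": true,
         "answers": [],
         "plausible_answers": [{"text": "2005", "answer_start": 47}]}
      ]
    }]
  }]
}
"""


def count_questions(dataset):
    """Return (answerable, unanswerable) question counts for a parsed dataset."""
    answerable = unanswerable = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # SQuAD 1.1-style records have no flag, so default to answerable.
                if qa.get("is_impossible", False):
                    unanswerable += 1
                else:
                    answerable += 1
    return answerable, unanswerable


dataset = json.loads(SAMPLE)
print(count_questions(dataset))
```

In the second question the passage never says when the museum closed, yet "2005" looks like a tempting answer. That is the kind of trap the adversarially written questions are meant to set.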

The Stanford researchers believe SQuAD 2.0 can help AI models learn to recognize when a question cannot be answered from the provided text. The added difficulty gives NLU researchers a tougher benchmark, which promises to drive the development of better-performing models.
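One simple way a system might decide that a question has no answer is to compare its best candidate span against a separate "no answer" score and abstain when the latter wins by some margin. The sketch below illustrates that idea only; it is not presented as the method used in the paper. The function names, the scores and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def choose_answer(span_scores: Dict[str, float],
                  no_answer_score: float,
                  threshold: float = 0.0) -> Optional[str]:
    """Return the best-scoring span, or None to abstain.

    Abstains when the no-answer score beats the best span score
    by more than `threshold` (a hypothetical tunable margin).
    """
    if not span_scores:
        return None
    best_span = max(span_scores, key=lambda span: span_scores[span])
    if no_answer_score - span_scores[best_span] > threshold:
        return None
    return best_span


def exact_match(prediction: Optional[str], gold_answers: List[str]) -> bool:
    """Score a prediction; an unanswerable question has no gold answers."""
    if not gold_answers:
        # Only abstaining counts as correct on an unanswerable question.
        return prediction is None
    return prediction in gold_answers


# Made-up scores: a confident answer, then a tempting but wrong span.
print(choose_answer({"1990": 4.2, "2005": 1.1}, no_answer_score=0.5))
print(choose_answer({"2005": 1.3}, no_answer_score=3.0))
```

Under this scoring, a system that always returns its most plausible span gets every unanswerable question wrong, which is consistent with the drop in scores reported on SQuAD 2.0.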

Check out this GitHub for more information.

Journalist: Tony Peng | Editor: Michael Sarazen
