QAit AI Learns by Interacting With Its Environment

Synced · SyncedReview · Sep 11, 2019

For an AI system to acquire knowledge the way humans generally do, it would need to interact with its surroundings and extract information through its own attention and analysis choices. That’s the idea behind a new paper from Microsoft Research, Polytechnique Montreal, MILA and the University of Montreal. QAit (Question Answering with Interactive Text) introduces an AI system that “learns” to answer questions by interacting with and gathering information from its environment.

QAit question answering game

The AI community has seen the emergence of countless machine reading comprehension (MRC) tasks in recent years. Most of these MRC tasks, however, rely on preloaded knowledge sources and answer questions either by extracting specific words from a knowledge source or by generating text strings.

There is nothing interactive about these traditional question answering models, which tend to behave more like shallow pattern matchers: they need fully observed information to predict answers, and they focus only on declarative knowledge (facts that can be stated explicitly).

QAit is different in that the agent focuses on procedural knowledge and can interact with its partially observed environment, generating training data in real time. The researchers built text-based games with relevant question-answer pairs. The question types include location, existence and attribute. They created fixed-map and random-map settings, depending on whether the layout of the environment and the objects within it are fixed or randomly generated. A sketch of what one such episode might look like appears below.
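
To make the setup concrete, here is a minimal sketch of one interactive QA episode. The environment interface, command strings and question format here are hypothetical simplifications for illustration, not the actual QAit or TextWorld API:

```python
# Hypothetical sketch of one interactive QA episode (not the actual QAit API).
# The agent reads partial text observations, issues commands to explore the
# world, and finally commits to an answer for the given question.

def play_episode(env, agent, max_steps=50):
    """Run a single question-answering episode in a text-based game."""
    obs, question = env.reset()             # e.g. "Is there a knife in the kitchen?"
    for _ in range(max_steps):
        command = agent.act(obs, question)  # e.g. "go east" or "open fridge"
        obs, done = env.step(command)       # new partial observation of the world
        if done or agent.wants_to_answer(obs, question):
            break
    answer = agent.answer(obs, question)    # e.g. "yes" for an existence question
    return 1.0 if env.check_answer(answer) else 0.0  # reward only for a correct answer
```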

The researchers used QA-DQN as their baseline agent and trained it with vanilla DQN, Double DQN (DDQN) and Rainbow. The main difference between these training methods lies in how the bootstrap target is computed, as sketched below.
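
As a reference point, here is a minimal PyTorch sketch of the Double DQN target computation, which selects the next action with the online network but evaluates it with the target network to reduce overestimation bias. The function name and tensor shapes are illustrative assumptions, not details taken from the paper:

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # The online network chooses the greedy next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the (slow-moving) target network evaluates it.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # No bootstrapping past terminal states.
        return rewards + gamma * (1.0 - dones) * next_q
```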

Overall architecture of baseline agent

The agent consists of an encoder, an aggregator, a command generator and a question answerer. The encoder converts the inputs (observations and questions) into hidden representations, which the aggregator merges. The command generator then produces Q-values for all action, modifier and object words, while the question answerer predicts the answer from the merged representations.
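
Since the article only names the four components, the PyTorch sketch below simply illustrates that overall structure. The specific layers (a GRU encoder, a concatenation-based aggregator) and all dimensions are assumptions for illustration rather than the paper’s actual design:

```python
import torch
import torch.nn as nn

class QADQNSketch(nn.Module):
    """Illustrative four-component agent: encoder, aggregator,
    command generator and question answerer (all sizes assumed)."""

    def __init__(self, vocab_size, n_actions, n_modifiers, n_objects,
                 n_answers, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Encoder: turns token sequences into hidden representations.
        self.encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Aggregator: merges observation and question representations
        # (concatenation + linear layer stands in for attention here).
        self.aggregator = nn.Linear(2 * hidden_dim, hidden_dim)
        # Command generator: one Q-value head per word slot of a command.
        self.action_head = nn.Linear(hidden_dim, n_actions)
        self.modifier_head = nn.Linear(hidden_dim, n_modifiers)
        self.object_head = nn.Linear(hidden_dim, n_objects)
        # Question answerer: predicts the answer from the merged state.
        self.answer_head = nn.Linear(hidden_dim, n_answers)

    def forward(self, obs_tokens, question_tokens):
        _, h_obs = self.encoder(self.embed(obs_tokens))      # (1, B, H)
        _, h_q = self.encoder(self.embed(question_tokens))   # (1, B, H)
        state = torch.relu(self.aggregator(
            torch.cat([h_obs.squeeze(0), h_q.squeeze(0)], dim=-1)))
        q_values = (self.action_head(state),
                    self.modifier_head(state),
                    self.object_head(state))
        return q_values, self.answer_head(state)
```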

Training accuracy in the fixed-map (left) and random-map (right) setups
Agent performance when trained on 10 games, 500 games and “unlimited” games

When training under insufficient information, the researchers observed that the agents, especially those trained with vanilla DQN and DDQN, can master the training games when the amount of training data is small. As the amount of training data increases, Rainbow becomes more effective.

The researchers also tested training under sufficient information, with the results shown below:

Test performance given sufficient information

Not surprisingly, performance improved significantly in experiments that involved sufficient information. The researchers conclude that QAit can help train models to learn effectively given sufficient information, which suggests that interactive and more human-like information-seeking models may be a research direction that can challenge simple word-matching methods.

The paper Interactive Language Learning by Question Answering is on arXiv.

Author: Hecate He | Editor: Michael Sarazen

