From Ballerina to AI Researcher: Part VIII

Reinforcement learning in NLP/NLU and my eighth week as an OpenAI scholar

Sophia Aryan
BuzzRobot
4 min read · Aug 1, 2018


Humans are like RL agents in the environment called “Life”

This past week I focused on learning more about LSTMs, as I'm building one and will share it with you later. I also got into reinforcement learning, aiming toward future research applying RL to NLP tasks.

RL was inspired by behavioral psychology, and, from my perspective, we are agents to some degree.

Just think about it: whatever we do in our lives, we are working toward maximizing a reward. When we play video games, our brain receives a reward when we win. This extends from gambling to earning a promotion at work, building a billion-dollar company, cultivating happy personal relationships, and any other kind of craving or goal-seeking behavior.

All these things "reward" us with money, recognition, a sense of belonging and love, fulfillment, etc.

Even in daily life we are driven by the goal of receiving a "reward": all our social media interactions, checking Facebook and Twitter for updates and, especially, for likes and positive comments.

That’s why humans are addicted to certain goal-seeking behaviors (e.g. I’m a chocolate addict — I simply can’t imagine my life without chocolate, though I recently made progress switching from milk chocolate to dark).

We are agents in the environment called 'Life', and it's on us to figure out the rules of the 'game' and how to maximize our 'reward'.

Credit to www.martineauarts.com

My Eighth Week as an OpenAI Scholar

I see huge potential in RL, and in 5–7 years we will have practical applications of the technique. The recent OpenAI robot-hand release is great proof of how RL can be applied to real-world problems.

Last week, I read a bunch of papers on RL, especially ones related in some way to NLP.

One of the papers that caught my eye is called End-to-End Goal-Driven Web Navigation.

In this paper, the authors propose goal-driven web navigation as a large-scale alternative to text-based games for evaluating agents with natural language understanding and planning capabilities. The task represents an entire website as a graph: web pages are nodes and hyperlinks are directed edges.

An agent is given a query consisting of one or more sentences taken from a randomly selected web page in the graph, and it navigates the graph, starting from a predefined start node, to find the target node in which the query appears.
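To make the task concrete, here is a minimal Python sketch of such an environment with a random-walk baseline. The site graph, page texts, and class names are hypothetical illustrations of the task structure, not code or data from the paper.

```python
# A toy sketch of a goal-driven web navigation environment, in the spirit of
# the paper's setup. Everything here (graph, pages, names) is made up.

import random

class ToyWebNavEnv:
    def __init__(self, pages, links, start_node):
        self.pages = pages          # node -> page text
        self.links = links          # node -> nodes reachable via hyperlinks
        self.start_node = start_node

    def reset(self):
        # The query is a sentence drawn from a randomly selected target page.
        self.target = random.choice(list(self.pages))
        self.query = random.choice(self.pages[self.target].split(". "))
        self.node = self.start_node
        return self.query, self.node

    def step(self, action):
        # An action means following one of the current page's hyperlinks.
        self.node = self.links[self.node][action]
        done = self.query in self.pages[self.node]  # reached the target page
        reward = 1.0 if done else 0.0
        return self.node, reward, done

pages = {
    "Home": "Welcome to the toy wiki.",
    "RL": "Reinforcement learning studies reward-driven agents.",
    "NLP": "Natural language processing deals with text.",
}
links = {"Home": ["RL", "NLP"], "RL": ["Home", "NLP"], "NLP": ["Home", "RL"]}

# Random-walk baseline: at each node, follow a random hyperlink.
env = ToyWebNavEnv(pages, links, start_node="Home")
query, node = env.reset()
for _ in range(10):
    node, reward, done = env.step(random.randrange(len(links[node])))
    if done:
        break
```

A real agent would replace the random walk with a learned policy that scores the available hyperlinks against the query, which is, roughly, what the paper's neural agents are trained to do.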

The authors also released a software tool called WebNav that converts a given website into a goal-driven web navigation task. As an example of its use, they provide WikiNav, built from English Wikipedia. The proposed agents, called NeuAgents, are based on neural networks and trained with supervised learning.

The authors also extend WikiNav with an additional set of queries constructed from Jeopardy! questions.

They evaluated the NeuAgents against three search-based strategies: SimpleSearch, Apache Lucene, and the Google Search API. The results, measured in terms of document recall, indicate that the NeuAgents outperform the search-based strategies.
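As a quick aside, document recall here is, roughly, the fraction of queries for which a strategy ends up at the correct target page. A tiny sketch with made-up data:

```python
# Document recall: the fraction of queries whose predicted page matches
# the target page. The example data below is invented for illustration.

def document_recall(predictions, targets):
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)

# e.g. a strategy that found the right page for 2 of 3 queries:
print(document_recall(["RL", "NLP", "Home"], ["RL", "NLP", "RL"]))  # ~0.67
```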

These papers resonate with my thoughts: even if we accept humans as general agents, we still don't have information about everything. What I mean is that we are very good at utilizing external knowledge bases and working with information. This is what we have to teach agents to do: conduct a Google search and work with knowledge bases to accomplish the necessary tasks.

Another paper that I'd like to highlight is called Understanding Grounded Language Learning Agents. It describes neural-network-based systems that locate the referents of words and phrases in images, answer questions about visual scenes, and execute symbolic instructions as first-person actors in partially observable worlds.

For maximum control and generality, the authors focus on a simple neural-network-based language learning agent, trained via policy-gradient methods to interpret synthetic linguistic instructions in a simulated 3D world (see below).

Credit to https://arxiv.org/pdf/1710.09867.pdf

Left: Schematic agent architecture. Right: The agent observes two 3D rotating objects and a language instruction and must select the object that matches the instruction. In this case, the instruction is a shape word (chair).

Using these methods, the authors explored how the agent's training environment affects its learning outcomes and speed, measured the generality and robustness of its understanding of certain fundamental linguistic concepts, and tested for biases in the decisions it makes once trained.
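For readers wondering what "trained via policy-gradient methods" looks like in practice, below is a minimal REINFORCE sketch in PyTorch. It is a generic illustration under my own assumptions (placeholder observation and instruction encodings, a hypothetical env interface), not the actual architecture or training setup from the paper.

```python
# Minimal REINFORCE sketch for an instruction-following agent.
# Generic illustration only: obs/instr encodings and `env` are placeholders,
# not the setup from "Understanding Grounded Language Learning Agents".

import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim, instr_dim, n_actions):
        super().__init__()
        # Fuse the visual observation with the language instruction,
        # then map to a distribution over actions.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + instr_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, instr):
        logits = self.net(torch.cat([obs, instr], dim=-1))
        return torch.distributions.Categorical(logits=logits)

policy = Policy(obs_dim=32, instr_dim=16, n_actions=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def run_episode(env, policy):
    # `env` is a hypothetical environment returning tensor observations.
    log_probs, rewards = [], []
    obs, instr = env.reset()
    done = False
    while not done:
        dist = policy(obs, instr)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action.item())
        rewards.append(reward)
    return log_probs, rewards

def reinforce_update(log_probs, rewards, gamma=0.99):
    # Discounted return from each step onward; actions followed by higher
    # returns get their log-probabilities pushed up.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key idea is that actions followed by higher returns have their probabilities increased, which is exactly the reward-maximization story from the beginning of this post.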

I think grounded language learning has huge potential, and it's worth digging into more. If you want more insight into the area, here is related work.

If you have any questions or comments, feel free to ping me. You can learn more about me on Twitter.


Former ballerina turned AI writer. Fan of sci-fi, astrophysics. Consciousness is the key. Founder of buZZrobot.com