#### How to use Reinforcement Learning for Natural Language Processing: Lesson 2

In the previous Lesson, we introduced the idea of training a RL-agent to perform linguistic parsing. The focus point of this Lesson is to introduce simple representations for actions and states appropriate for this task.

For simplicity, we will initially restrict attention to languages that can be fully described by a set of rewriting rules. I will call it “RuleSet”. Every rule in RuleSet consists of two sequences of symbols, called, respectively, Pattern Description and Label.

These two sequences are separated by an arrow (=>). The Pattern Description is written to the left of the arrow. The Label is written to the right of the arrow.

When a set of rewriting rules is specified, linguistic parsing is simply a search for a series of rewriting steps (all licensed by rewriting rules present in the RuleSet) that leads all the way from the input text to a Label.

A Linguistic Parser could be implemented with the help of machine learning, or it could be a regular program, not based on machine learning (in which case we call it “deterministic”). Such a “deterministic” Linguistic Parser is searching for the right series of rewriting steps simply by trying one rule after another until the right series is found.

### Defining the Action Space

Obviously, the RL Environment for training intelligent agents to perform “linguistic parsing” should have access to a Linguistic Parser. We will assume that our Environment has direct access to a top-down deterministic Parser. Of course, Environment also has a RuleSet that defines a language, and a set of phrases or sentences to work with.

Let *m* be the current symbol position in the text, and let *n* be a Label number. An attempt to apply the parsing algorithm starting with the position *m* and to match a Label number *n *will be considered **an elementary action** in this setting.

Such action can be conveniently represented by a pair of integers (*m, n*), or alternatively by a “one-hot matrix” (which has the value 1 only in the cell number *n* of the row number *m, *and has zeros everywhere else).

**Defining the State Space**

The state *s* consists of two matrices, Text Matrix **T** and History Matrix **H**.

The matrix **T** (for “text”) reflects the input text. The number of rows in **T** equals to the number of tokens in the input sequence.

Lets call the set of all the symbols found in R “alphabet”. The number of columns in **T** equals the number of symbols in the “alphabet”. This matrix **T** is not changing until the next input sequence is presented.

Each row in Text Matrix **T** represents a symbol in the input text, and is a “one-hot” vector of the same size as the size of the “alphabet”. The number of rows in the Text Matrix equals the number of symbols in the input sequence.

The number of rows in the History Matrix **H** is the same as in **T**. Each column corresponds to an individual rewriting rule in the RuleSet.

In this game, the action (*m, n*) simply selects a cell (*m*, *n*) in **H**, and changes it from 0 to 1. The History Matrix shows which actions took place until now. It is simply the sum of (matrix representations of) all the different actions that took place so far.

As soon as all the necessary cells in the matrixare set to 1 (that together lead to a correct parse), Environment gives reward to the learning agent,His reset to have zeros everywhere, and a new text input is introduced.H

**That’s all!**

In the next Lesson, I will describe how a RL-based agent can learn linguistic parsing using actions and states as described above.