NLP Lecture 12 @ CMU — A Watch👓 & Read Treat🍨

Generating Incremental Trees — Part 1 : by Graham Neubig

In this lecture, we learn how to adapt the structure of outputs of a neural net to predict some tree structure over an input sequence.

What kind of tree structure over the sequence ?

First, note that the words in a sentence, are related to other words in the sentence. Two words can be related , if one is the subject of the other, or one is the object of another, or in any other way. These relations usually end up making trees over the input sentence.

2 types of tree structure on the input sequence

But why do this ? What use is the structure over a sentence? Does it contain any more information than the original sentence?

Yes, it does. For example, the structure on the above sentence helps one determine whether, ‘I’ saw a girl who was carrying a telescope, or whether ‘I’ used a telescope to see a girl.

But it may not necessarily contain, more information, always. Sometimes, the structure is completely determined by the input sentence. But, still we might need it. In particular, once you have some pre-defined relations between parts of a sentence, and have a structure(with respect to these pre-defined relations) over the sentence, you can easily extract information from the sentence.

[[Think : Why not use BERT ? (Hint : BERT doesn’t do good, when you try to keep coherence, for long discourses)]]

TGrep is a tool you can use, to search through trees, for particular type of relations, or combinations thereof.

Shift Reduce Dependency Parsing

We figure out a clever way of generating trees incrementally, using the same kind of actions at each time step.

Arc-Standard Shift Reduce Parsing

Note how in the above algorithm, we have the same choices at each time step. And how even though that is the case, we can still generate any possible tree over a sentence.

Now, the question is what should we base our choice on ?

We should certainly send in the words in the stack and those in queue as input to our neural net, but should also send the tree structure that we built till now.. but how to do that? The input to a NN is generally of fixed size, & even if it is recurrent, how to send in the relations that we have made till now ?

Chen and Manning in 2014, chose to send in these features :

Top 3 words in buffer are the ones that are closest to being processed next. The children and grandchildren of words on stack, don’t reside on the stack. Words are removed from the stack, as soon as they get a parent.

But doing this kind of feature extraction doesn’t suit an ML engineer, when you have NNs that can predict these.

Just use RNNs to extract features.

Notice how the above is just a simple RNN, taking words one be one, just drawn differently. The parameters W can be different based on whether input word is Noun etc. or may even depend on the type of phrase the words captured till a point form.

We can also use LSTMs above, in which case, the forget gate values will correspond to ‘ignoring’ children. Like in above sentence ‘this’ doesn’t matter.. it could have been “I hate that movie”.

But, still how to use this in parsing?

We can use this for all words in the stack and for all the words in the buffer, surely. But how to do the above for the tree we have made till a point? Notice how the tree was made by following a sequence of steps; so we can just send in that sequence, as the representation of the complete tree formed till now!

Running LSTMs on stack, buffer and the sequence of steps selected till now, to make the feature vector (pt) for classification.

Part — II

Our aim :

Getting “meaning” from a natural sentence.

“Meaning” here is defined with respect to an automaton that will take the meaning representation and perform the desired task. There can be an underlying data(with some structure, possibly) like a knowledge graph, over which the automaton operates.

When the task is like to convert sentence to python program, there is no underlying data generally (See CoNaLa dataset). When the task is to convert sentence to say, an SQL query, underlying data in tables is present(See ATIS, Geo Query data, Spider Datasets).

The first basic approach for generating meaning representation from natural sentence, is to use a sequence to sequence model, but it may not always generate an output that is syntactically correct. We should try to use the rich syntactic structure to make our task easier. Like, in SQL program, we know there is a select clause, a from clause, etc. we only need to predict the arguments for these clauses. Moreover, there is structure over the arguments of these clauses too, for example, the arguments of select clause will be names of tables only.

The Core Research Question — How to add inductive bias(towards the structure/grammar of the output language) in the model ?

We can use different neural nets for all the clauses and generate each of them separately, and then our problem will be solved.

But how to do this for, more complex, say Python programs ? How to utilize their syntactic structure ?

One way to do this :

The tree structure on the right is generated layer-by-layer from the network shown on left.

But, what exactly is making the above model “structure aware” ?

The structure awareness is coming from the fact that we have hard-coded in our program, exactly, at what point, should the LSTM branch-off and how many steps the branch must go on for, based on the grammar of the programming language. For example, we know, the operation “>”, will have two operands, so as soon as the LSTM cell of current branch predicts a “>” ; we fork from that branch to produce two more outputs.

Another thing to note in the above model is that when given a language like Python, instead of writing a program that takes in particular values of arguments like “Dallas” or “16:00”; it is more useful & intuitive to just have variables in their place, and later on allow the model to assign values to those variables, based on the sentence. Which is exactly, what our next model does :

Breaking the task into steps to allow stronger guidance during training

The model above, in fact, takes the idea one step further, and generalizes it as breaking the generation of final program into any combination of coarser and finer details, which are filled in sequentially. Each step is conditioned on the initial input sentence, always. Note that this technique can be combined with the previous branching off one, or can be used independently.

Think : Will the technique of breaking program into a series of coarse to fine tasks become equivalent to branching-off, if we go on increasing the levels between the initial input and final output program? What exactly makes the techniques different?

Using Grammar of the Programming Language

We know each programming language corresponds to a formal grammar. And that every program in the programming language can be written as an “Abstract Syntax Tree(AST)” using the rules of the grammar, followed by filling in particular variable names/values, for the terminal nodes.

Our next model, generates this tree, incrementally :

The task of generating an AST is broken down into two type of actions. The sequential application of these actions can generate the whole tree.

In the above model, the tree is generated in a Depth-First manner, unlike the breadth first manner of the branching-off method.

TranX is an open source framework in which you can specify your own grammar, and use to generate AST, action sequences, corresponding to some program that is valid according to your grammar. You can also use their pre-trained models.

Side Note : It is important that your model learns to copy the value of variables from input sentence, as it is. If it isn’t able to do so, you can add a separate action corresponding to copying, like in CopyNet.

Main Problem : Very Less data available in the form of (intent, program)


“ Try to use weaker signals”

What are these weaker signals ?

Maybe.. use comments in the code as the intent of the corresponding line..

Or.. use the execution result..

This leads to a reinforcement learning framework for the problem. The agent has to guess the program which produces the correct program.

The agent keeps on trying different programs, against the same database. The reward is given when the program predicts the correct output.

As we can see above, there are programs, that produce correct output, but may not be correct themselves. Thus, the agent may get wrong signals. Think : How to remedy this ? [Hint : Use data augmentation]

The problems with using reinforcement learning for the above problem

Another way to solve the above, problems, is to bias the model, towards trying forms already used, in previous programs, so that they get confirmed. Other ways involve using heuristics :

The back-translation score for translating the code back to the query, would be much higher for the correct program. And hence, we will use that program.

Note that these reinforcement learning regimes are to be used in combination with the previous models.

Problems to think about

  1. Natural language and Code is usually, highly compositional; but, error propagates in LSTM’s as they become compositional or longer. Try to design models, that can infer as well as generate many levels of compositional structure, easily.
  2. Extend the model to operate nicely even when underlying data is in form of natural text; rather than databases.
Comparison of various tasks, on two directions of compositionality and breadth of domain.

Come join DA Labs on our quest to understand the workings of machines and how they learn ! Wonder and wander in the beautiful field , that is Deep Learning ! Any kind of feed-back/questions are always welcome & appreciated 😇 To join us in our projects/research, interact with us, or write with us, just chime in on our discord server ! :)



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Jeevesh Juneja

Jeevesh Juneja

Searching 🧐 for the forgotten and lost truths