Summary: Learning Neural Templates for Text Generation (EMNLP 2018)

Sameer Singh
Published in UCI NLP
Oct 23, 2018

Authors: Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

Consider the task of text generation conditioned on some factual information, e.g. generating a sentence that describes a restaurant given a structured table of its attributes. There are two existing approaches: (1) traditional template-based generation (“<name> is a <cuisine> restaurant in <location>”), which tends to express correct factual information but is not fluent or natural, and (2) newer encoder/decoder models, which generate fluent text but sometimes hallucinate facts and are difficult to control and interpret. This paper combines these two paradigms by introducing neural templates, where the template is defined by a sequence of discrete labels, and each label is instantiated using a neural emission model. The following figure gives an example.

Source: Original paper (https://arxiv.org/pdf/1808.10122.pdf)
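To make the setup concrete before getting to the model, here is a toy sketch (my own illustration, not the paper's notation or state numbering) of what a neural template amounts to: a sequence of latent state IDs, each of which emits a multi-word segment conditioned on the source table.

```python
# Toy illustration of a neural template; state IDs and segmentation are made up.
table = {"name": "The Golden Palace", "food": "Indian", "area": "riverside"}

template = [55, 59, 12, 3, 43, 2]        # learned discrete states, one per segment
segments = ["The Golden Palace", "is an", "Indian",
            "restaurant in the", "riverside", "area ."]

# Generation instantiates each state with its own neural decoder, conditioned on
# the table; the output text is the concatenation of the emitted segments.
print(" ".join(segments))
# The Golden Palace is an Indian restaurant in the riverside area .
```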

The most interesting part of the paper is the model, which combines Hidden Semi-Markov Models (HSMMs) with neural decoders. In an HMM, there is a latent sequence of classes z (one per token), each of which generates a single token. In an HSMM, the latent sequence z instead corresponds to a sequence of segments, where each z generates a segment containing multiple tokens. Forward-backward remains efficient for HSMMs, especially if you limit the maximum length of any segment. This paper makes the model neural by using RNNs for the emissions of the multi-token segments. Figure time:
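To give a feel for the dynamic program, here is a minimal sketch of the HSMM forward pass in log-space with a cap L on segment length. The names, shapes, and the way emission scores are passed in are my own simplifications; in the paper the segment scores come from per-state RNN decoders conditioned on the source table.

```python
import torch

def hsmm_log_marginal(log_pi, log_trans, log_emit, T, L):
    """Toy HSMM forward pass in log-space (a sketch, not the paper's implementation).

    log_pi    -- (K,)       log initial state probabilities
    log_trans -- (K, K)     log_trans[i, j] = log p(next state = j | current state = i)
    log_emit  -- (T, L, K)  log_emit[t, l, k] = log score of the segment of length l+1
                            ending at position t, under state k (RNN decoders in the paper)
    Returns the log marginal likelihood of y[0:T], summing over all segmentations
    and state sequences.
    """
    K = log_pi.shape[0]
    alpha = [None] * (T + 1)   # alpha[t][k]: log-prob of y[:t] with the last segment in state k
    for t in range(1, T + 1):
        scores = []
        for l in range(1, min(L, t) + 1):          # the last segment has length l
            seg = log_emit[t - 1, l - 1]           # (K,) score of segment y[t-l:t] per state
            if t - l == 0:
                prev = log_pi                      # first segment: initial distribution
            else:
                # marginalize the previous state: logsumexp_i alpha[t-l][i] + log_trans[i, k]
                prev = torch.logsumexp(alpha[t - l].unsqueeze(1) + log_trans, dim=0)
            scores.append(prev + seg)
        alpha[t] = torch.logsumexp(torch.stack(scores), dim=0)
    return torch.logsumexp(alpha[T], dim=0)
```

The recursion costs roughly O(T · L · K²), which is why capping the maximum segment length matters.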

The model learns the latent templates with only pairs (x, y) available during training. A number of additional details make it practical: backpropagating through the dynamic program, limiting the maximum segment length, adding an autoregressive component (only on y), factorizing the transition matrix, disallowing certain transitions, etc. The results on E2E and WikiBio are competitive with sequence decoder models; however, the neural template models clearly lead to more interpretable and controllable generation.
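Since the forward recursion sketched above is built entirely from differentiable operations (additions and logsumexps), "backpropagating through the dynamic program" amounts to maximizing the log marginal likelihood with ordinary autograd. A hypothetical training step, reusing the hsmm_log_marginal sketch and with random scores standing in for the neural emission model, might look like this:

```python
import torch

K, T, L = 10, 12, 4   # number of states, sequence length, max segment length (toy sizes)

# Raw parameters; in the real model the emission scores come from per-state RNN decoders.
pi_logits = torch.randn(K, requires_grad=True)
trans_logits = torch.randn(K, K, requires_grad=True)
emit_scores = torch.randn(T, L, K, requires_grad=True)   # stand-in for segment log-likelihoods

opt = torch.optim.Adam([pi_logits, trans_logits, emit_scores], lr=0.01)

opt.zero_grad()
log_marginal = hsmm_log_marginal(
    torch.log_softmax(pi_logits, dim=0),      # normalized initial distribution
    torch.log_softmax(trans_logits, dim=1),   # rows sum to one: p(next state | current state)
    emit_scores, T, L,
)
(-log_marginal).backward()   # gradients flow through every logsumexp in the recursion
opt.step()
```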

This was a fun paper to read; it combines the intuition behind template-based generation with neural approaches in a clean, elegant manner. The dynamic programming for semi-Markov models is quite impressive (I was more familiar with the semi-CRF variant), and it’s great to see it applied again. Still, I feel the model is trying to capture a bit too much in its states, and may need many states (K) to cover the subtleties of language. I wonder how it would do on straight-up, unconditional language modeling, since finding controllable templates can be pretty useful in general (think dialogs!).
