Abstractive text summarization with Reinforcement Learning

MetaMind has released its algorithm for abstractive text summarization.

Extractive models select relevant phrases from the input document and concatenate them to form a summary. Because they only copy existing text, they cannot paraphrase the way people sometimes do.

Abstractive models generate a summary from the “abstracted” content of the document and can use words that did not appear in the original input. This gives them far more potential to produce fluent and coherent summaries.

The framework is an encoder-decoder RNN (or Seq2Seq) and forms the basis of our summarization model. Its encoder is bidirectional, which helps the model build a better representation of the input context.
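As a rough illustration (not the released MetaMind code), the sketch below shows how a bidirectional LSTM encoder can feed a unidirectional LSTM decoder in PyTorch; the layer sizes and names are made up for the example.

```python
# A minimal encoder-decoder (Seq2Seq) sketch with a bidirectional encoder.
# Hypothetical dimensions and names; not the authors' implementation.
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional encoder reads the article forwards and backwards,
        # so each position gets context from both directions.
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        # Unidirectional decoder; its hidden size matches the concatenated
        # forward+backward encoder states.
        self.decoder = nn.LSTM(emb_dim, 2 * hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        enc_states, (h, c) = self.encoder(self.embed(src_ids))
        # Concatenate the final forward/backward states to initialise the decoder.
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c0 = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        dec_states, _ = self.decoder(self.embed(tgt_ids), (h0, c0))
        return self.out(dec_states)  # logits over the vocabulary at each step
```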

Temporal attention mechanism

A temporal attention function points the decoder back at specific parts of the input document. This attention is then modulated to ensure that the model uses different parts of the input at different generation steps, hence increasing the information coverage of the summary.
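The sketch below illustrates the temporal-attention idea under the assumption that raw attention scores at the current decoding step are divided by the running sum of their exponentiated values from earlier steps, so input tokens that were already attended to heavily get down-weighted. The function signature is hypothetical.

```python
# Temporal attention sketch: penalise input positions by how much attention
# they already received at previous decoding steps. Shapes are illustrative.
import torch

def temporal_attention(scores_t, past_exp_scores=None):
    """scores_t: (batch, src_len) raw attention energies at the current step.
    past_exp_scores: running sum of exp(scores) from previous steps, same shape."""
    exp_scores = torch.exp(scores_t)
    if past_exp_scores is None:          # first decoding step: no penalty yet
        penalised = exp_scores
        past_exp_scores = torch.zeros_like(exp_scores)
    else:
        penalised = exp_scores / past_exp_scores
    attn = penalised / penalised.sum(dim=-1, keepdim=True)  # normalise over input
    return attn, past_exp_scores + exp_scores               # update running sum
```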

In addition, an intra-decoder attention function lets the model look back at previous hidden states of the decoder RNN. Finally, the decoder combines the context vector from the temporal attention with the one from the intra-decoder attention to generate the next word of the output summary.
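A hedged sketch of both pieces: a dot-product attention over the previous decoder states, and a simple concatenation of the decoder state with the two context vectors before projecting to vocabulary logits. The helper names and shapes are assumptions for illustration, not the authors' implementation.

```python
# Intra-decoder attention plus the combination of the two context vectors.
import torch
import torch.nn.functional as F

def intra_decoder_context(h_t, prev_dec_states):
    """h_t: (batch, hid) current decoder state;
    prev_dec_states: (batch, t-1, hid) decoder states from earlier steps."""
    scores = torch.bmm(prev_dec_states, h_t.unsqueeze(-1)).squeeze(-1)  # (batch, t-1)
    attn = F.softmax(scores, dim=-1)
    return torch.bmm(attn.unsqueeze(1), prev_dec_states).squeeze(1)     # (batch, hid)

def next_word_logits(h_t, enc_context, dec_context, out_proj):
    # Combine decoder state, temporal-attention context over the input, and
    # intra-decoder context, then project to logits for the next summary word.
    return out_proj(torch.cat([h_t, enc_context, dec_context], dim=-1))
```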

Supervised Learning

To train this model on real-world data such as news articles, a common approach is the teacher forcing algorithm: the model generates a summary while conditioning on the reference summary, and is assigned a word-by-word error for each generated word.
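The snippet below sketches one teacher-forcing update step, assuming a model like the encoder-decoder sketch above and a padded batch of reference summaries that begin with a start-of-sequence token; the variable names and padding id are assumptions.

```python
# Teacher forcing sketch: feed the ground-truth previous word at every step and
# accumulate a word-by-word cross-entropy loss against the reference summary.
import torch
import torch.nn as nn

def teacher_forcing_step(model, optimizer, src_ids, ref_ids, pad_id=0):
    # Decoder input is the reference shifted right; the target is the reference itself.
    dec_input, target = ref_ids[:, :-1], ref_ids[:, 1:]
    logits = model(src_ids, dec_input)                        # (batch, len, vocab)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), target.reshape(-1),
        ignore_index=pad_id)                                   # word-by-word error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```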

Reinforcement Learning

A different kind of training, called reinforcement learning (RL), can also be applied to abstractive text summarization.

For this we rely on an automated evaluation metric called ROUGE (Recall-Oriented Understudy for Gisting Evaluation). ROUGE works by comparing matching sub-phrases in the generated summaries against sub-phrases in the ground-truth reference summaries, even if they are not perfectly aligned. The different ROUGE variants (ROUGE-1, ROUGE-2, ROUGE-L) all work in the same fashion but use different sub-sequence lengths.
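As a toy illustration of the n-gram overlap behind ROUGE-N (real evaluations use the official ROUGE toolkit, which also covers ROUGE-L, stemming, and multiple references):

```python
# Toy ROUGE-N: count overlapping n-grams between generated and reference summaries.
from collections import Counter

def rouge_n(generated, reference, n=2):
    gen_ngrams = Counter(zip(*[generated[i:] for i in range(n)]))
    ref_ngrams = Counter(zip(*[reference[i:] for i in range(n)]))
    overlap = sum((gen_ngrams & ref_ngrams).values())
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    precision = overlap / max(sum(gen_ngrams.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return {"precision": precision, "recall": recall, "f1": f1}

print(rouge_n("the cat sat on the mat".split(),
              "the cat was on the mat".split(), n=2))
```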

The results

The combination of our intra-decoder attention RNN model with joint supervised and RL training improves the ROUGE-1 score to 39.87, and to 41.16 with RL only. Although the pure RL model has higher ROUGE scores, the supervised+RL model produces more readable summaries.

This article is adapted from ‘MetaMind Research’ at https://metamind.io/research/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization