Deep Reinforcement Learning (DeepRL) for Abstractive Text Summarization made easy (Tutorial 9)

amr zaki
Published in Analytics Vidhya
6 min read · Oct 12, 2019

This tutorial is the 9th installment of the Abstractive Text Summarization made easy tutorial series. In this series we go through the newest deep learning approaches to the task of abstractive text summarization. All the code for this series can be found here; it is written in TensorFlow and runs seamlessly on Google Colab, and the data is hosted on Google Drive, so there is no need to download the data or run the code locally, as everything can be done for free on Google Colab.

Today we discuss a new approach that researchers have recently developed: combining the powers of Reinforcement Learning and Deep Learning to address multiple tasks, including, in our case, the NLP task of abstractive text summarization. In this tutorial we cover:

  1. What is Reinforcement Learning & when to use it ?
  2. Why use it with Deep Learning ?

This tutorial is based on this amazing blog by Satwik Kansal and Brendan Martin, which goes into detail on reinforcement learning and how to implement it.

We have also used this amazing repo and its accompanying paper by yaserkl, which builds a TensorFlow library implementing multiple DeepRL approaches, and we also need to thank Paulus et al. for their clear explanation of DeepRL techniques.

I would like to thank all of them for their great work.

0. About Series

This is a series of tutorials that will help you build abstractive text summarization models using TensorFlow, through multiple approaches. We call it abstractive because we teach the neural network to generate words, not merely to copy them.

We have covered so far (code for this series can be found here):

0. Overview on the free ecosystem for deep learning (how to use google colab with google drive)

  1. Overview of the text summarization task and the different techniques for the task
  2. Data used and how it could be represented for our task (prerequisites for this tutorial)
  3. What is seq2seq for text summarization and why
  4. Multilayer Bidirectional LSTM/GRU
  5. Beam Search & Attention for text summarization
  6. Building a seq2seq model with attention & beam search
  7. Combination of Abstractive & Extractive methods for Text Summarization
  8. Teach seq2seq models to learn from their mistakes using deep curriculum learning

EazyMind free AI-As-a-service for Text Summarization

You can actually try generating your own summaries using the output of this series through eazymind, and see what you will eventually be able to build yourself. You can also call it through simple API calls and through a Python package, so that text summarization can be easily integrated into your application without the hassle of setting up the TensorFlow environment. You can register and use the API for free.

Let’s begin!

1. What is Reinforcement Learning & when to use it ?

Reinforcement Learning is a branch of machine learning capable of learning complex tasks whose objectives can't be expressed as explicit equations; instead, the algorithm learns by interacting with its environment.

Think of a self-driving car: the task (environment) can't easily be expressed as an equation to integrate into a machine learning algorithm, but a reinforcement learning algorithm is capable of interacting with the environment and learning from it.

Reinforcement Learning is built on

  • the idea of trying out multiple actions in order to solve a problem
  • and learning when certain actions help solve the problem (the algorithm receives a reward for such an action) and when they don't (the algorithm receives a punishment for such an action)

So we can say that for reinforcement learning to learn to solve a task (environment), it must actively engage with that task, try out different actions, observe whether each action earns a reward or a punishment, remember these outcomes, and build on them until it reaches the ultimate goal.

In this animation, we see a simple example built on the one from Satwik Kansal's blog, which shows a car trying to drive a passenger to their destination.

concept of animation from Satwik Kansal's blog, graphics from freepik, animation by eazymind team

As we can see, the car tries out different paths each iteration, receives either a reward or a punishment, stores these results, and builds on them until reaching the ultimate goal.

There is some terminology in Reinforcement Learning that will be helpful, so we call

  • the car : agent
  • the garage : environment
  • the decision to go (left, right, top, bottom) : action

So the above animation can be simplified to

You can dive deeper into how to build your own reinforcement learning algorithm in Python by following Satwik Kansal's blog.
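To make the trial-reward-punishment loop above concrete, here is a minimal sketch of tabular Q-learning for the car/garage example. The grid size, reward values, and hyperparameters are illustrative assumptions, not taken from the animation or the series code:

```python
import random

random.seed(0)

# The garage is a 3x3 grid (an assumption for this sketch); the car (agent)
# starts at a corner and must reach the passenger's destination (the goal).
ACTIONS = ["left", "right", "top", "bottom"]
GRID = 3
GOAL = (2, 2)

def step(state, action):
    """Apply an action; reward +10 at the goal, punishment -1 per move otherwise."""
    x, y = state
    if action == "left":
        x = max(x - 1, 0)
    elif action == "right":
        x = min(x + 1, GRID - 1)
    elif action == "top":
        y = max(y - 1, 0)
    elif action == "bottom":
        y = min(y + 1, GRID - 1)
    new_state = (x, y)
    reward = 10 if new_state == GOAL else -1
    return new_state, reward

q = {}                                   # Q-table: (state, action) -> learned value
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration rate

for episode in range(500):
    state = (0, 0)
    for _ in range(100):                 # cap the steps per episode
        if state == GOAL:
            break
        # sometimes explore a random action, otherwise exploit the best known one
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        new_state, reward = step(state, action)
        # remember the outcome: move the stored value toward reward + discounted future value
        best_next = max(q.get((new_state, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = new_state
```

After training, following the highest-valued action in each state drives the car straight to the goal, which is exactly the "remember these actions and build upon them" idea described above.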

2. Why use it with Deep Learning ?

Recently there has been an approach of combining reinforcement learning with deep learning, in what is called DeepRL; this has actually turned out to be extremely beneficial and successful, especially in the field of NLP.

This comes from the fact that the way deep networks optimize NLP tasks is actually different from the way we measure the quality of the output (as mentioned by yaserkl in his amazing repo and paper, and also in this paper by Paulus et al.).

They have pointed out that

  • we tend to use the maximum likelihood estimation (MLE) objective to optimize the loss function in our deep architecture, as in most other tasks we solve with deep learning.
  • However, we tend to use other metrics, specific to NLP, such as BLEU and ROUGE, for evaluating the output.

BLEU and ROUGE are used in NLP tasks to measure the overlap between the words of the reference and those of the output (i.e. the number of words that appear in both the testing reference and the output sentence); the greater the overlap, the higher the score.
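To illustrate the word-overlap idea, here is a rough sketch of ROUGE-1 recall (unigram overlap). This is a simplified illustration for intuition only, not the official ROUGE implementation, and the example sentences are made up:

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Fraction of the reference's words that also appear in the candidate,
    counting repeated words up to their count in each sentence."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(cand_counts[w], n) for w, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

reference = "the cat sat on the mat"
summary = "the cat is on the mat"
score = rouge1_recall(reference, summary)   # 5 of the 6 reference words overlap
```

Here 5 of the 6 reference words ("the" twice, "cat", "on", "mat") are found in the summary, so the score is 5/6; as the overlap increases, the score increases, exactly as described above.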

So we are actually optimizing something different from what we use to evaluate the output, and this creates an inconsistency between the training and testing metrics.

The real reason, however, that deep learning methods are not able to optimize BLEU and ROUGE directly is that deep architectures can only optimize differentiable metrics, while BLEU and ROUGE are non-differentiable: they can't simply be expressed as equations that can be integrated into a deep architecture.

Here comes the use of reinforcement learning: with its power to optimize non-differentiable metrics, a deep architecture can use it to optimize the BLEU and ROUGE scores directly.
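To give a first taste of how this works, here is a hand-wavy sketch of the REINFORCE trick at the heart of policy-gradient methods: the reward (e.g. a ROUGE score) is never differentiated; instead, the differentiable log-probabilities the model assigned to its own sampled words are scaled by that reward. The function name and the numbers below are illustrative assumptions, not the actual library API:

```python
import math

def reinforce_loss(sampled_log_probs, reward):
    """loss = -reward * sum(log p(sampled word)).

    Minimizing this loss increases the probability of sampled summaries in
    proportion to the reward they received, even though the reward itself
    (ROUGE, BLEU, ...) is non-differentiable and enters only as a scalar."""
    return -reward * sum(sampled_log_probs)

# Suppose the decoder sampled three words with these probabilities,
# and the resulting summary scored 0.8 on some ROUGE-like reward:
log_probs = [math.log(0.6), math.log(0.3), math.log(0.5)]
loss = reinforce_loss(log_probs, reward=0.8)
```

A higher reward scales the same log-probabilities into a stronger training signal, which is how the deep architecture ends up optimizing a metric it could never differentiate directly.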

In the next tutorial, if GOD wills it, we will go into the details of a reinforcement learning algorithm that has been used for the abstractive text summarization task, called Policy Gradient; the code for this implementation can be found here.

I truly hope this tutorial has given you some insight into the worlds of deep learning and reinforcement learning, and I hope to see you in the next tutorials.
