Combination of Abstractive & Extractive methods for Text Summarization (Tutorial 7)

amr zaki
HackerNoon.com
May 18, 2019 · 9 min read


Combining both Abstractive & Extractive methods for text summarization

This tutorial is the seventh in a series that will help you build an abstractive text summarizer using TensorFlow.

Today we discover some novel ways of combining both abstractive and extractive methods for text summarization (the code can be found here in Jupyter notebook format for Google Colab). We will combine the concept of generating new words with that of copying words from the given sentence, learn why this matters, and go through how it is actually done!

EazyMind, free AI-as-a-service for text summarization

You can actually try generating your own summaries with this model through eazymind. I have added this model to eazymind so it can be called through simple API calls, and through a Python package, which means it can be integrated into your application without the hassle of setting up the TensorFlow environment. You can register for free and use the API for free.

Today we go through the concepts discussed in two papers: Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, and Get To The Point: Summarization with Pointer-Generator Networks (their repo, their truly amazing blog post). Their work has been extremely helpful and has produced great results, and I would really like to thank them for their efforts.

Today we will:

  1. Discuss how we integrate both worlds of abstractive and extractive text summarization.
  2. Take a quick look at the code and the preprocessing of the data (I have converted this model to a Jupyter notebook that runs seamlessly on Google Colab, and the data is hosted on Google Drive, so there is no need to download either the code or the data; you only need a Google Colab session, a copy of the data from my Google Drive to yours (more on this), and Google Drive connected to your Colab notebook).
  3. Point out that this model has also been turned into an API (and a Python package), so you can try it out in your projects without the hassle of setting up a TensorFlow environment; you can register for free on eazymind and use the API for free now.

0- About Series

This is a series of tutorials that will help you build an abstractive text summarizer using TensorFlow in multiple approaches. We call it abstractive because we teach the neural network to generate words, not merely copy them. Today we combine these concepts with extractive ones to get the benefits of both worlds.

We have covered so far (code for this series can be found here):

0. Overview on the free ecosystem for deep learning (how to use google colab with google drive)

  1. Overview on the text summarization task and the different techniques for the task
  2. Data used and how it could be represented for our task (prerequisites for this tutorial)
  3. What is seq2seq for text summarization and why
  4. Multilayer Bidirectional LSTM/GRU
  5. Beam Search & Attention for text summarization
  6. Building a seq2seq model with attention & beam search

In these tutorials we have built the cornerstone model that we will enhance today, as all of the newest approaches build on this baseline model.

So let's begin!

1- Why copying?

In the last tutorial we built a seq2seq model with attention and beam search capable of abstractive text summarization. The results were genuinely good, but the model still suffers from some problems, the first being out-of-vocabulary (OOV) words.

1–1 Out-of-vocabulary words

These are unseen words. The problem comes from the fact that we train our model with a limited vocabulary (the vocabulary can never contain all English words), so at test time the model faces new words it has never seen before. Normally we model these words as <unk>, but this does not produce good summaries!
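To make the OOV issue concrete, here is a tiny sketch: the vocabulary and helper function below are made up for illustration, but they show how a limited vocabulary collapses every unseen word to <unk>.

```python
# Toy illustration of the OOV problem: with a limited vocabulary, every unseen
# word collapses to <unk>. The vocab and helper below are made up for this example.
vocab = {"<unk>": 0, "germany": 1, "beat": 2, "argentina": 3, "in": 4, "last": 5, "game": 6}

def to_ids(words, vocab):
    """Map each word to its vocab id, falling back to <unk> for unseen words."""
    return [vocab.get(w.lower(), vocab["<unk>"]) for w in words]

sentence = "Germany beat Argentina 3-2 in last night 's game".split()
print(to_ids(sentence, vocab))
# -> [1, 2, 3, 0, 4, 5, 0, 0, 6]   ("3-2", "night", "'s" all become <unk>)
```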

1–2 Wrong Factual Information

Another problem is that factual information is not always generated accurately.

Given the sentence: In last night's game, Germany beat Argentina 3–2

the model might generate: Germany beat Argentina 2–1

This comes from the fact that the token 3–2 is unique (not unknown, but unique) and therefore harder for the model to regenerate, so it would be much easier if the model could copy the token 3–2 from the original sentence instead of generating it on its own.

1–3 Replacing names with similar wrong names

Another problem can be seen with the exact names of people and places. Because of word embeddings, our model effectively clusters similar entities together, so it may treat words like Delhi and Mumbai as the same, and names like Anna and Emily as the same, since they have similar embeddings.

So we will implement a model capable of copying unique words from the original sentence, since it is quite difficult for the model to regenerate these words on its own. This technique is called the pointer generator.

2- What is a Pointer Generator?

This is a neural network trained to learn when to generate novel words and when to copy words from the original sentence.

It is called a pointer generator network because we use a pointer to point to the word to be copied from the original sentence.

2–1 Our Basic structure

This graph has been borrowed from Get To The Point: Summarization with Pointer-Generator Networks (their repo, their truly amazing blog post)

The basic structure is a seq2seq model (a multilayer bidirectional LSTM encoder and a decoder with beam search and attention). To generate the output sentence, we use the output from both

  1. the decoder
  2. the attention (context vector), i.e. the attention tells us which words of the input are important

From these two outputs we generate a probability distribution over our whole vocabulary. This is called the Vocabulary Distribution, and it is what we use to produce the final output.

So, to keep in mind, we have two important distributions here:

1- A local distribution (Attention), which tells us which words of the input sentence are important.

2- This local distribution (Attention) is used to compute the global distribution (Vocabulary Distribution), which gives the probability of each word in ALL of the vocabulary being the next output word.
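As a rough illustration of these two distributions, here is a toy NumPy sketch. The shapes, names, and single projection matrix are simplifying assumptions for clarity, not the actual variables used in the repo.

```python
# Toy NumPy sketch of the two distributions above. Shapes, names, and the single
# projection matrix are simplifying assumptions, not the actual repo variables.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

enc_states = np.random.randn(6, 8)    # encoder states for 6 input words, hidden size 8
dec_state  = np.random.randn(8)       # current decoder state
W_vocab    = np.random.randn(20, 16)  # projection onto a toy 20-word vocabulary

# 1- local distribution (Attention): how important is each INPUT word right now
attention_dist = softmax(enc_states @ dec_state)          # shape (6,), sums to 1

# 2- global distribution (Vocabulary): built from the decoder state + context vector
context    = attention_dist @ enc_states                  # attention-weighted context (8,)
vocab_dist = softmax(W_vocab @ np.concatenate([dec_state, context]))  # shape (20,)
```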

2–2 Now let's add our Pointer Generator network

The Pointer Generator network here is a neural network trained to choose where to generate the output from, either from

  1. the global distribution (Vocabulary Distribution), i.e. generate a novel new word,
  2. or from the local distribution (Attention), i.e. copy a word from the original sentence.

This graph and formula have been borrowed from Get To The Point: Summarization with Pointer-Generator Networks (their repo, their truly amazing blog post)

So we have a parameter Pgen, which gives the probability of generating the word from the vocabulary distribution (P_vocab) as opposed to copying it from the attention distribution (the sum of the attention weights on each occurrence of the word); in other words, either generate a new word or copy one from the sentence.
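To see how Pgen mixes the two distributions, here is a toy NumPy sketch following the formulation in the Get To The Point paper, P(w) = Pgen · P_vocab(w) + (1 − Pgen) · (sum of attention on the occurrences of w). The extended-vocabulary indexing below is my own simplification, not the repo's exact code.

```python
# Toy NumPy sketch of mixing the two distributions with Pgen:
#   P(w) = Pgen * P_vocab(w) + (1 - Pgen) * (sum of attention on occurrences of w)
import numpy as np

vocab_size = 20
p_gen = 0.7                                           # probability of GENERATING a word
vocab_dist = np.full(vocab_size, 1.0 / vocab_size)    # toy P_vocab over the fixed vocab
attention_dist = np.array([0.1, 0.5, 0.2, 0.2])       # toy attention over 4 input words
input_word_ids = np.array([3, 21, 7, 3])              # ids >= vocab_size are article OOVs

final_dist = np.zeros(vocab_size + 2)                 # vocab extended by the article's OOVs
final_dist[:vocab_size] = p_gen * vocab_dist
# copy mode: route (1 - Pgen) * attention mass to whichever word each weight points at
np.add.at(final_dist, input_word_ids, (1 - p_gen) * attention_dist)
print(final_dist.sum())                               # ~1.0, still a valid distribution
```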

3- How to build a Pointer Generator network

There have been two main approaches to implementing this network. Both rely on the same concept, with a slight difference in implementation.

The main inputs are:

  1. Decoder inputs
  2. Attention inputs

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond Paper

P here is Pgen. In this paper it is obtained by training a sigmoid switch layer.

The inputs are:

  1. h_i : the hidden state of the decoder (decoder output) → decoder parameter
  2. E[o_{i−1}] : the embedding of the output from the previous decoder time step → decoder parameter
  3. c_i : the attention-weighted context vector → attention input

W_h^s, W_e^s, W_c^s, b_s and v_s are the learnable parameters.
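A rough sketch of that sigmoid switch, written out with toy NumPy tensors. The dimensions are made up and the variable names follow the paper's notation, not the repo's code; the formula being illustrated is P(switch_i = 1) = σ(v_s · (W_h^s h_i + W_e^s E[o_{i−1}] + W_c^s c_i + b_s)).

```python
# Toy NumPy sketch of the switch from "Abstractive Text Summarization using
# Sequence-to-sequence RNNs and Beyond" (sizes made up, names follow the paper):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden = 8
h_i    = np.random.randn(hidden)            # decoder hidden state
e_prev = np.random.randn(hidden)            # embedding of the previous decoder output
c_i    = np.random.randn(hidden)            # attention-weighted context vector

W_h_s, W_e_s, W_c_s = (np.random.randn(hidden, hidden) for _ in range(3))  # learnable
v_s, b_s = np.random.randn(hidden), np.random.randn(hidden)                # learnable

p_switch = sigmoid(v_s @ (W_h_s @ h_i + W_e_s @ e_prev + W_c_s @ c_i + b_s))
```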

Get To The Point: Summarization with Pointer-Generator Networks

1. s_t : the decoder state → decoder parameter

2. x_t : the decoder input → decoder parameter

3. h_t* : the context vector → attention input

where the vectors w_{h*}, w_s, w_x and the scalar b_ptr are learnable parameters.
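And the equivalent toy sketch for the Get To The Point formulation, p_gen = σ(w_{h*}ᵀ h_t* + w_sᵀ s_t + w_xᵀ x_t + b_ptr), again with assumed toy dimensions rather than the repo's actual code.

```python
# Toy NumPy sketch of p_gen from "Get To The Point" (sizes made up, names follow the paper):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden, emb = 8, 6
h_star = np.random.randn(hidden)   # context vector h_t*
s_t    = np.random.randn(hidden)   # decoder state
x_t    = np.random.randn(emb)      # decoder input (embedding of the previous word)

w_h, w_s, w_x = np.random.randn(hidden), np.random.randn(hidden), np.random.randn(emb)
b_ptr = 0.0                        # learnable scalar

p_gen = sigmoid(w_h @ h_star + w_s @ s_t + w_x @ x_t + b_ptr)
```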

4- TensorFlow Implementation

abisee has implemented the paper Get To The Point: Summarization with Pointer-Generator Networks using TensorFlow; the code is based on the TextSum code from Google Brain.

I have modified this code (my modification)

  • to run as a Jupyter notebook that works seamlessly on Google Colab (more on this)
  • and uploaded the data to Google Drive, so it can be easily used within Google Colab (more on this)

So there is no need to download either the code or the data; you only need a Google Colab session to run the code, a copy of the data from my Google Drive to yours (more on this), and Google Drive connected to your Colab notebook, as shown below.
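For reference, connecting Google Drive from inside a Colab notebook is the standard two-liner:

```python
# Standard way to mount Google Drive inside a Colab notebook:
from google.colab import drive
drive.mount('/content/drive')   # after authorizing, the copied data appears
                                # under /content/drive/My Drive/
```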

5- Data representation

This model has been built on the CNN/Daily Mail dataset, which provides multi-sentence summaries for each story.

The data is fed to the model by first running it through a script that converts it into chunked binary files, which are then provided to the model.

I have modified this script (my modification) to make it easier to use, in case you need to preprocess your own data.

The original script expects the data in a .story format, a file that contains both the text and its summary, so I edited it to be much simpler; you can now provide your data to my script in CSV format.

I have also replaced the dependency on a specific Java tool (Stanford CoreNLP) for tokenization with the simpler nltk tokenizer (hope this proves helpful), roughly along the lines of the sketch below.
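As a rough idea of what that CSV preprocessing looks like: the column names and file path below are assumptions, and the actual modified script may differ.

```python
# Rough idea of the CSV preprocessing; column names and file path are assumptions.
import csv
from nltk.tokenize import word_tokenize   # requires nltk.download('punkt') once

with open("my_data.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):                    # assumed columns: "text", "summary"
        article_tokens = word_tokenize(row["text"])
        summary_tokens = word_tokenize(row["summary"])
        # each (article, summary) pair is then written out in the chunked binary
        # format the pointer-generator model expects
```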

6- EazyMind API

If you want to try out this model before diving into the code, you can easily do so through eazymind, a free AI-as-a-service platform that hosts this pointer generator model for abstractive text summarization.

You can also register for free and call this model as an API, either through curl

or through the Python package,

and then simply call it.
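As a heavily hedged illustration only: the endpoint URL, parameter names, and response format below are placeholders, not eazymind's documented API; check the docs after registering for a free key.

```python
# Heavily hedged sketch: the endpoint URL, parameter names, and response format
# are placeholders, not eazymind's documented API.
import requests

API_KEY  = "your-eazymind-key"                            # obtained after free registration
ENDPOINT = "https://example-eazymind-endpoint/summarize"  # hypothetical URL

resp = requests.post(ENDPOINT, data={
    "key": API_KEY,
    "sentence": "In last night's game, Germany beat Argentina 3-2.",
})
print(resp.json())   # hypothetically contains the generated summary
```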

Next time, if God wills it, we will go through

  • the novel methods that combine reinforcement learning with seq2seq for abstractive text summarization

(more on different implementations for seq2seq for text summarization)

All the code for this tutorial is available as open source here.

I truly hope you have enjoyed reading this tutorial, and I hope I have made these concepts clear. All the code for this series of tutorials can be found here; you can simply use Google Colab to run it. Please review the tutorial and the code and tell me what you think about it, and don't forget to try out eazymind for free text summarization. Hope to see you again!
