Build an Abstractive Text Summarizer in 94 Lines of TensorFlow!! (Tutorial 6)

amr zaki
HackerNoon.com
Apr 16, 2019


This tutorial is the sixth in a series that would help you build an abstractive text summarizer using TensorFlow. Today we would build the abstractive text summarizer itself in TensorFlow, in an optimized way.

Today we would go through one of the most optimized models that has been built for this task. This model was written by dongjun-Lee (this is the link to his model). I have used his model on different datasets (in different languages) and it gave truly amazing results, so I would truly like to thank him for his effort.

I have made multiple modifications to the model to enable it to run seamlessly on Google Colab (link to my model), and I have hosted the data on Google Drive (more on how to link Google Drive to Google Colab). So there is no need to download either the code or the data; you only need a Google Colab session to run the code, copy the data from my Google Drive to yours (more on this), and connect Google Drive to your Google Colab notebook.

EazyMind: free AI-as-a-service for text summarization

I have added this text summarization model to the website eazymind, so that you can actually try generating your own summaries yourself (and see what you would be able to build). It can be called through simple API calls, and through a Python package, so that text summarization can be easily integrated into your application without the hassle of setting up the TensorFlow environment. You can register for free, and enjoy using this API for free.

0- Intro

0-A About Series

This is a series of tutorials that would help you build an abstractive text summarizer using TensorFlow, using multiple approaches. We call it abstractive because we teach the neural network to generate words, not to merely copy words.

We have covered so far (the code for this series can be found here):

0. Overview on the free ecosystem for deep learning (how to use google colab with google drive)

  1. Overview on the text summarization task and the different techniques for the task
  2. Data used and how it could be represented for our task (prerequisites for this tutorial)
  3. What is seq2seq for text summarization and why
  4. Multilayer Bidirectional LSTM/GRU
  5. Beam Search & Attention for text summarization

0-B About the Data Used

The data we would use consists of news articles and their headlines. It can be found on my Google Drive, so you can just copy it to your own Google Drive without the need to download it (more on this).

We would represent the data using word embeddings, which simply means converting each word to a specific vector. We would also create a dictionary for our words (more on this) (prerequisites for this tutorial).

0-C About the Model Used

There are different approaches to this task. They are all built over a cornerstone concept, and they keep developing and building on top of it.

Today we would start building this cornerstone implementation: a type of network called an RNN, arranged in an encoder/decoder architecture called seq2seq (more on this). We would build the seq2seq in a multilayer bidirectional structure, where the RNN cell is an LSTM cell (more on this). Then we would add an attention mechanism to better interface the encoder with the decoder (more on this), and to generate better output we would use the ingenious concept of beam search (more on this).

The code for all these different approaches can be found here.

So let's get started!

Model Structure

Our model can be seen as structured into different blocks. These blocks are:

Initialization Block:

Here we would initialize the needed TensorFlow placeholders & variables, and define the RNN cell that would be used throughout the model.

Embedding Block:

Here we would define the embedding matrix used in both the encoder & the decoder.

Encoder Block:

Here we would define the multilayer bidirectional RNN (more on this) that forms the encoder part of our model, and output the encoder state as an input to the decoder part.

Decoder Block:

Here the decoder is actually divided into 2 distinct parts:

  1. Attention mechanism (more on this), which is used to better interface the encoder with the decoder; this would be used in the training phase
  2. Beam search (more on this), which is used to generate better output from our model; this would be used in the testing phase

Loss Block:

This block would only be used in the training phase. Here we would apply clipping to our gradients, and actually run our optimizer (the Adam optimizer is used here); this is where we would apply our gradients to the optimizer.

1- Initialization Block

First we would need to import the libraries that we would use.
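A minimal import, assuming TensorFlow 1.x (the tf.contrib modules used in the sketches below were removed in TensorFlow 2.x):

```python
# TensorFlow 1.x is assumed throughout the sketches in this post
import tensorflow as tf
```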

Before building our Model class, we need to define some TensorFlow concepts first.

So we tend to define placeholders, and variables, as follows.
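A tiny illustrative sketch of both concepts; the names here are invented for the example, not taken from the original code:

```python
# placeholders are graph inputs that we feed at run time
batch_size = tf.placeholder(tf.int32, (), name="batch_size")   # a scalar
X = tf.placeholder(tf.int32, [None, None], name="X")           # article word ids (batch, time)

# variables are trainable tensors that live (and get updated) inside the graph
with tf.variable_scope("example"):
    projection_w = tf.get_variable(
        "projection_w", shape=[300, 10000],
        initializer=tf.glorot_uniform_initializer())
```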

Then let's build our Model class.

We would pass an object called args that would actually contain multiple parameters:

  1. embedding size (size of word2vector)
  2. num_hidden (size of RNN)
  3. num_layers (layers of RNN) (more on this)
  4. Learning Rate
  5. BeamWidth (more on this)
  6. Keep Prob

We would also need to initialize the model with other parameters like:

  1. reversed dict (a dict whose keys are numbers, where each number points to a specific word) (more on how to build your reversed dict)
  2. article_max_len & article_summary_len (max length of an article sentence as input, and max length of a summary sentence as output)
  3. forward_only (bool value to indicate training or testing phase) (forward_only = False → training phase)

Then we continue the initialization by defining our placeholders, our RNN cell, and a projection layer that maps the decoder output back onto the vocabulary.
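A minimal sketch of how such a class could start; the parameter and attribute names follow the description above but are assumptions, so they may differ slightly from the original code:

```python
class Model(object):
    def __init__(self, reversed_dict, article_max_len, article_summary_len,
                 args, forward_only=False):
        self.vocabulary_size = len(reversed_dict)
        self.embedding_size = args.embedding_size
        self.num_hidden = args.num_hidden
        self.num_layers = args.num_layers
        self.learning_rate = args.learning_rate
        self.beam_width = args.beam_width
        # dropout is only applied while training
        self.keep_prob = args.keep_prob if not forward_only else 1.0

        # the RNN cell type used throughout the model
        self.cell = tf.nn.rnn_cell.BasicLSTMCell

        # projection layer that maps decoder outputs back onto the vocabulary
        with tf.variable_scope("decoder/projection"):
            self.projection_layer = tf.layers.Dense(self.vocabulary_size, use_bias=False)

        # placeholders fed at run time
        self.batch_size = tf.placeholder(tf.int32, (), name="batch_size")
        self.X = tf.placeholder(tf.int32, [None, article_max_len])        # articles (word ids)
        self.X_len = tf.placeholder(tf.int32, [None])                     # article lengths
        self.decoder_input = tf.placeholder(tf.int32, [None, article_summary_len])
        self.decoder_len = tf.placeholder(tf.int32, [None])
        self.decoder_target = tf.placeholder(tf.int32, [None, article_summary_len])
        self.global_step = tf.Variable(0, trainable=False)
        # ... the embedding / encoder / decoder / loss blocks below continue from here
```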

2- Embedding Block:

Here we would represent both our input articles (which would become the embedded encoder inputs) and the decoder inputs using word2vector (more on this).

We would define our variables for embedding in a variable scope that we would name "embedding".
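A sketch of the embedding block under these assumptions (random initialization is shown; pre-trained word2vector values could be loaded into init_embeddings instead):

```python
        with tf.variable_scope("embedding"):
            # one row per word in the dictionary
            init_embeddings = tf.random_uniform(
                [self.vocabulary_size, self.embedding_size], -1.0, 1.0)
            self.embeddings = tf.get_variable("embeddings", initializer=init_embeddings)

            # look up the article words (encoder input) and summary words (decoder input);
            # transpose to time-major (time, batch, embedding) because the RNN below
            # uses time_major=True
            self.encoder_emb_inp = tf.transpose(
                tf.nn.embedding_lookup(self.embeddings, self.X), perm=[1, 0, 2])
            self.decoder_emb_inp = tf.transpose(
                tf.nn.embedding_lookup(self.embeddings, self.decoder_input), perm=[1, 0, 2])
```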

3- Encoder Block:

Here we would actually define the multilayer bidirectional LSTM for the encoder part of our seq2seq (more on this). We would define our variables here in a name scope that we would call “encoder”.

Here we would use the concept of dropout. We would apply it after each cell in our architecture; it randomly keeps only a subset of the units active (dropping the rest), and is used during training for regularization.
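A sketch of those dropout-wrapped forward and backward cells, one cell per layer (keep_prob was already set to 1.0 for the testing phase in the initialization sketch above):

```python
        with tf.name_scope("encoder"):
            fw_cells = [tf.nn.rnn_cell.DropoutWrapper(
                            self.cell(self.num_hidden), output_keep_prob=self.keep_prob)
                        for _ in range(self.num_layers)]
            bw_cells = [tf.nn.rnn_cell.DropoutWrapper(
                            self.cell(self.num_hidden), output_keep_prob=self.keep_prob)
                        for _ in range(self.num_layers)]
```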

Now after defining the forward and backward cells, we would need to actually connect them together to form the bidirectional structure. For this we would use stack_bidirectional_dynamic_rnn, which takes all of the following parameters as its inputs (see the sketch after this list):

  1. forward cells
  2. Backward Cells
  3. Encoder emb input (input articles in word2vector format)
  4. X_len (length of articles)
  5. time_major = True (this is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation)
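A sketch of that call, continuing with the fw_cells / bw_cells from the sketch above (still inside the encoder scope):

```python
            encoder_outputs, encoder_state_fw, encoder_state_bw = \
                tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
                    fw_cells, bw_cells,
                    self.encoder_emb_inp,          # articles in word-vector form, time-major
                    sequence_length=self.X_len,    # real (unpadded) length of each article
                    time_major=True,
                    dtype=tf.float32)
```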

Now we would need to actually use the output from this stack_bidirectional_dynamic_rnn function. We mainly need 2 outputs:

  1. encoder_output (would be used in attention calculation) (more on attention)
  2. encoder_state (would be used for the initial state of the decoder)

So to get encoder_output, we simply take the outputs returned by stack_bidirectional_dynamic_rnn (they already contain the concatenated forward & backward outputs).

Then, to get encoder_state, we would combine the (encoder_state_c) & (encoder_state_h) of both the forward & backward states using LSTMStateTuple.
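A sketch of both steps. Here the state of the top stacked layer is used; the exact indexing in the original code may differ:

```python
            # the returned outputs already hold the concatenated forward/backward outputs;
            # transpose back to batch-major because attention expects (batch, time, units)
            self.encoder_output = tf.transpose(encoder_outputs, perm=[1, 0, 2])

            # build one LSTM state out of the forward & backward states of the top layer
            encoder_state_c = tf.concat(
                (encoder_state_fw[-1].c, encoder_state_bw[-1].c), axis=1)
            encoder_state_h = tf.concat(
                (encoder_state_fw[-1].h, encoder_state_bw[-1].h), axis=1)
            self.encoder_state = tf.nn.rnn_cell.LSTMStateTuple(
                c=encoder_state_c, h=encoder_state_h)
```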

4- Decoder Block:

Here the decoder is divided into 2 parts

  1. Training part (to train attention model) (more on attention model)
  2. Testing/running part (for attention & beam search) (more on beam search)

So let's first define our name scope & variable scope for both parts; we would also define a multilayer cell structure that would be used for both parts.
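A sketch of the shared decoder scope and cell. For simplicity, this sketch uses a single LSTM cell of size 2 * num_hidden (so that the concatenated encoder state can be fed to it directly as an initial state) instead of a stacked multilayer cell:

```python
        with tf.name_scope("decoder"), tf.variable_scope("decoder") as decoder_scope:
            # the cell size matches the concatenated forward/backward encoder state
            decoder_cell = self.cell(self.num_hidden * 2)
```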

4.a Training Part (Attention Model)

First we need to prepare our attention structure , here we would use BahdanauAttention

encoder_output would be used inside the attention calculation (more on attention model)

Then we would further define the decoder cell (in the first step of the decoder we just defined it as a plain cell; now we would add attention). To do this we would use AttentionWrapper, which combines the attention_mechanism with the decoder cell (see the sketch below).
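A sketch of the training-time attention, built inside the decoder scope only when forward_only is False (the training phase):

```python
            if not forward_only:   # training phase
                # Bahdanau (additive) attention over the encoder output
                attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
                    num_units=self.num_hidden * 2,
                    memory=self.encoder_output,
                    memory_sequence_length=self.X_len)   # mask out the article padding

                # wrap the decoder cell so every decoding step also computes attention
                attn_decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
                    decoder_cell, attention_mechanism,
                    attention_layer_size=self.num_hidden * 2)
```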

Now we would need to define the inputs to the decoder cell , this input actually comes from 2 sources (more on seq2seq)

  1. encoder output (used within initial step)
  2. decoder input (summary sentence in the training phase)

So let's first define the initial state that would come from the encoder.
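A sketch of that initial state, still inside the training branch:

```python
                # start the attention-wrapped cell from the encoder's final state
                initial_state = attn_decoder_cell.zero_state(
                    dtype=tf.float32, batch_size=self.batch_size)
                initial_state = initial_state.clone(cell_state=self.encoder_state)
```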

Now we would combine the initial state with the decoder input (the summary sentence). To use the BasicDecoder, we need to provide the decoder input through a helper; this helper combines (decoder_emb_inp, decoder_len) together.

Now, for the last step of the training phase, we would need to define the outputs (logits) from the decoder, to be used within the loss block for training.
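A sketch of the helper, the BasicDecoder, and the resulting logits (still inside the training branch):

```python
                # the helper feeds the ground-truth summary words to the decoder step by step
                helper = tf.contrib.seq2seq.TrainingHelper(
                    self.decoder_emb_inp,    # embedded summary words, time-major
                    self.decoder_len,        # real length of each summary
                    time_major=True)

                decoder = tf.contrib.seq2seq.BasicDecoder(
                    attn_decoder_cell, helper, initial_state)

                outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
                    decoder, output_time_major=True, scope=decoder_scope)

                # project onto the vocabulary and go back to batch-major:
                # logits has shape (batch, time, vocabulary)
                self.logits = tf.transpose(
                    self.projection_layer(outputs.rnn_output), perm=[1, 0, 2])
```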

4.b Testing/Running part (Attention & Beam search)

Here in this phase , there are 2 main goals

  1. tile (replicate) the encoder output, encoder state & X_len (article length) beam_width times, so that beam search can track multiple hypotheses per input (more on beam search)
  2. build a decoder that is independent of the decoder input, since in the testing phase we don't have the summary sentence as our input, so we would need to build the decoder in a different way than above

First, let's tile the encoder output, encoder state & X_len (article length) beam_width times; here we would use the beam_width variable that was already defined above.

Then let's define the attention mechanism (just like before, but using the tiled tensors).
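A sketch of both steps, in the testing branch (when forward_only is True), using tf.contrib.seq2seq.tile_batch; it also rebuilds the decoder initial state over the tiled tensors:

```python
            else:   # testing / running phase
                # replicate each encoder tensor beam_width times
                tiled_encoder_output = tf.contrib.seq2seq.tile_batch(
                    self.encoder_output, multiplier=self.beam_width)
                tiled_encoder_state = tf.contrib.seq2seq.tile_batch(
                    self.encoder_state, multiplier=self.beam_width)
                tiled_X_len = tf.contrib.seq2seq.tile_batch(
                    self.X_len, multiplier=self.beam_width)

                # the same Bahdanau attention as in training, but over the tiled memory
                attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
                    num_units=self.num_hidden * 2,
                    memory=tiled_encoder_output,
                    memory_sequence_length=tiled_X_len)
                attn_decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
                    decoder_cell, attention_mechanism,
                    attention_layer_size=self.num_hidden * 2)

                # one initial state per (example, beam) pair
                initial_state = attn_decoder_cell.zero_state(
                    dtype=tf.float32, batch_size=self.batch_size * self.beam_width)
                initial_state = initial_state.clone(cell_state=tiled_encoder_state)
```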

Then let's define our decoder. Here we would use the BeamSearchDecoder, which takes into consideration all of the following (see the sketch after this list):

  1. Decoder cell (previously defined)
  2. Embedding word2vector (defined in embedding part)
  3. projection layer (defined in the beginning of class)
  4. decoder initial state (previously defined)
  5. beam_width (user defined)
  6. start token & end token
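A sketch of that decoder. The start and end token ids used here (2 and 3) are assumptions; they should be whatever ids your dictionary assigns to the start-of-summary and end-of-summary tokens:

```python
                decoder = tf.contrib.seq2seq.BeamSearchDecoder(
                    cell=attn_decoder_cell,
                    embedding=self.embeddings,                   # the word2vector matrix
                    start_tokens=tf.fill([self.batch_size], 2),  # assumed start-token id
                    end_token=3,                                 # assumed end-token id
                    initial_state=initial_state,
                    beam_width=self.beam_width,
                    output_layer=self.projection_layer)
```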

Then all that is left to do is to define the outputs, which directly reflect the real output of the whole seq2seq architecture, as this phase is where the prediction is actually computed.
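A sketch of running the beam-search decoder and reading off the predicted word ids (still inside the testing branch):

```python
                outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
                    decoder,
                    output_time_major=True,
                    maximum_iterations=article_summary_len,   # never decode past the max summary length
                    scope=decoder_scope)

                # predicted_ids is (time, batch, beam_width); rearrange to (batch, beam_width, time)
                # so that prediction[i][0] is the best summary for article i
                self.prediction = tf.transpose(outputs.predicted_ids, perm=[1, 2, 0])
```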

5- Loss Block:

This block is where training actually occurs, through multiple steps:

  1. calculating loss (more on loss calculation)
  2. calculating gradients and applying clipping on gradients (more on exploding gradients)
  3. applying optimizer (here we would use Adam optimizer)

First we define our name scope, and we would specify that this block is only built during the training phase.

Second we would calculate the loss (more on loss calculation)
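One possible way to compute a length-masked cross-entropy loss, using tf.contrib.seq2seq.sequence_loss; the original code may align the logits and targets differently:

```python
        with tf.name_scope("loss"):
            if not forward_only:   # the loss block only exists in the training graph
                # crop the fixed-length targets to the number of steps the decoder actually ran
                time_steps = tf.shape(self.logits)[1]
                targets = self.decoder_target[:, :time_steps]

                # 1 for real summary positions, 0 for padding, so padding adds nothing to the loss
                masks = tf.sequence_mask(self.decoder_len, time_steps, dtype=tf.float32)

                self.loss = tf.contrib.seq2seq.sequence_loss(
                    logits=self.logits, targets=targets, weights=masks)
```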

Third, we would calculate our gradients and apply clipping on them to solve the problem of exploding gradients (more on exploding gradients).
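Continuing inside the loss block, a sketch of the gradient clipping (the clipping threshold of 5.0 is an assumed value):

```python
                # compute the gradients of the loss w.r.t. every trainable variable,
                # then rescale them whenever their global norm exceeds the threshold
                params = tf.trainable_variables()
                gradients = tf.gradients(self.loss, params)
                clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=5.0)
```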

(from tutorial 4)

Exploding gradients: this occurs with deep networks (i.e. networks with many layers, like in our case). When we apply backpropagation, the gradients can get too large. This problem can be solved rather easily using the concept of gradient clipping, which is simply setting a specific threshold; when the gradients exceed it, we clip them to a certain value.

Fourth, we would apply our optimizer. Here we would use the Adam optimizer with the previously defined learning_rate.
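And a sketch of that final step, still inside the loss block:

```python
                # apply the clipped gradients with Adam, advancing global_step on every update
                optimizer = tf.train.AdamOptimizer(self.learning_rate)
                self.update = optimizer.apply_gradients(
                    zip(clipped_gradients, params), global_step=self.global_step)
```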

Next time, if GOD wills it, we would go through:

  1. the code needed to divide our data into batches
  2. the code needed to use this model for training

Then, after we are done with this core model implementation, if GOD wills it, we would go through other modern implementations for text summarization like:

  1. pointer generator
  2. Using reinforcement learning with seq2seq

(more on different implementations for seq2seq for text summarization)

All the code for this tutorial is available as open source here.

I truly hope you have enjoyed reading this tutorial, and I hope I have made these concepts clear. All the code for this series of tutorials can be found here; you can simply use Google Colab to run it. Please review the tutorial and tell me what you think about it. Hope to see you again.
