Text summarizer using deep learning made easy
In this series we will discuss a truly exciting natural language processing topic: using deep learning techniques to summarize text. The code for this series is open source and comes in Jupyter notebook format, so it can run on Google Colab without the need for a powerful GPU. All the data is open source as well, and you don’t have to download it locally: you can connect Google Colab to Google Drive and put your data directly onto Google Drive. Read this blog to learn more about using Google Colab with Google Drive.
To summarize text you have 2 main approaches (I truly like how it is explained in this blog):
1. Extractive method, which chooses specific key words from the input to generate the output. This approach tends to work, but it won’t output correctly structured sentences, as it just selects words from the input and copies them to the output without actually understanding the sentences. Think of it as a highlighter.
2. Abstractive method, which builds a neural network to truly work out the relation between the input and the output, not merely copy words. This series will go through this method. Think of it as a pen.
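To make the highlighter idea concrete, here is a minimal sketch of a naive extractive step. It assumes plain word frequency as the importance score, which is only my simplification for illustration; the function name `highlight_words` and the tiny stop-word list are hypothetical.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "by"}

def highlight_words(text, top_k=3):
    """Return the top_k most frequent non-stop words, in their original order."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    counts = Counter(words)
    # Remember where each word first appears, to break ties and restore order.
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)
    chosen = sorted(counts, key=lambda w: (-counts[w], first_pos[w]))[:top_k]
    # Emit the chosen words in the order they appear in the input.
    return sorted(chosen, key=lambda w: first_pos[w])

text = "The team won the match. The match was won by the home team in extra time."
print(highlight_words(text))  # ['team', 'won', 'match']
```

Notice that the output is just a handful of copied words, not a well-formed sentence, which is exactly the weakness of the extractive method described above.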
This series is made for whoever feels excited to learn the power of building a deep network that is capable of
- analyzing sequences of input
- understanding text
- outputting sequences of output in the form of summaries
Hence the name seq2seq (sequence of inputs to sequence of outputs), which is the main algorithm used here.
This series will go into detail on how to
- Build your deep learning network online without the need for a powerful computer
- Access your datasets online, without the need to download them to your computer
- Build TensorFlow networks to address the task
Much research has been done on this task over the last couple of years. I am currently researching these new approaches, and in this series we will go through some of them.
This series implements its code using Google Colab, so there is no need for a powerful computer to implement these ideas. I am currently working on converting the most recent research into Google Colab notebooks, so that researchers can try it out without powerful GPUs. All the data can also be used without downloading it, as we will use Google Drive with Google Colab. Read this blog to learn more about how you can work in the Google ecosystem for deep learning.
All the code will be available on this GitHub repo, which contains modifications of some open source implementations of text summarization.
This research mainly includes:
1. Implementations using a seq2seq encoder (bidirectional LSTM) and decoder (with attention). This is a crucial implementation, as it is the cornerstone of all the recent research. For now I have collected different approaches that implement this concept.
2. Another implementation that I have found truly interesting combines creating new sentences for the summary with copying from the source input. This method is called the pointer generator; here is my modification, in a Google Colab notebook, of the original implementation.
3. Other implementations that I am still researching use reinforcement learning with deep learning.
This series is built to be easily understandable for any newbie like myself, as you might be the one who introduces the newest architecture that becomes the new standard for text summarization. So let's begin!
The following is a quick overview of the series. I hope you enjoy it.
I have added a text summarization model to a website, eazymind, so that you can try generating your own summaries (and see what you would be able to build). It can be called through simple API calls and through a Python package, so that text summarization can be easily integrated into your application without the hassle of setting up a TensorFlow environment. You can register for free and enjoy using this API for free.
1 - Building your deep network online
We will be using Google Colab for our work, which lets us use its free GPU time to build our network (this blog gives you even more insights into the free ecosystem for your deep learning project).
You have 2 main options for building your Google Colab notebook:
- Build a new empty colab
- Build from github , you can use this repo , which is a collection of different
You can find the details on how to do this in this blog.
Having your code on Google Colab enables you to
- connect to Google Drive (and put your datasets onto Google Drive)
- use free GPU time
You can find how to connect to Google Drive in this blog.
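As a quick taste, connecting a notebook to Google Drive can be sketched as follows. The `google.colab` module only exists inside Colab, so I wrap the import in a guard; the helper name `mount_drive` and the `summarization_data` folder are hypothetical placeholders of mine, and the exact Drive folder layout may differ on your account.

```python
def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True on success."""
    try:
        from google.colab import drive  # only importable inside Google Colab
    except ImportError:
        print("Not running inside Google Colab; skipping the Drive mount.")
        return False
    drive.mount(mount_point)  # shows an authorization prompt in the notebook
    return True

if mount_drive():
    # Hypothetical folder name; replace it with wherever you keep your data.
    data_dir = "/content/drive/MyDrive/summarization_data"
    print("Datasets are expected under:", data_dir)
```

Once mounted, the files on your Drive can be read like any local files, which is why nothing needs to be downloaded to your own computer.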
2 - Let's represent words
Since our task is an NLP task, we need a way to represent words. There are 2 main approaches, which we will discuss:
- either providing the network with a representation for each word. This is called word embedding, which simply means representing a word by an array of numbers. There are multiple pre-trained word embeddings available online; one of them is GloVe vectors
- or letting the network learn the representations by itself
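To show what "an array of numbers per word" looks like in practice, here is a small sketch of the first approach. It assumes the usual GloVe text layout (one word per line, followed by its space-separated numbers); the helper names `load_embeddings` and `embed` are hypothetical, and the toy vectors are made up.

```python
import io

def load_embeddings(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into a dict of vectors."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

def embed(sentence, vectors, dim=3):
    """Map each word to its vector; unknown words get a zero vector."""
    return [vectors.get(w, [0.0] * dim) for w in sentence.lower().split()]

# Toy stand-in for a GloVe file; the numbers are made up for illustration.
toy_file = io.StringIO(
    "news 0.1 0.3 -0.2\n"
    "summary 0.4 -0.1 0.0\n"
)
vectors = load_embeddings(toy_file)
print(embed("News summary today", vectors))
```

With a real pre-trained file you would pass an open file handle instead of `toy_file` and set `dim` to the vector size of that file. In the second approach there is no such file: the network starts from random vectors and adjusts them during training.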
3 - The datasets used
For this task we will use a dataset of news articles paired with short summaries such as headlines; the most popular is the CNN/Daily Mail dataset. The news body is used as the input for our model, while the accompanying short summary (the headline, or the highlights in the case of CNN/Daily Mail) is used as the target output.
These datasets can easily be found online. We will use 2 main approaches for working with them:
- using the raw data itself, and manually applying preprocessing to it
- using a preprocessed version of the data, which is what the most recent research currently uses
4 - Models used
Here I will briefly talk about the models that will be included, if GOD wills, in the coming series. I hope you enjoy it.
A. Seq2seq encoder (bidirectional LSTM) and decoder (with attention)
To implement this task, researchers use a deep learning model that consists of 2 parts: an encoder, which understands the input and turns it into an internal representation, and a decoder, which receives that representation and generates the output.
The main deep learning network used for these 2 parts is an LSTM, which stands for long short-term memory and is a modification of the RNN.
In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism. More on this later.
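Until we get to the details, here is a rough sketch of what attention does at a single decoder step. It uses plain dot-product scoring, which is a simplification I chose for illustration (real implementations often use a learned scoring function), and the encoder and decoder vectors are made-up toy values.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention: weight each encoder state by how well it matches the decoder state."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    # The context vector is the attention-weighted sum of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy encoder outputs, one vector per input word (values are made up).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print("attention weights:", weights)
print("context vector:", context)
```

The weights tell the decoder which input words to focus on for the word it is about to produce; here the first and third input positions get most of the attention because they match the decoder state best.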
B. Pointer generator
But researchers found 2 main problems with the above implementation, as discussed in this ACL 2017 paper, Get To The Point: Summarization with Pointer-Generator Networks (they have a truly amazing blog you need to see).
These problems are:
- the inability of the network to copy facts (like names and match scores): it doesn’t copy words, it generates them, so it is sometimes incapable of reproducing facts correctly
- repetition of words
This research tackles these 2 main problems and tries to fix them. I have modified their repo to work inside a Jupyter notebook on Google Colab.
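The copying idea can be sketched in a few lines. As described in the paper, the final word distribution mixes the decoder's vocabulary distribution with the attention over the source words, weighted by a generation probability `p_gen`. The function name `final_distribution` and all the numbers below are hypothetical toy values, not taken from the original repo, and the paper's fix for repetition (the coverage mechanism) is not shown here.

```python
def final_distribution(p_gen, vocab_dist, attention, source_words):
    """Mix the generation distribution with a copy distribution over source words."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for word, a in zip(source_words, attention):
        # Copying adds attention mass to the source word, even if it is out of vocabulary.
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * a
    return final

# Toy values: "argentina" and "2-0" are not in the decoder vocabulary.
vocab_dist = {"germany": 0.5, "beat": 0.4, "won": 0.1}
source_words = ["germany", "beat", "argentina", "2-0"]
attention = [0.1, 0.1, 0.3, 0.5]

dist = final_distribution(0.5, vocab_dist, attention, source_words)
for word, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(word, round(prob, 2))
```

Even though the match score "2-0" is not in the vocabulary, it ends up with a real probability because it can be copied from the input, which is how facts such as names and scores survive into the summary.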
C. Using Reinforcement learning with deep learning
I am still researching this work, but it is truly interesting research. It is about combining two fields together: it actually uses the pointer generator in its work (as in implementation B) and uses the same preprocessed version of the data.
This is the research, and it uses this repo for its code.
They are trying to fix 2 main problems with the cornerstone implementation, which are:
- During training, the decoder uses (1) the output from the encoder, (2) the actual summary, and (3) its current output for the next step. During testing there is no ground-truth summary, as the summary is exactly what needs to be generated, so the decoder only uses (1) the output from the encoder and (2) its current output for the next step. This mismatch causes the exposure bias problem.
- Training relies on a different metric from the one used in testing: the network is trained with the cross-entropy loss, while testing (as discussed below) uses non-differentiable measures such as BLEU and ROUGE.
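The first problem is easier to see with a toy example. Below, a hypothetical lookup table stands in for a trained decoder (a real decoder is a neural network, so this is only a sketch under that assumption). The table makes one mistake: after "the" it predicts "dog" instead of "cat".

```python
# Hypothetical stand-in for a trained decoder: previous word -> predicted next word.
toy_model = {"<s>": "the", "the": "dog", "cat": "sat", "sat": "down",
             "dog": "ran", "ran": "away"}
reference = ["the", "cat", "sat", "down"]

def decode_teacher_forced(model, reference, start="<s>"):
    """Training-style decoding: each step sees the true previous word."""
    inputs = [start] + reference[:-1]
    return [model[prev] for prev in inputs]

def decode_free_running(model, length, start="<s>"):
    """Test-style decoding: each step sees the model's own previous output."""
    outputs, prev = [], start
    for _ in range(length):
        prev = model[prev]
        outputs.append(prev)
    return outputs

print(decode_teacher_forced(toy_model, reference))     # ['the', 'dog', 'sat', 'down']
print(decode_free_running(toy_model, len(reference)))  # ['the', 'dog', 'ran', 'away']
```

With the true summary as a guide, the single mistake stays a single mistake. Without it, the wrong word is fed back in and every later word goes wrong too, a situation the model never faced during training.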
I am currently working on implementing this approach in a Jupyter notebook, so if GOD wills it, you will see more updates on this in the near future.
5 - Summary Evaluation
To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply try to find the words in common between the generated summary and the reference summary: the more, the better. Most of the above approaches reach ROUGE scores of 32 to 38.
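As a rough illustration of the word-overlap idea, here is a minimal ROUGE-1 style sketch. It only does lowercase whitespace tokenization, which is my simplification (the official ROUGE tooling does more, such as optional stemming), and the function name `rouge_1` and the example sentences are made up.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F1 between a candidate and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared words
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

reference = "germany beat argentina in the final"
candidate = "germany won the final"
precision, recall, f1 = rouge_1(candidate, reference)
print(precision, recall, f1)
```

Here 3 of the 4 generated words appear in the reference (precision 0.75) and 3 of the 6 reference words are covered (recall 0.5). Counting words like this has no gradient, which is why these measures cannot be used directly as a training loss.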
I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer, without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don’t forget to check out the code for these blogs.
In the coming blogs, if GOD wills it, I will go through the details of building the cornerstone implementation, which all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and manually apply preprocessing.
In later blogs, if GOD wills it, we will go through modern approaches, like how to create a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.
Next Tutorials
- Text Summarization made easy , Text Representation (tutorial 2)
- What is seq2seq for text summarization and why (tutorial 3)
- Multilayer Bidirectional LSTM/GRU for text summarization made easy (tutorial 4)
- Beam Search & Attention for text Summarization made Easy (Tutorial 5)
- Build an Abstractive Text Summarizer in 94 Lines of Tensorflow !! (Tutorial 6)
- Combination of Abstractive & Extractive methods for Text Summarization (Tutorial 7)