Text summarizer using deep learning made easy

Published in HackerNoon.com, Jan 11, 2019 (8 min read)

In this series we will discuss a truly exciting natural language processing topic: using deep learning techniques to summarize text. The code for this series is open source and comes in Jupyter notebook format, so it can run on Google Colab without the need for a powerful GPU. All the data is open source as well, and you don't have to download it locally: you can connect Google Colab to Google Drive and put your data directly onto Google Drive. Read this blog to learn more about using Google Colab with Google Drive.

To summarize text you have 2 main approaches (I truly like how it is explained in this blog):

1. Extractive method: choosing specific main words from the input to generate the output. This approach tends to work, but it won't output correctly structured sentences, as it just selects words from the input and copies them to the output without actually understanding the sentences. Think of it as a highlighter.

2. Abstractive method: building a neural network to truly work out the relation between the input and the output, not merely copying words. This series will go through this method. Think of it like a pen.
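
To make the "highlighter" idea concrete, here is a minimal sketch of an extractive summarizer. It is not the method used in this series: it is a simple word-frequency heuristic chosen for illustration, it picks whole sentences rather than single words, and the function name `extractive_summary` is hypothetical.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences verbatim (the 'highlighter')."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole text, lowercased.
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / max(len(words), 1)

    # Rank sentences by score, then restore the original order of the winners.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

text = "The cat sat on the mat. The cat ate the fish. Dogs bark."
print(extractive_summary(text))  # The cat ate the fish.
```

Nothing in the output is newly written: every sentence is copied from the input, which is exactly the limitation the abstractive method tries to overcome.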

This series is made for whoever feels excited to learn the power of building a deep network that is capable of

analyzing sequences of input
understanding text
outputting sequences of output in the form of summaries

hence the name seq2seq (sequence of inputs to sequence of outputs), which is the main algorithm used here.

This series will go into detail on how to

build your deep learning network online without the need for a powerful computer
access your datasets online without the need to download them to your computer
build TensorFlow networks to address the task

A lot of research has been done over the last couple of years. I am currently studying these new approaches, and in this series we will go through some of them.

This series implements its code using Google Colab, so there is no need for a powerful computer to try these ideas. I am currently working on converting the most recent research into Google Colab notebooks so that researchers can try it out without powerful GPUs. All the data can also be used without downloading it, as we will use Google Drive with Google Colab. Read this blog to learn more about how you can work within the Google ecosystem for deep learning.

All the code will be available on this GitHub repo, which contains modifications of some open source implementations of text summarization.

This research mainly includes

1. Implementations using a seq2seq encoder (bidirectional LSTM) and decoder (with attention).

This is a crucial implementation, as it is the cornerstone of all the recent research. For now I have collected different approaches that implement this concept.

2. Another implementation that I have found truly interesting combines creating new sentences for summarization with copying from the source input. This method is called pointer generator. Here is my modification, in a Google Colab, of the original implementation.

3. Other implementations, which I am still researching, use reinforcement learning together with deep learning.

This series is built to be easily understandable for any newbie like myself, as you might be the one who introduces the newest architecture that becomes the new standard for text summarization. So let's begin!

The following is a quick overview of the series. I hope you enjoy it.

EazyMind: free AI-as-a-service for text summarization

I have added a text summarization model to the website EazyMind so that you can try generating your own summaries (and see what you would be able to build). It can be called through simple API calls and through a Python package, so that text summarization can be easily integrated into your application without the hassle of setting up the TensorFlow environment. You can register for free and use this API for free.

1 - Building your deep network online

We will be using Google Colab for our work. This enables us to use its free GPU time to build our network (this blog will give you even more insights on the free ecosystem for your deep learning project).

You have 2 main options to build your Google Colab notebook:

Build a new empty Colab
Build from GitHub; you can use this repo, which is a collection of different

You can find the details on how to do this in this blog.

Having your code on Google Colab gives you

a connection to Google Drive (put your datasets onto Google Drive)
free GPU time

You can find how to connect to Google Drive in this blog.
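
As a quick illustration of the Google Drive connection, here is a minimal sketch. `drive.mount` from the `google.colab` package is the call Colab provides for this. The helper name `mount_drive`, the fallback to the current directory outside Colab, and the `MyDrive` folder name (older Colab versions showed it as `My Drive`) are assumptions of this sketch.

```python
import os

def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab and return a data directory."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not running on Colab: fall back to the current directory.
        return os.getcwd()
    # Opens an authorization prompt the first time it runs in a session.
    drive.mount(mount_point)
    # Files you put on Google Drive show up under this folder (assumed name).
    return os.path.join(mount_point, "MyDrive")

data_dir = mount_drive()
print("Reading datasets from:", data_dir)
```

After mounting, datasets stored on Google Drive can be opened with ordinary file paths under `data_dir`, so nothing has to be downloaded to your own computer.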

2 - Let's represent words

Since our task is an NLP task, we need a way to represent words. There are 2 main approaches that we will discuss:

either providing the network with a representation for each word; this is called word embedding, which simply represents a word by an array of numbers. There are multiple pre-trained word embeddings available online, one of them being GloVe vectors
or letting the network learn the representations by itself
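
To show what "an array of numbers per word" looks like in practice, here is a minimal sketch that parses the plain-text format GloVe files use (one word per line, followed by its vector values separated by spaces). The function name `load_glove` and the tiny sample vectors are made up for illustration, and the file path in the comment is only a placeholder.

```python
def load_glove(lines):
    """Parse GloVe's text format into a {word: vector} dictionary."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

# Tiny made-up vectors, only to show the format (real vectors are much longer).
sample = ["king 0.1 0.2 0.3", "queen 0.1 0.25 0.3"]
vectors = load_glove(sample)
print(vectors["king"])  # [0.1, 0.2, 0.3]

# With a real file, pass the open file handle instead (placeholder path):
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
```

In the second approach there is no such file: the network starts from random vectors and adjusts them during training.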

3 - The datasets used

For this task we will use a dataset made of news articles and their headers; the most popular choice is the CNN/Daily Mail dataset. The news body is used as the input to our model, while the header is used as the target summary output.

These datasets can easily be found online. We will use 2 main approaches for working with them:

using the raw data itself and manually applying processing to it
using a preprocessed version of the data, which is the one used in the most recent research
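
For the first option, "manually applying processing" usually means turning raw text into fixed-length sequences of integer ids. The following is a minimal sketch of that idea under my own assumptions: the special token names, the whitespace tokenization, and the `max_size` default are illustrative choices, not the exact pipeline used in the notebooks.

```python
from collections import Counter

PAD, UNK, START, END = "<pad>", "<unk>", "<s>", "</s>"

def build_vocab(texts, max_size=50000):
    """Map the most frequent words to integer ids, after the special tokens."""
    counts = Counter(w for t in texts for w in t.lower().split())
    words = [PAD, UNK, START, END] + [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate(words)}

def encode(text, vocab, max_len):
    """Convert text to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(w, vocab[UNK]) for w in text.lower().split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

vocab = build_vocab(["the cat sat", "the dog"])
print(encode("the bird", vocab, 4))  # [4, 1, 0, 0]
```

Here "the" is the most frequent word, so it gets the first id after the four special tokens, while the unseen word "bird" falls back to the unknown-word id.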

4 - Models used

Here I will briefly talk about the models that will be included, if GOD wills, in the coming series. Hope you enjoy.

A. Cornerstone model

To implement this task, researchers use a deep learning model that consists of 2 parts: an encoder, which understands the input and turns it into an internal representation, and a decoder, which receives that representation and generates the output.

The main deep learning network used for these 2 parts is an LSTM, which stands for long short-term memory and is a modification of the RNN.

In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism. More on this later.
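
Before the detailed posts, here is a rough sketch of what one attention step computes: the decoder scores every encoder state, turns the scores into weights that sum to 1, and takes the weighted average as its context. This is plain Python with a dot-product score chosen for brevity; real implementations typically use a learned scoring function, and the names `softmax` and `attend` are my own.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the weighted average of encoder states."""
    # One score per input position: dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context

# Two input positions with made-up 2-dimensional encoder states.
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)  # the first position gets the larger weight
```

The weights tell the decoder which input words to focus on while producing each output word.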

B. Pointer Generator

But researchers found 2 main problems with the above implementation, as discussed in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks (they have a truly amazing blog you need to see). These problems are

the inability of the network to copy facts (like names and match scores): as it doesn't copy words but generates them, it is sometimes incapable of reproducing facts correctly
repetition of words

This research builds on these 2 main problems and tries to fix them. I have modified their repo to work inside a Jupyter notebook on Google Colab.
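
The copying idea can be sketched in a few lines. In the paper, the final probability of a word is a mix of generating it from the vocabulary and copying it from the source, weighted by a generation probability p_gen: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of the attention weights on the source positions holding w). The code below is a toy illustration of that formula with made-up numbers; the function name and the dictionary-based representation are my own simplifications, and the paper's separate coverage mechanism for repetition is not shown.

```python
def final_distribution(p_gen, vocab_dist, attention, source_words):
    """Mix generating from the vocabulary with copying from the source."""
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    # Copy part: each source position adds its attention weight to its word,
    # so words outside the vocabulary (e.g. names) still get probability.
    for word, a in zip(source_words, attention):
        final[word] = final.get(word, 0.0) + (1.0 - p_gen) * a
    return final

dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"team": 0.6, "won": 0.4},  # made-up decoder output
    attention=[0.7, 0.3],                   # made-up attention weights
    source_words=["Zidane", "won"],
)
print(dist)  # "Zidane" is not in the vocabulary but can still be produced
```

Because "Zidane" receives probability only through the copy part, the network can output a name it has no way to generate from its vocabulary.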

C. Using Reinforcement learning with deep learning

I am still studying this work, but it is truly interesting research. It is about combining two fields together: it actually uses the pointer generator in its work (as in implementation B), and it uses the same preprocessed version of the data.

This is the research; it uses this repo for its code.

They are trying to fix 2 main problems with the cornerstone implementation, which are

during training, the decoder uses (1) the output from the encoder, (2) the actual summary, and (3) its current output for the next step. During testing there is no ground truth, as the summary is exactly what needs to be generated, so the decoder only uses (1) the output from the encoder and (2) its current output for the next step. This mismatch causes the exposure problem (commonly known as exposure bias)
the training of the network relies on a loss that is different from the metric used in testing: training uses the cross-entropy loss, while testing (as discussed below) uses non-differentiable measures such as BLEU and ROUGE

I am currently working on implementing this approach in a Jupyter notebook, so if GOD wills it, you will see more updates concerning this in the near future.
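
The first problem can be seen in a toy example. The sketch below is not a neural network: `toy_step` is a hypothetical stand-in for a decoder that always predicts "previous token + 2" when the correct answer is "previous token + 1". It only shows how feeding the ground truth during training hides the errors that pile up when the model must feed on its own outputs.

```python
def decode(step_fn, start_token, length, targets=None):
    """Run a decoder for `length` steps.

    With `targets` (training), each step is fed the ground-truth previous
    token. Without it (testing), each step is fed the model's own output.
    """
    outputs, prev = [], start_token
    for t in range(length):
        out = step_fn(prev)
        outputs.append(out)
        prev = targets[t] if targets is not None else out
    return outputs

def toy_step(prev):
    """Hypothetical imperfect model: overshoots the right answer by 1."""
    return prev + 2

targets = [1, 2, 3]
print(decode(toy_step, 0, 3, targets=targets))  # [2, 3, 4]: always off by 1
print(decode(toy_step, 0, 3))                    # [2, 4, 6]: errors accumulate
```

With the ground truth fed in, every prediction is off by just 1. Without it, each mistake becomes the input of the next step, so the output drifts further from the target.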

5 - Summary Evaluation

To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply try to find the common words between the generated summary and the reference summary: the more the better. Most of the above approaches reach ROUGE scores between 32 and 38.
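
To give a feel for these measures, here is a minimal sketch of ROUGE-1, which counts the single words shared by a generated summary and a reference summary. It is a simplified illustration (whitespace tokenization, no stemming), not a replacement for the official scoring tools, and the function name `rouge_1` is my own.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Return (precision, recall, F1) of unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = rouge_1("the cat sat", "the cat sat on the mat")
print(p, r)  # 1.0 0.5
```

Every word of the generated summary appears in the reference (precision 1.0), but it covers only half of the reference words (recall 0.5). Counting matches like this is not differentiable, which is why it cannot be used directly as a training loss.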

I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don't forget to check out the code of these blogs.

In the coming blogs, if GOD wills it, I will go through the details of building the cornerstone implementation, which all the modern research is based upon. We will use the word embedding approach, and we will use the raw data and manually apply preprocessing.

In later blogs, if GOD wills it, we will go through modern approaches, such as how to create a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.