Exploring NLP Techniques for Text Summarization

In this blog post, we will explore NLP techniques for text summarization by building a simple extractive summarizer in Python that pulls the most important sentences out of an article.

Ahmad Mizan Nur Haq
Data And Beyond
2 min read · Aug 5, 2023


What is it for?

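Text summarization condenses lengthy documents into concise summaries, so readers can grasp the main points without reading the entire text. In this post we use an extractive approach: each sentence of the article is scored by how frequent its words are, and the top-scoring sentences become the summary.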

Let's build it 🔨

Case study: Text summarization of an article

Problem: How can we summarize the article automatically?

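Libraries we are going to use:

pip install beautifulsoup4
pip install lxml
pip install nltk

urllib, re, and heapq are part of Python's standard library, so they don't need to be installed. lxml is the HTML parser we pass to BeautifulSoup below.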

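Now that the libraries are installed, let's import them and download the NLTK data we need:

import bs4 as bs
import urllib.request
import re
import nltk
import heapq

nltk.download('stopwords')  # English stop-word list
nltk.download('punkt')      # tokenizer models used by sent_tokenize and word_tokenize
nltk.download('punkt_tab')  # newer NLTK releases load the tokenizer from this package instead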

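Fetch and parse the article data

# Download the raw HTML of the article
src = urllib.request.urlopen('PUT THE ARTICLE URL HERE').read()
soup = bs.BeautifulSoup(src, 'lxml')

# Join the text of every <p> tag, adding a space after each paragraph so
# sentences from neighboring paragraphs don't run together
txt = ""
for paragraph in soup.find_all('p'):
    txt += paragraph.text + " "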
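Why does the separator between paragraphs matter? Concatenating paragraph texts directly glues the last sentence of one paragraph to the first sentence of the next ("...paragraph.Second..."), and the sentence tokenizer may then treat the two as one sentence. The sketch below shows the effect on a hard-coded HTML string. To keep it runnable without a network call or extra packages, it swaps BeautifulSoup for the standard library's html.parser.HTMLParser. The ParagraphCollector class is a hypothetical helper for illustration only.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text inside <p> tags, one string per paragraph."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the current paragraph
        if self.in_paragraph:
            self.paragraphs[-1] += data


html = "<html><body><p>First paragraph.</p><p>Second <b>one</b>.</p></body></html>"
collector = ParagraphCollector()
collector.feed(html)
collector.close()

glued = "".join(collector.paragraphs)   # no separator between paragraphs
spaced = " ".join(collector.paragraphs)  # a space between paragraphs
print(glued)   # First paragraph.Second one.
print(spaced)  # First paragraph. Second one.
```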

Preprocess the text

txt = re.sub(r'\[[0-9]+\]', '', txt)  # Remove reference markers such as [1] or [23]
txt = re.sub(r'\+', '', txt)          # Remove plus signs
txt = re.sub(r'\s+', ' ', txt)        # Collapse runs of whitespace (spaces, tabs, newlines) into one space
txt = re.sub(r'\([^)]*\)', '', txt)   # Remove content inside parentheses
txt = re.sub(r'\s+', ' ', txt)        # Collapse the whitespace left behind by the previous step
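To see what each substitution does, here is the same sequence of re.sub calls applied to a made-up sample string. Note a side effect of the plus-sign rule: it also turns "C++" into "C", so you may want to drop that line when summarizing programming articles.

```python
import re

# Made-up sample with a reference marker, a parenthesised aside, a "+" and messy whitespace
sample = "Python  (a language) was released in 1991[1].\nC++ came\tearlier[12]."

txt = sample
txt = re.sub(r'\[[0-9]+\]', '', txt)  # drop reference markers [1] and [12]
txt = re.sub(r'\+', '', txt)          # drop plus signs: "C++" becomes "C"
txt = re.sub(r'\s+', ' ', txt)        # collapse whitespace, including the newline and tab
txt = re.sub(r'\([^)]*\)', '', txt)   # drop "(a language)", leaving a double space
txt = re.sub(r'\s+', ' ', txt)        # collapse that double space
print(txt)  # Python was released in 1991. C came earlier.
```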

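Tokenize the text into sentences and count word frequencies

sentences = nltk.sent_tokenize(txt)
stop_words = set(nltk.corpus.stopwords.words('english'))  # a set makes lookups fast

# Count each lowercased word, keeping only alphabetic tokens (this drops
# punctuation and numbers) and skipping stop words
countword = {}
for word in nltk.word_tokenize(txt.lower()):
    if word.isalpha() and word not in stop_words:
        countword[word] = countword.get(word, 0) + 1

# Normalize so the most frequent word gets a weight of 1.0. The maximum is
# computed once: recomputing it inside the loop would change it as values are overwritten
max_count = max(countword.values())
for key in countword:
    countword[key] = countword[key] / max_count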
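Two details in the frequency table are easy to get wrong. Words should be lowercased before counting, because NLTK's stop-word list is lowercase and the scoring step lowercases sentences. The maximum should also be computed once, before normalizing: if max() is recomputed inside the loop, it shrinks as entries are overwritten, and every weight can end up as 1.0. The toy example below shows both versions side by side. It uses a regular expression in place of nltk.word_tokenize and a tiny hypothetical stop-word set.

```python
import re
from collections import Counter

text = "The cat sat. The cat ran. A dog sat."
stop_words = {"the", "a"}  # tiny hypothetical stop-word set; NLTK's English list is much longer

# Lowercase, keep alphabetic tokens only, drop stop words
words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]
counts = Counter(words)  # cat: 2, sat: 2, ran: 1, dog: 1

# Correct: divide by a maximum computed once
max_count = max(counts.values())
weights = {w: c / max_count for w, c in counts.items()}

# Buggy: max() is recomputed while the values are being overwritten
buggy = dict(counts)
for key in buggy:
    buggy[key] = buggy[key] / max(buggy.values())

print(weights)  # {'cat': 1.0, 'sat': 1.0, 'ran': 0.5, 'dog': 0.5}
print(buggy)    # {'cat': 1.0, 'sat': 1.0, 'ran': 1.0, 'dog': 1.0}
```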

Calculate the sentence scores

# Score each sentence by adding up the normalized frequencies of its words.
# Only sentences with fewer than 25 words are scored, which keeps long
# sentences out of the summary.
score = {}
for sentence in sentences:
    if len(sentence.split(' ')) < 25:
        for word in nltk.word_tokenize(sentence.lower()):
            if word in countword:
                score[sentence] = score.get(sentence, 0) + countword[word]
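Here is the scoring step run on toy data. The normalized weights are hypothetical values, and a regular expression stands in for nltk.word_tokenize. The 30-word sentence fails the length check, so it never receives a score.

```python
import re

# Hypothetical normalized word weights, as the frequency step would produce
countword = {"cat": 1.0, "sat": 1.0, "ran": 0.5, "dog": 0.5}
long_sentence = " ".join(["cat"] * 30) + "."  # 30 words, over the cut-off
sentences = ["The cat sat.", "The cat ran.", "A dog sat.", long_sentence]
MAX_WORDS = 25  # same length cut-off as in the code above

score = {}
for sentence in sentences:
    if len(sentence.split(" ")) < MAX_WORDS:  # skip long sentences entirely
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word in countword:
                score[sentence] = score.get(sentence, 0) + countword[word]

print(score)  # {'The cat sat.': 2.0, 'The cat ran.': 1.5, 'A dog sat.': 1.5}
```

Because a sentence's score is a sum over its words, longer sentences tend to score higher, and the length cut-off is one way to keep them from dominating the summary. Dividing each score by the sentence's word count is an alternative you could experiment with, although the code above does not do that.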

Get the summary

# Take the 2 highest-scoring sentences and print each of them
the_sum = heapq.nlargest(2, score, key=score.get)
for sentence in the_sum:
    print(sentence)
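heapq.nlargest(n, iterable, key=...) returns the n items with the largest key values, highest first. Iterating over a dict yields its keys, so passing score together with key=score.get ranks the sentences by their scores. The result is ordered by score, not by position in the article. The sketch below uses made-up scores and also shows one way to print the chosen sentences in their original order.

```python
import heapq

sentences = ["The cat sat.", "The cat ran.", "A dog sat."]  # original article order
score = {"The cat sat.": 1.0, "The cat ran.": 1.5, "A dog sat.": 2.0}  # made-up scores

top = heapq.nlargest(2, score, key=score.get)  # keys ranked by score.get(key)
print(top)  # ['A dog sat.', 'The cat ran.']

# Optional: print the chosen sentences in the order they appear in the article
summary = " ".join(s for s in sentences if s in top)
print(summary)  # The cat ran. A dog sat.
```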

Full code
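For quick experiments without network access or NLTK downloads, the whole pipeline can be sketched as one function that takes plain text instead of a URL. This is a dependency-free variant, not the NLTK version described above. It swaps NLTK's tokenizers for simple regular expressions and uses a tiny hypothetical STOP_WORDS set, so its results will differ on real articles. The summarize function and its n_sentences and max_words parameters are hypothetical names chosen for illustration.

```python
import heapq
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set; NLTK's English list is much longer
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "the", "to", "was"}


def summarize(txt, n_sentences=2, max_words=25):
    """Hypothetical helper: frequency-based extractive summary of plain text."""
    # Clean-up: reference markers, plus signs, parenthesised asides, extra whitespace
    txt = re.sub(r'\[[0-9]+\]', '', txt)
    txt = re.sub(r'\+', '', txt)
    txt = re.sub(r'\([^)]*\)', '', txt)
    txt = re.sub(r'\s+', ' ', txt).strip()

    # Naive sentence split after ., ! or ? (stands in for nltk.sent_tokenize)
    sentences = re.split(r'(?<=[.!?])\s+', txt)

    # Lowercased word counts without stop words (regex stands in for nltk.word_tokenize)
    counts = Counter(w for w in re.findall(r"[a-z]+", txt.lower()) if w not in STOP_WORDS)
    if not counts:
        return ""
    max_count = max(counts.values())  # computed once, before normalizing
    weights = {w: c / max_count for w, c in counts.items()}

    # Score only sentences with fewer than max_words words
    score = {}
    for sentence in sentences:
        if len(sentence.split(' ')) < max_words:
            for word in re.findall(r"[a-z]+", sentence.lower()):
                if word in weights:
                    score[sentence] = score.get(sentence, 0) + weights[word]

    # Keep the n best sentences, returned in their original order
    top = set(heapq.nlargest(n_sentences, score, key=score.get))
    return " ".join(s for s in sentences if s in top)


article = ("Cats are great pets. Cats sleep. "
           "The weather is cold today. Many cats like warm places.")
print(summarize(article))  # Cats are great pets. Many cats like warm places.
```

To try it on a real article, pass in the txt string built in the fetching step, e.g. print(summarize(txt, n_sentences=3)).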

Conclusion

As we have seen in this blog post, even a simple word-frequency approach can pick out the key sentences of an article without any training data. You can try it yourself and experiment with other techniques or methods.

Hey 👋 Enjoying the content? If you find it valuable, why not subscribe and follow me on Medium for more!

🔔 Subscribe & Follow ➡️ [Medium](https://medium.com/@itjustzanhavingfun)

By subscribing, you’ll get the latest updates, tips, and content delivered right to your inbox. Don’t miss out on what we share!

Thank you for your support! 🙌

With this post, I would also like to gauge how effective a call to action (CTA) is.
