Exploring NLP Techniques for Text Summarization
In this blog post, we will explore the fascinating world of NLP techniques for text summarization and delve into the potential applications and challenges of this emerging field.
What is it for?
Text summarization involves condensing lengthy documents into concise summaries, making it easier for readers to grasp the main points without going through the entire text.
Let's build it 🔨
Case study: Text summarization of an article
Problem: How do we summarize the article?
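Libraries we are going to use (urllib, re and heapq ship with Python's standard library, so only the third-party packages need installing; lxml is the parser we pass to BeautifulSoup when parsing the article):
pip install beautifulsoup4
pip install nltk
pip install lxml
Now that the libraries are installed, let's import them in our code: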
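import bs4 as bs
import urllib.request
import re
import nltk
import heapq
nltk.download('stopwords')  # stop-word lists used to filter out common words
nltk.download('punkt')  # tokenizer models needed by sent_tokenize and word_tokenize
# Recent NLTK releases may ask for 'punkt_tab' instead; download it if a LookupError says so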
Fetch and parse the article data
src = urllib.request.urlopen('PUT THE ARTICLE URL HERE').read()
soup = bs.BeautifulSoup(src, 'lxml')
# Join the paragraphs with a space so sentences from adjacent paragraphs do not run together
txt = " ".join(paragraph.text for paragraph in soup.find_all('p'))
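To see what the paragraph extraction does without needing a live URL, here is a minimal sketch that uses the standard library's `html.parser` instead of BeautifulSoup. The names `ParagraphExtractor`, `extract_paragraph_text` and `sample_html` are made up for this illustration, and the sketch assumes every `<p>` tag is properly closed.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0        # > 0 while we are inside a <p> element
        self._current = []     # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.paragraphs.append("".join(self._current).strip())
                self._current = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still counts as paragraph text
        if self._depth > 0:
            self._current.append(data)


def extract_paragraph_text(html):
    """Return the text of all <p> elements joined by single spaces."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(p for p in parser.paragraphs if p)


# Made-up page standing in for a downloaded article
sample_html = """
<html><body>
  <h1>Title that is skipped</h1>
  <p>First paragraph.</p>
  <div>Not a paragraph.</div>
  <p>Second <b>paragraph</b> with markup.</p>
</body></html>
"""
article_text = extract_paragraph_text(sample_html)
print(article_text)  # First paragraph. Second paragraph with markup.
```

Only the text inside `<p>` tags survives, which is why headings, menus and other page furniture do not end up in the summary input.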
Preprocess the text
txt = re.sub(r'\[[0-9]+\]', '', txt)  # Remove citation markers such as [12]
txt = re.sub(r'\+', '', txt)  # Remove plus signs
txt = re.sub(r'\([^)]*\)', '', txt)  # Remove content inside parentheses
txt = re.sub(r'\s+', ' ', txt).strip()  # Collapse runs of whitespace into a single space
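To check what these substitutions do, it helps to run them on a small made-up string. The sketch below wraps the same patterns in a hypothetical helper called `clean_text`; the sample sentence is invented for the demo.

```python
import re


def clean_text(text):
    """Apply the clean-up patterns from this section in one place."""
    text = re.sub(r"\[[0-9]+\]", "", text)   # citation markers such as [12]
    text = re.sub(r"\([^)]*\)", "", text)    # parenthesized asides
    text = re.sub(r"\+", "", text)           # literal plus signs
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace
    return text.strip()


raw = "NLP[1]  is a field (see the survey)  of AI+ research.\n\nIt grew fast[23]."
cleaned = clean_text(raw)
print(cleaned)  # NLP is a field of AI research. It grew fast.
```

Collapsing whitespace last means any gaps left behind by the earlier removals are cleaned up in a single pass.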
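Tokenize text into sentences and words
sentences = nltk.sent_tokenize(txt)
stop_words = set(nltk.corpus.stopwords.words('english'))
countword = {}
# Lowercase the words so they match the lowercased tokens used when scoring sentences
for word in nltk.word_tokenize(txt.lower()):
    if word.isalnum() and word not in stop_words:  # skip punctuation and stop words
        countword[word] = countword.get(word, 0) + 1
# Normalize by the highest count, computed once so the divisor does not change mid-loop
max_count = max(countword.values())
for key in countword:
    countword[key] = countword[key] / max_count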
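The weighting step is easier to follow on a tiny input. This is a minimal sketch, not the NLTK pipeline: it assumes a simple regex tokenizer and a very small stop-word set (`STOP_WORDS`, made up for the demo), and it computes the maximum count once before dividing. The helper name `weighted_frequencies` is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK's English list is much longer)
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def weighted_frequencies(text, stop_words=STOP_WORDS):
    """Map each non-stop word to its count divided by the highest count."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    if not counts:
        return {}
    top = max(counts.values())  # computed once, before any division
    return {word: count / top for word, count in counts.items()}


freqs = weighted_frequencies("The cat sat on the mat. The cat is asleep.")
print(freqs)  # {'cat': 1.0, 'sat': 0.5, 'mat': 0.5, 'asleep': 0.5}
```

The most frequent word always gets a weight of 1.0, and every other word is expressed as a fraction of it.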
Calculate the sentence scores
score = {}
for sentence in sentences:
    # Only score sentences shorter than 25 words so the summary stays concise
    if len(sentence.split(' ')) < 25:
        for word in nltk.word_tokenize(sentence.lower()):
            if word in countword:
                score[sentence] = score.get(sentence, 0) + countword[word]
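Here is a small worked example of the scoring idea: each sentence's score is the sum of the weights of the words it contains. It is a sketch under assumptions: the weights are hand-written, the tokenizer is a simple regex rather than NLTK, and `score_sentences` and `max_words` are names invented for the demo (the 25-word cutoff mirrors the length check above).

```python
import re


def score_sentences(sentences, word_weights, max_words=25):
    """Sum the weights of the known words in each sufficiently short sentence."""
    scores = {}
    for sentence in sentences:
        if len(sentence.split()) >= max_words:
            continue  # skip long sentences so the summary stays concise
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        total = sum(word_weights.get(word, 0.0) for word in words)
        if total > 0:
            scores[sentence] = total
    return scores


# Hand-written weights standing in for the normalized word frequencies
weights = {"cat": 1.0, "sat": 0.5, "mat": 0.5, "asleep": 0.5}
sentences = ["The cat sat on the mat.", "The cat is asleep.", "Nothing relevant here."]
scores = score_sentences(sentences, weights)
print(scores)  # {'The cat sat on the mat.': 2.0, 'The cat is asleep.': 1.5}
```

Sentences that contain none of the weighted words never enter the dictionary, so they can never be picked for the summary.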
Get the summary
the_sum = heapq.nlargest(2, score, key=score.get)  # the two highest-scoring sentences
for sentence in the_sum:
    print(sentence)
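`heapq.nlargest` returns the sentences ordered by score, not by where they appear in the article. If you prefer a summary that reads in the article's original order, one possible variation looks like this; `pick_summary` is a hypothetical helper, and the sentences and scores are made up.

```python
import heapq


def pick_summary(sentences, scores, n=2):
    """Return the n best-scoring sentences in their original order."""
    best = set(heapq.nlargest(n, scores, key=scores.get))
    return [s for s in sentences if s in best]


# Made-up sentences and scores for the demo
sentences = ["Intro sentence.", "Key finding one.", "Minor detail.", "Key finding two."]
scores = {
    "Intro sentence.": 0.4,
    "Key finding one.": 1.8,
    "Minor detail.": 0.2,
    "Key finding two.": 2.5,
}
summary = pick_summary(sentences, scores)
print(" ".join(summary))  # Key finding one. Key finding two.
```

Whether to keep score order or document order is a design choice: score order puts the most important sentence first, while document order usually reads more naturally.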
Conclusion
As we have seen in this blog post, NLP techniques for text summarization can make information processing and retrieval much faster: even a simple word-frequency approach can pick out an article's key sentences in a few lines of Python. Try it yourself and experiment with other techniques or methods.