6 Useful Text Summarization Algorithms in Python

Sarowar Jahan Saurav
5 min read · Oct 17, 2023


Are you fascinated by the magic of Python algorithms that can distill vast oceans of text into concise, insightful summaries? 📚✂️ Get ready to embark on an exhilarating journey into the realm of text summarization with Python, where long documents turn into meaningful insights in seconds! 🌟 In this guide, we will unravel the secrets behind one of the most compelling applications of natural language processing (NLP). Whether you’re a coding enthusiast, a data science aficionado, or simply curious about the world of AI, this blog is your gateway to extracting the essential information from mountains of text. 🧠💡 Join us as we explore the main algorithms and libraries and walk through each one step by step. By the end of this journey, you’ll be able to turn lengthy articles, research papers, and documents into concise, digestible summaries. Ready? Let’s code our way to effective communication and knowledge extraction! 🚀👩‍💻👨‍💻

Text summarization comes in two broad flavors: “extractive” and “abstractive”.

Extractive Text Summarization

As the name implies, extractive text summarization ‘extracts’ the most significant sentences from a large body of text and arranges them into a clear, succinct summary. The approach is simple: each sentence is scored by how important it is to the overall subject of the text, and the top K sentences are kept, usually in their original order. Because the summary can only reuse sentences that already exist, and the scores depend on chosen features and parameters such as K, the result can be choppy or skewed toward whatever the scoring happens to favor.
Extractive summarization is the approach most automated text summarizers use, because it is simple and works well in most use cases.
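To make the “top K sentences” idea concrete, here is a minimal, dependency-free sketch that scores each sentence by the average frequency of its words across the whole text and keeps the K best. The regex sentence splitter, the tiny stop-word list, and the function names are illustrative assumptions, not part of any library; real tools such as Sumy use proper tokenizers and more refined scoring.

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real tools use much larger lists)
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with", "as", "that", "this"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def top_k_summary(text, k=2):
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average (not total) frequency, so long sentences are not automatically favored
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Pick the k best-scoring sentences, then restore their original order
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))


text = ("Python is popular for NLP. Python libraries make summarization easy. "
        "The weather was nice yesterday. Summarization libraries in Python rank sentences.")
print(top_k_summary(text, k=2))
# → Python libraries make summarization easy. Summarization libraries in Python rank sentences.
```

The off-topic weather sentence shares no words with the rest, so its words are rare and it scores lowest.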

Abstractive Text Summarization

Abstractive text summarization generates new sentences that convey the key information of the input, rather than copying sentences from it. A model, typically a neural sequence-to-sequence network trained on pairs of documents and summaries, builds an internal representation of the text and then writes a summary in its own words, much as a person would paraphrase.
Although it is not as straightforward to use as the extractive technique (it needs large pretrained models and more compute), abstractive summarization is significantly more useful in many cases. In many ways, it is a forerunner of full-fledged AI writing tools. This is not to say that extractive summarization is unnecessary.

6 techniques for text summarization in Python

Here are six approaches to text summarization using both abstractive and extractive methods.

1. Sumy

Sumy is a library and command-line utility for extracting summaries from HTML pages or plain text. It provides several summarization algorithms, including LSA, Luhn, Edmundson, and more.

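Here’s an example of how to use Sumy with the LSA algorithm. First, install the Sumy library using pip:

pip install sumy

Sumy’s tokenizer relies on NLTK sentence data; if Python raises an NLTK LookupError, download the resource named in the error message (for example with nltk.download("punkt")). Then run the example:
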
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

# Parse the input text
parser = PlaintextParser.from_string(input_text, Tokenizer("english"))

# Create an LSA summarizer
summarizer = LsaSummarizer()

# Generate the summary
summary = summarizer(parser.document, sentences_count=3) # You can adjust the number of sentences in the summary

# Output the summary
print("Original Text:")
print(input_text)
print("\nSummary:")
for sentence in summary:
    print(sentence)
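Sumy hides the linear algebra, so here is a simplified, dependency-free sketch of the idea behind LSA summarization: build a term-by-sentence count matrix, find its strongest latent “topic” (the top right singular vector, computed here by power iteration on the Gram matrix), and keep the sentences that weigh most on it. This single-topic variant and the function names are illustrative assumptions; Sumy’s LsaSummarizer combines several topics and uses its own tokenizer and scoring, so its output will differ.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lsa_scores(sentences, iterations=50):
    """Score sentences by their weight in the strongest latent topic."""
    # Each sentence becomes a column of term counts (the term-by-sentence matrix A)
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    n = len(bags)
    # Gram matrix G = A^T A: G[i][j] is the dot product of sentence columns i and j
    gram = [[sum(c * bags[j][w] for w, c in bags[i].items()) for j in range(n)]
            for i in range(n)]
    # Power iteration converges to G's top eigenvector, which is
    # the top right singular vector of A (one weight per sentence)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v


def lsa_summary(text, k=2):
    sentences = split_sentences(text)
    scores = lsa_scores(sentences)
    # Keep the k highest-weighted sentences, in document order
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))


text = "Cats chase mice. Mice fear cats. Stocks rose today."
print(lsa_summary(text, k=2))  # → Cats chase mice. Mice fear cats.
```

The two cat-and-mouse sentences share vocabulary and form the dominant topic, while the unrelated stock sentence gets a weight close to zero.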

2. BERT Extractive Summarization

BERT (Bidirectional Encoder Representations from Transformers) can also be used for extractive summarization. The bert-extractive-summarizer library provides a simple interface for it: it embeds each sentence with a BERT model, clusters the sentence embeddings (k-means by default), and keeps the sentences closest to the cluster centroids as the summary.

First, install the library using pip:

pip install bert-extractive-summarizer

Here’s an example of how to use BERT for extractive summarization:

from summarizer import Summarizer

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

# Create a BERT extractive summarizer
summarizer = Summarizer()

# Generate the summary
# ratio is the fraction of sentences to keep; min_length and max_length only filter out
# candidate sentences by character count, so they do not control the summary length
summary = summarizer(input_text, ratio=0.2)

# Output the summary
print("Original Text:")
print(input_text)
print("\nSummary:")
print(summary)
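The clustering idea behind bert-extractive-summarizer can be sketched without BERT. In the dependency-free version below, a normalized bag-of-words vector stands in for the BERT sentence embedding (a deliberate simplification), k-means groups similar sentences, and the sentence nearest each centroid is kept. The function names and the deterministic centroid initialization are illustrative assumptions, not the library’s internals.

```python
import math
import re
from collections import Counter


def embed(sentence, vocab):
    """Stand-in for a BERT embedding: a unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z']+", sentence.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_summary(sentences, k=2, iterations=10):
    vocab = sorted({w for s in sentences for w in re.findall(r"[a-z']+", s.lower())})
    vecs = [embed(s, vocab) for s in sentences]
    # Deterministic init: spread the initial centroids across the document (assumes k <= len(sentences))
    centroids = [vecs[i * len(vecs) // k] for i in range(k)]
    for _ in range(iterations):
        # Assign each sentence to its nearest centroid
        groups = [[] for _ in range(k)]
        for v in vecs:
            groups[min(range(k), key=lambda c: dist2(v, centroids[c]))].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
                     for c, g in enumerate(groups)]
    # For each cluster, keep the sentence closest to its centroid, in document order
    chosen = {min(range(len(vecs)), key=lambda i: dist2(vecs[i], c)) for c in centroids}
    return [sentences[i] for i in sorted(chosen)]


sentences = ["Cats chase mice.", "Mice fear cats.", "Stocks rose today.", "Stocks fell today."]
print(cluster_summary(sentences, k=2))
```

With k=2 the sketch returns one representative sentence per theme (animals and stocks), which is the behavior the clustering approach is designed for: coverage of distinct topics rather than repetition of the single most “important” one.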

3. BART Abstractive Summarization

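BART (Bidirectional and Auto-Regressive Transformers) is a sequence-to-sequence model that generates new text, which makes it a natural fit for abstractive summarization. The facebook/bart-large-cnn checkpoint used below is fine-tuned on the CNN/DailyMail news dataset. Here’s how to use it with the Hugging Face transformers library (install it with pip install transformers torch):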

from transformers import BartTokenizer, BartForConditionalGeneration

# Load pre-trained BART model and tokenizer
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

# Tokenize the input (BART needs no task prefix; anything past 1024 tokens is truncated)
inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=1024, truncation=True)
# Generate with beam search; min_length and max_length count tokens, not words
summary_ids = model.generate(inputs, max_length=100, min_length=50, length_penalty=2.0, num_beams=4, early_stopping=True)

# Decode and output the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Original Text:")
print(input_text)
print("\nSummary:")
print(summary)
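The length_penalty=2.0 argument in this example is easy to misread. As I understand the Hugging Face beam search implementation, each finished hypothesis is ranked by its summed log-probability divided by its length raised to length_penalty, and the setting only matters when num_beams > 1. Because log-probabilities are negative, a larger exponent pulls long hypotheses’ scores toward zero and so favors longer summaries. The helper below is a hypothetical restatement of that formula (not a transformers API), applied to made-up log-probabilities:

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-normalized beam score: sum of log-probs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Hypothetical hypotheses: a 10-token summary and a less likely 20-token one
short_logprob, long_logprob = -6.0, -9.0

for penalty in (0.0, 1.0, 2.0):
    short = beam_score(short_logprob, 10, penalty)
    long = beam_score(long_logprob, 20, penalty)
    winner = "long" if long > short else "short"
    print(f"length_penalty={penalty}: short={short:.4f} long={long:.4f} -> {winner} wins")
```

With the penalty at 0.0 the shorter, more probable summary wins; at 1.0 and 2.0 the longer one does. If BART’s summaries come out longer than you want, lowering length_penalty (or max_length) is the first knob to try.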

4. T5 Abstractive Summarization

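T5 (Text-to-Text Transfer Transformer) is a versatile transformer model that treats every NLP task, including summarization, as text-to-text, selecting the task with a prefix such as "summarize: ". Here’s how you can use T5 for abstractive summarization with the transformers library (T5Tokenizer also needs the sentencepiece package: pip install transformers torch sentencepiece):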

from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load pre-trained T5 model and tokenizer
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

# Tokenize with T5's "summarize: " task prefix; T5 was pre-trained on 512-token inputs, so truncate there
inputs = tokenizer.encode("summarize: " + input_text, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_length=150, min_length=50, length_penalty=2.0, num_beams=4, early_stopping=True)

# Decode and output the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Original Text:")
print(input_text)
print("\nSummary:")
print(summary)
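T5 (and BART) can only read a fixed number of input tokens, so with truncation=True anything past max_length is silently dropped. A common workaround is to split a long document into chunks of whole sentences, summarize each chunk, and join the partial summaries. The sketch below assumes word counts are a rough stand-in for token counts (subword tokenizers usually produce more tokens than words, so keep the budget conservative); chunk_sentences and summarize_long are hypothetical helpers, and summarize_fn would wrap the T5 tokenize/generate/decode steps shown above.

```python
import re


def chunk_sentences(text, max_words=300):
    """Group whole sentences into chunks of at most max_words words (a rough proxy for tokens)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if this sentence would push the current one over budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text, summarize_fn, max_words=300):
    """Summarize each chunk with summarize_fn and join the partial summaries."""
    return " ".join(summarize_fn(chunk) for chunk in chunk_sentences(text, max_words))


text = "One two three. Four five. Six seven eight nine."
print(chunk_sentences(text, max_words=5))
# → ['One two three. Four five.', 'Six seven eight nine.']

# A stand-in summarizer that keeps each chunk's first sentence
print(summarize_long(text, lambda chunk: chunk.split(".")[0] + ".", max_words=5))
# → One two three. Six seven eight nine.
```

A single sentence longer than the budget still becomes its own oversized chunk, which the tokenizer will then truncate; for very long documents you can also summarize the joined partial summaries a second time.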

5. Gensim

Gensim is a Python library for topic modeling and document similarity analysis. Its 3.x releases also include a summarizer based on a variation of TextRank, an unsupervised, graph-based ranking algorithm.

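First, make sure to install Gensim if you haven’t already. Note that the gensim.summarization module was removed in Gensim 4.0, so this example only runs on a 3.x release (which may in turn require an older Python version):

pip install "gensim<4.0"
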
from gensim.summarization import summarize

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

# Generate the summary using TextRank algorithm
summary = summarize(input_text, ratio=0.3) # You can adjust the ratio parameter based on the summary length you desire

# Output the summary
print("Original Text:")
print(input_text)
print("\nSummary:")
print(summary)
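Since gensim.summarization is gone from current Gensim releases, it helps to see how TextRank itself works: sentences become nodes in a graph, edges are weighted by how many words two sentences share, and a PageRank-style iteration ranks the nodes. The sketch below follows the original TextRank sentence-similarity formula (shared words divided by the sum of the log sentence lengths) with a damping factor of 0.85; Gensim used its own variation, so scores will not match, and the function names are illustrative.

```python
import math
import re


def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def similarity(a, b):
    """TextRank similarity: shared words / (log|a| + log|b|)."""
    wa, wb = words(a), words(b)
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    # Guard against one-word or empty sentences, where the denominator is 0
    return len(set(wa) & set(wb)) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each neighbor j passes on a share of its score
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j] for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summary(sentences, k=2):
    scores = textrank_scores(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


sentences = [
    "Cats chase mice in the barn.",
    "The barn cats catch mice at night.",
    "Mice hide from cats in the barn.",
    "Stock prices rose sharply today.",
]
print(textrank_summary(sentences, k=2))
```

The stock sentence shares no words with the others, so it has no edges and keeps only the baseline score of 1 - damping, while the three connected barn sentences reinforce each other.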

6. TextTeaser

TextTeaser is an automatic summarization algorithm that takes an article (and its title) and returns a summary. Rather than TextRank, it scores sentences with simple features such as overlap with the title, sentence length, sentence position, and keyword frequency. TextTeaser is not available as a standalone Python library, but you can call a hosted TextTeaser API over HTTP. That endpoint may no longer be online, so treat the request below as illustrative:

import requests

# Input text to be summarized
input_text = """
Your input text goes here. It can be a long paragraph or multiple paragraphs.
"""

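# Make a POST request to the TextTeaser API (the endpoint is not guaranteed to be online)
response = requests.post("http://www.textteaser.com/api", data={"text": input_text}, timeout=30)
response.raise_for_status()  # stop with an HTTPError if the service returns an error status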

# Extract and output the summary from the API response
summary = response.text
print("Original Text:")
print(input_text)
print("\nSummary:")
print(summary)
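TextTeaser’s approach can also be approximated locally, without any web service. The sketch below scores each sentence on four features in the spirit of TextTeaser (overlap with the title, closeness to an ideal length, position in the article, and keyword frequency) and sums them. The ideal length of 20 words, the feature weights, and the function names are illustrative assumptions, not TextTeaser’s actual values.

```python
import re
from collections import Counter

IDEAL_LENGTH = 20  # assumed "ideal" sentence length in words


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def feature_scores(title, sentences):
    title_words = set(tokens(title))
    freq = Counter(w for s in sentences for w in tokens(s))
    top_freq = max(freq.values(), default=1)
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokens(sentence)
        if not words:
            scores.append(0.0)
            continue
        title_score = len(title_words & set(words)) / max(len(title_words), 1)
        length_score = max(0.0, 1 - abs(IDEAL_LENGTH - len(words)) / IDEAL_LENGTH)
        position_score = 1 - i / len(sentences)  # earlier sentences score higher
        keyword_score = sum(freq[w] for w in words) / (len(words) * top_freq)
        # Hypothetical weights, chosen only for illustration
        scores.append(1.5 * title_score + 0.5 * length_score + position_score + keyword_score)
    return scores


def teaser_summary(title, text, k=2):
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    scores = feature_scores(title, sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))


title = "Python summarization"
text = ("Python makes summarization simple. It was sunny outside. "
        "Summarization tools help busy readers.")
print(teaser_summary(title, text, k=2))
# → Python makes summarization simple. Summarization tools help busy readers.
```

The title feature is what makes this family of methods work well on news-style articles: sentences that repeat the headline’s words are strong summary candidates.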

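Which technique to choose comes down to preference and your use case. Classical extractive methods such as Sumy and Gensim are lightweight and need no training data, while BERT-based extractive summarization and abstractive models such as BART and T5 need more compute but usually capture meaning better, and only the abstractive models rewrite the text in their own words. In the long run, AI-based summarizers are likely to keep pulling ahead, not because they learn while you use them, but because each new generation of pretrained models produces better results.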
