Text Summarization Models: A Comparative Study

Annarhysa Albert
5 min read · Aug 24, 2023


Research on rare-word handling strategies for better summarization


Introduction
Text summarization is a critical aspect of information extraction, enabling efficient access to voluminous text documents. The challenge of handling rare words in summarization has gained prominence due to their contextual significance. This thesis presents an in-depth exploration of this challenge by introducing two distinct models — an extractive approach based on cosine similarity and an abstractive approach utilizing the BART model. I have comprehensively analyzed the efficacy of these models in addressing rare word challenges, offering an extensive comparison through pseudocode, summarization examples, and rigorous evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics. This study aims to provide valuable insights into the capabilities of these models and their implications for the field of Natural Language Processing (NLP).

Dataset
The dataset used to train and test both models is the CNN/Daily Mail dataset, which contains over 300k unique news articles written by journalists at CNN and the Daily Mail.

[Figure: Visual representation of the dataset]

The current version supports both extractive and abstractive summarization, though the original version was created for machine reading and comprehension and for abstractive question answering. This makes the dataset a natural choice for this research topic.
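For readers who want to reproduce the setup, the dataset is also available through the Hugging Face `datasets` library (in addition to the Kaggle mirror linked under Sources). A minimal loading sketch, assuming that package is installed:

```python
from datasets import load_dataset

# Load the CNN/Daily Mail dataset (version 3.0.0); each record pairs a
# news article with the journalist-written highlights that serve as the
# reference summary.
dataset = load_dataset("cnn_dailymail", "3.0.0")

sample = dataset["train"][0]
print(sample["article"][:300])  # source article text
print(sample["highlights"])     # human-written reference summary
```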

Model 1 — Extractive Approach with Cosine Similarity
Let’s delve into the first model, an extractive approach utilizing cosine similarity. Algorithm 1 (Appendix A) provides step-by-step insights into preprocessing, stop word removal, and sentence tokenization.

I have elucidated the mechanics of cosine similarity-based sentence ranking, highlighting how crucial sentences are identified. A comprehensive walkthrough of the model’s architecture and the algorithm captures the intricacies of the entire process. Experimental results, evaluated with ROUGE scores, showcase both its strengths and limitations in capturing rare-word context.
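The full algorithm is given in Appendix A and the linked repository; the sketch below is a minimal reconstruction of the same idea using nltk and NumPy, not the exact code. Sentences are tokenized, stop words removed, each sentence turned into a bag-of-words vector, and sentences ranked by their average cosine similarity to the rest of the document:

```python
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def extractive_summary(text, num_sentences=3):
    """Rank sentences by average cosine similarity and keep the top ones."""
    sentences = sent_tokenize(text)
    stop_words = set(stopwords.words("english"))

    # Bag-of-words vectors over the document vocabulary, stop words removed.
    tokenized = [
        [w.lower() for w in word_tokenize(s)
         if w.isalnum() and w.lower() not in stop_words]
        for s in sentences
    ]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = np.zeros((len(sentences), len(vocab)))
    for i, toks in enumerate(tokenized):
        for w in toks:
            vectors[i, index[w]] += 1

    # Cosine similarity matrix between all sentence pairs.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1  # guard empty sentences against division by zero
    unit = vectors / norms
    sim = unit @ unit.T

    # Score each sentence by its average similarity to the others, then
    # return the top-ranked sentences in their original document order.
    scores = (sim.sum(axis=1) - 1) / max(len(sentences) - 1, 1)
    top = sorted(np.argsort(scores)[-num_sentences:])
    return " ".join(sentences[i] for i in top)
```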

Model 2 — Abstractive Approach using BART
The second model adopts an abstractive strategy by harnessing the power of the BART (Bidirectional and Auto-Regressive Transformers) model. It is a type of transformer architecture that is available in the Hugging Face Transformers library. A deep dive into pre-training and fine-tuning of BART for summarization is undertaken.
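Hugging Face also publishes a BART checkpoint already fine-tuned on CNN/Daily Mail, `facebook/bart-large-cnn`, which makes a convenient baseline. A minimal inference sketch (the decoding settings here are that checkpoint’s published defaults and may differ from the repository’s):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# facebook/bart-large-cnn is already fine-tuned on CNN/Daily Mail.
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

def summarize(article):
    # BART accepts at most 1024 input tokens, so truncate longer articles.
    inputs = tokenizer(article, max_length=1024, truncation=True,
                       return_tensors="pt")
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,        # beam search, as in the original BART paper
        max_length=142,     # defaults from the bart-large-cnn config
        min_length=56,
        length_penalty=2.0,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```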

Algorithm 2 and Algorithm 3 (Appendix A) outline the training loop, input transformation, and model execution, shedding light on the mechanics of abstractive summarization. The inherent flexibility of the abstractive approach holds promise for handling rare words more effectively. Extensive experimental evaluation, accompanied by summarization examples, unveils the model’s potential and its inherent limitations.
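Algorithms 2 and 3 themselves are in Appendix A; to give a flavor of what the training loop involves, here is a heavily simplified fine-tuning sketch in PyTorch. Batching, learning-rate scheduling, and validation are omitted, and `train_pairs` is a placeholder for an iterable of (article, highlights) pairs:

```python
import torch
from torch.optim import AdamW
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = AdamW(model.parameters(), lr=3e-5)

# train_pairs is a placeholder: an iterable of (article, highlights) tuples.
model.train()
for article, highlights in train_pairs:
    inputs = tokenizer(article, max_length=1024, truncation=True,
                       return_tensors="pt")
    labels = tokenizer(highlights, max_length=128, truncation=True,
                       return_tensors="pt")

    # The model shifts the labels internally to build decoder inputs and
    # returns the cross-entropy loss directly.
    outputs = model(**inputs, labels=labels["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```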

Comparative Analysis
A meticulous comparative analysis of the two models follows, showcasing their divergent strategies. Algorithms and examples are juxtaposed to illuminate the nuances in their methodologies. I have conducted extensive ROUGE evaluations, encompassing ROUGE-1, ROUGE-2, and ROUGE-L scores, providing a robust quantitative assessment of summarization quality. Mathematically, the ROUGE score can be written as follows:

$$\text{recall} = \frac{\text{count}\left(\text{N-gram}_{\text{text}_1} \cap \text{N-gram}_{\text{text}_2}\right)}{\text{count}\left(\text{N-gram}_{\text{text}_2}\right)} \qquad \text{precision} = \frac{\text{count}\left(\text{N-gram}_{\text{text}_1} \cap \text{N-gram}_{\text{text}_2}\right)}{\text{count}\left(\text{N-gram}_{\text{text}_1}\right)}$$

$$\text{ROUGE}_{F1} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where text₁ is the generated summary and text₂ is the reference summary.
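To make the formulas concrete, here is a small self-contained implementation of ROUGE-N recall, precision, and F1 following the definitions above (for real evaluations a maintained package such as `rouge-score` is preferable; see the next section):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated, reference, n=1):
    """ROUGE-N recall, precision, and F1 from clipped n-gram overlap."""
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((gen & ref).values())  # clipped intersection count
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(gen.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}

# Toy example: five of six unigrams overlap, so all scores are 5/6.
print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))
```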

My analysis scrutinizes the models’ performance on rare words, revealing their distinct competencies. As a large pre-trained model, BART unsurprisingly outperforms the extractive approach.

Results and Discussion
The thesis presents a comprehensive overview of the comparative analysis. I have delved into the ROUGE scores, elucidating how each model performs in the realm of rare words. Specific examples show where one model outperforms the other, substantiating the findings with concrete summarization outputs.

[Figure: ROUGE score comparison between the two models]

It is evident that the BART model scores approximately twice as high as the extractive model. Different variants of ROUGE, such as ROUGE-1, ROUGE-2, and ROUGE-L, measure different aspects of overlap between the generated and reference texts. ROUGE scores range from 0 to 1, with higher scores indicating better alignment between the two.
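For reference, scores like those in the comparison above can be computed with the `rouge-score` package (a common choice; the repository may use a different implementation):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)

reference = "the cat lay on the mat"   # journalist-written highlight
generated = "the cat sat on the mat"   # model output

# score(target, prediction) returns one Score per requested metric.
scores = scorer.score(reference, generated)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.3f} "
          f"recall={s.recall:.3f} f1={s.fmeasure:.3f}")
```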

Implications and Future Research
The research findings carry implications for diverse NLP applications, and the comparative study helps in choosing the better model for summarization. Model performance is measured by how high the output summary’s ROUGE score is for a given article when compared against the highlights written by the original article’s author. The primary direction for future work is therefore to raise the ROUGE score and make the model more efficient.

These summarization techniques also admit a third approach, a simplified sequence-to-sequence model with a pointer-generator mechanism, which I am currently working on in the same repository referenced in this article.
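For context, the heart of a pointer-generator model is a soft switch p_gen that interpolates between generating a token from the vocabulary and copying a token from the source via the attention weights. A minimal sketch of that final-distribution step in PyTorch; all tensor names here are illustrative, not taken from the repository:

```python
import torch

def pointer_generator_distribution(p_vocab, attention, p_gen, src_ids):
    """Blend the decoder's vocabulary distribution with a copy
    distribution over source tokens, as in a pointer-generator network."""
    # p_vocab:   (batch, vocab_size) softmax over the decoder vocabulary
    # attention: (batch, src_len)    attention weights over source tokens
    # p_gen:     (batch, 1)          soft switch between generate and copy
    # src_ids:   (batch, src_len)    source token ids to receive copy mass
    final = p_gen * p_vocab
    copy_mass = (1.0 - p_gen) * attention
    # Scatter the copy probability mass onto the source tokens' vocab ids,
    # letting the model emit rare or out-of-vocabulary source words.
    return final.scatter_add(1, src_ids, copy_mass)
```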

Conclusion
In the journey to comprehend rare-word handling strategies in text summarization, this thesis contributes an insightful analysis. Through detailed pseudocode, meticulous comparative analysis, and extensive ROUGE evaluations, I unravel the strengths and limitations of each model. This work advances the evolution of text summarization techniques, steering the course toward innovative solutions that address the challenges of rare words. Do check out Appendix B for new terminology and jargon.

Appendices
Appendix A: Algorithms

Appendix B: Glossary

  1. HuggingFace: A large open-source community that builds tools to enable users to build, train, and deploy machine learning models based on open-source code and technologies.
  2. BART: A model architecture designed for sequence-to-sequence tasks, including text generation and summarization.
  3. ROUGE score: Set of metrics used to evaluate the quality of machine-generated text, such as summaries or translations, by comparing them to reference (human-generated) texts.
  4. Extractive Approach: The goal is to select and extract existing sentences or phrases directly from the original text to form the summary.
  5. Abstractive Approach: The summary is generated by composing new sentences that may not exist in the original text.

References
Sources
1. https://www.sciencedirect.com/science/article/pii/S2949719123000110
2. Movie Review Summarization Using Supervised Learning and Graph-Based Ranking Algorithm (hindawi.com)
3. https://huggingface.co/docs/transformers/model_doc/bart

Dataset
CNN-DailyMail News Text Summarization | Kaggle

Source Code
Annarhysa/Rare-Word-Handling-NLP: A summarizer built using nltk, Torch and HuggingFace Transformers (github.com)
