ROUGE your NLP Results!

Priyanka
6 min read · Nov 21, 2022

The next metric in this series of posts about metrics used in Natural Language Processing is the Recall-Oriented Understudy for Gisting Evaluation (ROUGE).

ROUGE is an evaluation metric used to assess the output of NLP tasks such as text summarization and machine translation. Unlike BLEU, which is built around precision, ROUGE uses both recall and precision to compare model-generated summaries, known as candidates, against a set of human-written summaries, known as references. Its recall component measures how many of the n-grams in the references also appear in the candidate.

First, let us calculate recall = True Positives / (True Positives + False Negatives). Here, the True Positives are the n-grams that match between the references and the candidate. The False Negatives are the n-grams that appear in the references but not in the candidate. The False Positives are the n-grams that appear in the candidate but not in the references, which gives precision = True Positives / (True Positives + False Positives). Check out the blog post "Recall, Precision, F1-Score" if you need a refresher on concepts such as True Positives, False Positives, Recall and Precision.
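To make these three counts concrete, here is a minimal Python sketch. It assumes lowercased whitespace tokenization, a single reference, and clipped matching (an n-gram counts as matched at most as many times as it occurs in the reference). The helper names `ngrams` and `overlap_counts` and the example sentences are made up for illustration and are not taken from any ROUGE library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_counts(reference, candidate, n=1):
    """Count true positives, false negatives and false positives for n-grams.

    Assumption: simple lowercased whitespace tokenization and one reference.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    tp = sum((ref & cand).values())
    fn = sum(ref.values()) - tp   # in the reference, missing from the candidate
    fp = sum(cand.values()) - tp  # in the candidate, missing from the reference
    return tp, fn, fp


# Hypothetical example sentences.
reference = "the cat is on the mat"
candidate = "the cat sat on the mat"

print(overlap_counts(reference, candidate, n=1))  # (5, 1, 1)
print(overlap_counts(reference, candidate, n=2))  # (3, 2, 2)
```

With unigrams, five of the six reference words are matched, so recall is 5/6; the candidate also has six words, so precision is 5/6 as well.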

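Hence, in terms of n-gram counts, recall = (number of matching n-grams) / (number of n-grams in the references), and precision = (number of matching n-grams) / (number of n-grams in the candidate).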

Recall and precision can pull in opposite directions, so it is necessary to combine them to achieve a balance. For this reason, ROUGE is calculated as an F1-score in…
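As a rough illustration of how the two values are balanced, the F1-score is their harmonic mean. The sketch below assumes the standard F1 formula; the function name `f1_score` and the example values are hypothetical and chosen only to show the arithmetic.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 matching n-grams, 4 n-grams in the reference,
# 6 n-grams in the candidate.
recall = 3 / 4     # 0.75
precision = 3 / 6  # 0.5

print(round(f1_score(recall, precision), 2))  # 0.6
```

The harmonic mean stays closer to the smaller of the two values, so a candidate cannot score well by maximizing recall alone (for instance by being very long) or precision alone (by being very short).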
