E15 : Chain of Density Prompting
Published Jan 6, 2024 · 5 min read
Generating a sparse initial summary and then iteratively adding more entities while keeping the overall token limit fixed produces summaries that are more informative, higher-quality, and readable.
Paper Name : From Sparse to Dense : GPT-4 Summarization with Chain of Density Prompting
Paper URL : https://arxiv.org/abs/2309.04269
Authors : Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad
Please find the annotated paper here
Problem Statement :
- Generating summaries that are concise yet informative is challenging: a summary should be dense enough to carry meaning, but not so dense that it becomes unreadable.
- Packing meaningful information within a specified token limit is another key challenge
Solution :
- Adding more entities to the summary will increase the density of information in the generated summary.
- Keeping the summary within a fixed token limit while still trying to add more entities indirectly forces the model to improve abstraction, fusion, and compression, so that the additional entities fit within the limit.
Approach :
- Given a passage to be summarised, the LLM is prompted with the CoD prompt to generate a concise, sparse initial summary within a specified token limit.
- The LLM is then prompted to identify entities that are present in the article but missing from the current summary.
A missing entity must be:
relevant - to the main story
specific - descriptive but concise (not more than 5 words)
novel - missing from the previous summary
faithful - present in the article
anywhere - located anywhere in the article
- The missing entities are then incorporated into a new summary without exceeding the maximum allowed token limit.
- Steps 2 and 3 are repeated a specified number of times (5 in this paper).
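The three-step loop above can be sketched in Python. This is a minimal illustration under assumptions - `generate` stands for any LLM text-completion callable, and the prompt wording is paraphrased from the CoD description, not the paper's exact prompt:

```python
def chain_of_density(article, generate, steps=5, token_limit=80):
    """Iteratively densify a summary via Chain of Density prompting.

    `generate` is any text-completion callable (e.g. a thin wrapper
    around an LLM API); `token_limit` caps the summary at every step.
    """
    # Step 1: sparse initial summary within the fixed token budget.
    summary = generate(
        f"Summarize the article below in at most {token_limit} tokens. "
        f"Keep it sparse, covering only a few key entities.\n\n{article}"
    )
    for _ in range(steps - 1):
        # Step 2: identify relevant, specific, novel, faithful entities
        # (from anywhere in the article) missing from the summary.
        missing = generate(
            "List 1-3 entities from the article that are missing from "
            f"this summary:\n{summary}\n\nArticle:\n{article}"
        )
        # Step 3: fuse the missing entities into a new summary of the
        # SAME length, forcing abstraction and compression.
        summary = generate(
            f"Rewrite this summary to also cover: {missing}. "
            f"Do not exceed {token_limit} tokens.\n\nSummary:\n{summary}"
        )
    return summary
```

Each loop iteration yields a denser summary within the same token budget; with `steps=5` this matches the five-step setup used in the paper's experiments.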
Experimentation :
- LLM - GPT-4
- Chain of Density (CoD) number of steps - 5
- Test dataset - CNN/Daily Mail news - 100 articles
- Comparison - Vanilla summary prompt Vs Human Summary Vs CoD prompt
- Evaluation Method - Human preference, Automatic evaluation using GPT-4
- Evaluated metrics
Direct statistics - tokens, entities, entity density
Indirect statistics - abstraction, fusion, content distribution
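Of the direct statistics, entity density is simply the entity count divided by the token count. A minimal sketch - whitespace tokenization and a hand-supplied entity list are simplifications here; the paper computes these with proper NLP tooling:

```python
def entity_density(tokens, entities):
    """Entity density = (# unique entities) / (# summary tokens)."""
    if not tokens:
        return 0.0
    return len(set(entities)) / len(tokens)

# Toy example: 4 entities spread over an 8-token summary.
tokens = "Nadal beat Federer in the 2008 Wimbledon final".split()
entities = ["Nadal", "Federer", "2008", "Wimbledon"]
print(entity_density(tokens, entities))  # 4 / 8 = 0.5
```

For intuition, the densities reported in the paper (around 0.15) correspond to roughly one entity every 6-7 tokens.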
Observations :
- When prompted with CoD prompts, GPT-4 identified an average of 9.9 entities at the 3rd step, compared to 8.8 entities in human summaries.
- CoD-prompted summaries exhibit an average entity density (entities per token) of 0.158 at the 4th step, which is higher than the 0.151 of human summaries.
- Improvement in the abstraction of the generated summaries was measured using extractive density - a measure based on the length of extractive fragments (spans copied verbatim from the article).
- With increasing CoD steps (by step 4), the extractive density decreases sharply, exhibiting improved abstraction, whereas it remains constant and far higher for human and vanilla-prompt summaries.
- Improvement in the fusion of generated summaries is identified using the ROUGE Gain method - a method that aligns each target (summary) sentence with its source sentences in the article, adding source sentences as long as the gain is positive.
- With increasing CoD steps (from step 2), fusion increases and clearly outperforms the human and vanilla GPT-4 summary prompts.
- Content distribution identifies which parts of the article the entities are drawn from.
- Results show that CoD summaries tend to be lead-biased initially - picking entities from the start of the article in the early steps (steps 1 and 2). As the steps increase, CoD summaries pick entities from the middle and end of the article as well.
- This also shows that, with a higher number of steps, CoD accesses more entities than human summaries do.
- Human evaluation shows that step 2 of CoD received an aggregate of 30.8% of votes (over 100 questions), the highest for any individual CoD step.
- Steps 3, 4, and 5 of CoD cumulatively received an aggregate of 61% of votes (over 100 questions). The preferred CoD step is 3, which shows an entity density of 0.148 - almost identical to the human-level entity density of 0.151.
- Automatic evaluation was performed by prompting GPT-4 to give a score on a scale of 1–5 along five dimensions: informative, quality, coherent, attributable, and overall.
- Results showed that the average informative score peaked at 4.74 at step 4 of CoD, which also corresponds to an entity density of 0.158. This shows that an increase in entities results in more informative summaries.
- On the contrary, as the average entity density of CoD summaries increased (0.167 at step 5), the average quality and coherence scores dropped to their lowest (4.65 and 4.61, respectively). This shows that attempting to include more entities can reduce the quality and coherence of the generated summary.
- On average, steps 1 and 5 of CoD are the least preferred: step 1 is too sparse, and step 5 too dense (containing the most entities), compared to steps 2, 3, and 4.
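The GPT-4 automatic evaluation above amounts to a judge prompt plus score parsing. A hedged sketch - the rubric wording and the JSON reply format here are assumptions for illustration, not the paper's actual evaluation prompt:

```python
import json

DIMENSIONS = ["informative", "quality", "coherent", "attributable", "overall"]

def build_eval_prompt(article, summary):
    """Ask a judge model for 1-5 scores on the five dimensions.

    (Rubric wording is an assumption, not the paper's prompt.)
    """
    return (
        "Rate the summary of the article below on a scale of 1-5 for "
        f"each dimension: {', '.join(DIMENSIONS)}. Reply with a JSON "
        "object mapping each dimension to its score.\n\n"
        f"Article:\n{article}\n\nSummary:\n{summary}"
    )

def parse_scores(reply):
    """Parse and validate the judge's JSON reply."""
    scores = json.loads(reply)
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"unexpected dimensions: {sorted(scores)}")
    if not all(1 <= s <= 5 for s in scores.values()):
        raise ValueError("scores must be in the range 1-5")
    return scores
```

Averaging the parsed scores over the test articles for each CoD step yields per-step averages of the kind reported above.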
Limitations :
- The approach has been tested only on news articles.
- The approach tests only GPT-4 model with CoD prompt.
Conclusion :
- Entities play a major role in generating summaries, as they add meaningful information to the summary.
- But adding too many entities can make a summary overly dense and hard to read.
- As the results showed, drawing more entities from different parts of the article also led CoD-generated summaries to become over-informative.
- CoD with steps 2–4 can be a good starting point for applying the approach to other domains.
- There is a clear trade-off between the informativeness (more entities) and clarity (fewer entities) of the generated summary.