
E15 : Chain of Density Prompting

Praveen Thenraj
Research Papers Summarized
5 min read · Jan 6, 2024

--

Generating a sparse initial summary and then iteratively adding more entities, while keeping the overall token limit fixed, produces summaries that are more informative, higher in quality and still readable

Paper Name : From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

Paper URL : https://arxiv.org/abs/2309.04269

Authors : Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Eric Lehman, Noémie Elhadad

Please find the annotated paper here

Problem Statement :

  • Generating concise yet informative summaries is challenging: a summary that is too dense becomes hard to read, while one that is too sparse omits important details.
  • Packing meaningful information within a specified token limit is another key challenge.

Solution :

  • Adding more entities to the summary increases the density of information in the generated summary.
  • Keeping the overall summary within a fixed token limit while still adding more entities indirectly forces the model to improve its abstraction, fusion and compression of information, so that the additional entities fit within the fixed token limit.
CoD prompt to generate an initial sparse summary followed by iteratively adding entities to enhance abstraction, fusion and compression of the final summary

Approach :

  1. Given a passage to be summarised, the LLM is prompted with the CoD prompt to generate a concise, sparse summary of the passage within a specified token limit.
  2. The LLM is also prompted to identify entities that are present in the article but missing from the above summary.
    A missing entity must be:
    relevant - to the main story
    specific - descriptive but concise (not more than 5 words)
    novel - missing from the previous summary
    faithful - present in the article
    anywhere - located anywhere in the article
  3. The missing entities are then included in a new summary without exceeding the maximum allowed token limit of the summary.
  4. Steps 2 and 3 are repeated a specified number of times (5 in this paper).

Experimentation :

  • LLM - GPT-4
  • Chain of Density (CoD) number of steps - 5
  • Test dataset - CNN/Daily Mail news - 100 articles
  • Comparison - Vanilla summary prompt Vs Human Summary Vs CoD prompt
  • Evaluation Method - Human preference, Automatic evaluation using GPT-4
  • Evaluated metrics
    Direct statistics - tokens, entities, entity density
    Indirect statistics - abstraction, fusion, content distribution
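The direct statistics are straightforward to compute. A minimal sketch follows; the capitalized-span heuristic is a naive, hypothetical stand-in for the NER tagging the paper relies on for entity extraction:

```python
import re

def direct_statistics(summary):
    """Compute the direct statistics for one summary: token count,
    entity count, and entity density (entities / tokens)."""
    tokens = summary.split()  # whitespace tokens; a real setup would use a tokenizer
    # Naive heuristic standing in for real NER: capitalized spans as entities.
    entities = re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", summary)
    n_tokens, n_entities = len(tokens), len(entities)
    return {
        "tokens": n_tokens,
        "entities": n_entities,
        "entity_density": n_entities / n_tokens if n_tokens else 0.0,
    }
```

For example, `direct_statistics("Alice met Bob in Paris yesterday")` yields 6 tokens, 3 heuristic entities, and a density of 0.5.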

Observations :

  • When prompted with CoD prompts, GPT-4 identified an average of 9.9 entities by the 3rd step, compared to 8.8 entities in human summaries.
  • CoD-prompted summaries exhibit an average entity density (entities / no. of tokens) of 0.158 at the 4th step, which is greater than the 0.151 of human-level summaries.
Average entity density of CoD prompted Summary Vs Human Summary Vs Vanilla GPT-4 Summary
  • Improvement in the abstraction of generated summaries was measured using extractive density - a measure based on the length of extractive fragments (spans copied verbatim from the article).
  • With increasing CoD steps (by step 4), the extractive density decreases sharply, exhibiting improved abstraction, whereas it remains constant and far higher for human and vanilla prompt summaries.
  • Improvement in the fusion of generated summaries is identified using the ROUGE Gain method - a method that aligns each target sentence (summary sentence) with its source sentences (article sentences) for as long as the ROUGE gain is positive.
  • With increasing CoD steps (from step 2), fusion increases and clearly outperforms human and vanilla GPT-4 prompt summaries.
Abstraction,Fusion,Content Distribution of CoD prompt summary Vs Human summary Vs Vanilla GPT-4 summary
  • Content distribution helps identify from which part of the article the entities are drawn.
  • Results show that CoD prompts exhibit a lead bias in the early steps (steps 1 and 2) - picking entities from the start of the article. But as the steps increase, CoD prompts pick entities from the middle and end of the article as well.
  • This also shows that, with a higher number of steps, CoD draws on more entities than human summaries do.
  • Human evaluation shows that step 2 of CoD received an aggregate of 30.8% of the votes (over 100 questions), the highest for any individual CoD step.
  • Steps 3, 4 and 5 of CoD cumulatively received an aggregate of 61% of the votes (over 100 questions). The preferred CoD step is 3, which shows an entity density of 0.148 - almost identical to the human-level entity density of 0.151.
  • Automatic evaluation was performed by prompting GPT-4 to give a score on a scale of 1–5 along five dimensions - informativeness, quality, coherence, attribution, overall.
  • Results showed that the average informativeness score peaked at 4.74 at step 4 of CoD, which also corresponds to an entity density of 0.158. This shows that an increase in entities results in more informative summaries.
  • On the contrary, as the average entity density of CoD summaries increased (0.167 at step 5), the average quality and coherence scores dropped to their lowest (4.65 and 4.61 respectively). This shows that attempting to include ever more entities can reduce the quality and coherence of the generated summary.
  • On average, steps 1 and 5 of CoD are least preferred, being respectively too sparse and too dense (containing too many entities) compared to steps 2, 3 and 4.
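For reference, the extractive density used above to measure abstraction can be computed as the average squared length of extractive fragments - maximal token spans the summary copies verbatim from the article. A minimal greedy sketch of that computation (following the standard Newsroom-style definition; the paper's exact tokenization may differ):

```python
def extractive_fragments(article_tokens, summary_tokens):
    """Greedily find maximal token spans of the summary that are
    copied verbatim from the article."""
    fragments = []
    i = 0
    while i < len(summary_tokens):
        best = []
        for j in range(len(article_tokens)):
            if summary_tokens[i] == article_tokens[j]:
                k = 0
                while (i + k < len(summary_tokens)
                       and j + k < len(article_tokens)
                       and summary_tokens[i + k] == article_tokens[j + k]):
                    k += 1
                if k > len(best):
                    best = summary_tokens[i:i + k]
        if best:
            fragments.append(best)
            i += len(best)  # skip past the matched fragment
        else:
            i += 1          # token not copied; move on
    return fragments

def extractive_density(article, summary):
    """Average squared fragment length, normalized by summary length.
    Higher values mean more verbatim copying; lower values, more abstraction."""
    a, s = article.lower().split(), summary.lower().split()
    frags = extractive_fragments(a, s)
    return sum(len(f) ** 2 for f in frags) / len(s)
```

For example, the summary "the cat sat quietly" against the article "the cat sat on the mat" copies one 3-token fragment, giving a density of 9 / 4 = 2.25.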

Limitations :

  • The approach has been tested only on news articles.
  • The approach tests only the GPT-4 model with the CoD prompt.

Conclusion :

  • Entities play a major role in generating summaries, as they add more meaningful information to the summary.
  • But adding too many entities can make the summary overly dense and therefore hard to read.
  • As the results showed, drawing more entities from different parts of the article also led CoD-generated summaries to become overly dense.
  • CoD with steps 2–4 can be a good starting point for applying the technique to other domains as well.
  • There clearly exists a trade-off between the informativeness (more entities) and clarity (fewer entities) of the generated summary.
