Do your research: the STORM method for improving long-form document generation

Will G
Published in One Cool Thing
3 min read · Sep 2, 2024

Link to paper here. Code available here.

Executive Summary for Leaders/Managers:

What is it?: STORM is a new method for generating high-quality, coherent, long-form documents using large language models, developed at Stanford University.

Why should you care?: If you’ve seen early LLM text output, you know it tends to wander as it gets longer, eventually degenerating into gibberish. Newer models do better, but if you asked an LLM to produce something like a report, it would probably still lose focus.

A method like STORM can help with gathering relevant material, collating that information into an outline (a useful outcome on its own), and even producing a draft.

What questions should you ask your data scientists?:

  • In which areas of our business could STORM most effectively streamline our document creation process, and what text already exists for supporting that writing?
  • If we applied STORM to improve the consistency and quality of our customer-facing documentation or internal reports, what are some ways it might fail and how would you address those shortcomings?
  • What resources would be required to implement STORM in our current content workflow?

Summary for Data Scientists:

What is the technical innovation of the method?: STORM takes a novel approach to document generation: it begins by planning what content should be retrieved, then combines the retrieved information into a document outline. Once the outline is complete, the model moves on to section writing and iterative refinement with other language models.
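To make the plan, retrieve, outline, write sequence concrete, here is a minimal Python sketch. It is not the STORM codebase or its API: `ask_llm` and `search` are hypothetical callables standing in for whatever model and retriever you use, the prompts are invented for illustration, and the refinement step is left out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins (assumptions, not STORM's API):
#   ask_llm(prompt) -> model reply as text
#   search(query)   -> list of retrieved text snippets
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


def plan_queries(topic: str, ask_llm: LLM) -> List[str]:
    """Step 1: ask the model what should be looked up, one question per line."""
    reply = ask_llm(f"List search questions for researching: {topic}")
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]


def gather(queries: List[str], search: Retriever) -> Dict[str, List[str]]:
    """Step 2: retrieve snippets for each planned question."""
    return {q: search(q) for q in queries}


def format_notes(notes: Dict[str, List[str]]) -> str:
    """Flatten the retrieved snippets into text that can go in a prompt."""
    return "\n".join(f"{q}: {'; '.join(snips)}" for q, snips in notes.items())


def build_outline(topic: str, notes: Dict[str, List[str]], ask_llm: LLM) -> List[str]:
    """Step 3: turn the retrieved material into section headings."""
    reply = ask_llm(f"Write section headings for '{topic}' using:\n{format_notes(notes)}")
    return [h.strip() for h in reply.splitlines() if h.strip()]


def write_sections(outline: List[str], notes: Dict[str, List[str]], ask_llm: LLM) -> Dict[str, str]:
    """Step 4: write each section separately, grounded in the same notes."""
    evidence = format_notes(notes)
    return {h: ask_llm(f"Write the section '{h}' using:\n{evidence}") for h in outline}


def draft(topic: str, ask_llm: LLM, search: Retriever) -> Tuple[List[str], Dict[str, str]]:
    """Run the four steps in order; the outline is returned as its own artifact."""
    notes = gather(plan_queries(topic, ask_llm), search)
    outline = build_outline(topic, notes, ask_llm)
    return outline, write_sections(outline, notes, ask_llm)
```

Returning the outline separately from the sections is deliberate in this sketch: it is the natural place for a human writer to step in before any prose is generated.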

Why is that cool?: Most people involved in creating an article or report want to be hands-on in the drafting, which can be a significant impediment to the adoption of LLMs by creative professionals. Because STORM builds research and outline generation into its document generation process, it lines up with common writing practice and gives writers a useful foundation to build on, clearing that impediment.

A second cool thing is how STORM simulates conversations between stakeholders with different perspectives to improve the quality of the overall draft, which is particularly helpful for topics that may not have a settled consensus.
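A rough sketch of the simulated-conversation idea, under stated assumptions: for each perspective, a "writer" model asks a question, an "expert" model answers from retrieved sources, and the exchange repeats for a few turns. The `ask_llm` and `search` callables, the prompt wording, and the `DONE` stop signal are all hypothetical choices made for this example, not details taken from the paper or its code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins (assumptions, not STORM's API).
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]
Turn = Tuple[str, str]  # (question, answer)


def simulate_conversation(
    topic: str,
    perspective: str,
    ask_llm: LLM,
    search: Retriever,
    max_turns: int = 3,
) -> List[Turn]:
    """Question-and-answer loop for one perspective, grounded in retrieval."""
    history: List[Turn] = []
    for _ in range(max_turns):
        transcript = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        # Writer role: ask the next question from this perspective.
        question = ask_llm(
            f"Perspective: {perspective}\nTopic: {topic}\n"
            f"Conversation so far:\n{transcript}\n"
            "Ask one new question, or reply DONE."
        ).strip()
        if question == "DONE":  # illustrative stop signal
            break
        # Expert role: answer using only what the retriever returned.
        sources = "\n".join(search(question))
        answer = ask_llm(
            f"Answer using only these sources:\n{sources}\nQuestion: {question}"
        ).strip()
        history.append((question, answer))
    return history


def research(
    topic: str,
    perspectives: List[str],
    ask_llm: LLM,
    search: Retriever,
    max_turns: int = 3,
) -> Dict[str, List[Turn]]:
    """Run one simulated conversation per perspective and collect the transcripts."""
    return {
        p: simulate_conversation(topic, p, ask_llm, search, max_turns)
        for p in perspectives
    }
```

The collected transcripts could then feed the outline step, so that each perspective contributes questions the others might not have asked.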

What questions should we be asking about the method?:

  • How does STORM’s performance compare to other document generation methods we’ve considered or to our current creative process for article/report drafting?
  • What are its limitations and potential areas for improvement?
  • Computational cost: How resource-intensive is STORM compared to simpler methods?
  • Domain adaptation: How well do we expect STORM to perform across different subject areas or document types?
  • Human-in-the-loop integration: Where in the writing process should we combine STORM with human expertise for optimal results?

Shao, Y., et al. (2024). Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models. arXiv preprint 2402.14207v2.

The outline for this post was drafted with help from Claude.
