A new perspective on prompts: Guiding LLMs with diegetic and non-diegetic prompts

by Daniel Buschek (University of Bayreuth, Germany), Hai Dang (University of Bayreuth, Germany), Sven Goller (University of Bayreuth, Germany), Florian Lehmann (University of Bayreuth, Germany)

Daniel Buschek
Human-Centered AI

--

We propose a conceptual perspective on how users write prompts when interacting with large language models (LLMs), inspired by research on writing and narrative media. This article is based on our CHI’23 paper (Dang et al., 2023), which reports on an empirical study using this perspective. Here, we focus on the concept itself: we distinguish between diegetic prompts and non-diegetic prompts.

  • A diegetic prompt is part of the narrative. For example, a user might write “Once upon a time, I saw a fox” to request a text suggestion from an LLM. A good suggestion would then complete this sentence, making the original prompt a part of the result. For instance: “Once upon a time, I saw a fox who lived in a burrow by the river.”
  • In contrast, a non-diegetic prompt is not part of the narrative but external to it. In the example, the user could instead prompt the model to “Write about the adventures of a fox”. A good AI response here would generate a fitting piece of text. The original prompt, however, would not be considered a part of the artifact resulting from the interaction (e.g. the story).
A screenshot contrasting diegetic and non-diegetic prompts. Three elements are highlighted: 1. Diegetic prompts (i.e. written text that is part of the output text) 2. Non-diegetic prompts (i.e. instructions to the LLM) 3. Suggestions (e.g. single ones shown inline or multiple ones shown in a list)
Overview of diegetic vs. non-diegetic prompts in our prototype UI. Note that users can always provide diegetic prompts, i.e. the written narrative. A text box appears above the current cursor where users can enter a non-diegetic prompt. A single suggestion is shown inline and multiple suggestions are shown as a list.

Note that this distinction is meaningful for non-fiction writing as well (e.g. news articles, scientific papers, travel blogs). In technical terms, both types of prompts are simply strings processed by the model. That said, diegetic prompts match the next-word prediction task, which has become a core element of training language models in recent years. Non-diegetic prompts, in turn, appear in recent training and finetuning methods (e.g. reinforcement learning from human feedback, training on instructions).
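To make the "simply strings" point concrete, here is a minimal sketch of how a writing tool might combine the two prompt types. The names `PromptState`, `build_model_input` and `apply_suggestion`, as well as the bracketed instruction template, are assumptions made for illustration and do not describe our prototype's actual implementation. The suggestion is hard-coded from the fox example above rather than produced by a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptState:
    draft: str                         # diegetic prompt: the text written so far
    instruction: Optional[str] = None  # non-diegetic prompt: optional instruction


def build_model_input(state: PromptState) -> str:
    """Combine both prompt types into the single string a model would receive.

    The bracketed template is a hypothetical format chosen for this sketch.
    """
    if state.instruction:
        return f"[Instruction: {state.instruction}]\n{state.draft}"
    return state.draft


def apply_suggestion(state: PromptState, suggestion: str) -> str:
    """Build the resulting artifact: draft plus suggestion, never the instruction."""
    return state.draft + suggestion


state = PromptState(
    draft="Once upon a time, I saw a fox",
    instruction="Write about the adventures of a fox",
)
model_input = build_model_input(state)

# Hard-coded stand-in for a model response (no model is called here).
suggestion = " who lived in a burrow by the river."
artifact = apply_suggestion(state, suggestion)

print(model_input)
print(artifact)
```

From the model's side, both prompts end up in one input string. From the user's side, only the diegetic prompt survives into the artifact, which is the distinction this perspective puts into focus.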

Here, we use this distinction to cast a human-centered perspective: What does it mean for users of such systems to interact via diegetic and/or non-diegetic prompts?

This perspective is useful in at least three ways:

  • First, it allows researchers and practitioners to make explicit a key design factor that is present in many user interfaces for generative AI, yet has so far remained largely unarticulated and thus unexamined.
  • Second, it allows Human-Computer Interaction (HCI) researchers to better understand how people interact with such systems.
  • Third, it provides a conceptual foundation for researchers in HCI and AI to discuss insights and interaction design across modalities, including text and images.

Applying the perspective to analyze generative AI systems

Many recent user interfaces have made implicit design decisions around diegetic vs. non-diegetic prompting, without explicitly surfacing this as a design factor. Here, we apply our proposed conceptual lens to reveal these decisions. In doing so, we hope to inform decision-making for future designs. To structure this, we first discuss three types of example systems for writing:

Text completion systems use diegetic prompting: These originate from work on augmentative and alternative communication (AAC), where the goal is to save keystrokes. Examples include word suggestions on smartphones and phrase suggestions in emails and other text editors (e.g. Chen et al., 2019; Buschek et al., 2021; Lee et al., 2022).

Recent writing assistant systems (also) allow for non-diegetic prompting: Some recent systems have integrated instructions. For example, Sparks (Gero et al., 2022, Fig. 2) lets users enter prompts to generate short “sparks” of inspiration for scientific writing. These prompts are not a part of the final text and are therefore non-diegetic.

A screenshot of Sparks, a system that supports creative writing by generating writing sparks for authors. Reproduced from Gero et al. (2022).

Some systems use a mix of diegetic and non-diegetic information for prompting: One example of this is Wordcraft, which allows users to select text in their draft (i.e. diegetic information) and enter a prompt in a sidebar (i.e. non-diegetic information) to specify how the selection should be changed.

A screenshot of Wordcraft. Two side-by-side windows show the UI of Wordcraft.
A screenshot of Wordcraft, a system that supports story writing with large language models. Users trigger predefined revision tasks with keyboard shortcuts or by using the UI elements in the sidebar. Reproduced from Yuan et al. (2022).

In our own study (Dang et al., 2023), our prototype also afforded combinations of both types: text suggestions based on the preceding text, plus the opportunity to enter additional instructions.

Another interesting example is GitHub Copilot: It could be seen as a text completion system that uses the existing code (i.e. diegetic information) to generate more. However, code is a special case in that it also contains comments. One could argue that comments are non-diegetic if the artifact (the script) is seen as encompassing only the runnable code. Under this interpretation, developers use non-diegetic prompting when they write comments to prompt the system to write code.
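As a rough illustration of this interpretation, the following sketch separates a snippet of Python source into its comments (arguably non-diegetic) and its runnable code (diegetic). It uses Python's standard `tokenize` module; the function name `split_comments` and the example snippet are our own assumptions for illustration, and this says nothing about how Copilot itself processes code.

```python
import io
import tokenize


def split_comments(source: str) -> tuple[list[str], str]:
    """Split source into comment texts and the code with comments stripped.

    Sketch only: lines that held nothing but a comment are left as empty lines.
    """
    lines = source.splitlines()
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # row is 1-based, col is 0-based
            comments.append(tok.string.lstrip("#").strip())
            # A comment always runs to the end of its line, so cut the line there.
            lines[row - 1] = lines[row - 1][:col].rstrip()
    return comments, "\n".join(lines)


source = (
    "# Return the n-th Fibonacci number\n"
    "def fib(n):\n"
    "    return n if n < 2 else fib(n - 1) + fib(n - 2)\n"
)

comments, code = split_comments(source)
print("Non-diegetic (comments):", comments)
print("Diegetic (runnable code):")
print(code)
```

Whether the comment counts as part of the artifact is a matter of interpretation. If comments are meant to stay in the codebase as documentation, they could equally be read as diegetic.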

Diegetic and non-diegetic prompts provide a meaningful perspective beyond text. To capture this, we next discuss concrete examples involving images:

  • Visual-to-text: In TaleBrush (Chung et al., 2022), users can sketch the rise and fall of a character’s fortune (or other story attributes) to influence an LLM’s text generation. The sketch can be considered a non-diegetic prompt to the model.
  • Text-to-visual: Many recent image generation models can be prompted with text. Very rarely would users expect to see the prompt text in the actual image. Thus, the prompt is non-diegetic. Exceptions include parts of a prompt that describe text to include in the image (e.g. “A shop with a sign saying ‘Open for business’”). In such cases, the prompt includes diegetic information.
  • Visual-to-visual: Some systems use models that transform images, for example, to turn sketches or marked areas into high-fidelity renderings (e.g. Bau et al., 2018). These sketches can be considered non-diegetic prompts, as they are a kind of visual instruction, not intended to be a part of the final outcome. In contrast, other systems add to, for example, a painting started by the user (e.g. Ha and Eck, 2017). The user’s start is a diegetic prompt, as it is indeed intended to be a part of the depicted “world”.

Discussion and outlook

We have proposed to distinguish between two types of prompts that people use to control generative AI: With diegetic prompts, users intend their prompts to be a part of the final artifact and expect the AI to continue their input. With non-diegetic prompts, users intend to instruct the AI and expect that their input will not be a part of the final artifact.

With this conceptual lens and terminology, we hope to contribute to a better understanding of how people interact with generative AI systems, to inform decision-making for future designs, and to provide a conceptual foundation to discuss insights and interaction design across modalities.

Looking ahead, with the current rise of LLMs trained to respond well to instructions, many user interfaces will likely integrate more and more non-diegetic prompting. Nevertheless, as we found in our study, people to some extent also strategically consider diegetic information to guide LLMs. Leveraging their combination thus provides a promising direction for future work.

Finally, a key insight from our study is that switching between the two types of prompts can be challenging for people. This is because writing the draft is different from writing instructions. Diegetic prompts are easier to handle within the flow of writing (cf. text completion systems). In contrast, non-diegetic prompts require the writer to mentally step out of the draft and think about how to instruct the AI. Crucially, this is a human factor: not even a “perfect” future LLM can remove the human cognitive costs of switching between diegetic and non-diegetic prompting.

Recent work found that non-experts already struggle to write good instructions (Zamfirescu-Pereira et al., 2023). We expect that this might be even more difficult if a task and UI design require users to (frequently) switch between diegetic and non-diegetic writing. As more and more applications are extended with prompt-based UI features, we expect that such switches will become more likely and more frequent.

To address such challenges, future work can build on the presented perspective to explore and reflect on the two prompt types explicitly in the design of generative AI tools.

References

  • David Bau, Jun-Yan Zhu, Hendrik Strobelt, Bolei Zhou, Joshua B. Tenenbaum, William T. Freeman, and Antonio Torralba. 2018. GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. arXiv. https://doi.org/10.48550/arXiv.1811.10597
  • Daniel Buschek, Martin Zürn, and Malin Eiband. 2021. The Impact of Multiple Parallel Phrase Suggestions on Email Input and Composition Behaviour of Native and Non-Native English Writers. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 732, 1–13. https://doi.org/10.1145/3411764.3445372
  • Mia Xu Chen, Benjamin N. Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M. Dai, Zhifeng Chen, Timothy Sohn, and Yonghui Wu. 2019. Gmail Smart Compose: Real-Time Assisted Writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19). Association for Computing Machinery, New York, NY, USA, 2287–2295. https://doi.org/10.1145/3292500.3330723
  • John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Sketching Stories with Generative Pretrained Language Models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 209, 1–19. https://doi.org/10.1145/3491102.3501819
  • Hai Dang, Sven Goller, Florian Lehmann, and Daniel Buschek. 2023. Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 408, 1–17. https://doi.org/10.1145/3544548.3580969
  • Katy Ilonka Gero, Vivian Liu, and Lydia Chilton. 2022. Sparks: Inspiration for Science Writing using Language Models. In Designing Interactive Systems Conference (DIS ’22). Association for Computing Machinery, New York, NY, USA, 1002–1019. https://doi.org/10.1145/3532106.3533533
  • David Ha and Douglas Eck. 2017. A Neural Representation of Sketch Drawings. arXiv. http://arxiv.org/abs/1704.03477
  • Mina Lee, Percy Liang, and Qian Yang. 2022. CoAuthor: Designing a Human-AI Collaborative Writing Dataset for Exploring Language Model Capabilities. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 388, 1–19. https://doi.org/10.1145/3491102.3502030
  • J.D. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. 2023. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 437, 1–21. https://doi.org/10.1145/3544548.3581388

--

Daniel Buschek

Professor at University of Bayreuth, Germany. Human-computer interaction, intelligent user interfaces, interactive AI.