Computational Narrative Intelligence: Past, Present, and Future

This post is based on the keynote talk I gave at the 10th Workshop on Intelligent Narrative Technologies. The slides are here. In the talk, I presented a history of the workshop series, but this post will focus on broader themes.

Introduction

Storytelling is an important part of how we, as humans, communicate, entertain, and teach each other. We tell stories dozens of times a day: around the dinner table to share experiences; through fables to teach values; through journalism to communicate important events; and through movies, novels, and computer games for fun. Stories also motivate people to learn, which is why they form the backbone of training scenarios and case studies at school or work.

Narrative intelligence is the ability to craft, tell, understand, and respond affectively to stories. Research in computational narrative intelligence seeks to instill narrative intelligence into computers. In doing so, the goal of developing computational narrative intelligence is to make computers better communicators, educators, entertainers, and more capable of relating to us by genuinely understanding our needs. Computational narrative intelligence is as much about human-computer interaction as it is about solving hard artificial intelligence problems.

This post (1) lays out the history of the field of research on computational narrative intelligence, (2) describes some of the ways that computational narrative intelligence can solve beneficial real-world problems, and (3) presents some future trends and research challenges.

Flashback

I trace the investigation of computational narrative intelligence to 1843 and the design of Babbage’s Analytical Engine.

His protégé, Ada Lovelace, is the first person in written record to ask whether computers can be creative. We do not know whether Lovelace thought about story generation as a topic of computational creativity.

However, the question of whether machines can create and tell stories has been one of the topics that have fascinated humankind for longer than we have had computers. Below is an image of an article from Popular Mechanics, 1931.

Popular Mechanics, 1931.

Unlike other forms of computational creativity such as painting, music, and poetry generation, telling and listening to stories happens very frequently in everyday life and has tangible, practical applications.

Grimes’ fairy tale generator.

A fairy-tale generation system developed by Grimes in 1960 (and rediscovered by James Ryan) might be the first story generation system to use artificial intelligence, in this case grammar-based generation.

Computational narrative intelligence traces its roots back to 1960s and 1970s research in natural language understanding. Humans talk, so computers, through artificial intelligence, should be able to talk too. At the time, the corpora easily available to natural language understanding researchers were children’s stories, fairy tales, and news articles, all of which are types of storytelling. As a result, early natural language understanding researchers did research on story understanding.

A story generated by TALESPIN.

The most well-known story generation system is TALESPIN (1977), which came out of flipping a story understanding system to generate stories instead. It used something called conceptual dependency theory, an abstraction of natural language. It is regarded as the system that launched a succession of story generation systems through the 1970s and 1980s: Author, Universe, Minstrel, etc.

Then something interesting happened: research in natural language understanding came to a fork in the road. Stories can be viewed in two different ways. Stories can exist as written artifacts (text) but also as the cognitive structures that form when one reads or watches narrative content. The bottom-up focus on learning how to understand natural language by analyzing text forms the basis of the modern-day field of natural language processing (NLP). The top-down focus on cognitive representations of narrative formed the basis for the field of automated story generation. The assumption of the top-down approach to story generation is that once a story is generated in high-level abstraction, the story content can be transformed into text, animation, etc. These two fields of study went their separate ways with very little intersection. Until recently.

The Woggles, part of the Oz Project.

The AI Winter dampened progress in computational narrative intelligence research, in both story understanding and story generation. When the field of artificial intelligence emerged from the AI Winter in the 1990s, another force in computing was emerging as well: computer graphics and computer games. Researchers started experimenting with interactive characters in virtual worlds, as exemplified by the Oz Project at Carnegie Mellon University.

The work on virtual, interactive characters quickly expanded to the desire to join virtual characters in a fictional world and to interact with them in the context of an unfolding story. Interactive narrative is a form of digital interactive experience in which users create or influence a dramatic storyline through actions, either by assuming the role of a character in a fictional virtual world, issuing commands to computer-controlled characters, or directly manipulating the fictional world state. It is often considered the “holy grail” of game design.

Interactive narrative systems. From left to right and top to bottom: Automated Story Director, Prom Week, Haunt II, Façade, Crystal Island, PaSSAGE, C-DraGer, Merchant of Venice, Mimesis.

Interactive narrative is most often considered as a form of interactive entertainment, but can also be used for serious applications such as education and training. The most common form of interactive narrative involves the user taking on the role of the protagonist in an unfolding storyline. The user can also be a disembodied observer — as if watching a movie — but capable of making changes to the world or talking to the characters. The goal of interactive narrative is thus to immerse the user in a virtual world such that he or she believes that they are an integral part of an unfolding story and that their actions have meaningful consequences. For more information on interactive narrative, see the AI Magazine survey paper.

Interactive narrative drove a lot of computational narrative intelligence work for a number of years.

Meanwhile, the field of natural language processing has progressed to the point that it has started addressing semantics and some of the higher-level constructs of language. We are starting to see a growing interest in story understanding and story generation across the broader machine learning, artificial intelligence, and natural language processing communities. The computational narrative intelligence and natural language processing research communities may soon find themselves re-integrating.

Why Should We Care About Computational Narrative Intelligence?

Despite the importance of storytelling as part of the human experience, computers still cannot reliably create and tell novel stories, nor understand stories told by humans. When computers do tell stories, via an eBook or computer game, they simply regurgitate something written by a human. They do not partake in the culture we are immersed in, as manifested through journalistic news articles, the movies we watch, or the books we read.

Why does it matter that computers cannot create, tell, or understand stories? Artificial intelligence has become more prevalent in our everyday lives. Soon, it will not be unusual for us to interact with more advanced forms of Siri or Cortana on a daily basis. However, when we use those systems today, we find them to be an alien sort of intelligence. They make decisions that can sometimes be hard for us to make sense of. Their failures are often due to the fact that they cannot make sense of what we are trying to accomplish or why.

In the next few sections, I will present a number of applications of computational narrative intelligence and the significance of solving those problems.

Story Understanding

Story understanding is one of the earliest research topics in artificial intelligence. Consider the following story:

John entered the restaurant and ordered food. He looked across the room and saw an old friend, Sally. They put their tables together. Later that evening, John and Sally paid and left together.

Did you understand it? How do I know you understood it? I can ask you questions. For example, I could ask you “Did John and Sally arrive together?” Nowhere in the text is that discussed, yet most people will agree that John and Sally did not arrive together. How did you know? Story understanding requires commonsense knowledge, the set of socially and culturally shared beliefs about how the world works. Commonsense reasoning is often cited as one of the grand challenges of artificial intelligence (see for instance this seminar by Barbara Grosz).

Why should we care about solving story understanding? Shared commonsense knowledge makes communication efficient. One can say just enough to trigger shared mental models in the communicative recipient. Thus, storytelling can be seen as a form of information compression. We slip naturally into communicating procedural and episodic information in narrative form. Thus intelligent dialogue agents might need to recognize when the human finds it easier to communicate in narrative and to extract the latent knowledge.

For example, consider the scenario of going to a doctor when hurt. The first thing people will often do is tell the story of how they got hurt. This may or may not help in diagnosis and selection of treatment. But it does provide the doctor with additional context. It also creates rapport between patient and doctor. If one is interacting with a medical robot or agent, one might also want to be able to extract additional context and to empathize with the story in order to build rapport and trust.

[Additional reading: Reading Between the Lines: Using Plot Graphs to Draw Inferences From Stories.]

Automated Story Generation

The flip side of story understanding is story generation. In theory, the knowledge required to understand a story can also be used to create stories.

Consider the story above. Was it written by a human or a computer? The answer is… a computer system called Scheherazade (See here and here).

The most obvious application of story generation is entertainment. However, story generation can also be used as a testbed for natural language understanding systems. To fully claim that an intelligent system understands something, we can ask it to tell a story using its understanding. Question-answering is a good measure of intelligence, but is susceptible to guessing. The extent of a system’s understanding of a concept becomes immediately clear when it tries to apply that concept in a generative process.

Going farther, story generation can have applications to education, such as teaching or testing human literacy skills. In business, medicine, and law, story generation could be used to create fictional case studies to analyze. More on this in the next section.

Another application of story generation is in robots and virtual agents. Returning to the theme of rapport, human-like conversational skills theoretically put human users at ease. A hypothetical scenario is that of a healthcare coach that might need to check in and interact with users over long periods of time to help with prescription adherence and generally monitor user health. A study suggests that agents that can tell autobiographical stories (even when fictional) increase the amount of time people are willing to interact with agents. This can be hugely beneficial in medical and education domains, and should also increase the desirability of other assistive agents such as Siri, Cortana, or Alexa.

Interactive Narrative

Automated story generation in virtual worlds results in interactive narrative. The user’s behavior in the virtual world can be observed and the storyline can be adjusted dynamically. As with story generation, entertainment is the most obvious application.

The Automated Story Director
The REACT system for teaching social scenarios.

Interactive narrative can be applied to education and training. In education, inquiry-based learning can be facilitated by observing how students (or teams of students) solve a problem that doesn’t have a single solution and then constructing follow-on challenges in the context of an unfolding narrative.

Military and business training involve scenarios; scenario generation is a form of story generation and the scenario can be adjusted to afford practice of different skills in differing contexts.

Interactive narrative can be used to teach social skills and social conventions to children and young adults with high-functioning autism spectrum disorders by allowing them to practice scenarios in a safe environment.

[Additional reading: Interactive Narrative: An Intelligent Systems Approach.]

Affective Response

Here is another story, often attributed to Hemingway:

For sale: baby shoes. Never worn.

It’s only six words long, but many believe it evokes very strong emotions. Stories are often used to induce strong emotional responses in an audience. How can a system predict when a story will induce an emotional response in an audience, and which emotion it will evoke, if any?

Naturally, we want to build story generation systems that can choose when and how to evoke emotional responses. Automated journalism may also benefit from understanding when choices will evoke emotional responses, or when an article will be perceived as impartial.

For many types of affective response, especially suspense, there is little correlation between surface features — words — and the measure of suspense. We may need to build sophisticated cognitive models or audience models in order to predict the presence or absence of combinations of features that will induce emotional responses.

[Additional reading: Dramatis: A Computational Model of Suspense.]

Explanation

Eventually we will have robots and autonomous systems working with us and alongside us. This could be in the form of healthcare robots, butler robots, or self-driving cars. They will invariably make mistakes and our first instinct will be to ask them “why” they did what they did.

Generating natural language explanations (answers to “why” questions) is an abductive process. Abduction is storytelling. It may be more natural for humans to receive explanations as narratives about the causes and effects of the robot’s behavior.

One might ask whether explanations need to be 100% factual. When AlphaGo beat one of the top human Go players in the world, the event was televised. While watching the game, commentators speculated on strategies and possible moves both sides could make. They played out little stories about what might happen next. The commentators had little to no understanding of how the algorithm worked, but their commentary and explanations helped me understand the game and the strategies that AlphaGo was using.

AlphaGo match.

In short, the commentators were rationalizing: they were explaining the algorithm’s decisions as if it were human without understanding what was going on inside the “black box” of the algorithm. Humans rationalize all the time when they explain their own behaviors. It is only natural; we do not understand how our own brains work. Indeed, an explanation in terms of neuron firing patterns would be unsatisfactory as a human explanation. We hypothesize that rationalization might be an effective means of helping non-technical end-users understand and build trust in the autonomous systems they work with.

[Additional reading: Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations.]

Machine Enculturation

Machine enculturation is the process of giving computational systems an understanding of human social and cultural values, norms, and conventions. In the future we can expect robots and intelligent agents to interact with us in increasingly social settings, whether it be as healthcare robots or personal butlers. These robots and intelligent agents will need to understand the values, norms, and conventions of the society they operate in. Social values, norms, and conventions exist as a common understanding between people of the same culture on how to reduce person-to-person conflict. We will expect robots and intelligent agents to follow our norms and conventions, and agents that do not will unintentionally come into conflict with humans.

Robot from “Robot and Frank”
Microsoft’s Tay chatbot.

If robots sound too futuristic, consider that most of us probably have an AI in our pockets and maybe also in our homes. We call them Siri, Cortana, Google Assistant, and Alexa. Can disembodied agents cause conflict and harm? In 2016 we got a glimpse of how it could happen. Microsoft deployed a chatbot called “Tay” that learned to say racist and inflammatory things and was shut down within 24 hours. Imagine if Tay had said something to a vulnerable and insecure teenager or stoked violent ambitions.

The question that arises is how to teach artificial intelligence systems about values, norms, and conventions. I hypothesize that we can teach agents sociocultural understanding with stories. Human cultural values are implicitly encoded in stories told by members of a culture. Protagonists tend to exemplify our ideals and antagonists tend to exemplify traits we do not desire. If computers could comprehend stories, then humans could transfer complex values to computers by telling stories.

Johnny 5 from “Short Circuit”

What can we learn from stories? Allegorical tales, such as the one told in the U.S. about George Washington cutting down the cherry tree, exemplify honesty. Aesop’s Fables likewise illustrate virtues such as avoiding vanity. Even contemporary movies, TV, and novels can teach about values, norms, and conventions. The choices protagonists make come with a reward signal: the consequences. Many TV shows and movies feature proper everyday behavior, such as what to do in a restaurant.

To give an example, it is fairly easy to set up a situation where an intelligent system performs a task correctly but violates social norms. In the following video, we show an agent that was instructed to pick up a prescription drug and return home.

In the video, the agent proceeds directly to the pharmacy, grabs the prescription drug, and walks out without paying. It did exactly what we asked without failure. The problem is, it also stole.

The problem is that although we told the agent what we wanted, we did not give it a complete specification of desired behavior. Namely, we didn’t tell the agent to perform the task in a socially acceptable fashion. We omit this information when talking to other humans because we expect them to understand this ahead of time. If we treat computers like we treat humans, we can trigger commonsense goal failures. It is the failure of the operator, not the agent.

The Quixote system is a reinforcement learning agent that learns a reward function by reading stories about everyday typical situations. The intuition behind this project is to reward the agent for performing actions that mimic those of the protagonist in stories. Below is a video where Quixote is presented with the same pharmacy scenario, but after reading stories about what people do when they go to pharmacies.
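The intuition of rewarding protagonist-like behavior can be illustrated with a toy example. What follows is a minimal sketch, not the Quixote implementation: the event list, the reward magnitudes, the tiny pharmacy world, and the use of tabular Q-learning are all assumptions made for illustration. The agent always receives a task reward for leaving with the drug; optionally, it also receives a small bonus each time its action matches the next event in a hand-written "story" of what people do at a pharmacy.

```python
import random
from collections import defaultdict

# Hypothetical event sequence, standing in for what might be distilled from
# stories about pharmacy visits (an assumption, not data from Quixote).
STORY = ["enter_pharmacy", "pick_up_drug", "pay", "leave_pharmacy"]
ACTIONS = list(STORY)
START = (False, False, False, 0)  # (inside, has_drug, paid, story_progress)
TASK_REWARD, STORY_BONUS, STEP_COST = 10.0, 2.0, -1.0  # illustrative values


def applicable(state, action):
    inside, has_drug, paid, _ = state
    if action == "enter_pharmacy":
        return not inside
    if action == "pick_up_drug":
        return inside and not has_drug
    if action == "pay":
        return inside and has_drug and not paid
    return inside  # leave_pharmacy


def step(state, action, use_story):
    """Apply one action; return (next_state, reward, done)."""
    inside, has_drug, paid, progress = state
    reward, done = STEP_COST, False
    if applicable(state, action):
        # Story-derived shaping: bonus for doing what the protagonist does next.
        if progress < len(STORY) and action == STORY[progress]:
            progress += 1
            if use_story:
                reward += STORY_BONUS
        if action == "enter_pharmacy":
            inside = True
        elif action == "pick_up_drug":
            has_drug = True
        elif action == "pay":
            paid = True
        else:  # leave_pharmacy
            inside = False
            if has_drug:  # the task itself never checks for payment
                reward += TASK_REWARD
                done = True
    return (inside, has_drug, paid, progress), reward, done


def train(use_story, episodes=5000, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = START
        for _ in range(20):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, use_story)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_plan(q, use_story):
    """Follow the learned policy greedily from the start state."""
    state, plan = START, []
    for _ in range(10):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        plan.append(action)
        state, _, done = step(state, action, use_story)
        if done:
            break
    return plan


if __name__ == "__main__":
    for use_story in (False, True):
        print(use_story, greedy_plan(train(use_story), use_story))
```

With these illustrative numbers, skipping payment is the shortest route to the task reward, so an agent trained on the task reward alone has no reason to pay. Adding the story bonus makes the four-step, protagonist-like route worth more than the shortcut.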

The Case for Learning Narrative Intelligence

For many, the case for machine learning will be taken for granted. However, the vast majority of research in computational narrative intelligence (previous and current) has not embraced machine learning. There is good reason for this. Computer games are micro-worlds. They consist of well-defined rules that govern their simulations. The characters are known, their abilities are known, and so are the locations and objects. When one has a well-defined micro-world it is possible to do amazing things with computational narrative intelligence. But if we want our technologies to tell and understand stories out in the real world, which is complex and rife with uncertainty, then we run into challenges that we only know how to address with machine learning: robustness and scalability.

Robustness

Earlier I showed an example story generated by TALESPIN. One of the best parts of the TALESPIN paper was on how the generator fails (try publishing a paper like that today!). In each “mis-spun” tale, the author talks about how the failure arose from an error in knowledge engineering.

Some mis-spun tales.

Takeaway: the system was brittle because it relied on precisely defined knowledge about the entities, facts, and rules of the world. This is a pattern that has played out time and again across artificial intelligence.

Scalability

In my dissertation work in the 2000s, I built a symbolic story planning algorithm, Fabulist, informed and inspired by cognitive theories. The stories generated could be quite long and involved, such as the fairy tale shown below.

A story generated by Fabulist

To get that story, I had to provide rules about how the fictional world works:

I had to provide facts about the entities in the world and their abilities:

I had to provide the system with the outcome situation:

Something always bothered me about this. I couldn’t separate my creativity from the algorithm’s creativity. Maybe the algorithm wasn’t creative at all and successful generation of stories was just a reflection of my abilities as a knowledge engineer.

What I really wanted was to build a story generation system that could generate a novel story about any topic conceivable. Just like humans can. I call this the open story generation problem. My first attempt to solve the open story generation problem was Scheherazade, a story generation system that crowdsourced stories about given topics, learned a generalized model, and used the model to generate novel stories. Whenever confronted with a topic it didn’t know about, it could generate and post crowdsourcing queries. Here is a story automatically generated by Scheherazade:

A story generated by Scheherazade.

Machine learning comes with its own problems: how to generalize and how to handle noisy data. Scheherazade’s stories are much shorter and structurally simpler than those of Fabulist. We had to trade story complexity for scalability.
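To make the idea of generating from a learned, generalized model more concrete, here is a minimal sketch of sampling a story from a plot graph: a set of events with ordering constraints and mutual exclusions. This is an illustration under stated assumptions, not Scheherazade’s code. The events and constraints are written by hand here, whereas the system described above would learn them from crowdsourced stories, and the rule that an excluded event counts as resolved is a simplification chosen for brevity.

```python
import random

# Hypothetical, hand-written plot graph for a restaurant visit.
# (earlier, later): "earlier" must be resolved before "later" may occur.
BEFORE = [
    ("enter restaurant", "order food"),
    ("enter restaurant", "spot old friend"),
    ("spot old friend", "join tables"),
    ("order food", "eat meal"),
    ("join tables", "eat meal"),
    ("eat meal", "pay with cash"),
    ("eat meal", "pay with card"),
    ("pay with cash", "leave restaurant"),
    ("pay with card", "leave restaurant"),
]
# Pairs of events that cannot both appear in the same story.
MUTEX = [("pay with cash", "pay with card")]


def generate_story(seed=None):
    """Sample one event sequence consistent with the plot graph."""
    rng = random.Random(seed)
    events = {e for pair in BEFORE for e in pair}
    told, skipped = [], set()
    while len(told) + len(skipped) < len(events):
        resolved = set(told) | skipped
        # An event is ready once all of its predecessors are told or skipped.
        ready = sorted(
            e for e in events - resolved
            if all(a in resolved for a, b in BEFORE if b == e)
        )
        event = rng.choice(ready)
        told.append(event)
        # Telling one event of a mutually exclusive pair rules out the other.
        for a, b in MUTEX:
            if event == a:
                skipped.add(b)
            elif event == b:
                skipped.add(a)
    return told


if __name__ == "__main__":
    print(" -> ".join(generate_story(seed=1)))
```

Different seeds yield different but coherent orderings, for example ordering food before or after spotting the friend. All of the variation comes from the graph, which is why the quality of the learned model bounds the quality of the stories.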

A recurrent neural net tries to predict the next sentence in a story.

Crowdsourcing knowledge doesn’t scale that well either; it basically outsources the knowledge engineering to many people who each do a little bit of easy work. More recently we have experimented with learning to tell stories about any topic by digesting large story corpora such as movie plot descriptions on Wikipedia. To handle the size of these corpora, we are using deep recurrent neural networks. Recurrent neural networks don’t yet work very well for story generation. The research is ongoing and we are making progress on reining in the randomness due to the sheer complexity of learning patterns from story corpora. Sometimes we are rewarded, other times not so much.

Open Problems

The following are grand challenges that persist.

Interactive Narrative

We still don’t have the Holodeck, the virtual reality environment introduced in Star Trek: The Next Generation. Holograms aside, the Holodeck is a storytelling medium in which we see Star Trek characters take on the role of fictional characters and interact with virtual characters.

The Holodeck

Unlike traditional storytelling forms, interactive narratives can change based on what the human participant does and says. However, the human author of the experience cannot be present to micromanage it. Interactive narrative requires a Drama Manager, an artificial, disembodied entity that monitors the virtual environment and makes changes to the structure of the narrative and/or the virtual characters. A drama manager is a surrogate manifestation of the human author.
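As a rough illustration of what a drama manager does, the sketch below checks the author’s remaining plot points against the current world state after a participant action and swaps in an author-provided fallback when a plot point is no longer possible. Everything here is hypothetical: the `PlotPoint` structure, the `manage` function, and the example facts are invented for illustration, and real drama managers typically use much richer techniques than this lookup.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class PlotPoint:
    name: str
    # Facts the participant could break that this plot point depends on.
    requires: FrozenSet[str]
    # Author-provided alternative to use if this plot point becomes impossible.
    fallback: Optional["PlotPoint"] = None


def manage(plot: List[PlotPoint], world: Set[str]) -> List[PlotPoint]:
    """Return the remaining plot, repaired against the current world state."""
    repaired = []
    for point in plot:
        candidate: Optional[PlotPoint] = point
        # Walk the fallback chain until a viable alternative is found.
        while candidate is not None and not candidate.requires <= world:
            candidate = candidate.fallback
        if candidate is not None:
            repaired.append(candidate)  # otherwise the plot point is dropped
    return repaired


# Hypothetical authored plot.
LIEUTENANT_DUEL = PlotPoint("hero duels the lieutenant", frozenset({"lieutenant_alive"}))
PLOT = [
    PlotPoint("hero arrives at the castle", frozenset()),
    PlotPoint("hero duels the villain", frozenset({"villain_alive"}), LIEUTENANT_DUEL),
    PlotPoint("kingdom celebrates", frozenset()),
]

if __name__ == "__main__":
    world = {"villain_alive", "lieutenant_alive"}
    world.discard("villain_alive")  # the participant defeats the villain early
    print([p.name for p in manage(PLOT, world)])
```

The fallback chain is one simple way for an author to express intent ahead of time, since the author cannot be present while the experience runs.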

One open challenge in interactive narrative is non-programmer authorial intent. How do we make it possible for non-technical storytellers to instill their authorial intent, goals, and beliefs about good story experiences? Story generation must always take its cues from humans. In interactive narrative, entire story worlds must be built and populated with virtual characters. This is non-trivial for AI researchers. It is currently out of the scope of possibility for non-programmers and non-AI experts.

The drama manager is a surrogate for the human author, but it should also act as a surrogate for the human participant. The participant may have preferences over different experiences, and a drama manager that models those preferences and desires could tailor the experience accordingly. The challenge is that the participant is unable to express their preferences and desires except tacitly through their actions.

Finally, interactive narratives have not been truly open worlds where the participant can express full agency to do whatever he or she desires.

[Additional reading: Personalized Interactive Narratives via Sequential Recommendation of Plot Points]

Audience Modeling

We don’t have good models of how narrative content changes the cognitive and affective state of a reader, watcher, or interactive participant. If we desire to build automated story generation systems that are effective, they must make content generation decisions based on how those decisions will affect the audience. For example, suspense is an affective stress response that results from a complex interplay between the content of a story (portraying the actions of a protagonist) and the expectations of the audience. The potential for suspense cannot be easily detected by learning patterns of textual surface features. Audience modeling would likewise be necessary to generate educational stories.

Improvisational Storytelling

Improvisational storytelling involves one or more people interacting in real-time to create a story without advance notice of topic or theme. Improvisational storytelling is often found in improv theatre, where two or more performers receive suggestions of theme from the audience. Improvisational storytelling can also happen in informal settings such as between a parent and a child or in table-top role-playing games.

Whose Line Is It Anyway?

While improvisational storytelling is related to interactive narrative, it differs in three significant ways:

  1. Improvisational storytelling occurs in open worlds. That is, the set of possible actions that a character can perform is the space of all possible thoughts that a human can conceptualize and express through natural language.
  2. Improvisational storytelling relaxes the requirement that actions are strictly logical. Since there is no underlying environment other than human imagination, characters’ actions can violate the laws of causality and physics, or simply skip over boring parts. However, no action proposed by human or agent should be a complete non sequitur.
  3. Character actions are conveyed through language and gesture.

Improvisational storytelling in which a human interacts with an AI partner shares some traits with AI dialogue agents (chatbots). However, dialogue agents tend to be task oriented. In improvisational storytelling the goals of both participants can differ, goals can change, long-term context matters (making reference to things that have happened in the past), and a sense of progression is often desired (driving the story towards interesting plot points or conclusions even when knowing that the goals may be changed). In this regard I like to think of improvisational storytelling as the “AlphaGo of natural language processing.” Furthermore, metaphor, humor, tropes, cultural knowledge, and sense of dramatic structure are important aspects of good improvisational storytelling.

The next game show that artificial intelligence researchers work on should be Whose Line Is It Anyway? After all: everything is made up and the points don’t matter.

[Additional reading: Improvisational Computational Storytelling in Open Worlds]

Real World Storytelling

Augmented Reality and Alternate Reality place virtual/fictional assets in the real world. Augmented reality uses mobile devices (head-mounted displays or hand-held devices) to place virtual graphical assets in the real world. Alternate Reality is a fictional overlay relying on imagination or physical assets.

Up until now, we have assumed that generated stories and interactive narratives play out in a virtual graphical environment or in the mind of an audience. There are few constraints on what can happen in these stories. The real world places constraints on generated story content because it must mesh with real world locations and real world objects.

Can Augmented Reality Games Like Pokémon Go Ever Have Stories?

Movie Generation

An AI grand challenge:

An intelligent system autonomously generates a feature-length animated movie that receives at least 20% on RottenTomatoes.com.

(I chose 20% because that would beat Transformers: Revenge of the Fallen.)

We have research on automated story generation. We also have research on AI movie directing and cinematography. An example of a movie generated by an AI director is shown below. However, it required a fully specified script in a special command language.

Movie generated by the Cambot system.

The work on automated story generation typically focuses on plot-level content — high level descriptions of what characters do. We don’t have any AI research on how to bridge the gap from plot to character dialogue, scene directions, and non-canned character animation and behavior.

Concluding Thoughts

  1. Computational narrative intelligence is an important part of creating human-like AI systems.
  2. Computational narrative intelligence is the key to many real world applications. It is hard to envision interactive AI agents and robots that are immersed in our society that are not able to interact more naturally with humans and behave in accordance with social conventions.
  3. Narrative intelligence is about commonsense and socio-cultural knowledge — it may make AI agents and robots safer.
  4. The larger AI research community is starting to become more interested in storytelling and interested in returning to the original AI challenges of story understanding.
  5. There is still a long way to go!