The Rise of Generative Agents in Interactive Simulations
From Natural Language to Structured Environments: A Deep Dive into AI Reasoning by Generative Agents
The paper Generative Agents: Interactive Simulacra of Human Behavior by Park et al. (2023) proposes a system in which a group of "generative agents" live their lives much like human beings. These agents carry out their personal daily tasks in a simulated environment: they make decisions, hold conversations, do house chores, and more.
In this post, I will go over the key concepts of this research, explaining each of them with the aid of visuals and/or code. The post follows the structure below:
- An introduction to Generative AI and Large Language Models
- A technical explanation of the paper and its core components
The paper is available on arXiv, a pre-computed replay of the simulation can be viewed here, and the code is on GitHub.
What is Generative AI?
Generative Artificial Intelligence refers to models that create new outputs rather than direct copies of past data. These models learn the trends and patterns present in their training data and then generate novel samples that follow those patterns. The patterns themselves are captured by Artificial Neural Networks (ANNs), and more specifically by generative neural nets.
Generative AI has many applications, such as text/image/video generation, data augmentation, and Neural Style Transfer (NST). You can follow this tutorial by TensorFlow to get a better understanding of NST.
What is an LLM?
In the field of Natural Language Processing (NLP), a Large Language Model (LLM) refers to a model that can understand and generate human language. These models are first trained on vast amounts of textual data. They can then be employed for different tasks such as text generation, question answering, content summarization, language translation, and many more.
For more information about NLP, you can visit IBM's newsletter on the topic.
In the past few years, language models have experienced a surge in development and application. Notable examples of these LLMs include OpenAI's GPT series, Google's PaLM, and Meta's LLaMA.
Going Back to the Paper
The researchers of this paper came up with a new approach for enabling the agents in the sandbox to become autonomous and act more human-like. This feat is achieved by designing an agent architecture that generates believable behavior using an LLM; the researchers used OpenAI's ChatGPT API (GPT-3.5) for their implementation.
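To ground this, here is a minimal sketch of the kind of LLM call such an architecture relies on. It assumes the legacy openai.ChatCompletion interface (openai < 1.0) that was current when the project was built; the helper name chat_request is my own, not the repository's:

# A minimal sketch of an LLM call, assuming the legacy `openai` (< 1.0)
# ChatCompletion interface; `chat_request` is a hypothetical helper name.
import openai

def chat_request(prompt: str) -> str:
    # Send one prompt to GPT-3.5 and return the text of the reply.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

Every component described below (planning, reflection, conversation) ultimately bottoms out in calls of this shape.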
Furthermore, the way that the agents interact with each other also plays a huge role in the impressiveness of this project. As explained by the authors:
“If an end user or developer wanted the town to host an in-game Valentine’s Day party, for example, traditional game environments would require scripting tens of characters’ behavior manually. We demonstrate that, with generative agents, it is sufficient to simply tell one agent that she wants to throw a party. Despite many potential points of failure, our agents succeed. They spread the word about the party and then show up, with one agent even asking another on a date to the party, all from a single user-generated seed suggestion.”
In a nutshell, this paper brings several key contributions to the table:
- Human-like generative agents
- Innovative architecture for generative agents
- Believable interaction with the surrounding environment
In the next part, I will explain each of these 3 points in detail, and provide more information about their inner workings.
1. Agent Behavior & Interaction
Literature Review
Previously, most AI agents were trained for adversarial environments, where the objective of the game or its reward function was clearly defined and the model could maximize its performance against those metrics. For example:
- AlphaStar by DeepMind for Starcraft (video game)
- OpenAI Five by OpenAI for Dota 2 (video game)
In both instances, the AI models outperformed the professional human players and won the human-vs-AI games by a landslide.
However, few models have addressed the challenge of creating agents that behave realistically in an open-world setting, where there is no strict measure of performance. In addition, previous projects focused more on competition than on collaboration. Therefore, the authors of this paper argue that:
“Today, creating believable agents as described in its original definition remains an open problem... Our argument is that large language models offer an opportunity to re-examine these questions, provided that we can craft an effective architecture to synthesize memories into believable behavior. We offer a step toward such an architecture in this paper.”
Current Approach
In order to showcase the capabilities of their generative agents, the researchers have integrated them as characters in a sandbox world inspired by The Sims (video game). This virtual world, named Smallville, resembles a small-town setting, where 25 agents reside.
For starters, each agent is initialized with a detailed description of its identity (occupation, relationships, seed memories) that informs its behavior. For example, the character John Lin is described in the paper as follows:
John Lin is a pharmacy shopkeeper at the Willow Market and Pharmacy who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers; …
Types of Interactions
In their simulation, each agent can have 2 types of interactions:
- Interaction with the world by taking specific actions
- Interaction with other agents through natural language
With each action, an agent also outputs a statement in natural language that explains what it is doing at that moment. These textual outputs are then stored for later reference.
To see the demonstration of the simulation, you can visit this website.
A Day in the Life
Furthermore, all agents live their lives based on a personal schedule, also known as the "day in the life" mechanism. These agents plan their days and take action according to the time of day and what they have to do. Each action is logged as a natural language statement and then gets translated into the sprites (characters) moving in the world, as sketched below.
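As a rough illustration (a hypothetical sketch, not code from the repository; the schedule entries and helper are mine), a day plan can be thought of as a list of timed, natural-language actions that get logged as statements:

# A hypothetical sketch of the "day in the life" schedule; the entries
# and the `log_action` helper are illustrative, not from the repository.
day_plan = [
    ("06:00", "waking up and completing his morning routine"),
    ("08:00", "opening Willow Market and Pharmacy"),
    ("12:00", "eating lunch at the pharmacy counter"),
]

def log_action(agent_name: str, time: str, action: str) -> str:
    # Each action becomes a natural language statement; the sandbox later
    # translates these statements into sprite movement on the map.
    return f"{time}: {agent_name} is {action}"

for time, action in day_plan:
    print(log_action("John Lin", time, action))

The paper additionally describes plans being refined recursively: the broad-stroke day plan is decomposed into hour-long chunks, and those into 5-15 minute actions.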
2. Agent Architecture
The inspiration for the framework of this project comes from how human beings function, where perception, memory, and action are combined together. Also, these agents use natural language to keep records of their memories and to retrieve certain information for future planning.
The architecture of the generative agents can be boiled down to 3 key components:
- Memory and retrieval
- Reflection
- Planning and reacting
In the upcoming sections, I will go over each of these 3 features, and provide the code for them, which can be found on the GitHub repository of the project.
2.1. Memory & Retrieval
Keeping track of all past events in memory has always been an issue for language models. While great advancements have been made in extending their context windows, these models are still not as efficient as desired. One of the most promising attempts at solving this issue is LongNet by Microsoft.
For more information about Microsoft’s LongNet, you can read this article by yours truly.
To address this issue, the researchers introduced the memory stream, a system that records each agent's experiences and actions in natural language, storing every record along with the timestamp of its creation and the timestamp of its most recent access.
# adapted from the project's GitHub repository, `associative_memory.py`
class ConceptNode:
    def __init__(self,
                 node_id, node_count, type_count, node_type, depth,
                 created, expiration,
                 s, p, o,
                 description, embedding_key, poignancy, keywords, filling):
        self.node_id = node_id
        self.node_count = node_count
        self.type_count = type_count
        self.type = node_type  # thought / event / chat
        self.depth = depth

        # creation time, optional expiration, and time of most recent access
        self.created = created
        self.expiration = expiration
        self.last_accessed = self.created

        # (subject, predicate, object) triple summarizing the memory
        self.subject = s
        self.predicate = p
        self.object = o

        self.description = description
        self.embedding_key = embedding_key  # key into the embedding store
        self.poignancy = poignancy          # importance score of the memory
        self.keywords = keywords
        self.filling = filling              # supporting evidence for the memory
2.2. Reflection
Making inferences and generalizations is challenging for generative agents when all they have is raw observational memory. Say 2 agents share the same passion for research while working in different fields. By looking only at the separate textual records of each agent, classic methods would not be able to infer higher-level facts such as a shared "passion for research".
However, with the reflection mechanism, the agents can construct more abstract thoughts on a higher level. Reflections are evoked when the sum of the importance scores for the latest events perceived by an agent exceeds a threshold.
# adapted from the project's GitHub repository, `reflect.py`
import datetime

def run_reflect(persona):
    # Generate focal points (high-level questions) from recent memories,
    # then retrieve the memory nodes relevant to each focal point.
    focal_points = generate_focal_points(persona, 3)
    retrieved = new_retrieve(persona, focal_points)

    for focal_pt, nodes in retrieved.items():
        # Ask the LLM to draw insights from the retrieved nodes, each
        # paired with the evidence (nodes) that supports it.
        thoughts = generate_insights_and_evidence(persona, nodes, 5)
        for thought, evidence in thoughts.items():
            created = persona.scratch.curr_time
            expiration = persona.scratch.curr_time + datetime.timedelta(days=30)
            s, p, o = generate_action_event_triple(thought, persona)
            keywords = set([s, p, o])
            thought_poignancy = generate_poig_score(persona, "thought", thought)
            thought_embedding_pair = (thought, get_embedding(thought))

            # Store the new thought back into the agent's memory stream.
            persona.a_mem.add_thought(created, expiration, s, p, o,
                                      thought, keywords, thought_poignancy,
                                      thought_embedding_pair, evidence)
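The trigger condition itself is simple. As a hypothetical sketch (the helper below is mine, not the repository's; the paper reports a threshold of 150 in its implementation):

# A hypothetical sketch of the reflection trigger; `should_reflect` is an
# illustrative helper, not the repository's code.
REFLECTION_THRESHOLD = 150  # value reported in the paper

def should_reflect(recent_event_poignancies: list[int]) -> bool:
    # Reflect once the summed importance of recently perceived
    # events crosses the threshold.
    return sum(recent_event_poignancies) > REFLECTION_THRESHOLD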
2.3. Planning & Reacting
Without planning over a long time horizon, an agent might commit the same action more than once in a short sequence, such as eating lunch at noon and then eating lunch again half an hour later, which negatively impacts the believability of the simulation.
By introducing planning, an agent can keep its sequence of actions consistent over a longer period of time. The memory stream stores the plans of each agent as well.
# adapted from the project's GitHub repository, `plan.py`
def plan(persona, maze, personas, new_day, retrieved):
    # Part 1: on a new day, generate the broad-stroke daily plan.
    if new_day:
        _long_term_planning(persona, new_day)

    # Part 2: if the current action has finished, determine the next one.
    if persona.scratch.act_check_finished():
        _determine_action(persona, maze)

    # Part 3: if any events were perceived, pick one and decide whether
    # (and how) to react to it.
    focused_event = False
    if retrieved.keys():
        focused_event = _choose_retrieved(persona, retrieved)
    if focused_event:
        reaction_mode = _should_react(persona, focused_event, personas)
        if reaction_mode:
            if reaction_mode[:9] == "chat with":
                _chat_react(maze, persona, focused_event, reaction_mode, personas)
            elif reaction_mode[:4] == "wait":
                _wait_react(persona, reaction_mode)

    # Part 4: clear the chat state once the agent is no longer chatting.
    if persona.scratch.act_event[1] != "chat with":
        persona.scratch.chatting_with = None
        persona.scratch.chat = None
        persona.scratch.chatting_end_time = None

    # Part 5: count down the chat buffer for other personas, so the agent
    # does not immediately start another conversation with the same persona.
    curr_persona_chat_buffer = persona.scratch.chatting_with_buffer
    for persona_name, buffer_count in curr_persona_chat_buffer.items():
        if persona_name != persona.scratch.chatting_with:
            persona.scratch.chatting_with_buffer[persona_name] -= 1

    return persona.scratch.act_address
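Note that reaction_mode is itself a short natural language string (the prefixes checked above suggest values like "chat with <persona>" and "wait <until some time>"), which is why the function dispatches on string prefixes rather than on fixed enum values.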
On top of that, by perceiving the world around them, the agents can react to the events happening and replan their schedules.
3. The Environment
Since the architecture of each generative agent operates on natural language while its actions take place in the environment, there needs to be a mechanism that grounds the agent's reasoning in the sandbox. Hence, the researchers utilized a tree data structure to represent the environment (areas and objects) and then passed the natural language form of the tree to the language model (GPT-3.5 in this case).
For instance, a "Table" child node belonging to a "Common Room" parent node gets rendered as "there is a table in the common room."
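A hypothetical sketch of this rendering step (the render_area helper below is illustrative, not the repository's code):

# A hypothetical sketch of rendering part of the environment tree into
# natural language; `render_area` is illustrative, not from the repository.
def render_area(area: str, objects: list[str]) -> list[str]:
    return [f"there is a {obj.lower()} in the {area.lower()}" for obj in objects]

print(render_area("Common Room", ["Table", "Sofa"]))
# ['there is a table in the common room', 'there is a sofa in the common room']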
Structured World ⇌ Natural Language
During their navigation, the agents build individual tree representations of the environment, which are subgraphs of the overall sandbox environment tree. Each agent is initialized with an environment tree capturing the spaces and objects it should be aware of (e.g., its living quarters, its workplace, and commonly visited stores). As the agents navigate the sandbox world, they update this tree to reflect newly perceived areas.
# adapted from the project's GitHub repository, `spatial_memory.py`
import json

# `check_if_file_exists` is a helper from the repository's `global_methods` module.
from global_methods import check_if_file_exists

class MemoryTree:
    def __init__(self, f_saved):
        # The tree is a nested dictionary; load it from disk if a save exists.
        self.tree = {}
        if check_if_file_exists(f_saved):
            self.tree = json.load(open(f_saved))

    def print_tree(self):
        def _print_tree(tree, depth):
            dash = " >" * depth
            if isinstance(tree, list):
                # Leaves are lists of game objects.
                if tree:
                    print(dash, tree)
                return
            for key, val in tree.items():
                if key:
                    print(dash, key)
                _print_tree(val, depth + 1)
        _print_tree(self.tree, 0)

    def save(self, out_json):
        with open(out_json, "w") as outfile:
            json.dump(self.tree, outfile)
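As a usage sketch (the world, sector, and arena names below are illustrative; only the structure of nested dictionaries with object lists at the leaves matches the repository):

# A usage sketch with illustrative names; only the nested-dictionary
# structure (world -> sector -> arena -> object list) matches the repository.
s_mem = MemoryTree("nonexistent.json")  # no saved file, so the tree starts empty
s_mem.tree = {
    "the Ville": {
        "Hobbs Cafe": {
            "cafe": ["counter", "table"],
        },
    },
}
s_mem.print_tree()
#  the Ville
#  > Hobbs Cafe
#  > > cafe
#  > > > ['counter', 'table']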
The most suitable area to visit can be found by recursively traversing from the root node of the agent’s environment tree.
# adapted from the project's GitHub repository, `perceive.py`
def perceive(persona, maze):
    # Collect the tiles within the persona's vision radius.
    nearby_tiles = maze.get_nearby_tiles(persona.scratch.curr_tile,
                                         persona.scratch.vision_r)

    # Note that the s_mem (spatial memory) of the persona is a tree
    # constructed using nested dictionaries: world -> sector -> arena,
    # with a list of game objects at the leaves.
    for i in nearby_tiles:
        i = maze.access_tile(i)
        if i["world"]:
            if i["world"] not in persona.s_mem.tree:
                persona.s_mem.tree[i["world"]] = {}
        if i["sector"]:
            if i["sector"] not in persona.s_mem.tree[i["world"]]:
                persona.s_mem.tree[i["world"]][i["sector"]] = {}
        if i["arena"]:
            if (i["arena"] not in
                    persona.s_mem.tree[i["world"]][i["sector"]]):
                persona.s_mem.tree[i["world"]][i["sector"]][i["arena"]] = []
        if i["game_object"]:
            if (i["game_object"] not in
                    persona.s_mem.tree[i["world"]][i["sector"]][i["arena"]]):
                persona.s_mem.tree[i["world"]][i["sector"]][i["arena"]] += [
                    i["game_object"]]
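The paper describes making this selection by prompting the LLM at each level of the tree. The sketch below is a toy version of the same idea, with choose standing in for that LLM call (both helpers are mine, not the repository's):

# A toy sketch of selecting an area by traversing the environment tree
# level by level; `choose` is a hypothetical stand-in for the LLM prompt
# issued at each level, not the repository's code.
def choose(options: list[str], task: str) -> str:
    # Hypothetical: the real system asks GPT-3.5 which option best
    # fits the agent's current task; here we just take the first one.
    return options[0]

def find_area(tree: dict, task: str) -> list[str]:
    path = []
    node = tree
    while isinstance(node, dict) and node:
        key = choose(list(node.keys()), task)
        path.append(key)
        node = node[key]
    return path  # e.g. ["the Ville", "Hobbs Cafe", "cafe"]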
To Recap
In this article, we went over the definitions of Generative AI and LLMs. Thereafter, the fundamental modules of the project were broken down into smaller units, such as the interactions and architecture of the generative agents. Hopefully, you have now gained a better understanding of this research paper and the concepts it presents.
For those interested in running their own instance of the simulation, you can check out this video by WorldofAI on YouTube, which shows how you can install and play around with Smallville like a video game.