Understanding LLM Agent Orchestration

Haiping Chen
Published in SciSharp STACK
Oct 15, 2023 · 6 min read

A detailed look at the Profile, Memory, Planning, and Action components.

In the field of artificial intelligence, expectations for Agents are growing by the day. Whenever a new open-source tool or Agent-based product appears, it triggers heated discussion, as AutoGPT did earlier. For anyone interested in Agents, I recommend a paper that comprehensively introduces the Agent architecture and is very valuable for understanding the field as a whole.

The paper explains in detail the concept of Agents, their development history, and recent research hotspots. Beyond this background, I think its most valuable contribution is summarizing the architecture of Agents based on large language models (LLMs), giving us a standard paradigm against which to design our own Agents.

This article explains the construction strategy of LLM-based Agents from two key aspects: how to design the Agent architecture to make better use of the LLM's capabilities, and how to give the Agent the ability to complete different tasks.

For the architecture design, the paper proposes a unified framework consisting of a Profile module, a Memory module, a Planning module, and an Action module.

Profile module

The Profile module defines and manages the characteristics and behaviors of Agent roles. It contains a set of parameters and rules that describe the Agent's attributes, such as its role, goals, abilities, knowledge, and behavior. These properties determine how the Agent interacts with the environment, how it understands and responds to tasks, and how it makes decisions and plans. The paper describes three ways of generating Agent profiles: the LLM generation method, the data set alignment method, and a combination of the two (a minimal code sketch follows the list).

1. LLM generation method: Use a large language model to automatically generate the agent's personal characteristics, such as age, gender, personal preferences, and other background information. The usual procedure is to first set the composition rules for agents and clarify which attributes agents in the target population should have, then provide several manually written seed profiles as examples, and finally let the language model generate a large number of agent profiles. This approach can produce profiles quickly and in bulk, but the resulting agents may lack detail because there is no precise control.

2. Data set alignment method: Obtain the agents' profile information from real-world population data sets, for example by extracting demographic survey data and turning it into natural-language descriptions. This makes agent behavior more realistic and credible and accurately reflects the attribute distribution of the real population, but it requires reliable, large-scale data sets.

3. Combination method: Use real data sets to generate a core group of key agents that reflect real-world patterns, then use the LLM generation method to supplement them with many more agents. This preserves authenticity while providing enough agents for the system to simulate more complex social interactions. Careful profile design is the foundation of an effective agent system.
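
To make this concrete, here is a minimal C# sketch of what a profile record and the LLM generation method might look like. The `AgentProfile` record, the `IChatCompletion` interface, and the prompt wording are all hypothetical names chosen for illustration; they are not taken from the paper or from any specific framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical profile shape for illustration; not tied to any specific framework.
public record AgentProfile(string Name, string Role, string Goal, string[] Abilities, string Background);

// Minimal assumed abstraction over an LLM chat endpoint (not a real library interface).
public interface IChatCompletion
{
    Task<string> CompleteAsync(string prompt);
}

public class ProfileGenerator
{
    private readonly IChatCompletion _llm;
    public ProfileGenerator(IChatCompletion llm) => _llm = llm;

    // LLM generation method: a few seed profiles guide the model to produce new ones in the same shape.
    public async Task<string> GenerateProfilesAsync(IEnumerable<AgentProfile> seeds, int count)
    {
        var seedJson = JsonSerializer.Serialize(seeds);
        var prompt =
            "You are creating personas for a multi-agent simulation.\n" +
            $"Here are example profiles:\n{seedJson}\n" +
            $"Generate {count} new profiles as a JSON array with the same fields.";
        return await _llm.CompleteAsync(prompt); // caller parses and validates the returned JSON
    }
}
```

The data set alignment method would replace the generation step with a loader that maps survey records onto the same `AgentProfile` shape, which is what makes the combination method straightforward.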

Memory module

The Memory module plays an important role in the Agent system: it stores and organizes information obtained from the environment to guide future actions.

Structurally, a memory module usually has two parts: short-term memory, which temporarily holds recent perceptions, and long-term memory, which keeps important information available for retrieval at any time.

In terms of format, memories can be expressed in natural language or encoded as vector embeddings to improve retrieval efficiency. They can also be stored in a database or organized into structured lists that capture their semantics.

In operation, the module interacts with the environment through three mechanisms: memory reading, memory writing, and reflection. Reading retrieves relevant information to guide actions, writing stores important new information, and reflection summarizes stored memories into higher-level insights.
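
Below is a minimal sketch of this read/write cycle, assuming a caller-supplied embedding function and cosine similarity for retrieval. The class and method names are illustrative only, and reflection is only indicated in a comment.

```csharp
// Reuses the using directives from the Profile sketch, plus System.Linq.
public record MemoryItem(string Text, float[] Embedding, DateTime When);

public class MemoryModule
{
    private readonly Queue<string> _shortTerm = new();   // recent perceptions, bounded
    private readonly List<MemoryItem> _longTerm = new(); // important, retrievable information
    private readonly Func<string, float[]> _embed;       // assumed embedding function (e.g. an embedding model call)

    public MemoryModule(Func<string, float[]> embed) => _embed = embed;

    // Writing: store an observation; promote important ones to long-term memory.
    public void Write(string observation, bool important = false)
    {
        _shortTerm.Enqueue(observation);
        if (_shortTerm.Count > 20) _shortTerm.Dequeue();
        if (important) _longTerm.Add(new MemoryItem(observation, _embed(observation), DateTime.UtcNow));
    }

    // Reading: retrieve the long-term memories most similar to the current query.
    public IEnumerable<string> Read(string query, int topK = 3)
    {
        var q = _embed(query);
        return _longTerm.OrderByDescending(m => Cosine(q, m.Embedding))
                        .Take(topK)
                        .Select(m => m.Text);
    }

    // Reflection would ask the LLM to summarize recent memories into a higher-level insight
    // and Write() that insight back as an important memory (omitted here for brevity).

    private static float Cosine(float[] a, float[] b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / ((float)Math.Sqrt(na) * (float)Math.Sqrt(nb) + 1e-6f);
    }
}
```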

Planning module

The Planning module's main task is to help the Agent decompose complex tasks into more manageable sub-tasks and formulate effective strategies. Planning approaches fall roughly into two types: plans that do not rely on feedback and plans that do.

Plans without feedback do not consult post-execution results while being formulated, and there are several common strategies. Single-path reasoning generates the plan step by step in a cascading manner; multi-path reasoning generates multiple alternative planning paths that form a tree or graph structure; and an external planner can be used to search quickly for an optimal plan.
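
As a rough illustration of single-path planning, the sketch below asks the LLM once to break a goal into ordered sub-tasks and parses the reply. It reuses the hypothetical `IChatCompletion` interface from the Profile sketch; the prompt and parsing rules are assumptions, not a prescribed format.

```csharp
// Single-path planning: one call decomposes the goal into an ordered list of sub-tasks.
public class SimplePlanner
{
    private readonly IChatCompletion _llm;
    public SimplePlanner(IChatCompletion llm) => _llm = llm;

    public async Task<List<string>> PlanAsync(string goal)
    {
        var prompt =
            $"Goal: {goal}\n" +
            "Break this goal into a short numbered list of concrete sub-tasks, one per line.";
        var response = await _llm.CompleteAsync(prompt);

        // Strip the leading numbering and return the sub-tasks in order.
        return response.Split('\n', StringSplitOptions.RemoveEmptyEntries)
                       .Select(line => line.TrimStart('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.', ')', ' '))
                       .Where(s => s.Length > 0)
                       .ToList();
    }
}
```

Multi-path reasoning would generate several candidate plans like this and keep the most promising branches instead of a single list.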

Feedback-based planning adjusts the plan according to feedback received after task execution, and it is better suited to situations that require long-term planning. The feedback may come from objective task execution results, from subjective human judgment, or from an auxiliary model.
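
A minimal sketch of that loop, assuming the same hypothetical `IChatCompletion` interface: each iteration asks for the next action, executes it through a caller-supplied delegate, and feeds the result back. The "DONE" stop signal and step limit are illustrative choices, not part of the paper.

```csharp
// Feedback-based planning: execute one step at a time, feed the result back, and replan.
// The execute delegate stands in for whatever runs the action (a tool, a test suite, a human).
public class FeedbackPlanner
{
    private readonly IChatCompletion _llm;
    public FeedbackPlanner(IChatCompletion llm) => _llm = llm;

    public async Task RunAsync(string goal, Func<string, Task<string>> execute, int maxSteps = 10)
    {
        var history = new List<string>();
        for (int step = 0; step < maxSteps; step++)
        {
            var prompt =
                $"Goal: {goal}\n" +
                $"Progress so far:\n{string.Join("\n", history)}\n" +
                "Reply with the single next action, or DONE if the goal is achieved.";
            var nextAction = (await _llm.CompleteAsync(prompt)).Trim();
            if (nextAction.StartsWith("DONE")) break;

            // Objective feedback from the environment: a tool result, a compiler error, a test report, etc.
            var feedback = await execute(nextAction);
            history.Add($"Action: {nextAction} -> Feedback: {feedback}");
        }
    }
}
```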

Action module

The Action module's responsibility is to turn abstract decisions into concrete actions; it is the bridge between the Agent's internal world and the external environment. When performing a task, the module should consider the goal of the action, how the action was generated, its scope of application, and its likely impact.

Ideally, actions are purposeful, such as completing a specific task, communicating with other agents, or exploring the environment. Actions can be generated by consulting past memories or by following a preset plan. The action space can be expanded with external tools such as APIs and knowledge bases, and it also draws on the inherent capabilities of the LLM itself, such as planning, dialogue, and common-sense understanding.
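
One simple way to picture the tool side of this is a dispatcher that maps a model decision onto registered tools. The `"<tool>: <input>"` decision format and the tool names are assumptions made for this sketch; real frameworks use richer function-calling schemas.

```csharp
// Illustrative action module: dispatches a decision like "search: latest .NET release" to a registered tool.
public class ActionModule
{
    private readonly Dictionary<string, Func<string, Task<string>>> _tools = new();

    // Register external capabilities such as web APIs or knowledge-base lookups.
    public void RegisterTool(string name, Func<string, Task<string>> tool) => _tools[name] = tool;

    // Execute a decision of the form "<tool>: <input>"; anything else falls back to the model's own answer.
    public async Task<string> ExecuteAsync(string decision)
    {
        var parts = decision.Split(':', 2);
        if (parts.Length == 2 && _tools.TryGetValue(parts[0].Trim(), out var tool))
            return await tool(parts[1].Trim());
        return decision;
    }
}
```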

The architecture is like the hardware of a PC, but architectural design alone is not enough; we also need to give the Agent the ability to complete different tasks. These capabilities can be seen as "software" resources. The paper proposes several methods for acquiring them, including model fine-tuning, prompt engineering, and mechanism engineering. Among these, prompt engineering is probably the most common, and the "prompt engineer" role we often hear about belongs to this context.
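
In practice, much of prompt engineering amounts to composing the profile, retrieved memories, and current plan into a single prompt. The template below is only one possible layout under that assumption, reusing the hypothetical `AgentProfile` record from the earlier sketch.

```csharp
// One possible prompt template that stitches profile, memories, and plan into a single prompt.
public static class PromptTemplates
{
    public static string Build(AgentProfile profile, IEnumerable<string> memories, IEnumerable<string> plan, string task)
    {
        return string.Join("\n", new[]
        {
            $"You are {profile.Name}, a {profile.Role}. Your goal: {profile.Goal}.",
            "Relevant memories:",
            string.Join("\n", memories),
            "Current plan:",
            string.Join("\n", plan),
            $"Task: {task}",
            "Respond with your next action."
        });
    }
}
```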

Designing and building LLM-based Agents is a complex and challenging task. As the technology advances, I believe more excellent AI applications will be built on Agents, and ordinary users will be able to create their own Agents through open-source projects and become "super individuals" in the AI era. For .NET developers, there are already excellent frameworks that accelerate LLM system integration, such as Semantic Kernel and BotSharp. I hope everyone takes action early, builds up their knowledge, and is ready to use or build their own Agent as soon as the technology matures.
