A Complete Guide to LLMs-based Autonomous Agents (Part I):

Yule Wang, PhD
The Modern Scientist
20 min read · Oct 9, 2023

— — Chain of Thought, Plan and Solve/Execute, Self-Ask, ReAct, Reflexion, Self-Consistency, Tree of Thoughts and Graph of Thoughts

Large Language Models (LLMs) provide an intuitive natural-language interface, making them well suited for user-computer interaction and for tackling complex problems. Some pretrained LLMs, such as GPT-4, come with notable reasoning capabilities that let them break an intricate problem into simpler steps, offering solutions, actions, and evaluations at each step. This suggests that such LLMs are already well equipped to address diverse challenges.
However, as closed systems, LLMs cannot fetch the most recent data or domain-specific knowledge. This limitation can lead to errors or “hallucinations” (i.e., incorrect responses). While fine-tuning pretrained LLMs is a potential remedy, it compromises their generality, since it requires updating the transformer’s weights and collecting data for every specific domain. In addition, LLMs have intrinsic weaknesses in areas such as arithmetic and staying current with the latest information. Fine-tuning alone cannot overcome these shortcomings, which points to the importance of incorporating external data and supplementary tools. It therefore becomes essential to design an autonomous agent around LLMs.
In the eyes of the general public, GPT-4 Plugins, which make use of external tools, and Auto-GPT, which exhibits automated behavior, are perceived as LLM-based agents. This article provides an overview of LLM-based agents, from the simplest forms to those whose problem-solving processes mirror human-like complexity, and offers an engineer’s perspective on the architecture behind autonomous agents.

Here is the YouTube recording of my presentation on LLM-based agents, currently available in Chinese. If you are interested in an English version, please let me know.

1. Motivation of LLM-Based Agents

My previous two blogs, “Transformer Based Models” and “Illustrated Explanations of Transformer”, delved into the increasing prominence of transformer-based models in Natural Language Processing (NLP). A highlight of those discussions was the inherent advantages of decoder-only transformer models (GPT, Llama, Falcon). As generative models (GenAI), their strength in in-context learning, which stems from self-supervised pretraining, is a foundation of their remarkable reasoning ability.

The transition from GPT-3/GPT-3.5 (GPT-3.5 was fine-tuned from the pretrained GPT-3 model via the InstructGPT method) to GPT-4 has further enhanced this capability. The improvement shows up in better performance on exams such as the SAT, GRE, and LSAT, as reported in the GPT-4 Technical Report. However, the model specifications of GPT-3/3.5 and GPT-4 remain undisclosed. Recently, capable open-source models such as Llama-2 from Meta and Falcon from TII have been released, offering avenues for further fine-tuning.

Llama-2, though a capable open-source model, can still fall short on some reasoning tasks. GPT-4, the most powerful model available for public use, shows prowess in generic reasoning tasks, including reading comprehension, commonsense reasoning, and logical reasoning, and is also adept at code generation. Yet even GPT-4, without specific guiding prompts, can stumble on high-school-level math and physics. In coding, GPT-4 has also shown a tendency toward errors or hallucinations, particularly with newer APIs (knowledge as of January 2022).

Fig. 1: Accuracy of the top 15 open-source LLMs, alongside GPT-3.5 and GPT-4, on ARC, HellaSwag, MMLU, and TruthfulQA. (Image Source: Lee et al. (2023))

1.1 Why Not Fine-Tuning?

Fine-tuning can transform LLMs into domain-specific experts, and consistently infuse the transformer with the latest information, but it comes with several drawbacks:

  1. It demands domain-specific fine-tuning, which is burdensome not merely because of its cost but also because it compromises generality: the transformer’s parameters must be fine-tuned, and data must be collected, for every specific domain.
  2. Fine-tuning a pretrained LLM must be done carefully, since the model acts like a “black box”; otherwise there is a risk of overwriting or conflicting with its existing knowledge.
  3. On certain tasks, LLMs, being closed systems and purely language models, struggle without external tools such as calculators or specialized APIs. They naturally exhibit weaknesses in areas like math, as seen in GPT-3’s performance on arithmetic involving 4-digit or larger operands. Even if LLMs are retrained frequently with the latest data, they inherently lack the capability to provide real-time answers, such as the current date and time or weather details. Incorporating external knowledge retrieval and tool use is essential. After all, it is unnecessary to make an entity like Albert Einstein universally adept; what matters is knowing when and where to source relevant information or tools. (Lilian Weng, “How to Build an Open-Domain Question Answering System”)
  4. Enhancing reasoning capability through fine-tuning is challenging. Pretrained LLMs come with a fixed number of transformer parameters, and improvements in reasoning typically depend on scaling those parameters up (reasoning emerges from upscaling complex networks). Fine-tuning a pretrained transformer rarely augments reasoning ability, especially if the pretrained model is already adequately trained. This is particularly true for tasks that prioritize reasoning over domain knowledge, such as mathematical or physics reasoning problems.

From this perspective, relying solely on fine-tuning or scaling is not an all-in-one solution. It is more sensible to construct a system around LLMs that leverages their innate reasoning prowess to plan, decompose a complex task, reason, and act at each step. Since LLMs already possess commendable reasoning and tool-using skills, our role is primarily to guide them to exercise these intrinsic abilities in the right circumstances.

2. What is an agent?

2.1 A gentle introduction to generic agents

The concept of an ‘agent’ has its roots in philosophy, denoting an intelligent being with agency that responds based on its interactions with an environment. When this notion is translated to the realm of artificial intelligence (AI), it represents an artificial entity employing mathematical models to execute actions in response to perceptions it gathers (like visual, auditory, and physical inputs) from its environment. Within reinforcement learning (RL), the role of the agent is particularly pivotal due to its resemblance to human learning processes, although its application extends beyond just RL. In this blog post, I won’t delve into the discourse on an agent’s self-awareness from both philosophical and AI perspectives. Instead, I’ll focus on its fundamental ability to engage and react within an environment.

2.2 What is an LLM-based agent?

Fig. 2: An LLM-based agent interacts with its environment through perception, sensing environmental data, and takes action based on the information, which may involve tools. In unimodal mode, the agent uses only text for both input and output. In multi-modal mode, the agent can perceive using visual, auditory, and physical inputs and can execute embodied actions in the environment. (Image Source: Xi et al. (2023))

In textual unimodal LLMs, text is the exclusive medium of perception, with other sensory inputs being disregarded. This text serves as the bridge between the users (representing the environment) and the LLM. Consequently, all actions manifest as text-based instructions, be it generating text responses or activating external resources and tools.

(Note: GPT-4 can handle both visual comprehension and generation; this multimodal capability is now publicly accessible to users, but it is not the focus of this article.)

2.3 Critical Components

We rely on LLMs to function as the brain of the agent system, strategizing and breaking complex tasks into manageable sub-steps, then reasoning and acting at each sub-step iteratively until a solution is reached. Beyond the processing power of this ‘brain’, integrating external resources such as memory and tools is essential. Traditional rule-based programming serves as the backbone that connects the components. When LLMs receive contextual information from memory and external resources, their inherent reasoning ability lets them grasp and interpret that context, much like reading comprehension.

  • LLMs (Brains)
    Within the autonomous agent, LLMs handle activities such as planning, reasoning, acting, evaluating, and summarizing. They harness linguistic reasoning, commonsense knowledge, and actionable insights to deploy particular tools. A single LLM need not cover all of these functions; multiple LLMs can be assigned specialized roles. This division not only enhances production efficiency but also optimizes costs, much like specialized regions of a brain.
    o Input: Text-based. This encompasses more than just the immediate user command. It also integrates instructions, which might range from broad system guidelines to specific user directives, preferred output formats, and instructed examples (k-shot). In an ongoing dialogue between the user and the LLMs, prior conversation history is also included. For intricate tasks that involve multiple iterative sub-steps, a record of previous thought processes, actions, observations, and responses from the LLMs is retained.
    o Output: Also text-based. Conclusive answers to a user’s command are articulated in textual form. For tasks that progress in sequences, the LLM may formulate actionable plans. When given clear directives in the input prompt, it can produce outputs in structured formats (e.g., JSON), simplifying the handoff of data and actions to downstream rule-based programming workflows.
    Here is the input prompt template:
"""
Instructions: {System Instruction}
Previous Data: {Memory}
Reference Data: {Retrieved Information}
Respond in the specified JSON format: {JSON Format with descriptions}
Please replicate the examples to generate the answer:
{k examples}
Given question: '''{Question}''', provide the process leading to the answer:
"""
  • Memory
    In an ongoing chat dialogue, the history of prior conversations must be reintroduced to the LLMs with each new user message, which means the earlier dialogue has to be stored in memory. Additionally, for decomposable tasks, the plans, actions, and outcomes of previous sub-steps are saved in memory and integrated into the input prompt as contextual information. However, because of the transformer’s input-sequence-length limit, and for reasons of operational efficiency and production cost, we cannot store and feed an endless history of past interactions to the LLMs. To address this, various memory strategies have been devised.
    o Memory Buffer: This holds the entire chat history. If the accumulated context surpasses the input-sequence-length limit, a “window” can trim older portions of the history. (Refer to the established module Buffer_Memory by LangChain; a framework-agnostic sketch appears after this list.) This method, however, inevitably loses historical information. For longer histories, there are also concerns about production cost and increased latency due to an overly long input context, and some LLMs may struggle to extract the most relevant content, showing “forgetting” behaviors towards the earlier or central parts of the context.
    o Memory Summarization: This strategy condenses the user-LLM interactions into shorter content, leveraging the LLMs for the task. (See the established module Memory Summarization by LangChain). The downside is that while core information is retained, finer details might be lost, particularly after multiple rounds of summarization. It’s also worth noting that frequent summarization with LLMs can lead to increased production costs and introduce additional latency.
    o Structured Memory Storage: As a solution to the drawbacks of the previous methods, past dialogues can be stored in organized data structures. For future interactions, related history information can be retrieved based on their similarities.
  • External tools or data resources (Retrieval-Augmented Generation, RAG)
    o Data: For enhancing LLMs with external data or scripts, it’s vital to segment (breaking down long texts to fit token limits), embed, index (organizing segmented texts for efficient retrieval), and retrieve (using databases, for instance). I’ll dive deeper into the intricacies of data indexing in my upcoming article.
    o Tools: Advanced pretrained LLMs can discern which APIs to use and input the correct arguments, thanks to their in-context learning capabilities. This allows for zero-shot deployment based on API usage descriptions.
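Below is a minimal, framework-agnostic sketch of the windowed memory buffer and summarization hook described above. It is only an illustration of the idea, not LangChain’s actual API; the summarize callable is a hypothetical hook standing in for an LLM-based summarizer.

class WindowedMemory:
    """Keeps the last `max_turns` turns verbatim; older turns are optionally
    folded into a running summary instead of being discarded."""

    def __init__(self, max_turns=10, summarize=None):
        self.turns = []             # list of (role, text) pairs
        self.max_turns = max_turns
        self.summary = ""           # condensed older history, if summarization is enabled
        self.summarize = summarize  # hypothetical LLM-backed summarizer: (summary, dropped_turns) -> str

    def add(self, role, text):
        self.turns.append((role, text))
        if len(self.turns) > self.max_turns:
            # Trim the oldest turns once the window is exceeded.
            dropped, self.turns = self.turns[:-self.max_turns], self.turns[-self.max_turns:]
            if self.summarize:
                # Fold the dropped turns into a running summary instead of losing them.
                self.summary = self.summarize(self.summary, dropped)

    def as_prompt_context(self):
        # Fills the "Previous Data: {Memory}" slot of the input prompt template above.
        header = f"Summary of earlier conversation: {self.summary}\n" if self.summary else ""
        return header + "\n".join(f"{role}: {text}" for role, text in self.turns)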

3. An Evolution of LLMs-Based Agents

This section presents the evolution of the autonomous agent (as shown in the chart below), transitioning from a straightforward input-output (direct prompting) approach to a complex autonomous LLM-based agent. This multi-step agent is adept at planning and segmenting tasks. During each sub-step, it reasons, employs external tools & resources, evaluates results, and can refine its ongoing sub-step or even shift to a different thought trajectory.

Fig. 3: An illustration of the evolution of LLM-based agents: from simplistic direct I-O prompting, to directive prompting, and ultimately to a sophisticated autonomous agent. This advanced agent integrates multiple problem-solving steps, utilizes external resources, and chooses the best path among various thought trajectories to reach the conclusive answer. (Image Source: Created by the Author)

3.1 Input-Output (Direct Prompting)

Plain user prompt. Some questions can be answered directly from the user’s question alone, but others cannot be addressed by simply posing the question without additional instructions.

3.2 Instruction Prompting

If a basic prompt does not yield a satisfactory response from the LLM, we should give it specific instructions: guidance on how to approach and formulate the answer, templates to adhere to, or examples to mimic. Below are some example prompts with instructions:

  • General System Instruction (as seen in the “custom instructions” on the ChatGPT web version):
    — “I’m just a little kid. Can you make things simple for me? If you’re unsure, just tell me ‘No’.”
  • Structured Output Instruction: The LLM can produce outputs in structured formats, which simplifies subsequent parsing and facilitates rule-based programming, eliminating the reliance on traditional parsing methods such as regex (a parsing sketch follows this list).
    — “Please rate the toxicity of these texts on a scale from 0 to 10. Parse the score to JSON format like this {‘text’: the text to grade; ‘toxic_score’: the toxicity score of the text}”
  • Few-Shot Examples Instruction (k-Shot Learning): Instead of giving a prompt without any guiding examples, k-shot learning provides the LLM with several samples so that it can recognize and replicate the patterns in those examples through in-context learning. The examples can steer the LLM towards addressing intricate issues by mirroring the procedures showcased in the examples, or by generating answers in a format similar to the one demonstrated (as with the previously referenced Structured Output Instruction, providing a JSON format example improves the instruction for the desired output). Notably, unlike fine-tuning, this method does not alter the network’s parameters, and the patterns will not be remembered unless the same k examples are supplied again in the future.
    — “Following the format(/process) of this(/these) example(s): ‘’’… ‘’’, answer my question.”
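As a concrete illustration of the structured-output instruction above, here is a small sketch that asks for a JSON response and parses it directly. The call_llm helper is a hypothetical stand-in for whichever chat-completion API you use, not a real library function.

import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM chat-completion call."""
    raise NotImplementedError

def rate_toxicity(text: str) -> dict:
    prompt = (
        "Please rate the toxicity of this text on a scale from 0 to 10. "
        'Respond only with JSON like {"text": "<the text>", "toxic_score": <score>}.\n\n'
        f"Text: '''{text}'''"
    )
    raw = call_llm(prompt)
    # Because the model was told to emit JSON, a plain json.loads replaces
    # brittle regex parsing; production code should still handle malformed output.
    return json.loads(raw)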

3.3 Chain of Thought (Single I-O)

I will introduce more complicated prompting techniques that integrate some of the aforementioned instructions into a single input template. This guides the LLM itself to break down intricate tasks into multiple steps within the output, tackle each step sequentially, and deliver a conclusive answer within a singular output generation. This procedure can be encapsulated by the term “chain of thought”. Nevertheless, depending on the instructions used in the prompts, the LLM might adopt varied strategies to arrive at the final answer, each having its unique effectiveness.

  • “Let’s think step by step” (Kojima et al. (2022), “Large Language Models are Zero-Shot Reasoners”):
    Simply appending “Let’s think step by step” to the user’s question prompts the LLM to think in a decomposed manner, addressing the task step by step and deriving the final answer within a single output generation. Without this trigger phrase, the LLM might directly produce an incorrect answer.
Fig. 4: An illustrative example demonstrating how the simple chain-of-thought instruction “Let’s think step by step” can trigger the LLM to reason step by step and improve the accuracy of the result. (Image Source: Kojima et al. (2022))
  • Plan & Solve (Wang et al. (202305)) (Refer to the established module “Plan-and-Execute” agent by LangChain):
    This expands on “Let’s think step by step” by prompting the LLM to first craft a detailed plan and then execute it, following a directive such as “First devise a plan and then carry out the plan” (both trigger phrases are sketched after this list). Without a proper planning phase, as illustrated, LLMs risk devising erroneous steps and reaching incorrect conclusions. Adopting this Plan & Solve approach can increase accuracy by an additional 2–5% on diverse math and commonsense reasoning datasets.
Fig. 5: An illustrative example showing the impact of a two-phase instruction (in red, Fig. (b)), first crafting a plan and then executing it step by step, on the accuracy of answers compared with just “Let’s think step by step”. (Image Source: Wang et al. (2023))
  • Self-Ask (Press et al. (2022)) (k-shot):
    As illustrated in the figure below, the input prompt provides the LLM with example questions and their associated thought chains leading to final answers. In its response, the LLM is guided to craft a sequence of follow-up questions and intermediate answers that mimics the thinking procedure of these examples. This chain of thought, following the pattern “question → follow-up question → intermediate answer → follow-up question → intermediate answer → … → final answer”, guides the LLM to the final answer based on the preceding analytical steps.
Fig. 6: An illustrative example showing the effect of Self-Ask instruction prompting (in the right figure, the instructive examples are the context not highlighted in green, with green denoting the output), which repeatedly prompts the model to evaluate whether the current intermediate answer sufficiently addresses the question, in improving the accuracy of answers over the “Let’s think step by step” approach. (Image Source: Press et al. (2022))
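The prompting variants above differ mainly in the trigger text wrapped around the question. The sketch below reuses the hypothetical call_llm helper from the earlier example; the Plan & Solve directive is paraphrased from the paper’s trigger phrase rather than quoted verbatim.

def zero_shot_cot(question: str) -> str:
    # Kojima et al.'s zero-shot trigger: the model decomposes the task by itself.
    return call_llm(f"Q: {question}\nA: Let's think step by step.")

def plan_and_solve(question: str) -> str:
    # Plan & Solve style trigger: ask for an explicit plan before executing it.
    return call_llm(
        f"Q: {question}\n"
        "A: Let's first understand the problem and devise a plan. "
        "Then let's carry out the plan and solve the problem step by step."
    )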

3.4 Retrieve External APIs (RAG) (Several I-Os)

The aforementioned chains of thought can be directed with or without provided examples and produce an answer in a single output generation. When integrating closed-form LLMs with external tools or data retrieval, the execution results and observations from those tools are incorporated into the input prompt of each LLM Input-Output (I-O) cycle, alongside the previous reasoning steps. A program links these cycles together seamlessly.

  • ReAct (Yao et al. (2022)) (k-shot):
    ReAct, short for reason and act, is similar to the Self-Ask agent but integrates external tool actions. Given sample ReAct examples, it adopts the flow “Question → Reason → Action → Observation → Reason → Action → Observation → … → Reason → Final Answer” (a compressed loop sketch follows below). ReAct leverages external resources such as search engines to acquire more precise observations that augment its reasoning process.
    (Note: Updated Self-Ask agents are now equipped to use internet search engines.)
Fig. 7: An illustrative example showing how a ReAct agent that combines reasoning and action (e.g., using a search engine) can reach a more accurate final answer by accessing up-to-date information external to closed-form LLMs. (Image Source: Yao et al. (2022))
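The sketch below compresses the ReAct loop into a few lines, assuming that call_llm (the same hypothetical helper as before) returns the next Thought/Action block and that search wraps an external search engine; the Search[...] action syntax mirrors the paper’s examples, but the code itself is only an illustration.

REACT_FEWSHOT_PROMPT = "..."  # k-shot ReAct exemplars (Thought/Action/Observation traces), omitted here

def search(query: str) -> str:
    """Hypothetical wrapper around an external search engine."""
    raise NotImplementedError

def react_agent(question: str, max_steps: int = 8) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The LLM emits the next Thought and Action given the exemplars and the history so far.
        step = call_llm(REACT_FEWSHOT_PROMPT + transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action: Search[" in step:
            query = step.split("Action: Search[", 1)[1].split("]", 1)[0]
            # The tool's result is fed back into the prompt as an Observation.
            transcript += f"Observation: {search(query)}\n"
    return "No final answer within the step budget."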

3.5 Add Evaluator

Incorporating an evaluator within the LLM-based agent framework is crucial for assessing the validity or efficiency of each sub-step. It helps determine whether to proceed to the next step or to revisit a previous one and formulate an alternative. For this evaluation role, either an LLM or a rule-based program can be used. Evaluations can be quantitative, which may lose information, or qualitative, leveraging the semantic strengths of LLMs to retain multifaceted information. Instead of designing evaluation rationales manually, you might let the LLM itself formulate potential rationales for the upcoming step.

  • Self-Refine (Madaan et al. (2023)) (k-shot):
    Upon receiving a generated answer or piece of work, an LLM can self-evaluate using rationales such as concepts and commonsense reasoning, and then refine its output. The LLM first provides feedback on its own output; given the original context plus this feedback in the input prompt, the model then produces a refinement. This iterative “feedback-refine” loop continues until no further refinement is required (a minimal loop sketch follows after this list). The procedure, along with generic rationales such as concepts and commonsense reasoning and domain-specific rationales (e.g., for code generation or dialogue response), is demonstrated through k examples in the input prompt, allowing the LLM to replicate the process. Self-Refine has demonstrated improvements of 20–50% in dialogue responses and 4–10% in code optimization across different LLMs. (Note: With advanced models like GPT-4, there may be no need to manually supply these self-assessment rationales; the model can potentially generate them itself.)
Fig. 8: A graph showcasing the Self-Refine agent mechanism: it constantly enhances its generated solution to a task by repeatedly seeking feedback from an LLM and incorporating this feedback for refinement. (Image Source: Madaan et al. (2023))
  • Reflexion (Shinn et al. (2023)) (Verbal Reinforcement Learning without Fine-Tuning):
    A limitation of Self-Refine is its inability to store refinements for subsequent LLM tasks, and it doesn’t address the intermediate steps within a trajectory. However, in Reflexion, the evaluator examines intermediate steps in a trajectory, assesses the correctness of results, determines the occurrence of errors, such as repeated sub-steps without progress, and grades specific task outputs. Leveraging this evaluator, Reflexion conducts a thorough review of the trajectory, deciding where to backtrack or identifying steps that faltered or require improvement, expressed verbally rather than quantitatively. This self-reflection process distills the long-term memory, enabling the LLM to remember aspects of focus for upcoming tasks, akin to reinforcement learning, but without altering network parameters. As a prospective improvement, the authors recommend that the Reflexion agent consider archiving this long-term memory in a database.
Fig. 9: A diagram of the Reflexion agent’s recursive mechanism: A short-term memory logs earlier stages of a problem-solving sequence. A long-term memory archives a reflective verbal summary of full trajectories, be it successful or failed, to steer the agent towards better directions in future trajectories. This design allows it to function as a reinforcement learning agent without altering the network parameters. (Image Source: Shinn et al. (2023) )
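Here is a minimal sketch of the feedback-refine loop described for Self-Refine, again using the hypothetical call_llm helper; real prompts would also include the k examples of feedback rationales mentioned above.

def self_refine(task: str, max_rounds: int = 4) -> str:
    draft = call_llm(f"Task: {task}\nProvide an initial answer.")
    for _ in range(max_rounds):
        feedback = call_llm(
            f"Task: {task}\nCurrent answer: {draft}\n"
            "Give concrete feedback on errors or possible improvements. "
            "Reply with the single word STOP if no further refinement is needed."
        )
        if feedback.strip().upper() == "STOP":
            break  # the feedback-refine loop ends when no refinements remain
        draft = call_llm(
            f"Task: {task}\nPrevious answer: {draft}\nFeedback: {feedback}\n"
            "Rewrite the answer, addressing the feedback."
        )
    return draft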

3.6 Multiple Chains of Thoughts

“A genuine problem-solving process involves the repeated use of available information to initiate exploration, which discloses, in turn, more information until a way to attain the solution is finally discovered.” — Newell et al. (1959)

When humans tackle complex problems, we segment them and continuously optimize each step until prepared to advance further, ultimately arriving at a resolution. An agent replicating this problem-solving strategy is considered sufficiently autonomous. Paired with an evaluator, it allows for iterative refinements of a particular step, retracing to a prior step, and formulating a new direction until a solution emerges.

Both ToT and GoT are prototype agents so far applied to search and arrangement challenges, including crossword puzzles, sorting, keyword counting, the Game of 24, and set operations. They have not yet been tested on NLP tasks such as mathematical reasoning or general reasoning and QA. Real-world problem-solving is considerably more complicated. We anticipate seeing ToT and GoT extended to a broader range of NLP tasks in the future.

Fig. 10: A diagram that shows the evolution from agents that produce a singular chain of thought to those capable of generating multiple ones. It also showcases the progression from agents with parallel thought processes (Self-Consistency) to advanced agents (Tree of Thoughts, Graph of Thoughts) that interlink problem-solving steps and can backtrack to steer towards more optimal directions. (Image Source: Besta et al. (2023), “Graph of Thoughts: Solving Elaborate Problems with Large Language Models”)
  • Self-Consistency (SC) (Wang et al. (2022)):
    By default, the LLM uses greedy decoding to generate a single response: token-by-token generation is deterministic at a temperature of 0. As the temperature is raised, the output becomes more diverse and creative, so a higher temperature yields more varied thought trajectories for solving the problem. These different paths can lead to different conclusions, and a majority vote over them finalizes the answer (a voting sketch follows this list). Self-Consistency improves performance by 5%–15% across numerous arithmetic and commonsense reasoning tasks in both zero-shot and few-shot Chain of Thought settings.
  • Tree of Thoughts (ToT) (Yao et al. (2023)):
    While Self-Consistency produces multiple distinct thought trajectories, they operate independently and cannot identify and retain prior steps that already point in the right direction. Rather than always starting afresh when a dead end is reached, it is more efficient to backtrack to the previous step. In ToT, a thought generator proposes several potential next steps from the current step’s outcome, and the most promising one is pursued unless it is deemed infeasible. This mirrors a tree structure in which each node represents a thought-action pair, and such trees are commonly traversed with depth-first or breadth-first search. The evaluation technique differs from those in Reflexion and Self-Refine: for Creative Writing, the model votes on candidate passages five times and proceeds with the one receiving the most votes, while for the Game of 24 and Crosswords the system uses a sure/likely/impossible scale to guide its decisions. Compared with Chain of Thought, ToT realizes improvements of 70%, 20%, and 40% on the Game of 24, Creative Writing, and Crosswords, respectively.
  • Graph of Thoughts (GoT) (Besta et al. (2023)):
    GoT advances upon ToT in several ways. First, it incorporates a self-refine loop (introduced by the Self-Refine agent) within individual steps, recognizing that refinement can occur before fully committing to a promising direction. Second, it eliminates unnecessary nodes. Most importantly, GoT merges branches, recognizing that multiple thought sequences can provide insights from distinct angles. Rather than strictly following a single path to the final solution, GoT emphasizes preserving information from varied paths. This strategy moves from an expansive tree to a more interconnected graph, making inference more efficient as more information is conserved. The evaluation criteria differ per task: sorting tasks assess subset accuracy, while document merging evaluates redundancy and information preservation. GoT experiments focus on tasks that benefit from consolidating thought branches. Compared with ToT, error rates drop by 26% for sorting, 15% for set operations, and 3% for keyword counting, and there is a 0.25% boost in the quality score for document merging. On these tasks, GoT can also cut costs relative to ToT by as much as 50%.
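As mentioned in the Self-Consistency entry above, the method reduces to sampling several chains of thought at a non-zero temperature and taking a majority vote over the final answers. Here is a sketch, assuming the hypothetical call_llm accepts a temperature argument and that extract_answer (also hypothetical) pulls the final answer out of a reasoning chain.

from collections import Counter

def extract_answer(chain_of_thought: str) -> str:
    """Hypothetical parser that isolates the final answer from a reasoning chain."""
    return chain_of_thought.strip().splitlines()[-1]

def self_consistency(question: str, n_samples: int = 10, temperature: float = 0.7) -> str:
    answers = []
    for _ in range(n_samples):
        # Each sample follows its own thought trajectory because sampling is stochastic.
        chain = call_llm(f"Q: {question}\nA: Let's think step by step.",
                         temperature=temperature)
        answers.append(extract_answer(chain))
    # Majority vote across the independent trajectories decides the final answer.
    return Counter(answers).most_common(1)[0][0]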

4. Designing an Autonomous LLMs-Based Agent

4.1 Components

An autonomous agent usually consists of various modules. The choice to employ identical or distinct LLMs for assisting each module hinges on your production expenses and individual module performance needs. While LLMs have the versatility to serve various functions, it’s the distinct prompts that steer their specific roles within each module. Rule-based programming can seamlessly integrate these modules for cohesive operation.

  • Planner (LLM-assisted): This module can either lay out a comprehensive plan with all the steps upfront before proceeding to evaluate each one, or it can devise a plan for a single step at a time, creating the next step only after the completion of the preceding one.
  • Reasoner (LLM-assisted): Based on the current step’s plan and the context from prior trajectories, this module logically processes information, analyzes the results of actions, and formulates an intermediate solution for the current phase.
  • Actioner (LLM-assisted): When allowed access to external resources (RAG), the Actioner identifies the most fitting action for the present context. This often involves picking a specific function/API and its relevant input arguments. While models like Toolformer and Gorilla, which are fully finetuned, excel at selecting the correct API and its valid arguments, many LLMs might exhibit some inaccuracies in their API selections and argument choices if they haven’t undergone targeted finetuning. Such models rely on their inherent in-context learning capabilities, selecting an API based on the provided reasoning context and API descriptions. While they benefit from illustrative examples of API usages, capable LLMs can operate effectively without any examples.

It is also worth noting that LLMs can generate outputs in structured formats such as JSON, making it easy to extract the desired action and its parameters without resorting to traditional parsing methods like regex (a parsing sketch follows the component list below). Given the inherent unpredictability of LLMs as generative models, robust error handling is crucial. Some sophisticated LLMs possess self-error-handling abilities, but the associated production costs must be considered. Moreover, a keyword such as “finish” or “Now I find the answer:” can signal the termination of the iterative loop within sub-steps.

  • Executor (RAG-enabled, a wrapper function separate from LLM): When access to external resources like data or API functions (e.g., a search engine, calculator, or calendar) is granted, an execution wrapper becomes essential. This wrapper manages the function calls and data retrieval processes. (Details on RAG with indexing will be covered in an upcoming blog article.)
  • Evaluator (LLM-assisted or Rule-Based Program): Using either predefined or LLM-generated rationales, the LLM-based evaluator assesses whether you have hit a dead end or whether the step’s quality is suboptimal and leading in an unpromising direction. For tasks with clearly defined outcomes, a rule-based program can be used for evaluation. The feedback might take the form of numerical ratings associated with each rationale, or be expressed as verbal commentary on individual steps or the entire process.
  • Evaluator Ranker (LLM-assisted; Optional): If multiple candidate plans emerge from the planner for a specific step, an evaluator should rank them to highlight the most optimal. This module becomes redundant if only one plan is generated at a time.
  • Memory (Outside LLM; LLM assists summarization): This module stores the sequential progression of past thoughts, actions, and results, providing the LLMs with context for subsequent recursive steps. Whether to summarize past trajectories hinges on efficiency and related costs. Since memory summarization requires LLM involvement, introducing added costs and latency, the frequency of such compression should be chosen carefully.
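As noted before the component list, structured JSON outputs let the rule-based layer extract the chosen action without regex. The sketch below assumes a hypothetical action schema of the form {"action": ..., "arguments": {...}}; the termination keywords mirror those mentioned above.

import json

def parse_action(llm_output: str):
    """Parse a structured action emitted by the Actioner.

    Returns an (action, arguments) pair, or None when the output is malformed
    so that the caller can retry or fall back to error handling.
    """
    try:
        payload = json.loads(llm_output)
        return payload["action"], payload.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

def is_finished(llm_output: str) -> bool:
    # Termination keywords that signal the end of the iterative sub-step loop.
    return "finish" in llm_output.lower() or "Now I find the answer:" in llm_output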

Based on the context, the Planner, Reasoner, and Actioner can operate jointly or as individual modules. For instance, the current step’s reasoning might directly imply the next move, removing the necessity for a separate reasoner. Similarly, reasoning might implicitly recommend a specific tool. However, overly decomposing steps and modules can lead to frequent LLM Input-Outputs, extending the time to achieve the final solution and increasing costs.

Fig. 11: A flowchart illustrating the workings of an autonomous agent: The planner breaks down a complex task, proposing multiple sub-task options for that step, each leading to a unique problem-solving direction. The best sub-task is selected, and the reasoner strategizes its action, usually producing a structured output like JSON. If an external function/API is deemed necessary, its results get integrated into the context to shape an intermediate answer for that step. An evaluator then assesses if this intermediate answer steers towards a probable final solution. If it’s not on the right track, a different sub-task is chosen. (Image Source: Created by Author)

4.2 Pseudocode

Here is a pseudocode representation of a comprehensive problem-solving process using an autonomous LLM-based agent. The method follows a “plan a step, then resolve that plan” loop, rather than a strategy where all steps are planned upfront and then executed, as in plan-and-solve agents:

def solve(Question):
    # Initially, append the question requiring a resolution to the memory
    memory = [Question]
    i = 0

    while True:
        # Generate candidate thoughts for the current step
        candidates = Planner.generate_candidates(i, memory)

        while candidates:
            # Select the most promising candidate thought for this step
            candidate_j = Evaluator_Ranker.select_best(candidates)
            # The Reasoner reasons about this thought step given the previous trajectory
            reason = Reasoner.process(memory, candidate_j)
            # Choose an action to execute based on the reasoning
            action = Actioner.select_an_action(reason)

            # A keyword like "Final Answer" ends the iterative loop;
            # the conclusive solution is carried in the reasoning of this step
            if action == "Final Answer":
                return reason

            # If external resource retrieval (RAG) is allowed, execute the action
            observation = Executor.execute(action) if RAG_enabled else None

            # The Reasoner reasons over the observation of the action
            candidate_j_result = Reasoner.process(memory, candidate_j, observation)

            # The Evaluator assesses whether the result of this candidate thought step
            # leads in a promising direction towards the final answer
            if Evaluator.assess(candidate_j_result):  # positive direction
                memory.append((i, candidate_j, candidate_j_result))
                i += 1  # Move on to the next thought step
                break
            else:
                # If not promising, try an alternative thought step
                candidates.remove(candidate_j)
                if not candidates:
                    candidates = Planner.generate_candidates(i, memory)

Biography

Yule Wang, Physics PhD, NLP Machine Learning Engineer

My LinkedIn: https://www.linkedin.com/in/yule-wang-ml/

My YouTube Channel

Other YT Videos:

In-Depth Look at Transformer Based Models: BERT, GPT: Training Objectives & Architectures Compared

ChatGPT’s reinforcement model — InstructGPT

Word-Embeddings: GloVe, CBOW, skip-gram
