What is an LLM Agent and how does it work?

5 min readJan 9, 2024

The main intuition behind agents is a model using a large language model as its central computational engine to reason through a problem, plan to solve the problem and use a set of tools to solve it.

Real tasks in real world do not have one step solutions. They usually require multiple dependent and independent steps to be completed. An answer to a question like ‘What are the earnings for the A company in 2023?’ can be given by using a simple lookup.

However, a question like ‘What are the three takeaways from the Q2 earnings from FY23? Focus on the technological moats that the company is building.’ is not that easy to solve. To find the answer to a question like this, we need knowledge about financial analysis, planning to break down the problem into simpler sub-parts, memory to remember the previous steps and finally tools to complete the simple tasks.

A general framework of using LLMs for solving complex embodied tasks[1]

Let’s breakdown components of LLM agents to better understand how they operate.

Components of LLM Agents

1. Large Language Model

The core computational engine of an LLM agent is a large language model. LLM is trained on a massive dataset to understand and reason from text data.

Notable works such as [3], and [4] have shown that LLMs have the capacity for reasoning which is crucial for agents to work.

2. Prompt

general_prompt = '''
                    Act as a software engineer.
                    Your abilities are:
                    - Writing code.
                    - Writing a README.md file.
                    - Creating a unittest.
                    - Evaluating code quality.
                    - Creating a repository in github.
                    - Reaching a repository in github.

                    Do not finish the chain until you are sure that you have completed it.
                  '''

specific_prompt = f'''
                  Rewrite the 'README.md' file of the {repo_name} repository of the user {username}
                  '''

Prompts are instructions that give information to LLM about its objective, behavior and plan. The agent’s performance is very dependent on the quality of a prompt. An agent has two prompts:

General prompt: This prompt explains the role and behavior of the agent. Prompt does not change for each task. It must be carefully planted since the quality of this prompt is directly correlated to the performance of the agent.
Specific prompt: This prompt tells the objective of a certain task to the agent. Prompt changes for each task.

3. Memory

As we know, agents complete a complex task by first breaking down into sub-tasks than executing tools to finish sub-tasks. For this, the model needs to remember its previous steps. There are two main types of memory:

Short term memory: The agent’s “train of thought.”
Long term memory: The log book that contains a conversation history stretching across weeks or months.

By combining both these memories, the agent gets a firms grasp of the past and contextual knowledge about the user. Memory requires more than a semantic-based retrieval. A composite score is made up of semantic similarity, importance.

4. Knowledge

Without the knowledge of the field, agent can not solve or even understand the task. So either the LLM must be fine-tuned to have the knowledge or we can create a tool to extract the knowledge from a database.

5. Planning

Complex problems often need a chain-of-thought approaches. Agents forms a plan by using a combination of two methods:

Task and question decomposition: Breaking down the task or question into smaller parts
Reflection or critic: Frameworks such as React are used to critic the plan generated by the agent.

6. Tools

Executable functions, APIs or other services that allow agents to complete their duties.

Example

The question such as ‘What is the average age of a dog? Multiply the age by 3’ requires a step by step approach. Agent in below was created by using the Langchain framework. First we initialize our large language model for the central computational engine of the agent, secondly we integrate our tools and finally initialize our agent.

The tool ‘wikipedia’ is for extracting knowledge from the internet and ‘llm-math’ for the numerical calculation.

# Large language model
llm = AzureChatOpenAI()

# Tool integration
tools = load_tools(['wikipedia', 
                    'llm-math'], llm=llm)

# Initialization of the agent
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True,
                         handle_parsing_errors=True)

# Run the agent with a prompt
result = agent.run('What is the average age of a dog? Multiply the age by 3')

The output looks like this:

Conclusion

In conclusion, the LLM Agent represents a powerful framework for solving complex embodied tasks by leveraging a Large Language Model (LLM) as its central computational engine. Comprising essential components such as a carefully crafted general prompt outlining the agent’s abilities and behavior, a task-specific prompt defining the objective, and a sophisticated memory system encompassing both short and long-term memory, the agent is equipped to tackle intricate problems. Knowledge, gained either through fine-tuning the LLM or extracting information from databases, forms a critical foundation for the agent’s understanding and problem-solving capability. The planning phase involves task and question decomposition, along with reflective analysis, allowing the agent to devise effective strategies. Finally, the integration of executable tools enhances the agent’s ability to execute its plans, showcasing the versatility and adaptability of the LLM Agent in addressing real-world challenges.

References

[1] Hu, Bin, et al. 2023, “Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach.”

[2] A Comprehensive Overview of Large Language Models, https://www.wisecube.ai/blog/a-comprehensive-overview-of-large-language-models/

[3] Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International conference on machine learning. PMLR, 2021.

[4] https://paperswithcode.com/method/gpt-3

[5] Introduction to LLM Agents, https://developer.nvidia.com/blog/introduction-to-llm-agents/

[6] What is LLM Agent? Ultimate Guide to LLM Agent [With Technical Breakdown], https://www.ionio.ai/blog/what-is-llm-agent-ultimate-guide-to-llm-agent-with-technical-breakdown

[7] Building Your First LLM Agent Application, https://developer.nvidia.com/blog/building-your-first-llm-agent-application/