AI Agents — Behind the Scenes

ASHPAK MULANI
6 min read · Jul 8, 2024


Artificial intelligence agents (AI agents) are changing how we use technology by automating complex tasks and making decisions on their own. At their core, these agents are computer programs that use Large Language Models (LLMs) as their “brains”. These agents break a query down into smaller parts, decide which tools can solve each part, and combine the results to achieve the final goal.

Someone who understands human language and uses different tools to solve problems autonomously!!! Does that sound familiar? Yes, it does… Do you remember Doraemon, the Japanese cartoon character? I know this might sound funny, but on a lighter note I personally see some similarities between Doraemon and an AI agent, and to be honest the comparison makes the core concept behind AI agents easier to understand.

Just for fun, here are some of Doraemon’s core qualities:

Language Understanding: Doraemon can understand human language, much like how AI agents use LLMs to understand language and extract context.

Decision-Making: Doraemon chooses the best gadget for a situation, just like AI agents analyze a query and decide which tools to use and in what order.

Action Execution: AI agents use various tools to accomplish tasks, just as Doraemon runs his gadgets to solve problems.

Implementation

Let’s explore the simplest possible single agent and understand how it works behind the scenes. Since we are already talking about Doraemon, let’s name our agent AgentDora.

Our first goal is to grasp the fundamentals of a simple AI agent. To achieve this, we will NOT use any open-source libraries like LangChain, LlamaIndex, etc. These libraries simplify the process by wrapping complex implementations, which can obscure how AI agents work internally. For now, let’s avoid these shortcuts and focus on the core concepts by implementing an AI agent without any such library or framework.

Simplest single agent implementation

Now, let’s define the tools for our agent. In this example, we will use two simple functions as tools. The first, add_numbers, adds two numbers and doesn’t need any information from the external world. The second, get_ip, takes a single URL as input and finds the IP address of that website.

Each tool will have docstrings, which are essential. Clear docstrings explain what each function does, so the AI agent can understand and select the right tool for the task.

Definition of tools for AgentDora
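The original code was shown as an embedded gist; a minimal sketch of the two tools might look like this (the function bodies are my assumptions, since only their descriptions appear in the text):

```python
import socket

def add_numbers(a: int, b: int) -> int:
    """Add two numbers and return their sum. Needs no external information."""
    return a + b

def get_ip(url: str) -> str:
    """Find and return the IP address of the website at the given hostname."""
    return socket.gethostbyname(url)
```

The docstrings matter more than the bodies here: they are the only description of each tool the agent will ever see, so they should state the purpose and inputs plainly.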

Doraemon uses his “4D Pocket” to pull out any tool when required. Similarly, we can create a ToolPocket class to store and retrieve details about our tools. In the ToolPocket class, we’ll define an add_tool method that adds tools to a dictionary, with the tool name as the key and the tool’s docstring as the value. This setup will help AgentDora browse the list of available tools and find the right one for the task. Another method, tool_details, will simply return all the tool details as a single string.
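A minimal ToolPocket along those lines might look like this (a sketch, not the article’s exact code, which was embedded as a gist):

```python
class ToolPocket:
    """Doraemon-style '4D pocket': stores each tool's name and description."""

    def __init__(self):
        self.tools = {}

    def add_tool(self, tool):
        # Key: the function's name; value: its docstring, which the agent
        # later reads to decide whether this tool fits the query
        self.tools[tool.__name__] = tool.__doc__

    def tool_details(self) -> str:
        # All tool names and descriptions as one string, ready for the prompt
        return "\n".join(f"{name}: {doc}" for name, doc in self.tools.items())

def add_numbers(a: int, b: int) -> int:
    """Add two numbers and return their sum."""
    return a + b

pocket = ToolPocket()
pocket.add_tool(add_numbers)
```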

Now, let’s define the prompt that will instruct AgentDora to understand the input query and find the correct tool from the ToolPocket to achieve the desired results.
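The wording below is an illustration, not the article’s original prompt: it lists the available tools and asks the LLM either to name a tool with its parameters, or to answer the query directly when no tool fits.

```python
# Illustrative prompt template: {tool_details} and {query} are filled in
# at run time from the ToolPocket and the user's question.
PROMPT_TEMPLATE = """You are AgentDora, a helpful agent with access to these tools:

{tool_details}

Read the user query below and decide whether one of the tools can solve it.
If a tool applies, reply in EXACTLY this format and nothing else:
TOOL: <tool_name>
PARAMS: <comma-separated argument values taken from the query>
If no tool is suitable, answer the query directly yourself.

Query: {query}"""

prompt = PROMPT_TEMPLATE.format(
    tool_details="add_numbers: Add two numbers and return their sum.",
    query="What is 3 plus 9?",
)
```

Forcing a rigid `TOOL:` / `PARAMS:` reply format is one simple way to make the LLM’s answer machine-parseable without a function-calling API.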

Now, let’s start building our AgentDora.

Next, we define the AgentDora class, which is the main agent implementation. In this example we will use OpenAI’s gpt-3.5-turbo model as the LLM.

Adding some more methods to AgentDora class

The method below prepares the tools pocket when the agent is initialized. The agent takes a list of tool objects as a parameter and, using the ToolPocket class defined previously, builds the pocket for later use.

One more method in the AgentDora class prepares the prompt from the template and makes the call to the LLM.

Below is the main run method of the AgentDora class, which calls the LLM and then extracts the matching function (tool) and its required parameters from the LLM’s response.
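Putting the pieces described above together, a complete AgentDora might look like the sketch below. It repeats the tool and prompt definitions so the block runs on its own; the method names, the response format, and the llm_call hook (which lets a stub stand in for the real OpenAI client) are my assumptions, not the article’s exact code.

```python
import re
import socket

def add_numbers(a: int, b: int) -> int:
    """Add two numbers and return their sum."""
    return a + b

def get_ip(url: str) -> str:
    """Find and return the IP address of the website at the given hostname."""
    return socket.gethostbyname(url)

class ToolPocket:
    def __init__(self):
        self.tools = {}

    def add_tool(self, tool):
        self.tools[tool.__name__] = tool.__doc__

    def tool_details(self):
        return "\n".join(f"{n}: {d}" for n, d in self.tools.items())

PROMPT_TEMPLATE = """You have access to these tools:

{tool_details}

If one of the tools can answer the query, reply EXACTLY as:
TOOL: <tool_name>
PARAMS: <comma-separated argument values>
Otherwise answer the query directly yourself.

Query: {query}"""

class AgentDora:
    def __init__(self, tools, model="gpt-3.5-turbo", llm_call=None):
        self.model = model
        # Keep the callable tools by name, and fill the pocket with docstrings
        self.tool_fns = {t.__name__: t for t in tools}
        self.pocket = ToolPocket()
        for t in tools:
            self.pocket.add_tool(t)
        # llm_call is injectable so a stub can replace the real LLM in tests
        self.llm_call = llm_call or self._openai_call

    def _openai_call(self, prompt):
        from openai import OpenAI  # imported lazily; needs OPENAI_API_KEY set
        client = OpenAI()
        resp = client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def _build_prompt(self, query):
        return PROMPT_TEMPLATE.format(
            tool_details=self.pocket.tool_details(), query=query
        )

    def run(self, query):
        response = self.llm_call(self._build_prompt(query))
        match = re.search(r"TOOL:\s*(\w+)\s*PARAMS:\s*(.*)", response)
        if not match:
            return response  # no suitable tool: the LLM answered directly
        name, raw = match.group(1), match.group(2)
        args = [p.strip() for p in raw.split(",") if p.strip()]
        # Cast numeric-looking arguments so tools like add_numbers get ints
        args = [int(a) if a.lstrip("-").isdigit() else a for a in args]
        return self.tool_fns[name](*args)

# Define the tool list and initialize AgentDora with the chosen model
tools = [add_numbers, get_ip]
dora = AgentDora(tools, model="gpt-3.5-turbo")
```

With an API key set, a call like `dora.run("What is the IP address of openai.com?")` would lead the LLM to reply with `TOOL: get_ip`, which the agent then executes; queries no tool can answer fall through to the LLM’s own response.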

Define the list of tools, choose the OpenAI model, and initialize AgentDora.

Let’s put our AgentDora in action now.

For the given query, AgentDora understands that the get_ip tool (function) needs to be used to get the required results.

Now let’s provide a different query to check whether AgentDora can identify the correct tool and extract the given inputs from the query to achieve the goal with the tool’s help.

In the execution below, AgentDora automatically understood that the add_numbers tool needs to be used and that 3 and 9 should be passed as parameters to execute the tool and get the results.

These results clearly show that AgentDora is smart enough to get results by choosing the right tool and executing it with the correct parameters.

What if AgentDora doesn’t find a suitable tool in the provided list?

If you remember, in the prompt we instructed the agent to generate the response directly from the LLM when no suitable tool is available in the given list.

In the execution below, AgentDora understood that no suitable tool was provided to answer the given query, so it asked the LLM to produce the result directly and responded with the LLM’s answer.

Fantastic!!! Our AgentDora can now autonomously decide whether to use a suitable tool from the provided list or answer the user’s query directly from the LLM.

Please note that if we use open-source libraries like LangChain or LlamaIndex to define agents, or multi-agent frameworks like LangGraph or Crew.AI, we don’t need to write all the code above. We just define the set of tools and initialize the agent, and the library or framework handles the rest. This article focused on what happens behind the scenes for an agent, which is why we avoided those libraries; in upcoming articles we will cover agent implementation using open-source libraries and frameworks.

Summary

An AI agent is an entity that leverages Large Language Models (LLMs) to understand user queries, break them into smaller tasks if needed, find suitable tools from a given list, and execute those tools to get the results. These operations can be performed autonomously, without user input.

By understanding the goal using natural language, AI-Agents can act and accomplish tasks on their own, opening tremendous possibilities in Retrieval-Augmented Generation (RAG) and other automation scenarios.

In this document, we focused on implementing a basic core agent from scratch without using any libraries. This helped us to understand how computer programs can use LLMs as their “brains” to perform tasks autonomously.

In real-world applications, multiple agents can work together, performing their tasks and coordinating and communicating with each other to achieve complex operations with little to no user intervention. To implement such an army of agents, several frameworks are available, like LangGraph, Crew.AI, AutoGen, etc., which we will explore in upcoming articles.
