Part 4: “Scientific” Classification

AI Agents: From Dumb to Self-Learning

On the road to pragmatic AGI

Anton Antich
Superstring Theory

--

In the previous parts of our AI Multi-Agent series, we looked at why ChatGPT is not AI, how to build a simple "Poor Man's RAG" agent that uses context before interacting with you, and discussed the general anatomy of AI agents and their similarity to humans. Let's take a deeper look at the different types of AI Agents and discuss how to build them practically using LLMs and the Integrail platform!

1. Generic AI Agent

Generic AI Agents: Sensors, Actuators (“Hands”), and the black box to select the action

As we discussed in the previous articles, all agents, AI and otherwise, can be described by the diagram above. Humans included. An Agent perceives the environment via sensors, has some (currently black-box) mechanism to select the best response in the current situation, and can then enact change in the environment using actuators (e.g., hands for a human). But this is way too generic, so let's move on to:
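This perceive-select-act loop can be sketched as a minimal abstract class. The names (`perceive`, `select_action`, `step`) are illustrative, not from any particular framework:

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Generic agent: perceive the environment, select an action, act."""

    @abstractmethod
    def perceive(self, environment):
        """Read the environment via sensors; return a percept."""

    @abstractmethod
    def select_action(self, percept):
        """The 'black box': map a percept to an action."""

    def step(self, environment):
        """One cycle of the loop: sense, then decide what to do."""
        percept = self.perceive(environment)
        return self.select_action(percept)
```

Every agent type below is a different way of filling in the `select_action` black box.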

2. Simple Reflex Agent

Simple Reflex Agent: straightforward reaction to the current situation as perceived

This is the simplest possible Agent. It operates in a very straightforward way: the sensors take a reading of the current environment state, and the "brain" selects the best action based on that state alone. This seems very basic, but you don't need anything more complex to play tic-tac-toe, for instance. Your next move can always be 100% defined by the current board situation.
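A simple reflex agent is essentially a priority-ordered list of condition-action rules. Here is a toy sketch for tic-tac-toe (the rules themselves are a naive heuristic, chosen just to illustrate the structure):

```python
def reflex_move(board):
    """Simple reflex policy for tic-tac-toe: the chosen action depends
    only on the current board, a 9-element list with None = empty cell."""
    # Condition-action rules, checked in priority order.
    if board[4] is None:                  # rule 1: take the centre
        return 4
    for corner in (0, 2, 6, 8):          # rule 2: take a free corner
        if board[corner] is None:
            return corner
    for cell, mark in enumerate(board):  # rule 3: take any free cell
        if mark is None:
            return cell
    return None                          # board full: no action possible
```

Note that there is no memory and no lookahead: the same board always produces the same move.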

LLMs (large language models) such as ChatGPT are, in essence, also simple reflex agents: they take whatever the user inputs as a reading of the current "world state" and respond directly to it based on the probabilities they have learned. Boring!

Let’s make things a bit more interesting with:

3. State Agents

State agents keep certain state information in addition to the current sensors' reading

State agents keep some internal State information in addition to the current sensors' reading. Retrieval-augmented generation (RAG) for LLMs is an example of a simple state agent. Whatever is provided to the LLM as part of the context as a result of RAG is a kind of state, and it influences the output of your chatbot along with the current "sensor reading", which is simply the latest user input. The history of your conversation normally also becomes part of the state. This makes for a much better experience when interacting with LLMs.
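A minimal sketch of such a state agent, with the retrieval step and the LLM call stubbed out as plain callables (`retrieve` and `generate` are placeholders, not a real RAG library API):

```python
class ChatStateAgent:
    """State agent sketch: the reply depends on the latest user input
    *and* on accumulated state (conversation history + retrieved docs)."""

    def __init__(self, retrieve, generate):
        self.history = []         # internal state: past turns
        self.retrieve = retrieve  # stand-in for a vector-store lookup
        self.generate = generate  # stand-in for an LLM call

    def respond(self, user_input):
        context = self.retrieve(user_input)       # RAG: augment the state
        prompt = {"history": list(self.history),
                  "context": context,
                  "input": user_input}
        reply = self.generate(prompt)
        self.history.append((user_input, reply))  # update the state
        return reply
```

The key difference from a reflex agent is the `self.history` update: the same input can now produce different outputs depending on what came before.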

State agents need not be limited to LLM features, despite all the latest craze. As seen in the picture, in addition to basic state, your agent may also contain reasoning functions that allow it to estimate "what will happen to the world if I do this or that", and this makes it much more powerful. For instance, if you remember AlphaGo and AlphaZero, the Deep Reinforcement Learning agents that beat top humans at Go and Chess respectively, they use this technique to the fullest (although technically they are type 4 agents, which we discuss below).
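The "what if" reasoning function can be sketched as a one-step lookahead: simulate each candidate action and pick the one whose predicted outcome scores best. `simulate` and `score` are assumed helpers supplied by the agent designer:

```python
def best_action(state, actions, simulate, score):
    """One-step 'what if' reasoning: simulate each action against a
    model of the world and pick the one with the best predicted score.
    simulate(state, action) must return the predicted next state."""
    return max(actions, key=lambda a: score(simulate(state, a)))
```

Systems like AlphaZero take this idea much further, searching many steps ahead with Monte Carlo tree search, but the core loop is the same: predict, evaluate, choose.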

Before we discuss how we can build some similar agents ourselves, let’s look at the next type:

4. Model-Utility Agents

Model-Utility agents have a model of the world and a utility (“happiness”) function to optimize their behavior

Model-Utility Agents add a model of the external environment to the equation, which simplifies the "what if" analysis, as well as a Utility, or "Happiness", function, which allows the Agent to be self-sufficient and goal-oriented. The Deep Reinforcement Learning agents mentioned in the previous section use these improvements to beat human champions: the utility function drives them towards winning games (they are "happy" when they win and "unhappy" when they lose, and so tend to avoid behavior that leads to unhappiness as much as possible).
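When the world model is stochastic, the standard formulation is to maximize *expected* utility. A minimal sketch, where `model(state, action)` is assumed to return `(probability, next_state)` pairs:

```python
def choose(state, actions, model, utility):
    """Model-utility sketch: pick the action with the highest expected
    utility ('happiness') under a probabilistic model of the world."""
    def expected_utility(action):
        return sum(p * utility(s) for p, s in model(state, action))
    return max(actions, key=expected_utility)
```

For example, a "risky" action with a 50% chance of a big payoff can beat a guaranteed small one, because 0.5 × 3 = 1.5 > 1.0.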

For the Agents that can make your life easier today using Integrail, which operate mainly online, the environment is the whole Web or parts of it, such as specific tools (Email, CRM, CMS, etc.). We have found that providing a basic vertical model (e.g., of a typical CRM environment) makes LLMs much better at solving complex tasks, such as responding to an email while taking into account the deal history and past interactions with the client. And if you can add some form of "Happiness" estimation with a Reinforcement Learning algorithm on top, you can create extremely powerful, self-sufficient agents that will make you 10x more productive in whatever you do.

But of course, the “holy grail” of AI Agents is the:

5. Self-Learning Agents

Self-Learning Agents are able to adjust their behavior without human supervision

In the general self-learning agent scheme above, the "normal agent" box stands for any of the previous agent types we looked at, and the rest is the machinery that allows the agent to learn.

The Problem generator, as the name suggests, generates problems for the agent so that it can learn new behaviors. In the case of our online agents, these can be examples of customer emails paired with different sales-pipeline states, to which our agent must react and then receive feedback from the Critic that evaluates its performance.

Feedback from the Critic then goes to the Learning Element, which in turn makes any necessary adjustments to the "brain" so that the performance of our agent improves.
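The whole generate-act-critique-adjust cycle can be sketched as a simple loop. All four callables here are placeholders for real components (an LLM-based agent, a scoring model, a gradient update, etc.):

```python
def learning_loop(agent_params, act, problem_generator, critic,
                  learning_element, episodes=100):
    """Self-learning sketch: the problem generator produces training
    situations, the critic scores the agent's response, and the
    learning element adjusts the agent's parameters accordingly."""
    for _ in range(episodes):
        problem = problem_generator()        # new situation to practice on
        action = act(agent_params, problem)  # the "normal agent" acts
        feedback = critic(problem, action)   # performance evaluation
        agent_params = learning_element(agent_params, problem,
                                        action, feedback)
    return agent_params
```

With, say, a critic returning the gap between the action and a target and a learning element that nudges the parameters by a fraction of that gap, the loop converges toward the target, which is the essence of how the feedback path closes.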

The most popular self-learning method is the Deep Reinforcement Learning mentioned above, but even "simple" Reinforcement Learning can do wonders when combined with LLM-based agents. This is an area we actively experiment with at Integrail, and we welcome you to do it with us and share this exciting journey. Just imagine what is becoming possible very soon: you simply ask your personal agent, which knows your preferences and goals, to do a task for you (book a flight or a concert ticket, sort out your mailbox, write a web application), and it simply does it, using the tool models and "happiness" functions we provide.
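As an illustration of how far even "simple" RL can go, here is an epsilon-greedy multi-armed bandit, one of the simplest RL algorithms. One could imagine the arms being alternative prompt templates and the reward being the Critic's score; the `reward` callable below is a stand-in for that feedback:

```python
import random

def epsilon_greedy_bandit(arms, reward, rounds=1000, eps=0.1, seed=0):
    """'Simple' RL sketch: learn which arm (e.g. which prompt template)
    earns the best reward by mostly exploiting the current best arm
    while occasionally exploring the others."""
    rng = random.Random(seed)
    counts = {a: 0 for a in arms}
    values = {a: 0.0 for a in arms}   # running mean reward per arm
    for _ in range(rounds):
        if rng.random() < eps:        # explore a random arm
            arm = rng.choice(arms)
        else:                         # exploit the current best arm
            arm = max(arms, key=values.get)
        r = reward(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # update the mean
    return max(arms, key=values.get)
```

No neural network in sight, yet the agent reliably discovers the best-rewarded option on its own, which is the core promise of self-learning agents in miniature.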

Bonus: Building Agents

In the meantime, join Integrail and start experimenting with your own agents, no coding required! Simply choose and connect different “boxes”, from LLMs, to other GenAI models (such as image generation or text to voice), to automatic integration with any API available online — and then deploy to production in one click.

Building Agents Visually
