AI generated picture of Starling birds flying together in a pattern to avoid predators

Building Blocks of AGI — How Agent Architecture is Shaping the Way we Build AGI Systems

Madhukar Kumar

A deep dive into a shift from simple programming practices to the creation of sophisticated, autonomous Agents, paving the way for building AGI systems.

Introduction

Throughout the universe, if there is one repeating pattern, it is this: sometimes, just like an amoeba, things start with a single cell and a single purpose. Over time, they become bigger, more specialized, and more complex, and then they start to break apart. Eventually, as their numbers grow, these independent entities interact with each other, form a network, and often become a bigger, more complex organism.

Some examples include how an egg turns into a bird, how a flight of birds coordinates to migrate together, or how humans build a company and work together toward a common goal.

AI generated animation of Starling murmurations

We have seen this pattern within software development as well: from single-file programs written in C, to modules, objects, and packages, and on to sophisticated microservices coordinating with each other.

One year into the Generative AI boom, we are seeing a similar pattern, although this time we are reusing a term that has been around for a while and is now taking on a whole new life of its own: Agents (interestingly, OpenAI calls them Assistants). These, I believe, are the very early building blocks of Artificial General Intelligence (AGI) systems.

Incidentally, exactly 20 years ago, my Master’s thesis was about creating an application embedded in Microsoft Word that used Agents (a set of independently deployed Java packages talking to each other via web sockets) to do research and help authors write better content in real time. The project and the thesis were part of a then-new framework called Multi-agent Software Engineering (MaSE).

Understanding MaSE

Multi-Agent Software Engineering (MaSE) is a software paradigm that views a software system as a collection of autonomous, interactive entities known as agents. Each agent in MaSE is designed to perform specific tasks and possesses the capability to interact with other agents to achieve complex goals. This approach is particularly effective in environments where tasks are distributed and require collaboration among various software components. However, two decades ago, when there were few public APIs and no natural-language interaction, this methodology stayed within academic circles. With Gen AI, we are now seeing it come to fruition.

First, let’s define what an Agent is and what it consists of.

A simple Agent has three core attributes that make it self-contained and specialized:

1) A custom knowledge base and memory: knowledge in addition to that of the LLM it interacts with, plus a way to store and access that memory,

2) Skills and tools: the ability to perform specialized tasks based on the custom instructions provided to it and the tools it can use, and finally

3) Effectors and receptors: the ability to talk to other services and Agents to achieve a certain goal using natural language and APIs (a minimal code sketch follows the diagram below).

A conceptual diagram of an independent, specialized agent
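To make these three attributes concrete, below is a minimal, framework-agnostic sketch in Python. Everything in it is hypothetical and for illustration only: the Agent class, the stubbed call_llm helper, and the example usage are not the API of any particular agent library.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call (e.g., a chat completion request).
    return f"[LLM response to: {prompt[:60]}...]"

@dataclass
class Agent:
    # 1) Custom knowledge base and memory, separate from the LLM's own knowledge.
    memory: List[str] = field(default_factory=list)
    # 2) Skills and tools: custom instructions plus callable tools it can use.
    instructions: str = "You are a specialized research agent."
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def remember(self, fact: str) -> None:
        self.memory.append(fact)

    # 3) Effectors and receptors: talk to the LLM (and, via tools, to other
    # services and agents) to work toward a goal.
    def act(self, goal: str) -> str:
        context = "\n".join(self.memory[-5:])
        prompt = f"{self.instructions}\nContext:\n{context}\nGoal: {goal}"
        return call_llm(prompt)

# Usage: a tiny agent that remembers a fact and then acts on a goal.
researcher = Agent()
researcher.remember("Starling murmurations are a form of emergent coordination.")
print(researcher.act("Summarize what we know about emergent coordination."))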

The AI Architecture Evolution

In the realm of Gen AI, when LLMs became popular, we saw the emergence of two open-source frameworks, LangChain and LlamaIndex, that allowed developers to ferry information back and forth to LLMs in the early days. As use cases evolved to require semantic search over data that LLMs were not trained on, these frameworks added encapsulations for talking to vector databases in a pattern called Retrieval Augmented Generation (RAG).
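As a rough illustration of the RAG pattern, here is one way to express it with LangChain's building blocks from that era: embed a few documents, index them in a local FAISS vector store, and answer a question over them. The documents and model name are placeholders, and the snippet assumes an OPENAI_API_KEY environment variable plus the faiss-cpu package.

from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Private data the LLM was never trained on (illustrative stand-ins).
docs = [
    "Our internal design doc says the checkout service retries payments three times.",
    "The on-call runbook lives in the wiki under payments/runbook.",
]

# Embed and index the documents in a local vector store.
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())

# Retrieval Augmented Generation: retrieve relevant chunks, then generate an answer.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(),
)
print(qa.run("How many times does the checkout service retry payments?"))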

To be fair, LangChain has always had the notion of an Agent, and you could combine Tools and vector stores to build a custom use case. More recently, LangChain added a few things to the stack that make this architecture more useful and production-ready: LangSmith, a way to debug your LLM applications; LangServe, a way to deploy your application; and finally Templates, a way to create independent and specialized AI services that are automatically exposed as API endpoints, making them able to communicate easily with other Templates and agents.

The Rise of Autoagents

When you combine multiple specialized entities (agents), some interesting use cases start to become a reality: for example, researching, writing, self-critiquing, rewriting, and publishing, much like a marketing function. If this sounds very similar to Artificial General Intelligence (AGI) embodied in a physical object, for example a car or a humanoid, you can now see how these building blocks could be modularly assembled to create AGI systems.

Currently, there are a few popular frameworks that allow users to create agents and have them work together towards a common goal.

  1. AutoGPT: One of the original Gen AI autoagent frameworks. You can use it to build your own agent; once you define a goal, AutoGPT coordinates multiple agents, self-prompts, maintains a chain of thought, and completes tasks.
  2. SuperAGI: Very similar to AutoGPT, this framework also lets users build their own agents.
  3. Microsoft’s Autogen: Microsoft’s recent announcement of Autogen marks another significant milestone. In Autogen, agents can be defined in a configuration file and managed through a proxy agent, incorporating a human-in-the-loop approach. This allows for more dynamic and responsive AI systems.
Autogen Architecture from Microsoft’s Autogen Documentation

Examples:

Let’s look at an example to illustrate how an agent is created and how agents can be used together to solve a specific task. My favorite is LangChain with Templates over private data, because it gives the creator the most flexibility and the most granular control over the agent.

Step 1 — Install the LangChain CLI and create a new app

pip install -U langchain-cli
langchain app new my-app

Step 2 — Add an existing template to your app

langchain app add pirate-speak

Step 3 — Deploy and access the APIs. Once you add the template, you can open the code in your IDE and make changes to suit your requirements. Finally, the entire app is available for you to invoke from other services and apps.

Example APIs generated from LangChain Templates from LangChain’s Documentation
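For context on what the CLI generates, wiring a template into the app is only a couple of lines in the generated app/server.py. This sketch follows the LangChain Templates documentation; the import path comes from the template package added in Step 2 (here, pirate-speak).

# app/server.py, created by `langchain app new` and edited to wire in the template
from fastapi import FastAPI
from langserve import add_routes

# Each template installs as a small Python package that exposes a runnable chain.
from pirate_speak.chain import chain as pirate_speak_chain

app = FastAPI()

# Expose the chain as REST endpoints (invoke, batch, stream, plus a playground)
# under the /pirate-speak path.
add_routes(app, pirate_speak_chain, path="/pirate-speak")

Running langchain serve then starts the app locally, and the endpoints under /pirate-speak are the kind of generated APIs shown above.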

In Autogen, the way you create agents looks slightly different. You can configure and create agents with a few lines of code. Here is a simple example from Autogen’s documentation.

from autogen import AssistantAgent, UserProxyAgent, config_list_from_json

# Load LLM inference endpoints from an env variable or a file
# See https://microsoft.github.io/autogen/docs/FAQ#set-your-api-endpoints
# and OAI_CONFIG_LIST_sample
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
# You can also set config_list directly as a list, for example, config_list = [{'model': 'gpt-4', 'api_key': '<your OpenAI API key here>'},]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding"})
user_proxy.initiate_chat(assistant, message="Plot a chart of NVDA and TESLA stock price change YTD.")
# This initiates an automated chat between the two agents to solve the task
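In this example, the AssistantAgent proposes Python code for the task, and the UserProxyAgent executes that code in the coding working directory, feeding the output (or any errors) back to the assistant until the task is done. The proxy agent can also pause for human input along the way, which is the human-in-the-loop behavior mentioned earlier.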

Last week, OpenAI jumped into the world of multi-agent architectures with its own definition and methodology, which it named Assistants. This appears to be their foray into gradually building AGI through iterative advances in the Assistants framework. Let’s take a look at this new framework.

OpenAI’s Assistants: A Parallel Development

OpenAI’s Assistants feature a few concepts that are very similar to agents. Every Assistant has threads (multiple messages in one conversation), the ability to use tools (custom functions and API call-outs), and a way to store and retrieve context and memory (the retrieval tool). However, at the time of writing, they lack API support for inter-assistant communication, which might be on the horizon; that said, you can still call and run these Assistants from your own code using OpenAI’s SDKs. What is impressive about the framework, though, is that you can use it to generate images and also send audio, text, and video files as input.
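For a sense of what this looks like in code, here is a minimal sketch using the beta Assistants endpoints in OpenAI's Python SDK as documented at launch. The model name, instructions, and message are illustrative, and the API surface may change while the feature is in beta.

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an Assistant with custom instructions and the retrieval tool enabled.
assistant = client.beta.assistants.create(
    name="Research helper",
    instructions="You help authors research topics using their uploaded files.",
    tools=[{"type": "retrieval"}],
    model="gpt-4-1106-preview",
)

# A thread holds the messages of one conversation.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize the key ideas behind multi-agent software engineering.",
)

# A run asks the Assistant to process the thread; poll until it reaches a terminal state.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status not in ("completed", "failed", "cancelled", "expired"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, message.content[0].text.value)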

This means I can build an Assistant that can see and hear, and that can respond in text, images, video, or sound, or go and perform an action. In my opinion, we are now witnessing the very first building blocks of AGI applications.

Comparative Analysis: Multi-Agents vs. Assistants

When comparing multi-agent systems with OpenAI’s Assistants, several key differences emerge:

  • Control and Target Audience: Multi-agent systems offer more fine-grained control and are typically better suited for B2B applications; you have more options for using multiple LLMs, bringing your own corpus of data, and putting guardrails in place as needed. Assistants, so far, seem targeted at B2C use cases and at independent developers and startups looking to build and go to market fast.
  • Pricing and Accessibility: As of now, LangChain is open source, though certain enterprise features will likely be paid options. You also have to build and deploy to your own infrastructure, which includes the cost of using LLMs, creating embeddings, storing vectors and data, and compute. Assistants, on the other hand, have a very steep price of entry when it comes to the retrieval tool: as of now it costs $0.20 per GB per day to store and retrieve unstructured data within Assistants. That means keeping just 1 GB of data with OpenAI would cost about $73 per year. In comparison, a Dropbox account costs about $18 per month for 3 TB of data. If OpenAI is looking to become the repository for all unstructured data used by LLMs, this pricing will need to come down significantly.
Screenshot of OpenAI’s Pricing Page for Assistants showing the cost of Retrieval to be 20 cents per GB per day

Conclusion

The evolution from modules to agents in software engineering reflects the industry’s ongoing quest for more dynamic, adaptable, and efficient systems. As AI continues to evolve, the integration of agents in platforms like LangChain and AutoGPT, along with developments like Microsoft’s Autogen and OpenAI’s Assistants, will likely redefine how we interact with and leverage AI and move towards building AGI systems.
