Intro to AI Agents & a Summary of AI Agent Projects

Henry Heng LUO
16 min read · Jan 23, 2024


Why AI Agents

The field of artificial intelligence (AI) is developing rapidly. Today's AI agents can perceive, decide, and act on their own. With the rise of AI agents driven by large language models (LLMs), we are on the edge of a new era: AI agents may form their own society and coexist harmoniously with humans.

Newton once said: “If I have seen further, it is by standing on the shoulders of giants.” Now these giants are AI agents, coming to help shoulder the heavy work.

In today’s article, we will introduce some of the best open source AI agents and multi-agent frameworks that can be used in personal and enterprise contexts, and discuss:

  • How AI agents create opportunities for innovation and efficiency.
  • Which multi-agent frameworks offer the best capabilities.
  • When it is best to implement agents to solve real-world practical problems.
  • What impact autonomous agents will have on AI-driven task management.

We will also dive deeper into some opportunities, challenges, and trends around agent architectures.

Intro to AI Agents

Tools like ChatGPT, DALL-E 3 or Midjourney use prompt-based interfaces for human-machine interaction. This means you need to write a set of instructions in natural language (often followed by a lot of trial-and-error prompt tweaking) to get meaningful responses. Given the capabilities of modern AI models, this workflow is slow and counter-intuitive. We need better, more efficient ways to interact with AI.

AI agents play the role of AI foremen. They work in a self-directing loop, setting tasks for AI, determining priorities, and re-prioritizing tasks until the overall goal is achieved.

The overall framework of AI agents consists of three key parts: brain, perception, and action. [1]

  • Brain: The brain is mainly composed of a large language model, which not only stores knowledge and memory, but also undertakes information processing and decision-making functions, and can present reasoning and planning processes to deal well with unknown tasks.
  • Perception: The core purpose of the perception module is to expand the perceptual space of the agent from the pure text domain to include textual, auditory and visual modalities.
  • Action: In the construction of the agent, the action module receives the action sequence sent by the brain module and performs actions that interact with the environment.

After perceiving the environment, humans integrate, analyze, and reason about the perceived information in their brains, and make decisions. They then use the nervous system to control their bodies and engage in adaptive or creative actions, such as conversing, avoiding obstacles, or starting a fire. When an agent possesses a brain-like structure, along with knowledge, memory, reasoning, planning, generalization capabilities, and multimodal perception abilities, it likewise has the potential to exhibit various human-like actions in response to the surrounding environment.
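The perceive, think, act cycle described above can be sketched as a minimal loop. This is a framework-independent illustration; the `llm_decide` stub stands in for a real language-model call:

```python
# Minimal sketch of the brain/perception/action loop described above.
# `llm_decide` is a hypothetical stand-in for a real LLM call.

def llm_decide(observation: str) -> str:
    """Brain: map an observation to an action (stubbed for illustration)."""
    if "obstacle" in observation:
        return "avoid"
    return "proceed"

def perceive(environment: dict) -> str:
    """Perception: turn the raw environment into text the brain can process."""
    return "obstacle ahead" if environment.get("obstacle") else "path clear"

def act(action: str, environment: dict) -> None:
    """Action: apply the chosen action back to the environment."""
    if action == "avoid":
        environment["obstacle"] = False  # step around the obstacle

env = {"obstacle": True}
observation = perceive(env)
action = llm_decide(observation)
act(action, env)
print(observation, "->", action)  # obstacle ahead -> avoid
```

A real agent would replace the stubs with an LLM call and tool invocations, but the control flow stays the same.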

LLM-driven AI agents have the following advantages:

  • Language interaction: Their inherent ability to understand and generate language ensures seamless user interaction.
  • Decision-making capabilities: Large language models have the ability to reason and make decisions, making them adept at solving complex problems.
  • Flexible adaptation: The adaptability of agents ensures they can be shaped for different applications.
  • Collaborative interaction: Agents can interact with humans or other agents collaboratively, paving the way for multifaceted interaction.

The use cases of AI agents are extensive and diverse. Driven by large language models (LLMs), these intelligent agents can be used in a variety of scenarios, including:

  • Single agent applications: Agents can serve as personal assistants to help users get rid of daily chores and repetitive labor. They are able to independently analyze, strategize, and solve problems, reducing individual workload and improving task solving efficiency.
  • Multi-agent systems: Agents can interact with each other in collaborative or competitive ways. This allows them to make progress through teamwork or adversarial interactions. In these systems, agents can collaboratively complete complex tasks or compete with each other to improve their performance.
  • Human-machine cooperation: Agents can interact with humans, providing assistance and executing tasks more efficiently and safely. They can understand human intent and adjust their behavior to provide better service. Human feedback can also help agents improve their performance.
  • Professional domains: Agents can be trained and specialized for specific domains such as software development, scientific research, or other industry-specific tasks. They can leverage the pre-training on large corpuses and the ability to generalize to new tasks to provide expertise and support in these areas.

These are just a few examples of AI agents. Their versatility and capabilities make them suitable for a wide range of applications and industries.

Moreover, agent society is a concept where artificially intelligent agents created with language models interact with each other in a simulated environment. These agents can act, make decisions, and engage in social activities like humans.

It helps us understand how AI agents may collaborate and behave in a society-like setting. This simulation can provide insights into collaboration, policymaking, and ethical considerations. Overall, agent society helps us explore the social aspects of AI agents and their interactions in real and controlled environments.

AI Agent Development Frameworks

There are many frameworks that can help create AI agents. Here are some of the best frameworks.

1. LangChain

LangChain is a framework for building applications backed by language models. It enables applications to:

Perceive context: Connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its responses in, etc.)

Reason: Rely on a language model to reason (about how to respond based on the provided context, what actions to take, etc.)

LangChain has the following core components:

LangChain Libraries: The Python and JavaScript libraries. They contain interfaces and integrations for a wide range of components, a basic runtime for combining these components into chains and agents, and off-the-shelf implementations of chains and agents.

LangChain Templates: A set of ready-to-deploy reference architectures suited for various tasks.

LangServe: A library for deploying LangChain chains as REST APIs.

LangSmith: A developer platform for debugging, testing, evaluating, and monitoring chains built on any LLM framework, with seamless integration to LangChain.
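The composition idea at the heart of LangChain, piping a prompt template into a model and then an output parser, can be illustrated without the library itself. The classes below are simplified stand-ins (LangChain's real API differs in detail, and `FakeLLM` is a stub for an actual model):

```python
# Framework-independent sketch of prompt -> model -> parser chaining,
# the composition pattern LangChain builds on. FakeLLM is a stub.

class Runnable:
    def __or__(self, other):          # allow `a | b | c` style composition
        return Pipeline(self, other)
    def invoke(self, value):
        raise NotImplementedError

class Pipeline(Runnable):
    def __init__(self, first, second):
        self.first, self.second = first, second
    def invoke(self, value):
        return self.second.invoke(self.first.invoke(value))

class PromptTemplate(Runnable):
    def __init__(self, template: str):
        self.template = template
    def invoke(self, variables: dict) -> str:
        return self.template.format(**variables)

class FakeLLM(Runnable):
    def invoke(self, prompt: str) -> str:
        return f"ANSWER: {prompt.upper()}"   # pretend model output

class StrOutputParser(Runnable):
    def invoke(self, text: str) -> str:
        return text.removeprefix("ANSWER: ")

chain = PromptTemplate("Translate to French: {text}") | FakeLLM() | StrOutputParser()
result = chain.invoke({"text": "hello"})
print(result)  # TRANSLATE TO FRENCH: HELLO
```

LangChain's own expression language uses the same `|` operator to chain components, which is what makes chains easy to compose and reuse.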

2. AutoGen

AutoGen is a framework that supports developing LLM apps using multiple agents that can converse with each other to solve tasks. AutoGen's agents are customizable, conversational, and seamlessly allow human participation. AutoGen apps can operate using combinations of large language models, human input, and tools.

Some key highlights of AutoGen:

Easily build next-gen LLM apps based on multi-agent conversations. It streamlines complex LLM workflows with orchestration, automation, and optimization, maximizing the performance of LLMs while overcoming their weaknesses.

Supports complex workflow with diverse conversation modes. With customizable and conversational Agents, developers can use AutoGen to build various conversation patterns involving conversational autonomy, number of agents, and agent conversation topologies.

Provides a spectrum of task systems with varying complexity. These systems cover a wide breadth of applications across domains and complexity. This demonstrates how AutoGen can easily support different conversation modes.

Enhanced LLM reasoning. It provides utilities like API unification and caching, and advanced usage patterns like error handling, multi-config reasoning, context programming, etc.
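The multi-agent conversation pattern AutoGen is built around, in which agents exchange messages until a stopping condition is met, can be sketched roughly as follows. Both reply functions are stubs standing in for LLM calls; AutoGen's real API uses agent classes rather than plain functions:

```python
# Rough sketch of a two-agent conversation loop in the spirit of AutoGen.
# The reply functions are stubs; a real system would call an LLM here.

def assistant_reply(message: str) -> str:
    if "2 + 2" in message:
        return "The answer is 4. TERMINATE"
    return "Could you clarify the task?"

def user_proxy_reply(message: str) -> str:
    return "Please solve: what is 2 + 2?"

def run_chat(max_turns: int = 5) -> list[str]:
    transcript = ["START"]
    speaker_is_assistant = False
    for _ in range(max_turns):
        last = transcript[-1]
        reply = assistant_reply(last) if speaker_is_assistant else user_proxy_reply(last)
        transcript.append(reply)
        if "TERMINATE" in reply:       # conversational stopping condition
            break
        speaker_is_assistant = not speaker_is_assistant
    return transcript

transcript = run_chat()
print(transcript[-1])  # The answer is 4. TERMINATE
```

The turn cap and the "TERMINATE" keyword illustrate the two ways such conversations usually end: a hard limit, or an agent deciding the task is done.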

3. PromptAppGPT

PromptAppGPT is the first natural language application development framework based on LLMs. It supports fully automatic compilation, running, and UI generation; no-code configuration for process scheduling; and AutoGPT-like fully autonomous agents in only tens of lines of low code.

PromptAppGPT significantly lowers the threshold for agent development: no need to download any software, just open the website (https://promptappgpt.wangzhishi.net/) to develop.

PromptAppGPT contains low-code prompt-based development, GPT text generation, DALLE image generation, online prompt editor + compiler + runner, automatic user interface generation, support for plugin extensions, and more:

⚡ Low-code prompt-based rapid application development

🧠 GPT3/4 executor for text generation

🍯 Dalle executor for image generation

🔌 Custom extension executors (plugins)

#️⃣ Online prompt editor, compiler and runner

⚙️ Automatically generated user interface

🧨 English and Chinese user interface

PromptAppGPT also contains the following built-in agent examples:

All Executors: An app that uses all executors.

My ChatGPT: A chatbot app.

Imaginative Image Creator: An agent that creates imaginative images from any language using GPT and DALL·E.

Pizza Order Bot: An automated agent that collects pizza restaurant orders.

Universal Translator: An agent that translates text in any language to English/Chinese/French/Spanish.

English Improver: An agent that does English translation and proofreading.

Web & ImageSearcher: An agent that uses Bing to search the web and images.

My AutoGPT: An agent similar to AutoGPT that can fully autonomously use GPT and executors (plugins) to accomplish any goal.

4. AutoGPT

AutoGPT was developed by Toran Bruce Richards, founder of the video game company Significant Gravitas Ltd., and launched in March 2023 as one of the earliest agents. It remains one of the most popular agent projects on GitHub today.

The idea behind AutoGPT is simple — it is a complete toolkit for building and running custom AI agents for various projects. The tool utilizes OpenAI's GPT-4 and GPT-3.5 large language models (LLMs), allowing agents to be built for a range of personal and business projects.

5. BabyAGI

BabyAGI is a minimalist task-driven autonomous agent. The Python script is just 140 lines of code and, according to the official GitHub repo, “uses OpenAI and a vector database (e.g. Chroma or Weaviate) to create, prioritize, and execute tasks.”

Since its release, BabyAGI has expanded into several interesting projects. Some, like twitter-agent or BabyAGI on Slack, bring the power of agents to existing platforms. Others add plugins and add-ons, or port BabyAGI to other languages (e.g. Babyagi-perl).
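BabyAGI's create, prioritize, execute loop can be sketched in a few lines of plain Python. The stubs below replace the OpenAI and vector-database calls of the real script:

```python
from collections import deque

# Minimal sketch of BabyAGI's task loop: execute the top task, create
# follow-up tasks from the result, then re-prioritize. Stubs replace
# the OpenAI and vector-database calls of the real script.

def execute(task: str) -> str:
    return f"result of '{task}'"

def create_new_tasks(task: str, result: str) -> list[str]:
    # A real agent would ask an LLM for follow-ups based on the result.
    return [f"review {task}"] if not task.startswith("review") else []

def prioritize(tasks: deque) -> deque:
    # A real agent would re-rank with an LLM; here: shortest task first.
    return deque(sorted(tasks, key=len))

tasks = deque(["research EV market", "draft outline"])
completed = []
for _ in range(10):                  # hard iteration cap as a safeguard
    if not tasks:
        break
    task = tasks.popleft()
    result = execute(task)
    completed.append((task, result))
    tasks.extend(create_new_tasks(task, result))
    tasks = prioritize(tasks)

print(len(completed), "tasks completed")
```

The iteration cap is not part of the original loop but is a sensible guard against the runaway behavior discussed later in this article.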

6. SuperAGI

SuperAGI is a more flexible and user-friendly alternative to AutoGPT. Think of it as an integrator for open-source AI agents that contains everything needed to build, maintain, and run your own agents. It also includes plugins and a cloud version for testing.

The framework has multiple AI models, a graphical user interface, integration with vector databases (for storing/retrieving data), and performance insights. There is also a marketplace with toolkits to connect it to popular apps and services like Google Analytics.

SuperAGI features:

Configure, generate and deploy autonomous AI Agents — Create production-ready and scalable autonomous Agents.

Extend Agent capabilities with Toolkits — Add toolkits from our marketplace into Agent workflows.

Graphical User Interface — Access Agents via a graphical user interface.

Operations Console — Interact with Agents by providing them with input and permissions.

Multiple Vector Databases — Connect to multiple vector databases to enhance Agent performance.

Performance Telemetry — Get deep insights into Agent performance for optimization.

Optimize Token Usage — Control token usage for effective cost management.

Agent Memory Store — Enable Agents to learn and adapt by storing memories.

Models — Custom fine-tuned models for specific business use cases.

Workflows — Easily automate tasks with predefined steps using ReActLLM.

7. ShortGPT

ShortGPT is a powerful framework for automating content creation. It simplifies the tasks of video creation, sourcing materials, voice synthesis, and editing.

ShortGPT can handle most typical video-related tasks like writing video scripts, generating voiceovers, selecting background music, writing titles and descriptions, and even editing video. The tool is suitable for short-form video content across platforms as well as long-form video content related tasks.

Some key features of the ShortGPT framework:

🎞️ Automatic editing framework: Simplifies video creation process using LLM-oriented video editing language.

📃 Scripts and prompts: Ready-to-use scripts and prompts for various LLM automated editing workflows.

🗣️ Voiceover/content creation: Support for multiple languages including English🇺🇸, Spanish🇪🇸, Arabic🇦🇪, French🇫🇷, Polish🇵🇱, German🇩🇪, Italian🇮🇹, Portuguese🇵🇹, Russian🇷🇺, Mandarin🇨🇳, Japanese🇯🇵, Hindi🇮🇳, Korean🇰🇷 and over 30 more languages (using EdgeTTS).

🔗 Subtitle generation: Automatically generate subtitles for videos.

🌐🎥 Resource sourcing: Fetch images and video clips from the web and the Pexels API as needed.

🧠 Memory and persistence: Ensure long-term persistence of automated editing variables using TinyDB.

8. ChatDev

ChatDev is a virtual software company operated by various agents playing different roles, including CEO, CPO, CTO, programmers, reviewers, testers, and designers. These agents form a multi-agent organizational structure united by the mission of “radically changing the digital world through programming.” Agents at ChatDev collaborate by attending specialized functional workshops, including design, coding, testing, and documentation.

The main goal of ChatDev is to provide an easy-to-use, highly customizable and extensible framework based on large language models (LLM) that is an ideal testbed for studying collective intelligence.

Tools like Copilot, Bard, ChatGPT, and many others are powerful coding assistants, but projects like ChatDev may soon compete with them. Rather than a single assistant, ChatDev uses multiple agents playing the roles found in a traditional dev organization. Each agent is assigned a unique role and can collaborate on tasks ranging from designing software to writing code and documentation.

9. MetaGPT

MetaGPT is another open source AI agent framework that tries to mimic the structure of a traditional software company. Similar to ChatDev, agents are assigned product manager, project manager, and engineer roles and collaborate on user-defined coding tasks.

10. Camel

In a nutshell, Camel is one of the earlier multi-agent frameworks that uses a unique role-playing design to enable multiple agents to communicate and collaborate with each other.

Everything starts with a human-defined task. The framework leverages the powerful capabilities of LLMs to dynamically assign roles to agents, specify and develop complex tasks, and stage role-playing scenarios to enable collaboration between agents.

11. JARVIS

JARVIS handles task planning, model selection, task execution, and content generation. By accessing dozens of specialized models on the Hugging Face Hub, JARVIS leverages ChatGPT’s reasoning abilities to apply the best models for a given task. This makes it quite versatile across a range of tasks, from simple summarization to object detection.

JARVIS introduces a collaborative system comprised of a large language model as the controller and many expert models (from the HuggingFace Hub) as collaborative executors. The system’s workflow consists of four stages:

Task planning: ChatGPT analyzes the user’s request to understand their intent and breaks it down into addressable tasks.

Model selection: To solve the planned tasks, ChatGPT selects expert models hosted on Hugging Face based on description.

Task execution: Each selected model is invoked and executed, returning results to ChatGPT.

Response generation: Finally, ChatGPT consolidates all model predictions and generates a response.
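The four-stage workflow can be sketched as a simple pipeline. The model registry and planner below are illustrative stubs; the real system uses ChatGPT for planning and expert models from the Hugging Face Hub for execution:

```python
# Sketch of JARVIS's four stages: plan -> select model -> execute -> respond.
# The model registry and planner are stubs for illustration.

MODEL_REGISTRY = {
    "summarization": lambda text: text[:20] + "...",
    "translation":   lambda text: f"[fr] {text}",
}

def plan(request: str) -> list[str]:
    """Task planning: break the request into known task types (stubbed)."""
    return [t for t in MODEL_REGISTRY if t in request]

def select_model(task: str):
    """Model selection: pick an expert model by task description."""
    return MODEL_REGISTRY[task]

def run(request: str, payload: str) -> str:
    results = []
    for task in plan(request):                 # 1. task planning
        model = select_model(task)             # 2. model selection
        results.append(model(payload))         # 3. task execution
    return " | ".join(results)                 # 4. response generation

output = run("summarization then translation", "AI agents are on the rise everywhere")
print(output)
```

The key design point survives the simplification: the controller never solves tasks itself; it only plans, routes, and consolidates.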

12. OpenAGI

OpenAGI is an open source AGI (artificial general intelligence) research platform that combines small expert models (models tailored for tasks like sentiment analysis or image deblurring) with reinforcement learning from task feedback (RLTF) to improve its outputs. It brings together popular platforms like ChatGPT, large language models like LLaMA 2, and other specialist models, dynamically selecting the right tools based on the task context.

OpenAGI is specifically designed around complex multi-step tasks, and comes with task-specific datasets, evaluation metrics, and a variety of extensible models. It expresses complex tasks as natural language queries that serve as input to the LLM. The LLM then selects, orchestrates, and executes the models provided by OpenAGI to solve the task. Additionally, the project proposes a reinforcement learning from task feedback (RLTF) mechanism that uses task-solving results as feedback to improve the LLM’s task-solving abilities. The LLM is thus responsible for orchestrating various expert models to solve complex tasks, while RLTF provides a feedback loop for self-improvement. This paradigm of an LLM orchestrating expert models to solve complex tasks is a promising approach towards achieving AGI.

13. XAgent

XAgent is an experimental open source large language model (LLM) driven autonomous agent that can automatically solve a variety of tasks. It is designed as a general agent that can be applied to a wide range of tasks. XAgent is still in early stages with developers working to improve it.

XAgent is designed with the following characteristics:

Autonomy: XAgent can automatically solve various tasks without human involvement.

Safety: XAgent is designed to run safely. All operations are constrained within Docker containers, no matter how you run it.

Extensibility: XAgent is designed to be extensible; new tools, and even new capabilities, can easily be added to enhance the agent.

GUI: XAgent provides a friendly GUI for users to interact with the agent. A command-line interface is also available.

Human Collaboration: XAgent can collaborate with humans on tasks. It is not only able to follow human guidance along the way to solve complex tasks, but also seek help from humans when encountering challenges.

XAgent consists of three parts:

🤖 Dispatcher: Responsible for dynamically instantiating tasks and dispatching tasks to different agents. It allows adding new agents and enhancing agent capabilities.

🧐 Planner: Responsible for generating and correcting task plans. It breaks down tasks into subtasks and generates milestones, allowing agents to solve tasks incrementally.

🦾 Actor: Responsible for taking actions to achieve goals and complete subtasks. The actor utilizes various tools to solve subtasks, and it can also collaborate with humans to solve tasks.
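The dispatcher/planner/actor split can be sketched as follows (all three components are stubs for illustration; XAgent's real planner and actor are LLM-driven):

```python
# Sketch of XAgent's three-part split: a planner breaks the goal into
# subtasks, a dispatcher routes each subtask to an agent, and an actor
# carries it out. All three are stubs for illustration.

def planner(goal: str) -> list[str]:
    """Break a goal into ordered subtasks (a real planner uses an LLM)."""
    return [f"{goal}: step {i}" for i in (1, 2, 3)]

def dispatcher(subtask: str):
    """Route a subtask to a suitable actor; here there is only one."""
    return actor

def actor(subtask: str) -> str:
    """Carry out a subtask with tools (stubbed)."""
    return f"done({subtask})"

goal = "write a report"
results = [dispatcher(s)(s) for s in planner(goal)]
print(results)
```

Separating planning from acting is what lets the planner correct course between milestones, as described above.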

Roles and Challenges of AI Agents

“So what can I use agents for?” is a fair question, and we’d love to say “everything” but that’s far from the truth considering the current state of the technology. Still, even in their nascent stages, AI agents can make life and work easier by:

  • 🔎 Simplifying research and data collection.
  • ✏️ Generating content in various styles and tones.
  • 🌐 Crawling the web and extracting key insights.
  • 💭 Summarizing documents and spreadsheets.
  • 🔀 Translating content between languages.
  • 🤝 Acting as virtual assistants for creative tasks.
  • ⚡️ Automating management tasks like scheduling and tracking.

Agents will continue to evolve from prompt-based tools that need human interaction to fully autonomous systems running in self-directing loops. After all, this is what AI tools should be — automatic, trustworthy, reliable, without lengthy prompts or vetting every step.

Suppose you want to analyze electric vehicle (EV) industry market trends over the past decade. You could delegate these tasks to an agent while doing other things, rather than manually collecting data, reading countless articles, and parsing financial reports.

Even with tools like ChatGPT, humans still need to stay on top of developments. Agents can help find the right information, take notes, and organize everything. If some data already exists, agents will provide enriched key insights in seconds.

Sometimes a project can be too complex for one agent to manage. With a multi-agent setup, each agent is responsible for a part of the project. One agent can collect data while another creates the report outline. Then a third agent can compile the information and generate actual content.

Fully autonomous agents are still the wild west of AI tools, largely experimental and requiring some technical know-how to set up, deploy, and maintain. This is great for DIY projects but not a plug-and-play experience if you just want to get work done. It is technically possible to integrate open source agents into existing workflows. But it takes time, expertise, and resources.

Of course, there is also the issue of hallucination. Since agents rely on large language models to generate information, they are equally prone to veering into bizarre narratives without factual basis. The longer an agent runs, the more likely it is to fabricate and distort reality. This poses some dilemmas from a productivity standpoint. Some simple safeguards include limiting agent runtimes, narrowing the scope of tasks, and having a human in the loop to review outputs.
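Two of these safeguards, a hard iteration cap and a human-in-the-loop review of each step, can be as simple as the following sketch (the `step` and `review` stubs stand in for an agent action and a human reviewer):

```python
# Sketch of two hallucination safeguards: a hard iteration cap and a
# human-in-the-loop review of each step's output. The `step` function
# is a stub for one agent action.

def step(i: int) -> str:
    return f"draft {i}"

def review(output: str) -> bool:
    """Human-in-the-loop hook; auto-approved here for illustration."""
    return "draft" in output        # a real hook would ask a person

MAX_STEPS = 3                       # cap runtime to limit drift
approved = []
for i in range(MAX_STEPS):
    out = step(i)
    if not review(out):             # reject off-scope or fabricated output
        break
    approved.append(out)

print(approved)  # ['draft 0', 'draft 1', 'draft 2']
```

Caps and review hooks trade autonomy for reliability, which is usually the right trade while agents remain prone to drift.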

Better results may come from deploying multiple agents with specialized expertise and unique skills — so multi-agent frameworks could gain more traction.

Future of AI Agents

With faster, more accurate, and larger-scale iterations of AI models like GPT-4, Bard, and LLaMA 2 on the horizon, we may see many more exciting breakthroughs in AI development in the coming months. In particular, the rise of AI agents marks a monumental shift in the digital realm. These agents possess the capabilities to understand, create, and interact — they are not just tools but potential collaborators across domains. As we stand at the crest of this revolution, we must harness their powers responsibly.

The tools and platforms available today allow us to customize agents for different tasks, but we must also remain vigilant and consider the ethical impacts of these advancements. The bridge between humans and AI has never been shorter, and as we move forward, harmonious coexistence not only seems possible but imminent.

In the foreseeable future, agents will redefine how we view work, planning, and collaboration. They will revolutionize productivity and augment traditional workflows. So, are you ready to join the revolution?

References

[1] Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E. and Zheng, R., 2023. The rise and potential of large language model based agents: A survey. arXiv preprint arXiv:2309.07864.

[2] Taskade. Top 11 Open-Source Autonomous Agents & Frameworks: The Future of Self-Running AI. https://www.taskade.com/blog/top-autonomous-agents.

[3] Bilal Mansouri. What Are LLM Agents? An Overview of Their Capabilities. https://gptpluginz.com/llm-agents/.
