Unleashing Collaborative AI with the Autogen Framework

Coldstart Coder
18 min read · Nov 14, 2023


Image generated with the DALL·E ChatGPT integration

Autogen is the latest in a long line of LLM-related frameworks and libraries to come out this year. Released by Microsoft in early October, Autogen has made a bit of a splash thanks to its flexible, easy-to-use components, its ability to create LLM agent “teams”, and its code generation and execution capabilities. In this blog we’ll talk a bit about Autogen and get our feet wet making some simple examples.

The big idea behind Autogen is that it makes it easy to orchestrate and manage LLM-powered agents conversing with each other: a framework for conversation between LLM agents. The premise is that multiple agents working together on a problem will do better than a single agent, or than asking an LLM outright to solve the problem. Each agent has a specialty and contributes within it, so it can stay focused instead of juggling every aspect of the problem at once. It’s the same reason we have developer teams: each member has an area of focus, and they all work together toward a shared goal. In short, the more things an LLM has to keep track of, the more likely it is to forget something.

It’s an interesting concept, and it has interesting implications for complex tasks like code generation. That problem has been at the forefront of many LLM enthusiasts’ minds and is an ongoing area of research and development. It was front and center for the Autogen team as well: their framework has built-in capabilities to detect and execute code generated during an LLM conversation, both for dynamic task solving (such as an agent writing a script to fetch information it needs online) and for producing code for a product.

It’s an exciting thing to explore! So let’s jump in with some simple examples to introduce the basic ideas and concepts of how to use this library.

Installation

First things first, we need to install it. For this article you should only need to install the base package using the following:

pip install pyautogen

For installation using other methods or environments, please reference the official documentation here: https://microsoft.github.io/autogen/docs/Installation/

Basic conversation with a single agent

It’s tempting to jump right in with code generation and multi-agent workflows, but in my opinion it’s best to start with a single agent and the user. This isn’t much different from going straight to an LLM, but it will help us understand the concepts of Autogen as we scale up to larger examples.

Our example here will be an agent that generates pros and cons for situations or choices. The user will be prompted for a topic, and the agent will respond with its pros and cons. Nothing too complex, just a fun example to get our feet wet.

The first step in getting Autogen running is to tell it where to find the base LLM that will power the agents. For this article we’ll be using gpt-4 from OpenAI, but AutoGen supports other LLMs such as Llama 2. In any case we need to give AutoGen the information on where to find the LLM and how to access it; for OpenAI models that means providing the endpoint and key from our account. In code we can use the autogen.config_list_from_json function to pass in those values:

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4"]
    },
)

The first argument is where to find the connection info. This can be either the name of an environment variable or the name of a json file; if Autogen can’t find an environment variable with the indicated name, it will look for a file with that name. Either way, the content needs to be in json format. This json can have multiple entries for models, so to narrow down which model we want we use the filter_dict argument, which only allows our bots to use the models it includes. Most of the time this will contain just one model. I haven’t seen any examples that include more, but I would imagine the extras act as fallbacks in case a model can’t be reached.
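Incidentally, since the lookup falls back from environment variable to file, you can also inject the config directly into the environment. Here’s a minimal sketch of that alternative (the api_key value is a placeholder):

import json
import os

# set the env var before calling autogen.config_list_from_json; because an
# env var with this name now exists, autogen reads the json from it and
# never looks for a file on disk
os.environ["OAI_CONFIG_LIST"] = json.dumps(
    [{"model": "gpt-4", "api_key": "<openai_api_key>"}]
)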

Revisiting the json structure for the model information: whether you’re using an env variable or a json file, the contents need to be in the following format:

[
    {"model": "gpt-4", "api_key": "<openai_api_key>"},
    {"model": "gpt-3.5-turbo-0613", "api_key": "<openai_api_key>"},
    {"model": "gpt-3.5-turbo-16k-0613", "api_key": "<openai_api_key>"}
]

Each individual model gets its own entry in the list; you supply the model name and the api key used to access it.
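And if you did want more than one model available, say as a fallback like I speculated above, the filter just needs to list each name. A quick sketch:

# hypothetical: let the agents use either of these entries from the json
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4", "gpt-3.5-turbo-16k-0613"]
    },
)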

So now we have our config_list with the connection info, filtered down to the models we want to use. Next we want to build an agent. To do so we create an instance of Autogen’s AssistantAgent class:

# construct another config using our config_list from before;
# the config_list is purely connection info, while here we define the other
# parameters we want when calling the model, such as temperature and seed
llm_config = {"config_list": config_list, "seed": 42, "temperature": 0.0}

# next we define the prompt for our agent, basically its instructions;
# this also helps AutoGen know when to engage this agent in conversation
# by looking at its definition here
pros_cons_agent_instruction_prompt = "This agent will provide a list of pros and cons for a given suggestion. If there is no indication of how many pros and cons, make 3 of each. If there is no suggestion, introduce yourself and prompt for one."

# next we use AssistantAgent to actually make our agent:
# we give it a name that uniquely identifies it among the other agents,
# the system_message which acts as our agent prompt, and the llm_config
# defining where and how to call the llm that powers it
pros_cons_agent = autogen.AssistantAgent(
    name="pros_cons_agent",
    system_message=pros_cons_agent_instruction_prompt,
    llm_config=llm_config,
)

The above code defines an LLM-powered agent, but in order for our user to interact with it we’ll need a UserProxyAgent instance. In AutoGen all conversations are done through agents, even user input: the UserProxyAgent acts as the interface for the user. Luckily, making a proxy is even easier than making an LLM agent.

Here’s how to make a UserProxyAgent:

user_proxy = autogen.UserProxyAgent(
    name="User_Proxy",
    system_message="A human admin.",
    human_input_mode="ALWAYS",
)

There are three human_input_mode options for a UserProxyAgent: ALWAYS, TERMINATE, and NEVER. ALWAYS means the user will need to input feedback on every turn of the conversation; this is good for chats or for tasks that require constant input from the user. TERMINATE means it will only ask for input when ending the conversation, as a way to ensure the task is properly done. NEVER is just what you’d expect: Autogen will never prompt the user for more input. For our example we’ll set the input mode to ALWAYS so the user is engaged at every step.
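For comparison, here’s a minimal sketch of proxies using the other two modes; the names Reviewer_Proxy and Silent_Proxy are just illustrative and aren’t used in our example:

# TERMINATE: only asks the human for input when the conversation is wrapping up
reviewer_proxy = autogen.UserProxyAgent(
    name="Reviewer_Proxy",
    system_message="A human admin.",
    human_input_mode="TERMINATE",
)

# NEVER: fully automated, auto-replies on every turn without asking the human
silent_proxy = autogen.UserProxyAgent(
    name="Silent_Proxy",
    system_message="A human admin.",
    human_input_mode="NEVER",
)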

Now we have both agents; time to initiate a conversation. Having two agents converse with each other in Autogen is very straightforward: one agent “initiates” the conversation, identifying the other agent to converse with and the initial message. Most AutoGen examples you’ll see have the UserProxy initiate the conversation, but you don’t have to do it this way; any agent can initiate a conversation with any other agent type. For our example we’ll actually have the pros_cons_agent initiate a conversation with the UserProxy:

# have our pros cons agent initiate a chat with the user proxy and give the
# initial message to get the ball rolling
pros_cons_agent.initiate_chat(
    user_proxy,
    message="Hello! I'm an AI assistant designed to provide a balanced view on any topic by listing pros and cons. Please provide a suggestion or a topic you'd like me to analyze.",
)

That’s all it takes. Now the agent will start a chat with the user. Here’s an example of an interaction I had with this agent:

pros_cons_agent (to User_Proxy):

Hello! I’m an AI assistant designed to provide a balanced view on any topic by listing pros and cons. Please provide a suggestion or a topic you’d like me to analyze.

--------------------------------------------------------------------------------

Provide feedback to pros_cons_agent. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: Writing articles on Medium.

User_Proxy (to pros_cons_agent):

Writing articles on Medium.

--------------------------------------------------------------------------------

pros_cons_agent (to User_Proxy):

Sure, here are some pros and cons of writing articles on Medium:

Pros:

1. Exposure: Medium has a large reader base, which can provide a lot of exposure for your articles.

2. User-Friendly: The platform is easy to use and doesn’t require any technical skills. You can focus on writing and let Medium handle the formatting and design.

3. Networking: Medium allows you to connect with other writers and readers, which can be beneficial for networking and building a community.

Cons:

1. Limited Control: You don’t have full control over your content. Medium can change their policies or algorithms at any time, which could affect your visibility and reach.

2. Monetization: While Medium does have a Partner Program that allows writers to earn money, the earnings can be unpredictable and often quite low.

3. Competition: There are a lot of writers on Medium, which means your articles are competing with a lot of other content for readers’ attention.

--------------------------------------------------------------------------------

Provide feedback to pros_cons_agent. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: exit

It’s that simple to use Autogen and start a chat with a user. It’s just as easy to set up another LLM-powered agent to have this discussion automatically. Here’s the complete code for creating two LLM-powered agents, one that takes in a topic and gives pros and cons, and another that suggests topics, just to show how easy it is to have agents converse with each other even without direct human input:

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4"]
    },
)

llm_config = {"config_list": config_list, "seed": 42, "temperature": 0.0}

pros_cons_agent_instruction_prompt = "This agent will provide a list of pros and cons for a given suggestion. If there is no indication of how many pros and cons, make 3 of each. If there is no suggestion, introduce yourself and prompt for one."
pros_cons_agent = autogen.AssistantAgent(
    name="pros_cons_agent",
    system_message=pros_cons_agent_instruction_prompt,
    llm_config=llm_config,
)

topic_suggester_prompt = "You will suggest 3 topics for analysis. You will output them 1 at a time and allow for user feedback on each. Once 3 have been submitted you will then output TERMINATE."
topic_suggester = autogen.AssistantAgent(
    name="topic_suggester",
    system_message=topic_suggester_prompt,
    llm_config=llm_config,
)

pros_cons_agent.initiate_chat(
    topic_suggester,
    message="Hello! I'm an AI assistant designed to provide a balanced view on any topic by listing pros and cons. Please provide a suggestion or a topic you'd like me to analyze.",
)

The only other thing I want to highlight about the above example is that the topic_suggester_prompt includes a clause to output TERMINATE after 3 topics have been suggested. Autogen will try to end the conversation itself, but it’s very easy for it to get confused and end up in a never-ending conversation. Having the agents output a kill word like TERMINATE is a good way to end the conversation cleanly. In the next section we’ll go over some built-in functionality for detecting when to end conversations and for capping how long a conversation can go on.
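Before we get there, one safety net worth knowing about: agents also accept a max_consecutive_auto_reply argument that caps how many times an agent will auto-reply in a row, so a confused conversation can’t loop forever. A hedged sketch (the cap of 10 is just an example value):

# same topic_suggester as above, but with a hard cap on automatic replies
topic_suggester = autogen.AssistantAgent(
    name="topic_suggester",
    system_message=topic_suggester_prompt,
    llm_config=llm_config,
    max_consecutive_auto_reply=10,  # stop after 10 back-to-back auto-replies
)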

Multi-Agent Conversations with Group Chats

Now we’ll move on to creating multiple agents that interact with each other in a single chat. For this example we’ll upgrade our pros and cons example from above into a three-agent debate on a topic. We’ll have a pros_advocate, who argues for an option and attempts to rebut objections; a cons_advocate, who argues against; and a debate_moderator, who facilitates the debate, summarizes the points, and declares a winning position, for or against, for the user. Let’s start by creating our agents in the same manner as before:

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4"]
    },
)

llm_config = {"config_list": config_list, "seed": 42}

pros_advocate_prompt = "In charge of arguing for a choice or option. Will follow directions from the debate_moderator and will suggest a pro when asked or will try to rebut an accusation that the option is not to be selected. Only suggest a single pro when prompted, not a list; this allows for fair and even discussion."
pros_advocate = autogen.AssistantAgent(
    name="pros_advocate",
    system_message=pros_advocate_prompt,
    llm_config=llm_config,
)

cons_advocate_prompt = "In charge of arguing against a choice or option. Will follow directions from the debate_moderator and will suggest a con when asked or will try to rebut an accusation that the option is to be selected. Only suggest a single con when prompted, not a list; this allows for fair and even discussion."
cons_advocate = autogen.AssistantAgent(
    name="cons_advocate",
    system_message=cons_advocate_prompt,
    llm_config=llm_config,
)

# the debate prompt is a bit more complex: we need to ensure that it outputs a proper decision
# and allows each of the advocates time to respond to the other. We also instruct it to output a
# termination phrase, CONVERSATION_TERMINATE, which will be important for detecting the end of the conversation
debate_moderator_prompt = """
In charge of facilitating the debate between pros_advocate and cons_advocate. A topic will be given that requires discussion.

You will start by asking the pros_advocate for an argument why to go with the option, then you will prompt the cons_advocate for a rebuttal. You will then decide if that point stands or not based on the arguments.
Then you will swap, asking for a con from cons_advocate and allowing the pros_advocate a chance to rebut it.

After 3 pros and cons have been debated, declare a final decision to go for or against the proposal and output CONVERSATION_TERMINATE.

Please note: YOU CANNOT BE ON THE FENCE ABOUT THE DECISION OR OPT OUT OF ANSWERING BY CITING VARIATIONS OF 'it's up to the user'. You must choose either for or against using the provided information.
"""
debate_moderator = autogen.AssistantAgent(
    name="debate_moderator",
    system_message=debate_moderator_prompt,
    llm_config=llm_config,
)

Next up, the user_proxy. This will be similar to what we did before, with one very important addition: this time we’ll supply the user_proxy with a function to detect whether the conversation has terminated. Per our instructions above, the moderator will output CONVERSATION_TERMINATE in its final message, but Autogen might not always detect this on its own, especially since the debate moderator’s final message will contain more text detailing the final decision. To counteract this we’ll supply a lambda function that takes in the last message and checks whether this string of characters appears in it. That way, no matter what else is generated or where it sits in the message, we’ll be able to detect it.

Here’s the code for the user_proxy with the is_terminate_msg function:

# also note, we could just as easily pass in a reference to a named function
# if you prefer not to use a lambda; the process would be the same
user_proxy = autogen.UserProxyAgent(
    name="User_Proxy",
    system_message="A human admin.",
    human_input_mode="NEVER",  # trust the bots completely, what could go wrong?
    is_termination_msg=lambda x: "CONVERSATION_TERMINATE" in x.get("content", "").rstrip(),
)

Next up, we need to put our three agents and the user proxy into a group chat and add a group chat manager. These two components facilitate the multiway conversation: determining who speaks next, passing the messages along, and handling the other aspects of connecting the agents together. We can also define how long a group chat should go on, putting a limit on the number of messages passed back and forth so it doesn’t run forever. It’s important to note that the group chat manager is itself another type of agent; it uses an LLM to determine which agent should go next in the conversation and to make its other decisions.

Creating these in autogen is very straightforward:

# define a group chat with our agents and user proxy, set an empty list of
# messages for a fresh chat, and define the maximum number of rounds of conversation
groupchat = autogen.GroupChat(
    agents=[user_proxy, debate_moderator, pros_advocate, cons_advocate],
    messages=[],
    max_round=50,
)
# now define a group chat manager by passing the group chat it will manage
# and the llm config that will power it
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

Now we have a manager connected to a groupchat containing all of our agents. To get the conversation flowing we simply initiate a conversation with the manager using our initial message! (Also note that in this case the target should be the manager; don’t try to have the manager initiate the conversation!)

Let’s start by asking the chat whether we should order takeout:

user_proxy.initiate_chat(manager, message="Should I order takeout tonight?")

This will start the debate. I won’t post the full transcript of the agents’ conversation, as it’s rather long, but here is the final decision from the debate_moderator:

debate_moderator (to chat_manager):

I agree with the view presented by Cons_Advocate. The socio-economic benefit of supporting local business through ordering takeout does depend significantly on the kind of establishments being patronized. Thus, while the point made by Pros_Advocate is valid, it does not unequivocally tilt the scale in favor of ordering takeout.

Given the discussion, considering the factors of cost, health, environmental impact, and potential local economic support, the recommendation would be against ordering takeout tonight.

CONVERSATION_TERMINATE.

Welp, looks like we’re making our own dinner tonight. Regardless, that’s all it takes to set up and initiate multi-agent chats with Autogen! The last section covers code generation and execution, one of the more exciting aspects of Autogen, which will greatly enhance the capabilities of our agents.

Enhancing Agent Capabilities with Code Generation and Execution

Perhaps the most attractive feature of Autogen is its code generation and execution abilities. When using Autogen, it feeds the agents patterns for how to generate code blocks; you can then set up other agents as code executors that run the code in real time when messages containing those code blocks are detected. These code blocks could be Python code, but they can also be bash scripts, meaning an agent can issue commands to install new Python packages as needed to run new code.

Not only is it possible to execute code, but the results and output of the code can be fed back into the agents’ conversation. This means that errors encountered while running the code can be fed back to the coding agent and corrected. It also means agents can dynamically create scripts to fetch and process information, and have that information available for future decision making. This is a very exciting feature with a lot of implications.

It can also be incredibly dangerous. As with any system that dynamically runs new, unchecked code, it can cause major damage to the host system. To help mitigate this, Autogen comes with an option to run the code in a Docker container instead of on the host system. That way any damage done to the environment stays contained and won’t affect the larger host system. There’s a lot to unpack and see with this feature, but let’s start with a basic example showing how to create a coding agent and an executor.

For this simple example we’ll have two agents: an engineer who writes the code, and an executor that runs it. We’ll task the engineer with a coding problem, and once the problem is solved the engineer will output TERMINATE, which ends the session. The agents themselves will be pretty basic, with some simple instructions, and we’ll give them a specific task of scraping some information from a url. Specifically, we’ll give them the url of one of my other Medium articles and ask for a script that fetches the html, scrapes the title, author, and date published, and prints them out. The executor will run the code; if it returns the proper values the engineer will terminate the conversation, else it will attempt to revise the code and try again.

Let’s dive into the code, first we’ll make our configs and setup the engineer_agent, these are tasks we are familiar with already so nothing new yet:

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4"]
    },
)

llm_config = {"config_list": config_list, "seed": 42, "temperature": 0.0}

engineer_agent_prompt = """You are an engineer able to write python code to solve a problem.
The user will give you a problem to solve and you will need to write code to get the answer.
The user will run the code and report any problems or errors, you will then create an updated version of your code to address these as they arise.
If the code executes successfully and returns the needed values output TERMINATE."""
engineer_agent = autogen.AssistantAgent(
    name="engineer_agent_prompt",
    system_message=engineer_agent_prompt,
    llm_config=llm_config,
)

Now we get into the new stuff. The executor needs to be a user_proxy in order to run the code, and we’ll give it a new parameter, code_execution_config, which tells it where to store the code and other details about how to run it. NOTE: we’re going to run the code without a Docker container for this example; however, if you intend to have the agents do more advanced code work, it’s worthwhile to set up Docker and connect the agents to a container.

Here’s the executor code:

# to dig a bit more into the code_execution_config: when this agent is prompted it will
# look at the last_n_messages in the conversation, and if it detects a code block in those
# messages it will attempt to execute it. The work_dir is a directory where it will store
# code as it runs, and use_docker is pretty self explanatory: set to False it won't use
# docker; if we do want docker, we instead set this value to a string naming the
# container we want to run the code in.

executor_user_proxy = autogen.UserProxyAgent(
    name="Executor",
    system_message="Executor. Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "code_dir", "use_docker": False},
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", "").rstrip(),
)

So now we have an agent that’s all prepped to look for code blocks and run them. Any output from the code it runs will be returned as a message to the conversation, allowing other agents to see it.
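As an aside, here’s what the Docker variant mentioned earlier might look like, assuming Docker is installed and running locally; the image name python:3 is just an example:

# same executor, but use_docker is an image name string instead of False,
# so the generated code runs inside a container rather than on the host
sandboxed_executor = autogen.UserProxyAgent(
    name="Sandboxed_Executor",
    system_message="Executor. Execute the code written by the engineer and report the result.",
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "code_dir", "use_docker": "python:3"},
    is_termination_msg=lambda x: "TERMINATE" in x.get("content", "").rstrip(),
)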

Next we will write an initial message and start a conversation between the executor_user_proxy and the engineer_agent.

# our task is pretty straightforward: go to this url and extract some data.
# we also tell the agent we have some common packages installed so it doesn't waste time
# trying to install them; sometimes the agents will insist on creating bash scripts to install
# everything without verifying whether the environment has the packages first, so it's useful
# to let the agent know what it does or does not have at its disposal
initial_message="""Provided url: https://medium.com/@coldstart_coder/basics-of-the-walrus-operator-in-python-a9b18ca1469c

The provided url is for a medium article. I need you to write a script that will fetch the page, parse the html and print out the title of the article, the author and when it was published.

I have BeautifulSoup and requests installed so I should be good to go there.
"""
# start the conversation!
executor_user_proxy.initiate_chat(engineer_agent, message=initial_message)

That’s all it takes. Now the engineer_agent will work on a script to extract that information. Since gpt-4 isn’t completely deterministic, it’s hard to say how long it might take if you run this locally. I’ve had some conversations that went on for several turns before arriving at a final conclusion (it wasn’t able to figure out which tags to find in the html), and others that lasted only a single turn. It all depends on what gpt decides to generate.

Here’s an example of a conversation with these agents, so you can see how the engineer outputs code blocks and what the execution looks like in the conversation:

Executor (to engineer_agent_prompt):

Provided url: https://medium.com/@coldstart_coder/basics-of-the-walrus-operator-in-python-a9b18ca1469c

The provided url is for a medium article. I need you to write a script that will fetch the page, parse the html and print out the title of the article, the author and when it was published.

I have BeautifulSoup and requests installed so I should be good to go there.

--------------------------------------------------------------------------------

engineer_agent_prompt (to Executor):

Sure, here is a simple script that uses BeautifulSoup and requests to fetch the page and parse the HTML. It then prints out the title of the article, the author, and the date it was published.

```python
import requests
from bs4 import BeautifulSoup

url = "https://medium.com/@coldstart_coder/basics-of-the-walrus-operator-in-python-a9b18ca1469c"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

title = soup.find('title').text
author = soup.find('meta', {'name': 'author'}).get('content')
date_published = soup.find('meta', {'property': 'article:published_time'}).get('content')

print(f'Title: {title}')
print(f'Author: {author}')
print(f'Date Published: {date_published}')
```

Please run this script and let me know if you encounter any issues.

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)…

Executor (to engineer_agent_prompt):

exitcode: 0 (execution succeeded)

Code output:

Title: Basics of the Walrus Operator in Python | by Coldstart Coder | Nov, 2023 | Medium

Author: Coldstart Coder

Date Published: 2023-11-01T16:16:27.700Z

--------------------------------------------------------------------------------

engineer_agent_prompt (to Executor):

TERMINATE

--------------------------------------------------------------------------------

That’s the basics of having an agent generate and execute code! That will greatly increase the capabilities of our agents. Cool stuff!

Final Thoughts

Autogen is a new and exciting library that makes it easy to prototype LLM-powered agents and teams of agents to solve complex tasks. Like all LLM technologies, I find getting them to behave the way you want is a bit like herding cats, but it’s always exciting to see them in action and satisfying to watch them actually complete their tasks. With the information presented here you should have enough to get started and begin building agents of your own.

There is a lot more we could cover with Autogen, such as defining functions agents can call, overriding the base agent class for more advanced integration, and of course the endless prompting strategies to help an LLM do what you need. But we’ll cover those in future articles. For now, I hope you enjoyed reading this article and got something interesting out of it!

If you want to check out more examples of using AutoGen check out their code repository: https://github.com/microsoft/autogen

They have a ton of examples in Jupyter Notebooks that you can try out!

Happy coding!
