Code like a pro: A tour of the AutoGen SDK

Eva Jurado Cortés
Data Science at Microsoft
7 min read · Jul 23, 2024

As I wrote in my previous article, AutoGen is attracting significant attention across the tech industry for its transformative potential. It is a framework for building applications in which multiple LLM-powered agents converse with one another, write and execute code, and collaborate to complete a task. By letting developers delegate work to agents that act autonomously based on specific input criteria, AutoGen enhances productivity while reducing the likelihood of errors.

In this article, we’ll dive deep into the fascinating world of AutoGen, showcasing its primary features through a practical example. AutoGen comes in two flavors: AutoGen Studio and the AutoGen Software Development Kit (SDK). My previous article covered working with AutoGen Studio; now, get ready for a new journey as we explore the workings of the AutoGen SDK. Along the way I share detailed explanations so you can follow along and have a functional demo running in little time. We will replicate the same example used in the AutoGen Studio article so you can compare the two offerings.

Installation

Installing AutoGen with its SDK in Python is notably easier than setting up AutoGen Studio. The library we use is pyautogen. Run the following in a notebook cell (or adapt it for your environment):

%pip install pyautogen 
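Once the install finishes, a quick import confirms everything is in place (recent pyautogen releases expose a __version__ attribute):

import autogen
print(autogen.__version__)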

Azure OpenAI model definition

To kick things off, let’s set up the details for our model. We’ll be using GPT-4 from the Azure OpenAI Service, and we’ll need to provide some core components: the API key and the base URL. Here are the five fields for you to fill out:

  1. Model: Enter the model name.
  2. API Key: Input your model’s API key.
  3. Base URL: Enter the endpoint of your Azure-deployed resource.
  4. API Type: Specify that this is an Azure model. (It could be an OpenAI model too!)
  5. API Version: Specify the version of the model.

Here’s a quick code snippet to get you started:

import autogen

config_list = [{
    "model": "gpt-4",
    "api_key": "xxx",
    "base_url": "https://endpoint-name.openai.azure.com/",
    "api_type": "azure",
    "api_version": "2024-02-15-preview",
}]

llm_config = {"config_list": config_list, "cache_seed": 42}
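One practical note before moving on: rather than hardcoding the key, you can read the credentials from environment variables. Here is a minimal sketch, assuming you have exported AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT beforehand (those variable names are my own convention, not something the library requires):

import os

# Build the same config_list, but pull secrets from the environment
config_list = [{
    "model": "gpt-4",
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "base_url": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_type": "azure",
    "api_version": "2024-02-15-preview",
}]

llm_config = {"config_list": config_list, "cache_seed": 42}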

Agent definitions

Next, we define the agents that will participate in our setup. We distinguish between two types of agents: User Proxy Agents and Assistant Agents.

The User Proxy acts as the orchestrator, and we need to define the following features:

  1. Agent Name
  2. System Message: This clarifies the exact role of the agent. It’s crucial to be clear here for optimal functionality. For example: “You are a helpful assistant. You have to help chefs agree on a menu that combines two appetizers, two main courses, two desserts, and four drinks from both cuisines.”
  3. Code Configuration: For instance, using last_n_messages to define the number of messages to keep track of.
  4. Human Input Mode: Decide if you want to involve human input (not needed in this example).
  5. Max Consecutive Auto Reply: Specify how many consecutive automatic replies the agent can send.
  6. Termination Message: Define the condition that tells the agent a message should end the conversation (here, any message ending in “TERMINATE”).

Here’s some code for how you can define a User Proxy Agent:

user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="You are a helpful assistant. You have to help chefs agree on a menu that combines two appetizers, two main courses, two desserts, and four drinks from both cuisines.",
    code_execution_config={
        "last_n_messages": 2,
        "work_dir": "groupchat",
        "use_docker": False,
    },
    human_input_mode="NEVER",  # switch to "TERMINATE" to ask a human before ending
    max_consecutive_auto_reply=10,
    # End the chat when a message finishes with "TERMINATE"
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE"),
)
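Note the "TERMINATE" alternative flagged in the comment above: with human_input_mode="TERMINATE", the proxy pauses and asks for your feedback whenever a termination condition is reached, rather than finishing silently. One way to flip it after construction, shown here purely for illustration:

user_proxy.human_input_mode = "TERMINATE"  # prompt a human before the chat actually ends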

The rest of the agents follow a similar structure, defined by their names, system messages, and the model configuration (llm_config):

mexican_chef = autogen.AssistantAgent(
    name="Mexican_Chef",
    system_message="You are a Mexican chef who is an expert in tacos, burritos, tortillas, and margaritas, and knows every recipe that is from Mexico. You speak Spanish, so you can greet in your language. You provide ideas about different dishes that can be offered, and you discuss with the Italian chef about which dish you prefer. Afterward, you both have to agree. If you have doubts about what to offer, look for information on the Internet.",
    llm_config=llm_config,
)

italian_chef = autogen.AssistantAgent(
    name="Italian_Chef",
    system_message="You are a well-known Italian chef who is an expert in pizza, pasta, and gelato (ice cream), and knows every Italian recipe. You speak Italian, so you can say some typical words in your language. You provide ideas about different dishes that can be offered, and you discuss with the Mexican chef about which dish you prefer. Afterward, you both have to agree. If you have doubts about what to offer, look for information on the Internet.",
    llm_config=llm_config,
)

image_assistant = autogen.AssistantAgent(
    name="Image_assistant",
    system_message="You know how to code. At the end of the conversation, when the chefs have agreed on the menu, you are going to use the create_menu_img function from among the Skills to create an image; this function will generate the image based on the provided prompt, and the provided prompt is the agreed menu. ONLY use the create_menu_img skill.",
    llm_config=llm_config,
)

Group Chat and Manager: Bringing it all together

Next up, we set the stage for our dynamic group chat, a pivotal component akin to the workflow in AutoGen Studio. This chat orchestrates the collaboration and coordination among our virtual agents. Here’s the rundown of the three key parameters:

  • Agents: The four participants in the chat: the user proxy plus the three assistant agents.
  • Messages: The initial message list; here it carries the coordination instructions that tell the manager how to moderate the exchange between the chefs.
  • Max_round: This sets the cap on the number of iterations the group chat can go through before the conversation concludes.

The group chat manager seamlessly integrates the group chat’s framework with the model configuration (llm_config), ensuring everything runs smoothly.

groupchat = autogen.GroupChat(
    agents=[user_proxy, mexican_chef, italian_chef, image_assistant],
    messages=["You are a helpful assistant skilled at coordinating a group of other assistants to solve a task. When a chef assistant writes you a menu proposal, you have to share it with the other chef assistant to ask him if he agrees with the initial proposal and ask how he would improve it. Iterate several times until the resulting menu combines dishes that are both Mexican and Italian, and decide a final menu of two appetizers, two main courses, two desserts, and four drinks. Provide the final decision as a text output. At the end, provide image_assistant the menu chosen as a prompt. After creating the image, terminate the conversation and exit."],
    max_round=12,
)

manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)
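By default the manager asks the model to choose the next speaker at every turn. If you’d rather control the order yourself, GroupChat also accepts a speaker_selection_method parameter; as far as I know, recent pyautogen releases support "auto", "manual", "random", and "round_robin". A minimal sketch:

groupchat = autogen.GroupChat(
    agents=[user_proxy, mexican_chef, italian_chef, image_assistant],
    messages=[],  # empty history; instructions could instead live in the agents' system messages
    max_round=12,
    speaker_selection_method="round_robin",  # rotate through the agents in a fixed order
)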

Skills

You can craft as many skills as you like with plain Python code. In this instance, we’ll build the same skill as in the Studio article: create_menu_img, a function that generates an image from a menu description. This particular function uses DALL·E 3 from the Azure OpenAI Service.

The code we are using is straightforward, and there are just two elements you may need to tweak if you wish to employ this code yourself. These are azure_endpoint, where you input the Base URL of your Azure OpenAI Service resource, and its corresponding key (api_key).

There’s a slight deviation in the process when using the SDK versus the Studio. Here, in the SDK, it’s necessary to register the skill function and decide which agent acts as the caller and which as the executor. The Studio works the other way around: when you define an agent, you attach the skills that the agent will use. Intriguing, isn’t it?

import os
import json
import requests
from datetime import datetime
from openai import AzureOpenAI
from typing_extensions import Annotated

def create_menu_img(prompt: Annotated[str, "Use DALLE3 to create an image of the menu."]) -> str:
    client = AzureOpenAI(
        api_version="2023-12-01-preview",
        azure_endpoint="https://endpoint-name.openai.azure.com/",
        api_key="xxx",
    )
    # Generate one image from the menu prompt and extract its URL
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    image_url = json.loads(result.model_dump_json())['data'][0]['url']
    # Download the image and save it under a timestamped filename
    img_response = requests.get(image_url)
    if img_response.status_code == 200:
        file_path = "./" + str(datetime.now()).replace(" ", "").replace("-", "").replace(":", "").replace(".", "") + ".jpg"
        with open(file_path, "wb") as img_file:
            img_file.write(img_response.content)
        print(f"Image saved to {file_path}")
    else:
        print(f"Failed to download the image from {image_url}")
    return image_url

autogen.agentchat.register_function(
    create_menu_img,
    caller=image_assistant,
    executor=image_assistant,
    name="create_menu_img",
    description="create an image",
)
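As an aside, pyautogen also supports a decorator-based registration style that makes the caller/executor split explicit: register_for_llm marks the agent that may propose the call, and register_for_execution marks the agent that runs it. A sketch under those assumptions, this time letting the user proxy do the executing:

# The image assistant suggests the call; the user proxy executes it
@user_proxy.register_for_execution()
@image_assistant.register_for_llm(description="create an image")
def create_menu_img(prompt: Annotated[str, "The agreed menu to illustrate."]) -> str:
    ...  # same body as above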

Playground

Once all the elements are precisely calibrated, it’s time for the grand finale: executing the workflow, with the manager and the kickoff message in the spotlight:

user_proxy.initiate_chat(
    manager, message="Create a menu for an event. "
)

Now, let’s look at the results of executing that last line of the script. The following screenshots show the output of the execution. This code was run in a Microsoft Fabric notebook powered by Spark.

The following images are examples of the ones generated by the code above.

Alongside the engaging dialogue, the run also surfaces intriguing insights into the cost of using the GPT-4 model, expressed in tokens.
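Under the hood, initiate_chat returns a ChatResult object, so you can read these figures programmatically instead of scraping the console output. A minimal sketch, assuming the same user_proxy and manager as above:

chat_result = user_proxy.initiate_chat(
    manager, message="Create a menu for an event. "
)
print(chat_result.summary)  # a short summary of the conversation
print(chat_result.cost)     # token usage and estimated cost, as shown below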

cost=({'total_cost': 0.76824, 'gpt-4': {'cost': 0.76824, 'prompt_tokens': 25544, 'completion_tokens': 32, 'total_tokens': 25576}}, {'total_cost': 0.76824, 'gpt-4': {'cost': 0.76824, 'prompt_tokens': 25544, 'completion_tokens': 32, 'total_tokens': 25576}}), human_input=[]) 

Conclusion

As our journey into the world of the AutoGen SDK comes to a close, I hope you’ve found this exploration both enlightening and inspiring. Remember, the power of coding lies in your hands, and with it, you can bring digital dreams to life. So, continue to experiment, continue to innovate, and most importantly, continue to create. Until next time, happy coding!

I would be happy to hear your ideas about how to use AutoGen to solve real use cases. You can leave comments in the Comments section associated with this article.

Eva Jurado Cortés is on LinkedIn.
