Code cuisine: Whipping up gourmet menus with AutoGen

Eva Jurado Cortés
Data Science at Microsoft
10 min read · Jul 16, 2024

AutoGen is an innovative technology that has been gaining attention in the tech world due to its many useful applications. An open-source framework designed to simplify the process of creating multi-agent flows, it enables developers to build LLM-powered agents that converse with one another and can automatically generate and execute code, thereby increasing efficiency and reducing the likelihood of errors.

This article aims to dive deeper into the world of AutoGen, exploring its key features through an example. AutoGen is available in two flavors: AutoGen Studio and its Software Development Kit (SDK). Whether you’re a seasoned developer or a novice in the field, these two options provide a tailored approach based on your needs and skill level, and an understanding of AutoGen may enhance your coding skills and even potentially transform your programming approach.

In this article, I take you on an in-depth journey into some of the workings of AutoGen Studio, walking through the lines of code I’ve crafted, accompanied by screenshots to help guide you. My goal is to help you get up and running with a working demo in short order.

Installation

Ready to kickstart your journey with AutoGen Studio? Start with these steps:

First, open the command line window on your laptop.

Second, install the AutoGen Studio library by typing the following in the command line window:

pip install autogenstudio 

Third, navigate to the folder where AutoGen Studio is located. Getting there could involve typing a command that looks something like one of the following, depending on your version of Python:

cd C:\Users\<username>\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\Scripts 

cd C:\Users\<username>\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\Scripts

Fourth, enter the command autogenstudio ui --port 8081 to launch AutoGen Studio on that port.

Last, in your browser, navigate to: http://localhost:8081/

Voilà! You’re ready to start coding with AutoGen Studio.

Working in AutoGen Studio

To make this explanation easier to follow, I’m going to show you how to create an AutoGen system from scratch that has one purpose, namely, to create a menu of food offerings for an event.

Imagine we’re hosting an event and need to put together a tantalizing menu. But here’s what’s special — we’ll be using AutoGen to do it! Our team consists of four agents: two culinary wizards specializing in Italian and Mexican cuisine, a mediator to help them agree on a menu, and an image assistant who’ll craft an enticing image to advertise our event. Sounds exciting, right?

Once we’re in AutoGen Studio, we’ll see three areas: Build, Playground, and Gallery.

The Build area is where all the magic happens. Here, we’ll define the logic of our process across four tabs: Skills, Models, Agents, and Workflows.

Skills are like superpowers we can bestow upon our agents, enabling them to perform smart actions like generating code or creating images. Models are the Large Language Models (LLMs) that allow our agents to communicate. Agents are the roles participating in our AutoGen conversation, while Workflows tie everything together, building a cohesive conversation.

Models

Let’s start by exploring the Models tab. This is where we can create as many models as we wish. For this example, we’re going to use GPT-4 from the Azure OpenAI Service. To do so, we need to fill in five fields (an SDK equivalent appears in the sketch after this list):

  1. Model: Here, I input the model’s name.
  2. API key: This is where the model key goes.
  3. Base URL: I input the endpoint of the resource I’ve already deployed on Azure.
  4. API type: I specify that this is an Azure model (it could be an OpenAI model too!).
  5. API version: I specify the version of the model.
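
If you would rather define the same model through the SDK, these five fields map onto an entry in AutoGen’s config_list. Here is a minimal sketch, assuming AutoGen v0.2 (pip install pyautogen) and placeholder values for the deployment name, endpoint, and key:

# Placeholder values; substitute your own deployment name, endpoint, and key
config_list = [
    {
        "model": "gpt-4",                                        # 1. Model (Azure deployment name)
        "api_key": "xxx",                                        # 2. API key
        "base_url": "https://aoai_resource.openai.azure.com/",   # 3. Base URL
        "api_type": "azure",                                     # 4. API type
        "api_version": "2023-12-01-preview",                     # 5. API version
    }
]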

Skills

Our next stop is the Skills tab. This is where we can unleash our creativity! We can create as many skills as we fancy, all in Python code. For this example, I need to create one skill: a function that creates an image based on the menu definition (create_menu_img). This function uses DALL·E 3 from the Azure OpenAI Service and, after creating the image, saves it to the Downloads folder on my laptop.

The code to use is as follows — the three things you may need to update if you want to use this code include the azure_endpoint, where you have to set the base URL of your Azure OpenAI Service resource; its key (api_key); and the path where you want to keep the image (file_path):

import json
import requests
from datetime import datetime
from openai import AzureOpenAI


def create_menu_img(prompt):
    # Connect to the Azure OpenAI resource that hosts the DALL-E 3 deployment
    client = AzureOpenAI(
        api_version="2023-12-01-preview",
        azure_endpoint="https://aoai_resource.openai.azure.com/",
        api_key="xxx",
    )

    # Generate a single image from the menu description
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    image_url = json.loads(result.model_dump_json())["data"][0]["url"]

    # Download the image and save it locally with a timestamp-based filename
    img_response = requests.get(image_url)
    if img_response.status_code == 200:
        timestamp = str(datetime.now()).replace(" ", "").replace("-", "").replace(":", "").replace(".", "")
        file_path = "C:/Users/evaju/Downloads/" + timestamp + ".jpg"
        with open(file_path, "wb") as img_file:
            img_file.write(img_response.content)
        print(f"Image saved to {file_path}")
    else:
        print(f"Failed to download the image from {image_url}")

    return image_url
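
If you want to try the skill locally before wiring it into AutoGen Studio, you can call it directly. A quick sketch, assuming you have substituted your own endpoint, key, and file path, and with a purely illustrative menu prompt:

if __name__ == "__main__":
    # Hypothetical sample prompt; in the workflow, the agreed menu text is used instead
    sample_menu = (
        "A poster for a fusion menu: bruschetta and guacamole with totopos, "
        "lasagna and chicken mole, tiramisu and churros, plus margaritas, "
        "Chianti, horchata, and limoncello."
    )
    print(create_menu_img(sample_menu))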

My skills are also saved locally at C:\Users\<user_name>\AppData\Local\miniconda3\Lib\site-packages\autogenstudio\web\files\user\1xxx5\scratch\skills.py

Agents

Next, we hop over to the Agents tab, where we configure our four agents. We start with the first agent, the conversation orchestrator named ‘userproxy’. There are seven fields to fill in to complete its specification:

1. Agent Name

2. Agent Description

3. Max Consecutive Auto Reply: How many consecutive messages it may send before terminating the conversation

4. Human Input Mode: Whether you want to put a human in the loop

5. System Message: This clarifies the exact role of the agent; it is probably the most important part of the agent’s definition, so the clearer you are, the better the functionality

You can copy the System Message of the example from here:

You are a helpful assistant. You have to help chefs agree on a menu that combines two appetizers, two main courses, two desserts, and four drinks of both their cuisines.

6. Model: The LLM that is going to be used. In this example, we use the GPT-4 32K model that we created previously.

7. Skills: Any additional functionality that the agent can use. We have only one existing skill, create_menu_img. (For the SDK-minded, a rough equivalent of this agent appears in the sketch below.)
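
For readers who prefer the SDK to the Studio UI, this specification maps closely onto AutoGen’s Python API. Here is a minimal, hypothetical sketch of userproxy, assuming AutoGen v0.2 and the config_list defined in the Models section; the description wording, reply cap, and working directory are illustrative:

from autogen import UserProxyAgent

userproxy = UserProxyAgent(
    name="userproxy",                          # 1. Agent Name
    description="Conversation orchestrator",   # 2. Agent Description (assumed wording)
    max_consecutive_auto_reply=10,             # 3. assumed reply cap
    human_input_mode="NEVER",                  # 4. no human in the loop
    system_message=(                           # 5. System Message from above
        "You are a helpful assistant. You have to help chefs agree on a menu "
        "that combines two appetizers, two main courses, two desserts, and "
        "four drinks of both their cuisines."
    ),
    llm_config={"config_list": config_list},   # 6. Model defined earlier
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)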

The Italian chef agent (italian_chef) specification is similar to userproxy, the main difference being its System Message, which has been tailored to focus clearly on its role: it may also speak in Italian, and it pushes to put Italian dishes on the menu, though always in agreement with the other chef.

System Message: You are a well-known Italian chef who is an expert in pizza, pasta, and gelato (ice cream), and who knows every recipe that is Italian focused. You speak Italian so you can say some typical words in the language. You provide ideas about different dishes that can be offered, and you discuss with the Mexican chef about which dish you prefer. Afterward, you both have to agree.

The same type of thing happens with the Mexican chef (mexican_chef).

System Message: You are a Mexican chef who is an expert in tacos, burritos, tortillas, and margaritas, and knows every recipe that is from Mexico. You speak Spanish so you can greet in the language. You provide ideas about different dishes that can be offered, and you discuss with the Italian chef about which dish you prefer. Afterward, you both have to agree.

The last agent is a bit different: It is responsible for creating the menu image (image_assistant), and its system message directs it to use the create_menu_img skill with the agreed menu as the input prompt.

System Message: You know how to code. At the end of the conversation, when the chefs have agreed on the menu, you are going to use the create_menu_img function from the skills to create an image, and this function will generate the image based on the provided prompt; the provided prompt is the agreed menu. ONLY use the create_menu_img skill.
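
Continuing the hypothetical SDK sketch, the three assistant agents could be defined as follows, reusing the system messages quoted above (again assuming AutoGen v0.2; the strings are abbreviated here):

from autogen import AssistantAgent

italian_chef = AssistantAgent(
    name="italian_chef",
    system_message="You are a well-known Italian chef ...",  # full text quoted above
    llm_config={"config_list": config_list},
)

mexican_chef = AssistantAgent(
    name="mexican_chef",
    system_message="You are a Mexican chef ...",  # full text quoted above
    llm_config={"config_list": config_list},
)

image_assistant = AssistantAgent(
    name="image_assistant",
    system_message="You know how to code. ... ONLY use the create_menu_img skill.",  # full text quoted above
    llm_config={"config_list": config_list},
)

Note that AutoGen Studio wires the create_menu_img skill in for you; with the raw SDK you would need to expose it yourself, for example through function registration or by placing it in the code-execution working directory.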

Workflows

Finally, we create a workflow to wrap everything together. In the Workflows tab, we can see every workflow we’ve created and make a new one simply by clicking “New Workflow” and selecting “Group Chat” because we’re working with multiple agents.

For the workflow, you provide five items:

  1. Workflow Name: A name for the workflow
  2. Workflow Description: A detailed description of the purpose you want to achieve with the workflow. In this example, the description is: You take a request for creating a recipe, you ask one agent for the recipe, then you pass the answer to the other agent and repeat this process several times. Create an image of the final menu.
  3. Summary Method: This sets how you want to summarize the whole conversation. In the example, we are using an LLM as a summarization method.
  4. Sender: This is the agent that starts the conversation and controls it, namely the conversation orchestrator, our userproxy.
  5. Receiver: The set of agents that are going to interact. Here we create an agents’ group.

Workflow Specification:

The Agents’ group has nine fields (a rough SDK equivalent follows the list):

  1. Group Chat Agents: We include every agent except for the orchestrator (userproxy).
  2. Speaker Selection Method: This can be Auto, Round Robin, or Random, and it sets the order in which the agents speak.
  3. Agent Name: The agents’ group name.
  4. Agents Description: The description of the agents’ group (not especially relevant here).
  5. Max Consecutive Auto Reply, Human Input Mode, Model, and Skills: These are the same as before.

  6. System Message: This explains the role of this set of agents as a group. You can copy the one from the example here: You are a helpful assistant skilled at coordinating a group of other assistants to solve a task. When a chef assistant writes you a menu proposal, you have to share it with the other chef assistant to ask whether he agrees with the initial proposal and how he would improve it. Iterate several times until the resulting menu combines dishes that are both Mexican and Italian, deciding on a final menu of two appetizers, two main courses, two desserts, and four drinks. Provide the final decision as a text output. At the end, give image_assistant the chosen menu as a prompt.
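
In SDK terms, this group configuration corresponds roughly to AutoGen’s GroupChat and GroupChatManager. A minimal sketch under the same v0.2 assumption, with max_round as an assumed value:

from autogen import GroupChat, GroupChatManager

groupchat = GroupChat(
    agents=[italian_chef, mexican_chef, image_assistant],  # every agent except userproxy
    messages=[],
    max_round=12,                       # assumed cap on conversation turns
    speaker_selection_method="auto",    # Auto / Round Robin / Random in the UI
)

manager = GroupChatManager(
    groupchat=groupchat,
    name="menu_group",                  # the agents' group name (assumed)
    system_message="You are a helpful assistant skilled at coordinating ...",  # full text above
    llm_config={"config_list": config_list},
)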

Now you have configured your AutoGen system! We’re ready to play with it.

Playground

Once we’ve configured our AutoGen system, it’s time for the fun part: seeing it in action! We head over to the Playground tab. On the left, you can see the previous sessions you’ve executed; in the central area, you can spark the conversation. The human is out of the loop here, so you just watch: type You have to come up with a menu for an event and see the magic. The conversation unfolds with both chefs interacting to finalize a mouth-watering menu, and then the ‘image_assistant’ creates a prompt in Python for the menu and executes the ‘create_menu_img’ function. The best part? You can run this conversation as many times as you want, getting a different response and a unique image each time!
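
If you built the system with the SDK sketches above rather than the Studio UI, kicking off the same conversation would look something like this (a hypothetical sketch, following the earlier assumptions):

# userproxy is the sender and the group chat manager is the receiver
userproxy.initiate_chat(
    manager,
    message="You have to come up with a menu for an event",
)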

Conclusion

And that’s a wrap on our journey exploring the dynamic world of AutoGen! As we’ve seen, this compelling tool not only streamlines the coding process but also injects a dose of fun and creativity into the mix. Whether you’re a seasoned pro or a coding newbie, AutoGen offers an approach tailored for you. From creating a gastronomic menu with our virtual chefs to crafting an eye-catching image with our image assistant, AutoGen expands the horizons of what we can accomplish. And remember, the only limit here is your imagination!

With its ability to adapt and grow, I believe that AutoGen is not just a tool for today but a companion for the future. I think it’s exciting to ponder the possibilities and innovative solutions that lie ahead with AutoGen in my toolkit.

So, go ahead, step into the world of AutoGen, and let the magic unfold!

I would be happy to hear your ideas about how to use AutoGen to solve real use cases. You can leave comments in the Comments section associated with this article.

Eva Jurado Cortés is on LinkedIn.
