HospitalGPT: Managing a patient population with Autogen powered by GPT-4 & Mixtral 8x7B

13 min read · Feb 4, 2024

I believe that AI agents will play a crucial role in solving complex problems in the future. With this perspective, I aimed to develop a multi-agent solution tailored for healthcare, specifically focusing on managing a cohort of patients and utilizing proactive outreach to improve population health outcomes.

Problem Statement

As healthcare organizations shift towards value-based care models, there is a growing emphasis on proactive outreach to minimize downstream acute care. Population health teams are tasked with identifying and implementing proactive outreach programs to mitigate the high costs associated with downstream care. An example of such an initiative might involve identifying a group of patients at high risk of colon cancer who would benefit from colonoscopy screening.

These population health teams consist of multidisciplinary professionals, including epidemiologists, data analysts, and outreach workers, who collaborate across a campaign. Given the collaborative nature of this work, I felt it was a great use case for a multi-agent solution.

Solution

In my proposed solution, I established a multi-disciplinary team that collaborates to achieve the overarching goal.

Overview of the agents involved in the process

The process works as follows:

A human user initiates the script by describing the high level goal. For instance, ‘Send an outreach email to all patients who might need to schedule a colonoscopy screening’. The group chat is orchestrated by a ‘Manager’ agent.
In the planning stage, a dedicated Planner agent translates the high-level goal into a detailed plan, considering the expertise of other team members. A ‘Critic’ agent reviews the plan to ensure its coherence, enhancing overall results.
Following the plan’s definition, the Epidemiologist agent establishes criteria for identifying a patient cohort for outreach, such as ‘Patients aged 50 to 70 with Osteoporosis.’
Next, a data analyst writes code to query a patient repository (FHIR API server) based on the criteria set by the Epidemiologist. An ‘Executor’ agent is responsible for running the code, minimizing errors. The analyst returns a list of patients and some demographic information.
Finally, we pass the list of patients to an ‘Outreach’ agent who creates personalized emails for each individual to schedule their appointment.

Implementation

GitHub repository: https://github.com/micklynch/hospitalgpt

I implemented this project using Autogen, a framework developed by Microsoft to enable multi-agent applications. I won’t go into the details of setting up Autogen in this article as there are many excellent examples and tutorials available.

One specific implementation detail of my project is that I use different models for different tasks. Generally, handling a multi-agent conversation requires a high degree of logic and is better handled by larger and more powerful models like OpenAI’s GPT-4. However, some of the downstream tasks, such as generating the outreach email, can easily be handled by less powerful and, importantly, less expensive models. So in my implementation, I use two models:

GPT-4 from OpenAI
$0.03/1k tokens input & $0.06/1k tokens output
Mixtral 8x7B model from Mistral.AI (hosted on deepinfra.com)
$0.0003/1k tokens input & $0.0003/1k tokens output
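To make the price gap concrete, here is a rough back-of-the-envelope sketch. It assumes the prices above are in US dollars per 1k tokens, and the workload figures (1,000 emails, about 100 prompt tokens and 400 completion tokens each) are hypothetical numbers chosen only for illustration.

```python
# Prices in USD per 1k tokens as (input, output), taken from the list above.
PRICES = {
    "gpt-4": (0.03, 0.06),
    "mixtral-8x7b": (0.0003, 0.0003),
}


def email_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request with the given token counts."""
    price_in, price_out = PRICES[model]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


# Hypothetical workload, not measured from the project.
N_EMAILS, IN_TOKENS, OUT_TOKENS = 1000, 100, 400

for model in PRICES:
    total = N_EMAILS * email_cost(model, IN_TOKENS, OUT_TOKENS)
    print(f"{model}: ${total:.2f}")
```

Under these assumptions the email step costs about $27 with GPT-4 and about $0.15 with Mixtral, which is why the cheaper model is used for the outreach emails.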

I set up both models in the OAI_CONFIG_LIST file as follows:
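The original configuration file is not reproduced in this text, so the following is only a sketch of what such a file could look like. It assumes the AutoGen 0.2-style keys `model`, `api_key` and `base_url` (older releases used different key names), and the API key strings are placeholders. The `model` values must match the names used in `filter_dict` in the scripts.

```python
import json

# Placeholder entries: replace the api_key values with real keys.
config_list = [
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key>",
    },
    {
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "api_key": "<your DeepInfra API key>",
        # OpenAI-compatible endpoint hosted by DeepInfra
        "base_url": "https://api.deepinfra.com/v1/openai",
    },
]

# The JSON text below is what would be saved in a file named OAI_CONFIG_LIST.
config_json = json.dumps(config_list, indent=4)
print(config_json)
```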

The implementation script below orchestrates these models and manages various configurations, including caching past answers to minimize costs.

The user_proxy is the human agent which initiates the chat and may be required for subsequent details or validation during the conversation. The remaining prompts for the other agents are pretty self-explanatory.

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, config_list_from_json

openai_config_list = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4-0613"],
    },
)

mixtral_config_list = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["mistralai/Mixtral-8x7B-Instruct-v0.1"],
    },
)

gpt4_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": openai_config_list,
    "timeout": 120,
}

mixtral_config = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0,
    "config_list": mixtral_config_list,
    "timeout": 120,
}

user_proxy = UserProxyAgent(
    name="Admin",
    system_message="A human admin who will define the condition that the hospital planner needs to screen for",
    max_consecutive_auto_reply=15,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "groupchat",
        "use_docker": False,  # set to True or image name like "python:3" to use docker
    },
    human_input_mode="ALWAYS",
)

planner = AssistantAgent(
    name="hospital_planner",
    system_message="""
    Hospital administrator.

    Suggest a plan. Revise the plan based on feedback from admin and critic, until admin approval.
    The plan may involve an epidemiologist who defines the patient criteria to solve target outreach.
    The data analyst who can write code and an executor who will run the code to output a list of patients.
    An outreach assistant who can take the list of patients and write personalized messages to them.
    Explain the plan first. Be clear which step is performed by the epidemiologist, data analyst, executor
    and outreach admin.
    """,
    llm_config=gpt4_config,
)

epidemiologist = AssistantAgent(
    name="epidemiologist",
    system_message="""
    Epidemiologist. You are an expert in the healthcare system. You can help the planner to define which patients
    should receive outreach. Define the criteria based on their demographics, medications and past conditions.
    When you have the criteria defined pass these onto the data analyst to write and execute code to search within
    a FHIR R4 API server for patients that match the criteria. 
    """,
    llm_config=gpt4_config,
)

data_analyst = AssistantAgent(
    name="data_analyst",
    system_message="""
    Data analyst. You are an expert in the healthcare data systems and FHIR standards. 

    You write python/shell code to find patients in a FHIR R4 API server that match the criteria defined by the epidemiologist.
    The FHIR API server URL is https://hapi.fhir.org/baseR4/.
    Wrap the code in a code block that specifies the script type. 
    The user can't modify your code. So do not suggest incomplete code which requires others to modify. 
    Don't use a code block if it's not intended to be executed by the executor.
    Don't include multiple code blocks in one response. Do not ask others to copy and paste the result. 
    Check the execution result returned by the executor.
    If the result indicates there is an error, fix the error and output the code again. 
    Suggest the full code instead of partial code or code changes. 
    If the error can't be fixed or if the task is not solved even after the code is executed successfully, 
    analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try. 
    Save the output to a file called patients.csv.
    """,
    llm_config=gpt4_config,
)

executor = UserProxyAgent(
    name="Executor",
    system_message="""
        Executor. Execute the code written by the data analyst and report the result.
    """,
    human_input_mode="NEVER",
    code_execution_config={"last_n_messages": 3, "work_dir": "groupchat"},
)

outreach_admin = AssistantAgent(
    name="outreach_admin",
    system_message="""
    Outreach administrator. You are an expert in the healthcare system. You take the list of patients from the patients.csv file and create a personalized
    email to send to each patient. 
    You output a csv file called out.csv which contains the patient ids, names, email addresses and the text of the email you just created.
    """,
    llm_config=mixtral_config,
)

critic = AssistantAgent(
    name="Critic",
    system_message="""
    Critic. Double check plan, claims, code from other agents and provide feedback. 
    Check whether the plan includes adding verifiable info such as source URL.
    """,
    llm_config=gpt4_config,
)

# Create groupchat
groupchat = GroupChat(
    agents=[user_proxy, planner, epidemiologist, data_analyst, executor, outreach_admin, critic], messages=[])
manager = GroupChatManager(groupchat=groupchat, llm_config=gpt4_config)


user_proxy.initiate_chat(
    manager,
    message="""
    Contact all the patients that need a colonoscopy screening.
    """,
)

While the initial implementation showed promise, it had notable shortcomings, particularly the data analyst agent’s struggle to create valid scripts for retrieving data from the FHIR server. This may be because there are very few resources on FHIR standards in its training data.

You can see an example output here.

Improved Solution

To address these challenges, I hand-wrote a separate function that gets patients from the server based on a specific age range and condition. This function is then provided to the data analyst as a tool that it can call. This implementation also separates the group chat from the data analyst’s conversation and from writing the outreach emails.

The code is below:

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, config_list_from_json
import requests
import datetime
from dateutil.relativedelta import relativedelta
from typing import List, Optional, Dict, Union
from openai import OpenAI
from dotenv import load_dotenv
import os

load_dotenv()

# These are the OpenAI details used for the group chat. We use GPT-4 because
# a powerful model is needed to handle a complex conversation
openai_config_list = config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-4"],
    },
)

# These are the Mixtral details used for generating the emails to patients.
# We use Mixtral as it is much cheaper and can easily handle the task of writing an email.
# Note: base_url points at DeepInfra, so the key read from OPENAI_API_KEY must be valid there.
openai_client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="https://api.deepinfra.com/v1/openai",
)

# This is where we store the list of patient details that are
# returned from the FHIR server. We use this list when generating outreach emails
patients = []


"""
STEP 1: 
Define the cohort criteria. This involves a group chat between the admin and the epidemiologist.
We also include a critic who will review the criteria and ensure it meets the required fields.
"""
def define_cohort_information(target_cohort) -> str:
    gpt4_config_define = {
        "cache_seed": 42,  # change the cache_seed for different trials
        "temperature": 0,
        "config_list": openai_config_list,
        "timeout": 120,
    }

    user_proxy = UserProxyAgent(
        name="Admin",
        max_consecutive_auto_reply=3,
        is_termination_msg=lambda x: x.get("content", "") and x.get(
            "content", "").rstrip().endswith("TERMINATE"),
        human_input_mode="TERMINATE",
    )

    critic = AssistantAgent(
        name="Critic",
        system_message="""
        Critic. I will review the criteria defined by the epidemiologist and ensure it meets the required fields.
        The required fields are min age, max age and previous conditions.
        You must add TERMINATE to the end of your reply.
        """,
        llm_config=gpt4_config_define,
    )

    epidemiologist = AssistantAgent(
        name="epidemiologist",
        system_message="""
        Epidemiologist. You are an expert in the healthcare system. You define the criteria to target patients
        for outreach. The criteria must be min age, max age and any previous conditions. The conditions must be
        in the format of SNOMED display names e.g. Osteoporosis (disorder), Acute bronchitis, Hyperglycemia.
        
        You must add TERMINATE to the end of your reply.

        Examples:
        - Patients aged 50 to 70 with Osteoporosis. TERMINATE
        - Male patients aged between 100 and 120 with Myocardial disease. TERMINATE
        - Patients aged between 80 and 100 with Acute bronchitis. TERMINATE
        """,
        llm_config=gpt4_config_define,
    )


    # Create groupchat
    groupchat = GroupChat(
        agents=[user_proxy, epidemiologist, critic], messages=[])
    manager = GroupChatManager(groupchat=groupchat, llm_config=gpt4_config_define)


    user_proxy.initiate_chat(
        manager,
        message=target_cohort,
    )

    user_proxy.stop_reply_at_receive(manager)
    user_proxy.send(
        "Give me criteria that was just defined again, return only the criteria and add TERMINATE to the end of the message", manager)
    
    return user_proxy.last_message()["content"]

"""
STEP 2: 
Once the definition of the cohort criteria is complete, we can start the data analysis. This
involves using a defined function to search for patients within a FHIR R4 API server.
"""
def find_patients(criteria: str) -> None:
    gpt4_config_data = {
    "cache_seed": 42,  # change the cache_seed for different trials
    "temperature": 0,
    "functions": [
        {
            "name": "get_patients_between_ages_and_condition",
            "description":     '''
            Fetches and returns a list of patients from a specified FHIR R4 API endpoint based on the patients' age range and condition.
            
            Returns:
            An array of dictionary where each dictionary represents a patient and contains the patient's full name, age, MRN, email address, and condition.

            Example usage:
            >>> get_patients_between_ages_and_condition(50, 70, "Myocardial")
            This will return all patients who are between the ages of 50 and 70 (inclusive) and who have a condition with the name containing 'Myocardial'.
            ''',
            "parameters": {
                    "type": "object",
                    "properties": {
                        "min_age": {
                            "type": "number",
                            "description": "The minimum age to filter patients by. It returns only patients older than or equal to this age.",
                        },
                        "max_age": {
                            "type": "number",
                            "description": "The maximum age to filter patients by. It returns only patients younger than or equal to this age.",
                        },
                        "condition": {
                            "type": "string",
                            "description": "The specific health condition to filter patients by. It returns only patients who have this condition.",
                        }
                    },
                "required": ["min_age", "max_age", "condition"],
            },
        }
    ],
    "config_list": openai_config_list,
    "timeout": 120,
}
    user_proxy = UserProxyAgent(
        name="User_proxy",
        max_consecutive_auto_reply=3,
        code_execution_config={"last_n_messages": 2, "work_dir": "coding", "use_docker": False},
        is_termination_msg=lambda x: x.get("content", "") and x.get(
                "content", "").rstrip().endswith("TERMINATE"),
        human_input_mode="TERMINATE",
        function_map={
                "get_patients_between_ages_and_condition": get_patients_between_ages_and_condition,
        },
        )


    data_analyst = AssistantAgent(
        name="data_analyst",
        system_message="""
        Data analyst. Only use the function you have been provided with. Find all the
        patients that match the defined criteria. 
        You must add TERMINATE to the end of your reply.
        """,
        llm_config=gpt4_config_data,
    )

    # To use the criteria produced in STEP 1, start the chat with them instead:
    # user_proxy.initiate_chat(data_analyst, message=criteria)

    # Inject fixed criteria here so that an actual patient on the test server matches
    user_proxy.initiate_chat(
        data_analyst, message="patients aged between 100 and 105 with Hyperglycemia")

    return

"""
STEP 2.1: 
This is a helper function to get the patient details from a FHIR R4 API server based
on the patient's birthdate and the condition name.
This function is used by the data analyst.
"""
def get_patients_between_ages_and_condition(min_age: int, max_age: int, condition: str) -> List[Dict[str, Union[str, int, None]]]:
    today = datetime.date.today()
    max_birthdate = today - relativedelta(years=min_age)
    max_birthdate = max_birthdate.strftime('%Y-%m-%d')

    min_birthdate = today - relativedelta(years=max_age + 1)
    min_birthdate = min_birthdate.strftime('%Y-%m-%d')

    # Get conditions
    r = requests.get(f'https://hapi.fhir.org/baseR4/Condition?_pretty=true&subject.birthdate=le{max_birthdate}&subject.birthdate=gt{min_birthdate}')
    conditions = r.json()
        
    # Check if conditions is empty
    if 'entry' not in conditions or not conditions['entry']:
        # No matching conditions: return a message that the data analyst agent can relay
        return "No patients match the given criteria"
    
    # Filter out the conditions where the 'code' key does not exist
    entries = [entry for entry in conditions['entry'] if 'code' in entry['resource']]
    # Filter by condition
    filtered_conditions = [entry for entry in entries
        # Use .get() because a coding entry may lack a 'display' value
        if any(condition.lower() in cond.get('display', '').lower() for cond in entry['resource']['code']['coding'])]
    # Get patient data for each condition
    for cond in filtered_conditions:
      patient_id = cond['resource']['subject']['reference'].split('/')[1]
      r = requests.get(f'https://hapi.fhir.org/baseR4/Patient/{patient_id}?_pretty=true')
      patient = r.json()

      if 'telecom' in patient and 'maritalStatus' in patient:
          full_name = patient['name'][0]['given'][0] + " " + patient['name'][0]['family']
          email = next((t['value'] for t in patient['telecom'] if t['system'] == 'email'), "test@test.com")

          # Check for 'type' key in each identifier before accessing its value
          mrn = next((i['value'] for i in patient['identifier'] if 'type' in i and i['type']['text'] == 'Medical Record Number'), None)
          patient_age = relativedelta(datetime.datetime.now(), datetime.datetime.strptime(patient['birthDate'], '%Y-%m-%d')).years
          patient_condition = cond['resource']['code']['coding'][0]['display']
          # Get postal code
          postal_code = patient['address'][0]['postalCode'] if 'address' in patient and patient['address'] else None

          patients.append(
              {
                  'patient_url': f"https://hapi.fhir.org/baseR4/Patient/{patient_id}?_pretty=true",
                  'full_name': full_name,
                  'age': patient_age,
                  'postal_code': postal_code,
                  'MRN': mrn,
                  'email': email,
                  'condition': patient_condition
              }
          )
    return patients

"""
STEP 3: 
This is a function which generates the emails for the patients.
"""
def write_outreach_emails(patient_details: List, user_proposal: str) -> None:
    MODEL_DI = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    # Check if patient_details has any values before continuing
    if not patient_details:
        print("No patients found")
        return
    for patient in patient_details:
        chat_completion = openai_client.chat.completions.create(
            model=MODEL_DI,
            messages=[{"role": "user", "content": "Write an email to the patient named " + patient["full_name"] +
                       " to arrange a screening based on the following: " + user_proposal +
                       " because they have previously had " + patient["condition"] + "."}],
            stream=False,
            # top_p=0.5,
        )

        # Write each individuals email into a text file. Call the file with the patient MRN and add the name, MRN, postcode and email
        # address into the start of the text file
        with open(f"{patient['MRN']}.txt", "w") as f:
            f.write(f"Name: {patient['full_name']}\n")
            f.write(f"MRN: {patient['MRN']}\n")
            f.write(f"Postcode: {patient['postal_code']}\n")
            f.write(f"Email: {patient['email']}\n")
            f.write("\n")
            f.write(f"Patient: {patient['patient_url']}\n")
            f.write(chat_completion.choices[0].message.content)
            f.write("\n")
            f.write("-----------------------------------------")

        
        
    return


# Define the diagnostic screening we wish to perform
user_proposal = "Find patients for colonoscopy screening"
# Define the cohort information based on the user's proposal
criteria_definition = define_cohort_information(user_proposal)
# Find the patients based on the criteria
find_patients(criteria_definition)
# Write the emails to the patients
write_outreach_emails(patients, user_proposal)

The script has four main parts:

Problem Statement
This is the sole input to the program from the User Agent (me!). In this scenario, we simply ask, “Find patients for a colonoscopy screening”
Defining the Criteria (STEP 1 in the code)
Based on an Autogen group chat between an Epidemiologist and a Critic, the agents define the criteria used to select the cohort of patients that should be targeted for outreach
Data Access (STEPS 2 & 2.1 in the code)
This uses an agent with access to a pre-defined tool called `get_patients_between_ages_and_condition` that returns a list of patient information based on the criteria defined by the previous agents
Outreach emails (STEP 3 in the code)
This step calls Mixtral to write a personalized message to each patient
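The age filter in STEP 2.1 works by turning an age range into a window of birthdates, which the FHIR query then applies with the `le` and `gt` prefixes. The sketch below illustrates that conversion using only the standard library (the script itself uses `dateutil.relativedelta`); the helper names `years_before` and `birthdate_window` are hypothetical and not part of the original code.

```python
import datetime
from typing import Tuple


def years_before(day: datetime.date, years: int) -> datetime.date:
    """Return the date `years` years before `day`, clamping Feb 29 to Feb 28."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # `day` is Feb 29 and the target year is not a leap year
        return day.replace(year=day.year - years, day=28)


def birthdate_window(min_age: int, max_age: int,
                     today: datetime.date) -> Tuple[datetime.date, datetime.date]:
    """Return (earliest_exclusive, latest_inclusive) birthdates for the age range."""
    # Born on or before this date means the patient is at least min_age.
    latest_inclusive = years_before(today, min_age)
    # Born strictly after this date means the patient has not yet turned max_age + 1.
    earliest_exclusive = years_before(today, max_age + 1)
    return earliest_exclusive, latest_inclusive


earliest, latest = birthdate_window(50, 70, datetime.date(2024, 2, 4))
print(f"birthdate=gt{earliest.isoformat()} and birthdate=le{latest.isoformat()}")
```

With these inputs, a patient born on 1953-02-04 has already turned 71 and is excluded, while one born a day later is still 70 and is included, so both ends of the age range are inclusive.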

The output from this script is one text file per patient, containing relevant information about the patient and the body of the personalized email that can be sent. Here is an example of an output file:

Name: Normand813 Zemlak964
MRN: 28ea39f3-a07b-457b-b12d-e35c8bd2d1dd
Postcode: 01545
Email: test@test.com
Patient: https://hapi.fhir.org/baseR4/Patient/9689767?_pretty=true
Subject: Important Colonoscopy Screening for You
Dear Normand813 Zemlak964,
I hope this email finds you well. I am writing to inform you about an important preventive health measure that is recommended for individuals who have a history of hyperglycemia, a disorder characterized by high blood sugar levels.
Given your previous history of hyperglycemia, I would like to schedule a colonoscopy screening for you. This screening procedure is essential for the early detection and prevention of colorectal cancer, one of the most common types of cancer. It is recommended for individuals over the age of 50, or earlier if there is a family history of colorectal cancer or other risk factors.
The colonoscopy screening procedure involves the examination of the colon and rectum using a thin, flexible tube with a camera attached to it. The procedure usually takes about 30 minutes and is performed under sedation to ensure your comfort. The preparation for the procedure involves a clear liquid diet and taking a laxative to clean out the colon.
Colonoscopy screening can detect polyps, which are growths in the colon that can become cancerous over time. If any polyps are found during the procedure, they can be easily removed, preventing the development of colorectal cancer.
I would like to schedule your colonoscopy screening as soon as possible. Please let me know if you have any questions or concerns, or if you would like to schedule a consultation with me to discuss the procedure in more detail.
Thank you for your attention to this important matter. I look forward to hearing from you soon.
Best regards,
[Your Name]
[Your Title]
[Your Contact Information]
-----------------------------------------

This implementation definitely improved performance, at the expense of requiring more detailed design and implementation work. While the group chat worked well most of the time, there were many occasions where the epidemiologist and critic would disagree and the resulting criteria would become rather random (in my humble, non-clinical opinion).
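One possible guard against such drift would be to check that the final criteria still have the expected shape before querying the server. This is only a sketch, not part of the original script: it assumes the criteria follow the example sentences in the epidemiologist prompt, and `Criteria` and `parse_criteria` are hypothetical names.

```python
import re
from typing import NamedTuple


class Criteria(NamedTuple):
    min_age: int
    max_age: int
    condition: str


# Matches e.g. "Patients aged 50 to 70 with X. TERMINATE"
# or "Male patients aged between 100 and 120 with X. TERMINATE".
_PATTERN = re.compile(
    r"aged\s+(?:between\s+)?(\d+)\s+(?:to|and)\s+(\d+)\s+with\s+(.+?)\.?\s*(?:TERMINATE)?\s*$",
    re.IGNORECASE,
)


def parse_criteria(text: str) -> Criteria:
    """Extract min age, max age and condition, or raise ValueError."""
    match = _PATTERN.search(text.strip())
    if match is None:
        raise ValueError(f"Criteria not in the expected format: {text!r}")
    min_age, max_age = int(match.group(1)), int(match.group(2))
    condition = match.group(3).strip()
    # Reject implausible or inverted age ranges (130 is an arbitrary upper bound).
    if not 0 <= min_age <= max_age <= 130:
        raise ValueError(f"Implausible age range: {min_age} to {max_age}")
    return Criteria(min_age, max_age, condition)


print(parse_criteria("Patients aged 50 to 70 with Osteoporosis. TERMINATE"))
```

If parsing fails, the script could ask the group chat to restate the criteria instead of passing free text on to the data analyst.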

Conclusion

I firmly believe that leveraging multiple AI agents collaboratively is pivotal for addressing complex tasks. By combining models based on their strengths, specialized agents akin to real-life multidisciplinary teams can be created.

Although Autogen is a promising framework, it is still evolving, and there is ample room for improvement. The same applies to powerful language models like GPT-4, which, while impressive, may face challenges in certain tasks. To be fair to the model, I believe a skilled prompter could more effectively refine the objectives and assist in constraining the conversation to a greater extent.

I look forward to witnessing the ongoing advancements in frameworks like Autogen and refining the capabilities of large language models in collaborative scenarios.

References

Official AutoGen Documentation: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
A very helpful tutorial from @AIJasonZ which inspired this work: https://youtu.be/Bq-0ClZttc8
My repo on Github: https://github.com/micklynch/hospitalgpt

About me

LinkedIn: https://www.linkedin.com/in/michaellynchphd