Comparing OpenCopilot to LangChain’s Plan and Execute Agent

Osama Abdel-Karim
Engineering@Incorta
Jun 15, 2023 · 7 min read

A few days ago, we announced a new AI-driven Copilot Agent called OpenCopilot. Designed to offer an efficient and controllable framework for software companies, OpenCopilot seamlessly integrates with a company’s software ecosystem and performs complex tasks with minimal guidance.

In this article, we will compare OpenCopilot to LangChain’s Plan and Execute Agent, a similarly purposed AI agent that focuses on planning tasks before executing them. The Plan and Execute Agent is based on the “Plan and Solve” paper authored by Lei Wang et al. in May 2023.

A Copilot for a SQL Database

Imagine you want to develop a copilot for a SQL database. The database contains a set of tables, and you want to ask questions about the data stored within them. In this scenario, we will provide the large language model (LLM) with two tools: one that searches for relevant tables using a semantic search engine, and another that executes SQL queries:

RelevantTables: Get the most relevant tables for 
a given query.
SQLDatabase: Executes a sql query on the SQL DB.
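The two tools can be sketched in a few lines of Python. This is purely illustrative: the table descriptions, the keyword match standing in for semantic search, and the in-memory SQLite database are all stand-ins, not the real implementation behind either agent.

```python
import sqlite3

# In-memory demo database standing in for the company's SQL DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])

# Hypothetical table descriptions used for the "semantic" lookup.
TABLE_DESCRIPTIONS = {
    "employees": "staff workers headcount people employed",
    "orders": "sales purchases invoices",
}

def relevant_tables(query: str) -> list[str]:
    """RelevantTables: a naive keyword match standing in for semantic search."""
    words = set(query.lower().replace("?", "").split())
    return [name for name, desc in TABLE_DESCRIPTIONS.items()
            if name in words or words & set(desc.split())]

def sql_database(sql: str) -> list[tuple]:
    """SQLDatabase: execute a SQL query against the DB and return all rows."""
    return conn.execute(sql).fetchall()

print(relevant_tables("How many employees are there?"))   # ['employees']
print(sql_database("SELECT COUNT(*) FROM employees"))     # [(2,)]
```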

The question will be: How many employees are there?

Throughout this article, our primary focus will be on the prompts exchanged with the LLM rather than the underlying code. We will be using GPT-3.5-Turbo with a temperature of 0.

LangChain’s Plan and Execute agent

We will start by trying out the above query using the Plan and Execute agent. The process typically starts by defining the available tools for the agent, followed by running it with the user’s question.

The initial interaction between LangChain and the LLM involves building the plan as follows:

Let's first understand the problem and
devise a plan to solve the problem.
Please output the plan starting with
the header 'Plan:' and then followed by
a numbered list of steps. Please make
the plan the minimum number of steps
required to accurately complete the
task. If the task is a question, the
final step should almost always be
'Given the above steps taken, please
respond to the users original
question'. At the end of your plan, say
'<END_OF_PLAN>'

User: The plan should be relevant to a SQL DB software.
How many employees are there?

Notice that we added the sentence “The plan should be relevant to a SQL DB software.” before the question to provide some context.

And here is GPT-3.5-Turbo’s reply:

Plan:
1. Connect to the SQL database.
2. Identify the table that contains employee information.
3. Write a SQL query to count the number of rows in the employee table.
4. Execute the SQL query.
5. Retrieve the result of the query.
6. Display the result to the user.
7. Close the database connection.

Given the above steps taken, we can accurately determine the
number of employees in the database.

The next call to the LLM was to execute the first task:

Respond to the human as helpfully and
accurately as possible. You have access
to the following tools:

SQLDatabase: Executes a sql query on
the SQL DB, args: {{'tool_input':
{{'type': 'string'}}}}

RelevantTables: Use this tool to get
the most relevant tables for a given
query., args: {{'query': {{'title':
'Query', 'type': 'string'}}}}

Use a json blob to specify a tool by
providing an action key (tool name) and
an action_input key (tool input).

Valid "action" values: "Final Answer"
or SQLDatabase, RelevantTables

Provide only ONE action per $JSON_BLOB,
... (omitted for simplicity) ...

Previous steps: steps=[]

Current objective: value='Connect to
the SQL database.'

And here is GPT-3.5-Turbo’s reply:

Action:
```
{
"action": "SQLDatabase",
"action_input": "Connect to the SQL database."
}
```

It becomes apparent that the agent will not progress with the provided plan. Connecting to the SQL database is not part of the SQLDatabase tool’s exposed functionality; the DB connection is established in the background by the code, and SQLDatabase expects a SQL query to execute.

This chain of thought, of course, never ends successfully.

The problem is apparent: the plan is not devised within the context and capabilities of the system.

OpenCopilot

OpenCopilot addresses the previously mentioned issues by first giving the LLM context describing who it is supposed to be: an Assistant. This is also useful when users of your software ask the copilot who it is; you don’t want the response to be “I am ChatGPT,” right?

Second, OpenCopilot introduces the concept of Operators. Each set of coherent tasks should belong to an operator. We add an overview of the tools and make sure that the LLM will always devise a plan within these external capabilities:

Your are SQL Assistant and your name is
SQL Assistant. I am SQL Operator. You
are talking to me only and you can talk
to me by providing a set of tasks in
JSON format.

You are required to answer user
inquiries that will be sent to you by
SQL Operator. You can ask SQL Operator
to do some tasks like executing query
on SQL system. If the inquiry is
complex, break it down to simpler
tasks.

Tasks can be one or more of the
following:
- SQL Metadata Helper -->
(TableFinderOp) All queries must
be against a single table, this
operator can help you find the most
relevant table
- SQL Query Helper --> (QueryOp) This
operator requires input from the
TableFinderOp, this operator can
execute queries on SQL DB. The
queries would be on data and not
metadata.
- UiText Operator --> (UiTextOp) This
operator can send a reply to the user

You should only respond with tasks in
JSON format as described in the
following JSON schema:
... (omitted for simplicity) ...


From SQL Operator: The user is asking:
How many employees are there?
JSON:

With the above prompt, the LLM provides a concise and focused plan as follows:

{
"tasks": [
{
"id": 1,
"name": "Find relevant table",
"goal_and_purpose": "Find the table that contains
information about employees",
"operator": "TableFinderOp",
"status": "TODO",
"depends_on_output_of": [],
"detailed_purpose": "Search for tables with column names
like 'employee', 'staff', 'worker', etc.",
"result": ""
},
{
"id": 2,
"name": "Execute query",
"goal_and_purpose": "Execute a query to count the
number of employees",
"operator": "QueryOp",
"status": "TODO",
"depends_on_output_of": [1],
"detailed_purpose": "Execute a query
like 'SELECT COUNT(*) FROM employees'",
"result": ""
},
{
"id": 3,
"name": "Send reply to user",
"goal_and_purpose": "Send the result of the query to the user",
"operator": "UiTextOp",
"status": "TODO",
"depends_on_output_of": [2],
"detailed_purpose": "Send the result of the query to
the user in a clear and concise manner",
"result": ""
}
]
}

As demonstrated, the plan consists of exactly three tasks, which is the optimal number of tasks for this question.
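A driver can then execute such a plan by running each task once all of its dependencies have produced results. The following is a minimal sketch, not OpenCopilot's actual scheduler; `run_task` is a placeholder for dispatching the task to its operator:

```python
def execute_plan(tasks: list[dict], run_task) -> dict:
    """Run tasks in an order that respects depends_on_output_of.

    run_task(task, dep_results) stands in for dispatching the task to its
    operator; it returns the task's result, stored under the task's id.
    """
    results = {}
    pending = list(tasks)
    while pending:
        progressed = False
        for task in list(pending):
            deps = task["depends_on_output_of"]
            if all(d in results for d in deps):
                # All dependencies are done: pass their outputs to the task.
                results[task["id"]] = run_task(task, [results[d] for d in deps])
                task["status"] = "DONE"
                pending.remove(task)
                progressed = True
        if not progressed:
            raise ValueError("Plan has cyclic or unsatisfiable dependencies")
    return results
```

With the three-task plan above, task 1 (TableFinderOp) runs first, its result feeds task 2 (QueryOp), and task 3 (UiTextOp) runs last.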

Next, we delve deeper and provide detailed command definitions (or tools, as in LangChain) for each operator to obtain the final command. Here is the request for task 1 as an example:

Your are Incorta Assistant and your
name is Incorta Assistant. I am Incorta
Operator. You are talking to me only
and you can talk to me by providing a
command in JSON format.

We have a user who sent an inquiry to
the Incorta Operator. Incorta Operator
has prepared a set of tasks to carry
out the inquiry of the user. I want you
to generate command for a specific task
that I will provide later.

The user inquiry is: How many employees are there?

The list of tasks prepared by Incorta
Operator is:
[
{
"id": 1,
"name": "Find relevant table",
"goal_and_purpose": "Find the table that contains
information about employees",
"operator": "TableFinderOp",
"status": "TODO",
"depends_on_output_of": [],
"detailed_purpose": "Search for tables with
column names like 'employee', 'staff', 'worker', etc.",
"result": ""
}
]

You must generate one of the following
commands only and it must follow the
JSON format provided:

Commands Overview: I'm
TableFinderOp, I can get you the
tables that's most relevant to
the user query.

Command 1 Description: [
"Given query, search the available
tables, to find the most relevant table.
That contains all the required field"
]
Command 1 JSON Format:
{
"command_name": "GetRelevantTable",
"args": {
"query": "Query to search for the most relevant table",
"require_result_summary": false
}
}


Now, given the list of possible
commands, and the list of tasks,
generate the command for Task_id: 1


Ensure the response can be parsed by
Python json.loads.

Command ==> JSON:

Finally, for this operation, the LLM directly replies with the command:

[
{
"command_name": "GetRelevantTable",
"args": {
"query": "employee OR staff OR worker",
"require_result_summary": false
}
}
]

It continues in this manner, issuing consistent commands without diverging from the scope of our SQL DB software.
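Since the prompt asks for output parseable by Python's `json.loads`, the driver can parse the reply and dispatch the command with a few lines of stdlib code. This is a hypothetical sketch; the handler table and its stub result are illustrative, not OpenCopilot's real dispatch logic:

```python
import json

# Hypothetical handler table: (operator, command) -> implementation.
# The lambda is a stub standing in for a real table search.
HANDLERS = {
    ("TableFinderOp", "GetRelevantTable"): lambda args: ["employees"],
}

def parse_commands(reply: str) -> list[dict]:
    """Parse the LLM reply; accept either a single command object or a list."""
    data = json.loads(reply)
    commands = data if isinstance(data, list) else [data]
    for cmd in commands:
        if "command_name" not in cmd or "args" not in cmd:
            raise ValueError(f"Malformed command: {cmd}")
    return commands

reply = """[
  {"command_name": "GetRelevantTable",
   "args": {"query": "employee OR staff OR worker",
            "require_result_summary": false}}
]"""
cmd = parse_commands(reply)[0]
result = HANDLERS[("TableFinderOp", cmd["command_name"])](cmd["args"])
print(result)  # ['employees']
```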

More Room for Commands Description

With commands categorized under operators, we only need to provide command descriptions for each operator. This approach allows for more room to describe complex commands without sacrificing the inclusion of other commands.
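One way to picture the context saving: the prompt for a given task only needs to render the chosen operator's command descriptions, not the whole catalogue. A hypothetical sketch, with operator names taken from the article and descriptions paraphrased:

```python
# Hypothetical registry: each operator carries its own overview and
# command descriptions, mirroring the prompt format shown earlier.
OPERATORS = {
    "TableFinderOp": {
        "overview": "I can get you the tables most relevant to the user query.",
        "commands": {
            "GetRelevantTable": "Given a query, search the available tables "
                                "to find the most relevant one.",
        },
    },
    "QueryOp": {
        "overview": "I can execute queries on the SQL DB.",
        "commands": {"ExecuteQuery": "Execute a SQL query on a single table."},
    },
}

def build_command_section(operator_name: str) -> str:
    """Render only this operator's commands into the prompt, leaving the
    rest of the context window free for richer descriptions."""
    op = OPERATORS[operator_name]
    lines = [f"Commands Overview: {op['overview']}"]
    for i, (name, desc) in enumerate(op["commands"].items(), 1):
        lines.append(f"Command {i} Description: {desc}")
        lines.append(f'Command {i} Name: "{name}"')
    return "\n".join(lines)

print(build_command_section("TableFinderOp"))
```

Because only one operator's commands appear at a time, each description can be several sentences long without crowding out the rest of the prompt.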

Categorizing commands under operators might be somewhat limiting for the LLM. However, in many cases, we found that most of the needed commands can be categorized under one operator without the need to inform the LLM about other available commands.

Conclusion

In this article, we demonstrated how OpenCopilot offers more concise plans that better suit the needs of software companies seeking a copilot, compared to general-purpose agents like the Plan and Execute Agent in LangChain. Furthermore, categorizing commands under operators gives developers more room to describe complex commands while still fitting within the limited context of LLMs.
