Building agents that can call APIs with the OpenAPI specification

Trevor Thayer
Indicium Engineering
8 min read · Jul 5, 2024


Expand the capabilities of your conversational agents and enable them to interact dynamically with APIs.

By bridging the LangChain framework with the versatile OpenAPI specification, we’ll build a conversational agent that goes beyond text processing.

This guide is designed for developers and enthusiasts eager to enhance their conversational agents. We will delve into the mechanisms of LangChain, explain how it interacts with external APIs, and provide code snippets to illustrate these concepts.

By the end of this post, you’ll be equipped with the knowledge to create an agent that not only communicates effectively, but also leverages external data through API calls.

Environment setup and API key configuration

The first step in creating an agent and integrating API calls involves setting up the environment. This includes loading environment variables and configuring API keys, which are critical for authenticating API calls securely.

By using the dotenv library, sensitive information like API keys is kept out of the codebase and loaded from an environment file (.env). This ensures that the API key needed for making requests to OpenAI and other APIs remains secure and can be changed without touching the code.

import os
from dotenv import load_dotenv, find_dotenv
import openai

# Load environment variables from .env file
_ = load_dotenv(find_dotenv())
openai.api_key = os.environ['OPENAI_API_KEY']

Explanation: this snippet loads the necessary API credentials so that every request the agent makes to external APIs is authenticated, a critical step for API interaction.
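
For reference, the .env file this snippet reads is a plain-text file at the project root. A minimal example (the value is a placeholder):

# .env (keep this file out of version control, e.g. via .gitignore)
OPENAI_API_KEY=your-openai-api-key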

API interaction tools: defining and using tools

In the realm of LangChain, tools are central components that dramatically amplify the capabilities of conversational agents by enabling them to perform specific, often complex tasks, such as making API calls.

The concept of a tool transcends simple function execution, integrating into the core of LangChain to facilitate dynamic interactions with external systems and services.

Why tools are important

Tools are crucial because they modularize functionality into reusable components within an agent. This modularity allows developers to abstract complex operations into simpler, manageable entities that can be easily tested, maintained, and scaled.

For instance, a tool to fetch weather data can be used across various parts of an application wherever weather data is required, ensuring consistency and reducing code duplication.

Leveraging the use case of tools

Tools enhance the flexibility and utility of conversational agents by allowing them to interact with a wide array of external APIs and services. By encapsulating API logic within tools, agents can:

  • Dynamically fetch data in response to user queries.
  • Process and compute data using external algorithms.
  • Integrate with other services for extended functionality like booking, reservations, or even conducting transactions.

When and why to use tools

Tools should be employed whenever there is a need to extend the agent’s capabilities beyond static responses or pre-defined scripts. They are particularly useful when:

  • The response depends on real-time or frequently updated data.
  • The agent’s tasks involve complex computations that are best handled externally.
  • There is a need to interact with third-party services as part of the conversational flow.

How to create and define tools

Creating tools in LangChain involves a few systematic steps that ensure functionality and ease of use:

  1. Define the tool function: this function performs the primary operation of the tool, such as making an HTTP request to an API.
  2. Use the @tool decorator: LangChain provides a @tool decorator to designate functions as tools. This decorator enriches the function with additional capabilities like automatic integration into the LangChain ecosystem.
  3. Specify input schemas with pydantic: to ensure that tools receive properly formatted and valid data, input schemas can be defined using Pydantic models. This not only helps in validating the inputs but also in documenting the expected data structure, which enhances maintainability and reduces errors.

Here’s a detailed example to illustrate the creation of a tool:

from langchain.agents import tool
import requests
import datetime
from pydantic import BaseModel, Field

class OpenMeteoInput(BaseModel):
    latitude: float = Field(..., description="Latitude of the location to fetch weather data for")
    longitude: float = Field(..., description="Longitude of the location to fetch weather data for")

@tool(args_schema=OpenMeteoInput)
def get_current_temperature(latitude: float, longitude: float) -> dict:
    """Fetch current temperature for given coordinates using the Open-Meteo API."""
    BASE_URL = "https://api.open-meteo.com/v1/forecast"
    params = {
        'latitude': latitude,
        'longitude': longitude,
        'hourly': 'temperature_2m',
        'forecast_days': 1,
    }
    response = requests.get(BASE_URL, params=params)
    if response.status_code == 200:
        results = response.json()
        # Extract temperature data: find the hourly entry closest to the current UTC time
        current_utc_time = datetime.datetime.utcnow()
        time_list = [datetime.datetime.fromisoformat(time_str.replace('Z', '+00:00')) for time_str in results['hourly']['time']]
        temperature_list = results['hourly']['temperature_2m']
        closest_time_index = min(range(len(time_list)), key=lambda i: abs(time_list[i] - current_utc_time))
        current_temperature = temperature_list[closest_time_index]
        return {"temperature": current_temperature}
    else:
        raise Exception(f"API Request failed with status code: {response.status_code}")

Explanation: this tool encapsulates all the logic required to interact with the Open-Meteo API, fetch the temperature data based on the provided coordinates, and handle errors effectively. The use of Pydantic models for input validation ensures that the tool is robust and reliable, capable of handling real-world scenarios where data integrity is paramount.
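
Before wiring the tool into an agent, you can exercise it directly. A quick sketch, with illustrative coordinates (Lisbon here):

# Tools created with @tool expose .run(), which validates the input against the Pydantic schema
print(get_current_temperature.run({"latitude": 38.72, "longitude": -9.14}))
# {'temperature': ...}  (the value depends on the live API response)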

In summary, tools in LangChain are not merely functions; they are powerful extensions that enable conversational agents to perform a wide range of tasks dynamically and interactively, making them invaluable in building sophisticated, responsive, and capable agents.

OpenAI integration with LangChain

Integrating OpenAI with LangChain involves binding function definitions to the chat model, letting GPT models decide when to call external tools while understanding and generating human-like text.

In the code provided, tools like get_current_temperature are converted into OpenAI's function-calling format via format_tool_to_openai_function.

This enables the ChatOpenAI model to utilize these tools as part of its decision-making process, enhancing the agent’s ability to handle complex queries that require external data.

This integration allows the agent to effectively parse user inputs, decide on the appropriate API calls, process the data, and generate informative responses.
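
Note that the snippet below also binds a search_wikipedia tool that isn't defined in this post. Here's a minimal sketch using the wikipedia package (an illustrative implementation, not necessarily the one in the repository):

import wikipedia
from langchain.agents import tool

@tool
def search_wikipedia(query: str) -> str:
    """Run a Wikipedia search and return summaries of the top pages."""
    page_titles = wikipedia.search(query)
    summaries = []
    for page_title in page_titles[:3]:  # keep the top three hits
        try:
            page = wikipedia.page(title=page_title, auto_suggest=False)
            summaries.append(f"Page: {page_title}\nSummary: {page.summary}")
        except (wikipedia.exceptions.PageError, wikipedia.exceptions.DisambiguationError):
            pass  # skip pages that fail to resolve cleanly
    return "\n\n".join(summaries) if summaries else "No good Wikipedia search result was found"

With both tools defined, binding them to the model looks like this: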

from langchain_community.chat_models import ChatOpenAI
from langchain.tools.render import format_tool_to_openai_function

tools = [get_current_temperature, search_wikipedia]
functions = [format_tool_to_openai_function(f) for f in tools]
model = ChatOpenAI(temperature=0).bind(functions=functions)

Explanation: here, the agent is equipped with functions that convert regular tools into ones compatible with OpenAI’s API, enabling it to perform complex language tasks.

Dynamic conversation templates

ChatPromptTemplate plays a pivotal role in structuring the interactions within the conversational agent. It outlines how messages are formatted and ensures that responses are generated in context.

Using placeholders like MessagesPlaceholder, the system can dynamically insert user inputs and previous messages into the prompt, allowing the AI to maintain context over the course of the conversation.

This is crucial for providing coherent and context-aware interactions, as the model can refer back to earlier parts of the conversation when formulating responses.

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful but sassy assistant"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

Explanation: this template structure helps in maintaining an interactive dialogue where past interactions influence future responses, crucial for contextual awareness within the agent.
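
To see how the placeholders resolve, you can render the template directly. A small sketch with illustrative values:

# Fill the template with sample values and inspect the resulting messages
messages = prompt.format_messages(
    input="What's the weather like in Lisbon?",
    chat_history=[],      # no previous turns yet
    agent_scratchpad=[]   # no tool calls yet
)
for message in messages:
    print(type(message).__name__, "->", message.content)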

Memory management and execution control

Managing memory in conversational agents is vital for maintaining the state and context of the interaction. ConversationBufferMemory stores messages and other data across the conversation, which helps the agent remember past interactions and make decisions based on the entire conversation history.

This is particularly important when dealing with API calls that may need to consider previous inputs or commands. AgentExecutor orchestrates the execution of the agent chain, managing the invocation of tools, handling responses, and maintaining verbose logging for debugging purposes. This component ensures that all parts of the system work in unison to handle user inputs effectively and generate the correct outputs.

from langchain.memory import ConversationBufferMemory
from langchain.agents import AgentExecutor

memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")
agent_executor = AgentExecutor(agent=agent_chain, tools=tools, verbose=True, memory=memory)

Explanation: this setup allows the agent to remember and utilize the conversation history, aiding in making more informed decisions based on previous interactions.
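
A quick usage sketch (assuming the agent_chain defined in the next section, with illustrative questions): because memory is attached, the second question can refer back to the first.

# First turn: the agent may call get_current_temperature
agent_executor.invoke({"input": "What's the temperature in Lisbon?"})

# Second turn: "there" is resolved from chat_history
agent_executor.invoke({"input": "Is it warmer there than in Porto?"})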

Agent chains: streamlining data flow and processing

An agent chain is a sequence of processing steps that each piece of user input goes through within the agent. In our setup, the chain consists of a prompt, the AI model, and an output parser. This chain is crucial for directing the flow of data through the agent, ensuring that each input is appropriately processed and that the outputs are parsed to trigger the correct tools or API calls.

The agent chain is configured with RunnablePassthrough to handle the intermediate steps effectively. This setup allows the agent to maintain a scratchpad (a temporary storage space) where it can format and store the results of tool executions, making them available for subsequent steps in the conversation.

  • RunnablePassthrough.assign: this method dynamically assigns the formatted intermediate steps back into the agent scratchpad. The lambda function (lambda x: format_to_openai_functions(x["intermediate_steps"])) formats these steps to be compatible with OpenAI functions, ensuring that all data passed through the agent is properly structured.
  • Composition of the agent chain: the chain is composed of the prompt, the model, and the output parser connected sequentially. This setup ensures that the input goes through a structured process of interpretation, tool invocation, and response generation.

from langchain.schema.runnable import RunnablePassthrough
from langchain.agents.format_scratchpad import format_to_openai_functions
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

agent_chain = RunnablePassthrough.assign(
    agent_scratchpad=lambda x: format_to_openai_functions(x["intermediate_steps"])
) | prompt | model | OpenAIFunctionsAgentOutputParser()
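
Invoking the chain once makes the control flow concrete: the output is either an agent action describing a tool call (with a tool name and arguments) or an AgentFinish carrying the final answer. A small sketch with an illustrative query:

result = agent_chain.invoke({
    "input": "What is the temperature in Lisbon?",
    "intermediate_steps": []
})
# If the model decided to call a tool, the result carries its name and arguments:
print(result.tool)        # e.g. "get_current_temperature"
print(result.tool_input)  # e.g. {"latitude": 38.72, "longitude": -9.14}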

Interactive function: bridging user inputs and agent responses

The code snippet below covers an important part of building a conversational agent: the runtime function (run_agent) and the configuration of the agent chain to include dynamic and iterative processing steps.

The run_agent function is pivotal in managing the interaction between the user and the conversational agent. It performs several key tasks:

  1. Driving the agent loop: it runs a loop that repeatedly invokes the agent chain for the current user input until the agent returns a final answer (AgentFinish). This loop is what lets the agent chain several tool calls together before responding.
  2. Invoking the agent chain: each user input is processed through the agent chain by invoking it with the current user input and any intermediate steps that have been accumulated. This ensures that each input is considered in the context of previous interactions, which is crucial for maintaining a coherent conversation flow.
  3. Processing intermediate steps: the function keeps track of intermediate steps — actions and observations made during the interaction. These are passed back into the agent chain to help maintain state and context throughout the conversation.
  4. Handling tool execution: depending on the output of the agent chain (result), the function determines which tool to execute. This dynamic mapping (tool={…}) allows the function to run the appropriate tool based on the agent’s decision, which might involve fetching current temperature or searching Wikipedia.

  5. Appending observations: after executing a tool, the observation (the result of the tool execution) is appended to intermediate_steps, which keeps a running log of all actions and results within the conversation. This is critical for tools that depend on previous outputs to function correctly in subsequent steps.

from langchain.schema.agent import AgentFinish

def run_agent(user_input):
    intermediate_steps = []
    while True:
        result = agent_chain.invoke({
            "input": user_input,
            "intermediate_steps": intermediate_steps
        })
        if isinstance(result, AgentFinish):
            return result
        # Map the tool name chosen by the model to the actual tool object
        tool = {
            "search_wikipedia": search_wikipedia,
            "get_current_temperature": get_current_temperature
        }[result.tool]
        observation = tool.run(result.tool_input)
        intermediate_steps.append((result, observation))
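
A quick smoke test of the loop, with an illustrative question:

result = run_agent("What's the current temperature in Lisbon?")
# AgentFinish stores the final answer in return_values (typically under "output")
print(result.return_values["output"])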

To review

This function and chain configuration are central to how the conversational agent manages and utilizes data throughout the interaction, ensuring that each user input is processed in a context-aware manner and that the agent can dynamically respond based on both the current input and the history of the conversation. This setup exemplifies how complex interactions with APIs are handled in modern conversational AI systems, enabling robust and flexible dialogues with users.

Supporting code on GitHub

You can find the complete supporting code in the GitHub repository, which demonstrates the processes outlined above for creating a conversational agent that can utilize external APIs. The repository includes:

  • The open_agent_API.py script
  • A requirements.txt file
  • A README.md file to guide you through this process
  • The API_agent_chatbot.py script
