Enhancing GPT-4o (API) with Web Browsing Capability Using LangChain Agents
Enhance your GPT models with up-to-date information through web search
Language models like GPT answer from their training data, so they can miss recent events or current figures. Giving the model a web search tool lets it pull in up-to-date information at query time. In this article, I’ll show you how to achieve this integration using LangChain and the Serper Google Search API.
Introduction
At the time of writing, GPT-4o is the latest and most powerful language model from OpenAI. While it has the capability to perform web searches when used in the web chat interface, this functionality is not available through the API. By leveraging LangChain, we can extend GPT-4o’s capabilities (and those of previous models like GPT-4 and GPT-3.5) to include real-time web searches, making it more effective and accurate in its responses.
Requirements
Before we begin, make sure you have the following API keys:
• OpenAI API Key
• Serper API Key (used by the Google search tool in the script below)
Setting Up the Environment
First, install the necessary libraries:
pip install langchain langchain_openai langchain_community
The Script
Here is the complete script to set up the agent:
from langchain_openai import ChatOpenAI
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain_core.prompts import PromptTemplate
# API Keys
OPENAI_API_KEY = "your_openai_api_key"
SERPER_API_KEY = "your_serper_api_key"
# Language model
MODEL = "gpt-4o"
# Configure the language model
llm = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
# Configure the Google search tool
google_search = GoogleSerperAPIWrapper(serper_api_key=SERPER_API_KEY)
# Define the search tool
search_tool = Tool(
    name="Google Search",
    func=google_search.run,
    description="Useful for when you need to answer questions with search"
)
# List of available tools for the agent
tools = [search_tool]
# Define the prompt
template = '''Answer the following questions as best you can. You have access to the following tool:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think if you need to search the web to answer this question.
Action: if you know the answer and you don't think it's necessary to search the web, you can directly answer the question (skip to Final Answer).
Otherwise, if web search is necessary, you can use this tool to search the web [{tool_names}].
Action Input: the input to the action (i.e., the search query you will use).
Observation: the result of the action (i.e., the information retrieved from the web).
... (this Thought/Action/Action Input/Observation sequence can repeat multiple times)
Thought: I now know the final answer.
Final Answer: the complete final answer to the original input question.
Begin!
Question: {input}
Thought: {agent_scratchpad}'''
prompt = PromptTemplate.from_template(template)
# Create the agent
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
# Agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, handle_parsing_errors=True)
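The script above hard-codes both keys as placeholder strings, which is easy to commit to a repository by accident. A minimal sketch of one alternative, reading them from environment variables, is shown here. The helper name `load_api_keys` is hypothetical and not part of LangChain; the variable names `OPENAI_API_KEY` and `SERPER_API_KEY` are assumed to match the constants used in the script.

```python
import os


def load_api_keys(env=os.environ):
    """Return (openai_key, serper_key) read from the given mapping.

    Hypothetical helper, not part of LangChain. Raises RuntimeError
    naming every variable that is missing or empty.
    """
    names = ("OPENAI_API_KEY", "SERPER_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["OPENAI_API_KEY"], env["SERPER_API_KEY"]


# Usage in the script (replaces the two hard-coded assignments):
#   OPENAI_API_KEY, SERPER_API_KEY = load_api_keys()
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of an agent run.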
Script Breakdown
1. Configure the Language Model: We use ChatOpenAI to set up the GPT-4o model with the corresponding API Key.
2. Configure the Search Tool: We use GoogleSerperAPIWrapper to integrate Google search functionality.
3. Define the Prompt: We create a prompt template that guides the model on how to use the available tools and structure the responses.
4. Create the Agent: We use create_react_agent to combine the LLM and tools with the defined prompt.
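5. Configure the Executor: We configure AgentExecutor to run the agent. With handle_parsing_errors=True, output that does not follow the prompt format is reported back to the model so it can retry, instead of stopping the run with an exception.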
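To make steps 3 to 5 more concrete, here is a toy version of the loop that an agent executor runs around the model. It is only a sketch of the general ReAct pattern, not LangChain's actual implementation: `run_react_loop`, `fake_llm`, and `fake_search` are hypothetical names, the model is replaced by a scripted function, and the search tool returns a stub string rather than real results.

```python
import re

# Matches the "Action: ..." / "Action Input: ..." pair described in the prompt.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def run_react_loop(llm, tools, question, max_steps=5):
    """Toy ReAct loop. `llm` is any callable str -> str; `tools` maps name -> callable."""
    scratchpad = ""
    for _ in range(max_steps):
        output = llm(f"Question: {question}\nThought: {scratchpad}")
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            # Roughly the situation handle_parsing_errors covers:
            # tell the model its output was malformed and let it retry.
            scratchpad += output + "\nObservation: Invalid format, please try again.\nThought: "
            continue
        tool_name = match.group(1).strip()
        tool_input = match.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"{tool_name} is not a valid tool."
        # Append this step so the next model call can see the tool result.
        scratchpad += f"{output}\nObservation: {observation}\nThought: "
    return "Agent stopped: step limit reached."


def fake_llm(prompt):
    """Scripted stand-in for the model: search once, then answer."""
    if "Observation:" not in prompt:
        return "I should search.\nAction: Google Search\nAction Input: USD to EUR rate"
    return "I now know the final answer.\nFinal Answer: see the search result above"


def fake_search(query):
    """Stub tool; a real tool would call the search API."""
    return f"(stub result for: {query})"


answer = run_react_loop(fake_llm, {"Google Search": fake_search}, "What is the USD to EUR rate?")
print(answer)
```

The scratchpad string plays the role of `{agent_scratchpad}` in the prompt template: each pass appends the model's last step and the tool's observation, so the next call sees the full history.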
Execution and Results
Once configured, you can run a test query to verify that the agent uses web search to obtain relevant information and provide an accurate response.
The parameter verbose=True in AgentExecutor allows you to see the reasoning process of the agent during execution. This is useful for understanding how the agent makes decisions and uses the available tools.
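If you would rather inspect the agent's steps in code than read them from the console, AgentExecutor also accepts `return_intermediate_steps=True`, which adds an `"intermediate_steps"` list of (action, observation) pairs to the response. The sketch below assumes that option is enabled; `used_tools` is a hypothetical helper, and the response it processes here is a hand-built stand-in rather than real agent output.

```python
from types import SimpleNamespace


def used_tools(response):
    """List the (tool, tool_input) pairs recorded in an agent response.

    Hypothetical helper. Assumes the executor was created with
    return_intermediate_steps=True; returns [] when no steps are present.
    """
    steps = response.get("intermediate_steps", [])
    return [(action.tool, action.tool_input) for action, _observation in steps]


# With the real agent (not executed here):
#   agent_executor = AgentExecutor(agent=agent, tools=tools,
#                                  return_intermediate_steps=True,
#                                  handle_parsing_errors=True)
#   response = agent_executor.invoke({"input": "..."})
#   print(used_tools(response))

# Stand-in response, for illustration only.
fake_action = SimpleNamespace(tool="Google Search", tool_input="USD to EUR rate")
fake_response = {"output": "...", "intermediate_steps": [(fake_action, "stub observation")]}
print(used_tools(fake_response))
```

An empty list is a quick signal that the agent answered from its own knowledge without calling the search tool.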
Example 1: Question that Requires Web Search
response = agent_executor.invoke({"input": "What is the current exchange rate of USD to EUR?"})
print(response['output'])
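When you run the script, the verbose trace should show the agent calling the Google Search tool with a query about the exchange rate, reading the returned observation, and then giving its final answer.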
Example 2: Question that Does Not Require Web Search
response = agent_executor.invoke({"input": "What is the capital of France?"})
print(response['output'])
For this query, the agent already knows the answer and does not need to perform a web search, so the trace should go straight to the final answer.
Conclusion
Integrating web search with language models like GPT-4o (and previous models) when accessed through the API can significantly enhance their responses by providing more up-to-date and relevant information. LangChain facilitates this integration, allowing the creation of powerful and versatile language agents.
I hope you find this tutorial useful. If you have any questions or suggestions, feel free to leave a comment.
Happy coding!