Mastering Browser Automation with LangChain Agent and Playwright Tools

Jun 18, 2024

Introduction

In the ever-evolving world of technology, automation is key to efficiency and productivity. Browser automation, in particular, offers vast possibilities, from automated testing to data scraping and beyond. This tutorial shows how to use LangChain with Playwright to control a browser through a GPT-4 agent.

Part 1: Using LangChain with Playwright to Control a Browser with GPT-4

Step 1: Setting Up the Environment

First, ensure you have Python installed on your system. Then, install the necessary packages:

pip install langchain playwright langchain_community langchain_openai lxml langchainhub beautifulsoup4

Additionally, you need to install the Playwright browsers:

playwright install
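Before moving on, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the module list and the helper name `missing_modules` are illustrative assumptions, not part of LangChain or Playwright. `bs4` (installed as `beautifulsoup4`) is included because the toolkit's text-extraction tools rely on Beautiful Soup.

```python
from importlib.util import find_spec

# Import names for the packages this tutorial relies on (illustrative list).
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_openai",
    "playwright",
    "lxml",
    "bs4",  # import name of beautifulsoup4, used by the text-extraction tools
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Report anything that still needs a `pip install`.
print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

If the script reports a missing module, install the corresponding package and run it again before continuing.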

Step 2: Integrating LangChain and Playwright

Create a new Python script and import the required modules:

from langchain_community.agent_toolkits import PlayWrightBrowserToolkit
from langchain_community.tools.playwright.utils import create_sync_playwright_browser
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

Step 3: Initializing the Playwright Browser and Loading the Tools

Initialize a synchronous Playwright browser, then build the toolkit from it and retrieve the tools our agent will use:

sync_browser = create_sync_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(sync_browser=sync_browser)
tools = toolkit.get_tools()
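To see what the agent has to work with, you can list each tool's name and description. This is a small sketch: `describe_tools` is a hypothetical helper, and it only assumes that each tool exposes `name` and `description` attributes, as LangChain tools do. The stand-in objects let it run without launching a browser.

```python
from types import SimpleNamespace


def describe_tools(tools):
    """Return one 'name: description' line per tool, sorted by tool name."""
    ordered = sorted(tools, key=lambda tool: tool.name)
    return [f"{tool.name}: {tool.description}" for tool in ordered]


# Stand-in objects so the sketch runs on its own; with the real toolkit,
# pass the `tools` list returned by toolkit.get_tools() instead.
demo_tools = [
    SimpleNamespace(name="navigate_browser", description="Navigate to a URL"),
    SimpleNamespace(name="click_element", description="Click an element"),
]
demo_lines = describe_tools(demo_tools)
for line in demo_lines:
    print(line)
```

The model sees these descriptions when deciding which tool to call, so reading them is a quick way to predict what the agent can and cannot do.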

Step 4: Setting Up the GPT-4 Agent

Define the prompt for the agent. Here we pull the hwchase17/openai-tools-agent prompt from the LangChain Hub; you can modify it to suit your needs. The hwchase17/react prompt with create_react_agent is an alternative, but in my experience its function calls sometimes fail and the chain crashes:

prompt = hub.pull("hwchase17/openai-tools-agent")

Step 5: Choosing the Language Model

Choose the LLM that will drive the agent. I previously tried GPT-3.5-turbo, but it often failed to return the properly formatted JSON that tool calling requires. Here we use the GPT-4 model:

llm = ChatOpenAI(model="gpt-4", temperature=0)
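By default, `ChatOpenAI` reads the API key from the `OPENAI_API_KEY` environment variable. A minimal sketch of a check you could run before creating the model is shown here; `require_openai_key` is a hypothetical helper, not a LangChain function.

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating ChatOpenAI."
        )
    return key


# Usage before constructing the model:
# require_openai_key()
# llm = ChatOpenAI(model="gpt-4", temperature=0)
```

Failing early with a readable message is easier to debug than an authentication error raised in the middle of an agent run.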

Step 6: Creating the Agent and Agent Executor

Construct the OpenAI Tools agent from the LLM, the tools initialized earlier, and the prompt. Then create an agent executor by passing in the agent and the tools; the executor runs the tool calls that the agent requests:

agent = create_openai_tools_agent(llm, tools, prompt)
# agent = create_react_agent(llm, tools, prompt)  # Use this with the hwchase17/react prompt (also import create_react_agent from langchain.agents)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
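`AgentExecutor` accepts a few optional settings that may make runs easier to control. The sketch below collects some of them in a dictionary; `executor_options` is a hypothetical helper and the values are illustrative, not recommendations from the LangChain documentation.

```python
def executor_options(max_iterations=10, verbose=True):
    """Build keyword arguments for AgentExecutor (values are illustrative)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    return {
        "verbose": verbose,
        # Stop the loop after this many agent steps.
        "max_iterations": max_iterations,
        # Send output-parsing errors back to the model instead of raising.
        "handle_parsing_errors": True,
        # Include (action, observation) pairs in the result dictionary.
        "return_intermediate_steps": True,
    }


options = executor_options()
print(options)

# Usage with the agent and tools from this step:
# agent_executor = AgentExecutor(agent=agent, tools=tools, **options)
```

`handle_parsing_errors` may help with the crashes mentioned earlier for the ReAct agent, since malformed model output is then returned to the model rather than ending the chain.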

Step 7: Controlling the Browser

You can now control the browser using natural language commands. The agent can also scrape HTML elements, click specific buttons on the page with the click tool, extract hyperlinks, and more. In this example, we navigate to a website, scrape data from it, and summarize it:

command = {
    "input": "Go to https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/ and give me summary of all tools mentioned on the page you get. Print out url at each step."
}
agent_executor.invoke(command)

Terminal Output:

> Entering new AgentExecutor chain...

Invoking: `navigate_browser` with `{'url': 'https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/'}`


Navigating to https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/ returned status code 200
Invoking: `current_webpage` with `{}`


https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/
Invoking: `extract_text` with `{}`


PlayWright Browser | 🦜️🔗 LangChain Skip to main content LangChain 0.2 is out! Leave feedback on the v0.2 docs here . You can view the v0.1 docs here . Integrations API Reference More People Contributing Templates Cookbooks 3rd party tutorials YouTube arXiv v0.2 v0.2 v0.1 
🦜️🔗 LangSmith LangSmith Docs LangServe GitHub Templates GitHub Templates Hub LangChain Hub JS/TS Docs 💬 Search K Providers Providers Anthropic AWS Google Hugging Face Microsoft OpenAI More Components Chat models LLMs Embedding models Document loaders Document transformers Vector stores Retrievers Tools Toolkits AINetwork Airbyte Question Answering Amadeus Azure AI Services Azure Cognitive Services Cassandra Database ClickUp cogniswitch Connery Toolkit CSV Document Comparison Github Gitlab Gmail Jira JSON MultiOn NASA Office365 OpenAPI Natural Language APIs Pandas Dataframe PlayWright Browser Polygon IO Toolkit PowerBI Dataset Python Robocorp Slack Spark Dataframe Spark SQL 
SQL Database Steam Game Recommendation & Game Details Xorbits Memory Graphs Callbacks Chat loaders Adapters Stores Model caches Components Toolkits PlayWright Browser On this page PlayWright Browser This toolkit is used to interact with the browser. While other tools (like 
the Requests tools) are fine for static sites, PlayWright Browser toolkits let your agent navigate the web and interact with dynamically 
..............{truncated}

The page at the URL [https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/](https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/) is about the PlayWright Browser toolkit. This toolkit is used to interact with the browser and is especially useful for navigating and interacting with dynamically rendered sites. The tools bundled within the PlayWright Browser toolkit include:

1. **NavigateTool (navigate_browser)**: This tool is used to navigate to a URL.
2. **NavigateBackTool (previous_page)**: This tool is used to navigate back to the previous page in the browser history.
3. **ClickTool (click_element)**: This tool is used to click on an element specified by a CSS selector.
4. **ExtractTextTool (extract_text)**: This tool uses beautiful soup to extract text from the current web page.
5. **ExtractHyperlinksTool (extract_hyperlinks)**: This tool uses beautiful soup to extract hyperlinks from the current web page.        
6. **GetElementsTool (get_elements)**: This tool is used to select elements by CSS selector.
7. **CurrentPageTool (current_page)**: This tool is used to get the current page URL.

The page also provides information on how to install the necessary packages, how to instantiate a Browser Toolkit, and how to use the tools within an agent.
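`agent_executor.invoke` returns a dictionary, and the final answer shown above is stored under its `output` key. If the executor is created with `return_intermediate_steps=True`, the dictionary also contains an `intermediate_steps` list of (action, observation) pairs. The sketch below turns that list into a short trace; `format_trace` is a hypothetical helper, and the stand-in result only mimics the `tool` and `tool_input` attributes of a real action.

```python
from types import SimpleNamespace


def format_trace(result, preview_chars=60):
    """Return one line per tool call recorded in result['intermediate_steps']."""
    lines = []
    for action, observation in result.get("intermediate_steps", []):
        # Keep only the start of long observations such as extracted page text.
        preview = str(observation).replace("\n", " ")[:preview_chars]
        lines.append(f"{action.tool}({action.tool_input}) -> {preview}")
    return lines


# Stand-in result so the sketch runs without calling the model.
demo_result = {
    "input": "Go to https://example.com",
    "output": "Summary text",
    "intermediate_steps": [
        (
            SimpleNamespace(
                tool="navigate_browser",
                tool_input={"url": "https://example.com"},
            ),
            "Navigating to https://example.com returned status code 200",
        ),
    ],
}
demo_trace = format_trace(demo_result)
for line in demo_trace:
    print(line)
print(demo_result["output"])
```

A trace like this is a compact alternative to the verbose terminal log when you only want to see which tools ran and in what order.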

Complete code:

from langchain_community.agent_toolkits import PlayWrightBrowserToolkit
from langchain_community.tools.playwright.utils import (
    create_sync_playwright_browser,  # A synchronous browser is available, though it isn't compatible with Jupyter.
)
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI

# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")


sync_browser = create_sync_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(sync_browser=sync_browser)
tools = toolkit.get_tools()

# Choose the LLM that will drive the agent
# Only models that support tool calling work with this agent
llm = ChatOpenAI(model="gpt-4", temperature=0)

# Construct the OpenAI Tools agent
agent = create_openai_tools_agent(llm, tools, prompt)


# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

command = {
    "input": "Go to https://python.langchain.com/v0.2/docs/integrations/toolkits/playwright/ and give me summary of all tools mentioned on the page you get. Print out url at each step."
}
agent_executor.invoke(command)
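The complete script leaves the browser open when it finishes. One possible way to close it reliably is a small context manager, sketched below; `managed_browser` is a hypothetical helper, and it assumes only that the object returned by the factory has a `close()` method, as a Playwright browser does. The fake browser class is a stand-in so the sketch runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a browser with `factory` and close it when the block exits."""
    browser = factory()
    try:
        yield browser
    finally:
        # Runs even if the agent raises, so the browser is not left open.
        browser.close()


class FakeBrowser:
    """Stand-in with the same close() method as a real browser."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


demo_browser = FakeBrowser()
with managed_browser(lambda: demo_browser) as browser:
    print("Browser in use, closed =", browser.closed)
print("After the block, closed =", demo_browser.closed)

# Usage with the real factory from the script above:
# with managed_browser(create_sync_playwright_browser) as sync_browser:
#     toolkit = PlayWrightBrowserToolkit.from_browser(sync_browser=sync_browser)
#     ...
```

Note that this closes only the browser; whether the underlying Playwright driver process also needs to be stopped depends on how the browser was created.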

Conclusion

By following this guide, you have set up a powerful automation environment using LangChain and Playwright. This combination pairs GPT-4's natural language processing with Playwright's robust browser automation. Experiment with these tools to streamline your workflow and unlock new automation possibilities. Happy coding!

In Part 2, we will add a Fill tool to the Playwright toolkit that the OpenAI agent can use to fill in text fields on webpages.

This article provides a starting point for those looking to integrate advanced browser automation into their projects using LangChain and Playwright. For more detailed documentation and advanced features, refer to the official LangChain and Playwright documentation.

This is also my first Medium article, so please add your suggestions regarding any errors, libraries, code fixes, etc. in comments below and I will fix them asap. Thanks!