Automated browsing? Browser-use is a FREE alternative to Anthropic’s Computer Use.
I have not tried Anthropic’s Computer Use. It is currently out on OSX only and I use Ubuntu.
Fortunately for me, there is a free alternative that's actively being developed and, so far, is great.
Let's check out Browser-use.
The README does not list any OS restrictions, so I am assuming it works on any computer.
Installation
To install the browser-use package, run the following command in your terminal:
pip install browser-use
Additionally, you can run playwright install, which downloads the browsers that Playwright drives. This step is optional:
playwright install
Make sure to add your API keys to your .env file:
OPENAI_API_KEY=xxxxxxxxx
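A missing key usually only shows up later as an authentication error from the model provider, so it can help to check for it at startup. Here is a minimal sketch, assuming the key has already been loaded into the environment (for example by load_dotenv()). The require_env helper is a hypothetical name of my own, not part of browser-use.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of browser-use.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

You could then pass require_env("OPENAI_API_KEY") wherever the key is needed, so the script stops immediately with a clear message instead of failing midway through a browser session.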
For other settings, models, and more, check out the documentation.
Setup file.
I have written a main.py file, which is based on the example in the Browser-use repository.
import os
import sys
from dataclasses import dataclass
from dotenv import load_dotenv
import asyncio

# Make the parent directory importable before the package imports below
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from langchain_openai import ChatOpenAI
from browser_use.browser.browser import Browser, BrowserConfig
from browser_use import Agent, Controller
from tasks.tasks import get_all_recently_launched_profiles, post_message_on_my_timeline

load_dotenv()


# ============ Configuration Section ============
@dataclass
class TwitterConfig:
    """Configuration for Twitter posting"""

    openai_api_key: str
    chrome_path: str
    target_user: str  # Twitter handle without @
    message: str
    reply_url: str
    headless: bool = False
    model: str = "gpt-4o-mini"
    base_url: str = "https://x.com/home"


# Customize these settings
config = TwitterConfig(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    # chrome_path="/opt/brave.com/brave-nightly/brave",  # Brave Nightly on Linux
    chrome_path="/opt/brave.com/brave/brave",  # Brave on Linux; point this at your own browser
    target_user="kodedpxlart",  # no leading @, it is added when the message is built
    message=" pst!! who is building more robots",
    reply_url="XXXXX",
    headless=False,
)


def create_twitter_agent(config: TwitterConfig) -> Agent:
    llm = ChatOpenAI(model=config.model, api_key=config.openai_api_key)
    browser = Browser(
        config=BrowserConfig(
            headless=config.headless,
            chrome_instance_path=config.chrome_path,
        )
    )
    controller = Controller()

    # Construct the full message with tag (only used by the posting task below)
    full_message = f"@{config.target_user} {config.message}"

    # Create the agent; swap the task line to switch between tasks
    return Agent(
        # task=post_message_on_my_timeline(base_url=config.base_url, full_message=full_message),
        task=get_all_recently_launched_profiles(),
        llm=llm,
        controller=controller,
        browser=browser,
    )


async def post_tweet(agent: Agent):
    try:
        # Run the agent and print the final result from its history
        history = await agent.run(max_steps=100)
        result = history.final_result()
        print(result)
        # agent.create_history_gif()
        print("Tweet posted successfully!")
    except Exception as e:
        print(f"Error posting tweet: {e}")


def main():
    agent = create_twitter_agent(config)
    asyncio.run(post_tweet(agent))


if __name__ == "__main__":
    main()
It's spaghetti code. Yep. The important parts are create_twitter_agent, post_tweet and config.
create_twitter_agent: Sets up and returns an Agent using the passed-in config object. It creates a Browser instance that uses an existing browser installation (Brave, in my case) because the chrome_instance_path property is set. The task is set by a function, get_all_recently_launched_profiles, that returns the prompt.
post_tweet: Executes the Agent created by create_twitter_agent by calling its run method with the max_steps argument. Lastly, the final result from the run history is printed.
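The tasks module imported by main.py is not shown in this post. To give a rough idea of its shape, here is a hypothetical sketch of two functions that simply return prompt strings. The function names match the imports in main.py, but the prompt wording and the limit parameter are assumptions for illustration, not the prompts used for the recordings.

```python
def post_message_on_my_timeline(base_url: str, full_message: str) -> str:
    """Build a prompt asking the agent to publish one post (illustrative wording)."""
    return (
        f"Go to {base_url}. "
        "Open the post composer, "
        f"type exactly this text: {full_message!r}, "
        "then publish the post and confirm that it appears on the timeline."
    )


def get_all_recently_launched_profiles(limit: int = 10) -> str:
    """Build a prompt asking the agent to collect handles (illustrative wording)."""
    return (
        "Search for accounts of recently launched startups. "
        f"Collect {limit} user handles and return them as a list, "
        "one per line, each starting with @."
    )
```

Keeping prompts in plain functions like this means the agent setup never changes; only the string passed as task does.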
Executing the task.
I have three tasks to try out with Browser-use. Here are the screen recordings of the agents executing these tasks.
Task 1: Navigate to Twitter and create a post and reply to a tweet
Task 2: Collect 10 user handles related to recently launched startups
Task 3: Engage with Twitter Users Interested in ChatGPT Alternatives
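For a task like Task 2, the agent's final result comes back as text, so the handles still have to be pulled out of it. Here is a minimal sketch, assuming the result is free-form text containing @handles. The extract_handles helper is hypothetical, and the 15-character cap reflects the username length limit on X.

```python
import re
from typing import List, Optional

# An @ not preceded by a word character (which skips e-mail addresses),
# followed by 1 to 15 letters, digits or underscores.
HANDLE_RE = re.compile(r"(?<![\w@])@(\w{1,15})\b", re.ASCII)


def extract_handles(text: Optional[str], limit: int = 10) -> List[str]:
    """Return up to `limit` unique handles (without @), in order of first appearance."""
    handles: List[str] = []
    seen = set()
    for match in HANDLE_RE.finditer(text or ""):
        if len(handles) >= limit:
            break
        handle = match.group(1)
        # Handles are case-insensitive, so deduplicate on the lowercased form
        if handle.lower() in seen:
            continue
        seen.add(handle.lower())
        handles.append(handle)
    return handles
```

In main.py this could be called as extract_handles(history.final_result()), which also copes with a missing result by returning an empty list.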
⚡️ There is a follow-up to this post here. If you want to build advanced, production-ready AI agents using browser-use, you should read it.
Conclusion
Too early to conclude. So far, I love the results. I can write better prompts and logic to improve the behavior and keep testing.
What about a Cloud-based version?
Would you prefer to run this on your own computer, or to have multiple automated instances running somewhere in the cloud?
Check out Tohju.com for more AI agent content, and to preview our cloud-based browser-use service when it is up.