First Impressions: Claude’s Computer Use Agent Capability

Vignesh Ravindran
4 min readOct 29, 2024

--

Anthropic has recently released a promising agent capability called Computer Use, which enables its AI model, Claude, to perform tasks on user’s computers in a manner similar to human interaction. The model has multimodal capabilities and is designed to interpret screenshots and text about the current state of the computer through the API. It then sends back responses about the next steps to be taken on the computer.

Computer Use Workflow

1. Provide Claude with computer use tools and a user prompt

2. Claude decides to use a tool

3. Extract tool input, evaluate the tool on a computer, and return results

4. Claude continues calling computer use tools until it’s completed the task

For more detailed information about how Computer Use works, please refer to the documentation.

Applications and Potential

The versatility of the Computer Use AI extends across various domains, some possible use cases include:

  • Web Search Automation: Streamlining information retrieval processes.
  • Data Entry: Reducing errors and speeding up data management tasks.
  • Software Testing: Automating repetitive testing procedures for improved quality assurance

Reference Implementation

Anthropic has provided a reference implementation with required software, tools, and a VNC server as an Ubuntu container. The reference implementation includes three tools:

  • Computer Tool: Enables Claude to autonomously perform tasks on a computer, mimicking human actions like mouse movements and keyboard input.
  • Text Editor Tool: Allows Claude to interact with text editors for efficient document creation and manipulation.
  • Bash Tool: Facilitates command execution in a bash shell

For more details about the tools refer the documentation

Prerequisites to run the container:

  • Docker: Required for creating a secure environment.
  • API Key: Add credit balance to utilize the APIs and generate a API key for authentication.
  • Docker Desktop : Install docker desktop, to run docker containers on windows.

Run the container:

# Set the API key
set ANTHROPIC_API_KEY=your_api_key

# Run the container
docker run ^
-e ANTHROPIC_API_KEY=%ANTHROPIC_API_KEY% ^
-v "%USERPROFILE%\.anthropic:/home/computeruse/.anthropic" ^
-p 5900:5900 ^
-p 8501:8501 ^
-p 6080:6080 ^
-p 8080:8080 ^
-it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
Container Status

Using the demo environment:

Once the container successfully starts, go to http://localhost:8080, which provides an interface to send prompts to the agent and show the screenshots of what's happening inside the container desktop

Demos

Demo 1 : Finding the weather information

Here’s a step-by-step breakdown of how this demo works:

Step 1: Initialization with User Prompt and Tools

  • The user provides a prompt that specifies the desired action (e.g., “Find the Chennai Weather”)
  • The tools used in the reference implementation are Computer, Text editor and bash tool

Step 2: Tool Selection by Claude

  • Claude evaluates the user prompt against the available tools to determine which one(s) can assist in completing the task.
  • If a suitable tool is identified, Claude formulates a request to use that tool, indicating its intent through an API response.
  • In this case it first decides to use Computer tool to take screenshot

Step 3: Execution of Tool Commands

  • The application extracts the tool name and input from Claude’s request.
  • The specified tool is executed within a controlled environment (like a virtual machine or a container), where it performs actions such as taking screenshots or moving the cursor.

Step 4: Feedback Loop for Task Completion

  • After executing the tool, Claude analyzes the results to determine if further actions are necessary to complete the task. In this case “it opens FireFox”
  • If additional steps are required, Claude will issue another request to use a tool, continuing this process until the task is finished.

Step 5: Response Generation

  • Once Claude determines that the task is complete or no further tool use is needed, it generates a text response for the user summarizing what was accomplished.
Demo 1 — Finding weather information

Demo 2 : Python Program running in the Background

Running program in background

Demo 3 : Python Program running in the Terminal

Running program in the Terminal

Current Challenges:

  • Multiple API calls are required to complete a task, resulting in API rate limiting
  • Higher consumption of input and output API tokens
  • In testing, running these demos approximately 10 times cost around 2 USD
Rate Limits
Rate Limiting error
Tokens Consumed

Limitations:

Claude’s computer abilities are still in testing phase and have some important limits to keep in mind.
May work slower than a human operator

  • Can make mistakes when clicking or interpreting screen elements
  • Has difficulty with tasks like scrolling or working with spreadsheets
  • Cannot create social media accounts or post content online for safety reasons
  • Best suited for non-time-critical tasks in secure environments
  • Prone to prompt injection

For detailed information about limitations refer the documentation

Demonstrations by Anthropic and Community:

RPA vs Computer Use

While experimenting with Computer Use, I noticed its similarities to RPA bots and became curious about what sets them apart from each other.

Closing thoughts

Although Computer Use is currently in its beta phase, its features are expected to evolve and become more advanced over time. Similar capabilities are likely to emerge from other leading frontier model vendors as well. Companies from different domains, such as Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company, have already begun exploring this technology. Enterprises will need to evaluate whether agents like Computer Use, in their current form, can bring business benefits when compared to traditional RPA solutions.

--

--

Vignesh Ravindran
Vignesh Ravindran

Written by Vignesh Ravindran

Cloud Solution Architect — IBM

No responses yet