Skip to content

IvanLuksic/computer-use-preview

 
 

Repository files navigation

Computer Use Preview

Quick Start

This section will guide you through setting up and running the Computer Use Preview model. Follow these steps to get started.

1. Installation

Clone the Repository

git clone https://github.com/google/computer-use-preview.git
cd computer-use-preview

Set up Python Virtual Environment and Install Dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Install Playwright and Browser Dependencies

Note: Skip this section if you're using Daytona.

# Install system dependencies required by Playwright for Chrome
playwright install-deps chrome

# Install the Chrome browser for Playwright
playwright install chrome

2. Configuration

You can get started using either the Gemini Developer API or Vertex AI.

A. If using the Gemini Developer API:

You need a Gemini API key to use the agent:

export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Or to add this to your virtual environment:

echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate

Replace YOUR_GEMINI_API_KEY with your actual key.

B. If using the Vertex AI Client:

You need to explicitly use Vertex AI, then provide project and location to use the agent:

export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"

Or to add this to your virtual environment:

echo 'export USE_VERTEXAI=true' >> .venv/bin/activate
echo 'export VERTEXAI_PROJECT="your-project-id"' >> .venv/bin/activate
echo 'export VERTEXAI_LOCATION="your-location"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate

Replace YOUR_PROJECT_ID and YOUR_LOCATION with your actual project and location.

3. Running the Tool

The primary way to use the tool is via the main.py script.

General Command Structure:

python main.py --query "Go to Google and type 'Hello World' into the search bar"

Available Environments:

You can specify a particular environment with the --env <environment> flag. Available options:

  • playwright: Runs the browser locally using Playwright.
  • browserbase: Connects to a Browserbase instance.
  • daytona: Connects to a Daytona sandbox environment for remote computer use.

Local Playwright

Runs the agent using a Chrome browser instance controlled locally by Playwright.

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"

You can also specify an initial URL for the Playwright environment:

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright" --initial_url="https://www.google.com/search?q=latest+AI+news"

Browserbase

Runs the agent using Browserbase as the browser backend. Ensure the proper Browserbase environment variables are set:BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID.

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="browserbase"

Daytona

Runs the agent using Daytona sandbox as the backend. Ensure the DAYTONA_API_KEY environment variable is set and you have installed the Daytona SDK:

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="daytona"

Agent CLI

The main.py script is the command-line interface (CLI) for running the browser agent.

Command-Line Arguments

Argument Description Required Default Supported Environment(s)
--query The natural language query for the browser agent to execute. Yes N/A All
--env The computer use environment to use. Must be one of the following: playwright, browserbase, or daytona No playwright All
--initial_url The initial URL to load when the browser starts. No https://www.duckduckgo.com All
--highlight_mouse If specified, the agent will attempt to highlight the mouse cursor's position in the screenshots. This is useful for visual debugging. No False (not highlighted) playwright

Environment Variables

Variable Description Required
GEMINI_API_KEY Your API key for the Gemini model. Yes
BROWSERBASE_API_KEY Your API key for Browserbase. Yes (when using the browserbase environment)
BROWSERBASE_PROJECT_ID Your Project ID for Browserbase. Yes (when using the browserbase environment)
DAYTONA_API_KEY Your API key for Daytona. Yes (when using the daytona environment)

Known Issues

Playwright Dropdown Menu

On certain operating systems, the Playwright browser is unable to capture <select> elements because they are rendered by the operating system. As a result, the agent is unable to send the correct screenshot to the model.

There are several ways to mitigate this.

  1. Use the Browserbase option instead of Playwright.
  2. Inject a script like proxy-select to render a custom <select> element. You must inject proxy-select.css and proxy-select.js into each page that has a non-custom <select> element. You can do this in the Playwright.__enter__ method by adding a few lines of code, like the following (replacing PROXY_SELECT_JS and PROXY_SELECT_CSS with the appropriate variables):
self._page.add_init_script(PROXY_SELECT_JS)
def inject_style(page):
    try:
        page.add_style_tag(content=PROXY_SELECT_CSS)
    except Exception as e:
        print(f"Error injecting style: {e}")

self._page.on('domcontentloaded', inject_style)

Note, option 2 does not work 100% of the time, but is a temporary workaround for certain websites. The better option is to use Browserbase.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%