This section will guide you through setting up and running the Computer Use Preview model. Follow these steps to get started.
Clone the Repository
git clone https://github.com/google/computer-use-preview.git
cd computer-use-preview
Set up Python Virtual Environment and Install Dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install Playwright and Browser Dependencies
Note: Skip this section if you're using Daytona.
# Install system dependencies required by Playwright for Chrome
playwright install-deps chrome
# Install the Chrome browser for Playwright
playwright install chrome
You can get started using either the Gemini Developer API or Vertex AI.
To use the Gemini Developer API, you need a Gemini API key:
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
Or to add this to your virtual environment:
echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_GEMINI_API_KEY with your actual key.
To use Vertex AI instead, enable it explicitly and provide your project and location:
export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"
Or to add this to your virtual environment:
echo 'export USE_VERTEXAI=true' >> .venv/bin/activate
echo 'export VERTEXAI_PROJECT="YOUR_PROJECT_ID"' >> .venv/bin/activate
echo 'export VERTEXAI_LOCATION="YOUR_LOCATION"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_PROJECT_ID and YOUR_LOCATION with your actual project and location.
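How these variables might translate into client settings can be sketched in Python. This is a minimal sketch, not the repository's own code: the helper name `client_settings` is hypothetical, and treating `true` or `1` as the values that enable `USE_VERTEXAI` is an assumption. The returned dictionary uses the `api_key`, `vertexai`, `project`, and `location` keyword names accepted by `genai.Client` in the google-genai SDK.

```python
import os
from typing import Mapping, Optional


def client_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return client keyword arguments derived from environment variables."""
    env = os.environ if environ is None else environ

    # Assumption: "true" or "1" (case-insensitive) turns Vertex AI on.
    use_vertex = env.get("USE_VERTEXAI", "").strip().lower() in ("true", "1")

    if use_vertex:
        # Vertex AI needs both a project and a location.
        missing = [
            name
            for name in ("VERTEXAI_PROJECT", "VERTEXAI_LOCATION")
            if not env.get(name)
        ]
        if missing:
            raise ValueError("Missing Vertex AI settings: " + ", ".join(missing))
        return {
            "vertexai": True,
            "project": env["VERTEXAI_PROJECT"],
            "location": env["VERTEXAI_LOCATION"],
        }

    # Otherwise fall back to the Gemini Developer API key.
    if not env.get("GEMINI_API_KEY"):
        raise ValueError("GEMINI_API_KEY is not set")
    return {"api_key": env["GEMINI_API_KEY"]}
```

Calling `client_settings()` with no arguments reads the current process environment, so a misconfigured shell fails early with a readable error instead of partway through an agent run.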
The primary way to use the tool is via the main.py script.
General Command Structure:
python main.py --query "Go to Google and type 'Hello World' into the search bar"
Available Environments:
You can specify a particular environment with the --env <environment> flag. Available options:
- playwright: Runs the browser locally using Playwright.
- browserbase: Connects to a Browserbase instance.
- daytona: Connects to a Daytona sandbox environment for remote computer use.
Local Playwright
Runs the agent using a Chrome browser instance controlled locally by Playwright.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"
You can also specify an initial URL for the Playwright environment:
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright" --initial_url="https://www.google.com/search?q=latest+AI+news"
Browserbase
Runs the agent using Browserbase as the browser backend. Ensure the proper Browserbase environment variables are set: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="browserbase"
Daytona
Runs the agent using a Daytona sandbox as the backend. Ensure the DAYTONA_API_KEY environment variable is set and that you have installed the Daytona SDK:
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="daytona"
The main.py script is the command-line interface (CLI) for running the browser agent.
| Argument | Description | Required | Default | Supported Environment(s) |
|---|---|---|---|---|
| --query | The natural language query for the browser agent to execute. | Yes | N/A | All |
| --env | The computer use environment to use. Must be one of the following: playwright, browserbase, or daytona. | No | playwright | All |
| --initial_url | The initial URL to load when the browser starts. | No | https://www.duckduckgo.com | All |
| --highlight_mouse | If specified, the agent will attempt to highlight the mouse cursor's position in the screenshots. This is useful for visual debugging. | No | False (not highlighted) | playwright |
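The argument table above maps naturally onto an `argparse` parser. The following is a sketch of what such a parser could look like, built only from the names and defaults listed in the table. It is not a copy of the actual main.py, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented CLI arguments."""
    parser = argparse.ArgumentParser(
        description="Run the browser agent with a natural language query."
    )
    parser.add_argument(
        "--query",
        type=str,
        required=True,
        help="The natural language query for the browser agent to execute.",
    )
    parser.add_argument(
        "--env",
        type=str,
        choices=("playwright", "browserbase", "daytona"),
        default="playwright",
        help="The computer use environment to use.",
    )
    parser.add_argument(
        "--initial_url",
        type=str,
        default="https://www.duckduckgo.com",
        help="The initial URL to load when the browser starts.",
    )
    parser.add_argument(
        "--highlight_mouse",
        action="store_true",
        default=False,
        help="Highlight the mouse cursor's position in screenshots.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.env, args.initial_url, args.highlight_mouse)
```

Because `--env` is declared with `choices`, a value outside the three documented environments is rejected at parse time rather than later in the run.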
| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Your API key for the Gemini model. | Yes |
| BROWSERBASE_API_KEY | Your API key for Browserbase. | Yes (when using the browserbase environment) |
| BROWSERBASE_PROJECT_ID | Your Project ID for Browserbase. | Yes (when using the browserbase environment) |
| DAYTONA_API_KEY | Your API key for Daytona. | Yes (when using the daytona environment) |
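A quick pre-flight check can catch a missing variable before the agent starts. The sketch below encodes the table above as a dictionary. The names `REQUIRED_VARS` and `missing_vars` are hypothetical, and the sketch follows the table literally, so it does not account for a Vertex AI setup where project and location are supplied instead.

```python
import os
from typing import List, Mapping, Optional

# Required variables per environment, taken from the table above.
REQUIRED_VARS = {
    "playwright": ["GEMINI_API_KEY"],
    "browserbase": [
        "GEMINI_API_KEY",
        "BROWSERBASE_API_KEY",
        "BROWSERBASE_PROJECT_ID",
    ],
    "daytona": ["GEMINI_API_KEY", "DAYTONA_API_KEY"],
}


def missing_vars(
    env_name: str, environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the required variables that are unset or empty for env_name."""
    if env_name not in REQUIRED_VARS:
        raise ValueError(f"Unknown environment: {env_name}")
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[env_name] if not env.get(name)]


if __name__ == "__main__":
    for name in REQUIRED_VARS:
        print(name, "missing:", missing_vars(name))
```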
On certain operating systems, the Playwright browser is unable to capture <select> elements because they are rendered by the operating system. As a result, the agent is unable to send the correct screenshot to the model.
There are several ways to mitigate this.
- Use the Browserbase option instead of Playwright.
- Inject a script like proxy-select to render a custom <select> element. You must inject proxy-select.css and proxy-select.js into each page that has a non-custom <select> element. You can do this in the Playwright.__enter__ method by adding a few lines of code, like the following (replacing PROXY_SELECT_JS and PROXY_SELECT_CSS with the appropriate variables):
# Run the proxy-select script in every new document.
self._page.add_init_script(PROXY_SELECT_JS)

def inject_style(page):
    try:
        page.add_style_tag(content=PROXY_SELECT_CSS)
    except Exception as e:
        print(f"Error injecting style: {e}")

# Re-inject the stylesheet each time a page's DOM has loaded.
self._page.on('domcontentloaded', inject_style)

Note: the second option does not work 100% of the time; it is a temporary workaround for certain websites. The better option is to use Browserbase.
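The same workaround can be packaged as a small helper so the injection logic stays in one place. This is a sketch under assumptions: the function name `install_proxy_select` is hypothetical, and it expects a page object exposing the Playwright sync API methods `add_init_script`, `add_style_tag`, and `on`, with the proxy-select sources already read into strings.

```python
from typing import Any


def install_proxy_select(page: Any, js_source: str, css_source: str) -> None:
    """Register the proxy-select script and stylesheet on a page."""
    # The init script is evaluated in every new document on this page.
    page.add_init_script(js_source)

    def inject_style(loaded_page: Any) -> None:
        try:
            loaded_page.add_style_tag(content=css_source)
        except Exception as exc:
            # Injection can fail, for example if the page navigates away.
            print(f"Error injecting style: {exc}")

    # Add the stylesheet again each time a document's DOM has loaded.
    page.on("domcontentloaded", inject_style)
```

Inside `Playwright.__enter__`, this could be called as `install_proxy_select(self._page, PROXY_SELECT_JS, PROXY_SELECT_CSS)` once the two files have been loaded into those variables.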