Computer Use Preview

Quick Start

This section walks you through setting up and running the Computer Use Preview model.

1. Installation

Clone the Repository

git clone https://github.com/mquirosbloch/computer-use-public-preview.git
cd computer-use-public-preview

Set up Python Virtual Environment and Install Dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
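To confirm that the virtual environment is active before installing or running anything, you can compare `sys.prefix` with `sys.base_prefix`, which differ inside a `venv`. This is a generic sketch, not part of the repository, and the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment active; run `source .venv/bin/activate` first.")
```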

Install Playwright and Browser Dependencies

# Install system dependencies required by Playwright for Chrome
playwright install-deps chrome

# Install the Chrome browser for Playwright
playwright install chrome

2. Configuration

You can get started using either the Gemini Developer API or Vertex AI.

A. If using the Gemini Developer API:

You need a Gemini API key to use the agent:

export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Alternatively, to set the key automatically each time you activate the virtual environment, append the export to its activation script:

echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate

Replace YOUR_GEMINI_API_KEY with your actual key.

B. If using the Vertex AI Client:

You must opt in to Vertex AI explicitly, then provide your project and location:

export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"

Alternatively, to set these automatically each time you activate the virtual environment, append the exports to its activation script:

echo 'export USE_VERTEXAI=true' >> .venv/bin/activate
echo 'export VERTEXAI_PROJECT="YOUR_PROJECT_ID"' >> .venv/bin/activate
echo 'export VERTEXAI_LOCATION="YOUR_LOCATION"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate

Replace YOUR_PROJECT_ID and YOUR_LOCATION with your actual project and location.
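For reference, the two configurations correspond to different keyword arguments of the `google-genai` SDK's `genai.Client`: `api_key=` for the Gemini Developer API, versus `vertexai=True, project=..., location=...` for Vertex AI. A minimal sketch of that mapping, assuming `USE_VERTEXAI=true` is the switch; `client_kwargs_from_env` is a hypothetical helper and may not match how `main.py` actually reads these variables.

```python
import os
from typing import Any, Mapping


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Map the README's environment variables to genai.Client keyword arguments.

    Hypothetical helper: assumes USE_VERTEXAI=true selects Vertex AI and that
    anything else falls back to the Gemini Developer API key.
    """
    if env.get("USE_VERTEXAI", "").strip().lower() == "true":
        return {
            "vertexai": True,
            "project": env.get("VERTEXAI_PROJECT"),
            "location": env.get("VERTEXAI_LOCATION"),
        }
    return {"api_key": env.get("GEMINI_API_KEY")}


# Usage with the google-genai package installed:
#   from google import genai
#   client = genai.Client(**client_kwargs_from_env())
```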

3. Running the Tool

The primary way to use the tool is via the main.py script.

General Command Structure:

python main.py --query "Go to Google and type 'Hello World' into the search bar"

Available Environments:

You can specify a particular environment with the --env <environment> flag. Available options:

playwright: Runs the browser locally using Playwright.
browserbase: Connects to a Browserbase instance.

Local Playwright

Runs the agent using a Chrome browser instance controlled locally by Playwright.

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"

You can also specify an initial URL for the Playwright environment:

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright" --initial_url="https://www.google.com/search?q=latest+AI+news"
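The same invocations can be scripted from Python, for example to run several queries in a row. A minimal standard-library sketch; `build_command` and `run_agent` are hypothetical helpers, not part of the repository, and they assume you run them from the repository root with the virtual environment active (so `sys.executable` is the venv's Python).

```python
import subprocess
import sys
from typing import Optional


def build_command(
    query: str,
    env: Optional[str] = None,
    initial_url: Optional[str] = None,
    model: Optional[str] = None,
    persistent_browser: bool = False,
) -> list[str]:
    """Assemble a main.py command line from the flags documented in this README."""
    cmd = [sys.executable, "main.py", f"--query={query}"]
    if env is not None:
        cmd.append(f"--env={env}")
    if initial_url is not None:
        cmd.append(f"--initial_url={initial_url}")
    if model is not None:
        cmd.append(f"--model={model}")
    if persistent_browser:
        cmd.append("--persistent_browser")
    return cmd


def run_agent(query: str, **flags) -> int:
    """Run main.py in the current directory and return its exit code."""
    # A list argument (no shell) means quotes inside the query need no escaping.
    return subprocess.run(build_command(query, **flags), check=False).returncode
```

For example, calling `run_agent("Navigate to github.com", persistent_browser=True)` and then `run_agent("Search for repositories", persistent_browser=True)` mirrors the two-step Persistent Browser Mode example.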

Browserbase

Runs the agent using Browserbase as the browser backend. Ensure that the BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables are set.

python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="browserbase"

Available Models:

You can choose the model to use by specifying the --model <model name> flag. Available options:

gemini-2.5-computer-use-preview-10-2025: This is the default model.
gemini-3-flash-preview: The preview version of Gemini 3 Flash.
gemini-3-pro-preview: The preview version of Gemini 3 Pro.

Persistent Browser Mode

The --persistent_browser flag allows the agent to connect to an existing Chromium browser or auto-launch one that persists across runs. This enables:

Chaining tasks across multiple runs without losing browser state
Manual authentication or setup before agent execution
Faster subsequent runs by reusing an existing browser

# First run - auto-launches browser and waits for user confirmation
python main.py --query="Navigate to github.com" --persistent_browser

# Subsequent runs - connects to existing browser immediately (no prompt)
python main.py --query="Search for repositories" --persistent_browser

Behavior:

Auto-launch: If no browser is running on the CDP port, launches Chromium and prompts for user confirmation (allowing manual authentication/setup)
Connect to existing: If a browser is already running on the CDP port, connects immediately without a prompt and preserves the current page state
Custom port: Use --cdp_port to specify a different CDP port (default: 9222)

The browser remains open after script completion for reuse in subsequent runs.
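Whether a run auto-launches or connects depends on whether something is already listening on the CDP port. A minimal sketch of that check, assuming a plain TCP probe is sufficient; the repository's actual detection logic may differ, and `cdp_port_in_use` is a hypothetical name. For context, a Chromium instance started with `--remote-debugging-port=9222` also answers HTTP requests at `http://localhost:9222/json/version`, and Playwright can attach to it with `chromium.connect_over_cdp("http://localhost:9222")`.

```python
import socket


def cdp_port_in_use(port: int = 9222, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something is accepting TCP connections on the CDP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening on that port.
        return False
```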

Trajectory Reports

Each agent run generates a detailed HTML trajectory report in the outputs/ directory:

outputs/
└── run_YYYYMMDD_HHMMSS/
    ├── report.html          # Interactive trajectory visualization
    ├── console.log          # Agent console output
    └── screenshots/         # Screenshots from each step

The report includes:

Agent reasoning at each step
Function calls and arguments
Screenshots showing browser state
Complete execution timeline

Open report.html in a browser to review.
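Run directories are named by timestamp, so the most recent report can be found by sorting directory names. A minimal sketch, assuming the `outputs/run_YYYYMMDD_HHMMSS/report.html` layout shown above; `latest_report` and `open_latest_report` are hypothetical helpers, not part of the repository.

```python
import webbrowser
from pathlib import Path
from typing import Optional


def latest_report(outputs_dir: str = "outputs") -> Optional[Path]:
    """Return report.html from the most recent run_YYYYMMDD_HHMMSS directory, if any."""
    runs = [
        d for d in Path(outputs_dir).glob("run_*")
        if d.is_dir() and (d / "report.html").is_file()
    ]
    if not runs:
        return None
    # The zero-padded timestamp format sorts lexicographically in chronological order.
    newest = max(runs, key=lambda d: d.name)
    return newest / "report.html"


def open_latest_report(outputs_dir: str = "outputs") -> bool:
    """Open the newest report in the default browser; return False if none exists."""
    report = latest_report(outputs_dir)
    if report is None:
        return False
    return webbrowser.open(report.resolve().as_uri())
```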

Agent CLI

The main.py script is the command-line interface (CLI) for running the browser agent.

Command-Line Arguments

Argument	Description	Required	Default	Supported Environment(s)
`--query`	The natural language query for the browser agent to execute.	Yes	N/A	All
`--env`	The computer use environment to use. Must be one of `playwright` or `browserbase`.	No	N/A	All
`--initial_url`	The initial URL to load when the browser starts.	No	https://www.google.com	All
`--highlight_mouse`	If specified, the agent will attempt to highlight the mouse cursor's position in the screenshots. This is useful for visual debugging.	No	False (not highlighted)	`playwright`
`--model`	The model to use. See the "Available Models" section for more information.	No	`gemini-2.5-computer-use-preview-10-2025`	All
`--persistent_browser`	Connect to an existing browser over CDP, or auto-launch one. Enables browser persistence across runs.	No	False	`playwright`
`--cdp_port`	CDP port to use for persistent browser mode.	No	9222	`playwright` (with `--persistent_browser`)
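If you script around the CLI or write tests against it, the table maps onto an `argparse` parser roughly as follows. This is a sketch reconstructed from the table only, not the repository's `main.py`; the table gives no default for `--env` (N/A), so the sketch leaves it as `None`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Run the browser agent.")
    parser.add_argument("--query", required=True,
                        help="Natural language query for the agent to execute.")
    parser.add_argument("--env", choices=["playwright", "browserbase"], default=None,
                        help="Computer use environment.")
    parser.add_argument("--initial_url", default="https://www.google.com",
                        help="URL to load when the browser starts.")
    parser.add_argument("--highlight_mouse", action="store_true",
                        help="Highlight the cursor in screenshots (playwright only).")
    parser.add_argument("--model", default="gemini-2.5-computer-use-preview-10-2025",
                        help="Model name.")
    parser.add_argument("--persistent_browser", action="store_true",
                        help="Connect to or auto-launch a persistent browser over CDP.")
    parser.add_argument("--cdp_port", type=int, default=9222,
                        help="CDP port for persistent browser mode.")
    return parser
```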

Environment Variables

Variable	Description	Required
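GEMINI_API_KEY	Your API key for the Gemini Developer API.	Yes (when using the Gemini Developer API)
USE_VERTEXAI	Set to true to use Vertex AI instead of the Gemini Developer API.	Yes (when using Vertex AI)
VERTEXAI_PROJECT	Your Google Cloud project ID.	Yes (when using Vertex AI)
VERTEXAI_LOCATION	Your Vertex AI location.	Yes (when using Vertex AI)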
BROWSERBASE_API_KEY	Your API key for Browserbase.	Yes (when using the browserbase environment)
BROWSERBASE_PROJECT_ID	Your Project ID for Browserbase.	Yes (when using the browserbase environment)
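A quick pre-flight check can report which of these variables are still unset before you start a run. A minimal sketch based on the table above, assuming `USE_VERTEXAI=true` selects Vertex AI; `missing_env_vars` is a hypothetical helper, not part of the repository.

```python
import os
from typing import Mapping, Optional


def missing_env_vars(env_name: Optional[str] = None,
                     environ: Mapping[str, str] = os.environ) -> list[str]:
    """List required variables that are unset for the chosen --env and model backend."""
    # Backend credentials: Vertex AI project/location, or a Gemini API key.
    if environ.get("USE_VERTEXAI", "").strip().lower() == "true":
        required = ["VERTEXAI_PROJECT", "VERTEXAI_LOCATION"]
    else:
        required = ["GEMINI_API_KEY"]
    # Browserbase needs its own credentials on top of the model backend.
    if env_name == "browserbase":
        required += ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]
    return [name for name in required if not environ.get(name)]
```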
