This section will guide you through setting up and running the Computer Use Preview model. Follow these steps to get started.
Clone the Repository
git clone https://github.com/mquirosbloch/computer-use-public-preview.git
cd computer-use-public-preview
Set up Python Virtual Environment and Install Dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Install Playwright and Browser Dependencies
# Install system dependencies required by Playwright for Chrome
playwright install-deps chrome
# Install the Chrome browser for Playwright
playwright install chrome
You can get started using either the Gemini Developer API or Vertex AI.
You need a Gemini API key to use the agent:
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
Or, to add this to your virtual environment:
echo 'export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_GEMINI_API_KEY with your actual key.
To use Vertex AI instead, enable it explicitly and provide your project and location:
export USE_VERTEXAI=true
export VERTEXAI_PROJECT="YOUR_PROJECT_ID"
export VERTEXAI_LOCATION="YOUR_LOCATION"
Or, to add this to your virtual environment:
echo 'export USE_VERTEXAI=true' >> .venv/bin/activate
echo 'export VERTEXAI_PROJECT="YOUR_PROJECT_ID"' >> .venv/bin/activate
echo 'export VERTEXAI_LOCATION="YOUR_LOCATION"' >> .venv/bin/activate
# After editing, you'll need to deactivate and reactivate your virtual
# environment if it's already active:
deactivate
source .venv/bin/activate
Replace YOUR_PROJECT_ID and YOUR_LOCATION with your actual project and location.
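Before running the agent, it can help to confirm that one of the two configurations above is complete. The following is a minimal sketch, not code from this repository: `resolve_backend` is a hypothetical helper, and it assumes that `USE_VERTEXAI` is treated as enabled when it equals "true" in any casing, which may differ from how main.py actually parses it.

```python
import os
from typing import Mapping


def resolve_backend(environ: Mapping[str, str] = os.environ) -> str:
    """Return "vertexai" or "gemini" based on the variables described above."""
    # Assumption: any casing of "true" enables Vertex AI; the real parsing
    # in main.py may be stricter or looser.
    if environ.get("USE_VERTEXAI", "").strip().lower() == "true":
        missing = [
            name
            for name in ("VERTEXAI_PROJECT", "VERTEXAI_LOCATION")
            if not environ.get(name)
        ]
        if missing:
            raise RuntimeError(
                "Vertex AI selected but missing: " + ", ".join(missing)
            )
        return "vertexai"
    if not environ.get("GEMINI_API_KEY"):
        raise RuntimeError(
            "Set GEMINI_API_KEY or enable Vertex AI with USE_VERTEXAI=true"
        )
    return "gemini"
```

Calling `resolve_backend()` with no arguments inspects the current process environment, so it reflects whatever the activated virtual environment exported.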
The primary way to use the tool is via the main.py script.
General Command Structure:
python main.py --query "Go to Google and type 'Hello World' into the search bar"
Available Environments:
You can specify a particular environment with the --env <environment> flag. Available options:
- playwright: Runs the browser locally using Playwright.
- browserbase: Connects to a Browserbase instance.
Local Playwright
Runs the agent using a Chrome browser instance controlled locally by Playwright.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright"
You can also specify an initial URL for the Playwright environment:
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="playwright" --initial_url="https://www.google.com/search?q=latest+AI+news"
Browserbase
Runs the agent using Browserbase as the browser backend. Ensure the proper Browserbase environment variables are set: BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID.
python main.py --query="Go to Google and type 'Hello World' into the search bar" --env="browserbase"
Available Models:
You can choose the model to use by specifying the --model <model name> flag. Available options:
- gemini-2.5-computer-use-preview-10-2025: This is the default model.
- gemini-3-flash-preview: The preview version of Gemini 3 Flash.
- gemini-3-pro-preview: The preview version of Gemini 3 Pro.
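When the agent is launched from another Python script, queries that contain quotes are easier to pass as an argument list than as a shell string. The sketch below assembles such a list using only the flags documented in this section; `build_command` is a hypothetical helper and not part of the repository.

```python
import shlex
import sys
from typing import List, Optional


def build_command(
    query: str,
    env: Optional[str] = None,
    initial_url: Optional[str] = None,
    model: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for main.py from the documented flags."""
    argv = [sys.executable, "main.py", "--query", query]
    # Optional flags are only added when set, so main.py's own defaults apply.
    if env is not None:
        argv += ["--env", env]
    if initial_url is not None:
        argv += ["--initial_url", initial_url]
    if model is not None:
        argv += ["--model", model]
    return argv


if __name__ == "__main__":
    cmd = build_command(
        "Go to Google and type 'Hello World' into the search bar",
        env="playwright",
    )
    # shlex.join shows the equivalent, safely quoted shell command.
    print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root, which avoids any shell quoting of the query text.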
Persistent Browser Mode
The --persistent_browser flag allows the agent to connect to an existing Chromium browser or auto-launch one that persists across runs. This enables:
- Chaining tasks across multiple runs without losing browser state
- Manual authentication or setup before agent execution
- Faster subsequent runs by reusing an existing browser
# First run - auto-launches browser and waits for user confirmation
python main.py --query="Navigate to github.com" --persistent_browser
# Subsequent runs - connects to existing browser immediately (no prompt)
python main.py --query="Search for repositories" --persistent_browser
Behavior:
- Auto-launch: If no browser is running on the CDP port, launches Chromium and prompts for user confirmation (allowing manual authentication/setup)
- Connect to existing: If browser is already running, connects immediately without prompt and preserves current page state
- Custom port: Use --cdp_port to specify a different CDP port (default: 9222)
The browser remains open after script completion for reuse in subsequent runs.
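To predict whether a run will auto-launch a browser or attach to an existing one, you can check whether anything is already listening on the CDP port. This is a rough sketch under stated assumptions: `cdp_port_in_use` is a hypothetical helper, it assumes the browser listens on 127.0.0.1, and an open port only shows that some process accepts connections there, not that it is a Chromium instance. The detection logic inside main.py may work differently.

```python
import socket


def cdp_port_in_use(
    port: int = 9222, host: str = "127.0.0.1", timeout: float = 0.5
) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful connect means a listener exists; close it right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

If `cdp_port_in_use()` returns False, the next run with --persistent_browser would be expected to take the auto-launch path and prompt for confirmation.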
Each agent run generates a detailed HTML trajectory report in the outputs/ directory:
outputs/
└── run_YYYYMMDD_HHMMSS/
    ├── report.html       # Interactive trajectory visualization
    ├── console.log       # Agent console output
    └── screenshots/      # Screenshots from each step
The report includes:
- Agent reasoning at each step
- Function calls and arguments
- Screenshots showing browser state
- Complete execution timeline
Open report.html in a browser to review.
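Because each run writes to its own timestamped directory, finding the newest report by hand gets tedious after a few runs. The sketch below locates it by parsing the run_YYYYMMDD_HHMMSS directory names shown above; `latest_report` is a hypothetical helper, and it assumes the directory names follow that pattern exactly.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def latest_report(outputs_dir: str = "outputs") -> Optional[Path]:
    """Return report.html from the most recent run directory, if any."""
    root = Path(outputs_dir)
    if not root.is_dir():
        return None
    runs = []
    for child in root.iterdir():
        if not child.is_dir() or not child.name.startswith("run_"):
            continue
        try:
            stamp = datetime.strptime(child.name[len("run_"):], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # Name does not match the expected timestamp format.
        runs.append((stamp, child))
    if not runs:
        return None
    newest = max(runs, key=lambda item: item[0])[1]
    report = newest / "report.html"
    return report if report.is_file() else None
```

The returned path can be opened with the standard library, for example `webbrowser.open(report.resolve().as_uri())`.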
The main.py script is the command-line interface (CLI) for running the browser agent.
| Argument | Description | Required | Default | Supported Environment(s) |
|---|---|---|---|---|
| --query | The natural language query for the browser agent to execute. | Yes | N/A | All |
| --env | The computer use environment to use. Must be one of: playwright or browserbase. | No | N/A | All |
| --initial_url | The initial URL to load when the browser starts. | No | https://www.google.com | All |
| --highlight_mouse | If specified, the agent will attempt to highlight the mouse cursor's position in the screenshots. This is useful for visual debugging. | No | False (not highlighted) | playwright |
| --model | The model to use. See the "Available Models" section for more information. | No | gemini-2.5-computer-use-preview-10-2025 | All |
| --persistent_browser | Connect to an existing browser or auto-launch one with CDP. Enables browser persistence across runs. | No | False | playwright |
| --cdp_port | CDP port to use for persistent browser mode. | No | 9222 | playwright (with --persistent_browser) |
| Variable | Description | Required |
|---|---|---|
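| GEMINI_API_KEY | Your API key for the Gemini model. | Yes (when using the Gemini Developer API) |
| USE_VERTEXAI | Set to true to use Vertex AI instead of the Gemini Developer API. | Yes (when using Vertex AI) |
| VERTEXAI_PROJECT | Your project ID for Vertex AI. | Yes (when using Vertex AI) |
| VERTEXAI_LOCATION | Your location for Vertex AI. | Yes (when using Vertex AI) |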
| BROWSERBASE_API_KEY | Your API key for Browserbase. | Yes (when using the browserbase environment) |
| BROWSERBASE_PROJECT_ID | Your Project ID for Browserbase. | Yes (when using the browserbase environment) |