Agent Browser Protocol - Chromium Fork

A Chromium fork implementing the Agent Browser Protocol (ABP), a REST-based API for AI agent browser control implemented at the C++ engine level. Unlike external CDP clients or browser extensions, ABP runs inside the browser process, which is intended to give lower latency and greater capability.

Current Implementation Status

Working Features

Tab Management: List, create, close, get info, activate, stop loading
Navigation: Navigate to URL, back, forward, reload
Screenshots: Capture viewport with optional element markup overlays (GET binary or POST with action envelope)
Mouse Input: Click, move, scroll (native mouse wheel events via RenderWidgetHost)
Keyboard Input: Type text, press/down/up key events with modifier support
JavaScript Execution: Execute scripts and get results (via CDP Runtime.evaluate)
Text Extraction: Get page text or text from a CSS selector
Dialogs: Get pending dialog info, accept, dismiss (alert/confirm/prompt/beforeunload)
Downloads: List, get status, cancel
File Chooser: Provide files to native file picker dialogs
Permissions: Intercept permission prompts, grant/deny with type validation (geolocation grant accepts lat/lng/accuracy, auto-deny non-geolocation)
Execution Control: Pause/resume JS execution with virtual time for deterministic state
Wait: Duration-based wait with action envelope
History: Session, action, and event history with SQLite storage
Input Mode: Toggle between agent (ABP-controlled), human (user-controlled), and cdp (remote debugging) input modes via API or toolbar icon. Human mode allows direct user interaction (e.g., for authentication), suspends execution control, and shows a yellow gradient border overlay.
Browser Management: Status check, graceful shutdown
Console Capture: In-memory 5000-entry FIFO buffer capturing console.log/warn/error, CORS errors, CSP violations, uncaught exceptions via WebContentsObserver (non-fingerprintable, no CDP)
MCP Server: Embedded MCP (JSON-RPC over HTTP) with 20 tools at /mcp
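
To make the Console Capture entry concrete, here is a minimal Python sketch of a bounded FIFO buffer with the filters listed for GET /api/v1/console (level, pattern, tab_id, limit, after_id). It is an illustration only, not ABP's C++ implementation: the ConsoleBuffer and ConsoleEntry names, the field names, and the monotonically increasing entry id are all assumptions.

```python
import re
from collections import deque
from dataclasses import dataclass
from itertools import count
from typing import List, Optional


@dataclass(frozen=True)
class ConsoleEntry:
    id: int      # hypothetical monotonically increasing entry id
    tab_id: str
    level: str   # e.g. "log", "warn", "error"
    text: str


class ConsoleBuffer:
    """Bounded FIFO: once full, appending evicts the oldest entry."""

    def __init__(self, capacity: int = 5000) -> None:
        self._entries = deque(maxlen=capacity)
        self._ids = count(1)

    def append(self, tab_id: str, level: str, text: str) -> ConsoleEntry:
        entry = ConsoleEntry(next(self._ids), tab_id, level, text)
        self._entries.append(entry)  # deque drops the leftmost item when full
        return entry

    def query(self, level: Optional[str] = None, pattern: Optional[str] = None,
              tab_id: Optional[str] = None, limit: Optional[int] = None,
              after_id: int = 0) -> List[ConsoleEntry]:
        """Return entries newer than after_id that match every given filter."""
        regex = re.compile(pattern) if pattern else None
        matches = [
            e for e in self._entries
            if e.id > after_id
            and (level is None or e.level == level)
            and (tab_id is None or e.tab_id == tab_id)
            and (regex is None or regex.search(e.text))
        ]
        return matches if limit is None else matches[:limit]
```

Passing the id of the last entry seen as after_id lets a client poll for new messages without re-reading old ones.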

Architecture

┌─────────────────────────────────────────────┐
│         HTTP Client (curl/agent/MCP)        │
└─────────────────┬───────────────────────────┘
                  │ GET/POST /api/v1/* or /mcp
                  ▼
┌─────────────────────────────────────────────┐
│  AbpHttpServer (IO thread)                  │
│  - net::HttpServer on localhost:8222        │
│  - Routes REST + MCP requests               │
└─────────────────┬───────────────────────────┘
                  │ PostTask to UI thread
                  ▼
┌─────────────────────────────────────────────┐
│  AbpController (UI thread)                  │
│  - Direct access to Browser, TabStripModel  │
│  - Uses DevToolsAgentHost for CDP commands  │
│  - AbpActionContext for action lifecycle    │
│  - AbpInputDispatcher for native input      │
│  - AbpEventObserver for CDP event streams   │
│  - AbpEventCollector for action events      │
├─────────────────────────────────────────────┤
│  AbpMcpHandler - Embedded MCP server        │
│  AbpHistoryController - Session/action log  │
│  AbpDownloadObserver - Download tracking    │
└─────────────────────────────────────────────┘
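
The hand-off in the diagram (the server thread accepts a request, then posts a task to the UI thread, which owns browser state) can be illustrated with a small Python analogy. This is not Chromium code: the TaskRunner class and the handle_list_tabs handler are hypothetical stand-ins for PostTask and AbpController.

```python
import queue
import threading
from concurrent.futures import Future


class TaskRunner:
    """Hypothetical stand-in for a single-threaded task runner (the 'UI thread')."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post_task(self, fn, *args) -> Future:
        """Queue fn to run on the runner's thread; the caller gets a Future."""
        future: Future = Future()
        self._tasks.put((fn, args, future))
        return future

    def _run(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                break
            fn, args, future = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:  # report failures back to the caller
                future.set_exception(exc)

    def shutdown(self) -> None:
        self._tasks.put(None)
        self._thread.join()


def handle_list_tabs(tabs):
    # Runs only on the runner's thread, so it may touch "browser state" safely.
    return {"thread": threading.current_thread().name, "tabs": list(tabs)}


ui_thread = TaskRunner()
# The "IO thread" (here, the main thread) never touches tab state directly.
result = ui_thread.post_task(handle_list_tabs, ["tab-1", "tab-2"]).result(timeout=5)
ui_thread.shutdown()
```

Keeping all state access on one thread avoids locks around browser objects; the request thread only waits on the returned Future.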

Project Structure

ABP Source Code

chrome/browser/abp/
├── BUILD.gn                     # Build configuration
├── abp_switches.h/cc            # --abp-port, --abp-session-dir flags
├── abp_http_server.h/cc         # HTTP server (IO thread)
├── abp_controller.h/cc          # Request handler + CDP client (UI thread)
├── abp_action_context.h/cc      # Action lifecycle (pause/resume/screenshot)
├── abp_input_dispatcher.h/cc    # Native input dispatch (click/scroll/keys)
├── abp_location_provider.h/cc   # Mock geolocation provider (coordinates set via permission grant)
├── abp_system_geolocation_source.h/cc # Bypasses macOS system location dialog
├── abp_permission_observer.h/cc # Permission prompt interception
├── abp_event_observer.h/cc      # CDP event client per tab
├── abp_event_collector.h/cc     # Collects events during actions
├── abp_mcp_handler.h/cc         # Embedded MCP server (JSON-RPC over HTTP)
├── abp_tool_builder.h/cc        # MCP tool schema builder
├── abp_history_controller.h/cc  # Session/action history API
├── abp_history_database.h/cc    # SQLite history storage
├── abp_download_observer.h/cc   # Download tracking
├── abp_console_capture.h/cc     # Console message capture (WebContentsObserver)
├── abp_config.h/cc              # Runtime configuration
└── abp_types.h                  # Shared type definitions

Design Documentation

plans/
├── README.md                    # Overview
├── agent-browser-protocol.md    # Core ABP architecture
├── API.md                       # Full REST API specification
├── mcp.md                       # MCP server specification
└── implementation.md            # Minimal implementation plan

Build Setup

Prerequisites

depot_tools (Google's build toolchain):

git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git ~/depot_tools
echo 'export PATH="$HOME/depot_tools:$PATH"' >> ~/.bashrc
source ~/.bashrc

Build dependencies (Ubuntu/Debian):

sudo ./build/install-build-deps.sh --no-prompt

Directory Structure

/home/paladin/src/
├── .gclient           # gclient configuration
├── src -> chromium    # symlink required by gclient
└── chromium/          # source code
    ├── out/Default/   # build output
    ├── chrome/browser/abp/  # ABP implementation
    ├── tools/abp-mcp-server/  # MCP server
    └── plans/         # Design docs

Sync Dependencies

cd /home/paladin/src
gclient sync --no-history

Configure Build

Debug component build (faster incremental builds):

cd /home/paladin/src/src
gn gen out/Default --args='is_debug=true is_component_build=true symbol_level=1 dcheck_always_on=true'

Build Chromium

cd /home/paladin/src/src
autoninja -C out/Default chrome

First build: ~4-6 hours. Incremental builds: seconds to minutes.

Running ABP

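Start ABP

The commands below use the macOS app bundle path; on other platforms the binary's location under out/Default/ differs.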

./out/Default/ABP.app/Contents/MacOS/ABP --abp-session-dir=sessions/$(date +%Y%m%d_%H%M%S)

Session data (database and screenshots) will be stored in sessions/<timestamp>/.

To use the default /tmp/abp-<UUID>/ directory instead:

./out/Default/ABP.app/Contents/MacOS/ABP
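
For scripted launches, the same command line can be assembled programmatically. The sketch below only builds the argument list and does not start the browser. The build_launch_command helper is hypothetical; the --abp-session-dir flag and the binary path are taken from the commands above.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import List, Optional

ABP_BINARY = "./out/Default/ABP.app/Contents/MacOS/ABP"  # path from the command above


def build_launch_command(session_root: Optional[str] = None,
                         now: Optional[datetime] = None) -> List[str]:
    """Return argv for launching ABP (hypothetical helper).

    With session_root=None no flag is passed, so ABP falls back to its
    default /tmp/abp-<UUID>/ directory.
    """
    argv = [ABP_BINARY]
    if session_root is not None:
        # Same timestamp layout as $(date +%Y%m%d_%H%M%S).
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        argv.append(f"--abp-session-dir={PurePosixPath(session_root) / stamp}")
    return argv
```

The returned list can be passed directly to subprocess.Popen.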

REST API Examples

# Check browser readiness
curl http://localhost:8222/api/v1/browser/status

# List all tabs
curl http://localhost:8222/api/v1/tabs

# Create new tab
curl -X POST http://localhost:8222/api/v1/tabs \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

# Navigate existing tab (replace {tab_id} here and below with an ID from the tab list)
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/navigate \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com"}'

# Take screenshot with element markup
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/screenshot \
  -H "Content-Type: application/json" \
  -d '{"screenshot":{"markup":"interactive","format":"webp"}}'

# Binary screenshot (returns image/webp)
curl "http://localhost:8222/api/v1/tabs/{tab_id}/screenshot?markup=interactive" -o screenshot.webp

# Click at coordinates
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/click \
  -H "Content-Type: application/json" \
  -d '{"x":100,"y":200}'

# Type text
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/type \
  -H "Content-Type: application/json" \
  -d '{"text":"hello world"}'

# Press key combo (Ctrl+A)
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/keyboard/press \
  -H "Content-Type: application/json" \
  -d '{"key":"a","modifiers":["Control"]}'

# Scroll down at coordinates
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/scroll \
  -H "Content-Type: application/json" \
  -d '{"x":500,"y":400,"delta_y":300}'

# Execute JavaScript (note: parameter is "script", not "expression")
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/execute \
  -H "Content-Type: application/json" \
  -d '{"script":"document.title"}'

# Get page text
curl -X POST http://localhost:8222/api/v1/tabs/{tab_id}/text \
  -H "Content-Type: application/json" \
  -d '{}'

# Close tab
curl -X DELETE http://localhost:8222/api/v1/tabs/{tab_id}
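
The same calls can be made from Python with only the standard library. The sketch below mirrors the curl examples: paths and JSON bodies are taken from them. The AbpClient class and its method names are hypothetical, and send() assumes responses are JSON, which the examples above do not show.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


class AbpClient:
    """Hypothetical thin wrapper around the REST endpoints shown above."""

    def __init__(self, base_url: str = "http://localhost:8222/api/v1") -> None:
        self.base_url = base_url.rstrip("/")

    def build_request(self, method: str, path: str,
                      body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
        """Build (but do not send) a request; JSON-encode the body if given."""
        data = None
        headers = {}
        if body is not None:
            data = json.dumps(body).encode("utf-8")
            headers["Content-Type"] = "application/json"
        return urllib.request.Request(self.base_url + path, data=data,
                                      headers=headers, method=method)

    def send(self, request: urllib.request.Request, timeout: float = 30.0) -> Any:
        """Send a request and decode the response as JSON (an assumption)."""
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))

    def navigate(self, tab_id: str, url: str) -> urllib.request.Request:
        return self.build_request("POST", f"/tabs/{tab_id}/navigate", {"url": url})

    def click(self, tab_id: str, x: int, y: int) -> urllib.request.Request:
        return self.build_request("POST", f"/tabs/{tab_id}/click", {"x": x, "y": y})

    def execute(self, tab_id: str, script: str) -> urllib.request.Request:
        # The body key is "script", not "expression".
        return self.build_request("POST", f"/tabs/{tab_id}/execute", {"script": script})

    def close_tab(self, tab_id: str) -> urllib.request.Request:
        return self.build_request("DELETE", f"/tabs/{tab_id}")
```

With ABP running, a call would look like client.send(client.navigate(tab_id, "https://example.com")). Separating build_request from send keeps the request shapes inspectable without a live browser.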

MCP Server

The MCP server is embedded directly in the browser, so no separate process is needed. It is available at /mcp on the same port as the REST API.

Configure in Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "browser": {
      "transport": "streamable-http",
      "url": "http://localhost:8222/mcp"
    }
  }
}

Test the MCP endpoint:

# Initialize MCP session
curl -X POST http://localhost:8222/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","clientInfo":{"name":"test","version":"1.0"},"capabilities":{}}}'

# List available tools
curl -X POST http://localhost:8222/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
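
A minimal sketch of building the same JSON-RPC 2.0 messages in Python. The jsonrpc_request helper is hypothetical; the method names and parameters are copied from the curl commands above.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # each request in a session gets a fresh id


def jsonrpc_request(method: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one JSON-RPC 2.0 request object (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params or {}}


initialize = jsonrpc_request("initialize", {
    "protocolVersion": "2025-03-26",
    "clientInfo": {"name": "test", "version": "1.0"},
    "capabilities": {},
})
list_tools = jsonrpc_request("tools/list")

# Each string would be POSTed to http://localhost:8222/mcp as the request body.
payloads = [json.dumps(message) for message in (initialize, list_tools)]
```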

API Reference

See plans/API.md for the complete REST API specification. All endpoints:

Method	Path	Description
Browser
GET	`/api/v1/browser/status`	Get browser readiness status
GET	`/api/v1/browser/session-data`	Get session data file paths
POST	`/api/v1/browser/shutdown`	Graceful shutdown
GET	`/api/v1/browser/input-mode`	Get current input mode (agent/human)
POST	`/api/v1/browser/input-mode`	Set input mode (toggles human/agent control)
POST	`/api/v1/browser/cdp-mode/enter`	Enter CDP mode (starts remote debugging server)
POST	`/api/v1/browser/cdp-mode/exit`	Exit CDP mode (returns control to ABP)
Tabs
GET	`/api/v1/tabs`	List all tabs
GET	`/api/v1/tabs/{id}`	Get tab details
POST	`/api/v1/tabs`	Create new tab
DELETE	`/api/v1/tabs/{id}`	Close tab
POST	`/api/v1/tabs/{id}/activate`	Switch to tab
POST	`/api/v1/tabs/{id}/stop`	Stop loading
Navigation
POST	`/api/v1/tabs/{id}/navigate`	Navigate to URL
POST	`/api/v1/tabs/{id}/reload`	Reload page
POST	`/api/v1/tabs/{id}/back`	Go back
POST	`/api/v1/tabs/{id}/forward`	Go forward
Mouse
POST	`/api/v1/tabs/{id}/click`	Click at coordinates
POST	`/api/v1/tabs/{id}/move`	Mouse move
POST	`/api/v1/tabs/{id}/scroll`	Scroll (mouse wheel)
Keyboard
POST	`/api/v1/tabs/{id}/type`	Type text
POST	`/api/v1/tabs/{id}/keyboard/press`	Press key combo
POST	`/api/v1/tabs/{id}/keyboard/down`	Key down
POST	`/api/v1/tabs/{id}/keyboard/up`	Key up
Screenshots
GET	`/api/v1/tabs/{id}/screenshot`	Binary WebP screenshot
POST	`/api/v1/tabs/{id}/screenshot`	Screenshot via action envelope
Content
POST	`/api/v1/tabs/{id}/execute`	Execute JavaScript
POST	`/api/v1/tabs/{id}/text`	Get page text
Wait
POST	`/api/v1/tabs/{id}/wait`	Wait for duration
Dialogs
GET	`/api/v1/tabs/{id}/dialog`	Get pending dialog
POST	`/api/v1/tabs/{id}/dialog/accept`	Accept dialog
POST	`/api/v1/tabs/{id}/dialog/dismiss`	Dismiss dialog
Execution Control
GET	`/api/v1/tabs/{id}/execution`	Get execution state
POST	`/api/v1/tabs/{id}/execution`	Set execution state
Downloads
GET	`/api/v1/downloads`	List downloads
GET	`/api/v1/downloads/{id}`	Get download status
POST	`/api/v1/downloads/{id}/cancel`	Cancel download
File Chooser
POST	`/api/v1/file-chooser/{id}`	Provide files to dialog
Popups
POST	`/api/v1/select/{id}`	Respond to select popup
Permissions
GET	`/api/v1/permissions`	List pending permission requests
POST	`/api/v1/permissions/{id}/grant`	Grant permission (requires permission_type; geolocation requires lat/lng)
POST	`/api/v1/permissions/{id}/deny`	Deny permission (requires permission_type)
Console
GET	`/api/v1/console`	Query console messages (level, pattern, tab_id, limit, after_id)
DELETE	`/api/v1/console`	Clear console buffer (optional tab_id filter)
Network
GET	`/api/v1/network`	Query saved network calls with regex filters
POST	`/api/v1/network/save`	Retroactively tag & persist in-memory buffer
DELETE	`/api/v1/network`	Clear saved calls (by tag or all)
POST	`/api/v1/tabs/{id}/curl`	Execute HTTP request using tab's session
History
GET	`/api/v1/history/sessions`	List sessions
GET	`/api/v1/history/sessions/current`	Get current session
GET	`/api/v1/history/sessions/{id}`	Get session by ID
GET	`/api/v1/history/sessions/{id}/export`	Export session
GET	`/api/v1/history/actions`	List actions
GET	`/api/v1/history/actions/{id}`	Get action by ID
GET	`/api/v1/history/actions/{id}/screenshot`	Get action screenshot
DELETE	`/api/v1/history/actions`	Delete actions
GET	`/api/v1/history/events`	List events
GET	`/api/v1/history/events/{id}`	Get event by ID
DELETE	`/api/v1/history/events`	Delete events
DELETE	`/api/v1/history`	Delete all history
MCP
POST	`/mcp`	MCP JSON-RPC endpoint (20 tools, includes `cdp_mode`)
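
Some endpoints in the table take filters or typed bodies. As a hedged sketch: the console filters (level, pattern, tab_id, limit, after_id) are listed above, but passing them as URL query parameters is an assumption, as are the JSON key names in the geolocation grant body (the table only states that permission_type and lat/lng are required). Both helper functions are hypothetical; plans/API.md has the actual shapes.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8222/api/v1"
CONSOLE_FILTERS = ("level", "pattern", "tab_id", "limit", "after_id")


def console_query_url(**filters: Any) -> str:
    """Build a GET /console URL, skipping filters that are None."""
    unknown = set(filters) - set(CONSOLE_FILTERS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = [(name, filters[name]) for name in CONSOLE_FILTERS
              if filters.get(name) is not None]
    query = urlencode(params)  # percent-encodes regex characters in pattern
    return f"{BASE_URL}/console" + (f"?{query}" if query else "")


def geolocation_grant_body(lat: float, lng: float,
                           accuracy: Optional[float] = None) -> Dict[str, Any]:
    """Body for POST /permissions/{id}/grant; the key names are assumed."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise ValueError("latitude/longitude out of range")
    body: Dict[str, Any] = {"permission_type": "geolocation", "lat": lat, "lng": lng}
    if accuracy is not None:
        body["accuracy"] = accuracy
    return body
```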

Development Notes

Source is at /home/paladin/src/chromium (symlinked as src for gclient)
Build output at out/Default/
Use autoninja (not ninja) for automatic parallelism
Run gclient sync after pulling changes to update dependencies
ABP uses CDP (Chrome DevTools Protocol) internally for JS evaluation and debugger control
Mouse/keyboard input is dispatched natively via RenderWidgetHost (bypasses CDP)
Screenshots use ForceRedraw + GrabViewSnapshot (not CDP Page.captureScreenshot)
Tab IDs are DevToolsAgentHost IDs (stable for the session)
Execution control uses Debugger.pause/resume + Emulation.setVirtualTimePolicy for deterministic state
Session history is stored in SQLite via AbpHistoryDatabase
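
The execution-control note names two CDP domains. As a rough sketch, the corresponding CDP command messages look like the following. Debugger.pause, Debugger.resume, and Emulation.setVirtualTimePolicy are real CDP methods, and "pause" and "advance" are real policy values; the ordering and the cdp_command helper are assumptions about how the calls might be combined, not a description of ABP's implementation.

```python
import json
from typing import Any, Dict, List, Optional


def cdp_command(command_id: int, method: str,
                params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one CDP message (hypothetical helper)."""
    message: Dict[str, Any] = {"id": command_id, "method": method}
    if params:
        message["params"] = params
    return message


def pause_sequence() -> List[Dict[str, Any]]:
    # Assumed order: freeze virtual time, then pause the JS debugger.
    return [
        cdp_command(1, "Emulation.setVirtualTimePolicy", {"policy": "pause"}),
        cdp_command(2, "Debugger.pause"),
    ]


def resume_sequence() -> List[Dict[str, Any]]:
    # Assumed order: resume JS, then let virtual time advance again.
    return [
        cdp_command(1, "Debugger.resume"),
        cdp_command(2, "Emulation.setVirtualTimePolicy", {"policy": "advance"}),
    ]


# Serialized form of each message as it would travel over a CDP connection.
wire_format = [json.dumps(message) for message in pause_sequence()]
```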
