Releases · theredsix/agent-browser-protocol

28 Mar 23:36

theredsix

v0.1.10

cafce8a

v0.1.10 (Latest)

Added CDP mode to allow commercial CAPTCHA solvers to take over when necessary

Full Changelog: v0.1.9...v0.1.10

25 Mar 06:41

theredsix

v0.1.9

51eda22

v0.1.9

Multi-Instance Support

Automatic port detection — When no --port or ABP_PORT is specified, ABP now automatically finds
an available TCP port starting at 15678, probing up to 100 candidates. This allows launching
multiple ABP instances without manual port coordination.
Isolated user data directories — Each launch now sets --user-data-dir to a unique temp
directory (/tmp/abp-*) when one is not explicitly provided. Previously, omitting --user-data-dir
meant a new launch exited silently on Chrome's single-instance lock if another instance was already running.
New exports — findAvailablePort() and DEFAULT_START_PORT are now exported from the package for
programmatic use.
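
The probing strategy described above can be sketched as follows. This is an illustration only, not ABP's actual findAvailablePort() implementation: the names firstFreePort and PortCheck are hypothetical, and the port check is passed in by the caller (in practice it would try to bind a TCP listener on the port). Only the start port 15678 and the 100-candidate limit come from the notes.

```typescript
// Hypothetical names; only 15678 and the 100-candidate limit come from the notes.
type PortCheck = (port: number) => Promise<boolean>;

const START_PORT = 15678;
const MAX_CANDIDATES = 100;

// Try start, start + 1, ... and return the first port the check reports as free.
async function firstFreePort(
  isFree: PortCheck,
  start: number = START_PORT,
  maxCandidates: number = MAX_CANDIDATES,
): Promise<number> {
  for (let offset = 0; offset < maxCandidates; offset++) {
    const port = start + offset;
    if (await isFree(port)) {
      return port;
    }
  }
  // Every candidate was busy, so give up instead of probing forever.
  throw new Error(`no free port in ${start}..${start + maxCandidates - 1}`);
}
```

Bounding the search keeps a misconfigured machine from scanning indefinitely, and starting every instance at the same port means the first instance still lands on a predictable default.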

MCP Proxy Improvements

Removed stale instance detection — The MCP proxy (mcp-proxy.ts) no longer probes for an existing
ABP instance on startup. It always launches a fresh, isolated browser, eliminating race conditions
when multiple MCP clients start simultaneously.
Correct port forwarding — MCP requests now use the actual resolved port from the launched browser
rather than the configured default, fixing failures when auto-port-detection selects a non-default
port.

CLI Changes

The --port flag is now optional (default: auto-detect starting at 15678)
Startup log now shows "Finding available port..." when no explicit port is set
API/MCP URLs printed after launch reflect the actual port used

Full Changelog: v0.1.8...v0.1.9

19 Mar 04:42

theredsix

v0.1.8

38aec89

v0.1.8

New Features

Animation wait support for browser_wait — New animation parameter lets agents wait for CSS/DOM
animations to complete after network activity settles, improving reliability on animation-heavy
pages.

Bug Fixes

Fixed debugger hang during cross-origin navigation — Navigation actions (navigate, reload, back,
forward) now send Debugger.disable before executing when the debugger is paused, preventing the
renderer main thread from blocking on beforeunload handlers during cross-process navigation.
Fixed use-after-free in error response handling — Cleanup was happening before the error response
was fully sent; reordered to prevent crash.
Linux rendering validation — ABP now detects the unsupported --disable-gpu +
--disable-software-rasterizer flag combination on Linux and exits with a clear error message instead
of producing blank screenshots silently.

Improvements

Linux packaging — Added optional bundling of vk_swiftshader_icd.json, WidevineCdm, MEIPreload, and
other runtime files for more complete Linux distributions.
Linux troubleshooting docs — Added guidance on common IBUS warnings, EGL errors, and GPU driver
fallback configuration to MANUAL_INSTALL.md.

Full Changelog: v0.1.6...v0.1.8

14 Mar 07:14

theredsix

v0.1.7

52ef26c

v0.1.7

agent-browser-protocol v0.1.7

Console Capture

In-memory 5000-entry FIFO ring buffer capturing console.log/warn/error, CORS errors, CSP
violations, and uncaught exceptions
Browser-process WebContentsObserver implementation — no CDP dependency, non-fingerprintable
REST API: GET /api/v1/console with level, pattern (RE2 regex), tab_id, limit, and after_id
pagination
REST API: DELETE /api/v1/console to clear buffer (optional tab_id filter)
MCP: new browser_console tool (19th tool) with query and clear support
Renderer fix: CORS and CSP violation messages now forwarded to browser process
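
To illustrate the buffer semantics listed above, here is a small model of a bounded FIFO log with after_id-style pagination. It is a sketch under assumptions, not ABP's browser-process implementation: the class name ConsoleRing, the entry shape, and the increasing numeric IDs are all hypothetical.

```typescript
// Hypothetical entry shape; the field names are assumptions for illustration.
interface ConsoleEntry {
  id: number;
  level: "log" | "warn" | "error";
  text: string;
}

class ConsoleRing {
  private entries: ConsoleEntry[] = [];
  private nextId = 1;
  private capacity: number;

  constructor(capacity: number = 5000) {
    this.capacity = capacity;
  }

  // Append an entry, evicting the oldest one once the buffer is full.
  push(level: ConsoleEntry["level"], text: string): number {
    if (this.entries.length === this.capacity) {
      this.entries.shift();
    }
    const id = this.nextId++;
    this.entries.push({ id, level, text });
    return id;
  }

  // Return up to `limit` entries whose id is greater than afterId, oldest first.
  query(afterId: number = 0, limit: number = 100): ConsoleEntry[] {
    return this.entries.filter((e) => e.id > afterId).slice(0, limit);
  }

  // Drop everything; IDs keep increasing so old cursors stay valid.
  clear(): void {
    this.entries = [];
  }
}
```

Because IDs never restart, a client can keep passing the last ID it saw as the cursor and will only ever receive newer entries, even after older ones have been evicted.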

Animation Wait

New animation parameter on browser_wait MCP tool and action envelope
Configurable post-network animation settle time wired through action lifecycle
Dedicated OnAnimationWaitTimeElapsed handler ensures CSS/JS animations complete before
screenshot capture

Bug Fixes

UA client hints fingerprinting: Sec-CH-UA and navigator.userAgentData.brands now include
"Google Chrome" brand entry, matching a standard Chrome build
Build fix: out-of-line constructors for Options struct to avoid linker issues

Full Changelog: v0.1.6...v0.1.7

09 Mar 18:00

theredsix

v0.1.6

c4e196c

v0.1.6

agent-browser-protocol v0.1.6

Human Input Mode

Agent/human toggle via REST API (/browser/input-mode) and toolbar icon in the address bar
Yellow gradient border overlay when in human mode for clear visual feedback
Robot/human icon in omnibox toggles mode on click, with yellow icon color in human mode
MCP tools blocked during human mode to prevent interference with user interaction
Replaces deprecated --allow-system-inputs flag

Network Capture

Per-tab in-memory ring buffer (1000 requests) capturing CDP Network events
SQLite-backed persistent storage with tagging via POST /network/save
Query saved and in-memory buffer results via GET /network with regex filters
Document type added to default capture types
network_tag parameter on all action tools for automatic tagging

Session-aware Curl

POST /tabs/{id}/curl executes HTTP requests using the tab's cookies and session state
Works while JS execution is paused

MCP Tools

New browser_network tool (query, save, clear network captures)
New browser_curl tool (session-aware HTTP requests)

NPM SDK

New methods: slider(), clearText(), batch(), waitForNetwork(), permissions(), selectPopup(),
download content
New launch options: zoom, configFile, disablePause, allowSystemInputs
Updated default timing: 150ms action delay, 350ms screenshot delay
Fix: permissions.list() now correctly unwraps {permissions: [...]} envelope
CLI flags: --zoom, --config, --disable-pause, --allow-system-inputs passed through MCP proxy

Bug Fixes

Fix human mode toggle not switching back to agent mode (observer registration timing)
Fix yellow border overlay not appearing immediately on mode switch
Fix history use-after-free when recording errors
Fix curl max body size to 5MB (SimpleURLLoader limit)
Fix network MCP tool URL encoding and header handling

Full Changelog: v0.1.5...v0.1.6

04 Mar 21:31

theredsix

v0.1.5

327fdbc

v0.1.5

ABP v0.1.5 Release Notes

Multi-Scroll with Intermediate Screenshots

browser_scroll now accepts an array of scroll steps instead of a single delta. Each step captures an intermediate
screenshot, giving agents visual feedback as content scrolls into view. A mouse move is dispatched before wheel
events for accurate scroll targeting.

New scrolls array parameter with {delta_px, direction} per step
Intermediate screenshots returned as MCP image blocks
Mouse move event dispatched to scroll target before wheel events
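
As an illustration of the scrolls array, the sketch below splits one long scroll into several steps so that each step can produce its own screenshot. The field names delta_px and direction come from the notes above; the allowed direction values ("up" and "down") and the helper name splitScroll are assumptions.

```typescript
// delta_px and direction are from the notes; the direction values are assumed.
type ScrollDirection = "up" | "down";

interface ScrollStep {
  delta_px: number;
  direction: ScrollDirection;
}

// Split a total distance into `steps` entries. Steps are equal-sized except
// the last one, which absorbs any remainder so the total is preserved.
function splitScroll(
  totalPx: number,
  steps: number,
  direction: ScrollDirection,
): ScrollStep[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new Error("steps must be a positive integer");
  }
  const base = Math.floor(totalPx / steps);
  const result: ScrollStep[] = [];
  for (let i = 0; i < steps; i++) {
    const isLast = i === steps - 1;
    const delta = isLast ? totalPx - base * (steps - 1) : base;
    result.push({ delta_px: delta, direction });
  }
  return result;
}

// Example: three steps covering 1000 px downward.
const scrolls = splitScroll(1000, 3, "down");
```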

browser_wait Tool

New browser_wait MCP tool that waits for network activity to settle before returning. Tracks same-site in-flight
requests persistently per tab to detect when a page has finished loading dynamic content.

5s maximum wait with 150ms pre-wait and 350ms post-settle window
Persistent per-tab network request tracking
New REST endpoint: POST /api/v1/tabs/{id}/wait
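
One plausible reading of these timings can be modeled as a pure function: after the pre-wait, the wait returns once no tracked request has been in flight for a full post-settle window, and never later than the maximum. This is an interpretation for illustration, not ABP's implementation; same-site filtering is not modeled, and the names settleTime and RequestSpan are hypothetical.

```typescript
// Times are milliseconds measured from the moment the wait begins.
interface RequestSpan {
  startMs: number;
  endMs: number;
}

const MAX_WAIT_MS = 5000;   // "5s maximum wait"
const PRE_WAIT_MS = 150;    // "150ms pre-wait"
const POST_SETTLE_MS = 350; // "350ms post-settle window"

// Compute when the wait would return for a known set of request spans.
function settleTime(requests: RequestSpan[]): number {
  // The earliest quiet window begins right after the pre-wait.
  let quietStart = PRE_WAIT_MS;
  const sorted = requests.slice().sort((a, b) => a.startMs - b.startMs);
  for (const r of sorted) {
    // A request that starts after the quiet window has fully elapsed
    // arrives too late to delay the return.
    if (r.startMs >= quietStart + POST_SETTLE_MS) break;
    // Otherwise the quiet window cannot begin before this request ends.
    quietStart = Math.max(quietStart, r.endMs);
  }
  return Math.min(quietStart + POST_SETTLE_MS, MAX_WAIT_MS);
}
```

Under this model an idle page returns after 500 ms (150 ms + 350 ms), while a request that never completes is cut off at the 5 s cap.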

Frozen Screenshot Lifecycle

Action lifecycle reordered to freeze execution before capturing screenshots. Markup overlays now persist across the
pause boundary, so the screenshot always reflects the markup the agent will see.

ForceRedrawForTab and CleanupMarkupForTab extracted for per-tab control
Force-redraw and pause profiling added for performance visibility

Multi-Tab Execution Control

Execution control now handles multiple tabs correctly. When switching tabs, the previously active tab is
backgrounded and its pause/resume lifecycle is suspended, preventing CDP callbacks from interfering with the active
tab.

backgrounded flag on TabState gates execution control
Action lifecycle skips pause when tab is backgrounded
Browser tests added for multi-tab handoff

Action IDs

Every action response now includes a unique action_id string, making it easy to correlate actions with history
entries and screenshots.

String-based action IDs (replacing integer IDs in history)
action_id included in both success and error responses

browser_clear_text Tool

New MCP tool to clear text from input fields — select-all + delete in a single action.

NPM Package: User Profile Options

The agent-browser-protocol npm package now supports custom user data directories, profile directories, and user agent
strings.

New CLI flags: --user-data-dir, --profile-directory, --user-agent
Environment variables: ABP_USER_DATA_DIR, PROFILE_DIRECTORY, USER_AGENT
LaunchOptions API: userDataDir, profileDirectory, userAgent

Bug Fixes

Fixed custom app icon missing from release builds
Improved drag/slider timing (50 steps over 500ms)
Made macOS notarization more resilient
Fixed Linux release script
Fixed unsafe buffer access in GenerateActionId

Documentation

Added ABP-specific license file
Updated README and quickstart docs

Full Changelog: v0.1.4...v0.1.5

25 Feb 07:44

theredsix

v0.1.4

9d75fac

v0.1.4

ABP v0.1.4 Release Notes

Permission Interception

Full permission prompt interception — ABP now intercepts browser permission requests (geolocation,
notifications, camera, etc.) instead of letting them block the page. New REST endpoints and MCP tools
allow agents to grant or deny permissions programmatically.

AbpPermissionObserver intercepts permission prompts at the engine level
Granting geolocation accepts latitude, longitude, and accuracy parameters
AbpLocationProvider provides mock geolocation coordinates — no OS-level location dialog
New endpoints: GET /api/v1/permissions, POST /api/v1/permissions/{id}/grant, POST
/api/v1/permissions/{id}/deny
New MCP tool: respond_to_permission
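
The endpoints above could be wrapped roughly as follows. The sketch only builds request descriptions and sends nothing, so it can be passed to whatever HTTP client is in use. The endpoint paths come from the notes; the base URL, the JSON body field names, and the helper names are assumptions.

```typescript
// A plain description of an HTTP call; hypothetical shape for illustration.
interface HttpCall {
  method: "GET" | "POST";
  url: string;
  body: string | null;
}

// Assumed body fields, named after the parameters mentioned in the notes.
interface GeoGrant {
  latitude: number;
  longitude: number;
  accuracy: number;
}

const BASE_URL = "http://localhost:15678"; // assumed host and port

function listPermissions(): HttpCall {
  return { method: "GET", url: `${BASE_URL}/api/v1/permissions`, body: null };
}

// Grant a pending request; geolocation grants may carry mock coordinates.
function grantPermission(id: string, geo?: GeoGrant): HttpCall {
  return {
    method: "POST",
    url: `${BASE_URL}/api/v1/permissions/${encodeURIComponent(id)}/grant`,
    body: geo ? JSON.stringify(geo) : null,
  };
}

function denyPermission(id: string): HttpCall {
  return {
    method: "POST",
    url: `${BASE_URL}/api/v1/permissions/${encodeURIComponent(id)}/deny`,
    body: null,
  };
}
```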

Configurable Wait Timings

Action wait timings are now configurable via CLI flags, environment variables, or the SDK.

--min-wait — minimum wait time after actions (default 500ms)
--tracking-timeout — network/DOM tracking timeout (default 3000ms)
--post-settle — post-settle delay (default 500ms)
Also configurable via ABP_MIN_WAIT, ABP_TRACKING_TIMEOUT, ABP_POST_SETTLE env vars
SDK LaunchOptions now exposes these as typed options
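
A sketch of how the three sources might be combined. The flag names, environment variable names, and defaults come from the list above; the precedence order (explicit option, then environment variable, then default), the millisecond unit for environment values, and the names WaitTimings and resolveTimings are assumptions.

```typescript
// Hypothetical option shape; values are in milliseconds.
interface WaitTimings {
  minWaitMs: number;
  trackingTimeoutMs: number;
  postSettleMs: number;
}

// Defaults as stated in the release notes.
const DEFAULT_TIMINGS: WaitTimings = {
  minWaitMs: 500,
  trackingTimeoutMs: 3000,
  postSettleMs: 500,
};

// Assumed precedence: explicit option, then env var, then default.
function pick(
  option: number | undefined,
  envValue: string | undefined,
  fallback: number,
): number {
  if (option !== undefined) return option;
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    if (Number.isFinite(parsed) && parsed >= 0) return parsed;
  }
  return fallback;
}

function resolveTimings(
  options: Partial<WaitTimings>,
  env: Record<string, string | undefined>,
): WaitTimings {
  return {
    minWaitMs: pick(options.minWaitMs, env["ABP_MIN_WAIT"], DEFAULT_TIMINGS.minWaitMs),
    trackingTimeoutMs: pick(
      options.trackingTimeoutMs,
      env["ABP_TRACKING_TIMEOUT"],
      DEFAULT_TIMINGS.trackingTimeoutMs,
    ),
    postSettleMs: pick(options.postSettleMs, env["ABP_POST_SETTLE"], DEFAULT_TIMINGS.postSettleMs),
  };
}
```

Invalid or empty environment values fall through to the default rather than producing NaN delays.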

Human-Speed Typing

Keyboard type action now types at a more human-like speed to avoid breaking autocomplete and other
input-sensitive page behaviors.

ABP Branding

Custom ABP logo replaces default Chromium icons across all platforms
Icon generation script at tools/abp/

Bug Fixes

Fixed Environment::GetVar API usage and Options type include
Fixed Windows packaging — correct exe name, added manifests
Include chrome_crashpad_handler in Linux and Windows packages
Added mouse_drag fallback guidance to browser_slider tool description

Documentation

Updated quickstart with Claude Code, Codex, and HTTP mode MCP paths
Fixed MCP tool count in docs (12/13 → 14)

Full Changelog: v0.1.3...v0.1.4

24 Feb 04:21

theredsix

v0.1.3

e4d72b1

v0.1.3

Added support for native elements
Improved download and file upload handling

Full Changelog: https://github.com/theredsix/agent-browser-protocol/compare/v0.1.2...v0.1.3

22 Feb 05:03

theredsix

v0.1.2

921c752

v0.1.2

Added browser_slider action
Improved release scripts

Full Changelog: v0.1.1...v0.1.2

21 Feb 03:13

theredsix

v0.1.1

055aa11

v0.1.1

What's New

ABP v0.1.1 is a major quality and usability release — 94 commits covering performance,
stability, a streamlined MCP surface, and cross-platform builds.

MCP Tool Consolidation

Consolidated 30 MCP tools down to 12 with a new dispatch table architecture. Simpler tool
surface, same full capability. The embedded MCP server is available at /mcp with no sidecar
process.

Batch Execution

New /api/v1/tabs/{id}/batch endpoint for executing multiple actions in a single request —
reduces round-trips for multi-step workflows.

Performance

WebP encoding with Lanczos3 scaling for smaller, sharper screenshots
CoreAnimation delay reduced from 167ms to 50ms
Compositor-based scroll position — eliminates JS round-trip
Parallel pause + markup opt-in — action latency significantly reduced

Screenshot Markup Overhaul

Markup enabled by default — screenshots include interactive element overlays out of the box
Composable markup tags — mix and match interactive, text, image, selected, etc. as an array
disable_markup parameter to opt out of specific tags instead of opting in
selected tag — highlights the element under the cursor

New Actions

Drag and drop — POST /api/v1/tabs/{id}/drag
NormalizeKey utility for consistent keyboard input handling across platforms
US keyboard layout for type action with proper virtual cursor scaling

Stability Fixes

Prevent renderer DCHECK crashes during rapid cross-origin navigation with debugger pause
Fix Performance::SuspendObserver idempotency to prevent Blink DCHECK crash
Event-driven ForceRedraw recovery during navigation (no more dropped screenshots)
Fix cross-process navigation renderer crashes
Suppress crash recovery popup and session restore on startup
Auto-reattach debug server when ABP relaunches with new session
Global monotonicity clamp for BeginFrameArgs frame_time

Developer Experience

Renamed binary from chrome/chromium to abp (Linux/Windows) and ABP.app (macOS)
ABP always starts — no more --enable-abp flag required
Fixed viewport — 1280x800 locked at 100% zoom, no user resize
Debug UI — HTML dashboard with action forms, history panel, and SSE live updates
agent-browser-protocol npm package with Claude Code plugin and stdio MCP proxy

Cross-Platform Builds

Linux build working — tools/abp/release-linux.sh
Windows build working — tools/abp/release-win.ps1
Release scripts auto-read version from package.json
Simplified archive naming: abp-{version}-{platform}-{arch}
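
The naming pattern can be expressed as a one-line template. The helper name archiveBaseName and the example platform and architecture strings are assumptions; the notes do not say which values the release scripts use or which file extension is appended.

```typescript
// Build the archive base name following abp-{version}-{platform}-{arch}.
// The platform and arch values passed in are illustrative, not confirmed.
function archiveBaseName(version: string, platform: string, arch: string): string {
  return `abp-${version}-${platform}-${arch}`;
}

// Example with assumed platform and architecture strings.
const exampleName = archiveBaseName("0.1.1", "linux", "x64");
```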

Documentation

Rewritten README as MCP-first landing page
Standalone REST API reference
Standalone MCP server reference
Build-from-source guide for all platforms
Training data guide with SQLite schema and abp-debug

Full Changelog: 0.1.0...v0.1.1
