From b10de32a32af70ff744cc3bcfa1442eda2c73070 Mon Sep 17 00:00:00 2001 From: bcode Date: Sat, 9 May 2026 00:44:07 +0000 Subject: [PATCH] docs(skills): polish BROWSER.md Way 1 and Way 2 from harness install.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Bring the canonical Way 1/Way 2 facts from browser-use/browser-harness:install.md into BROWSER.md, translated to the session.connect() API. Surfaced during v0.1.0 testing where the agent was failing to connect locally. Way 1 - Explicit per-profile + sticky framing for the chrome://inspect checkbox. - Chrome 144+ popup-may-reappear caveat with concrete causes (daemon restart, browser restart, time elapsed, "Allow for N hours"). Tells the agent to expect this rather than treating a 403 on a previously working connection as a hard failure. - Failure-mode list mapping the two distinct error strings to the right user instruction. Way 2 - Per-platform launch command (Linux, macOS, Windows cmd, Windows PowerShell) so the agent can pick the right shell without guessing. - Default profileDir-based connect (reads DevToolsActivePort) instead of the ws:// URL form, which is more reliable. - Two precisions verbatim from harness install.md: 1. --user-data-dir must not be the platform default; Chrome 136+ silently no-ops the port flag in that case. Platform default paths listed for all three OSes. 2. Cookie encryption is bound to the original profile dir; copying a profile to a custom dir transfers bookmarks and extensions but not logins. Tells the agent to fall back to Way 1 if logins are needed instead of attempting the copy. - Fix old example: bare ws://host:port/devtools/browser without a UUID suffix does not work — Chrome's browser-level endpoint includes a per-process UUID. The wsUrl form is now scoped as an escape hatch with a working example. Top-of-Connecting decision rule (Way 1 for real-browser tasks, Way 2 for unattended automation, cloud opt-in only) now appears before the three Way blocks. Source: https://github.com/browser-use/browser-harness/blob/main/install.md +32 lines net (162 -> 194). No code touched; tests still 8 pass + 5 chrome-gated skip; resolveSkillsDir still substitutes {{SKILLS_DIR}} to zero literal matches in materialized BROWSER.md. --- packages/bcode-browser/skills/BROWSER.md | 52 +++++++++++++++++++----- 1 file changed, 42 insertions(+), 10 deletions(-) diff --git a/packages/bcode-browser/skills/BROWSER.md b/packages/bcode-browser/skills/BROWSER.md index 9a77d845d..6f3452652 100644 --- a/packages/bcode-browser/skills/BROWSER.md +++ b/packages/bcode-browser/skills/BROWSER.md @@ -13,31 +13,63 @@ Use the `browser_execute` tool to run JavaScript against a connected browser via ## Connecting -You always call `session.connect(...)` once at the start of your work. The `Session` is fresh on the first `browser_execute` call of an opencode session; subsequent calls reuse it. Three connection methods, in order of preference for typical tasks: +You always call `session.connect(...)` once at the start of your work. The `Session` is fresh on the first `browser_execute` call of an opencode session; subsequent calls reuse it. Three connection methods, in order of preference for typical tasks. -**Way 1 — connect to the user's running Chrome (real profile, popup-gated).** Best when the task involves the user's actual logged-in sites. +For most tasks where the agent acts on behalf of the user in their normal browser, use **Way 1**. For automation that runs without the user watching, or any case where popup interruptions are unacceptable, use **Way 2** or a cloud browser. Cloud is only used when the user opts in. + +**Way 1 — connect to the user's running Chrome (real profile, popup-gated).** Inherits the user's everyday Chrome logins, extensions, history, and bookmarks. Right choice when the task involves the user's actual logged-in sites. ```js // Auto-detect the most-recently-launched Chrome with remote debugging enabled. await session.connect() ``` -The user must have ticked "Allow remote debugging for this browser instance" once at `chrome://inspect/#remote-debugging` (sticky per-profile), and on Chrome 144+ click "Allow" on the in-browser popup at first attach. If `connect()` fails with a 403/permission message, ask the user to do this. To wait for the click instead of erroring fast, pass `{ profileDir: "/abs/path", timeoutMs: 30000 }`. +For this to work the user must have, **once**, navigated to `chrome://inspect/#remote-debugging` in their target Chrome and ticked "Allow remote debugging for this browser instance". This setting is per-profile and sticky: tick it once and it persists across every future Chrome launch of that profile. On Chrome 144 and later, the first attach also triggers an in-browser "Allow remote debugging?" popup that the user must click Allow on. The popup may reappear on later attaches under conditions that are not fully characterized — daemon restart, browser restart, time elapsed, version-dependent options like "Allow for N hours" — so be ready to ask the user to click Allow again if a previously working connection starts 403'ing. + +Failure modes and what they mean: + +- **`connect()` throws "No running browser with remote debugging detected"** — the checkbox at `chrome://inspect/#remote-debugging` has not been ticked in any running Chrome profile, or no Chrome is running. Ask the user to open their target Chrome and tick the box. +- **`connect()` throws with "403" / "permission" / "WS closed before open"** — the checkbox is ticked but the user hasn't clicked Allow on the popup yet. By default `connect()` errors fast (5s per candidate). To wait up to 30s for the click: pass `{ profileDir: "", timeoutMs: 30000 }`. Passing `profileDir` skips the OS scan and reads the WebSocket URL straight from `/DevToolsActivePort` — works on every Chrome version including 144+ which doesn't serve `/json/version`. + +**Way 2 — connect to a Chrome you (or the user) launched with a debug port (isolated profile, no popups, ever).** Right choice for unattended automation, or whenever popup interruptions are unacceptable. -**Way 2 — connect to a Chrome you (or the user) launched with a debug port (isolated profile, no popups).** Best for unattended automation. +Launch Chrome with `--remote-debugging-port= --user-data-dir=`: ```bash -# User runs this once (or you run it via the `bash` tool): +# Linux google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/bcode-chrome + +# macOS +"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ + --remote-debugging-port=9222 --user-data-dir=/tmp/bcode-chrome + +# Windows (cmd.exe) +"C:\Program Files\Google\Chrome\Application\chrome.exe" ^ + --remote-debugging-port=9222 --user-data-dir=C:\bcode-chrome + +# Windows (PowerShell) +& "C:\Program Files\Google\Chrome\Application\chrome.exe" ` + --remote-debugging-port=9222 --user-data-dir=C:\bcode-chrome ``` +Then connect to it from a snippet — pass the same `--user-data-dir` value as `profileDir` and `connect()` reads the live WebSocket URL out of `/DevToolsActivePort`: + +```js +await session.connect({ profileDir: "/tmp/bcode-chrome" }) // or "C:\\bcode-chrome" on Windows +``` + +Two precisions on the `--user-data-dir`: + +- **It must not be Chrome's platform default.** Chrome 136 and later silently no-op the `--remote-debugging-port` flag when `--user-data-dir` is the platform default, even if you pass it explicitly. The platform defaults are `%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux. An empty or new path gives a fresh clean profile that Chrome will persist there across future launches. +- **You cannot reuse the user's everyday Chrome profile by copying its files into a custom directory.** Chrome will accept the flag and start, so it looks like it works — but cookies are encrypted under a key bound to the *original* directory and will not survive the copy. Bookmarks and extensions transfer; logged-in sessions do not. If you need the user's real logins, use Way 1. + +If you have a `wsUrl` directly (e.g. from `fetch("http://127.0.0.1:9222/json/version").then(r => r.json()).then(j => j.webSocketDebuggerUrl)`), you can also pass it as the escape hatch: + ```js -await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser" }) -// or, if you know the profile dir: -await session.connect({ profileDir: "/tmp/bcode-chrome" }) +await session.connect({ wsUrl: "ws://127.0.0.1:9222/devtools/browser/" }) ``` -The `--user-data-dir` must NOT be Chrome's platform default (`%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/Library/Application Support/Google/Chrome` on macOS, `~/.config/google-chrome` on Linux) — Chrome 136+ silently no-ops the port flag in that case. +The bare `ws://host:port/devtools/browser` form (no UUID suffix) does not work — Chrome's browser-level endpoint includes a per-process UUID. Prefer `{ profileDir }` unless you specifically need the WS URL form. **Way 3 — provision and connect to a Browser Use cloud browser.** Best when the user can't see the browser, you need a clean profile, geo-located proxy, or fingerprint isolation. Read `{{SKILLS_DIR}}/cloud-browser.md` for the full pattern (provision, stop, swap profile/proxy). Briefly: @@ -158,5 +190,5 @@ Cache-bust (`?t=${Date.now()}`) is your responsibility: without it, edits to the - **`session.Page.navigate` hangs forever** → the page is showing a native dialog. Use `session.Page.handleJavaScriptDialog({ accept: true })` to dismiss. - **Selectors don't find elements that you can see** → likely an iframe or shadow DOM. Read `{{SKILLS_DIR}}/interaction-skills/iframes.md` or `shadow-dom.md`. - **Actions silently no-op** → the page is mid-load. After `Page.navigate`, await `session.waitFor("Page.loadEventFired")` before driving inputs. -- **Connection refused or 403 on connect()** → Chrome wasn't started with `--remote-debugging-port`, or the user hasn't clicked "Allow" on the remote-debugging prompt. Pass `{ profileDir, timeoutMs: 30000 }` to wait for the click, or fall back to Way 2. +- **Connection refused, 403, or `WS closed before open` on connect()** → see the Way 1 failure-mode list above. Most often: the `chrome://inspect/#remote-debugging` checkbox isn't ticked, or the Chrome 144+ "Allow remote debugging?" popup hasn't been clicked. Pass `{ profileDir, timeoutMs: 30000 }` to wait up to 30s for the click, or fall back to Way 2. - **Cloud `connect()` fails after a successful provision** → check that `cdp_url` came back in the POST response; some BU regions return `cdpUrl` (camelCase) — accept both. See `{{SKILLS_DIR}}/cloud-browser.md`.