From a9a69778e65b9a7ed4be6001303260ce05d7b59f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Magnus=20M=C3=BCller?= Date: Sun, 10 May 2026 15:45:54 +0000 Subject: [PATCH] docs: add self-modifying harness edge benchmark --- README.md | 5 + docs/edge-case-benchmark.html | 328 +++++++++++++++++++++++++++++++++ docs/self-modifying-harness.md | 148 +++++++++++++++ 3 files changed, 481 insertions(+) create mode 100644 docs/edge-case-benchmark.html create mode 100644 docs/self-modifying-harness.md diff --git a/README.md b/README.md index ab7f6a5d..399bdcd5 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,11 @@ Click Allow when the per-attach popup appears (Chrome 144+): See [agent-workspace/domain-skills/](agent-workspace/domain-skills/) for example tasks. +For the core self-healing pattern, see +[docs/self-modifying-harness.md](docs/self-modifying-harness.md). It includes +four concrete edge cases and a standalone benchmark page for canvas signature, +file upload, drag-and-drop, and coordinate-click tasks. + ## Free Browser Use Cloud browsers Stealth, sub-agents, or headless deployment.
diff --git a/docs/edge-case-benchmark.html b/docs/edge-case-benchmark.html new file mode 100644 index 00000000..3abf7051 --- /dev/null +++ b/docs/edge-case-benchmark.html @@ -0,0 +1,328 @@ + + + + + + Browser Harness Edge-case Benchmark + + + +
+
+

Browser Harness Edge-case Benchmark

+

Four browser mechanics that usually force an agent to inspect, patch, retry, and verify.

+
+ +
+
+

1. Canvas signature

+ +
Draw a stroke across the signature pad.
+
+ +
+

2. File upload

+
+ +
+
Upload any local file.
+
+ +
+

3. Drag and drop

+
+
DRAG ME
+
Drop here
+
+
Move the card into the drop zone.
+
+ +
+

4. Coordinate target

+
+ +
+
Click the red target by visible coordinates.
+
+
+ +
+
+ + + + diff --git a/docs/self-modifying-harness.md b/docs/self-modifying-harness.md new file mode 100644 index 00000000..11ae13b7 --- /dev/null +++ b/docs/self-modifying-harness.md @@ -0,0 +1,148 @@ +# Self-modifying browser harness + +Browser Harness is intentionally thin: it gives the agent direct Chrome +DevTools Protocol access plus an editable workspace. When a browser task hits a +missing mechanic, the agent should add the smallest helper or skill that makes +the task work, use it, and keep that reusable code in `agent-workspace/`. + +That changes how to think about edge cases. A signature pad, a canvas-only UI, +a custom drag target, or a hidden file input is not a permanent product limit. +It is a prompt to inspect the page, patch the harness, retry, and save the +working path. + +## The loop + +1. Reproduce the blocked interaction in the browser. +2. Inspect with screenshots first, then DOM and raw CDP only when needed. +3. Add a focused helper in `agent-workspace/agent_helpers.py` or a durable + site note in `agent-workspace/domain-skills//`. +4. Run the helper against the real page. +5. Verify with a screenshot or page-state read. +6. Keep the helper small enough that the next agent can understand and edit it. + +Core helpers stay generic. Site-specific selectors, timing, and private API +knowledge belong in the agent workspace or domain skills. + +## Example: signature or canvas field + +Problem: there is no real `` to fill. The site expects pointer events on +a canvas. + +Patch shape: + +```python +def draw_signature_on_canvas(selector, points): + box = js(f""" + (() => {{ + const c = document.querySelector({selector!r}); + const r = c.getBoundingClientRect(); + return {{x:r.left, y:r.top, w:r.width, h:r.height}}; + }})() + """) + for i, (x, y) in enumerate(points): + event_type = "mousePressed" if i == 0 else "mouseMoved" + cdp("Input.dispatchMouseEvent", type=event_type, + x=box["x"] + x, y=box["y"] + y, button="left", clickCount=1) + cdp("Input.dispatchMouseEvent", type="mouseReleased", + x=box["x"] + points[-1][0], y=box["y"] + points[-1][1], + button="left", clickCount=1) +``` + +Use screenshot coordinates to choose the visible stroke path, then verify by +reading the page state or taking another screenshot. + +## Example: file upload + +Problem: the visible button opens an OS picker, which an agent cannot use +directly. + +Patch shape: + +```python +def upload_visible_or_hidden_file(selector, path): + upload_file(selector, path) + js(f""" + (() => {{ + const input = document.querySelector({selector!r}); + input.dispatchEvent(new Event("input", {{bubbles:true}})); + input.dispatchEvent(new Event("change", {{bubbles:true}})); + }})() + """) +``` + +Prefer `DOM.setFileInputFiles` through `upload_file()`. If the file input is +created lazily, first click the visible upload button, wait for the input, then +set the file. + +## Example: drag and drop + +Problem: the site uses custom drag events or a drop zone that does not respond +to a simple click. + +Patch shape: + +```python +def drag_center_to_center(source_selector, target_selector): + boxes = js(f""" + (() => {{ + const s = document.querySelector({source_selector!r}).getBoundingClientRect(); + const t = document.querySelector({target_selector!r}).getBoundingClientRect(); + return {{ + sx: s.left + s.width / 2, sy: s.top + s.height / 2, + tx: t.left + t.width / 2, ty: t.top + t.height / 2 + }}; + }})() + """) + cdp("Input.dispatchMouseEvent", type="mousePressed", + x=boxes["sx"], y=boxes["sy"], button="left", clickCount=1) + cdp("Input.dispatchMouseEvent", type="mouseMoved", + x=boxes["tx"], y=boxes["ty"], button="left") + cdp("Input.dispatchMouseEvent", type="mouseReleased", + x=boxes["tx"], y=boxes["ty"], button="left", clickCount=1) +``` + +If compositor-level movement does not trigger the app, inspect whether the app +expects HTML5 `DataTransfer` events and add a DOM-specific helper for that +site. + +## Example: coordinate-only target + +Problem: a visible control has no stable selector, sits inside a canvas, or is +inside cross-origin UI where DOM inspection is the wrong tool. + +Patch shape: + +```python +def click_visible_point(x, y): + click_at_xy(x, y) + wait(0.2) + capture_screenshot() +``` + +Use `capture_screenshot()` to locate the visible target. Keep the coordinate in +the task script, not in a public domain skill, unless the layout is fixed and +the skill also records viewport assumptions. + +## Local benchmark + +`docs/edge-case-benchmark.html` is a standalone page that exercises the four +mechanics above: + +- canvas signature +- file upload +- drag and drop +- coordinate click + +Open it with Browser Harness when changing helpers: + +```bash +browser-harness -c ' +new_tab("file:///absolute/path/to/docs/edge-case-benchmark.html") +wait_for_load() +print(page_info()) +' +``` + +The page exposes `window.edgeBenchmark.results` for quick verification from +`js(...)`. +