Add ui-test skill — adversarial UI testing with browse CLI#56
Open
Builds on #52 with three key additions:
1. Local/remote mode selection: localhost uses a local browser (no API key); deployed sites use Browserbase via cookie-sync for authenticated testing
2. Diff-driven testing: analyze git diff, generate targeted tests for what changed, execute with before/after snapshot comparison
3. Structured assertion protocol: STEP_PASS/STEP_FAIL markers with evidence, deterministic checks (axe-core, console errors, overflow detection), and adversarial testing patterns (XSS, empty submit, rapid click, keyboard-only)

Smoke-tested against a local Next.js app: found real bugs (Escape not closing modals, undersized mobile touch targets) that confirmed the adversarial patterns work. Fixed browse eval recipes (no top-level await, console capture on-page not about:blank).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ssions Enables concurrent test execution by leveraging browse CLI's --session flag to spin up independent Browserbase browsers per test group, with fan-out via Agent tool and merged result reporting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
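The "merged result reporting" step could look roughly like the sketch below. It is illustrative only: the function name `merge_session_results`, the `(step_id, status)` tuple shape, and the `session:step` failure label are assumptions made for this example, not the skill's actual report format.

```python
from collections import Counter
from typing import Dict, List, Tuple


def merge_session_results(
    results_by_session: Dict[str, List[Tuple[str, str]]]
) -> dict:
    """Merge per-session step results into one summary.

    Each value is a list of (step_id, status) pairs, where status is
    "pass" or "fail". This shape is hypothetical.
    """
    totals = Counter()
    failures = []
    # Sort by session name so the merged report is stable across runs.
    for session, results in sorted(results_by_session.items()):
        for step_id, status in results:
            totals[status] += 1
            if status == "fail":
                failures.append(f"{session}:{step_id}")
    return {
        "passed": totals["pass"],
        "failed": totals["fail"],
        "failures": failures,
    }


summary = merge_session_results({
    "checkout": [("cart-add", "pass"), ("pay-empty-submit", "fail")],
    "auth": [("login-ok", "pass")],
})
print(summary)
```

Prefixing each failure with its session name keeps failures traceable to the test group that produced them once the per-session results are flattened.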
Documents how to add Bash(browse:*) to project or user settings so users don't get prompted on every browse snapshot/click/eval. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t figure it out
- Remove .ui-tests/suite.yml format and generation pipeline
- Replace Workflow B (8-step codebase analysis) with lightweight exploratory testing
- Simplify references/codebase-analysis.md to quick hints (framework detection, route finding)
- Remove example YAML suite file
- Update README to reflect no-artifacts philosophy
- Drop Write tool from allowed-tools (no files to generate)

The codegen/suite approach can ship as v2 later.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- XSS check: replace false-positive inline script count with input value check
- Console capture: preserve original console.error in Examples 6 snippets
- Form labels: use native i.labels API in browser-recipes.md (matches SKILL.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Also strengthens auto-select rule: localhost → browse env local, deployed URLs → browse env remote, applied consistently across all workflows including parallel sessions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
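A minimal sketch of the auto-select rule, assuming the decision is made from the target URL's hostname alone. Only the two outcomes (`browse env local` and `browse env remote`) come from the commit message; the helper name `choose_browse_env` and the set of hosts treated as local are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hosts treated as "local" in this sketch. This list is an assumption,
# not the skill's actual rule.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def choose_browse_env(url: str) -> str:
    """Return "local" for loopback-style URLs and "remote" otherwise."""
    host = urlparse(url).hostname or ""
    if host in LOCAL_HOSTS or host.endswith(".localhost"):
        return "local"
    return "remote"


for target in ("http://localhost:3000/settings", "https://app.example.com/login"):
    print(f"{target} -> browse env {choose_browse_env(target)}")
```

The sketch expects a full URL with a scheme; a bare `localhost:3000` has no parsable hostname and would fall through to "remote", so a real implementation would need to normalize its input first.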
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove "remote only" restriction: named sessions work with local mode
- Add BROWSE_SESSION=* permission pattern to avoid approval fatigue on parallel runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
If references/design-system.md exists, use it as ground truth. Otherwise, screenshot 2-3 existing pages to establish baseline patterns (spacing, radii, colors, typography) and compare the changed page against them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a test step fails, the skill now instructs the agent to take a screenshot and save it to .context/ui-test-screenshots/<step-id>.png, referenced in the STEP_FAIL marker and final report so developers can see exactly what went wrong. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
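As a rough illustration of the path convention described above, the helper below maps a step id to its screenshot location. The directory and `.png` suffix come from the commit message; the function name `screenshot_path` and the guard against path separators are assumptions of this sketch.

```python
from pathlib import PurePosixPath

# Directory named in the commit message.
SCREENSHOT_DIR = PurePosixPath(".context/ui-test-screenshots")


def screenshot_path(step_id: str) -> str:
    """Build the failure-screenshot path for a step id."""
    # Reject ids that could escape the screenshot directory. This guard
    # is an assumption of the sketch, not documented skill behavior.
    if not step_id or "/" in step_id or "\\" in step_id or step_id in {".", ".."}:
        raise ValueError(f"invalid step id: {step_id!r}")
    return str(SCREENSHOT_DIR / f"{step_id}.png")


print(screenshot_path("modal-escape"))
```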
Extracted from browserbase-brand-guidelines skill: colors, typography, border radii, spacing grid, component patterns, and visual principles. The ui-test skill checks changed pages against this when it exists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite Budget & Limits section to use a coordinator/sub-agent model: main agent plans and delegates, sub-agents do the actual testing with a 20-step hard cap each. Wall clock target ~10 min for default runs.
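One way to picture the per-sub-agent hard cap is a small counter that refuses work once the budget is spent. This is a sketch under stated assumptions: the 20-step cap comes from the commit message, while the `StepBudget` class, its `spend` method, and the `BudgetExceeded` exception are hypothetical names, and the skill itself expresses the limit as agent instructions rather than code.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a sub-agent tries to run more steps than its cap."""


class StepBudget:
    """Counts test steps against a hard cap (20 per sub-agent by default)."""

    def __init__(self, cap: int = 20) -> None:
        self.cap = cap
        self.used = 0

    def spend(self) -> int:
        """Record one step and return how many steps remain."""
        if self.used >= self.cap:
            raise BudgetExceeded(f"hard cap of {self.cap} steps reached")
        self.used += 1
        return self.cap - self.used


budget = StepBudget()
for _ in range(20):
    remaining = budget.spend()
print(f"steps remaining: {remaining}")
```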
e563ad6 to dc1eedd
Integrates two external frameworks into the testing skill:
- Judgement (Emil Kowalski + Josh Puckett + UI Wiki): 9 reference files covering animations, forms, touch/a11y, typography, polish, component design, marketing, performance, and 152 UI wiki rules. Adds deterministic eval checks for touch targets, iOS zoom, transition:all, z-index abuse, and form labels. Adds screenshot-based critique methodology.
- Luck (soleio): Assembly Theory meta-evaluation lens, with 7 facets adapted to UI (solvency, gradient coupling, compatibility, niche construction, circulation, integration, path sensitivity) for "will this UI thrive?"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip out the Craft Quality Judgement section, Luck Lens meta-evaluation, and all references/judgement/ files. Keep the skill focused on functional testing, accessibility, and UX heuristics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0cfbc08 to d94fa1b
Rename design-system.md → design-system.example.md with instructions for users to copy and fill in their own brand tokens. The skill reads design-system.md (user-created), not the example. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add browse wait timeout 3000 after axe-core script injection (SKILL.md, browser-recipes.md)
- Fix form label check to include aria-label and aria-labelledby (SKILL.md)
- Fix focus ring detection to check box-shadow too, not just outline (browser-recipes.md)
- Fix window.__capturedErrors → window.__logs in Example 8 (EXAMPLES.md)
- design-system.md already fixed in prior commit (renamed to .example.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add aria-labelledby to hasLabel check in browser-recipes.md
- Add browse wait timeout 3000 after axe-core injection in Examples 4 and 7
- hasFocus box-shadow check was already fixed in prior commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Summary
Adds a ui-test skill that uses the browse CLI to run AI-powered adversarial UI tests in a real browser. The agent analyzes git diffs to test only what changed (or explores the full app), checking functional correctness, accessibility, responsive layout, and UX heuristics.

Key features:
- git diff → targeted tests for changed pages/components
- STEP_PASS|id|evidence / STEP_FAIL|id|expected → actual with screenshot evidence

Files
Test plan
🤖 Generated with Claude Code
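To make the marker format in the summary concrete, here is a minimal parser sketch. It assumes each marker sits on its own line, that fields are separated by `|`, and that a failure detail splits on the first `→`. The `StepResult` dataclass and `parse_marker` function are hypothetical names, and how the skill actually consumes these markers is not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepResult:
    step_id: str
    passed: bool
    evidence: Optional[str] = None
    expected: Optional[str] = None
    actual: Optional[str] = None


def parse_marker(line: str) -> Optional[StepResult]:
    """Parse one STEP_PASS / STEP_FAIL line; return None for other text."""
    # Split into at most three fields so the detail may itself contain "|".
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    kind, step_id, detail = parts
    if kind == "STEP_PASS":
        return StepResult(step_id, True, evidence=detail)
    if kind == "STEP_FAIL":
        # Failure detail has the form "expected → actual".
        expected, arrow, actual = detail.partition("→")
        return StepResult(
            step_id,
            False,
            expected=expected.strip(),
            actual=actual.strip() if arrow else None,
        )
    return None


print(parse_marker("STEP_PASS|nav-1|heading visible"))
print(parse_marker("STEP_FAIL|modal-esc|modal closes → modal still open"))
```

Returning `None` for unrecognized lines lets a caller scan mixed agent output and keep only the markers.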
Note
Low Risk
Low risk because this PR only adds new documentation/skill content and does not modify runtime application logic. The main risk is maintenance/expectation mismatch if the documented browse/Browserbase behaviors change.

Overview
Introduces a new skills/ui-test skill that standardizes agentic UI QA in a real browser via the browse CLI, including diff-driven, exploratory, and parallel (multi-session) workflows with a strict before/after assertion protocol.

Adds extensive supporting docs: worked end-to-end examples (EXAMPLES.md), deterministic check recipes (axe-core, console errors, broken images, responsive/overflow/touch-target checks), and UX/a11y heuristic reference material, plus MIT licensing and install/usage guidance.

Written by Cursor Bugbot for commit 28e875a.