Restructure budget system for sub-agent architecture
Rewrite Budget & Limits section to use a coordinator/sub-agent model:
main agent plans and delegates, sub-agents do the actual testing with
a 20-step hard cap each. Wall clock target ~10 min for default runs.
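
The ~10 min wall-clock target can be checked with a little arithmetic. The sketch below assumes the figures stated in the diff (about 30 seconds per `browse` command, 20 steps per sub-agent, up to 5 sub-agents) and further assumes that sub-agents run fully in parallel with no coordinator overhead. The function and constant names are hypothetical and do not come from the repository.

```python
from __future__ import annotations

# Figures taken from the new "Budget & Limits" text; names are illustrative only.
SECONDS_PER_STEP = 30      # approximate cost of one `browse` command
MAX_STEPS_PER_AGENT = 20   # hard cap per sub-agent
MAX_SUB_AGENTS = 5         # sub-agents per run


def agent_minutes(steps: int, seconds_per_step: float = SECONDS_PER_STEP) -> float:
    """Estimated minutes one sub-agent needs for the given number of steps."""
    return steps * seconds_per_step / 60


def run_wall_clock_minutes(steps_per_agent: list[int]) -> float:
    """Estimated wall clock for a run, assuming all sub-agents run in parallel.

    The slowest sub-agent determines the result, which is why the
    per-agent cap matters more than the total step count.
    """
    if len(steps_per_agent) > MAX_SUB_AGENTS:
        raise ValueError("too many sub-agents for one run")
    if any(s > MAX_STEPS_PER_AGENT for s in steps_per_agent):
        raise ValueError("a sub-agent exceeds the per-agent step cap")
    if not steps_per_agent:
        return 0.0
    return max(agent_minutes(s) for s in steps_per_agent)


total_step_cap = MAX_SUB_AGENTS * MAX_STEPS_PER_AGENT
```

Under these assumptions a run with sub-agents using 20, 15, and 10 steps is bounded by the 20-step agent, so adding more agents raises the total step count without raising the wall clock.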
skills/ui-test/SKILL.md
49 additions & 14 deletions
@@ -20,24 +20,59 @@ Three workflows:
 
 ## Budget & Limits
 
-Every test run has a budget. Finish and report within these limits — don't try to test everything.
+Every test run has a budget. The main agent **coordinates** — it analyzes the diff, plans test groups, and fans out work to sub-agents. Sub-agents do the actual testing.
 
-| Limit | Default | Notes |
-|-------|---------|-------|
-| **Max test steps** | 25 per run | Each STEP_PASS/STEP_FAIL counts as one step |
-| **Max steps per agent** | 15 | For parallel runs (Workflow C), each sub-agent gets 15 |
-| **Stop-early on failures** | 5 failures | Once you hit 5 STEP_FAILs, stop testing and report what you have |
-| **Max pages to visit** | 8 | Don't crawl the entire app — focus on changed or high-value pages |
+### Time math
 
-**How to stay within budget:**
-- Prioritize ruthlessly. Test the riskiest changes first (forms, auth, data mutation), skip low-risk cosmetic checks if you're running out of steps.
-- Count your steps as you go. When you're at 80% of your budget, wrap up the current test group and move to reporting.
-- Prefer deterministic checks (axe-core, console errors) over manual exploration — they cover more ground per step.
-- If the user provides a custom budget (e.g., "quick test" or "thorough test"), adjust: quick = 10 steps, thorough = 40 steps.
+Each `browse` command takes ~30 seconds. A sub-agent doing 20 steps ≈ 10 min. The bottleneck is the **slowest sub-agent** — if one runs away, the whole run stalls waiting for it.
 
-**At the end of every run, include a budget line in your report:**
+### Budget structure
+
+| Role | Limit | Why |
+|------|-------|-----|
+| **Main agent** | Coordinator only — no `browse` commands | It plans, delegates, merges. Zero testing. |
+| **Sub-agent** | **20 steps max** (~10 min each) | Hard cap. Stop and report at 20 even if there's more to test. |
+| **Max sub-agents** | 5 per run | More agents × fewer steps = fast wall clock |
+| **Max pages per agent** | 3 | Keep each agent tightly focused |
+
+**Total cap: ~100 test steps per run** (5 agents × 20 steps). Wall clock target: **~10 min**.
+
+No early stopping on failures — find as many bugs as possible within the step budget.
+
+The key constraint is **per-agent**: 20 steps max, no exceptions. It's better to split work across more focused agents than to let one agent go deep.
+
+### How the main agent should work
+
+1. **Analyze** — read the diff, categorize changes, identify URLs to test
+2. **Plan** — split into small, focused groups (1-2 pages per group, one test category each)
+3. **Delegate** — launch up to 5 sub-agents in parallel, each with a tight scope and 20-step budget
+4. **Merge** — collect results, produce the final report
+
+The main agent should NOT run `browse` commands itself (except to verify the dev server is up). All testing happens in sub-agents.
+
+**Splitting rules:**
+- Each sub-agent gets 1-2 pages and one test category (e.g., "signup form validation", "dashboard accessibility", "nav + routing")
+- If a page needs both functional and accessibility testing, split into two agents
+- Prefer 5 agents × 15 steps over 3 agents × 20 steps — smaller scope = faster, more focused
+
+### Adjusting the budget
+
+| User says | Steps per agent | Max agents | Wall clock |
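
The "Plan" step and the splitting rules in the diff can be illustrated with a small Python sketch. It assumes the coordinator already has a list of (page, test category) pairs from its diff analysis. `AgentTask`, `plan_groups`, and the idea of returning overflow groups as "deferred" are hypothetical choices made for illustration; the diff does not say what should happen when more than 5 groups are needed.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Limits taken from the diff; all names here are illustrative, not from the repo.
MAX_SUB_AGENTS = 5
STEP_BUDGET = 20
PAGES_PER_GROUP = 2  # splitting rule: 1-2 pages per sub-agent


@dataclass(frozen=True)
class AgentTask:
    """Scope handed to one sub-agent: one test category, at most two pages."""
    category: str
    pages: tuple[str, ...]
    step_budget: int = STEP_BUDGET


def plan_groups(
    targets: list[tuple[str, str]],
) -> tuple[list[AgentTask], list[AgentTask]]:
    """Split (page, category) pairs into sub-agent tasks.

    Pages are grouped by category so that no task mixes categories, then
    chunked into groups of at most PAGES_PER_GROUP pages. Tasks beyond the
    sub-agent limit are returned separately (an assumption of this sketch).
    """
    by_category: dict[str, list[str]] = defaultdict(list)
    for page, category in targets:
        if page not in by_category[category]:  # ignore duplicate pairs
            by_category[category].append(page)

    tasks: list[AgentTask] = []
    for category, pages in by_category.items():
        for i in range(0, len(pages), PAGES_PER_GROUP):
            tasks.append(AgentTask(category, tuple(pages[i:i + PAGES_PER_GROUP])))

    return tasks[:MAX_SUB_AGENTS], tasks[MAX_SUB_AGENTS:]
```

A page that needs both functional and accessibility testing appears in two pairs and therefore lands in two separate tasks, which matches the rule that such a page is split across two agents.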