Commit dc1eedd

shubh24 authored and shrey150 committed
Restructure budget system for sub-agent architecture
Rewrite Budget & Limits section to use a coordinator/sub-agent model: main agent plans and delegates, sub-agents do the actual testing with a 20-step hard cap each. Wall clock target ~10 min for default runs.
1 parent 2e2f7c8 commit dc1eedd

1 file changed

Lines changed: 49 additions & 14 deletions

skills/ui-test/SKILL.md

@@ -20,24 +20,59 @@ Three workflows:
 
 ## Budget & Limits
 
-Every test run has a budget. Finish and report within these limits — don't try to test everything.
+Every test run has a budget. The main agent **coordinates** — it analyzes the diff, plans test groups, and fans out work to sub-agents. Sub-agents do the actual testing.
 
-| Limit | Default | Notes |
-|-------|---------|-------|
-| **Max test steps** | 25 per run | Each STEP_PASS/STEP_FAIL counts as one step |
-| **Max steps per agent** | 15 | For parallel runs (Workflow C), each sub-agent gets 15 |
-| **Stop-early on failures** | 5 failures | Once you hit 5 STEP_FAILs, stop testing and report what you have |
-| **Max pages to visit** | 8 | Don't crawl the entire app — focus on changed or high-value pages |
+### Time math
 
-**How to stay within budget:**
-- Prioritize ruthlessly. Test the riskiest changes first (forms, auth, data mutation), skip low-risk cosmetic checks if you're running out of steps.
-- Count your steps as you go. When you're at 80% of your budget, wrap up the current test group and move to reporting.
-- Prefer deterministic checks (axe-core, console errors) over manual exploration — they cover more ground per step.
-- If the user provides a custom budget (e.g., "quick test" or "thorough test"), adjust: quick = 10 steps, thorough = 40 steps.
+Each `browse` command takes ~30 seconds. A sub-agent doing 20 steps ≈ 10 min. The bottleneck is the **slowest sub-agent** — if one runs away, the whole run stalls waiting for it.
 
-**At the end of every run, include a budget line in your report:**
+### Budget structure
+
+| Role | Limit | Why |
+|------|-------|-----|
+| **Main agent** | Coordinator only — no `browse` commands | It plans, delegates, merges. Zero testing. |
+| **Sub-agent** | **20 steps max** (~10 min each) | Hard cap. Stop and report at 20 even if there's more to test. |
+| **Max sub-agents** | 5 per run | More agents × fewer steps = fast wall clock |
+| **Max pages per agent** | 3 | Keep each agent tightly focused |
+
+**Total cap: ~100 test steps per run** (5 agents × 20 steps). Wall clock target: **~10 min**.
+
+No early stopping on failures — find as many bugs as possible within the step budget.
+
+The key constraint is **per-agent**: 20 steps max, no exceptions. It's better to split work across more focused agents than to let one agent go deep.
+
+### How the main agent should work
+
+1. **Analyze** — read the diff, categorize changes, identify URLs to test
+2. **Plan** — split into small, focused groups (1-2 pages per group, one test category each)
+3. **Delegate** — launch up to 5 sub-agents in parallel, each with a tight scope and 20-step budget
+4. **Merge** — collect results, produce the final report
+
+The main agent should NOT run `browse` commands itself (except to verify the dev server is up). All testing happens in sub-agents.
+
+**Splitting rules:**
+- Each sub-agent gets 1-2 pages and one test category (e.g., "signup form validation", "dashboard accessibility", "nav + routing")
+- If a page needs both functional and accessibility testing, split into two agents
+- Prefer 5 agents × 15 steps over 3 agents × 20 steps — smaller scope = faster, more focused
+
+### Adjusting the budget
+
+| User says | Steps per agent | Max agents | Wall clock |
+|-----------|----------------|------------|------------|
+| "quick test" | 10 | 2 | ~5 min |
+| (default) | 20 | 5 | ~10 min |
+| "thorough test" | 30 | 5 | ~15 min |
+
+### Budget reporting
+
+**Every sub-agent must include a budget line when reporting back:**
+```
+Budget: 14/20 steps used | 2 pages visited | 3 failures
+```
+
+**The main agent includes a total in the final report:**
 ```
-Budget: 18/25 steps used | 3 pages visited | 1 failure
+Total budget: 62/100 steps across 5 agents | ~10 min wall clock | 7 failures
 ```
 
 ## Testing Philosophy

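The budget arithmetic in this change can be sanity-checked with a small sketch. The Python below is illustrative only and is not part of the commit or the skill: `Preset`, `AgentReport`, `agent_budget_line`, and `total_budget_line` are hypothetical names. It assumes one `browse` command per step at ~30 seconds each, that sub-agents run fully in parallel, and that the total cap is steps per agent times the maximum number of agents. The preset values and the two report-line formats are taken from the added lines of the diff.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: one `browse` command per step, ~30 s each (the diff's estimate).
SECONDS_PER_STEP = 30


@dataclass(frozen=True)
class Preset:
    """Hypothetical container for one row of the 'Adjusting the budget' table."""
    steps_per_agent: int
    max_agents: int

    @property
    def total_step_cap(self) -> int:
        # Total cap = steps per agent x max agents (5 x 20 = 100 by default).
        return self.steps_per_agent * self.max_agents

    @property
    def wall_clock_minutes(self) -> float:
        # Agents run in parallel, so one agent's full budget bounds the run.
        return self.steps_per_agent * SECONDS_PER_STEP / 60


PRESETS = {
    "quick": Preset(steps_per_agent=10, max_agents=2),
    "default": Preset(steps_per_agent=20, max_agents=5),
    "thorough": Preset(steps_per_agent=30, max_agents=5),
}


@dataclass(frozen=True)
class AgentReport:
    """Hypothetical summary of what one sub-agent reports back."""
    steps_used: int
    pages_visited: int
    failures: int


def _plural(count: int, word: str) -> str:
    # "1 failure" vs "3 failures", matching the wording in the diff.
    return f"{count} {word}" if count == 1 else f"{count} {word}s"


def agent_budget_line(report: AgentReport, preset: Preset) -> str:
    """Format the per-agent budget line."""
    return (
        f"Budget: {report.steps_used}/{preset.steps_per_agent} steps used"
        f" | {_plural(report.pages_visited, 'page')} visited"
        f" | {_plural(report.failures, 'failure')}"
    )


def total_budget_line(reports: Sequence[AgentReport], preset: Preset) -> str:
    """Format the main agent's total line from the sub-agent reports.

    The wall-clock figure is the preset's target, not a measured duration.
    """
    steps = sum(r.steps_used for r in reports)
    failures = sum(r.failures for r in reports)
    return (
        f"Total budget: {steps}/{preset.total_step_cap} steps"
        f" across {_plural(len(reports), 'agent')}"
        f" | ~{preset.wall_clock_minutes:g} min wall clock"
        f" | {_plural(failures, 'failure')}"
    )
```

Under these assumptions the default preset gives a 100-step cap and a 10-minute target, and the quick and thorough presets give 5 and 15 minutes, which agrees with the table in the diff. A real coordinator would likely report measured wall-clock time instead of the target.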