Commit 80755c3

Merge pull request #186 from maystudios/worktree-agent-a0bcccc3
fix(self-improvement): TSV columns, two-batch wizard, two-stage review
2 parents 86e0d71 + eab545a commit 80755c3

2 files changed: +83 -9 lines changed

templates/commands/maxsim/improve.md

Lines changed: 15 additions & 9 deletions
@@ -25,14 +25,20 @@ Invoke the `autoresearch` skill to drive the optimization loop. Invoke the `veri
 **Phase 1 — Setup (Plan Mode)**

 1. Enter Plan Mode via EnterPlanMode
-2. Gather loop parameters via AskUserQuestion:
-- **Metric command** — the command whose output is the optimization target (from $ARGUMENTS or ask)
-- **Guard command** — regression check that must always pass (e.g., `npm test`)
-- **Direction** — minimize or maximize the metric
-- **Iteration budget** — max iterations before stopping (default: 20)
-- **Scope** — which files/directories are in-scope for modification
-3. Show the proposed loop configuration and confirm with user
-4. Exit Plan Mode via ExitPlanMode
+2. Gather loop parameters via two AskUserQuestion calls:
+**Batch 1** (required — 4 questions):
+- Metric command (the command to run and extract a number from)
+- Guard command (regression check, e.g., `npm test`)
+- Metric direction (`lower_is_better` or `higher_is_better`)
+- Iteration budget (default: 20)
+
+**Batch 2** (scope and constraints — 3 questions):
+- Scope (files/directories to modify)
+- Files to NEVER modify (test files, guard files, config)
+- Starting approach (optional — first idea to try)
+3. Dry-run: Execute the metric command once to establish baseline. Execute the guard command to confirm it passes. If either fails, ask the user to fix before proceeding.
+4. Show the proposed loop configuration and confirm with user
+5. Exit Plan Mode via ExitPlanMode

 **Phase 2 — Optimization Loop**
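
Reader's note (not part of the commit): the two question batches and the dry-run step in the hunk above could be modelled roughly as follows. This is a minimal sketch under stated assumptions. `LoopConfig`, `extract_metric`, and `dry_run` are hypothetical names, and taking the last number printed by the metric command is an assumed convention, since the template only says the command is one "to run and extract a number from".

```python
import re
import subprocess
from dataclasses import dataclass, field
from typing import List, Optional

DIRECTIONS = ("lower_is_better", "higher_is_better")


@dataclass
class LoopConfig:
    # Batch 1 (required, 4 questions)
    metric_command: str
    guard_command: str
    direction: str
    iteration_budget: int = 20  # default stated in the template
    # Batch 2 (scope and constraints, 3 questions)
    scope: List[str] = field(default_factory=list)
    never_modify: List[str] = field(default_factory=list)
    starting_approach: Optional[str] = None  # optional first idea to try

    def __post_init__(self) -> None:
        if self.direction not in DIRECTIONS:
            raise ValueError(f"direction must be one of {DIRECTIONS}")
        if self.iteration_budget < 1:
            raise ValueError("iteration_budget must be at least 1")


_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def extract_metric(output: str) -> float:
    """Return the last number found in the command output (assumed convention)."""
    matches = _NUMBER.findall(output)
    if not matches:
        raise ValueError("metric command printed no number")
    return float(matches[-1])


def dry_run(config: LoopConfig) -> float:
    """Run metric and guard once; return the baseline or raise so the user can fix things."""
    metric = subprocess.run(
        config.metric_command, shell=True, capture_output=True, text=True
    )
    if metric.returncode != 0:
        raise RuntimeError("metric command failed; fix it before starting the loop")
    baseline = extract_metric(metric.stdout)
    guard = subprocess.run(config.guard_command, shell=True)
    if guard.returncode != 0:
        raise RuntimeError("guard command failed; fix it before starting the loop")
    return baseline
```

Validating the direction at construction time keeps a mistyped value from silently inverting the keep/discard decision later in the loop.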

@@ -46,7 +52,7 @@ Run the 8-phase autoresearch loop, one iteration at a time:
 6. **Guard** — run the guard command to check for regressions
 - Guard failure + verify pass → rework (max 2 attempts), then discard
 7. **Decide** — metric improved AND guard passed → keep; otherwise → `git revert HEAD --no-edit`
-8. **Log** — append iteration result to the TSV file (date, iteration, approach, metric-value, outcome, commit-hash, notes)
+8. **Log** — append iteration result to the TSV file (iteration, commit, metric, delta, guard, status, description)

 **Stuck Detection:**
 After 5 consecutive discards or crashes:
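
Reader's note (not part of the commit): the Decide rule and the new seven-column log row from the hunk above might look like this in code. It is a sketch, not the skill's actual implementation. The function names are hypothetical, and the cell values (`pass`/`fail`, `keep`/`discard`, delta computed as current minus previous) are assumptions, because the template names the columns but not their contents.

```python
import csv
import io

# Column order taken from the updated Log step.
TSV_COLUMNS = ["iteration", "commit", "metric", "delta", "guard", "status", "description"]


def improved(previous: float, current: float, direction: str) -> bool:
    """True when the metric moved in the configured direction."""
    if direction == "lower_is_better":
        return current < previous
    if direction == "higher_is_better":
        return current > previous
    raise ValueError(f"unknown direction: {direction}")


def decide(previous: float, current: float, guard_passed: bool, direction: str) -> str:
    """Keep only if the metric improved AND the guard passed.

    On "discard" the template reverts with `git revert HEAD --no-edit`.
    """
    if guard_passed and improved(previous, current, direction):
        return "keep"
    return "discard"


def format_row(iteration: int, commit: str, previous: float, current: float,
               guard_passed: bool, status: str, description: str) -> str:
    """Render one tab-separated log line in TSV_COLUMNS order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([
        iteration,
        commit,
        f"{current:g}",
        f"{current - previous:+g}",  # assumed definition of delta
        "pass" if guard_passed else "fail",
        status,
        description,
    ])
    return buffer.getvalue()
```

For example, `format_row(3, "abc1234", 120.0, 112.5, True, "keep", "inline hot loop")` yields a row whose metric cell is `112.5` and whose delta cell is `-7.5`.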

templates/workflows/verify-phase.md

Lines changed: 68 additions & 0 deletions
@@ -226,6 +226,68 @@ Agent(

 Wait for all three review agents to complete before proceeding.

+### Step 4b — Two-Stage Sequential Review (Optional)
+
+When `verification.strict_mode` is enabled in the project config, run an additional two-stage sequential review after the parallel agents complete. Each stage uses a fresh verifier subagent to prevent anchoring bias.
+
+**Stage 1 — Spec Compliance:**
+
+Spawn a fresh verifier agent:
+```
+Agent(
+subagent_type="Explore",
+model="{verifier_model}",
+prompt="
+You are performing a spec compliance review for phase {phase_number}: {phase_name}.
+
+Read the phase requirements from GitHub Issue #{phase_issue_number}.
+Read all files modified in this phase.
+
+For EACH requirement listed in the issue, verify it is implemented with evidence:
+
+CLAIM: Requirement [ID] — [description]
+EVIDENCE: [file:line or command]
+OUTPUT: [actual result observed]
+VERDICT: PASS | FAIL — [reason]
+
+End with: SPEC COMPLIANCE: PASS or SPEC COMPLIANCE: FAIL — [list of unmet requirements]
+"
+)
+```
+
+Wait for Stage 1 to complete. If it fails, include the failures in the final report.
+
+**Stage 2 — Code Quality (fresh subagent):**
+
+Spawn a NEW verifier agent (do NOT reuse the Stage 1 agent):
+```
+Agent(
+subagent_type="Explore",
+model="{verifier_model}",
+prompt="
+You are performing a code quality deep review for phase {phase_number}: {phase_name}.
+
+Context: Spec compliance review has already been completed.
+Read all files modified in this phase.
+
+Focus on implementation quality beyond spec compliance:
+- Architecture and design pattern adherence
+- Error handling completeness
+- Edge case coverage
+- Code maintainability and clarity
+- No dead code, no unnecessary complexity
+
+For each finding:
+CLAIM: [what was checked]
+EVIDENCE: [file:line]
+OUTPUT: [observed behavior or code pattern]
+VERDICT: PASS | FAIL — [reason]
+
+End with: CODE QUALITY: PASS or CODE QUALITY: FAIL — [issues found]
+"
+)
+```
+
 ## Step 5 — Identify Human Verification Items

 Some checks cannot be automated. Flag these for human review:
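
Reader's note (not part of the commit): both stage prompts above ask the verifier to end with a fixed summary line (`SPEC COMPLIANCE: ...` or `CODE QUALITY: ...`). A caller could read that line back with something like the sketch below. `parse_stage_verdict` is a hypothetical helper. Accepting either a long dash or a hyphen before the detail text, and taking the last matching line when several appear, are assumptions rather than anything the template specifies.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "SPEC COMPLIANCE: PASS" or "CODE QUALITY: FAIL <dash> details".
# \u2014 is the long dash used in the prompt text; a plain hyphen is also accepted.
_VERDICT = re.compile(
    r"^[ \t]*(SPEC COMPLIANCE|CODE QUALITY):[ \t]*(PASS|FAIL)\b"
    r"(?:[ \t]*[\u2014-][ \t]*(.*))?[ \t]*$",
    re.MULTILINE,
)


def parse_stage_verdict(report: str, stage: str) -> Tuple[str, Optional[str]]:
    """Return ("pass" | "fail", detail) from the final summary line for `stage`."""
    last = None
    for match in _VERDICT.finditer(report):
        if match.group(1) == stage:
            last = match  # keep the last occurrence; the prompt says "End with"
    if last is None:
        raise ValueError(f"no '{stage}' summary line found in report")
    detail = last.group(3)
    return last.group(2).lower(), (detail.strip() if detail else None)
```

Raising on a missing summary line, instead of defaulting to a pass, avoids treating a truncated or malformed agent report as a successful review.
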
@@ -258,6 +320,7 @@ Why manual: {why automated checks cannot cover this}
 - Security review: PASS
 - Quality review: PASS (no blockers)
 - Efficiency review: PASS (no blockers)
+- If strict_mode was on: Spec compliance review PASS and Code quality review PASS

 **FAIL** — Any of:
 - Any must-have truth: FAILED
@@ -267,6 +330,7 @@ Why manual: {why automated checks cannot cover this}
 - Build: FAIL
 - Any Blocker anti-pattern
 - Security or Quality review: FAIL with blockers
+- If strict_mode was on: Spec compliance review FAIL or Code quality review FAIL

 **HUMAN_NEEDED** — All automated checks PASS but human verification items remain unreviewed.

@@ -292,6 +356,8 @@ checks:
 security_review: pass | fail
 quality_review: pass | fail
 efficiency_review: pass | fail
+spec_compliance_review: pass | fail | skipped
+code_quality_review: pass | fail | skipped
 ---

 ## Verification: Phase {phase_number} — {phase_name}
@@ -328,6 +394,8 @@ checks:
 | Security | {PASS/FAIL} | {issues if fail} |
 | Quality | {PASS/FAIL} | {blockers if fail} |
 | Efficiency | {PASS/FAIL} | {blockers if fail} |
+| Spec Compliance | {PASS/FAIL/SKIPPED} | strict_mode only; {unmet requirements if fail} |
+| Code Quality (deep) | {PASS/FAIL/SKIPPED} | strict_mode only; {issues if fail} |

 ## Anti-Patterns Found
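
Reader's note (not part of the commit): taken together, the strict-mode additions to `verify-phase.md` suggest that the two new checks are recorded as `skipped` when `strict_mode` is off and otherwise feed into the overall verdict. The sketch below illustrates that reading. Both function names are hypothetical, and the verdict logic is deliberately simplified: it treats any failed check as blocking, whereas the template distinguishes failures with and without blockers and lists further criteria that are not visible in this diff.

```python
from typing import Dict

STRICT_CHECKS = ("spec_compliance_review", "code_quality_review")


def strict_check_values(strict_mode: bool, spec_passed: bool,
                        quality_passed: bool) -> Dict[str, str]:
    """Frontmatter values for the two strict-mode checks."""
    if not strict_mode:
        # Stage results are ignored entirely when strict mode is off.
        return {name: "skipped" for name in STRICT_CHECKS}
    return {
        "spec_compliance_review": "pass" if spec_passed else "fail",
        "code_quality_review": "pass" if quality_passed else "fail",
    }


def overall_status(checks: Dict[str, str], human_items_unreviewed: bool) -> str:
    """Simplified verdict: any failed check fails; "skipped" never blocks."""
    if any(value == "fail" for value in checks.values()):
        return "FAIL"
    if human_items_unreviewed:
        return "HUMAN_NEEDED"
    return "PASS"
```

A caller would merge the returned strict-mode values into the existing `checks` mapping before computing the verdict, so that reports from non-strict runs still carry both keys.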
