Merge pull request #186 from maystudios/worktree-agent-a0bcccc3

maystudios · web-flow · commit 80755c381b5a · 2026-03-25T18:39:19.000+01:00
fix(self-improvement): TSV columns, two-batch wizard, two-stage review
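
For reviewers, the new log row layout and the keep/discard rule from the Decide step can be sketched roughly as below. This is only an illustration, not code from this PR: the column order (iteration, commit, metric, delta, guard, status, description) and the `lower_is_better` / `higher_is_better` values come from the diff, while the helper names (`is_improvement`, `make_row`, `append_row`) and the cell values `pass`/`fail` and `keep`/`discard` are assumptions made for the example.

```python
import csv
import io

# Column order taken from the updated Log step; all helper names are hypothetical.
TSV_COLUMNS = ["iteration", "commit", "metric", "delta", "guard", "status", "description"]


def is_improvement(baseline: float, value: float, direction: str) -> bool:
    """Return True when `value` beats `baseline` for the given metric direction."""
    if direction == "lower_is_better":
        return value < baseline
    if direction == "higher_is_better":
        return value > baseline
    raise ValueError(f"unknown direction: {direction!r}")


def make_row(iteration, commit, baseline, value, guard_passed, direction, description):
    """Build one log row; status is 'keep' only if the metric improved AND the guard passed."""
    keep = guard_passed and is_improvement(baseline, value, direction)
    return {
        "iteration": iteration,
        "commit": commit,
        "metric": value,
        "delta": round(value - baseline, 6),          # signed change against the baseline
        "guard": "pass" if guard_passed else "fail",  # assumed cell values
        "status": "keep" if keep else "discard",      # assumed cell values
        "description": description,
    }


def append_row(stream, row, write_header=False):
    """Write one tab-separated row (and optionally the header) to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=TSV_COLUMNS, delimiter="\t", lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    row = make_row(1, "abc1234", 10.0, 8.5, True, "lower_is_better", "cache lookups")
    append_row(buf, row, write_header=True)
    print(buf.getvalue(), end="")
```

In a real loop the stream would be the TSV file opened in append mode, and a `discard` row would be followed by the `git revert HEAD --no-edit` described in the Decide step.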
diff --git a/templates/commands/maxsim/improve.md b/templates/commands/maxsim/improve.md
@@ -25,14 +25,20 @@ Invoke the `autoresearch` skill to drive the optimization loop. Invoke the `veri
 **Phase 1 — Setup (Plan Mode)**
 
 1. Enter Plan Mode via EnterPlanMode
-2. Gather loop parameters via AskUserQuestion:
-   - **Metric command** — the command whose output is the optimization target (from $ARGUMENTS or ask)
-   - **Guard command** — regression check that must always pass (e.g., `npm test`)
-   - **Direction** — minimize or maximize the metric
-   - **Iteration budget** — max iterations before stopping (default: 20)
-   - **Scope** — which files/directories are in-scope for modification
-3. Show the proposed loop configuration and confirm with user
-4. Exit Plan Mode via ExitPlanMode
+2. Gather loop parameters via two AskUserQuestion calls:
+   **Batch 1** (required — 4 questions):
+   - Metric command (the command to run and extract a number from)
+   - Guard command (regression check, e.g., `npm test`)
+   - Metric direction (`lower_is_better` or `higher_is_better`)
+   - Iteration budget (default: 20)
+
+   **Batch 2** (scope and constraints — 3 questions):
+   - Scope (files/directories to modify)
+   - Files to NEVER modify (test files, guard files, config)
+   - Starting approach (optional — first idea to try)
+3. Dry-run: Execute the metric command once to establish the baseline, then execute the guard command to confirm it passes. If either command fails, ask the user to fix it before proceeding.
+4. Show the proposed loop configuration and confirm with user
+5. Exit Plan Mode via ExitPlanMode
 
 **Phase 2 — Optimization Loop**
 
@@ -46,7 +52,7 @@ Run the 8-phase autoresearch loop, one iteration at a time:
 6. **Guard** — run the guard command to check for regressions
    - Guard failure + verify pass → rework (max 2 attempts), then discard
 7. **Decide** — metric improved AND guard passed → keep; otherwise → `git revert HEAD --no-edit`
-8. **Log** — append iteration result to the TSV file (date, iteration, approach, metric-value, outcome, commit-hash, notes)
+8. **Log** — append iteration result to the TSV file (iteration, commit, metric, delta, guard, status, description)
 
 **Stuck Detection:**
 After 5 consecutive discards or crashes:
diff --git a/templates/workflows/verify-phase.md b/templates/workflows/verify-phase.md
@@ -226,6 +226,68 @@ Agent(
 
 Wait for all three review agents to complete before proceeding.
 
+### Step 4b — Two-Stage Sequential Review (Optional)
+
+When `verification.strict_mode` is enabled in the project config, run an additional two-stage sequential review after the parallel agents complete. Each stage uses a fresh verifier subagent to prevent anchoring bias.
+
+**Stage 1 — Spec Compliance:**
+
+Spawn a fresh verifier agent:
+```
+Agent(
+  subagent_type="Explore",
+  model="{verifier_model}",
+  prompt="
+    You are performing a spec compliance review for phase {phase_number}: {phase_name}.
+
+    Read the phase requirements from GitHub Issue #{phase_issue_number}.
+    Read all files modified in this phase.
+
+    For EACH requirement listed in the issue, verify it is implemented with evidence:
+
+    CLAIM: Requirement [ID] — [description]
+    EVIDENCE: [file:line or command]
+    OUTPUT: [actual result observed]
+    VERDICT: PASS | FAIL — [reason]
+
+    End with: SPEC COMPLIANCE: PASS or SPEC COMPLIANCE: FAIL — [list of unmet requirements]
+  "
+)
+```
+
+Wait for Stage 1 to complete before starting Stage 2. If Stage 1 fails, include its unmet requirements in the final report.
+
+**Stage 2 — Code Quality (fresh subagent):**
+
+Spawn a NEW verifier agent (do NOT reuse the Stage 1 agent):
+```
+Agent(
+  subagent_type="Explore",
+  model="{verifier_model}",
+  prompt="
+    You are performing a code quality deep review for phase {phase_number}: {phase_name}.
+
+    Context: Spec compliance review has already been completed.
+    Read all files modified in this phase.
+
+    Focus on implementation quality beyond spec compliance:
+    - Architecture and design pattern adherence
+    - Error handling completeness
+    - Edge case coverage
+    - Code maintainability and clarity
+    - No dead code, no unnecessary complexity
+
+    For each finding:
+    CLAIM: [what was checked]
+    EVIDENCE: [file:line]
+    OUTPUT: [observed behavior or code pattern]
+    VERDICT: PASS | FAIL — [reason]
+
+    End with: CODE QUALITY: PASS or CODE QUALITY: FAIL — [issues found]
+  "
+)
+```
+
 ## Step 5 — Identify Human Verification Items
 
 Some checks cannot be automated. Flag these for human review:
@@ -258,6 +320,7 @@ Why manual: {why automated checks cannot cover this}
 - Security review: PASS
 - Quality review: PASS (no blockers)
 - Efficiency review: PASS (no blockers)
+- If `verification.strict_mode` is enabled: Spec compliance review PASS and Code quality review PASS
 
 **FAIL** — Any of:
 - Any must-have truth: FAILED
@@ -267,6 +330,7 @@ Why manual: {why automated checks cannot cover this}
 - Build: FAIL
 - Any Blocker anti-pattern
 - Security or Quality review: FAIL with blockers
+- If `verification.strict_mode` is enabled: Spec compliance review FAIL or Code quality review FAIL
 
 **HUMAN_NEEDED** — All automated checks PASS but human verification items remain unreviewed.
 
@@ -292,6 +356,8 @@ checks:
   security_review: pass | fail
   quality_review: pass | fail
   efficiency_review: pass | fail
+  spec_compliance_review: pass | fail | skipped
+  code_quality_review: pass | fail | skipped
 ---
 
 ## Verification: Phase {phase_number} — {phase_name}
@@ -328,6 +394,8 @@ checks:
 | Security | {PASS/FAIL} | {issues if fail} |
 | Quality | {PASS/FAIL} | {blockers if fail} |
 | Efficiency | {PASS/FAIL} | {blockers if fail} |
+| Spec Compliance | {PASS/FAIL/SKIPPED} | strict_mode only; {unmet requirements if fail} |
+| Code Quality (deep) | {PASS/FAIL/SKIPPED} | strict_mode only; {issues if fail} |
 
 ## Anti-Patterns Found