Merged
templates/commands/maxsim/improve.md: 24 changes (15 additions, 9 deletions)
@@ -25,14 +25,20 @@ Invoke the `autoresearch` skill to drive the optimization loop. Invoke the `veri
**Phase 1 — Setup (Plan Mode)**

1. Enter Plan Mode via EnterPlanMode
2. Gather loop parameters via AskUserQuestion:
- **Metric command** — the command whose output is the optimization target (from $ARGUMENTS or ask)
- **Guard command** — regression check that must always pass (e.g., `npm test`)
- **Direction** — minimize or maximize the metric
- **Iteration budget** — max iterations before stopping (default: 20)
- **Scope** — which files/directories are in-scope for modification
3. Show the proposed loop configuration and confirm with user
4. Exit Plan Mode via ExitPlanMode
2. Gather loop parameters via two AskUserQuestion calls:
**Batch 1** (required — 4 questions):
- Metric command (the command to run and extract a number from)
> **Copilot AI (Mar 25, 2026), on lines +28 to +30:**
>
> Step 2 now implies the metric command is always collected via AskUserQuestion, but the command’s `<context>` section says `$ARGUMENTS` should be treated as the metric command when provided. To keep the setup flow consistent, update this step to only ask for the metric command when `$ARGUMENTS` is empty (or explicitly say it’s pre-filled from `$ARGUMENTS`).
>
> Suggested change:
>
> ```markdown
> 2. Gather loop parameters (using AskUserQuestion where needed):
> **Batch 1** (required — 4 parameters):
> - Metric command (if `$ARGUMENTS` is provided, treat it as the metric command and optionally confirm/edit with the user; otherwise, ask the user for the metric command to run and extract a number from)
> ```
- Guard command (regression check, e.g., `npm test`)
- Metric direction (`lower_is_better` or `higher_is_better`)
- Iteration budget (default: 20)

**Batch 2** (scope and constraints — 3 questions):
- Scope (files/directories to modify)
- Files to NEVER modify (test files, guard files, config)
- Starting approach (optional — first idea to try)
3. Dry-run: Execute the metric command once to establish baseline. Execute the guard command to confirm it passes. If either fails, ask the user to fix before proceeding.
4. Show the proposed loop configuration and confirm with user
5. Exit Plan Mode via ExitPlanMode
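The step-3 dry-run can be sketched as follows. This is a minimal sketch, not the template's prescribed implementation: `run_metric` and `dry_run` are hypothetical helper names, and it assumes the metric command prints its value somewhere on stdout.

```python
import re
import subprocess

def run_metric(cmd: str) -> float:
    """Run the metric command and extract the last number printed on stdout."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
    numbers = re.findall(r"-?\d+(?:\.\d+)?", out.stdout)
    if not numbers:
        raise ValueError(f"no numeric metric found in output of {cmd!r}")
    return float(numbers[-1])

def dry_run(metric_cmd: str, guard_cmd: str) -> float:
    """Establish the baseline and confirm the guard passes before looping."""
    baseline = run_metric(metric_cmd)
    guard = subprocess.run(guard_cmd, shell=True)
    if guard.returncode != 0:
        raise RuntimeError("guard command failed; ask the user to fix it before proceeding")
    return baseline
```

If either command fails here, the loop never starts, which matches the "ask the user to fix before proceeding" requirement.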

**Phase 2 — Optimization Loop**

@@ -46,7 +52,7 @@ Run the 8-phase autoresearch loop, one iteration at a time:
6. **Guard** — run the guard command to check for regressions
- Guard failure + verify pass → rework (max 2 attempts), then discard
7. **Decide** — metric improved AND guard passed → keep; otherwise → `git revert HEAD --no-edit`
8. **Log** — append iteration result to the TSV file (date, iteration, approach, metric-value, outcome, commit-hash, notes)
8. **Log** — append iteration result to the TSV file (iteration, commit, metric, delta, guard, status, description)
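Step 8's append-only TSV log (with the new column set) could be implemented along these lines; the helper name and path handling are illustrative, not part of the template:

```python
import csv
import os

LOG_COLUMNS = ["iteration", "commit", "metric", "delta", "guard", "status", "description"]

def log_iteration(path: str, row: dict) -> None:
    """Append one iteration's result to the TSV log, writing a header row first if the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_COLUMNS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Appending one row per iteration keeps the log valid even if the loop crashes mid-run.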

**Stuck Detection:**
After 5 consecutive discards or crashes:
templates/workflows/verify-phase.md: 68 changes (68 additions, 0 deletions)
@@ -226,6 +226,68 @@ Agent(

Wait for all three review agents to complete before proceeding.

### Step 4b — Two-Stage Sequential Review (Optional)
> **Copilot AI (Mar 25, 2026):**
>
> This new section is the only step header that uses `###` rather than the file’s standard `## Step N — ...` format, which makes the step structure inconsistent and can break skimming/TOC behavior. Consider promoting this to `## Step 4b — ...` (or renumbering) to match the rest of the document.
>
> Suggested change: `## Step 4b — Two-Stage Sequential Review (Optional)`

When `verification.strict_mode` is enabled in the project config, run an additional two-stage sequential review after the parallel agents complete. Each stage uses a fresh verifier subagent to prevent anchoring bias.
> **Copilot AI (Mar 25, 2026):**
>
> The config key referenced here (`verification.strict_mode`) doesn’t match the config schema and other docs, which use `execution.verification.strict_mode`. This mismatch can lead to strict-mode behavior being configured incorrectly (or not at all); update the key/path in the text accordingly.
>
> Suggested change: replace `verification.strict_mode` with `execution.verification.strict_mode` in the sentence above.

**Stage 1 — Spec Compliance:**

Spawn a fresh verifier agent:
```
Agent(
subagent_type="Explore",
  # Reviewer note (Copilot AI, Mar 25, 2026): subagent_type is set to "Explore",
  # but existing workflow templates consistently use subagent_type="verifier" for
  # verifier agents. If "Explore" isn't a valid agent type, this will fail at
  # runtime; switch this to verifier (and keep the model as {verifier_model}).
  # Suggested change: subagent_type="verifier",
model="{verifier_model}",
prompt="
You are performing a spec compliance review for phase {phase_number}: {phase_name}.

Read the phase requirements from GitHub Issue #{phase_issue_number}.
Read all files modified in this phase.

For EACH requirement listed in the issue, verify it is implemented with evidence:

CLAIM: Requirement [ID] — [description]
EVIDENCE: [file:line or command]
OUTPUT: [actual result observed]
VERDICT: PASS | FAIL — [reason]

End with: SPEC COMPLIANCE: PASS or SPEC COMPLIANCE: FAIL — [list of unmet requirements]
"
)
```

Wait for Stage 1 to complete. If it fails, include the failures in the final report.
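To decide whether Stage 1 failed, the orchestrator has to pull the final verdict line out of the verifier's transcript. A sketch of that extraction, assuming the transcript follows the line format the prompt above requests (`parse_verdict` is a hypothetical helper, not part of the template):

```python
import re

def parse_verdict(transcript: str, label: str) -> tuple[bool, str]:
    """Return (passed, details) from the last 'LABEL: PASS/FAIL — details' line."""
    pattern = re.compile(rf"{re.escape(label)}:\s*(PASS|FAIL)(?:\s*[—-]\s*(.*))?")
    passed, details = False, ""
    for line in transcript.splitlines():
        m = pattern.search(line)
        if m:
            passed = m.group(1) == "PASS"
            details = (m.group(2) or "").strip()
    return passed, details
```

Taking the last matching line tolerates the agent mentioning the label earlier in its reasoning.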

**Stage 2 — Code Quality (fresh subagent):**

Spawn a NEW verifier agent (do NOT reuse the Stage 1 agent):
```
Agent(
subagent_type="Explore",
model="{verifier_model}",
prompt="
  # Reviewer note (Copilot AI, Mar 25, 2026), on lines +264 to +267: same issue
  # as Stage 1: subagent_type="Explore" is inconsistent with other verifier
  # spawns (subagent_type="verifier") and may not be a valid agent type. Use
  # verifier here as well to avoid agent-spawn failures.
You are performing a code quality deep review for phase {phase_number}: {phase_name}.

Context: Spec compliance review has already been completed.
Read all files modified in this phase.

Focus on implementation quality beyond spec compliance:
- Architecture and design pattern adherence
- Error handling completeness
- Edge case coverage
- Code maintainability and clarity
- No dead code, no unnecessary complexity

For each finding:
CLAIM: [what was checked]
EVIDENCE: [file:line]
OUTPUT: [observed behavior or code pattern]
VERDICT: PASS | FAIL — [reason]

End with: CODE QUALITY: PASS or CODE QUALITY: FAIL — [issues found]
"
)
```

## Step 5 — Identify Human Verification Items

Some checks cannot be automated. Flag these for human review:
@@ -258,6 +320,7 @@ Why manual: {why automated checks cannot cover this}
- Security review: PASS
- Quality review: PASS (no blockers)
- Efficiency review: PASS (no blockers)
- If strict_mode was on: Spec compliance review PASS and Code quality review PASS

**FAIL** — Any of:
- Any must-have truth: FAILED
@@ -267,6 +330,7 @@ Why manual: {why automated checks cannot cover this}
- Build: FAIL
- Any Blocker anti-pattern
- Security or Quality review: FAIL with blockers
- If strict_mode was on: Spec compliance review FAIL or Code quality review FAIL

**HUMAN_NEEDED** — All automated checks PASS but human verification items remain unreviewed.

@@ -292,6 +356,8 @@ checks:
security_review: pass | fail
quality_review: pass | fail
efficiency_review: pass | fail
spec_compliance_review: pass | fail | skipped
code_quality_review: pass | fail | skipped
> **Copilot AI (Mar 25, 2026), on lines +359 to +360:**
>
> Since these new check fields allow `skipped`, it would help to explicitly state (near the strict_mode section or here) that they MUST be set to `skipped` when strict_mode is disabled, so the verification YAML and the `checks_passed`/`checks_total` math stay consistent.
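One way the YAML could record the strict-mode-off case, per the review comment above — a sketch only, assuming skipped checks are excluded from the pass/total counts:

```yaml
checks:
  # strict_mode disabled: both strict-mode reviews MUST be recorded as
  # skipped (not omitted), and skipped checks do not count toward
  # checks_passed or checks_total.
  spec_compliance_review: skipped
  code_quality_review: skipped
```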
---

## Verification: Phase {phase_number} — {phase_name}
Expand Down Expand Up @@ -328,6 +394,8 @@ checks:
| Security | {PASS/FAIL} | {issues if fail} |
| Quality | {PASS/FAIL} | {blockers if fail} |
| Efficiency | {PASS/FAIL} | {blockers if fail} |
| Spec Compliance | {PASS/FAIL/SKIPPED} | strict_mode only; {unmet requirements if fail} |
| Code Quality (deep) | {PASS/FAIL/SKIPPED} | strict_mode only; {issues if fail} |

## Anti-Patterns Found
