Merge pull request #174 from maystudios/worktree-agent-ae8e00cb

maystudios · web-flow · commit d1806d86210f · 2026-03-25T13:50:51.000+01:00
docs: add missing documentation sections to skills and references
diff --git a/templates/references/self-improvement.md b/templates/references/self-improvement.md
@@ -43,6 +43,10 @@ at the end of every session in a MaxsimCLI project. Each entry records:
 - Which tasks were attempted repeatedly without a commit (likely failed).
 - Long-term trends across many sessions.
 
+### MEMORY.md Size Limit
+
+Keep MEMORY.md under 200 lines (the Claude Code context loading limit). The `maxsim-capture-learnings` Stop hook enforces this by pruning at 180 lines, which leaves 20 lines of headroom. When the file approaches the limit, the oldest entries are removed first. Keep each entry concise (3-5 lines) to maximize the number of sessions that fit.
+
 ---
 
 ## 3. Results Tracking
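
Note on the MEMORY.md size limit added in the hunk above: the oldest-first pruning rule could look roughly like the sketch below. This is not the actual `maxsim-capture-learnings` hook. The function name `prune_memory`, the assumption that each entry starts with a `## ` heading, and the assumption that the oldest entries sit at the top of the file are all hypothetical choices made for illustration.

```python
def prune_memory(text: str, max_lines: int = 180, entry_prefix: str = "## ") -> str:
    """Drop whole entries, oldest first, until the file fits in max_lines.

    Assumptions (not confirmed by the source): an entry begins at a line
    starting with entry_prefix, and entries are stored oldest first.
    """
    lines = text.splitlines()

    # Split the file into a preamble (kept as-is) and a list of entries.
    preamble: list[str] = []
    entries: list[list[str]] = []
    for line in lines:
        if line.startswith(entry_prefix):
            entries.append([line])
        elif entries:
            entries[-1].append(line)
        else:
            preamble.append(line)

    # Remove the oldest entry repeatedly while the file is still too long.
    total = len(lines)
    while entries and total > max_lines:
        total -= len(entries.pop(0))

    kept = preamble + [line for entry in entries for line in entry]
    return "\n".join(kept) + ("\n" if kept else "")
```

Removing whole entries rather than raw lines keeps each surviving session record intact, which is why the result can land a few lines under the 180-line threshold.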
diff --git a/templates/skills/autoresearch/references/loop-protocol.md b/templates/skills/autoresearch/references/loop-protocol.md
@@ -93,10 +93,11 @@ If verification exceeds 2x normal time, kill and treat as crash.
 
 Some metrics are inherently noisy (benchmark times, ML accuracy). Strategies:
 
-- **Multi-run verification:** Run verify N times, use the median.
-- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor.
-- **Confirmation run:** Re-verify before making a final keep decision.
-- **Environment pinning:** Pin random seeds, use deterministic test ordering, flush caches.
+- **For improvements of 1–5%:** Run the verify command 3 times and use the median result.
+- **For improvements >5%:** Run the verify command 5 times and use the median result.
+- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor (typically 0.5% for benchmarks).
+- **Confirmation run:** After provisionally accepting an improvement, re-verify once more before making the final keep decision.
+- **Environment pinning:** Pin random seeds, use deterministic test ordering, flush caches between runs.
 
 ## Phase 5.5: Guard (Regression Check)
 
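
Note on the noise-handling rules changed in the hunk above: the run counts and the median rule can be sketched as follows. The names `runs_required` and `median_of_runs` are hypothetical, and `measure` stands in for whatever the verify command reports. The diff does not say how many runs to use for improvements between the 0.5% noise floor and 1%, so this sketch assumes 3 runs there, the same as the 1–5% band.

```python
import statistics
from typing import Callable

NOISE_FLOOR_PCT = 0.5  # "typically 0.5% for benchmarks" per the diff


def runs_required(improvement_pct: float) -> int:
    """Return how many verify runs an observed improvement calls for."""
    if improvement_pct < NOISE_FLOOR_PCT:
        return 0  # below the noise floor: ignore the improvement
    if improvement_pct <= 5.0:
        # 1-5% -> 3 runs per the diff; the 0.5-1% band is an assumption.
        return 3
    return 5  # >5% -> 5 runs per the diff


def median_of_runs(measure: Callable[[], float], runs: int) -> float:
    """Call measure() `runs` times and return the median result."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return statistics.median(measure() for _ in range(runs))
```

The median is used rather than the mean so that a single outlier run (for example a cold cache) does not shift the reported metric.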
diff --git a/templates/skills/verification/SKILL.md b/templates/skills/verification/SKILL.md
@@ -167,3 +167,42 @@ Do not attempt a 4th run without user acknowledgment and revised instructions.
 | Skipping Gate 4 after Gate 3 passes | Declaring done without regression check | Gate 3 and Gate 4 are both required; neither is optional |
 | Conflating "no errors" with "correct output" | Exit code 0 but wrong behavior | Evidence must show correct output, not just absence of error |
 | Writing evidence after the fact | Constructing output from memory | Run the command, capture the output, paste it verbatim |
+
+---
+
+## 5-Step Verification Process
+
+When verification fails, follow this structured process:
+
+1. **Run the check command one final time** — capture fresh output as evidence
+2. **Construct diagnostic summary** — compare spec expectations vs actual output
+3. **Identify root cause** — is it a spec problem, environment problem, or implementation problem?
+4. **Propose next step** — rewrite spec, fix environment, reduce scope, or escalate
+5. **Escalate if unresolved** — create a diagnostic GitHub Issue with all evidence
+
+---
+
+## GitHub Issue Escalation
+
+When a task fails verification after 3 attempts, escalate by creating (or commenting on) a GitHub Issue that includes:
+
+1. **Original task spec** — quoted from the plan comment
+2. **What was attempted** — brief factual summary of each attempt
+3. **The specific gate that failed** — exact error output from each run
+4. **Root cause analysis** — spec/environment/implementation classification
+5. **Proposed next step** — rewrite spec, fix environment, reduce scope, or request user input
+
+Label the issue with `type:bug` and `maxsim:auto`.
+
+---
+
+## Fresh Executor Context
+
+Each retry attempt MUST use a fresh executor agent:
+
+- Do NOT reuse the previous executor (spawn a new one)
+- Provide the full task spec (do not assume prior context carries over)
+- Include the diagnostic summary from the failed run
+- Include revised instructions based on root cause analysis
+
+Treat each fresh executor as a cold start. Apart from the diagnostic summary and revised instructions listed above, do NOT reference or build upon any previous attempt's reasoning or partial work.
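
Note on the Fresh Executor Context section added above: the retry flow can be sketched as a loop that spawns a new executor per attempt and hands it only the task spec, the diagnostic summary, and the revised instructions. Everything named here (`Attempt`, `build_prompt`, `run_with_fresh_executors`, and the `spawn_executor` factory) is hypothetical. The source does not describe how MaxsimCLI actually spawns agents, so treat this as a sketch under those assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # the skill escalates after 3 failed attempts


@dataclass
class Attempt:
    """Outcome of one executor run (hypothetical shape)."""
    passed: bool
    diagnostic_summary: str = ""
    revised_instructions: str = ""


def build_prompt(task_spec: str, previous: Optional[Attempt]) -> str:
    """Build a cold-start prompt: full spec plus the last failure's summary."""
    parts = ["# Task spec", task_spec]
    if previous is not None:
        parts += [
            "# Diagnostic summary from the failed run",
            previous.diagnostic_summary,
            "# Revised instructions",
            previous.revised_instructions,
        ]
    return "\n\n".join(parts)


def run_with_fresh_executors(
    task_spec: str,
    spawn_executor: Callable[[], Callable[[str], Attempt]],
) -> bool:
    """Try up to MAX_ATTEMPTS times, spawning a new executor each time."""
    previous: Optional[Attempt] = None
    for _ in range(MAX_ATTEMPTS):
        executor = spawn_executor()  # never reuse the previous executor
        previous = executor(build_prompt(task_spec, previous))
        if previous.passed:
            return True
    return False  # caller should escalate via a GitHub Issue
```

Because the prompt is rebuilt from scratch on every attempt, nothing from an earlier executor's reasoning or partial work can leak into the next one except the two fields passed explicitly.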