Commit d1806d8

Merge pull request #174 from maystudios/worktree-agent-ae8e00cb
docs: add missing documentation sections to skills and references
2 parents 1430ff2 + d784cb3 commit d1806d8

3 files changed: 48 additions & 4 deletions

templates/references/self-improvement.md

Lines changed: 4 additions & 0 deletions
@@ -43,6 +43,10 @@ at the end of every session in a MaxsimCLI project. Each entry records:
 - Which tasks were attempted repeatedly without a commit (likely failed).
 - Long-term trends across many sessions.
 
+### MEMORY.md Size Limit
+
+Keep MEMORY.md under 200 lines (the Claude Code context loading limit). The `maxsim-capture-learnings` Stop hook enforces this by pruning at 180 lines, leaving headroom. When the file approaches the limit, the oldest entries are removed first. Each entry should be concise (3-5 lines) to maximize the number of sessions that fit.
+
 ---
 
 ## 3. Results Tracking
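The oldest-first pruning rule can be sketched as follows. This is a minimal Python illustration, not the actual `maxsim-capture-learnings` hook: `prune_memory` is a hypothetical name, and the sketch assumes any preamble comes first, each entry starts with a `## ` heading, and entries are appended in chronological order.

```python
from typing import List

# From the text: Claude Code loads the first 200 lines of MEMORY.md,
# and the hook prunes at 180 lines to leave headroom.
PRUNE_AT = 180


def prune_memory(text: str, max_lines: int = PRUNE_AT) -> str:
    """Drop the oldest entries until the file fits in max_lines.

    Hypothetical layout: optional preamble first, each entry starts with
    a '## ' heading, and entries are appended oldest-first.
    """
    header: List[str] = []
    entries: List[List[str]] = []
    for line in text.splitlines():
        if line.startswith("## "):
            entries.append([line])        # start of a new entry
        elif entries:
            entries[-1].append(line)      # body line of the current entry
        else:
            header.append(line)           # preamble before the first entry
    total = len(header) + sum(len(e) for e in entries)
    while entries and total > max_lines:
        total -= len(entries.pop(0))      # remove the oldest entry first
    kept = header + [line for entry in entries for line in entry]
    return "\n".join(kept) + "\n"
```

Removing whole entries rather than individual lines keeps every surviving entry intact, which is why short 3-5 line entries let more sessions fit under the limit.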

templates/skills/autoresearch/references/loop-protocol.md

Lines changed: 5 additions & 4 deletions
@@ -93,10 +93,11 @@ If verification exceeds 2x normal time, kill and treat as crash.
 
 Some metrics are inherently noisy (benchmark times, ML accuracy). Strategies:
 
-- **Multi-run verification:** Run verify N times, use the median.
-- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor.
-- **Confirmation run:** Re-verify before making a final keep decision.
-- **Environment pinning:** Pin random seeds, use deterministic test ordering, flush caches.
+- **For improvements of 1–5%:** Run the verify command 3 times and use the median result.
+- **For improvements >5%:** Run the verify command 5 times and use the median result.
+- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor (typically 0.5% for benchmarks).
+- **Confirmation run:** After accepting an improvement, re-verify once more before making the final keep decision.
+- **Environment pinning:** Pin random seeds, use deterministic test ordering, flush caches between runs.
 
 ## Phase 5.5: Guard (Regression Check)
 
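The revised noise strategies can be combined into a single keep/discard decision. This is a minimal Python sketch, not the autoresearch skill's implementation: `keep_decision`, `runs_needed`, and `improvement_pct` are hypothetical names, the metric is assumed to be lower-is-better (such as benchmark time), and `verify` stands in for running the verify command once and parsing its metric.

```python
import statistics
from typing import Callable

# From the text: the noise floor is "typically 0.5% for benchmarks".
NOISE_FLOOR_PCT = 0.5


def improvement_pct(baseline: float, candidate: float) -> float:
    """Relative improvement in percent, assuming lower is better."""
    return (baseline - candidate) / baseline * 100


def runs_needed(estimate_pct: float) -> int:
    """3 runs for gains of 1-5%, 5 runs above 5%.

    The text does not cover gains between the noise floor and 1%;
    this sketch assumes 3 runs for them.
    """
    return 5 if estimate_pct > 5 else 3


def keep_decision(baseline: float, verify: Callable[[], float]) -> bool:
    # One run estimates the gain and sizes the sample
    # (assumption: this first run is not counted in the median).
    estimate = improvement_pct(baseline, verify())
    if estimate < NOISE_FLOOR_PCT:
        return False  # indistinguishable from noise
    samples = [verify() for _ in range(runs_needed(estimate))]
    if improvement_pct(baseline, statistics.median(samples)) < NOISE_FLOOR_PCT:
        return False  # the median does not hold up
    # Confirmation run before the final keep (assumption: it must also
    # clear the noise floor on its own).
    return improvement_pct(baseline, verify()) >= NOISE_FLOOR_PCT
```

Passing `verify` in as a callable keeps the decision logic testable with canned values; environment pinning happens outside this function, in whatever `verify` actually launches.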
templates/skills/verification/SKILL.md

Lines changed: 39 additions & 0 deletions
@@ -167,3 +167,42 @@ Do not attempt a 4th run without user acknowledgment and revised instructions.
 | Skipping Gate 4 after Gate 3 passes | Declaring done without regression check | Gate 3 and Gate 4 are both required; neither is optional |
 | Conflating "no errors" with "correct output" | Exit code 0 but wrong behavior | Evidence must show correct output, not just absence of error |
 | Writing evidence after the fact | Constructing output from memory | Run the command, capture the output, paste it verbatim |
+
+---
+
+## 5-Step Verification Process
+
+When verification fails, follow this structured process:
+
+1. **Run the check command one final time** — capture fresh output as evidence
+2. **Construct diagnostic summary** — compare spec expectations vs actual output
+3. **Identify root cause** — is it a spec problem, environment problem, or implementation problem?
+4. **Propose next step** — rewrite spec, fix environment, reduce scope, or escalate
+5. **Escalate if unresolved** — create a diagnostic GitHub Issue with all evidence
+
+---
+
+## GitHub Issue Escalation
+
+When a task fails verification after 3 attempts, escalate by creating (or commenting on) a GitHub Issue:
+
+1. **Original task spec** — quoted from the plan comment
+2. **What was attempted** — brief factual summary of each attempt
+3. **The specific gate that failed** — exact error output from each run
+4. **Root cause analysis** — spec/environment/implementation classification
+5. **Proposed next step** — rewrite spec, fix environment, reduce scope, or request user input
+
+Label the issue with `type:bug` and `maxsim:auto`.
+
+---
+
+## Fresh Executor Context
+
+Each retry attempt MUST use a fresh executor agent:
+
+- Do NOT reuse the previous executor (spawn a new one)
+- Provide the full task spec (do not assume prior context carries over)
+- Include the diagnostic summary from the failed run
+- Include revised instructions based on root cause analysis
+
+Treat each fresh executor as a cold start. Do NOT reference or build upon any previous attempt's reasoning or partial work.
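One way to wire the retry, fresh-executor, and escalation rules together is sketched below in Python. Every name here (`run_with_retries`, `Attempt`, `ExecutorContext`, `spawn_executor`, `issue_body`, `gh_issue_command`) is hypothetical; the only real interface is the GitHub CLI's `gh issue create` with its `--title`, `--body`, and repeatable `--label` flags. The sketch builds the command rather than running it, and covers only creating a new issue, not commenting on an existing one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple

MAX_ATTEMPTS = 3  # escalate after 3 failed attempts


class RootCause(Enum):
    """Step 3 classification."""
    SPEC = "spec"
    ENVIRONMENT = "environment"
    IMPLEMENTATION = "implementation"


@dataclass
class Attempt:
    """Evidence from one failed run (hypothetical record)."""
    summary: str          # brief factual summary of what was tried
    failed_gate: str      # which gate failed, e.g. "Gate 3"
    output: str           # fresh check output, pasted verbatim
    root_cause: RootCause


@dataclass
class ExecutorContext:
    """Everything a fresh executor receives; nothing else carries over."""
    task_spec: str
    diagnostic_summary: Optional[str] = None
    revised_instructions: Optional[str] = None


# Hypothetical executor interface: returns None when all gates pass.
Executor = Callable[[ExecutorContext], Optional[Attempt]]


def run_with_retries(
    task_spec: str,
    spawn_executor: Callable[[], Executor],
    revise: Callable[[Attempt], str],
) -> Tuple[bool, List[Attempt]]:
    failures: List[Attempt] = []
    ctx = ExecutorContext(task_spec=task_spec)
    for _ in range(MAX_ATTEMPTS):
        executor = spawn_executor()  # cold start: a new agent every attempt
        result = executor(ctx)
        if result is None:
            return True, failures
        failures.append(result)
        # Rebuild the context from scratch: full spec, this run's diagnostics,
        # and revised instructions. No earlier reasoning is passed along.
        ctx = ExecutorContext(
            task_spec=task_spec,
            diagnostic_summary=f"{result.failed_gate} failed:\n{result.output}",
            revised_instructions=revise(result),
        )
    return False, failures


def issue_body(task_spec: str, failures: List[Attempt], next_step: str) -> str:
    """Assemble the five required sections of the escalation issue."""
    lines = ["## Original task spec"]
    lines += ["> " + line for line in task_spec.splitlines()]
    lines += ["", "## What was attempted"]
    lines += [f"{i}. {a.summary}" for i, a in enumerate(failures, 1)]
    lines += ["", "## Failed gate output"]
    for i, a in enumerate(failures, 1):
        lines.append(f"Attempt {i} ({a.failed_gate}):")
        lines += ["    " + out for out in a.output.splitlines()]
    lines += ["", "## Root cause analysis"]
    lines += [f"- Attempt {i}: {a.root_cause.value}" for i, a in enumerate(failures, 1)]
    lines += ["", "## Proposed next step", next_step]
    return "\n".join(lines)


def gh_issue_command(title: str, body: str) -> List[str]:
    """argv for the GitHub CLI; pass it to subprocess.run to file the issue."""
    return ["gh", "issue", "create", "--title", title, "--body", body,
            "--label", "type:bug", "--label", "maxsim:auto"]
```

Rebuilding `ExecutorContext` on every attempt, instead of mutating a shared one, is what enforces the cold-start rule: the only information that crosses attempts is the spec, the latest diagnostics, and the revised instructions.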