docs: add missing documentation sections to skills and references by maystudios · Pull Request #174 · maystudios/maxsimcli

maystudios · 2026-03-25T12:44:45Z

Summary

- verification SKILL.md: Added 5-Step Verification Process, GitHub Issue Escalation, and Fresh Executor Context sections to complete the retry/escalation documentation.
- loop-protocol.md: Replaced generic noise-handling guidance with quantified thresholds (run counts, 0.5% noise floor, confirmation runs, environment pinning details).
- self-improvement.md: Added MEMORY.md Size Limit subsection documenting the 200-line hard limit and 180-line pruning behavior.

Test plan

- All three files are Markdown templates, so no runtime tests are needed
- verification SKILL.md stays under the 500-line skill limit (208 lines)
- No YAML frontmatter changes: existing `name:` and `description:` preserved
- No @ imports added

🤖 Generated with Claude Code

Add 5-step verification process, GitHub Issue escalation, and fresh executor context sections to verification SKILL.md. Add quantified noise-handling thresholds to autoresearch loop-protocol.md. Add MEMORY.md 200-line size limit documentation to self-improvement.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates MaxsimCLI’s documentation templates to fill in missing guidance around verification retries/escalation, handling noisy metrics in autoresearch loops, and the enforced size limit behavior for agent MEMORY.md.

Changes:

- Expanded verification skill documentation with a structured post-failure process, GitHub Issue escalation checklist, and “fresh executor” guidance.
- Replaced generic noise-handling advice in autoresearch loop protocol with explicit run-count thresholds and a stated noise floor.
- Documented the MEMORY.md 200-line hard limit and the hook’s pruning buffer in the self-improvement reference.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| templates/skills/verification/SKILL.md | Adds verification failure process + escalation and retry-context guidance. |
| templates/skills/autoresearch/references/loop-protocol.md | Introduces quantified multi-run verification rules for noisy metrics. |
| templates/references/self-improvement.md | Documents MEMORY.md hard limit and pruning behavior. |


Copilot · 2026-03-25T12:48:17Z

templates/skills/autoresearch/references/loop-protocol.md

- **Environment pinning:** Pin random seeds, use deterministic test ordering, flush caches.
+- **For improvements of 1–5%:** Run the verify command 3 times and use the median result.
+- **For improvements >5%:** Run the verify command 5 times and use the median result.
+- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor (typically 0.5% for benchmarks).


The multi-run guidance covers improvements of 1–5% and >5%, but doesn’t say what to do for small-but-real changes between the noise floor and 1% (e.g., 0.6% when noise floor is 0.5%). Consider adding an explicit rule for 0.5–1% (or “<1%”) improvements (e.g., treat as noise unless confirmed with 5 runs, or require an extra confirmation run) so the protocol is complete.

Suggested change

-- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor (typically 0.5% for benchmarks).
+- **For improvements between the noise floor and 1% (e.g., 0.5–1%):** Treat as noise by default. Only consider keeping if you run at least 5 verification runs and the median improvement remains above the noise floor.
+- **Minimum improvement threshold:** Ignore improvements smaller than the noise floor (typically 0.5% for benchmarks), and apply the above rule for any borderline improvements between the noise floor and 1%.
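
The thresholds in the hunk, together with the borderline rule from the suggested change, can be sketched as a small decision helper. This is only an illustration: the function names `runs_required` and `keep_change` are hypothetical, the metric is assumed to be "higher is better", and the handling of values exactly on a boundary (0.5%, 1%, 5%) is an assumption the protocol text does not spell out.

```python
from statistics import median

# "Typically 0.5% for benchmarks" per the protocol text; adjust per project.
NOISE_FLOOR_PCT = 0.5


def runs_required(improvement_pct: float) -> int:
    """Return how many verify runs an observed improvement (in percent) needs.

    0 means "treat as noise, do not bother re-running".
    """
    if improvement_pct < NOISE_FLOOR_PCT:
        return 0  # below the noise floor: ignore
    if improvement_pct < 1.0:
        return 5  # borderline band, as proposed in the suggested change
    if improvement_pct <= 5.0:
        return 3  # 1-5%: median of 3 runs
    return 5      # >5%: median of 5 runs


def keep_change(baseline: float, samples: list[float]) -> bool:
    """Keep the change only if the median improvement clears the noise floor.

    Assumes a higher metric value is better and baseline is non-zero.
    """
    improvement_pct = (median(samples) - baseline) / baseline * 100.0
    return improvement_pct > NOISE_FLOOR_PCT
```

A caller would first measure once, ask `runs_required` how many confirmation runs to collect, and then pass those samples to `keep_change`.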

Copilot · 2026-03-25T12:48:18Z

templates/references/self-improvement.md


+### MEMORY.md Size Limit
+
+Keep MEMORY.md under 200 lines (the Claude Code context loading limit). The `maxsim-capture-learnings` Stop hook enforces this by pruning at 180 lines, leaving headroom. When the file approaches the limit, the oldest entries are removed first. Each entry should be concise (3-5 lines) to maximize the number of sessions that fit.


This says “Each entry should be concise (3–5 lines)”, but the maxsim-capture-learnings Stop hook’s actual format includes a session header plus a bullet per commit (and optional pattern), so entries can easily exceed 5 lines even in normal usage. Suggest adjusting this guidance to match the real output (e.g., encourage keeping commit lists short / summarizing patterns), or clarify that the hook-generated entry may be longer than 3–5 lines.

Suggested change

-Keep MEMORY.md under 200 lines (the Claude Code context loading limit). The `maxsim-capture-learnings` Stop hook enforces this by pruning at 180 lines, leaving headroom. When the file approaches the limit, the oldest entries are removed first. Each entry should be concise (3-5 lines) to maximize the number of sessions that fit.
+Keep MEMORY.md under 200 lines (the Claude Code context loading limit). The `maxsim-capture-learnings` Stop hook enforces this by pruning at 180 lines, leaving headroom. When the file approaches the limit, the oldest entries are removed first. Aim to keep each entry concise (around 3–5 lines) by keeping commit lists short or summarizing patterns, but note that hook-generated entries may be longer when there are many commits.
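
To make the pruning behavior concrete, here is a minimal sketch of oldest-first pruning at a 180-line threshold. It is not the actual `maxsim-capture-learnings` hook: the function name `prune_memory`, the assumption that every entry starts with a `## ` session header, and the assumption that the oldest entries sit at the top of the file are all hypothetical.

```python
PRUNE_AT = 180    # pruning threshold described in the docs
HARD_LIMIT = 200  # documented hard limit; pruning earlier leaves headroom


def prune_memory(lines: list[str], prune_at: int = PRUNE_AT) -> list[str]:
    """Drop whole entries from the top (oldest first) until the file fits.

    Assumes the file consists only of entries, each starting with a "## " header.
    """
    if len(lines) <= prune_at:
        return lines

    # Indices where an entry begins, oldest first.
    starts = [i for i, line in enumerate(lines) if line.startswith("## ")]
    for start in starts:
        if len(lines) - start <= prune_at:
            return lines[start:]

    # No entry boundary fits (e.g. one oversized entry): keep the newest lines.
    return lines[-prune_at:]
```

Because whole entries are dropped, longer hook-generated entries reduce how many sessions survive a prune, which is the concern raised in the comment above.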

Copilot · 2026-03-25T12:48:18Z

templates/skills/verification/SKILL.md

+1. **Original task spec** — quoted from the plan comment
+2. **What was attempted** — brief factual summary of each attempt
+3. **The specific gate that failed** — exact error output from each run
+4. **Root cause analysis** — spec/environment/implementation classification
+5. **Proposed next step** — rewrite spec, fix environment, reduce scope, or request user input


In this escalation checklist, “quoted from the plan comment” is ambiguous in contexts that don’t have a distinct plan comment (e.g., quick tasks or non-GitHub flows). Consider wording this as “quoted from the plan comment (if present) or the issue/task description” so the instruction is actionable in all cases.
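
One way to keep the five checklist items from being skipped is to model them as required fields and render the issue body from them. The sketch below is illustrative only: `EscalationReport` and `render_issue_body` are hypothetical names, and the section headings are an assumed layout rather than anything the skill file prescribes. The `task_spec` field can hold the plan comment or, where none exists, the issue or task description.

```python
from dataclasses import dataclass

# Root cause classifications named in the checklist.
ROOT_CAUSES = ("spec", "environment", "implementation")


@dataclass
class EscalationReport:
    task_spec: str            # plan comment, or issue/task description if none
    attempts: list[str]       # brief factual summary of each attempt
    gate_failures: list[str]  # exact error output from each run
    root_cause: str           # one of ROOT_CAUSES
    next_step: str            # rewrite spec, fix environment, reduce scope, ...


def render_issue_body(report: EscalationReport) -> str:
    """Render the five checklist items as Markdown sections."""
    if report.root_cause not in ROOT_CAUSES:
        raise ValueError(f"root_cause must be one of {ROOT_CAUSES}")

    quoted_spec = "\n".join(f"> {line}" for line in report.task_spec.splitlines())
    attempts = "\n".join(f"{i}. {text}" for i, text in enumerate(report.attempts, 1))
    failures = "\n\n".join(
        f"Run {i}:\n{output}" for i, output in enumerate(report.gate_failures, 1)
    )
    sections = [
        ("Original task spec", quoted_spec),
        ("What was attempted", attempts),
        ("The specific gate that failed", failures),
        ("Root cause analysis", report.root_cause),
        ("Proposed next step", report.next_step),
    ]
    return "\n\n".join(f"### {title}\n\n{body}" for title, body in sections)
```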

Copilot · 2026-03-25T12:48:18Z

templates/skills/verification/SKILL.md

+
+## GitHub Issue Escalation
+
+When a task fails verification after 3 attempts, escalate by creating (or commenting on) a GitHub Issue:


This section largely duplicates the earlier “After 3 Failures” escalation list but changes the escalation target to a GitHub Issue. Because similar retry/escalation logic in templates/workflows/execute.md triggers a diagnostic issue only after 4 total attempts, it’s easy for these docs to drift or conflict. Suggest consolidating into a single canonical escalation section (or explicitly scoping it: single-task vs /maxsim:execute phase execution) and aligning the attempt threshold wording.

Suggested change

-When a task fails verification after 3 attempts, escalate by creating (or commenting on) a GitHub Issue:
+When verification still fails after the final allowed attempt (as defined by the workflow configuration), escalate by creating (or commenting on) a GitHub Issue:
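
The drift risk described above (3 attempts in one document, 4 in another) goes away if the attempt limit is a single configured value that the retry loop reads. A minimal sketch, assuming hypothetical `verify` and `escalate` callables and a `max_attempts` value supplied by the workflow configuration:

```python
from typing import Callable


def run_with_escalation(
    verify: Callable[[int], bool],
    escalate: Callable[[int], None],
    max_attempts: int,
) -> bool:
    """Run verification up to max_attempts times, then escalate once.

    verify(attempt) returns True when the gate passes; escalate(attempts_used)
    is called only after the final allowed attempt has failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        if verify(attempt):
            return True

    escalate(max_attempts)
    return False
```

With this shape, a skill document and a workflow document can both say "after the final allowed attempt" and defer the actual number to one place.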

Copilot · 2026-03-25T12:48:19Z

templates/skills/verification/SKILL.md

+Each retry attempt MUST use a fresh executor agent:
+
+- Do NOT reuse the previous executor (spawn a new one)
+- Provide the full task spec (do not assume prior context carries over)
+- Include the diagnostic summary from the failed run
+- Include revised instructions based on root cause analysis
+
+Treat each fresh executor as a cold start. Do NOT reference or build upon any previous attempt's reasoning or partial work.


“Each retry attempt MUST use a fresh executor agent” conflicts with the earlier per-attempt rules in this same file (Attempt 2 is described as “Fresh context” while Attempt 3 is “Fresh agent context”). Either update the Attempt 2/3 bullets to match this requirement, or narrow this section to the specific attempts/workflows where a new executor is actually spawned, so readers don’t get contradictory instructions.

Suggested change

-Each retry attempt MUST use a fresh executor agent:
-
-- Do NOT reuse the previous executor (spawn a new one)
-- Provide the full task spec (do not assume prior context carries over)
-- Include the diagnostic summary from the failed run
-- Include revised instructions based on root cause analysis
-
-Treat each fresh executor as a cold start. Do NOT reference or build upon any previous attempt's reasoning or partial work.
+For retries that spawn a new executor agent, treat that agent as a fresh executor:
+
+- When you spawn a new executor, do NOT reuse any previous executor state
+- Provide the full task spec (do not assume prior context carries over)
+- Include the diagnostic summary from the failed run
+- Include revised instructions based on root cause analysis
+
+Treat each fresh executor as a cold start. Do NOT reference or build upon any previous attempt's reasoning or partial work within that executor.
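
The cold-start requirement amounts to building the new executor's input from exactly three pieces and nothing else. The sketch below illustrates that idea; `RetryContext` and `build_fresh_prompt` are hypothetical names, and how an executor is actually spawned in MaxsimCLI is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryContext:
    task_spec: str             # full task spec, restated in full
    diagnostic_summary: str    # summary of the failed run
    revised_instructions: str  # changes derived from root cause analysis


def build_fresh_prompt(ctx: RetryContext) -> str:
    """Assemble the only input a freshly spawned executor receives.

    No transcript, reasoning, or partial work from earlier attempts is included.
    """
    parts = {
        "Task spec": ctx.task_spec,
        "Diagnostic summary from the failed run": ctx.diagnostic_summary,
        "Revised instructions": ctx.revised_instructions,
    }
    missing = [name for name, text in parts.items() if not text.strip()]
    if missing:
        raise ValueError(f"missing required context: {', '.join(missing)}")

    return "\n\n".join(f"## {name}\n\n{text.strip()}" for name, text in parts.items())
```

Rejecting empty fields up front makes it harder to spawn a retry that silently relies on context the new executor does not have.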

github-actions · 2026-03-25T12:53:36Z

🎉 This PR is included in version 5.13.1 🎉


Copilot AI review requested due to automatic review settings March 25, 2026 12:44

Copilot started reviewing on behalf of maystudios March 25, 2026 12:45


maystudios merged commit d1806d8 into main Mar 25, 2026
5 checks passed

github-actions bot added the released label Mar 25, 2026

maystudios merged 1 commit into main from worktree-agent-ae8e00cb


2 participants
