Commit 96680df

maystudios and claude committed
feat: close all audit gaps — tests, spec docs, config, verification, website polish
Priority 1 — Test infrastructure:
- E2E test suite (tests/e2e/) with real GitHub API tests for labels, milestones, issues
- Integration tests (tests/integration/) for full install→verify→uninstall→verify cycle
- Hook unit tests for maxsim-session-start, maxsim-task-completed, maxsim-teammate-idle
- CLI entrypoint tests for argument parsing, command routing, error output
- State transition tests for all enums, profiles, parallelism limits, DEFAULT_CONFIG

Priority 2 — Spec document cleanup:
- init-process-design.md: replaced 34+ stale .planning/ references with GitHub-first v6 equivalents
- skills-specification.md: added autoresearch (skill 15), updated command count 9→13

Priority 3 — Polish:
- Fix cli.ts resolve-model bug (toUpperCase→toLowerCase for lowercase enum values)
- Fix isNaN→Number.isNaN lint issue in cli.ts
- Add model_overrides to config.json template for discoverability
- Surface per-profile parallelism limits in /maxsim:settings (quality 20-40, balanced 10-20, budget 5-10)
- Remove strict_mode gate on spec compliance verification (now unconditional per spec §10.2)
- Fix website DocsPage meta description ("planning directory" → "GitHub Project Board")
- Remove non-existent maxsim-sync-reminder hook reference from website docs
- Remove non-existent --full/--todo flags from quick-tasks docs
- Fix worktree config key naming in website docs to match v6 schema

773 tests passing across 24 test files. Build clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0fa1e14 commit 96680df

23 files changed: 3856 additions & 287 deletions

docs/spec/init-process-design.md

Lines changed: 198 additions & 237 deletions
Large diffs are not rendered by default.

docs/spec/skills-specification.md

Lines changed: 86 additions & 15 deletions
@@ -1,10 +1,10 @@
 # MaxsimCLI Skills Specification
 
-**Version:** 5.0
-**Date:** 2026-03-22
-**Status:** Authoritative design spec for the 14-skill target state
+**Version:** 6.0
+**Date:** 2026-03-26
+**Status:** Authoritative design spec for the 15-skill target state
 
-This document defines the exact content structure for each of the 14 MaxsimCLI skills. Each entry covers: Anthropic-compliant name and description, section outline with key content, agent preload assignments, cross-skill references, and estimated line count.
+This document defines the exact content structure for each of the 15 MaxsimCLI skills. Each entry covers: Anthropic-compliant name and description, section outline with key content, agent preload assignments, cross-skill references, and estimated line count.
 
 ---
 
@@ -38,8 +38,9 @@ All skills follow Anthropic's Claude Code skill conventions:
 | 10 | `github-operations` | Agent-internal | NEW — merge of 2 existing skills |
 | 11 | `research` | Agent-internal | NEW — merge of 2 existing skills |
 | 12 | `project-memory` | User-facing | NEW skill |
-| 13 | `using-maxsim` | User-facing | UPDATE for v5 commands |
+| 13 | `using-maxsim` | User-facing | UPDATE for v6 commands |
 | 14 | `maxsim-simplify` | User-facing | Keep as-is |
+| 15 | `autoresearch` | User-facing | NEW skill |
 
 ---
 
@@ -649,7 +650,7 @@ Preloaded by **researcher** agent. Listed as an on-demand skill for planner agen
 
 ### Rationale for Creation
 
-The existing `memory-management` skill defines a local-file-based persistence model (CLAUDE.md, STATE.md, LESSONS.md). In v5, MAXSIM uses GitHub Issues as the single source of truth for project artifacts. A new skill is needed that: (1) establishes GitHub Issues as the canonical store for cross-session learnings, (2) defines what categories of knowledge to persist, (3) specifies the GitHub-native write pattern, and (4) explains the relationship between local files and GitHub state. This replaces `memory-management` in the 14-skill target set.
+The existing `memory-management` skill defines a local-file-based persistence model (CLAUDE.md, STATE.md, LESSONS.md). In v5, MAXSIM uses GitHub Issues as the single source of truth for project artifacts. A new skill is needed that: (1) establishes GitHub Issues as the canonical store for cross-session learnings, (2) defines what categories of knowledge to persist, (3) specifies the GitHub-native write pattern, and (4) explains the relationship between local files and GitHub state. This replaces `memory-management` in the 15-skill target set.
 
 ### Frontmatter
 
@@ -736,7 +737,7 @@ Not preloaded. User-invocable on-demand. Executor agent may receive it via orche
 
 ---
 
-## Skill 13: `using-maxsim` *(UPDATE for v5)*
+## Skill 13: `using-maxsim` *(UPDATE for v6)*
 
 ### Frontmatter
 
@@ -752,7 +753,7 @@ description: >-
 
 ### Disposition
 
-Update to accurately reflect the v5 command surface (9 commands) and the 14-skill target set. The current skill references outdated skill names (`verification-before-completion`, `sdd`, `memory-management`) that do not exist in the target state. The routing table and agent model sections are correct. The skills table needs to be updated.
+Update to accurately reflect the v6 command surface (13 commands) and the 15-skill target set. The current skill references outdated skill names (`verification-before-completion`, `sdd`, `memory-management`) that do not exist in the target state. The routing table and agent model sections are correct. The skills table needs to be updated.
 
 ### Section Outline
 
@@ -763,7 +764,7 @@ Update to accurately reflect the v5 command surface (9 commands) and the 14-skil
 - Check STATE.md for last checkpoint
 - Check current phase in ROADMAP.md
 - Route using the command table
-4. **Command Surface (9 commands)** — updated routing table:
+4. **Command Surface (13 commands)** — updated routing table:
 
 | Situation | Command |
 |-----------|---------|
@@ -778,9 +779,13 @@ Update to accurately reflect the v5 command surface (9 commands) and the 14-skil
 | Don't know what to do next | `/maxsim:go` |
 | Change workflow settings | `/maxsim:settings` |
 | Need command reference | `/maxsim:help` |
+| Optimize code against a metric | `/maxsim:improve` |
+| Iteratively fix errors until zero remain | `/maxsim:fix-loop` |
+| Autonomous bug hunting with hypothesis testing | `/maxsim:debug-loop` |
+| Security audit (STRIDE + OWASP + red-team) | `/maxsim:security` |
 
 5. **Agent Model (4 agents)** — keep existing table (executor / planner / researcher / verifier) — this is correct in the current skill
-6. **Skills** *(UPDATE — replace old skill names with v5 target names)*:
+6. **Skills** *(UPDATE — replace old skill names with v6 target names)*:
 
 | Skill | When It Activates |
 |-------|-------------------|
@@ -806,7 +811,7 @@ Not preloaded. User-invocable on-demand (this is the orientation/routing skill f
 - Check the routing table before starting any task — do not proceed ad-hoc
 - Explicit user approval required before working outside the current phase
 - STATE.md checkpoints from previous sessions must be acknowledged before proceeding
-- The 9-command surface is complete — there is no other entry point for MAXSIM work
+- The 13-command surface is complete — there is no other entry point for MAXSIM work
 
 ### Estimated Line Count
 
@@ -866,6 +871,71 @@ Not preloaded. User-invocable on-demand. Verifier agent may receive it as a sugg
 
 ---
 
+## Skill 15: `autoresearch` *(NEW)*
+
+### Rationale for Creation
+
+v6 introduces four autonomous loop commands (`/maxsim:improve`, `/maxsim:fix-loop`, `/maxsim:debug-loop`, `/maxsim:security`) that share a common constraint-driven iteration pattern: modify, verify, keep or discard, repeat. Rather than embedding the loop protocol in each command's agent prompt, a dedicated skill centralizes the iteration mechanics, decision rules, and results-logging format. Six reference workflows in `references/` provide domain-specific protocols that the skill dispatches to based on the command invoked.
+
+### Frontmatter
+
+```yaml
+---
+name: autoresearch
+description: >-
+  Autonomous optimization loop with reference workflows. Powers /maxsim:improve,
+  /maxsim:fix-loop, /maxsim:debug-loop, /maxsim:security. Used when running
+  autonomous optimization, error repair, bug hunting, or security audit loops.
+---
+```
+
+### Section Outline
+
+1. **When to Activate** — trigger table mapping each of the 4 commands plus general "repeated iteration with measurable outcomes" trigger
+2. **Subcommands** — routing table: `/maxsim:improve` (default loop), `/maxsim:debug-loop` → `references/debug.md`, `/maxsim:fix-loop` → `references/fix.md`, `/maxsim:security` → `references/security.md`
+3. **Interactive Setup Gate** — required context per command: improve (Goal, Scope, Metric, Direction, Verify), debug-loop (Issue/Symptom, Scope), fix-loop (Target, Scope), security (Scope, Depth)
+4. **Bounded Iterations** — `Iterations: N` for bounded runs; default is unbounded (loop until interrupted); early completion on goal achieved
+5. **Setup Phase** — inline config extraction or interactive 2-batch collection; dry-run verify command; 7 setup steps (read scope, define goal, define scope, define guard, create results log, establish baseline, confirm and begin)
+6. **The Loop** — `LOOP (FOREVER or N times)`: Review → Ideate → Modify (ONE change) → Commit → Verify → Guard → Decide (keep/discard/revert/crash-fix) → Log → Repeat; references `references/loop-protocol.md`
+7. **Critical Rules** — 8 rules: loop until done, read before write, one change per iteration, mechanical verification only, automatic rollback, simplicity wins, git is memory (`experiment:` prefix, `git revert` not `git reset --hard`), when stuck think harder
+8. **Principles Reference** — points to `references/core-principles.md` (7 generalizable principles)
+9. **Adapting to Different Domains** — table mapping domain (backend, frontend, performance, refactoring, security, debugging, fixing) to metric, scope, verify command, and guard
+10. **Debug Loop Summary** — autonomous bug-hunting: scientific method, hypothesis testing, classify as confirmed/disproven/inconclusive; references `references/debug.md`
+11. **Fix Loop Summary** — autonomous error repair: detect, prioritize (build > types > tests > lint), fix ONE, commit, verify, guard, decide, log; references `references/fix.md`
+12. **Security Audit Summary** — STRIDE + OWASP + red-team adversarial analysis; 4 red-team lenses; code evidence required; composite metric; `--diff`, `--fix`, `--fail-on` flags; references `references/security.md`
+13. **Results Logging** — TSV format per `references/results-logging.md`; valid statuses: baseline, keep, keep (reworked), discard, crash, no-op, hook-blocked
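
The loop described in item 6 of the outline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: `runLoop`, `LoopHooks`, and `LogEntry` are hypothetical names, the metric is assumed to be "higher is better", git operations are hidden behind callbacks, and the decision is simplified to keep or discard (the spec also lists revert and crash-fix outcomes).

```typescript
// Hypothetical sketch of the modify -> verify -> guard -> decide loop.
// None of these names come from the MaxsimCLI codebase.

type Decision = "keep" | "discard";

interface LoopHooks {
  /** Apply ONE change and commit it; returns a short description. */
  modifyAndCommit: (iteration: number) => string;
  /** Mechanical verification: returns the metric after the change. */
  verify: () => number;
  /** Guard check, e.g. "the test suite still passes". */
  guard: () => boolean;
  /** Undo the last experiment commit (the spec calls for `git revert`). */
  revert: () => void;
}

interface LogEntry {
  iteration: number;
  description: string;
  metric: number;
  status: "baseline" | Decision;
}

function runLoop(hooks: LoopHooks, maxIterations: number, goal?: number): LogEntry[] {
  // Establish the baseline before any change is made.
  let best = hooks.verify();
  const log: LogEntry[] = [
    { iteration: 0, description: "baseline", metric: best, status: "baseline" },
  ];

  for (let i = 1; i <= maxIterations; i++) {
    const description = hooks.modifyAndCommit(i);
    const metric = hooks.verify();
    // Keep only if the metric improved AND the guard still holds.
    const improved = metric > best && hooks.guard();
    if (improved) {
      best = metric;
    } else {
      hooks.revert();
    }
    // Every iteration is logged, kept or not.
    log.push({ iteration: i, description, metric, status: improved ? "keep" : "discard" });
    // Early completion once the goal is reached.
    if (goal !== undefined && best >= goal) break;
  }
  return log;
}
```

Passing `Infinity` as `maxIterations` models the unbounded default; in that case the sketch only stops when a `goal` is supplied and reached.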
+
+### Reference Workflows (6 files in `references/`)
+
+| File | Purpose |
+|------|---------|
+| `loop-protocol.md` | Core iteration protocol: review, ideate, modify, commit, verify, guard, decide, log |
+| `debug.md` | Debug loop: scientific method with hypothesis testing and classification |
+| `fix.md` | Fix loop: error detection, prioritization, atomic repair, verification |
+| `security.md` | Security audit: STRIDE + OWASP + red-team adversarial analysis |
+| `results-logging.md` | TSV results log format and protocol for all loop types |
+| `core-principles.md` | 7 generalizable principles behind autonomous iteration |
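
As a rough illustration of the TSV results log that `results-logging.md` is said to define, the sketch below formats one row per iteration. Only the status values are taken from the spec; the column set (`iteration`, `status`, `metric`, `description`) and the names `ResultRow`, `isStatus`, and `toTsvRow` are assumptions made for this example.

```typescript
// Hypothetical TSV row formatter. The column layout is an assumption;
// only the list of valid statuses comes from the spec text.

const STATUSES = [
  "baseline",
  "keep",
  "keep (reworked)",
  "discard",
  "crash",
  "no-op",
  "hook-blocked",
] as const;

type Status = (typeof STATUSES)[number];

interface ResultRow {
  iteration: number;
  status: Status;
  metric: number;
  description: string;
}

const HEADER = ["iteration", "status", "metric", "description"].join("\t");

/** Type guard for status strings read back from an existing log. */
function isStatus(value: string): value is Status {
  return STATUSES.some((s) => s === value);
}

function toTsvRow(row: ResultRow): string {
  // Tabs and newlines inside free text would break the column layout,
  // so collapse them to single spaces.
  const description = row.description.replace(/[\t\r\n]+/g, " ");
  return [row.iteration, row.status, row.metric, description].join("\t");
}
```

Sanitizing the free-text column keeps each iteration on exactly one line, which is what makes the "no silent iterations" rule easy to audit by counting rows.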
+
+### Agent Preload Assignment
+
+Not preloaded. User-invocable on-demand. Activates when any of the 4 autonomous loop commands is invoked (`/maxsim:improve`, `/maxsim:fix-loop`, `/maxsim:debug-loop`, `/maxsim:security`).
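
The command-to-workflow dispatch described in the Subcommands item could look something like the lookup below. This is a sketch only: `REFERENCE_FOR` and `resolveReference` are hypothetical names, and mapping `/maxsim:improve` to `references/loop-protocol.md` is an assumption (the spec only calls it the "default loop").

```typescript
// Hypothetical dispatch table from loop command to reference workflow file.

type LoopCommand =
  | "/maxsim:improve"
  | "/maxsim:fix-loop"
  | "/maxsim:debug-loop"
  | "/maxsim:security";

const REFERENCE_FOR: Record<LoopCommand, string> = {
  "/maxsim:improve": "references/loop-protocol.md", // assumption: the "default loop"
  "/maxsim:fix-loop": "references/fix.md",
  "/maxsim:debug-loop": "references/debug.md",
  "/maxsim:security": "references/security.md",
};

/** Returns the reference file for a loop command, or undefined for anything else. */
function resolveReference(command: string): string | undefined {
  // Own-property check avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(REFERENCE_FOR, command)
    ? REFERENCE_FOR[command as LoopCommand]
    : undefined;
}
```

Typing the table as `Record<LoopCommand, string>` means the compiler flags a missing entry if a fifth loop command is ever added to the union.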
+
+### Key Behavioral Rules
+
+- One change per iteration — atomic changes for clear causality
+- Mechanical verification only — no subjective judgments, use metrics
+- Automatic rollback on failure — `git revert` (not `git reset --hard`) preserves experiment history
+- Every experiment committed with `experiment:` prefix before verification
+- Results log updated after every iteration — no silent iterations
+- Bounded loops stop after N iterations and print a final summary
+- Security audit is read-only by default — `--fix` flag required to auto-remediate
+
+### Estimated Line Count
+
+~169 lines (SKILL.md body, excluding reference files)
+
+---
+
 ## Summary Table
 
 | # | Skill Name | user-invocable | Preloaded By | Disposition | Target Lines |
@@ -882,18 +952,19 @@ Not preloaded. User-invocable on-demand. Verifier agent may receive it as a sugg
 | 10 | `github-operations` | false | none (available_skills) | NEW merge of 2 skills | ~160 |
 | 11 | `research` | false | researcher | NEW merge of 2 skills | ~190 |
 | 12 | `project-memory` | true | none | NEW skill | ~110 |
-| 13 | `using-maxsim` | true | none | Update skills table for v5 | ~85 |
+| 13 | `using-maxsim` | true | none | Update skills table for v6 | ~85 |
 | 14 | `maxsim-simplify` | true | none | Keep as-is | ~91 |
+| 15 | `autoresearch` | true | none | NEW skill | ~169 |
 
-**Total estimated lines across all 14 skills:** ~1,570 lines
-**Maximum allowed (14 × 500):** 7,000 lines
+**Total estimated lines across all 15 skills:** ~1,739 lines
+**Maximum allowed (15 × 500):** 7,500 lines
 **All skills well within the 500-line body limit.**
 
 ---
 
 ## Skills Being Retired
 
-The following skills exist in the current codebase but are not in the 14-skill target set:
+The following skills exist in the current codebase but are not in the 15-skill target set:
 
 | Skill | Reason for Retirement |
 |-------|----------------------|

packages/cli/src/cli.ts

Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@ const command = args[0];
 
 const COMMANDS: Record<string, () => void> = {
   'resolve-model': () => {
-    const agentType = args[1]?.toUpperCase() as AgentType;
+    const agentType = args[1]?.toLowerCase() as AgentType;
     if (!agentType || !Object.values(AgentType).includes(agentType)) {
       console.error(`Invalid agent type: ${args[1]}`);
       process.exit(1);
@@ -31,7 +31,7 @@ const COMMANDS: Record<string, () => void> = {
 
     const fileCountIdx = args.indexOf('--file-count');
     const fileCount = fileCountIdx >= 0 ? parseInt(args[fileCountIdx + 1], 10) : 0;
-    if (fileCountIdx >= 0 && (isNaN(fileCount) || fileCount < 0)) {
+    if (fileCountIdx >= 0 && (Number.isNaN(fileCount) || fileCount < 0)) {
       console.error('--file-count must be a non-negative integer');
       process.exit(1);
     }
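
The two `cli.ts` fixes can be illustrated in isolation. The sketch below assumes a string enum with lowercase values, as the commit message describes; the `AgentType` members shown here are stand-ins based on the four agents named in the spec, and `parseAgentType` / `parseFileCount` are hypothetical helpers, not functions from the repository.

```typescript
// Stand-in for the real AgentType. The commit message only says its values
// are lowercase, so the exact members here are an assumption.
enum AgentType {
  Executor = "executor",
  Planner = "planner",
  Researcher = "researcher",
  Verifier = "verifier",
}

/** Normalizes user input to lowercase before matching enum values. */
function parseAgentType(raw: string | undefined): AgentType | undefined {
  const normalized = raw?.toLowerCase();
  return Object.values(AgentType).find((value) => value === normalized);
}

/** What the old code effectively did: uppercase never matches lowercase values. */
function parseAgentTypeUpper(raw: string | undefined): AgentType | undefined {
  const normalized = raw?.toUpperCase();
  return Object.values(AgentType).find((value) => value === normalized);
}

/** Parses a non-negative integer flag value; undefined signals invalid input. */
function parseFileCount(raw: string | undefined): number | undefined {
  const parsed = parseInt(raw ?? "", 10);
  // Number.isNaN does not coerce its argument, unlike the global isNaN.
  return Number.isNaN(parsed) || parsed < 0 ? undefined : parsed;
}
```

With `toUpperCase`, even a correctly spelled argument such as `planner` becomes `PLANNER` and fails the membership check, so every input was rejected. The `Number.isNaN` change is behavior-preserving here because `parseInt` already returns a number; the global `isNaN` coerces its argument first, which is why lint rules commonly steer away from it.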

packages/cli/src/core/version.ts

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 /** MaxsimCLI version — auto-injected from package.json at build time. */
-export const VERSION = '5.13.1';
+export const VERSION = '5.14.1';
 
 /**
  * Parse a semantic version string into components.
