diff --git a/.selfmodel/playbook/evolution-protocol.md b/.selfmodel/playbook/evolution-protocol.md new file mode 100644 index 0000000..fc80425 --- /dev/null +++ b/.selfmodel/playbook/evolution-protocol.md @@ -0,0 +1,681 @@ +# Evolution Protocol + +Evolution-to-PR Pipeline: detect local improvements, classify generalizability, +package patches, and submit PRs to upstream selfmodel. + +Trigger: Every 10 MERGED Sprints (orchestration-loop.md Step 8.5). +Manual trigger: `/selfmodel:evolve` command. + +--- + +## Pipeline Overview + +``` +DETECT → STAGE → SUBMIT → TRACK + │ │ │ │ + │ scan │ class │ PR │ monitor + │ diffs │ ify │ create │ status + ▼ ▼ ▼ ▼ +CANDIDATE STAGED SUBMITTED ACCEPTED/REJECTED +``` + +Four phases, each with clear input/output boundaries. Detection is fully automated +and read-only. Staging requires interactive classification. Submission requires +human approval. Tracking is passive monitoring. + +--- + +## Phase 1: DETECT + +**Purpose**: Compare local playbook, hooks, and scripts against upstream baseline +to discover improvements worth contributing back. 
+ +**Trigger conditions**: +- Automatic: orchestration-loop.md Step 8.5 fires every 10 MERGED sprints +- Manual: `/selfmodel:evolve --detect` +- Post-update: after `selfmodel update --remote` refreshes the baseline + +**Input sources** (scanned in order): + +| Source | What to scan | Signal type | +|--------|-------------|-------------| +| Playbook diffs | `git diff upstream/main -- .selfmodel/playbook/` | Direct improvement | +| Hook diffs | `git diff upstream/main -- .selfmodel/hooks/` or `scripts/*.sh` | Bug fix / enhancement | +| Script diffs | `git diff upstream/main -- scripts/` | Tool improvement | +| Validated lessons | `lessons-learned.md` entries with Result: improved | Proven pattern | +| Hook intercept patterns | `hook-intercepts.log` repeated blocks for same reason | False positive fix | +| Quality trends | `quality.jsonl` systematic score shifts | Calibration data | + +**Detection algorithm**: + +``` +1. Establish upstream baseline: + a. If git remote "upstream" exists: git fetch upstream && use upstream/main + b. Else if .selfmodel/state/upstream-baseline.sha exists: use stored SHA + c. Else: SKIP detection, output "no upstream baseline — run selfmodel update --remote" + +2. For each source, generate candidate list: + a. Playbook/hook/script diffs: + - git diff ..HEAD -- + - For each file with changes, create one CANDIDATE entry + b. Validated lessons: + - Parse lessons-learned.md for entries with "Result: improved" + - Cross-reference: if lesson's Action already reflected in a diff, skip (covered by diff) + - If lesson has no corresponding diff (pure process change), create CANDIDATE + c. Hook intercept patterns: + - Parse hook-intercepts.log for repeated blocks (same hook + same reason, 3+ occurrences) + - If corresponding hook script has a diff: enhance that CANDIDATE's evidence + - If no diff but pattern is clear: create CANDIDATE with category=hook_improvement + d. 
Quality trends: + - Parse quality.jsonl for systematic shifts (5+ sprints with same dimension trending) + - If quality-gates.md has threshold changes matching the trend: create CANDIDATE + +3. For each candidate, run 5 generalizability heuristics (see below) + +4. Write CANDIDATE entries to evolution.jsonl (append-only) + +5. Output summary: + "Detected N candidates: X playbook patches, Y hook fixes, Z new lessons" +``` + +**Output**: CANDIDATE entries in `evolution.jsonl`. Detection is read-only except +for appending to evolution.jsonl. + +--- + +## Phase 2: STAGE + +**Purpose**: Interactive classification of CANDIDATE entries. Human and Leader +collaborate to decide which improvements are generalizable. + +**Trigger**: `/selfmodel:evolve --stage` or automatic after DETECT when candidates exist. + +**Classification flow**: + +``` +For each CANDIDATE in evolution.jsonl (sorted by generalizability_score DESC): + + 1. Display summary: + [evo-2026-04-06-001] playbook_patch score=0.85 + .selfmodel/playbook/quality-gates.md + "Added AI Slop detection scoring rubric with 8 patterns" + + 2. Show diff preview: + git diff ..HEAD -- | head -40 + + 3. Leader recommends classification based on heuristics: + - score >= 0.7 → recommend STAGE + - score < 0.3 → recommend REJECT_PROJECT_SPECIFIC + - 0.3 <= score < 0.7 → recommend manual review + + 4. User decides: + [S]tage → status=STAGED, generate patch + [R]eject → status=REJECTED_PROJECT_SPECIFIC + [K]eep → status=CANDIDATE (revisit later) + [E]dit → modify summary/description before staging + + 5. For STAGED entries: + a. Generate patch file: + git diff ..HEAD -- > \ + .selfmodel/state/evolution-staging//.patch + b. Strip project-specific content from patch: + - Replace absolute paths with placeholder: /Users/*/project/ → / + - Replace project name with placeholder: + - Flag any remaining project-specific references for manual review + c. 
Update evolution.jsonl entry: staged_at= +``` + +**Output**: STAGED entries in evolution.jsonl, patch files in +`.selfmodel/state/evolution-staging//`. + +--- + +## Phase 3: SUBMIT + +**Purpose**: Create upstream PR from STAGED patches. Requires explicit human approval. + +**Trigger**: `/selfmodel:evolve --submit` (never automatic). + +**Submission flow**: + +``` +1. Collect all STAGED entries from evolution.jsonl + +2. Group by upstream_file for efficient PR packaging: + - Multiple changes to same file → single combined patch + - Unrelated files → can be in same PR if logically cohesive + +3. Pre-submission checks: + a. shellcheck: run shellcheck on all .sh files in staging + - FAIL → block submission, output errors, user must fix + b. Path audit: grep for absolute paths, project names, credentials + - Found → block submission, list violations + c. Patch applicability: attempt dry-run apply against upstream baseline + - CONFLICT → mark entries as CONFLICT, user must resolve or SUPERSEDE + +4. Generate PR content (see PR Template Format below) + +5. HUMAN APPROVAL GATE: + Display full PR preview (title, body, file list, diff stats) + "Submit this PR to upstream? [yes/no/edit]" + - no → abort, entries stay STAGED + - edit → user modifies PR content, re-display + - yes → proceed to submission + +6. Submit: + a. Fork upstream if not already forked + b. Create branch: evolution/- + c. Apply patches + d. Commit with evidence-rich message + e. gh pr create --repo --title --body <body> + f. Update evolution.jsonl entries: + - status=SUBMITTED + - submitted_at=<now> + - pr_url=<gh output> + - pr_status=open +``` + +**Output**: SUBMITTED entries in evolution.jsonl, open PR on upstream repo. + +--- + +## Phase 4: TRACK + +**Purpose**: Monitor submitted PRs and update local state accordingly. + +**Trigger**: `/selfmodel:evolve --track` or automatic during DETECT phase. + +**Tracking flow**: + +``` +1. For each SUBMITTED entry in evolution.jsonl: + + a. 
Query PR status: + gh pr view <pr_url> --json state,mergedAt,reviews + + b. Map status: + - merged → ACCEPTED, record reviewed_by from PR reviews + - closed (not merged) → REJECTED_UPSTREAM, record reviewer comments + - open + changes_requested → stays SUBMITTED, flag for user attention + - open + approved → stays SUBMITTED (waiting for maintainer merge) + - conflict detected → CONFLICT (upstream changed target file) + + c. Update evolution.jsonl entry with new status and metadata + +2. Handle CONFLICT: + - If upstream changed the target file after PR submission: + a. Mark old entry as SUPERSEDED + b. Create new CANDIDATE with updated diff against new upstream + c. Output: "evo-2026-04-06-001 superseded — upstream changed target, re-detect needed" + +3. Output summary: + "Tracked N PRs: X accepted, Y rejected, Z pending, W conflicted" +``` + +**Output**: Updated status in evolution.jsonl. + +--- + +## evolution.jsonl Schema + +Path: `.selfmodel/state/evolution.jsonl` + +Each line is a single JSON object. File is append-only (new entries appended, +status updates rewrite the specific line via `jq` or equivalent). 
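As a rough illustration of the in-place status update described above, the sketch below rewrites one entry and leaves every other line untouched. It assumes `jq` is installed and that each entry is stored as a single-line JSON object with `id` and `status` fields; the function name `evo_set_status` and the temp-file swap are hypothetical choices, not part of selfmodel.

```shell
# Hypothetical helper (not part of selfmodel): set the status of one entry in
# a JSONL file. Assumes jq is available and each line is one JSON object.
evo_set_status() {
  local file="$1" id="$2" new_status="$3" tmp
  tmp="$(mktemp)" || return 1
  # -c emits one compact object per line, which preserves the JSONL layout.
  # Entries whose id does not match are passed through unchanged.
  if jq -c --arg id "$id" --arg s "$new_status" \
        'if .id == $id then .status = $s else . end' "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    # Leave the original file untouched if jq fails (e.g. on malformed JSON).
    rm -f "$tmp"
    return 1
  fi
}
```

A call such as `evo_set_status .selfmodel/state/evolution.jsonl evo-2026-04-06-001 STAGED` would then change only that entry. Writing to a temporary file first means a failed run never truncates the log, which fits the rule that entries are never deleted.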
+ +```json +{ + "id": "evo-YYYY-MM-DD-NNN", + "status": "<status>", + "category": "<category>", + "source_file": "<relative path>", + "upstream_file": "<relative path>", + "summary": "<one-line description>", + "description": "<detailed explanation>", + "evidence": { + "sprints_affected": [], + "quality_trend": "<string or null>", + "hook_intercepts": 0, + "lessons_learned_ref": "<string or null>" + }, + "heuristic": "<heuristic_name>", + "generalizability_score": 0.0, + "generalizability_reason": "<why generalizable or not>", + "diff_stats": "+N -M lines in file.md", + "detected_at_sprint": 0, + "detected_at": "<ISO8601>", + "staged_at": null, + "submitted_at": null, + "pr_url": null, + "pr_status": null, + "reviewed_by": null, + "project_name": "<derived from git remote>", + "selfmodel_version": "<current version>" +} +``` + +### Field Reference + +| Field | Type | Description | +|-------|------|-------------| +| `id` | string | Unique identifier. Format: `evo-YYYY-MM-DD-NNN` where NNN is zero-padded sequence within the day. Example: `evo-2026-04-06-001` | +| `status` | enum | Lifecycle state. Values: `CANDIDATE`, `STAGED`, `SUBMITTED`, `ACCEPTED`, `REJECTED_PROJECT_SPECIFIC`, `REJECTED_UPSTREAM`, `CONFLICT`, `SUPERSEDED` | +| `category` | enum | Type of improvement. Values: `playbook_patch`, `hook_improvement`, `script_fix`, `new_lesson`, `new_playbook_page` | +| `source_file` | string | Relative path to the locally modified file (from project root). Example: `.selfmodel/playbook/quality-gates.md` | +| `upstream_file` | string | Relative path in the upstream selfmodel repo where the change would apply. Often identical to `source_file` but may differ if local structure diverges | +| `summary` | string | One-line human-readable description of the improvement. Max 120 characters | +| `description` | string | Detailed explanation of what changed, why it matters, and what problem it solves. 
No length limit | +| `evidence` | object | Supporting data for the improvement (see Evidence sub-schema) | +| `evidence.sprints_affected` | number[] | Sprint numbers where this improvement was relevant or would have prevented issues. Example: `[65, 66, 76]` | +| `evidence.quality_trend` | string\|null | Description of quality score trend that motivated this change. Example: `"code_quality avg dropped from 8.2 to 6.5 over sprints 40-50"`. Null if not trend-driven | +| `evidence.hook_intercepts` | number | Count of hook interceptions related to this improvement. From `hook-intercepts.log`. Zero if not hook-related | +| `evidence.lessons_learned_ref` | string\|null | Reference to lessons-learned.md entry. Example: `"Sprint 65-76: Merge conflicts"`. Null if no linked lesson | +| `heuristic` | enum | Which generalizability heuristic was applied. Values: `path_detection`, `project_name_detection`, `generic_pattern`, `hook_fix`, `scoring_calibration` | +| `generalizability_score` | float | Score from 0.0 (fully project-specific) to 1.0 (fully generalizable). Computed by heuristic rules | +| `generalizability_reason` | string | Human-readable explanation of why the score was assigned. Must reference specific evidence | +| `diff_stats` | string | Git diff statistics. Format: `+N -M lines in <filename>`. Example: `+45 -12 lines in quality-gates.md` | +| `detected_at_sprint` | number | Sprint number at which detection occurred. Derived from current sprint count in team.json | +| `detected_at` | string | ISO 8601 timestamp of detection. Example: `2026-04-06T14:30:00Z` | +| `staged_at` | string\|null | ISO 8601 timestamp of staging. Null until STAGED | +| `submitted_at` | string\|null | ISO 8601 timestamp of PR submission. Null until SUBMITTED | +| `pr_url` | string\|null | GitHub PR URL. Null until SUBMITTED. Example: `https://github.com/org/selfmodel/pull/42` | +| `pr_status` | string\|null | Current PR state. Values: `open`, `merged`, `closed`, `changes_requested`. 
Null until SUBMITTED | +| `reviewed_by` | string\|null | GitHub username(s) who reviewed the PR. Null until reviewed. Comma-separated for multiple reviewers | +| `project_name` | string | Derived from `git remote get-url origin` — extract repo/org name. Example: `myproject` | +| `selfmodel_version` | string | Version from `VERSION` file at time of detection. Example: `0.3.0` | + +### Status Lifecycle + +``` +CANDIDATE ──┬── STAGED ──── SUBMITTED ──┬── ACCEPTED + │ ├── REJECTED_UPSTREAM + │ └── CONFLICT ──── SUPERSEDED + ├── REJECTED_PROJECT_SPECIFIC + └── (stays CANDIDATE until revisited) +``` + +Transitions: +- `CANDIDATE → STAGED`: User approves during Stage phase +- `CANDIDATE → REJECTED_PROJECT_SPECIFIC`: User rejects as project-specific +- `STAGED → SUBMITTED`: PR created after human approval +- `SUBMITTED → ACCEPTED`: Upstream merges the PR +- `SUBMITTED → REJECTED_UPSTREAM`: Upstream closes PR without merging +- `SUBMITTED → CONFLICT`: Upstream changed target file, patch no longer applies +- `CONFLICT → SUPERSEDED`: New CANDIDATE created with updated diff +- Any status can stay indefinitely (no forced timeout) + +--- + +## Generalizability Heuristics + +Five heuristics evaluate whether a local improvement is worth contributing upstream. +Each heuristic produces a score component and reason. The final `generalizability_score` +is the weighted combination of applicable heuristics. + +### Heuristic 1: PATH_DETECTION + +**Purpose**: Detect absolute paths that make the change project-specific. + +**Rule**: Scan the diff hunks for patterns matching absolute filesystem paths +outside of example blocks or template placeholders. 
+ +**Detection patterns**: +- `/Users/<username>/` — macOS home directory +- `/home/<username>/` — Linux home directory +- `/var/`, `/opt/`, `/srv/` followed by project-specific directory names +- Any path containing the project directory name derived from `git rev-parse --show-toplevel` + +**Exclusions** (do not flag): +- Paths inside fenced code blocks marked as example/template (` ```example `) +- Paths using placeholder syntax: `<project-root>/`, `$HOME/`, `~/.config/` +- Paths in comments explaining what to replace + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| No absolute paths in diff | +0.0 (neutral, no penalty) | +| Absolute paths only in examples/templates | +0.0 (neutral) | +| 1-2 absolute paths in non-example code | -0.4 (likely project-specific) | +| 3+ absolute paths in non-example code | -0.8 (definitely project-specific) | + +**Example**: +```diff +- timeout 180 gemini "@/Users/vvedition/Desktop/myproject/.selfmodel/inbox/gemini/sprint-1.md" ++ timeout 180 gemini "@${WORKTREE}/.selfmodel/inbox/gemini/sprint-1.md" +``` +This diff contains `/Users/vvedition/Desktop/myproject/` — a hardcoded path. +Score impact: -0.4. Reason: "diff contains 1 absolute path reference to project directory." + +### Heuristic 2: PROJECT_NAME_DETECTION + +**Purpose**: Detect references to the current project name that make the change project-specific. + +**Rule**: Extract the project name from two sources, then scan diff hunks for occurrences. + +**Name extraction**: +```bash +# Source 1: git remote URL +git remote get-url origin 2>/dev/null | sed 's/.*\///' | sed 's/\.git$//' + +# Source 2: directory name +basename "$(git rev-parse --show-toplevel)" +``` + +**Detection**: Case-insensitive search in diff hunks (excluding file path components, +since paths are handled by PATH_DETECTION). 
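The case-insensitive scan could be sketched as follows. This is an assumption-laden illustration: the function name `count_project_name_hits` is hypothetical, the diff is read from stdin in unified format, and "excluding file path components" is interpreted narrowly as skipping the `--- ` and `+++ ` file header lines.

```shell
# Hypothetical helper (not part of selfmodel): count changed lines in a
# unified diff that mention a project name, ignoring case.
# Usage: git diff ... | count_project_name_hits "<name>"
count_project_name_hits() {
  local name="$1"
  # Keep only added/removed lines, then drop the "--- " / "+++ " file headers
  # (assumed here to be the "file path components" left to PATH_DETECTION).
  # -F treats the name as a literal string, -i ignores case, -c prints a count.
  grep -E '^[+-]' \
    | grep -Ev '^(\+\+\+|---) ' \
    | grep -icF -- "$name" \
    || true   # grep -c exits non-zero on a zero count; the printed 0 is still valid
}
```

The name passed in would come from either extraction source shown above. One known limitation of this sketch: a removed content line that itself begins with `-- ` looks like a file header and is skipped.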
+ +**Exclusions**: +- Project name appearing in a generic context: `"derived from git remote"` is fine +- Project name in CHANGELOG entries (project-specific by nature, but the pattern is generic) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| No project name references | +0.0 (neutral) | +| Project name only in metadata fields (project_name, CHANGELOG) | +0.0 (neutral) | +| Project name in logic/rules/conditions | -0.5 (likely project-specific) | +| Project name in hardcoded strings/paths | -0.7 (definitely project-specific) | + +**Example**: +```diff ++ if [ "$PROJECT" = "vibe-sensei" ]; then ++ SPECIAL_FLAG=true ++ fi +``` +Score impact: -0.7. Reason: "diff contains hardcoded project name 'vibe-sensei' in conditional logic." + +### Heuristic 3: GENERIC_PATTERN + +**Purpose**: Identify new playbook sections, rules, or patterns that contain no +project-specific nouns and are broadly applicable. + +**Rule**: If the diff adds a new section (detected by heading markers: `## `, `### `) +or a substantial block (10+ lines) to a playbook file, and that block passes both +PATH_DETECTION and PROJECT_NAME_DETECTION with no penalties, it is a generic pattern. 
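The structural half of this rule (a new heading or a 10+ line addition) might be checked along these lines. The sketch is a simplification under stated assumptions: it counts all added lines in the diff rather than one contiguous block, it does not run the PATH_DETECTION or PROJECT_NAME_DETECTION checks, and the function name `adds_section_or_block` is hypothetical.

```shell
# Hypothetical helper (not part of selfmodel): exit 0 when a unified diff on
# stdin adds a "## " / "### " heading or at least 10 lines in total.
adds_section_or_block() {
  local diff added headings
  diff="$(cat)"
  # Added lines, excluding the "+++ " file header.
  added="$(printf '%s\n' "$diff" | grep -E '^\+' | grep -Ecv '^\+\+\+ ' || true)"
  # Added lines that start a level-2 or level-3 Markdown heading.
  headings="$(printf '%s\n' "$diff" | grep -Ec '^\+#{2,3} ' || true)"
  [ "$headings" -gt 0 ] || [ "$added" -ge 10 ]
}
```

A caller could pipe `git diff` output for a single playbook file into the function and only proceed to scoring when it succeeds and both penalty heuristics report no findings.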
+ +**Positive signals** (increase score): +- New section with heading that describes a universal concept (e.g., "AI Slop Detection", "Drift Detection") +- References to general software engineering practices, not project-specific workflows +- Uses placeholder variables instead of hardcoded values +- Block is self-contained (does not depend on project-specific sections) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| New self-contained section, no project references | +0.8 (highly generalizable) | +| New section with minor project context that can be stripped | +0.5 (generalizable with edits) | +| Modification to existing section, generic improvement | +0.6 (likely generalizable) | +| Modification deeply intertwined with project-specific logic | +0.2 (low generalizability) | + +**Example**: +Adding an "AI Slop Detection" section to quality-gates.md with 8 universal patterns +and scoring rubric. No project names, no absolute paths. +Score impact: +0.8. Reason: "new self-contained section describing universal code quality patterns." + +### Heuristic 4: HOOK_FIX + +**Purpose**: Identify hook script changes that fix false positives or improve accuracy, +based on empirical evidence from intercept logs. + +**Rule**: A hook script diff qualifies when `hook-intercepts.log` contains entries +showing the old pattern caused incorrect blocks (false positives). 
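Counting repeated intercepts for the same hook and reason, as the detection algorithm describes (3+ occurrences), could look like the sketch below. The log layout is an assumption: space-separated `key=value` fields that include `hook=` and `reason=`. The function name `repeated_intercepts` is hypothetical, and whether a block was a false positive still needs human judgment.

```shell
# Hypothetical helper (not part of selfmodel): print "<count> <hook> <reason>"
# for every hook+reason pair seen at least <min> times (default 3).
# Assumes log lines carry space-separated hook=... and reason=... fields.
repeated_intercepts() {
  local log="$1" min="${2:-3}"
  awk -v min="$min" '
    {
      hook = ""; reason = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^hook=/)   hook = substr($i, 6)     # text after "hook="
        if ($i ~ /^reason=/) reason = substr($i, 8)   # text after "reason="
      }
      # Lines missing either field are ignored rather than guessed at.
      if (hook != "" && reason != "") count[hook " " reason]++
    }
    END {
      for (k in count) if (count[k] >= min) print count[k], k
    }
  ' "$log" | sort
}
```

The resulting count is the kind of number that would be recorded in `evidence.hook_intercepts` for the matching candidate.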
+ +**Evidence requirements**: +- At least 3 intercept log entries for the same hook + same reason pattern +- The diff modifies the matching logic (grep patterns, conditionals, allowlists) +- The fix does not introduce project-specific paths or names + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| Hook fix with 5+ false positive intercepts in log | +0.9 (strong evidence) | +| Hook fix with 3-4 false positive intercepts | +0.7 (moderate evidence) | +| Hook fix with <3 intercepts | +0.3 (weak evidence, may be coincidental) | +| Hook change with no intercept log correlation | +0.1 (speculative) | + +**Example**: +`enforce-agent-rules.sh` modified to accept `inbox/research/` directory for gemini agent. +Hook-intercepts.log shows 5 entries: `hook=enforce-agent-rules tool=Bash reason=gemini-no-inbox-file`. +Score impact: +0.9. Reason: "hook fix backed by 5 false positive intercepts; Researcher uses inbox/research/ not inbox/gemini/." + +### Heuristic 5: SCORING_CALIBRATION + +**Purpose**: Identify quality-gates.md threshold changes that are motivated by +empirical quality data trends. + +**Rule**: A threshold change in quality-gates.md qualifies when `quality.jsonl` +shows a systematic trend that motivated the recalibration. 
+ +**Evidence requirements**: +- `quality.jsonl` contains 5+ entries showing a consistent pattern in the affected dimension +- The diff modifies scoring thresholds, rubric text, or calibration examples +- The change direction aligns with the observed trend (tightening if scores inflated, loosening if too harsh) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| Threshold change with 10+ quality entries showing trend | +0.85 (strong calibration) | +| Threshold change with 5-9 quality entries showing trend | +0.65 (moderate calibration) | +| Threshold change with weak/ambiguous trend | +0.3 (speculative) | +| New calibration example based on real sprint data | +0.75 (empirical anchor) | + +**Example**: +quality-gates.md adds "AI Slop Detection" scoring penalties. quality.jsonl shows +Code Quality dimension averaged 8.5 over last 10 sprints despite visible slop patterns. +Score impact: +0.85. Reason: "scoring calibration backed by 10-sprint quality trend showing inflated Code Quality scores." + +### Score Combination + +When multiple heuristics apply (common: GENERIC_PATTERN + PATH_DETECTION): + +``` +final_score = base_positive + sum(negative_impacts) +``` + +Where: +- `base_positive` = highest positive score from applicable heuristics (GENERIC_PATTERN, HOOK_FIX, or SCORING_CALIBRATION) +- `negative_impacts` = sum of all negative scores from PATH_DETECTION and PROJECT_NAME_DETECTION + +Clamped to [0.0, 1.0]. + +**Example**: A new playbook section (GENERIC_PATTERN: +0.8) that contains one absolute path +(PATH_DETECTION: -0.4). Final score = 0.8 + (-0.4) = 0.4. Reason incorporates both: +"new generic section (+0.8) but contains 1 absolute path (-0.4); strip path before staging." + +--- + +## PR Template Format + +Used in SUBMIT phase when creating upstream PRs. + +```markdown +## Summary + +Community-discovered improvements from project usage (<project_name>, <N> sprints). 
+ +These changes were detected by selfmodel's evolution pipeline, classified as +generalizable by heuristic analysis, and verified against local sprint data. + +## Changes + +| # | Category | File | Summary | Evidence | +|---|----------|------|---------|----------| +| 1 | <category> | <upstream_file> | <summary> | <sprints_affected>, <quality_trend or hook_intercepts> | +| 2 | ... | ... | ... | ... | + +## Per-Change Details + +### Change 1: <summary> + +**Category**: <category> +**Heuristic**: <heuristic> (score: <generalizability_score>) +**Reason**: <generalizability_reason> + +**What changed**: +<description> + +**Evidence**: +- Sprints affected: <sprints_affected> +- Quality trend: <quality_trend> +- Hook intercepts: <hook_intercepts> +- Lessons learned ref: <lessons_learned_ref> + +**Diff stats**: <diff_stats> + +--- + +(repeat for each change) + +## Testing + +- [ ] shellcheck passes on all modified .sh files +- [ ] `selfmodel status` runs without errors after applying changes +- [ ] No absolute paths or project-specific names in submitted code +- [ ] Patches apply cleanly to upstream HEAD + +## Context + +- selfmodel version: <selfmodel_version> +- Detection sprint: <detected_at_sprint> +- Project sprint count: <total_sprints> +- Evolution entries: <count of entries in this PR> +``` + +--- + +## Integration Points + +| System | Integration | Direction | +|--------|-------------|-----------| +| orchestration-loop.md Step 8.5 | Auto-triggers DETECT phase every 10 MERGED sprints. Leader checks `team.json.evolution.last_review_sprint` and compares to current merged count. If `current - last_review >= 10`, run DETECT | orchestration-loop → evolution | +| `/selfmodel:status` | Displays evolution pipeline status: counts by status (N candidates, M staged, K submitted, J accepted). Reads from `evolution.jsonl` | evolution → status display | +| `selfmodel update --remote` | Refreshes upstream baseline SHA. 
After update, stores new baseline in `.selfmodel/state/upstream-baseline.sha`. Enables DETECT to compute accurate diffs | update → evolution baseline | +| `team.json` evolution section | Stores persistent state: `last_review_sprint` (sprint number of last DETECT run), `candidate_count`, `staged_count`, `submitted_count`, `accepted_count`. Updated by each phase | evolution ↔ team.json | +| `CONTRIBUTING.md` (upstream) | Evolution PRs follow the upstream project's contributing standards. PR template references CONTRIBUTING.md if it exists | evolution → upstream standards | +| `lessons-learned.md` | DETECT phase scans for entries with `Result: improved` as evolution candidates. ACCEPTED upstream changes are cross-referenced back as validated lessons | lessons-learned ↔ evolution | +| `quality.jsonl` | DETECT phase analyzes quality trends for SCORING_CALIBRATION heuristic. Trend data becomes evidence in evolution entries | quality.jsonl → evolution evidence | +| `hook-intercepts.log` | DETECT phase scans for repeated false positive patterns for HOOK_FIX heuristic. Intercept counts become evidence in evolution entries | hook-intercepts.log → evolution evidence | + +### team.json Evolution Section Schema + +```json +{ + "evolution": { + "last_review_sprint": 0, + "candidate_count": 0, + "staged_count": 0, + "submitted_count": 0, + "accepted_count": 0, + "rejected_project_specific_count": 0, + "rejected_upstream_count": 0, + "last_detect_at": null, + "last_submit_at": null + } +} +``` + +--- + +## Safety Rules + +1. **Human MUST approve before any PR submission** — The SUBMIT phase has a mandatory + human approval gate. Leader displays the full PR preview and waits for explicit `yes`. + No automated submission. No "approve all" batch mode. + +2. **Detection is read-only** — The DETECT phase only reads from source files, logs, + and quality data. The only write operation is appending CANDIDATE entries to + `evolution.jsonl`. No source files are modified during detection. 
+ +3. **Never submit project-specific content** — Before PR creation, the pipeline runs + a mandatory audit for: + - Absolute filesystem paths (`/Users/`, `/home/`, project root paths) + - Project name references in logic or hardcoded strings + - Credentials, API keys, tokens, or secrets + - Internal team member names or identifiers + Any finding blocks submission until resolved. + +4. **All patches must pass shellcheck before submission** — Every `.sh` file included + in a PR must pass `shellcheck` with zero warnings. This is enforced in the + SUBMIT phase pre-submission checks. No override mechanism. + +5. **evolution.jsonl is append-only** — New entries are appended. Status updates + modify existing entries in-place but never delete entries. The full history is + preserved for audit and trend analysis. Rotation policy: retain all entries + (per quality-gates.md log maintenance rules). + +6. **Upstream conflict means SUPERSEDE, never force** — When an upstream change + conflicts with a submitted patch: + - The existing entry transitions to SUPERSEDED status + - A new CANDIDATE is created with an updated diff against the new upstream state + - Force push is never used on upstream branches + - The old PR is closed with a comment explaining the supersession + +--- + +## Directory Structure + +``` +.selfmodel/ +├── state/ +│ ├── evolution.jsonl # Evolution entries (append-only) +│ ├── evolution-staging/ # Generated during STAGE phase +│ │ └── evo-2026-04-06-001/ +│ │ └── quality-gates.md.patch # Stripped patch file +│ └── upstream-baseline.sha # Upstream reference SHA +└── playbook/ + └── evolution-protocol.md # This file +``` + +--- + +## Operational Notes + +### First-Time Setup + +Before evolution can run, the project needs an upstream baseline: + +```bash +# Option A: Add upstream remote (preferred) +git remote add upstream <selfmodel-upstream-url> +git fetch upstream + +# Option B: Manual baseline (if no upstream remote) +echo "<known-good-sha>" > 
.selfmodel/state/upstream-baseline.sha +``` + +### Manual Invocation Examples + +```bash +# Full pipeline (interactive) +/selfmodel:evolve + +# Detection only (safe, read-only) +/selfmodel:evolve --detect + +# Stage candidates (interactive classification) +/selfmodel:evolve --stage + +# Submit staged patches (requires human approval) +/selfmodel:evolve --submit + +# Track submitted PRs +/selfmodel:evolve --track + +# Show pipeline status +/selfmodel:evolve --status +``` + +### Automatic Invocation + +The orchestration loop triggers DETECT automatically at Step 8.5 when the merged +sprint count crosses a 10-sprint boundary. The check: + +``` +current_merged = count(plan.md sprints with MERGED status) +last_review = team.json.evolution.last_review_sprint +if (current_merged - last_review) >= 10: + run DETECT phase + update team.json.evolution.last_review_sprint = current_merged +``` + +STAGE and SUBMIT are never automatic — they always require user interaction. diff --git a/.selfmodel/playbook/orchestration-loop.md b/.selfmodel/playbook/orchestration-loop.md index 89b6296..a669336 100644 --- a/.selfmodel/playbook/orchestration-loop.md +++ b/.selfmodel/playbook/orchestration-loop.md @@ -220,6 +220,16 @@ LOOP: - Append to quality.jsonl - Append to orchestration.log + 8.5. EVOLUTION CHECK (every 10 MERGED Sprints) + a. Read team.json → evolution.last_review_sprint + b. Count MERGED sprints since last review (from quality.jsonl or plan.md) + c. If count >= 10: + i. Run evolution detection (equivalent to selfmodel evolve --detect) + ii. Log: phase=<N> event=evolution_detect candidates=<N> + iii. If candidates > 0: notify user "N evolution candidates. Run /selfmodel:evolve" + iv. Update team.json: evolution.last_review_sprint = current_sprint + d. If count < 10: skip + 9. 
CHECK context health - Phase boundary (all sprints in current phase MERGED) → Phase Gate → FORCE RESET - Context > 70% → FORCE RESET diff --git a/.selfmodel/state/evolution.jsonl b/.selfmodel/state/evolution.jsonl new file mode 100644 index 0000000..e69de29 diff --git a/CLAUDE.md b/CLAUDE.md index 8a327b9..816a92d 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -248,6 +248,7 @@ Contract template → read `.selfmodel/playbook/sprint-template.md` | Quality review + scoring | `.selfmodel/playbook/quality-gates.md` | | Sprint contract creation | `.selfmodel/playbook/sprint-template.md` | | Lessons learned + evolution | `.selfmodel/playbook/lessons-learned.md` | +| Evolution pipeline + upstream PR | `.selfmodel/playbook/evolution-protocol.md` | | Independent evaluation + skeptical prompt | `.selfmodel/playbook/evaluator-prompt.md` | | Automated orchestration loop (large projects) | `.selfmodel/playbook/orchestration-loop.md` | | E2E Verification Protocol v2 | `.selfmodel/playbook/e2e-protocol-v2.md` | @@ -295,8 +296,9 @@ Contract template → read `.selfmodel/playbook/sprint-template.md` ## Evolution -**Trigger**: Every 10 Sprints completed +**Trigger**: Every 10 Sprints completed (auto-detected at orchestration-loop Step 8.5) **Cycle**: `MEASURE → DIAGNOSE → PROPOSE → EXPERIMENT → EVALUATE → SELECT` +**Pipeline**: `DETECT → STAGE → SUBMIT → TRACK` (for upstream contributions) 1. **MEASURE** — Extract trends from quality.jsonl 2. **DIAGNOSE** — Identify systemic bottlenecks @@ -305,6 +307,10 @@ Contract template → read `.selfmodel/playbook/sprint-template.md` 5. **EVALUATE** — Validate with data 6. **SELECT** — Effective → write to lessons-learned.md | Ineffective → discard with record +**Upstream contribution**: Validated improvements (Result: improved) are candidates +for upstream PRs. Run `selfmodel evolve --detect` or `/selfmodel:evolve` to scan. +Human approval required before any PR submission. Full protocol: `playbook/evolution-protocol.md`. 
+ **Skill discovery**: New need → try existing skill → evaluate → keep or discard ## Danger Zones @@ -314,6 +320,7 @@ Contract template → read `.selfmodel/playbook/sprint-template.md` - Modifying `CLAUDE.md` (this file) - Modifying `.selfmodel/playbook/` rule files - Deleting `.selfmodel/state/` state files +- Submitting evolution PRs to upstream (`selfmodel evolve --submit`) - Force push to main ### ABSOLUTELY FORBIDDEN @@ -347,6 +354,7 @@ selfmodel/ ├── state/dispatch-config.json # Dispatch gate config (cap, convergence files) ├── state/quality.jsonl # Quality score history ├── state/evolution.jsonl # Evolution log + ├── state/evolution-staging/ # Staged evolution patches (pre-PR) ├── state/orchestration.log # Orchestration loop event log ├── reviews/ # Review records └── playbook/ # On-demand loaded rules diff --git a/README.md b/README.md index 3da786b..0cfd5cb 100644 --- a/README.md +++ b/README.md @@ -189,6 +189,15 @@ Evaluator E2E Agent v2 - **Self-evolution** — Every 10 sprints: MEASURE → DIAGNOSE → PROPOSE → EXPERIMENT → EVALUATE → SELECT. Hook interception logs feed into evolution analysis. - **Chaos testing (/rampage)** — "Be Water" philosophy. 4 surface engines (WEB, CLI, API, LIB) × 7 user personas (Impatient, Confused, Explorer, Multitasker, Edge Case, Abandoner, Speedrunner). Maps all user journeys, then walks each with chaotic behaviors. Advisory quality gate after E2E pass. +## Evolution Pipeline + +Every 10 completed sprints, selfmodel can turn validated local process improvements into upstream contributions through the Evolution Pipeline. It scans local diffs and lessons learned for reusable changes, stages only generalizable patches, and records pipeline state in `.selfmodel/state/evolution.jsonl`. Run `/selfmodel:evolve` for the guided workflow; full protocol: [`.selfmodel/playbook/evolution-protocol.md`](.selfmodel/playbook/evolution-protocol.md). 
+ +- **DETECT** — Compare local playbook, hook, script, and lessons-learned changes against the upstream baseline to create CANDIDATE entries. +- **STAGE** — Interactively classify candidates, strip project-specific details, and generate patch files in `.selfmodel/state/evolution-staging/`. +- **SUBMIT** — Package staged patches into an upstream PR after path audits and applicability checks. Human approval is required before any submission. +- **TRACK** — Monitor open PRs and sync ACCEPTED, REJECTED, or CONFLICT states back into `evolution.jsonl`. + ## Chaos Testing: /rampage `/rampage` is a standalone Claude Code skill that acts as the most chaotic, boundary-pushing user imaginable. It finds bugs that systematic QA never catches: race conditions, state corruption, navigation traps, input edge cases. diff --git a/commands/evolve.md b/commands/evolve.md new file mode 100644 index 0000000..7dc4710 --- /dev/null +++ b/commands/evolve.md @@ -0,0 +1,100 @@ +--- +description: "Evolution-to-PR pipeline: detect local improvements, classify, and submit upstream" +allowed-tools: ["Read", "Write", "Edit", "Bash", "Glob", "Grep"] +argument-hint: "[--detect] [--stage] [--submit] [--track] [--status]" +--- + +# /selfmodel:evolve + +Run the Evolution-to-PR Pipeline per `{baseDir}/references/evolution-protocol.md`. + +## Prerequisites +- Git repo with selfmodel initialized (`.selfmodel/` exists) +- Upstream baseline available (git remote `upstream` or `.selfmodel/state/upstream-baseline.sha`) +- For `--submit`: `gh` CLI authenticated with upstream repo access + +## Modes + +### Default (no flags) +Run full interactive pipeline: DETECT → STAGE (interactive) → offer SUBMIT. + +### `--detect` +Detection only. Scan local diffs against upstream baseline. Append CANDIDATE entries +to `evolution.jsonl`. Read-only except for evolution.jsonl writes. Safe to run anytime. + +### `--stage` +Interactive classification. 
Walk through CANDIDATE entries, display diffs, recommend +classification based on generalizability heuristics. User decides: Stage / Reject / Keep. +Generate patch files for STAGED entries in `.selfmodel/state/evolution-staging/`. + +### `--submit` +Create upstream PR from STAGED patches. Pre-submission checks: shellcheck, path audit, +patch applicability. **Requires explicit human approval** before `gh pr create`. + +### `--track` +Monitor submitted PRs. Query status via `gh pr view`, update evolution.jsonl entries +to ACCEPTED / REJECTED_UPSTREAM / CONFLICT. Handle CONFLICT by creating SUPERSEDED +entries and new CANDIDATE entries with updated diffs. + +### `--status` +Display pipeline status summary without running any phase: +``` +Evolution Pipeline Status +───────────────────────── +CANDIDATE: 3 +STAGED: 2 +SUBMITTED: 1 (PR #42 open) +ACCEPTED: 5 +REJECTED_PROJECT_SPECIFIC: 4 +REJECTED_UPSTREAM: 0 +CONFLICT: 0 +SUPERSEDED: 1 +───────────────────────── +Last detect: Sprint 30 (2026-04-01) +Last submit: Sprint 20 (2026-03-15) +``` + +## Pipeline Steps + +1. **DETECT**: Compare local playbook/hooks/scripts against upstream baseline. + Sources: playbook diffs, hook diffs, script diffs, validated lessons (Result: improved), + hook intercept patterns, quality trends. Output: CANDIDATE entries in evolution.jsonl. + +2. **STAGE**: Interactive classification. Each CANDIDATE presented with diff preview + and heuristic recommendation. User decides: [S]tage / [R]eject / [K]eep / [E]dit. + STAGED entries produce patches in `.selfmodel/state/evolution-staging/<evo-id>/`. + +3. **SUBMIT**: Human-approved PR creation. Pre-checks (shellcheck, path audit, patch + applicability) → PR preview → human approval gate → `gh pr create`. PR template + includes evidence table from evolution.jsonl entries. + +4. **TRACK**: Monitor submitted PRs. ACCEPTED / REJECTED_UPSTREAM / CONFLICT. + CONFLICT triggers SUPERSEDE flow: old entry marked, new CANDIDATE created. 
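As a rough illustration of the `--status` summary described above, the per-status counts could be tallied from `evolution.jsonl` along the following lines. This is a minimal sketch under stated assumptions, not the actual implementation: `evolve_status_counts` is a hypothetical helper name, and it assumes every entry is one compact JSON object per line (the shape `jq -c` produces), so a plain text match on the top-level `"status"` key is sufficient.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of selfmodel.sh): print "<STATUS>: <count>"
# for each status found in an evolution.jsonl-style file.
# Assumption: one compact JSON object per line, e.g. {"id":"...","status":"STAGED",...}
evolve_status_counts() {
    local evo_file="$1"

    # A missing or empty file means there is nothing to report.
    [[ -s "$evo_file" ]] || return 0

    # Match only the exact key "status"; "pr_status" does not match because
    # the pattern requires a double quote directly before the word "status".
    grep -o '"status":"[A-Z_]*"' "$evo_file" \
        | sed 's/"status":"\(.*\)"/\1/' \
        | sort | uniq -c \
        | awk '{printf "%s: %d\n", $2, $1}'
}
```

Calling `evolve_status_counts .selfmodel/state/evolution.jsonl` would print one line per status in alphabetical order. A jq-based version would be more robust against formatting differences; the text match is used here only to keep the sketch dependency-free.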
+ +## Generalizability Heuristics + +Five heuristics score each candidate (0.0 to 1.0): + +1. **PATH_DETECTION** — Absolute paths outside examples → project-specific +2. **PROJECT_NAME_DETECTION** — Project name in logic/strings → project-specific +3. **GENERIC_PATTERN** — New section without project nouns → generalizable +4. **HOOK_FIX** — Hook change + intercept log false positives → generalizable +5. **SCORING_CALIBRATION** — Threshold change + quality.jsonl trend → generalizable + +## Safety Rules + +- Human MUST approve before any PR submission (SUBMIT has mandatory gate) +- Detection is read-only (only writes evolution.jsonl) +- Never submit project-specific paths, names, or credentials +- All .sh patches must pass shellcheck +- evolution.jsonl is append-only (no deletions) +- Upstream conflict → SUPERSEDE, never force push + +## State Files + +| File | Purpose | +|------|---------| +| `.selfmodel/state/evolution.jsonl` | All evolution entries (append-only) | +| `.selfmodel/state/evolution-staging/<evo-id>/` | Patch files for STAGED entries | +| `.selfmodel/state/upstream-baseline.sha` | Upstream reference point | +| `.selfmodel/state/team.json` → `evolution` | Persistent counters and timestamps | diff --git a/scripts/selfmodel.sh b/scripts/selfmodel.sh index 0735724..2b88c66 100755 --- a/scripts/selfmodel.sh +++ b/scripts/selfmodel.sh @@ -1570,6 +1570,1592 @@ generate_team_table() { esac } +# ─── CMD: evolve ────────────────────────────────────────────────────────────── + +# Establish upstream baseline reference for diffing. +# Priority: git remote "upstream" → upstream-baseline.sha file → no baseline. +# Outputs the baseline ref to stdout; returns 1 if no baseline found. 
+evolve_establish_baseline() { + local dir="$1" + + # Option A: upstream remote exists + if git -C "$dir" remote get-url upstream &>/dev/null; then + if git -C "$dir" fetch upstream --quiet 2>/dev/null; then + echo "upstream/main" + return 0 + fi + # Fetch failed but remote exists; fall back to the locally cached ref + if git -C "$dir" rev-parse upstream/main &>/dev/null; then + echo "upstream/main" + return 0 + fi + fi + + # Option B: stored baseline SHA + local sha_file="$dir/.selfmodel/state/upstream-baseline.sha" + if [[ -f "$sha_file" ]]; then + local sha + sha=$(tr -d '[:space:]' < "$sha_file") + if [[ -n "$sha" ]] && git -C "$dir" rev-parse "$sha" &>/dev/null; then + echo "$sha" + return 0 + fi + fi + + # No baseline available + return 1 +} + +# Scan diffs between baseline and HEAD for playbook/hooks/scripts. +# Outputs one line per changed file in `git diff --numstat` order: +# <added_lines>\t<removed_lines>\t<relative_path> +evolve_scan_diffs() { + local dir="$1" + local baseline="$2" + local paths=(".selfmodel/playbook/" ".selfmodel/hooks/" "scripts/") + local scan_path + + for scan_path in "${paths[@]}"; do + local diff_output + diff_output=$(git -C "$dir" diff --numstat "${baseline}..HEAD" -- "$scan_path" 2>/dev/null) || continue + if [[ -n "$diff_output" ]]; then + echo "$diff_output" + fi + done +} + +# Scan lessons-learned.md for entries with "Result: improved" (no diff +# cross-check is done here). Outputs one line per lesson: <sprint_ref>\t<summary> +evolve_scan_lessons() { + local dir="$1" + local lessons_file="$dir/.selfmodel/playbook/lessons-learned.md" + + if [[ ! 
-f "$lessons_file" ]]; then + return 0 + fi + + # Extract lesson blocks that have "Result: improved" or "Result: 改善" + local in_block=false + local block_sprint="" + local block_lesson="" + local block_result_improved=false + + while IFS= read -r line; do + if [[ "$line" =~ ^###\ Sprint ]]; then + # Emit previous block if it was improved + if $block_result_improved && [[ -n "$block_sprint" ]]; then + printf '%s\t%s\n' "$block_sprint" "$block_lesson" + fi + block_sprint="${line#\#\#\# }" + block_lesson="" + block_result_improved=false + in_block=true + elif $in_block; then + if [[ "$line" =~ ^-\ \*\*Lesson\*\*:\ (.+) ]]; then + block_lesson="${BASH_REMATCH[1]}" + elif [[ "$line" =~ Result:.*improv ]] || [[ "$line" =~ Result:.*改善 ]]; then + block_result_improved=true + fi + fi + done < "$lessons_file" + # Emit last block + if $block_result_improved && [[ -n "$block_sprint" ]]; then + printf '%s\t%s\n' "$block_sprint" "$block_lesson" + fi +} + +# Scan hook-intercepts.log for patterns with 3+ occurrences of same hook+reason. +# Outputs one line per pattern: <hook>\t<reason>\t<count> +evolve_scan_intercepts() { + local dir="$1" + local log_file="$dir/.selfmodel/state/hook-intercepts.log" + + if [[ ! -f "$log_file" ]] || [[ ! -s "$log_file" ]]; then + return 0 + fi + + # Extract hook+reason pairs and count occurrences + sed -n 's/.*hook=\([^ ]*\).*reason=\([^ ]*\).*/\1\t\2/p' "$log_file" \ + | sort | uniq -c | sort -rn \ + | while IFS= read -r line; do + local count hook reason + count=$(printf '%s' "$line" | awk '{print $1}') + hook=$(printf '%s' "$line" | awk '{print $2}') + reason=$(printf '%s' "$line" | awk '{print $3}') + if [[ "$count" -ge 3 ]] && [[ -n "$hook" ]]; then + printf '%s\t%s\t%s\n' "$hook" "$reason" "$count" + fi + done +} + +# Run PATH_DETECTION heuristic on a diff string. 
+# Outputs: <score>\t<reason> +evolve_heuristic_path_detection() { + local dir="$1" + local diff_content="$2" + + local toplevel + toplevel=$(git -C "$dir" rev-parse --show-toplevel 2>/dev/null || echo "$dir") + local project_dir_name + project_dir_name=$(basename "$toplevel") + + # Count absolute paths in added lines (lines starting with +, not ++) + # Exclude lines inside fenced code blocks marked as example/template + # and lines using placeholder syntax + local path_count=0 + local in_example_block=false + + while IFS= read -r line; do + if [[ "$line" =~ ^\+.*\`\`\`.*example ]] || [[ "$line" =~ ^\+.*\`\`\`.*template ]]; then + in_example_block=true + continue + fi + if [[ "$line" =~ ^\+.*\`\`\` ]] && $in_example_block; then + in_example_block=false + continue + fi + if $in_example_block; then + continue + fi + + # Only look at added lines (not header lines ++) + if [[ "$line" =~ ^\+[^+] ]] || [[ "$line" =~ ^\+$ ]]; then + # Skip lines with placeholder syntax + if echo "$line" | grep -qE '<project-root>/|[$]HOME/|~/[.]config/'; then + continue + fi + # Check for absolute paths + if echo "$line" | grep -qE '/Users/[^/]+/|/home/[^/]+/'; then + path_count=$((path_count + 1)) + elif [[ -n "$project_dir_name" ]] && echo "$line" | grep -qE "/($project_dir_name)/"; then + path_count=$((path_count + 1)) + fi + fi + done <<< "$diff_content" + + if [[ "$path_count" -eq 0 ]]; then + printf '0.0\tno absolute paths detected in diff' + elif [[ "$path_count" -le 2 ]]; then + printf '%s\tdiff contains %d absolute path reference(s) — likely project-specific' "-0.4" "$path_count" + else + printf '%s\tdiff contains %d absolute path references — definitely project-specific' "-0.8" "$path_count" + fi +} + +# Run PROJECT_NAME_DETECTION heuristic on a diff string. 
+# Outputs: <score>\t<reason> +evolve_heuristic_project_name_detection() { + local dir="$1" + local diff_content="$2" + + # Extract project name from git remote and directory + local project_name_remote project_name_dir + project_name_remote=$(git -C "$dir" remote get-url origin 2>/dev/null | sed 's/.*\///' | sed 's/\.git$//' || true) + project_name_dir=$(basename "$(git -C "$dir" rev-parse --show-toplevel 2>/dev/null || echo "$dir")") + + local project_name="${project_name_remote:-$project_name_dir}" + + if [[ -z "$project_name" ]] || [[ "$project_name" == "." ]]; then + printf '0.0\tcould not determine project name' + return + fi + + # Count occurrences in added lines, excluding file paths and metadata fields + local name_in_logic=0 + local name_in_hardcoded=0 + + while IFS= read -r line; do + # Only check added lines + if [[ "$line" =~ ^\+[^+] ]]; then + # Skip lines that are metadata fields (project_name, CHANGELOG entries) + if echo "$line" | grep -qiE '"project_name"|changelog|"derived from'; then + continue + fi + # Check for project name in non-path context + # Use awk to strip path-like segments before checking + local stripped + stripped=$(printf '%s' "$line" | awk '{gsub(/\/[^ ]*/, ""); print}') + if echo "$stripped" | grep -qi "$project_name"; then + # Check if it's in a conditional or hardcoded string + if echo "$line" | grep -qE "if.*[\"'].*${project_name}|=.*[\"'].*${project_name}|\[.*${project_name}.*\]"; then + name_in_hardcoded=$((name_in_hardcoded + 1)) + else + name_in_logic=$((name_in_logic + 1)) + fi + fi + fi + done <<< "$diff_content" + + local total=$((name_in_logic + name_in_hardcoded)) + + if [[ "$total" -eq 0 ]]; then + printf '0.0\tno project name references detected' + elif [[ "$name_in_hardcoded" -gt 0 ]]; then + printf '%s\tdiff contains project name "%s" in hardcoded strings' "-0.7" "$project_name" + else + printf '%s\tdiff contains project name "%s" in logic/conditions' "-0.5" "$project_name" + fi +} + +# Run GENERIC_PATTERN 
heuristic on a diff string for a playbook file. +# Outputs: <score>\t<reason> +evolve_heuristic_generic_pattern() { + local diff_content="$1" + + # Check if diff adds new sections (## or ### headings) + local new_sections=0 + local added_lines=0 + local is_modification=false + + while IFS= read -r line; do + if [[ "$line" =~ ^\+[^+] ]]; then + added_lines=$((added_lines + 1)) + if [[ "$line" =~ ^\+###?\ .+ ]]; then + new_sections=$((new_sections + 1)) + fi + fi + if [[ "$line" =~ ^-[^-] ]]; then + is_modification=true + fi + done <<< "$diff_content" + + if [[ "$new_sections" -gt 0 ]] && [[ "$added_lines" -ge 10 ]]; then + printf '0.8\tnew self-contained section(s) (%d headings, %d lines added) — highly generalizable' "$new_sections" "$added_lines" + elif [[ "$new_sections" -gt 0 ]]; then + printf '0.5\tnew section with limited content — generalizable with edits' + elif $is_modification && [[ "$added_lines" -ge 5 ]]; then + printf '0.6\tmodification to existing section with generic improvement (%d lines)' "$added_lines" + elif $is_modification; then + printf '0.2\tminor modification — low generalizability' + else + printf '0.3\tsmall change — uncertain generalizability' + fi +} + +# Run HOOK_FIX heuristic: check if a hook script diff has evidence in intercepts. +# Outputs: <score>\t<reason> +evolve_heuristic_hook_fix() { + local dir="$1" + local source_file="$2" + + local log_file="$dir/.selfmodel/state/hook-intercepts.log" + if [[ ! -f "$log_file" ]] || [[ ! 
-s "$log_file" ]]; then + printf '0.1\tno hook intercept log available' + return + fi + + # Extract hook name from source file + local hook_name + hook_name=$(basename "$source_file" .sh) + + local intercept_count + intercept_count=$({ grep -c "hook=${hook_name} " "$log_file" || true; } 2>/dev/null | tr -d '[:space:]') + # Ensure we have a clean integer + intercept_count="${intercept_count:-0}" + [[ "$intercept_count" =~ ^[0-9]+$ ]] || intercept_count=0 + + if [[ "$intercept_count" -ge 5 ]]; then + printf '0.9\thook fix backed by %d intercepts — strong evidence' "$intercept_count" + elif [[ "$intercept_count" -ge 3 ]]; then + printf '0.7\thook fix backed by %d intercepts — moderate evidence' "$intercept_count" + elif [[ "$intercept_count" -gt 0 ]]; then + printf '0.3\thook fix with %d intercept(s) — weak evidence' "$intercept_count" + else + printf '0.1\thook change with no intercept log correlation' + fi +} + +# Run SCORING_CALIBRATION heuristic for quality-gates.md changes. +# Outputs: <score>\t<reason> +evolve_heuristic_scoring_calibration() { + local dir="$1" + + local quality_file="$dir/.selfmodel/state/quality.jsonl" + if [[ ! -f "$quality_file" ]] || [[ ! -s "$quality_file" ]]; then + printf '0.3\tno quality data available — speculative calibration' + return + fi + + local entry_count + entry_count=$(wc -l < "$quality_file" | tr -d ' ') + + if [[ "$entry_count" -ge 10 ]]; then + printf '0.85\tscoring calibration backed by %d quality entries — strong trend data' "$entry_count" + elif [[ "$entry_count" -ge 5 ]]; then + printf '0.65\tscoring calibration backed by %d quality entries — moderate data' "$entry_count" + else + printf '0.3\tscoring calibration with %d entries — weak/ambiguous trend' "$entry_count" + fi +} + +# Score all applicable heuristics for a given file diff. 
+# Outputs tab-separated fields: <heuristic>\t<score>\t<reason> +evolve_score_heuristics() { + local dir="$1" + local source_file="$2" + local diff_content="$3" + + local base_positive=0 + local base_heuristic="" + local base_reason="" + local negative_sum=0 + local negative_reasons="" + + # Always run PATH_DETECTION + local path_result + path_result=$(evolve_heuristic_path_detection "$dir" "$diff_content") + local path_score path_reason + path_score=$(echo "$path_result" | cut -f1) + path_reason=$(echo "$path_result" | cut -f2-) + if [[ "$path_score" != "0.0" ]]; then + # Sum with awk (bc might not be available) so earlier penalties are never dropped + negative_sum=$(awk -v a="$negative_sum" -v b="$path_score" 'BEGIN {print a + b}') + negative_reasons="path_detection: $path_reason" + fi + + # Always run PROJECT_NAME_DETECTION + local name_result + name_result=$(evolve_heuristic_project_name_detection "$dir" "$diff_content") + local name_score name_reason + name_score=$(echo "$name_result" | cut -f1) + name_reason=$(echo "$name_result" | cut -f2-) + if [[ "$name_score" != "0.0" ]]; then + negative_sum=$(awk -v a="$negative_sum" -v b="$name_score" 'BEGIN {print a + b}') + if [[ -n "$negative_reasons" ]]; then + negative_reasons="$negative_reasons; project_name: $name_reason" + else + negative_reasons="project_name: $name_reason" + fi + fi + + # Determine which positive heuristic to run based on file type + if [[ "$source_file" == *hooks/* ]] || [[ "$source_file" == *enforce-*.sh ]]; then + local hook_result + hook_result=$(evolve_heuristic_hook_fix "$dir" "$source_file") + local hook_score hook_reason + hook_score=$(echo "$hook_result" | cut -f1) + hook_reason=$(echo "$hook_result" | cut -f2-) + # Use awk for float comparison (bc might not be available) + if awk "BEGIN {exit !($hook_score > $base_positive)}" 2>/dev/null; then + base_positive="$hook_score" + base_heuristic="hook_fix" + base_reason="$hook_reason" + fi + fi + + if [[ "$source_file" == *quality-gates* ]]; then + local cal_result + 
cal_result=$(evolve_heuristic_scoring_calibration "$dir") + local cal_score cal_reason + cal_score=$(echo "$cal_result" | cut -f1) + cal_reason=$(echo "$cal_result" | cut -f2-) + if awk "BEGIN {exit !($cal_score > $base_positive)}" 2>/dev/null; then + base_positive="$cal_score" + base_heuristic="scoring_calibration" + base_reason="$cal_reason" + fi + fi + + if [[ "$source_file" == *playbook/* ]] || [[ "$source_file" == *scripts/* ]]; then + local gen_result + gen_result=$(evolve_heuristic_generic_pattern "$diff_content") + local gen_score gen_reason + gen_score=$(echo "$gen_result" | cut -f1) + gen_reason=$(echo "$gen_result" | cut -f2-) + if awk "BEGIN {exit !($gen_score > $base_positive)}" 2>/dev/null; then + base_positive="$gen_score" + base_heuristic="generic_pattern" + base_reason="$gen_reason" + fi + fi + + # Default heuristic if nothing matched + if [[ -z "$base_heuristic" ]]; then + base_heuristic="generic_pattern" + base_positive="0.3" + base_reason="no specific heuristic matched — default score" + fi + + # Compute final score: base_positive + negative_sum, clamped to [0.0, 1.0] + local final_score + final_score=$(awk "BEGIN { + s = $base_positive + $negative_sum; + if (s < 0) s = 0; + if (s > 1) s = 1; + printf \"%.2f\", s + }" 2>/dev/null || echo "0.30") + + # Compose combined reason + local combined_reason="$base_reason" + if [[ -n "$negative_reasons" ]]; then + combined_reason="$base_reason; $negative_reasons" + fi + + # Output as tab-separated: heuristic, score, reason + printf '%s\t%s\t%s' "$base_heuristic" "$final_score" "$combined_reason" +} + +# Append a CANDIDATE entry to evolution.jsonl. 
+evolve_append_candidate() { + local dir="$1" + local source_file="$2" + local category="$3" + local summary="$4" + local description="$5" + local diff_stats="$6" + local heuristic="$7" + local score="$8" + local reason="$9" + local evidence_sprints="${10:-[]}" + local evidence_quality="${11:-null}" + local evidence_intercepts="${12:-0}" + local evidence_lesson="${13:-null}" + + local evo_file="$dir/.selfmodel/state/evolution.jsonl" + local team_file="$dir/.selfmodel/state/team.json" + local version_file="$dir/VERSION" + + # Generate unique ID: evo-YYYY-MM-DD-NNN + local today + today=$(date -u +%Y-%m-%d) + local existing_today=0 + if [[ -f "$evo_file" ]]; then + existing_today=$({ grep -c "\"evo-${today}-" "$evo_file" || true; } 2>/dev/null) + fi + local seq_num + seq_num=$(printf '%03d' $((existing_today + 1))) + local evo_id="evo-${today}-${seq_num}" + + # Get current sprint from team.json + local current_sprint=0 + if [[ -f "$team_file" ]]; then + current_sprint=$(jq -r '.current_sprint // 0' "$team_file" 2>/dev/null || echo 0) + fi + + # Get project name + local project_name + project_name=$(git -C "$dir" remote get-url origin 2>/dev/null | sed 's/.*\///' | sed 's/\.git$//' || basename "$dir") + + # Get selfmodel version + local sm_version="$SELFMODEL_VERSION" + if [[ -f "$version_file" ]]; then + sm_version=$(tr -d '[:space:]' < "$version_file") + fi + + local timestamp + timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ) + + # Build JSON entry using jq for correctness + local json_entry + json_entry=$(jq -cn \ + --arg id "$evo_id" \ + --arg status "CANDIDATE" \ + --arg category "$category" \ + --arg source_file "$source_file" \ + --arg upstream_file "$source_file" \ + --arg summary "$summary" \ + --arg description "$description" \ + --argjson evidence_sprints "$evidence_sprints" \ + --arg evidence_quality "$evidence_quality" \ + --argjson evidence_intercepts "$evidence_intercepts" \ + --arg evidence_lesson "$evidence_lesson" \ + --arg heuristic "$heuristic" \ + 
--argjson score "$score" \ + --arg reason "$reason" \ + --arg diff_stats "$diff_stats" \ + --argjson sprint "$current_sprint" \ + --arg detected_at "$timestamp" \ + --arg project_name "$project_name" \ + --arg sm_version "$sm_version" \ + '{ + id: $id, + status: $status, + category: $category, + source_file: $source_file, + upstream_file: $upstream_file, + summary: $summary, + description: $description, + evidence: { + sprints_affected: $evidence_sprints, + quality_trend: (if $evidence_quality == "null" then null else $evidence_quality end), + hook_intercepts: $evidence_intercepts, + lessons_learned_ref: (if $evidence_lesson == "null" then null else $evidence_lesson end) + }, + heuristic: $heuristic, + generalizability_score: $score, + generalizability_reason: $reason, + diff_stats: $diff_stats, + detected_at_sprint: $sprint, + detected_at: $detected_at, + staged_at: null, + submitted_at: null, + pr_url: null, + pr_status: null, + reviewed_by: null, + project_name: $project_name, + selfmodel_version: $sm_version + }') + + # Append to evolution.jsonl (create if needed) + echo "$json_entry" >> "$evo_file" + echo "$evo_id" +} + +# Determine the category of a changed file. +evolve_categorize_file() { + local filepath="$1" + case "$filepath" in + .selfmodel/playbook/*) echo "playbook_patch" ;; + .selfmodel/hooks/*) echo "hook_improvement" ;; + scripts/*.sh) echo "script_fix" ;; + scripts/*) echo "script_fix" ;; + *) echo "playbook_patch" ;; + esac +} + +# Generate a one-line summary from a diff for a given file. 
+evolve_summarize_diff() { + local dir="$1" + local filepath="$2" + local diff_content="$3" + + local added removed + added=$(printf '%s\n' "$diff_content" | { grep -c '^+[^+]' || true; } 2>/dev/null) + removed=$(printf '%s\n' "$diff_content" | { grep -c '^-[^-]' || true; } 2>/dev/null) + + # Extract first meaningful added line as context + local first_added + first_added=$(printf '%s\n' "$diff_content" | grep '^+[^+]' | head -1 | sed 's/^+//' | sed 's/^[[:space:]]*//' | head -c 80) + + local basename_file + basename_file=$(basename "$filepath") + echo "Modified $basename_file: $first_added (+${added}/-${removed} lines)" +} + +# Main detection logic: scan diffs, run heuristics, write candidates. +evolve_detect() { + local dir="$1" + local selfmodel_dir="$dir/.selfmodel" + + info "Starting evolution detection scan..." + + # Step 1: Establish baseline + local baseline + if ! baseline=$(evolve_establish_baseline "$dir"); then + warn "No upstream baseline available." + warn "Run 'selfmodel update --remote' to establish a baseline," + warn "or add a git remote named 'upstream'," + warn "or write a SHA to .selfmodel/state/upstream-baseline.sha" + return 1 + fi + info "Using baseline: $(bold "$baseline")" + + # Step 2: Scan diffs + local diff_files + diff_files=$(evolve_scan_diffs "$dir" "$baseline") + + if [[ -z "$diff_files" ]]; then + info "No diffs found between baseline and HEAD in playbook/hooks/scripts." 
+ fi + + local candidate_count=0 + local playbook_count=0 + local hook_count=0 + local script_count=0 + local lesson_count=0 + + # Step 3: Process each changed file + if [[ -n "$diff_files" ]]; then + while IFS=$'\t' read -r added removed filepath; do + [[ -z "$filepath" ]] && continue + + local category + category=$(evolve_categorize_file "$filepath") + + # Get full diff for this file + local file_diff + file_diff=$(git -C "$dir" diff "${baseline}..HEAD" -- "$filepath" 2>/dev/null) + if [[ -z "$file_diff" ]]; then + continue + fi + + # Run heuristics + local heuristic_result + heuristic_result=$(evolve_score_heuristics "$dir" "$filepath" "$file_diff") + local heuristic_name heuristic_score heuristic_reason + heuristic_name=$(echo "$heuristic_result" | cut -f1) + heuristic_score=$(echo "$heuristic_result" | cut -f2) + heuristic_reason=$(echo "$heuristic_result" | cut -f3-) + + # Generate summary and description + local summary + summary=$(evolve_summarize_diff "$dir" "$filepath" "$file_diff") + summary="${summary:0:120}" + local description="Detected diff in $filepath against baseline $baseline. 
$heuristic_reason" + local basename_for_stats + basename_for_stats=$(basename "$filepath") + local diff_stats="+${added} -${removed} lines in ${basename_for_stats}" + + # Check for duplicate (same source_file already CANDIDATE) + local evo_file="$selfmodel_dir/state/evolution.jsonl" + if [[ -f "$evo_file" ]]; then + local existing + existing=$(jq -r "select(.source_file == \"$filepath\" and .status == \"CANDIDATE\") | .id" "$evo_file" 2>/dev/null | tail -1) + if [[ -n "$existing" ]]; then + continue + fi + fi + + # Append candidate + local evo_id + evo_id=$(evolve_append_candidate "$dir" "$filepath" "$category" \ + "$summary" "$description" "$diff_stats" \ + "$heuristic_name" "$heuristic_score" "$heuristic_reason") + + candidate_count=$((candidate_count + 1)) + case "$category" in + playbook_patch) playbook_count=$((playbook_count + 1)) ;; + hook_improvement) hook_count=$((hook_count + 1)) ;; + script_fix) script_count=$((script_count + 1)) ;; + esac + + info " [$evo_id] $category score=$heuristic_score $filepath" + done <<< "$diff_files" + fi + + # Step 4: Scan lessons-learned.md for validated lessons + local lessons + lessons=$(evolve_scan_lessons "$dir") + if [[ -n "$lessons" ]]; then + while IFS=$'\t' read -r sprint_ref lesson_text; do + [[ -z "$sprint_ref" ]] && continue + [[ -z "$lesson_text" ]] && continue + + # Check for duplicate lesson candidate + local evo_file="$selfmodel_dir/state/evolution.jsonl" + if [[ -f "$evo_file" ]]; then + local existing + existing=$(jq -r "select(.evidence.lessons_learned_ref == \"$sprint_ref\" and .status == \"CANDIDATE\") | .id" "$evo_file" 2>/dev/null | tail -1) + if [[ -n "$existing" ]]; then + continue + fi + fi + + local summary="Validated lesson: ${lesson_text:0:100}" + local evo_id + evo_id=$(evolve_append_candidate "$dir" ".selfmodel/playbook/lessons-learned.md" \ + "new_lesson" "$summary" "Lesson from $sprint_ref validated as improved. 
$lesson_text" \ + "lesson entry" "generic_pattern" "0.60" \ + "validated lesson with improved result — likely generalizable" \ + "[]" "null" "0" "$sprint_ref") + + candidate_count=$((candidate_count + 1)) + lesson_count=$((lesson_count + 1)) + info " [$evo_id] new_lesson score=0.60 $sprint_ref" + done <<< "$lessons" + fi + + # Step 5: Scan hook intercepts for recurring patterns + local intercepts + intercepts=$(evolve_scan_intercepts "$dir") + if [[ -n "$intercepts" ]]; then + while IFS=$'\t' read -r hook reason count; do + [[ -z "$hook" ]] && continue + + # Check if there's already a hook_improvement candidate for this hook + local evo_file="$selfmodel_dir/state/evolution.jsonl" + local hook_script_candidates=0 + if [[ -f "$evo_file" ]]; then + hook_script_candidates=$(jq -r "select(.source_file | contains(\"$hook\")) | .id" "$evo_file" 2>/dev/null | wc -l | tr -d ' ') + fi + + if [[ "$hook_script_candidates" -gt 0 ]]; then + # Already covered by a diff-based candidate; skip standalone intercept entry + continue + fi + + local summary="Hook intercept pattern: $hook ($reason, ${count}x)" + local score="0.70" + if [[ "$count" -ge 5 ]]; then + score="0.90" + fi + + local evo_id + evo_id=$(evolve_append_candidate "$dir" ".selfmodel/hooks/${hook}.sh" \ + "hook_improvement" "${summary:0:120}" \ + "Recurring hook intercept: hook=$hook reason=$reason repeated $count times" \ + "intercept pattern" "hook_fix" "$score" \ + "hook intercept pattern with $count occurrences — likely false positive to fix" \ + "[]" "null" "$count" "null") + + candidate_count=$((candidate_count + 1)) + hook_count=$((hook_count + 1)) + info " [$evo_id] hook_improvement score=$score $hook ($reason, ${count}x)" + done <<< "$intercepts" + fi + + # Step 6: Output summary + echo "" + if [[ "$candidate_count" -eq 0 ]]; then + ok "No evolution candidates detected." 
+ else + ok "Detected $candidate_count candidate(s): $playbook_count playbook patch(es), $hook_count hook fix(es), $script_count script fix(es), $lesson_count lesson(s)" + fi + + return 0 +} + +# ─── evolve_stage: Interactive classification of CANDIDATE entries ────────── +evolve_stage() { + local dir="$1" + local evo_file="$dir/.selfmodel/state/evolution.jsonl" + local staging_dir="$dir/.selfmodel/state/evolution-staging" + + if [[ ! -f "$evo_file" ]] || [[ ! -s "$evo_file" ]]; then + info "No evolution entries found. Run 'selfmodel evolve --detect' first." + return 0 + fi + + # Collect CANDIDATE entries sorted by generalizability_score descending + local candidates + candidates=$(jq -c 'select(.status == "CANDIDATE")' "$evo_file" 2>/dev/null \ + | jq -s 'sort_by(-.generalizability_score)' 2>/dev/null) + + local total + total=$(echo "$candidates" | jq 'length' 2>/dev/null) + + if [[ "$total" -eq 0 || "$total" == "null" ]]; then + info "No candidates to stage. All entries are already classified." + return 0 + fi + + info "Found $total CANDIDATE entries to classify." 
+ echo "" + + local staged_count=0 + local rejected_count=0 + local kept_count=0 + + # Resolve baseline for diff display + local baseline="" + baseline=$(evolve_establish_baseline "$dir" 2>/dev/null) || true + + local i=0 + while [[ $i -lt $total ]]; do + local entry + entry=$(echo "$candidates" | jq -c ".[$i]") + + local evo_id category source_file summary score reason + evo_id=$(echo "$entry" | jq -r '.id') + category=$(echo "$entry" | jq -r '.category') + source_file=$(echo "$entry" | jq -r '.source_file') + summary=$(echo "$entry" | jq -r '.summary') + score=$(echo "$entry" | jq -r '.generalizability_score') + reason=$(echo "$entry" | jq -r '.generalizability_reason') + + echo "════════════════════════════════════════════════════" + printf "[%d/%d] ${BOLD}%s${NC} %s score=%s\n" "$((i + 1))" "$total" "$evo_id" "$category" "$score" + printf " File: %s\n" "$source_file" + printf " Summary: %s\n" "$summary" + printf " Reason: %s\n" "$reason" + + # Show diff preview (first 20 lines) + if [[ -n "$baseline" ]]; then + echo "────────────────────────────────────────────────────" + echo " Diff preview (first 20 lines):" + local diff_preview + diff_preview=$(git -C "$dir" diff "${baseline}..HEAD" -- "$source_file" 2>/dev/null | head -20) || true + if [[ -n "$diff_preview" ]]; then + printf '%s\n' "$diff_preview" | sed 's/^/ /' + else + echo " (no diff available — file may be new or baseline unreachable)" + fi + fi + echo "────────────────────────────────────────────────────" + + # Heuristic recommendation + local recommend + if awk "BEGIN {exit !($score >= 0.6)}"; then + recommend="${GREEN}Recommend: Stage${NC}" + else + recommend="${YELLOW}Recommend: Skip${NC}" + fi + printf " %b\n" "$recommend" + + # Interactive prompt + printf "${CYAN}[selfmodel]${NC} [S]tage / [R]eject / [K]eep? 
" + read -r choice + + case "${choice,,}" in + s|stage) + # Update status to STAGED in evolution.jsonl + local timestamp + timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ) + local tmp_evo + tmp_evo=$(mktemp) + while IFS= read -r line; do + local line_id + line_id=$(echo "$line" | jq -r '.id' 2>/dev/null) + if [[ "$line_id" == "$evo_id" ]]; then + echo "$line" | jq -c --arg ts "$timestamp" \ + '.status = "STAGED" | .staged_at = $ts' + else + echo "$line" + fi + done < "$evo_file" > "$tmp_evo" + mv "$tmp_evo" "$evo_file" + + # Generate patch and metadata in staging directory + local entry_staging_dir="${staging_dir}/${evo_id}" + mkdir -p "$entry_staging_dir" + + # Generate patch.diff + if [[ -n "$baseline" ]]; then + git -C "$dir" diff "${baseline}..HEAD" -- "$source_file" \ + > "${entry_staging_dir}/patch.diff" 2>/dev/null || true + fi + + # Strip project-specific content from patch + if [[ -f "${entry_staging_dir}/patch.diff" ]]; then + local project_root + project_root=$(git -C "$dir" rev-parse --show-toplevel 2>/dev/null || echo "$dir") + # Replace absolute paths with placeholder + sed_inplace "s|${project_root}|<project-root>|g" "${entry_staging_dir}/patch.diff" + # Replace home directory paths + if [[ -n "${HOME:-}" ]]; then + sed_inplace "s|${HOME}|~|g" "${entry_staging_dir}/patch.diff" + fi + fi + + # Write metadata.json + local description evidence_json + description=$(echo "$entry" | jq -r '.description') + evidence_json=$(echo "$entry" | jq -c '.evidence') + jq -cn \ + --arg id "$evo_id" \ + --arg category "$category" \ + --arg summary "$summary" \ + --arg description "$description" \ + --argjson evidence "$evidence_json" \ + --argjson score "$score" \ + --arg reason "$reason" \ + --arg source_file "$source_file" \ + --arg staged_at "$timestamp" \ + '{ + id: $id, + category: $category, + summary: $summary, + description: $description, + evidence: $evidence, + generalizability_score: $score, + generalizability_reason: $reason, + source_file: $source_file, + 
staged_at: $staged_at + }' > "${entry_staging_dir}/metadata.json" + + ok "Staged: $evo_id → ${entry_staging_dir}/" + staged_count=$((staged_count + 1)) + ;; + r|reject) + # Update status to REJECTED_PROJECT_SPECIFIC + local tmp_evo + tmp_evo=$(mktemp) + while IFS= read -r line; do + local line_id + line_id=$(echo "$line" | jq -r '.id' 2>/dev/null) + if [[ "$line_id" == "$evo_id" ]]; then + echo "$line" | jq -c '.status = "REJECTED_PROJECT_SPECIFIC"' + else + echo "$line" + fi + done < "$evo_file" > "$tmp_evo" + mv "$tmp_evo" "$evo_file" + + warn "Rejected: $evo_id (project-specific)" + rejected_count=$((rejected_count + 1)) + ;; + k|keep|"") + info "Kept: $evo_id (will revisit later)" + kept_count=$((kept_count + 1)) + ;; + *) + warn "Unknown choice '$choice', keeping as CANDIDATE." + kept_count=$((kept_count + 1)) + ;; + esac + + echo "" + i=$((i + 1)) + done + + echo "════════════════════════════════════════════════════" + ok "Stage complete: $staged_count staged, $rejected_count rejected, $kept_count kept" +} + +# ─── evolve_submit: Create upstream PR from STAGED patches ────────────────── +evolve_submit() { + local dir="$1" + local evo_file="$dir/.selfmodel/state/evolution.jsonl" + local staging_dir="$dir/.selfmodel/state/evolution-staging" + + # Pre-check: gh CLI available + if ! command -v gh &>/dev/null; then + err "gh CLI is required for PR submission." + err "Install: https://cli.github.com/" + return 1 + fi + + # Pre-check: gh authenticated + if ! gh auth status &>/dev/null 2>&1; then + err "gh CLI not authenticated. Run 'gh auth login' first." + return 1 + fi + + if [[ ! -f "$evo_file" ]] || [[ ! -s "$evo_file" ]]; then + info "No evolution entries found." + return 0 + fi + + # Collect STAGED entries + local staged_entries + staged_entries=$(jq -c 'select(.status == "STAGED")' "$evo_file" 2>/dev/null \ + | jq -s '.' 
2>/dev/null) + + local staged_count + staged_count=$(echo "$staged_entries" | jq 'length' 2>/dev/null) + + if [[ "$staged_count" -eq 0 || "$staged_count" == "null" ]]; then + info "No staged entries to submit. Run 'selfmodel evolve --stage' first." + return 0 + fi + + info "Found $staged_count STAGED entries for submission." + + # ── Pre-submission checks ──────────────────────────────────────────────── + local precheck_failed=false + + # Lint .sh patches with shell checker + local i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local evo_id source_file + evo_id=$(echo "$entry" | jq -r '.id') + source_file=$(echo "$entry" | jq -r '.source_file') + + if [[ "$source_file" == *.sh ]]; then + local patch_dir="${staging_dir}/${evo_id}" + if [[ -f "${patch_dir}/patch.diff" ]]; then + # Check if lint tool is available + if command -v shellcheck &>/dev/null; then + # Run lint on the source file + local tmp_file + tmp_file=$(mktemp) + if [[ -f "$dir/$source_file" ]]; then + cp "$dir/$source_file" "$tmp_file" + if ! 
shellcheck -S warning "$tmp_file" &>/dev/null; then + warn "Shell lint warnings in $source_file (evo: $evo_id)" + shellcheck -S warning "$tmp_file" 2>&1 | head -20 + precheck_failed=true + fi + fi + rm -f "$tmp_file" + fi + fi + fi + + i=$((i + 1)) + done + + # Path audit: scan patches for absolute paths, secrets patterns + i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local evo_id + evo_id=$(echo "$entry" | jq -r '.id') + local patch_file="${staging_dir}/${evo_id}/patch.diff" + + if [[ -f "$patch_file" ]]; then + # Check for absolute paths (excluding <project-root> placeholders) + local abs_paths + abs_paths=$(grep -nE '^\+.*(/Users/|/home/|/var/|/opt/|/srv/)' "$patch_file" 2>/dev/null \ + | grep -v '<project-root>' || true) + if [[ -n "$abs_paths" ]]; then + warn "Absolute paths found in $evo_id patch:" + printf '%s\n' "$abs_paths" | head -5 + precheck_failed=true + fi + + # Check for secrets patterns on added lines only. + # grep -n prefixes each match with "<lineno>:", so a second grep '^+' would never match; + # anchor on the leading "+" in the first grep instead. + local secrets + secrets=$(grep -niE '^\+.*(API_KEY|TOKEN|PASSWORD|SECRET|CREDENTIAL|PRIVATE_KEY)=' "$patch_file" 2>/dev/null || true) + if [[ -n "$secrets" ]]; then + err "Potential secrets found in $evo_id patch:" + printf '%s\n' "$secrets" | head -5 + precheck_failed=true + fi + fi + + i=$((i + 1)) + done + + if [[ "$precheck_failed" == "true" ]]; then + warn "Pre-submission checks found issues." + printf "${CYAN}[selfmodel]${NC} Continue despite warnings? [y/N] " + read -r reply + if [[ ! "$reply" =~ ^[Yy] ]]; then + info "Submission aborted. Fix issues and re-run."
+ return 0 + fi + fi + + # ── PR Preview ─────────────────────────────────────────────────────────── + local project_name + project_name=$(git -C "$dir" remote get-url origin 2>/dev/null \ + | sed 's/.*\///' | sed 's/\.git$//' || basename "$dir") + + local sm_version="$SELFMODEL_VERSION" + if [[ -f "$dir/VERSION" ]]; then + sm_version=$(tr -d '[:space:]' < "$dir/VERSION") + fi + + local current_sprint=0 + if [[ -f "$dir/.selfmodel/state/team.json" ]]; then + current_sprint=$(jq -r '.current_sprint // 0' "$dir/.selfmodel/state/team.json" 2>/dev/null || echo 0) + fi + + local pr_title="feat(evolution): improvements from ${project_name} (${staged_count} changes)" + + echo "" + echo "════════════════════════════════════════════════════" + echo " PR Preview" + echo "════════════════════════════════════════════════════" + printf " Title: %s\n" "$pr_title" + echo " Changes:" + echo "────────────────────────────────────────────────────" + printf " %-4s %-22s %-30s %s\n" "#" "Category" "File" "Score" + + i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local evo_id category source_file score + evo_id=$(echo "$entry" | jq -r '.id') + category=$(echo "$entry" | jq -r '.category') + source_file=$(echo "$entry" | jq -r '.source_file') + score=$(echo "$entry" | jq -r '.generalizability_score') + printf " %-4d %-22s %-30s %s\n" "$((i + 1))" "$category" "$source_file" "$score" + i=$((i + 1)) + done + + echo "════════════════════════════════════════════════════" + echo "" + + # ── Human confirmation gate ────────────────────────────────────────────── + printf "${CYAN}[selfmodel]${NC} Submit PR to upstream? [y/N] " + read -r submit_reply + if [[ ! "$submit_reply" =~ ^[Yy] ]]; then + info "Submission cancelled. Entries remain STAGED." 
+ return 0 + fi + + # ── Build PR body ──────────────────────────────────────────────────────── + local pr_body + pr_body="## Summary"$'\n'$'\n' + pr_body+="Community-discovered improvements from project usage (${project_name}, sprint ${current_sprint})."$'\n'$'\n' + pr_body+="These changes were detected by selfmodel's evolution pipeline, classified as"$'\n' + pr_body+="generalizable by heuristic analysis, and verified against local sprint data."$'\n'$'\n' + pr_body+="## Changes"$'\n'$'\n' + pr_body+="| # | Category | File | Summary | Score |"$'\n' + pr_body+="|---|----------|------|---------|-------|"$'\n' + + i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local summary category upstream_file score + summary=$(echo "$entry" | jq -r '.summary') + category=$(echo "$entry" | jq -r '.category') + upstream_file=$(echo "$entry" | jq -r '.upstream_file') + score=$(echo "$entry" | jq -r '.generalizability_score') + pr_body+="| $((i + 1)) | ${category} | ${upstream_file} | ${summary} | ${score} |"$'\n' + i=$((i + 1)) + done + + pr_body+=$'\n'"## Per-Change Details"$'\n' + + i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local evo_id summary category heuristic score reason description diff_stats + local sprints_affected quality_trend hook_intercepts lessons_ref + evo_id=$(echo "$entry" | jq -r '.id') + summary=$(echo "$entry" | jq -r '.summary') + category=$(echo "$entry" | jq -r '.category') + heuristic=$(echo "$entry" | jq -r '.heuristic') + score=$(echo "$entry" | jq -r '.generalizability_score') + reason=$(echo "$entry" | jq -r '.generalizability_reason') + description=$(echo "$entry" | jq -r '.description') + diff_stats=$(echo "$entry" | jq -r '.diff_stats') + sprints_affected=$(echo "$entry" | jq -r '.evidence.sprints_affected // []') + quality_trend=$(echo "$entry" | jq -r '.evidence.quality_trend // "N/A"') + hook_intercepts=$(echo "$entry" | 
jq -r '.evidence.hook_intercepts // 0') + lessons_ref=$(echo "$entry" | jq -r '.evidence.lessons_learned_ref // "N/A"') + + pr_body+=$'\n'"### Change $((i + 1)): ${summary}"$'\n'$'\n' + pr_body+="**ID**: ${evo_id}"$'\n' + pr_body+="**Category**: ${category}"$'\n' + pr_body+="**Heuristic**: ${heuristic} (score: ${score})"$'\n' + pr_body+="**Reason**: ${reason}"$'\n'$'\n' + pr_body+="**What changed**: ${description}"$'\n'$'\n' + pr_body+="**Evidence**:"$'\n' + pr_body+="- Sprints affected: ${sprints_affected}"$'\n' + pr_body+="- Quality trend: ${quality_trend}"$'\n' + pr_body+="- Hook intercepts: ${hook_intercepts}"$'\n' + pr_body+="- Lessons learned ref: ${lessons_ref}"$'\n'$'\n' + pr_body+="**Diff stats**: ${diff_stats}"$'\n' + pr_body+=$'\n'"---"$'\n' + + i=$((i + 1)) + done + + pr_body+=$'\n'"## Testing"$'\n'$'\n' + pr_body+="- [ ] shellcheck passes on all modified .sh files"$'\n' + pr_body+="- [ ] \`selfmodel status\` runs without errors after applying changes"$'\n' + pr_body+="- [ ] No absolute paths or project-specific names in submitted code"$'\n' + pr_body+="- [ ] Patches apply cleanly to upstream HEAD"$'\n'$'\n' + pr_body+="## Context"$'\n'$'\n' + pr_body+="- selfmodel version: ${sm_version}"$'\n' + pr_body+="- Project sprint count: ${current_sprint}"$'\n' + pr_body+="- Evolution entries: ${staged_count}"$'\n' + + # ── Create PR via gh ───────────────────────────────────────────────────── + + # Determine upstream repo from remote + local upstream_repo + upstream_repo=$(git -C "$dir" remote get-url upstream 2>/dev/null \ + | sed 's|.*github\.com[:/]||' | sed 's/\.git$//') || true + + if [[ -z "$upstream_repo" ]]; then + # Fall back to origin if no upstream remote + upstream_repo=$(git -C "$dir" remote get-url origin 2>/dev/null \ + | sed 's|.*github\.com[:/]||' | sed 's/\.git$//') || true + fi + + if [[ -z "$upstream_repo" ]]; then + err "Could not determine upstream repository URL." 
+ return 1 + fi + + # Create a branch for the PR + local branch_date + branch_date=$(date -u +%Y%m%d) + local branch_name="evolution/${project_name}-${branch_date}" + + info "Creating branch: $branch_name" + if ! git -C "$dir" checkout -b "$branch_name" 2>/dev/null; then + warn "Branch $branch_name may already exist. Attempting to use it." + git -C "$dir" checkout "$branch_name" 2>/dev/null || { + err "Failed to create or switch to branch $branch_name" + return 1 + } + fi + + # Apply staged patches + local apply_failed=false + i=0 + while [[ $i -lt $staged_count ]]; do + local entry + entry=$(echo "$staged_entries" | jq -c ".[$i]") + local evo_id + evo_id=$(echo "$entry" | jq -r '.id') + local patch_file="${staging_dir}/${evo_id}/patch.diff" + + if [[ -f "$patch_file" ]] && [[ -s "$patch_file" ]]; then + if ! git -C "$dir" apply --check "$patch_file" 2>/dev/null; then + warn "Patch $evo_id does not apply cleanly. Skipping." + apply_failed=true + else + git -C "$dir" apply "$patch_file" 2>/dev/null || { + warn "Failed to apply patch $evo_id." + apply_failed=true + } + fi + fi + + i=$((i + 1)) + done + + if [[ "$apply_failed" == "true" ]]; then + warn "Some patches did not apply. PR will include only successfully applied changes." + fi + + # Commit applied changes + git -C "$dir" add -A 2>/dev/null || true + local has_changes + has_changes=$(git -C "$dir" diff --cached --name-only 2>/dev/null) + + if [[ -z "$has_changes" ]]; then + warn "No changes to commit after applying patches." + git -C "$dir" checkout - 2>/dev/null || true + return 0 + fi + + git -C "$dir" commit -m "feat(evolution): ${staged_count} improvements from ${project_name}" 2>/dev/null || { + err "Failed to commit changes." + git -C "$dir" checkout - 2>/dev/null || true + return 1 + } + + # Push and create PR + info "Pushing branch and creating PR..." + git -C "$dir" push -u origin "$branch_name" 2>/dev/null || { + err "Failed to push branch. Check your git remote configuration." 
+ git -C "$dir" checkout - 2>/dev/null || true + return 1 + } + + local pr_url + pr_url=$(gh pr create \ + --repo "$upstream_repo" \ + --title "$pr_title" \ + --body "$pr_body" \ + --head "$branch_name" 2>&1) || { + err "Failed to create PR: $pr_url" + git -C "$dir" checkout - 2>/dev/null || true + return 1 + } + + ok "PR created: $pr_url" + + # Return to previous branch + git -C "$dir" checkout - 2>/dev/null || true + + # ── Update evolution.jsonl: STAGED → SUBMITTED ─────────────────────────── + local submit_timestamp + submit_timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ) + + local tmp_evo + tmp_evo=$(mktemp) + while IFS= read -r line; do + local line_status + line_status=$(echo "$line" | jq -r '.status' 2>/dev/null) + if [[ "$line_status" == "STAGED" ]]; then + echo "$line" | jq -c \ + --arg ts "$submit_timestamp" \ + --arg url "$pr_url" \ + '.status = "SUBMITTED" | .submitted_at = $ts | .pr_url = $url | .pr_status = "open"' + else + echo "$line" + fi + done < "$evo_file" > "$tmp_evo" + mv "$tmp_evo" "$evo_file" + + ok "Updated $staged_count entries to SUBMITTED status." +} + +# ─── evolve_track: Monitor submitted PR statuses ─────────────────────────── +evolve_track() { + local dir="$1" + local evo_file="$dir/.selfmodel/state/evolution.jsonl" + + # Check gh CLI availability + if ! command -v gh &>/dev/null; then + err "gh CLI is required for PR tracking." + err "Install: https://cli.github.com/" + return 1 + fi + + if [[ ! -f "$evo_file" ]] || [[ ! -s "$evo_file" ]]; then + info "No evolution entries found." + return 0 + fi + + # Collect SUBMITTED entries with pr_url + local submitted + submitted=$(jq -c 'select(.status == "SUBMITTED" and .pr_url != null)' "$evo_file" 2>/dev/null \ + | jq -s '.' 2>/dev/null) + + local total + total=$(echo "$submitted" | jq 'length' 2>/dev/null) + + if [[ "$total" -eq 0 || "$total" == "null" ]]; then + info "No submitted PRs to track." + return 0 + fi + + info "Tracking $total submitted PRs..." 
+ + local accepted_count=0 + local rejected_count=0 + local pending_count=0 + + local i=0 + while [[ $i -lt $total ]]; do + local entry + entry=$(echo "$submitted" | jq -c ".[$i]") + local evo_id pr_url + evo_id=$(echo "$entry" | jq -r '.id') + pr_url=$(echo "$entry" | jq -r '.pr_url') + + # Query PR status via gh + local pr_json + pr_json=$(gh pr view "$pr_url" --json state,mergedAt,reviews 2>/dev/null) || { + warn "Could not query PR status for $evo_id ($pr_url)" + pending_count=$((pending_count + 1)) + i=$((i + 1)) + continue + } + + local pr_state + pr_state=$(echo "$pr_json" | jq -r '.state // "UNKNOWN"' 2>/dev/null) + + local new_status="" + local new_pr_status="" + local reviewed_by="" + + case "$pr_state" in + MERGED) + new_status="ACCEPTED" + new_pr_status="merged" + reviewed_by=$(echo "$pr_json" | jq -r '[.reviews[]?.author.login // empty] | unique | join(",")' 2>/dev/null) || true + accepted_count=$((accepted_count + 1)) + ok "$evo_id: ACCEPTED (merged)" + ;; + CLOSED) + new_status="REJECTED_UPSTREAM" + new_pr_status="closed" + reviewed_by=$(echo "$pr_json" | jq -r '[.reviews[]?.author.login // empty] | unique | join(",")' 2>/dev/null) || true + rejected_count=$((rejected_count + 1)) + warn "$evo_id: REJECTED_UPSTREAM (closed without merge)" + ;; + OPEN) + # Check if changes requested + local has_changes_requested + has_changes_requested=$(echo "$pr_json" | jq '[.reviews[]? 
| select(.state == "CHANGES_REQUESTED")] | length' 2>/dev/null) || true + if [[ "$has_changes_requested" -gt 0 ]]; then + new_pr_status="changes_requested" + info "$evo_id: open (changes requested — needs attention)" + else + new_pr_status="open" + info "$evo_id: open (pending review)" + fi + pending_count=$((pending_count + 1)) + ;; + *) + info "$evo_id: unknown state ($pr_state)" + pending_count=$((pending_count + 1)) + ;; + esac + + # Update evolution.jsonl if status changed + if [[ -n "$new_status" || -n "$new_pr_status" ]]; then + local tmp_evo + tmp_evo=$(mktemp) + while IFS= read -r line; do + local line_id + line_id=$(echo "$line" | jq -r '.id' 2>/dev/null) + if [[ "$line_id" == "$evo_id" ]]; then + local updated_line="$line" + if [[ -n "$new_status" ]]; then + updated_line=$(echo "$updated_line" | jq -c --arg s "$new_status" '.status = $s') + fi + if [[ -n "$new_pr_status" ]]; then + updated_line=$(echo "$updated_line" | jq -c --arg ps "$new_pr_status" '.pr_status = $ps') + fi + if [[ -n "$reviewed_by" ]]; then + updated_line=$(echo "$updated_line" | jq -c --arg rb "$reviewed_by" '.reviewed_by = $rb') + fi + echo "$updated_line" + else + echo "$line" + fi + done < "$evo_file" > "$tmp_evo" + mv "$tmp_evo" "$evo_file" + fi + + i=$((i + 1)) + done + + echo "────────────────────────────────────────────────────" + ok "Tracked $total PRs: $accepted_count accepted, $rejected_count rejected, $pending_count pending" +} + +# ─── evolve_status: Display evolution pipeline status ─────────────────────── +evolve_status() { + local dir="$1" + local evo_file="$dir/.selfmodel/state/evolution.jsonl" + local team_file="$dir/.selfmodel/state/team.json" + + if [[ ! -f "$evo_file" ]] || [[ ! -s "$evo_file" ]]; then + info "No evolution entries found. Run 'selfmodel evolve --detect' first." 
+ return 0 + fi + + echo "Evolution Pipeline Status" + echo "═════════════════════════════════════════" + + # Count by status + local statuses=("CANDIDATE" "STAGED" "SUBMITTED" "ACCEPTED" "REJECTED_PROJECT_SPECIFIC" "REJECTED_UPSTREAM" "CONFLICT" "SUPERSEDED") + local total_entries=0 + for status in "${statuses[@]}"; do + local count + count=$(jq -r "select(.status == \"$status\") | .id" "$evo_file" 2>/dev/null | wc -l | tr -d ' ') + if [[ "$count" -gt 0 ]]; then + printf ' %-30s %d\n' "${status}:" "$count" + total_entries=$((total_entries + count)) + fi + done + echo " ─────────────────────────────────" + printf ' %-30s %d\n' "Total:" "$total_entries" + + echo "═════════════════════════════════════════" + + # Timestamps section + echo "Timestamps" + echo "─────────────────────────────────────────" + + # Last detect info + local last_detect_sprint last_detect_date + last_detect_sprint=$(jq -r '.detected_at_sprint // 0' "$evo_file" 2>/dev/null | sort -rn | head -1) + last_detect_date=$(jq -r '.detected_at // ""' "$evo_file" 2>/dev/null | sort -r | head -1) + if [[ -n "$last_detect_date" && "$last_detect_date" != "null" ]]; then + printf ' Last detect: Sprint %s (%s)\n' "$last_detect_sprint" "${last_detect_date%%T*}" + else + echo " Last detect: N/A" + fi + + # Last submit timestamp + local last_submit_date + last_submit_date=$(jq -r 'select(.submitted_at != null) | .submitted_at' "$evo_file" 2>/dev/null | sort -r | head -1) + if [[ -n "$last_submit_date" && "$last_submit_date" != "null" ]]; then + printf ' Last submit: %s\n' "${last_submit_date%%T*}" + else + echo " Last submit: N/A" + fi + + # Last staged timestamp + local last_staged_date + last_staged_date=$(jq -r 'select(.staged_at != null) | .staged_at' "$evo_file" 2>/dev/null | sort -r | head -1) + if [[ -n "$last_staged_date" && "$last_staged_date" != "null" ]]; then + printf ' Last staged: %s\n' "${last_staged_date%%T*}" + else + echo " Last staged: N/A" + fi + + # Next detect estimate (every 10 sprints) + 
if [[ -f "$team_file" ]]; then + local current_sprint last_review + current_sprint=$(jq -r '.current_sprint // 0' "$team_file" 2>/dev/null) + last_review=$(jq -r '.evolution.last_review_sprint // 0' "$team_file" 2>/dev/null) + local next_detect=$((last_review + 10)) + if [[ "$next_detect" -le "$current_sprint" ]]; then + echo " Next detect: now (overdue)" + else + printf ' Next detect: ~Sprint %d\n' "$next_detect" + fi + fi + + # Submitted PRs section + local submitted_prs + submitted_prs=$(jq -c 'select(.pr_url != null)' "$evo_file" 2>/dev/null | jq -s '.' 2>/dev/null) + local pr_count + pr_count=$(echo "$submitted_prs" | jq 'length' 2>/dev/null) + + if [[ "$pr_count" -gt 0 && "$pr_count" != "null" ]]; then + echo "═════════════════════════════════════════" + echo "Submitted PRs" + echo "─────────────────────────────────────────" + + local j=0 + while [[ $j -lt $pr_count ]]; do + local pr_entry + pr_entry=$(echo "$submitted_prs" | jq -c ".[$j]") + local evo_id pr_url pr_status + evo_id=$(echo "$pr_entry" | jq -r '.id') + pr_url=$(echo "$pr_entry" | jq -r '.pr_url') + pr_status=$(echo "$pr_entry" | jq -r '.pr_status // "unknown"') + + local status_icon + case "$pr_status" in + open) status_icon="🔵" ;; + merged) status_icon="🟢" ;; + closed) status_icon="🔴" ;; + changes_requested) status_icon="🟡" ;; + *) status_icon="⚪" ;; + esac + + printf ' %s %-24s %s %s\n' "$status_icon" "$evo_id" "$pr_status" "$pr_url" + j=$((j + 1)) + done + fi + + echo "═════════════════════════════════════════" +} + +# Main evolve command: parse flags and route. +cmd_evolve() { + [[ "${1:-}" == "--help" || "${1:-}" == "-h" ]] && { + echo "Usage: selfmodel evolve [flags]" + echo "" + echo " Evolution pipeline: detect local improvements, classify generalizability," + echo " package patches, and submit PRs to upstream selfmodel." 
+ echo "" + echo "Flags:" + echo " --detect Scan playbook/hooks/scripts diffs against upstream baseline" + echo " Writes CANDIDATE entries to .selfmodel/state/evolution.jsonl" + echo " This is the default action when no flag is specified." + echo " --status Show evolution pipeline status (counts, timestamps, PR URLs)" + echo " --stage Interactively classify CANDIDATE entries (Stage/Reject/Keep)" + echo " --submit Create upstream PR from STAGED patches (requires gh CLI)" + echo " --track Monitor submitted PR statuses via gh CLI" + echo " --help Show this help message" + echo "" + echo "Examples:" + echo " selfmodel evolve # Run detection (default)" + echo " selfmodel evolve --detect # Explicit detection scan" + echo " selfmodel evolve --status # View pipeline status" + echo " selfmodel evolve --stage # Classify candidates interactively" + echo " selfmodel evolve --submit # Submit staged patches as PR" + echo " selfmodel evolve --track # Check PR status updates" + return 0 + } + + local dir="." + local action="detect" + + # Parse flags + while [[ $# -gt 0 ]]; do + case "$1" in + --detect) action="detect"; shift ;; + --status) action="status"; shift ;; + --stage) action="stage"; shift ;; + --submit) action="submit"; shift ;; + --track) action="track"; shift ;; + *) dir="$1"; shift ;; + esac + done + + local selfmodel_dir="$dir/.selfmodel" + if [[ ! -d "$selfmodel_dir" ]]; then + err "No .selfmodel/ directory found. Run 'selfmodel init' first." 
+ return 1 + fi + + case "$action" in + detect) evolve_detect "$dir" ;; + status) evolve_status "$dir" ;; + stage) evolve_stage "$dir" ;; + submit) evolve_submit "$dir" ;; + track) evolve_track "$dir" ;; + *) + err "Unknown evolve action: $action" + return 1 + ;; + esac +} + # ─── Main ───────────────────────────────────────────────────────────────────── main() { local cmd="${1:-help}" @@ -1580,6 +3166,7 @@ main() { adapt) check_deps; cmd_adapt "$@" ;; update) check_deps; cmd_update "$@" ;; status) check_deps; cmd_status "$@" ;; + evolve) check_deps; cmd_evolve "$@" ;; version) cmd_version "$@" ;; -v) cmd_version "$@" ;; --version) cmd_version "$@" ;; @@ -1595,6 +3182,12 @@ main() { echo " --remote Fetch latest from GitHub (instead of local templates)" echo " --version Specify version/tag (default: main)" echo " status Show team health dashboard" + echo " evolve Evolution pipeline: detect improvements, classify, submit upstream" + echo " --detect Scan diffs against upstream baseline (default)" + echo " --status Show pipeline status, timestamps, PR URLs" + echo " --stage Interactively classify CANDIDATE entries" + echo " --submit Submit staged patches as upstream PR" + echo " --track Monitor submitted PR statuses" echo " version Show version" echo "" echo "Examples:" @@ -1604,6 +3197,8 @@ main() { echo " selfmodel update # Update playbook from local templates" echo " selfmodel update --remote # Fetch latest from GitHub (main branch)" echo " selfmodel update --remote --version v0.3.0 # Fetch specific version" + echo " selfmodel evolve # Detect evolution candidates" + echo " selfmodel evolve --status # View evolution pipeline status" ;; *) err "Unknown command: $cmd" diff --git a/skill/SKILL.md b/skill/SKILL.md index f8327dd..7ba2f2e 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -24,6 +24,7 @@ specialized agents via Sprint contracts, worktree isolation, and independent qua | `/selfmodel:status` | View team, sprints, and quality trends | | `/selfmodel:plan` | 
Create/update multi-phase orchestration plan | | `/selfmodel:loop` | Auto-orchestration loop (plan-driven) | +| `/selfmodel:evolve` | Evolution-to-PR pipeline for upstream improvements | ## Core Architecture diff --git a/skill/references/evolution-protocol.md b/skill/references/evolution-protocol.md new file mode 100644 index 0000000..fc80425 --- /dev/null +++ b/skill/references/evolution-protocol.md @@ -0,0 +1,681 @@ +# Evolution Protocol + +Evolution-to-PR Pipeline: detect local improvements, classify generalizability, +package patches, and submit PRs to upstream selfmodel. + +Trigger: Every 10 MERGED Sprints (orchestration-loop.md Step 8.5). +Manual trigger: `/selfmodel:evolve` command. + +--- + +## Pipeline Overview + +``` +DETECT → STAGE → SUBMIT → TRACK + │ │ │ │ + │ scan │ class │ PR │ monitor + │ diffs │ ify │ create │ status + ▼ ▼ ▼ ▼ +CANDIDATE STAGED SUBMITTED ACCEPTED/REJECTED +``` + +Four phases, each with clear input/output boundaries. Detection is fully automated +and read-only. Staging requires interactive classification. Submission requires +human approval. Tracking is passive monitoring. + +--- + +## Phase 1: DETECT + +**Purpose**: Compare local playbook, hooks, and scripts against upstream baseline +to discover improvements worth contributing back. 
+ +**Trigger conditions**: +- Automatic: orchestration-loop.md Step 8.5 fires every 10 MERGED sprints +- Manual: `/selfmodel:evolve --detect` +- Post-update: after `selfmodel update --remote` refreshes the baseline + +**Input sources** (scanned in order): + +| Source | What to scan | Signal type | +|--------|-------------|-------------| +| Playbook diffs | `git diff upstream/main -- .selfmodel/playbook/` | Direct improvement | +| Hook diffs | `git diff upstream/main -- .selfmodel/hooks/` or `scripts/*.sh` | Bug fix / enhancement | +| Script diffs | `git diff upstream/main -- scripts/` | Tool improvement | +| Validated lessons | `lessons-learned.md` entries with Result: improved | Proven pattern | +| Hook intercept patterns | `hook-intercepts.log` repeated blocks for same reason | False positive fix | +| Quality trends | `quality.jsonl` systematic score shifts | Calibration data | + +**Detection algorithm**: + +``` +1. Establish upstream baseline: + a. If git remote "upstream" exists: git fetch upstream && use upstream/main + b. Else if .selfmodel/state/upstream-baseline.sha exists: use stored SHA + c. Else: SKIP detection, output "no upstream baseline — run selfmodel update --remote" + +2. For each source, generate candidate list: + a. Playbook/hook/script diffs: + - git diff <baseline>..HEAD -- <path> + - For each file with changes, create one CANDIDATE entry + b. Validated lessons: + - Parse lessons-learned.md for entries with "Result: improved" + - Cross-reference: if lesson's Action already reflected in a diff, skip (covered by diff) + - If lesson has no corresponding diff (pure process change), create CANDIDATE + c. Hook intercept patterns: + - Parse hook-intercepts.log for repeated blocks (same hook + same reason, 3+ occurrences) + - If corresponding hook script has a diff: enhance that CANDIDATE's evidence + - If no diff but pattern is clear: create CANDIDATE with category=hook_improvement + d. 
Quality trends: + - Parse quality.jsonl for systematic shifts (5+ sprints with same dimension trending) + - If quality-gates.md has threshold changes matching the trend: create CANDIDATE + +3. For each candidate, run 5 generalizability heuristics (see below) + +4. Write CANDIDATE entries to evolution.jsonl (append-only) + +5. Output summary: + "Detected N candidates: X playbook patches, Y hook fixes, Z new lessons" +``` + +**Output**: CANDIDATE entries in `evolution.jsonl`. Detection is read-only except +for appending to evolution.jsonl. + +--- + +## Phase 2: STAGE + +**Purpose**: Interactive classification of CANDIDATE entries. Human and Leader +collaborate to decide which improvements are generalizable. + +**Trigger**: `/selfmodel:evolve --stage` or automatic after DETECT when candidates exist. + +**Classification flow**: + +``` +For each CANDIDATE in evolution.jsonl (sorted by generalizability_score DESC): + + 1. Display summary: + [evo-2026-04-06-001] playbook_patch score=0.85 + .selfmodel/playbook/quality-gates.md + "Added AI Slop detection scoring rubric with 8 patterns" + + 2. Show diff preview: + git diff <baseline>..HEAD -- <source_file> | head -40 + + 3. Leader recommends classification based on heuristics: + - score >= 0.7 → recommend STAGE + - score < 0.3 → recommend REJECT_PROJECT_SPECIFIC + - 0.3 <= score < 0.7 → recommend manual review + + 4. User decides: + [S]tage → status=STAGED, generate patch + [R]eject → status=REJECTED_PROJECT_SPECIFIC + [K]eep → status=CANDIDATE (revisit later) + [E]dit → modify summary/description before staging + + 5. For STAGED entries: + a. Generate patch file: + git diff <baseline>..HEAD -- <source_file> > \ + .selfmodel/state/evolution-staging/<evo-id>/<filename>.patch + b. 
Strip project-specific content from patch: + - Replace absolute paths with placeholder: /Users/*/project/ → <project-root>/ + - Replace project name with placeholder: <project-name> + - Flag any remaining project-specific references for manual review + c. Update evolution.jsonl entry: staged_at=<now> +``` + +**Output**: STAGED entries in evolution.jsonl, patch files in +`.selfmodel/state/evolution-staging/<evo-id>/`. + +--- + +## Phase 3: SUBMIT + +**Purpose**: Create upstream PR from STAGED patches. Requires explicit human approval. + +**Trigger**: `/selfmodel:evolve --submit` (never automatic). + +**Submission flow**: + +``` +1. Collect all STAGED entries from evolution.jsonl + +2. Group by upstream_file for efficient PR packaging: + - Multiple changes to same file → single combined patch + - Unrelated files → can be in same PR if logically cohesive + +3. Pre-submission checks: + a. shellcheck: run shellcheck on all .sh files in staging + - FAIL → block submission, output errors, user must fix + b. Path audit: grep for absolute paths, project names, credentials + - Found → block submission, list violations + c. Patch applicability: attempt dry-run apply against upstream baseline + - CONFLICT → mark entries as CONFLICT, user must resolve or SUPERSEDE + +4. Generate PR content (see PR Template Format below) + +5. HUMAN APPROVAL GATE: + Display full PR preview (title, body, file list, diff stats) + "Submit this PR to upstream? [yes/no/edit]" + - no → abort, entries stay STAGED + - edit → user modifies PR content, re-display + - yes → proceed to submission + +6. Submit: + a. Fork upstream if not already forked + b. Create branch: evolution/<date>-<short-description> + c. Apply patches + d. Commit with evidence-rich message + e. gh pr create --repo <upstream> --title <title> --body <body> + f. 
Update evolution.jsonl entries: + - status=SUBMITTED + - submitted_at=<now> + - pr_url=<gh output> + - pr_status=open +``` + +**Output**: SUBMITTED entries in evolution.jsonl, open PR on upstream repo. + +--- + +## Phase 4: TRACK + +**Purpose**: Monitor submitted PRs and update local state accordingly. + +**Trigger**: `/selfmodel:evolve --track` or automatic during DETECT phase. + +**Tracking flow**: + +``` +1. For each SUBMITTED entry in evolution.jsonl: + + a. Query PR status: + gh pr view <pr_url> --json state,mergedAt,reviews + + b. Map status: + - merged → ACCEPTED, record reviewed_by from PR reviews + - closed (not merged) → REJECTED_UPSTREAM, record reviewer comments + - open + changes_requested → stays SUBMITTED, flag for user attention + - open + approved → stays SUBMITTED (waiting for maintainer merge) + - conflict detected → CONFLICT (upstream changed target file) + + c. Update evolution.jsonl entry with new status and metadata + +2. Handle CONFLICT: + - If upstream changed the target file after PR submission: + a. Mark old entry as SUPERSEDED + b. Create new CANDIDATE with updated diff against new upstream + c. Output: "evo-2026-04-06-001 superseded — upstream changed target, re-detect needed" + +3. Output summary: + "Tracked N PRs: X accepted, Y rejected, Z pending, W conflicted" +``` + +**Output**: Updated status in evolution.jsonl. + +--- + +## evolution.jsonl Schema + +Path: `.selfmodel/state/evolution.jsonl` + +Each line is a single JSON object. File is append-only (new entries appended, +status updates rewrite the specific line via `jq` or equivalent). 
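The status-update rule above (rewrite one line, never delete an entry) can be sketched in Python as an alternative to `jq`. This is a minimal illustration, not the pipeline's actual implementation: the helper name `update_entry_status` and the temp-file-then-rename approach are assumptions.

```python
import json
import os
import tempfile


def update_entry_status(path, entry_id, new_status, **extra_fields):
    """Rewrite one entry of a JSONL file in place; every other entry is kept.

    Hypothetical helper. Returns True if an entry with entry_id was updated.
    """
    updated = False
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then swap it in with one rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp, \
                open(path, encoding="utf-8") as src:
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                entry = json.loads(line)
                if entry.get("id") == entry_id:
                    entry["status"] = new_status
                    entry.update(extra_fields)
                    updated = True
                tmp.write(json.dumps(entry) + "\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return updated
```

A call such as `update_entry_status(".selfmodel/state/evolution.jsonl", "evo-2026-04-06-001", "STAGED", staged_at="2026-04-06T14:30:00Z")` would change only that entry's fields and leave the number of lines unchanged.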
+ +```json +{ + "id": "evo-YYYY-MM-DD-NNN", + "status": "<status>", + "category": "<category>", + "source_file": "<relative path>", + "upstream_file": "<relative path>", + "summary": "<one-line description>", + "description": "<detailed explanation>", + "evidence": { + "sprints_affected": [], + "quality_trend": "<string or null>", + "hook_intercepts": 0, + "lessons_learned_ref": "<string or null>" + }, + "heuristic": "<heuristic_name>", + "generalizability_score": 0.0, + "generalizability_reason": "<why generalizable or not>", + "diff_stats": "+N -M lines in file.md", + "detected_at_sprint": 0, + "detected_at": "<ISO8601>", + "staged_at": null, + "submitted_at": null, + "pr_url": null, + "pr_status": null, + "reviewed_by": null, + "project_name": "<derived from git remote>", + "selfmodel_version": "<current version>" +} +``` + +### Field Reference + +| Field | Type | Description | +|-------|------|-------------| +| `id` | string | Unique identifier. Format: `evo-YYYY-MM-DD-NNN` where NNN is zero-padded sequence within the day. Example: `evo-2026-04-06-001` | +| `status` | enum | Lifecycle state. Values: `CANDIDATE`, `STAGED`, `SUBMITTED`, `ACCEPTED`, `REJECTED_PROJECT_SPECIFIC`, `REJECTED_UPSTREAM`, `CONFLICT`, `SUPERSEDED` | +| `category` | enum | Type of improvement. Values: `playbook_patch`, `hook_improvement`, `script_fix`, `new_lesson`, `new_playbook_page` | +| `source_file` | string | Relative path to the locally modified file (from project root). Example: `.selfmodel/playbook/quality-gates.md` | +| `upstream_file` | string | Relative path in the upstream selfmodel repo where the change would apply. Often identical to `source_file` but may differ if local structure diverges | +| `summary` | string | One-line human-readable description of the improvement. Max 120 characters | +| `description` | string | Detailed explanation of what changed, why it matters, and what problem it solves. 
No length limit | +| `evidence` | object | Supporting data for the improvement (see Evidence sub-schema) | +| `evidence.sprints_affected` | number[] | Sprint numbers where this improvement was relevant or would have prevented issues. Example: `[65, 66, 76]` | +| `evidence.quality_trend` | string\|null | Description of quality score trend that motivated this change. Example: `"code_quality avg dropped from 8.2 to 6.5 over sprints 40-50"`. Null if not trend-driven | +| `evidence.hook_intercepts` | number | Count of hook interceptions related to this improvement. From `hook-intercepts.log`. Zero if not hook-related | +| `evidence.lessons_learned_ref` | string\|null | Reference to lessons-learned.md entry. Example: `"Sprint 65-76: Merge conflicts"`. Null if no linked lesson | +| `heuristic` | enum | Which generalizability heuristic was applied. Values: `path_detection`, `project_name_detection`, `generic_pattern`, `hook_fix`, `scoring_calibration` | +| `generalizability_score` | float | Score from 0.0 (fully project-specific) to 1.0 (fully generalizable). Computed by heuristic rules | +| `generalizability_reason` | string | Human-readable explanation of why the score was assigned. Must reference specific evidence | +| `diff_stats` | string | Git diff statistics. Format: `+N -M lines in <filename>`. Example: `+45 -12 lines in quality-gates.md` | +| `detected_at_sprint` | number | Sprint number at which detection occurred. Derived from current sprint count in team.json | +| `detected_at` | string | ISO 8601 timestamp of detection. Example: `2026-04-06T14:30:00Z` | +| `staged_at` | string\|null | ISO 8601 timestamp of staging. Null until STAGED | +| `submitted_at` | string\|null | ISO 8601 timestamp of PR submission. Null until SUBMITTED | +| `pr_url` | string\|null | GitHub PR URL. Null until SUBMITTED. Example: `https://github.com/org/selfmodel/pull/42` | +| `pr_status` | string\|null | Current PR state. Values: `open`, `merged`, `closed`, `changes_requested`. 
Null until SUBMITTED | +| `reviewed_by` | string\|null | GitHub username(s) who reviewed the PR. Null until reviewed. Comma-separated for multiple reviewers | +| `project_name` | string | Derived from `git remote get-url origin` — extract repo/org name. Example: `myproject` | +| `selfmodel_version` | string | Version from `VERSION` file at time of detection. Example: `0.3.0` | + +### Status Lifecycle + +``` +CANDIDATE ──┬── STAGED ──── SUBMITTED ──┬── ACCEPTED + │ ├── REJECTED_UPSTREAM + │ └── CONFLICT ──── SUPERSEDED + ├── REJECTED_PROJECT_SPECIFIC + └── (stays CANDIDATE until revisited) +``` + +Transitions: +- `CANDIDATE → STAGED`: User approves during Stage phase +- `CANDIDATE → REJECTED_PROJECT_SPECIFIC`: User rejects as project-specific +- `STAGED → SUBMITTED`: PR created after human approval +- `SUBMITTED → ACCEPTED`: Upstream merges the PR +- `SUBMITTED → REJECTED_UPSTREAM`: Upstream closes PR without merging +- `SUBMITTED → CONFLICT`: Upstream changed target file, patch no longer applies +- `CONFLICT → SUPERSEDED`: New CANDIDATE created with updated diff +- Any status can stay indefinitely (no forced timeout) + +--- + +## Generalizability Heuristics + +Five heuristics evaluate whether a local improvement is worth contributing upstream. +Each heuristic produces a score component and reason. The final `generalizability_score` +is the weighted combination of applicable heuristics. + +### Heuristic 1: PATH_DETECTION + +**Purpose**: Detect absolute paths that make the change project-specific. + +**Rule**: Scan the diff hunks for patterns matching absolute filesystem paths +outside of example blocks or template placeholders. 
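A rough Python sketch of this rule, under stated assumptions: it covers only the two home-directory patterns from the list that follows, ignores the example-block exclusions, and reads "scan the diff hunks" as counting both added and removed lines (restricting the scan to added lines is a plausible variant). The function names are hypothetical, and the score impacts are the ones given in the scoring table of this heuristic.

```python
import re

# Home-directory paths only; the /var, /opt, /srv and project-directory
# patterns are left out to keep the sketch short (assumption).
ABSOLUTE_PATH = re.compile(r"/(?:Users|home)/[^/\s\"']+/")


def count_absolute_paths(diff_text):
    """Count home-directory paths on changed lines of a unified diff."""
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not hunk content
        if line.startswith(("+", "-")):
            count += len(ABSOLUTE_PATH.findall(line))
    return count


def path_detection_impact(path_count):
    """Map a path count to the score impact from the scoring table."""
    if path_count == 0:
        return 0.0
    if path_count <= 2:
        return -0.4
    return -0.8
```

Placeholder forms such as `<project-root>/` and `$HOME/` do not match the pattern, so they are not penalized.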
+ +**Detection patterns**: +- `/Users/<username>/` — macOS home directory +- `/home/<username>/` — Linux home directory +- `/var/`, `/opt/`, `/srv/` followed by project-specific directory names +- Any path containing the project directory name derived from `git rev-parse --show-toplevel` + +**Exclusions** (do not flag): +- Paths inside fenced code blocks marked as example/template (` ```example `) +- Paths using placeholder syntax: `<project-root>/`, `$HOME/`, `~/.config/` +- Paths in comments explaining what to replace + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| No absolute paths in diff | +0.0 (neutral, no penalty) | +| Absolute paths only in examples/templates | +0.0 (neutral) | +| 1-2 absolute paths in non-example code | -0.4 (likely project-specific) | +| 3+ absolute paths in non-example code | -0.8 (definitely project-specific) | + +**Example**: +```diff +- timeout 180 gemini "@/Users/vvedition/Desktop/myproject/.selfmodel/inbox/gemini/sprint-1.md" ++ timeout 180 gemini "@${WORKTREE}/.selfmodel/inbox/gemini/sprint-1.md" +``` +This diff contains `/Users/vvedition/Desktop/myproject/` — a hardcoded path. +Score impact: -0.4. Reason: "diff contains 1 absolute path reference to project directory." + +### Heuristic 2: PROJECT_NAME_DETECTION + +**Purpose**: Detect references to the current project name that make the change project-specific. + +**Rule**: Extract the project name from two sources, then scan diff hunks for occurrences. + +**Name extraction**: +```bash +# Source 1: git remote URL +git remote get-url origin 2>/dev/null | sed 's/.*\///' | sed 's/\.git$//' + +# Source 2: directory name +basename "$(git rev-parse --show-toplevel)" +``` + +**Detection**: Case-insensitive search in diff hunks (excluding file path components, +since paths are handled by PATH_DETECTION). 
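The extraction and search steps above could look like this in Python. It is a sketch rather than the pipeline's own code: both function names are hypothetical, and it skips only the diff's file-header lines, so stripping path components inside hunk lines is left out.

```python
import re


def project_name_from_remote(remote_url):
    """Mirror the sed pipeline above: keep the last path segment, drop '.git'."""
    name = remote_url.strip().rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def lines_mentioning_project(diff_text, project_name):
    """Return changed diff lines that mention project_name, ignoring case."""
    pattern = re.compile(re.escape(project_name), re.IGNORECASE)
    hits = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers carry paths, which PATH_DETECTION covers
        if line.startswith(("+", "-")) and pattern.search(line):
            hits.append(line)
    return hits
```

The returned lines still need the exclusions and scoring rules of this heuristic applied to them; the sketch only finds candidate occurrences.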
+ +**Exclusions**: +- Project name appearing in a generic context: `"derived from git remote"` is fine +- Project name in CHANGELOG entries (project-specific by nature, but the pattern is generic) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| No project name references | +0.0 (neutral) | +| Project name only in metadata fields (project_name, CHANGELOG) | +0.0 (neutral) | +| Project name in logic/rules/conditions | -0.5 (likely project-specific) | +| Project name in hardcoded strings/paths | -0.7 (definitely project-specific) | + +**Example**: +```diff ++ if [ "$PROJECT" = "vibe-sensei" ]; then ++ SPECIAL_FLAG=true ++ fi +``` +Score impact: -0.7. Reason: "diff contains hardcoded project name 'vibe-sensei' in conditional logic." + +### Heuristic 3: GENERIC_PATTERN + +**Purpose**: Identify new playbook sections, rules, or patterns that contain no +project-specific nouns and are broadly applicable. + +**Rule**: If the diff adds a new section (detected by heading markers: `## `, `### `) +or a substantial block (10+ lines) to a playbook file, and that block passes both +PATH_DETECTION and PROJECT_NAME_DETECTION with no penalties, it is a generic pattern. 
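The structural half of this rule (new heading or substantial block) can be checked mechanically. The Python sketch below assumes a unified diff as input and interprets "10+ lines" as ten or more consecutive added lines. The function name is hypothetical. The PATH_DETECTION and PROJECT_NAME_DETECTION checks still have to run separately.

```python
def adds_section_or_block(diff_text, min_block_lines=10):
    """Return True if a unified diff adds a '## ' or '### ' heading, or a run
    of at least min_block_lines consecutive added lines."""
    run = 0
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            content = line[1:]
            if content.startswith(("## ", "### ")):
                return True
            run += 1
            if run >= min_block_lines:
                return True
        else:
            run = 0  # a context or removed line ends the current run
    return False
```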
+ +**Positive signals** (increase score): +- New section with heading that describes a universal concept (e.g., "AI Slop Detection", "Drift Detection") +- References to general software engineering practices, not project-specific workflows +- Uses placeholder variables instead of hardcoded values +- Block is self-contained (does not depend on project-specific sections) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| New self-contained section, no project references | +0.8 (highly generalizable) | +| New section with minor project context that can be stripped | +0.5 (generalizable with edits) | +| Modification to existing section, generic improvement | +0.6 (likely generalizable) | +| Modification deeply intertwined with project-specific logic | +0.2 (low generalizability) | + +**Example**: +Adding an "AI Slop Detection" section to quality-gates.md with 8 universal patterns +and scoring rubric. No project names, no absolute paths. +Score impact: +0.8. Reason: "new self-contained section describing universal code quality patterns." + +### Heuristic 4: HOOK_FIX + +**Purpose**: Identify hook script changes that fix false positives or improve accuracy, +based on empirical evidence from intercept logs. + +**Rule**: A hook script diff qualifies when `hook-intercepts.log` contains entries +showing the old pattern caused incorrect blocks (false positives). 
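Counting repeated intercepts is the mechanical part of this rule. The Python sketch below assumes each log line consists of space-separated `key=value` tokens including `hook` and `reason`; the real log format may differ. The function names are hypothetical. The mapping follows the scoring table of this heuristic, reading "<3 intercepts" as one or two and "no correlation" as zero. Deciding whether a block was actually a false positive still needs human judgment.

```python
from collections import Counter


def count_intercepts(log_lines):
    """Count log entries per (hook, reason) pair, assuming key=value tokens."""
    counts = Counter()
    for line in log_lines:
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "hook" in fields and "reason" in fields:
            counts[(fields["hook"], fields["reason"])] += 1
    return counts


def hook_fix_impact(intercepts):
    """Map an intercept count to the score impact from the scoring table."""
    if intercepts >= 5:
        return 0.9
    if intercepts >= 3:
        return 0.7
    if intercepts >= 1:
        return 0.3
    return 0.1
```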
+ +**Evidence requirements**: +- At least 3 intercept log entries for the same hook + same reason pattern +- The diff modifies the matching logic (grep patterns, conditionals, allowlists) +- The fix does not introduce project-specific paths or names + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| Hook fix with 5+ false positive intercepts in log | +0.9 (strong evidence) | +| Hook fix with 3-4 false positive intercepts | +0.7 (moderate evidence) | +| Hook fix with <3 intercepts | +0.3 (weak evidence, may be coincidental) | +| Hook change with no intercept log correlation | +0.1 (speculative) | + +**Example**: +`enforce-agent-rules.sh` modified to accept `inbox/research/` directory for gemini agent. +Hook-intercepts.log shows 5 entries: `hook=enforce-agent-rules tool=Bash reason=gemini-no-inbox-file`. +Score impact: +0.9. Reason: "hook fix backed by 5 false positive intercepts; Researcher uses inbox/research/ not inbox/gemini/." + +### Heuristic 5: SCORING_CALIBRATION + +**Purpose**: Identify quality-gates.md threshold changes that are motivated by +empirical quality data trends. + +**Rule**: A threshold change in quality-gates.md qualifies when `quality.jsonl` +shows a systematic trend that motivated the recalibration. 
+ +**Evidence requirements**: +- `quality.jsonl` contains 5+ entries showing a consistent pattern in the affected dimension +- The diff modifies scoring thresholds, rubric text, or calibration examples +- The change direction aligns with the observed trend (tightening if scores inflated, loosening if too harsh) + +**Scoring**: +| Finding | Score impact | +|---------|-------------| +| Threshold change with 10+ quality entries showing trend | +0.85 (strong calibration) | +| Threshold change with 5-9 quality entries showing trend | +0.65 (moderate calibration) | +| Threshold change with weak/ambiguous trend | +0.3 (speculative) | +| New calibration example based on real sprint data | +0.75 (empirical anchor) | + +**Example**: +quality-gates.md adds "AI Slop Detection" scoring penalties. quality.jsonl shows +Code Quality dimension averaged 8.5 over last 10 sprints despite visible slop patterns. +Score impact: +0.85. Reason: "scoring calibration backed by 10-sprint quality trend showing inflated Code Quality scores." + +### Score Combination + +When multiple heuristics apply (common: GENERIC_PATTERN + PATH_DETECTION): + +``` +final_score = base_positive + sum(negative_impacts) +``` + +Where: +- `base_positive` = highest positive score from applicable heuristics (GENERIC_PATTERN, HOOK_FIX, or SCORING_CALIBRATION) +- `negative_impacts` = sum of all negative scores from PATH_DETECTION and PROJECT_NAME_DETECTION + +Clamped to [0.0, 1.0]. + +**Example**: A new playbook section (GENERIC_PATTERN: +0.8) that contains one absolute path +(PATH_DETECTION: -0.4). Final score = 0.8 + (-0.4) = 0.4. Reason incorporates both: +"new generic section (+0.8) but contains 1 absolute path (-0.4); strip path before staging." + +--- + +## PR Template Format + +Used in SUBMIT phase when creating upstream PRs. + +```markdown +## Summary + +Community-discovered improvements from project usage (<project_name>, <N> sprints). 
+ +These changes were detected by selfmodel's evolution pipeline, classified as +generalizable by heuristic analysis, and verified against local sprint data. + +## Changes + +| # | Category | File | Summary | Evidence | +|---|----------|------|---------|----------| +| 1 | <category> | <upstream_file> | <summary> | <sprints_affected>, <quality_trend or hook_intercepts> | +| 2 | ... | ... | ... | ... | + +## Per-Change Details + +### Change 1: <summary> + +**Category**: <category> +**Heuristic**: <heuristic> (score: <generalizability_score>) +**Reason**: <generalizability_reason> + +**What changed**: +<description> + +**Evidence**: +- Sprints affected: <sprints_affected> +- Quality trend: <quality_trend> +- Hook intercepts: <hook_intercepts> +- Lessons learned ref: <lessons_learned_ref> + +**Diff stats**: <diff_stats> + +--- + +(repeat for each change) + +## Testing + +- [ ] shellcheck passes on all modified .sh files +- [ ] `selfmodel status` runs without errors after applying changes +- [ ] No absolute paths or project-specific names in submitted code +- [ ] Patches apply cleanly to upstream HEAD + +## Context + +- selfmodel version: <selfmodel_version> +- Detection sprint: <detected_at_sprint> +- Project sprint count: <total_sprints> +- Evolution entries: <count of entries in this PR> +``` + +--- + +## Integration Points + +| System | Integration | Direction | +|--------|-------------|-----------| +| orchestration-loop.md Step 8.5 | Auto-triggers DETECT phase every 10 MERGED sprints. Leader checks `team.json.evolution.last_review_sprint` and compares to current merged count. If `current - last_review >= 10`, run DETECT | orchestration-loop → evolution | +| `/selfmodel:status` | Displays evolution pipeline status: counts by status (N candidates, M staged, K submitted, J accepted). Reads from `evolution.jsonl` | evolution → status display | +| `selfmodel update --remote` | Refreshes upstream baseline SHA. 
After update, stores new baseline in `.selfmodel/state/upstream-baseline.sha`. Enables DETECT to compute accurate diffs | update → evolution baseline | +| `team.json` evolution section | Stores persistent state: `last_review_sprint` (sprint number of last DETECT run), `candidate_count`, `staged_count`, `submitted_count`, `accepted_count`. Updated by each phase | evolution ↔ team.json | +| `CONTRIBUTING.md` (upstream) | Evolution PRs follow the upstream project's contributing standards. PR template references CONTRIBUTING.md if it exists | evolution → upstream standards | +| `lessons-learned.md` | DETECT phase scans for entries with `Result: improved` as evolution candidates. ACCEPTED upstream changes are cross-referenced back as validated lessons | lessons-learned ↔ evolution | +| `quality.jsonl` | DETECT phase analyzes quality trends for SCORING_CALIBRATION heuristic. Trend data becomes evidence in evolution entries | quality.jsonl → evolution evidence | +| `hook-intercepts.log` | DETECT phase scans for repeated false positive patterns for HOOK_FIX heuristic. Intercept counts become evidence in evolution entries | hook-intercepts.log → evolution evidence | + +### team.json Evolution Section Schema + +```json +{ + "evolution": { + "last_review_sprint": 0, + "candidate_count": 0, + "staged_count": 0, + "submitted_count": 0, + "accepted_count": 0, + "rejected_project_specific_count": 0, + "rejected_upstream_count": 0, + "last_detect_at": null, + "last_submit_at": null + } +} +``` + +--- + +## Safety Rules + +1. **Human MUST approve before any PR submission** — The SUBMIT phase has a mandatory + human approval gate. Leader displays the full PR preview and waits for explicit `yes`. + No automated submission. No "approve all" batch mode. + +2. **Detection is read-only** — The DETECT phase only reads from source files, logs, + and quality data. The only write operation is appending CANDIDATE entries to + `evolution.jsonl`. No source files are modified during detection. 
+ +3. **Never submit project-specific content** — Before PR creation, the pipeline runs + a mandatory audit for: + - Absolute filesystem paths (`/Users/`, `/home/`, project root paths) + - Project name references in logic or hardcoded strings + - Credentials, API keys, tokens, or secrets + - Internal team member names or identifiers + Any finding blocks submission until resolved. + +4. **All patches must pass shellcheck before submission** — Every `.sh` file included + in a PR must pass `shellcheck` with zero warnings. This is enforced in the + SUBMIT phase pre-submission checks. No override mechanism. + +5. **evolution.jsonl is append-only** — New entries are appended. Status updates + modify existing entries in-place but never delete entries. The full history is + preserved for audit and trend analysis. Rotation policy: retain all entries + (per quality-gates.md log maintenance rules). + +6. **Upstream conflict means SUPERSEDE, never force** — When an upstream change + conflicts with a submitted patch: + - The existing entry transitions to SUPERSEDED status + - A new CANDIDATE is created with an updated diff against the new upstream state + - Force push is never used on upstream branches + - The old PR is closed with a comment explaining the supersession + +--- + +## Directory Structure + +``` +.selfmodel/ +├── state/ +│ ├── evolution.jsonl # Evolution entries (append-only) +│ ├── evolution-staging/ # Generated during STAGE phase +│ │ └── evo-2026-04-06-001/ +│ │ └── quality-gates.md.patch # Stripped patch file +│ └── upstream-baseline.sha # Upstream reference SHA +└── playbook/ + └── evolution-protocol.md # This file +``` + +--- + +## Operational Notes + +### First-Time Setup + +Before evolution can run, the project needs an upstream baseline: + +```bash +# Option A: Add upstream remote (preferred) +git remote add upstream <selfmodel-upstream-url> +git fetch upstream + +# Option B: Manual baseline (if no upstream remote) +echo "<known-good-sha>" > 
.selfmodel/state/upstream-baseline.sha +``` + +### Manual Invocation Examples + +```bash +# Full pipeline (interactive) +/selfmodel:evolve + +# Detection only (safe, read-only) +/selfmodel:evolve --detect + +# Stage candidates (interactive classification) +/selfmodel:evolve --stage + +# Submit staged patches (requires human approval) +/selfmodel:evolve --submit + +# Track submitted PRs +/selfmodel:evolve --track + +# Show pipeline status +/selfmodel:evolve --status +``` + +### Automatic Invocation + +The orchestration loop triggers DETECT automatically at Step 8.5 once at least 10 +sprints have been merged since the last review. The check: + +``` +current_merged = count(plan.md sprints with MERGED status) +last_review = team.json.evolution.last_review_sprint +if (current_merged - last_review) >= 10: + run DETECT phase + update team.json.evolution.last_review_sprint = current_merged +``` + +STAGE and SUBMIT always require user interaction; neither completes unattended.
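The trigger check above translates directly into Python. This is a minimal sketch with hypothetical function names. It assumes `team.json` has already been loaded into a dict and that the merged-sprint count has been computed elsewhere.

```python
def should_run_detect(current_merged, last_review_sprint, interval=10):
    """True once at least `interval` sprints have merged since the last review."""
    return (current_merged - last_review_sprint) >= interval


def after_detect(team, current_merged):
    """Return a copy of the team.json dict with last_review_sprint advanced."""
    evolution = dict(team.get("evolution", {}), last_review_sprint=current_merged)
    return dict(team, evolution=evolution)
```

Because the comparison is relative to `last_review_sprint`, a review at sprint 13 schedules the next automatic run at sprint 23, not at sprint 20.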