diff --git a/.claude/skills/issue-lifecycle/SKILL.md b/.claude/skills/issue-lifecycle/SKILL.md
index aa6989fa..7e8f41f9 100644
--- a/.claude/skills/issue-lifecycle/SKILL.md
+++ b/.claude/skills/issue-lifecycle/SKILL.md
@@ -53,48 +53,184 @@ systeminit/swamp#850":
 ```
 5. **Generate an implementation plan**:
+
+   First, write a single YAML file (e.g. `/tmp/plan.yaml`) containing both
+   `steps` and `potentialChallenges` as top-level keys. The CLI only supports
+   one `--input-file` flag per invocation, and the file must be a YAML object
+   (not a bare array).
+
+   ```yaml
+   # /tmp/plan.yaml
+   steps:
+     - order: 1
+       description: "Add the new schema types"
+       files:
+         - src/domain/schemas.ts
+       risks: "May need migration"
+     - order: 2
+       description: "Update the service layer"
+       files:
+         - src/domain/service.ts
+         - src/domain/service_test.ts
+   potentialChallenges:
+     - "Backwards compatibility with existing data"
+     - "Test coverage for edge cases"
+   ```
+
+   Then run the method. The `--input` flags handle scalar values, and
+   `--input-file` handles the structured data:
+
    ```
    swamp model method run issue-<id> plan \
      --input summary="..." \
      --input dddAnalysis="..." \
-     --input-file steps.json \
      --input testingStrategy="..." \
-     --input-file potentialChallenges.json --json
+     --input-file /tmp/plan.yaml --json
    ```
-6. **Show the plan to the human and ask for feedback.** Present it clearly:
-   - Summary
-   - Each step with files and risks
-   - Testing strategy
-   - Potential challenges
-
-   Then ask: "Does this plan look right, or do you have feedback?"
+6. **Show the plan to the human**, then **run adversarial review** (see below).
+
+## Adversarial Plan Review
+
+After **every** `plan` or `iterate` call, you MUST run an adversarial review
+before the human can approve. This is automatic — do not skip it.
+
+### Step 1: Challenge the plan
+
+Re-read the plan, then critically evaluate it across these dimensions:
+
+- **Architecture**: Does this follow DDD principles? Are domain boundaries
+  correct? Is this the right abstraction level? Are there better patterns?
+- **Scope**: Is this doing too much or too little? Does it match the issue? Is
+  there scope creep? Are there unnecessary changes?
+- **Risk**: Are all failure modes identified? What about edge cases, race
+  conditions, backwards compatibility? What could go wrong that isn't listed?
+- **Testing**: Is the testing strategy sufficient? Are edge cases covered? Are
+  there integration test gaps? Is there over-reliance on unit tests?
+- **Complexity**: Is this over-engineered? Could it be simpler? Are there
+  unnecessary abstractions or indirections?
+- **Correctness**: Will this actually solve the problem? Are there logical gaps?
+  Does the approach match established patterns in the codebase?
+
+### Step 2: Verify against the codebase
+
+For each plan step, **read the actual code**:
+
+- Read every file referenced in the plan steps — verify they exist
+- Confirm functions, classes, and types exist where claimed
+- Grep for related code the plan might have missed
+- Check for conflicts with existing patterns or naming conventions
+- Verify that proposed test paths and test patterns match the codebase
+- Look for code that already does what a step proposes (duplication risk)
+
+### Step 3: Record findings
+
+Write findings to a YAML file (e.g. `/tmp/findings.yaml`) as a YAML object with
+a `findings` key:
+
+```yaml
+# /tmp/findings.yaml
+findings:
+  - id: ADV-1
+    severity: high
+    category: architecture
+    description: "The plan adds a new service but doesn't define its domain boundary"
+  - id: ADV-2
+    severity: medium
+    category: testing
+    description: "No integration tests planned for the new API endpoint"
+```
+
+Then record them:
+
+```
+swamp model method run issue-<id> adversarial_review \
+  --input-file /tmp/findings.yaml --json
+```
+
+Each finding must have:
+
+- `id`: Sequential identifier (ADV-1, ADV-2, ...)
+- `severity`: critical, high, medium, or low
+- `category`: architecture, scope, risk, testing, complexity, or correctness
+- `description`: Clear explanation of the concern
+
+**Critical/high findings block approval.** Medium/low are shown as warnings.
+
+### Step 4: Present to the human
+
+Show findings grouped by severity:
+
+1. Critical findings (blockers)
+2. High findings (blockers)
+3. Medium findings (warnings)
+4. Low findings (warnings)
+
+If there are blocking findings, say:
+
+> "The adversarial review found N blocking issue(s) that need to be addressed
+> before approval. Here's what needs to change: ..."
+
+If no blocking findings, say:
+
+> "Adversarial review passed — no blocking findings. N warning(s) noted. Ready
+> for your approval when you are."
 
 ## Plan Iteration Loop
 
-When the human gives feedback (they almost always will on the first plan):
+When the human gives feedback OR adversarial findings need addressing:
+
+1. **Call iterate** with both the feedback text and your revised plan. Write the
+   revised `steps` and `potentialChallenges` to a YAML file (same format as the
+   `plan` step above), then run:
 
-1. **Call iterate** with both the feedback text and your revised plan:
    ```
    swamp model method run issue-<id> iterate \
-     --input feedback="<feedback>" \
+     --input feedback="<feedback>" \
      --input summary="..." \
-     --input-file steps.json \
-     ... --json
+     --input dddAnalysis="..." \
+     --input testingStrategy="..." \
+     --input-file /tmp/plan.yaml --json
+   ```
+
+2. **Resolve addressed findings**. Write resolutions to a YAML file:
+
+   ```yaml
+   # /tmp/resolutions.yaml
+   resolutions:
+     - findingId: ADV-1
+       resolutionNote: "Added domain boundary definition in step 2"
+     - findingId: ADV-2
+       resolutionNote: "Added integration test step covering the new endpoint"
    ```
-2. **Show what changed** between the previous and new plan version. Be specific:
+
+   ```
+   swamp model method run issue-<id> resolve_findings \
+     --input-file /tmp/resolutions.yaml --json
+   ```
+
+3. **Re-run adversarial review** on the new plan version. The review must be
+   current with the latest plan version — stale reviews block approval.
+
+4. **Show what changed** between the previous and new plan version. Be specific:
    - "Reordered steps 2 and 3 based on your feedback"
    - "Added regression analysis for module X"
    - "Removed step 4 — you're right, that's already handled by..."
 
-3. **Ask for feedback again.** Keep iterating until the human says one of:
-   - "approve", "approved", "looks good", "ship it", "go", "LGTM"
+5. **Ask for feedback again.** Keep iterating until:
+   - All critical/high adversarial findings are resolved, AND
+   - The human says: "approve", "approved", "looks good", "ship it", "go",
+     "LGTM"
 
-4. Only then call `approve`:
+6. Only then call `approve`:
    ```
    swamp model method run issue-<id> approve --json
    ```
 
+**The `approve` method will fail** if critical/high findings are unresolved or
+if the adversarial review is stale (wrong plan version). This is enforced by the
+`adversarial-review-clear` pre-flight check.
+
 ## Implementation & CI Flow
 
 After plan approval, when the human says to implement:
@@ -107,17 +243,31 @@ After plan approval, when the human says to implement:
      --input prNumber=<number> --json
    ```
-4. **Wait for CI to complete**, then fetch results:
+4. **Wait 3 minutes for CI to start.** Use `sleep 180` — CI takes at least this
+   long, so there's no point polling earlier.
+
+5. **Poll for CI results.** Call `ci_status` and check if checks are still
+   pending. If pending, wait 60 seconds and retry. Keep polling until all checks
+   have completed (passed or failed). Do NOT hand control back to the human
+   during this wait — stay in the loop.
+
    ```
    swamp model method run issue-<id> ci_status --json
    ```
-5. **Show the CI results** to the human, grouped by reviewer and severity:
+6. **Show the CI results** to the human, grouped by reviewer and severity:
    - Which checks passed/failed
    - Review comments grouped by reviewer
    - Comments sorted by severity (critical first)
-6. **When the human directs fixes**, parse their instruction:
+7. **If everything is green and approved**, the PR will auto-merge. Call
+   `complete` immediately — no need to ask the human:
+   ```
+   swamp model method run issue-<id> complete --json
+   ```
+
+8. **If there are failures or review comments**, present them and wait for the
+   human's direction. Parse their instruction:
    - "fix the CRITICAL issues from adversarial review" ->
     `targetReview: "claude-adversarial-review"`, `targetSeverity: "critical"`
    - "address all the review comments" -> no filters
@@ -130,11 +280,14 @@ After plan approval, when the human says to implement:
      --input targetSeverity="<severity>" --json
    ```
-7. **Make the fixes**, push, wait for CI, then call `ci_status` again
-8. **Loop until clean** or the human says to complete:
-   ```
-   swamp model method run issue-<id> complete --json
-   ```
+9. **After pushing fixes, loop back to step 4.** Wait 3 minutes, poll for CI,
+   show results. Repeat until clean or the human says to stop.
+
+**IMPORTANT:** Do not break out of this loop voluntarily. The human should never
+have to manually check CI or come back to ask "what happened?" — the skill stays
+in the conversation and drives through to completion. If the human wants to
+break out (e.g. "I'll come back to this later", "pause", "stop"), respect that
+immediately — but it must be their decision, never yours.
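The wait-then-poll contract in steps 4, 5, and 9 above can be sketched as a small loop. This is a hypothetical illustration, not part of the skill's diff: the `swamp` call is replaced by a mock so the control flow is runnable on its own, and the `pending` field is an assumed stand-in for however the real `ci_status --json` output reports incomplete checks.

```shell
# Sketch of the CI polling loop. In real use, replace mock_ci_status with:
#   swamp model method run issue-<id> ci_status --json
mock_ci_status() {
  # Pretend the checks finish on the third poll.
  if [ "$1" -ge 3 ]; then echo '{"pending":0}'; else echo '{"pending":2}'; fi
}

# sleep 180   # step 4: give CI time to start (omitted for the mock)
polls=0
pending=1
while [ "$pending" -ne 0 ]; do
  polls=$((polls + 1))
  out=$(mock_ci_status "$polls")
  pending=$(printf '%s' "$out" | grep -o '"pending":[0-9]*' | cut -d: -f2)
  # sleep 60  # step 5: wait between real polls (omitted for the mock)
done
echo "all checks completed after $polls polls"
```

The point of the structure is that control never returns to the human while checks are pending; the loop only exits when every check has completed or the human explicitly interrupts.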
## Reviewing Plan History

diff --git a/extensions/models/_lib/schemas.ts b/extensions/models/_lib/schemas.ts
index 0eca490a..db045df9 100644
--- a/extensions/models/_lib/schemas.ts
+++ b/extensions/models/_lib/schemas.ts
@@ -61,6 +61,8 @@ export const TRANSITIONS: Record<string, string[]> = {
   approve: ["plan_generated"],
   implement: ["approved"],
   ci_status: ["implementing"],
+  adversarial_review: ["plan_generated"],
+  resolve_findings: ["plan_generated"],
   fix: ["ci_review"],
   complete: ["ci_review"],
 };
@@ -161,6 +163,32 @@ export const CiResultsSchema = z.object({
   fetchedAt: z.string(),
 });
 
+export const AdversarialFindingSchema = z.object({
+  id: z.string().describe("Unique finding identifier, e.g. ADV-1"),
+  severity: z.enum(["critical", "high", "medium", "low"]),
+  category: z.enum([
+    "architecture",
+    "scope",
+    "risk",
+    "testing",
+    "complexity",
+    "correctness",
+  ]),
+  description: z.string(),
+  resolved: z.boolean().default(false),
+  resolutionNote: z.string().optional(),
+});
+
+export const AdversarialReviewSchema = z.object({
+  planVersion: z.number().describe(
+    "The plan version this review applies to",
+  ),
+  findings: z.array(AdversarialFindingSchema),
+  reviewedAt: z.string(),
+});
+
+export type AdversarialReviewData = z.infer<typeof AdversarialReviewSchema>;
+
 export const FixDirectiveSchema = z.object({
   round: z.number(),
   directive: z.string(),
diff --git a/extensions/models/issue_lifecycle.ts b/extensions/models/issue_lifecycle.ts
index e7bb5509..c91d4fdd 100644
--- a/extensions/models/issue_lifecycle.ts
+++ b/extensions/models/issue_lifecycle.ts
@@ -16,6 +16,9 @@ import { z } from "zod";
 import {
+  AdversarialFindingSchema,
+  type AdversarialReviewData,
+  AdversarialReviewSchema,
   CiResultsSchema,
   ClassificationSchema,
   ContextSchema,
@@ -60,9 +63,18 @@ async function readState(
 export const model = {
   type: "@si/issue-lifecycle",
-  version: "2026.03.25.2",
+  version: "2026.03.26.1",
   globalArguments: GlobalArgsSchema,
 
+  upgrades: [
+    {
+      toVersion: "2026.03.26.1",
+      description:
+        "Add adversarial review resource, methods, and approval gate — no globalArguments changes",
+      upgradeAttributes: (old: Record<string, unknown>) => old,
+    },
+  ],
+
   resources: {
     "state": {
       description: "Current lifecycle phase and metadata",
@@ -94,6 +106,12 @@ export const model = {
       lifetime: "infinite" as const,
       garbageCollection: 20,
     },
+    "adversarialReview": {
+      description: "Adversarial review findings for the current plan version",
+      schema: AdversarialReviewSchema,
+      lifetime: "infinite" as const,
+      garbageCollection: 20,
+    },
     "ciResults": {
       description: "CI check results and review comments",
       schema: CiResultsSchema,
@@ -219,6 +237,80 @@ export const model = {
       },
     },
 
+    "adversarial-review-clear": {
+      description:
+        "Ensures all critical/high adversarial findings are resolved before approval",
+      labels: ["policy"],
+      appliesTo: ["approve"],
+      execute: async (context: {
+        dataRepository: {
+          getContent: (
+            type: string,
+            modelId: string,
+            dataName: string,
+          ) => Promise<Uint8Array | null>;
+        };
+        modelType: string;
+        modelId: string;
+      }) => {
+        const planContent = await context.dataRepository.getContent(
+          context.modelType,
+          context.modelId,
+          "plan-main",
+        );
+        if (!planContent) {
+          return { pass: false, errors: ["No plan exists."] };
+        }
+        const plan = JSON.parse(
+          new TextDecoder().decode(planContent),
+        ) as PlanData;
+
+        const reviewContent = await context.dataRepository.getContent(
+          context.modelType,
+          context.modelId,
+          "adversarialReview-main",
+        );
+        if (!reviewContent) {
+          return {
+            pass: false,
+            errors: [
+              "No adversarial review exists. Run 'adversarial_review' before approving.",
+            ],
+          };
+        }
+        const review = JSON.parse(
+          new TextDecoder().decode(reviewContent),
+        ) as AdversarialReviewData;
+
+        if (review.planVersion !== plan.version) {
+          return {
+            pass: false,
+            errors: [
+              `Adversarial review is for plan v${review.planVersion} but current plan is v${plan.version}. Re-run 'adversarial_review'.`,
+            ],
+          };
+        }
+
+        const unresolved = review.findings.filter(
+          (f) =>
+            !f.resolved &&
+            (f.severity === "critical" || f.severity === "high"),
+        );
+
+        if (unresolved.length > 0) {
+          const ids = unresolved.map((f) => f.id).join(", ");
+          return {
+            pass: false,
+            errors: [
+              `${unresolved.length} unresolved critical/high finding(s): ${ids}. Resolve these before approving.`,
+            ],
+          };
+        }
+
+        return { pass: true };
+      },
+    },
+
     "tracker-available": {
       description:
         "Checks that the issue tracker backend is available and authenticated",
@@ -631,6 +723,211 @@ export const model = {
       },
     },
 
+    adversarial_review: {
+      description:
+        "Record adversarial review findings for the current plan version",
+      arguments: z.object({
+        findings: z.array(AdversarialFindingSchema),
+      }),
+      execute: async (
+        args: {
+          findings: {
+            id: string;
+            severity: "critical" | "high" | "medium" | "low";
+            category:
+              | "architecture"
+              | "scope"
+              | "risk"
+              | "testing"
+              | "complexity"
+              | "correctness";
+            description: string;
+            resolved?: boolean;
+            resolutionNote?: string;
+          }[];
+        },
+        context: {
+          globalArgs: { repo: string; issueNumber: number };
+          logger: {
+            info: (msg: string, props: Record<string, unknown>) => void;
+            warning: (msg: string, props: Record<string, unknown>) => void;
+          };
+          writeResource: (
+            specName: string,
+            instanceName: string,
+            data: Record<string, unknown>,
+          ) => Promise<{ name: string }>;
+          readResource: (
+            instanceName: string,
+            version?: number,
+          ) => Promise<Record<string, unknown> | null>;
+        },
+      ) => {
+        const { repo, issueNumber } = context.globalArgs;
+        const handles = [];
+
+        const plan = await context.readResource!("plan-main") as
+          | PlanData
+          | null;
+        const planVersion = plan ? plan.version : 0;
+
+        const findings = args.findings.map((f) => ({
+          ...f,
+          resolved: f.resolved ?? false,
+        }));
+
+        handles.push(
+          await context.writeResource(
+            "adversarialReview",
+            "adversarialReview-main",
+            {
+              planVersion,
+              findings,
+              reviewedAt: new Date().toISOString(),
+            },
+          ),
+        );
+
+        const critical = findings.filter((f) => f.severity === "critical")
+          .length;
+        const high = findings.filter((f) => f.severity === "high").length;
+        const medium = findings.filter((f) => f.severity === "medium").length;
+        const low = findings.filter((f) => f.severity === "low").length;
+
+        context.logger.info(
+          "Adversarial review for plan v{planVersion}: {critical} critical, {high} high, {medium} medium, {low} low",
+          { planVersion, critical, high, medium, low },
+        );
+
+        const findingsText = findings
+          .map((f) =>
+            `- **${f.id}** [${f.severity}/${f.category}]: ${f.description}`
+          )
+          .join("\n");
+
+        const blockers = critical + high;
+        const status = blockers > 0
+          ? `\u{1F6D1} **${blockers} blocking finding(s)** must be resolved before approval`
+          : "\u{2705} No blocking findings — ready for approval";
+
+        await tracker.postComment(
+          repo,
+          issueNumber,
+          [
+            `\u{1F50D} **Adversarial review** (plan v${planVersion})`,
+            "",
+            findingsText,
+            "",
+            status,
+          ].join("\n"),
+        );
+
+        return { dataHandles: handles };
+      },
+    },
+
+    resolve_findings: {
+      description:
+        "Mark adversarial review findings as resolved after plan revision",
+      arguments: z.object({
+        resolutions: z.array(z.object({
+          findingId: z.string().describe("Finding ID to resolve, e.g. ADV-1"),
+          resolutionNote: z.string().describe(
+            "How this finding was addressed in the revised plan",
+          ),
+        })),
+      }),
+      execute: async (
+        args: {
+          resolutions: { findingId: string; resolutionNote: string }[];
+        },
+        context: {
+          globalArgs: { repo: string; issueNumber: number };
+          logger: {
+            info: (msg: string, props: Record<string, unknown>) => void;
+            warning: (msg: string, props: Record<string, unknown>) => void;
+          };
+          writeResource: (
+            specName: string,
+            instanceName: string,
+            data: Record<string, unknown>,
+          ) => Promise<{ name: string }>;
+          readResource: (
+            instanceName: string,
+            version?: number,
+          ) => Promise<Record<string, unknown> | null>;
+        },
+      ) => {
+        const { repo, issueNumber } = context.globalArgs;
+        const handles = [];
+
+        const current = await context.readResource!(
+          "adversarialReview-main",
+        ) as AdversarialReviewData | null;
+        if (!current) {
+          throw new Error(
+            "No adversarial review exists. Run 'adversarial_review' first.",
+          );
+        }
+
+        const resolutionMap = new Map(
+          args.resolutions.map((r) => [r.findingId, r.resolutionNote]),
+        );
+
+        const updatedFindings = current.findings.map((f) => {
+          const note = resolutionMap.get(f.id);
+          if (note) {
+            return { ...f, resolved: true, resolutionNote: note };
+          }
+          return f;
+        });
+
+        handles.push(
+          await context.writeResource(
+            "adversarialReview",
+            "adversarialReview-main",
+            {
+              planVersion: current.planVersion,
+              findings: updatedFindings,
+              reviewedAt: new Date().toISOString(),
+            },
+          ),
+        );
+
+        const resolved = args.resolutions.length;
+        const remaining = updatedFindings.filter(
+          (f) =>
+            !f.resolved &&
+            (f.severity === "critical" || f.severity === "high"),
+        ).length;
+
+        context.logger.info(
+          "Resolved {resolved} finding(s), {remaining} blocking finding(s) remain",
+          { resolved, remaining },
+        );
+
+        const resolvedText = args.resolutions
+          .map((r) => `- **${r.findingId}**: ${r.resolutionNote}`)
+          .join("\n");
+
+        await tracker.postComment(
+          repo,
+          issueNumber,
+          [
+            `\u{2705} **Findings resolved** (${resolved})`,
+            "",
+            resolvedText,
+            "",
+            remaining > 0
+              ? `\u{1F6D1} ${remaining} blocking finding(s) remain`
+              : "\u{2705} All blocking findings resolved — ready for approval",
+          ].join("\n"),
+        );
+
+        return { dataHandles: handles };
+      },
+    },
+
     approve: {
       description: "Approve the current plan and post it to the issue",
       arguments: z.object({}),