Merged
203 changes: 178 additions & 25 deletions .claude/skills/issue-lifecycle/SKILL.md
@@ -53,48 +53,184 @@ systeminit/swamp#850":
```

5. **Generate an implementation plan**:

First, write a single YAML file (e.g. `/tmp/plan.yaml`) containing both
`steps` and `potentialChallenges` as top-level keys. The CLI only supports
one `--input-file` flag per invocation, and the file must be a YAML object
(not a bare array).

```yaml
# /tmp/plan.yaml
steps:
  - order: 1
    description: "Add the new schema types"
    files:
      - src/domain/schemas.ts
    risks: "May need migration"
  - order: 2
    description: "Update the service layer"
    files:
      - src/domain/service.ts
      - src/domain/service_test.ts
potentialChallenges:
  - "Backwards compatibility with existing data"
  - "Test coverage for edge cases"
```

Then run the method. The `--input` flags handle scalar values, and
`--input-file` handles the structured data:

```
swamp model method run issue-<N> plan \
--input summary="..." \
--input dddAnalysis="..." \
--input testingStrategy="..." \
--input-file /tmp/plan.yaml --json
```

6. **Show the plan to the human**, then **run adversarial review** (see below).

## Adversarial Plan Review

After **every** `plan` or `iterate` call, you MUST run an adversarial review
before the human can approve. This is automatic — do not skip it.

### Step 1: Challenge the plan

Re-read the plan, then critically evaluate it across these dimensions:

- **Architecture**: Does this follow DDD principles? Are domain boundaries
correct? Is this the right abstraction level? Are there better patterns?
- **Scope**: Is this doing too much or too little? Does it match the issue? Is
there scope creep? Are there unnecessary changes?
- **Risk**: Are all failure modes identified? What about edge cases, race
conditions, backwards compatibility? What could go wrong that isn't listed?
- **Testing**: Is the testing strategy sufficient? Are edge cases covered? Are
there integration test gaps? Is there over-reliance on unit tests?
- **Complexity**: Is this over-engineered? Could it be simpler? Are there
unnecessary abstractions or indirections?
- **Correctness**: Will this actually solve the problem? Are there logical gaps?
Does the approach match established patterns in the codebase?

### Step 2: Verify against the codebase

For each plan step, **read the actual code**:

- Read every file referenced in the plan steps — verify they exist
- Confirm functions, classes, and types exist where claimed
- Grep for related code the plan might have missed
- Check for conflicts with existing patterns or naming conventions
- Verify that proposed test paths and test patterns match the codebase
- Look for code that already does what a step proposes (duplication risk)

### Step 3: Record findings

Write findings to a YAML file (e.g. `/tmp/findings.yaml`) as a YAML object with
a `findings` key:

```yaml
# /tmp/findings.yaml
findings:
  - id: ADV-1
    severity: high
    category: architecture
    description: "The plan adds a new service but doesn't define its domain boundary"
  - id: ADV-2
    severity: medium
    category: testing
    description: "No integration tests planned for the new API endpoint"
```

Then record them:

```
swamp model method run issue-<N> adversarial_review \
--input-file /tmp/findings.yaml --json
```

Each finding must have:

- `id`: Sequential identifier (ADV-1, ADV-2, ...)
- `severity`: critical, high, medium, or low
- `category`: architecture, scope, risk, testing, complexity, or correctness
- `description`: Clear explanation of the concern

**Critical/high findings block approval.** Medium/low are shown as warnings.

### Step 4: Present to the human

Show findings grouped by severity:

1. Critical findings (blockers)
2. High findings (blockers)
3. Medium findings (warnings)
4. Low findings (warnings)

If there are blocking findings, say:

> "The adversarial review found N blocking issue(s) that need to be addressed
> before approval. Here's what needs to change: ..."

If no blocking findings, say:

> "Adversarial review passed — no blocking findings. N warning(s) noted. Ready
> for your approval when you are."
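For presentation, findings can be bucketed by severity with blockers first. A minimal TypeScript sketch, assuming a simplified finding shape (`groupBySeverity` and `isBlocking` are illustrative names, not part of the skill):

```typescript
// Order findings for display: critical and high first, then warnings.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  id: string;
  severity: Severity;
  description: string;
}

const SEVERITY_ORDER: Severity[] = ["critical", "high", "medium", "low"];

// Bucket findings by severity, preserving the display order of buckets.
function groupBySeverity(findings: Finding[]): Map<Severity, Finding[]> {
  const groups = new Map<Severity, Finding[]>();
  for (const sev of SEVERITY_ORDER) groups.set(sev, []);
  for (const f of findings) groups.get(f.severity)!.push(f);
  return groups;
}

// Critical and high findings block approval; medium and low are warnings.
function isBlocking(f: Finding): boolean {
  return f.severity === "critical" || f.severity === "high";
}
```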

## Plan Iteration Loop

When the human gives feedback OR adversarial findings need addressing:

1. **Call iterate** with both the feedback text and your revised plan. Write the
   revised `steps` and `potentialChallenges` to a YAML file (same format as the
   `plan` step above), then run:

```
swamp model method run issue-<N> iterate \
--input feedback="<human's feedback or adversarial findings>" \
--input summary="..." \
--input dddAnalysis="..." \
--input testingStrategy="..." \
--input-file /tmp/plan.yaml --json
```

2. **Resolve addressed findings**. Write resolutions to a YAML file:

```yaml
# /tmp/resolutions.yaml
resolutions:
  - findingId: ADV-1
    resolutionNote: "Added domain boundary definition in step 2"
  - findingId: ADV-2
    resolutionNote: "Added integration test step covering the new endpoint"
```

```
swamp model method run issue-<N> resolve_findings \
--input-file /tmp/resolutions.yaml --json
```

3. **Re-run adversarial review** on the new plan version. The review must be
current with the latest plan version — stale reviews block approval.

4. **Show what changed** between the previous and new plan version. Be specific:
   - "Reordered steps 2 and 3 based on your feedback"
   - "Added regression analysis for module X"
   - "Removed step 4 — you're right, that's already handled by..."

5. **Ask for feedback again.** Keep iterating until:
   - All critical/high adversarial findings are resolved, AND
   - The human says: "approve", "approved", "looks good", "ship it", "go",
     "LGTM"

6. Only then call `approve`:
```
swamp model method run issue-<N> approve --json
```

**The `approve` method will fail** if critical/high findings are unresolved or
if the adversarial review is stale (wrong plan version). This is enforced by the
`adversarial-review-clear` pre-flight check.
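The gate that `approve` enforces can be sketched as pure logic. This is an illustrative TypeScript model, not the actual pre-flight implementation; the field names follow the finding fields documented above:

```typescript
// Sketch of the `adversarial-review-clear` pre-flight check: approval is
// allowed only when the review covers the current plan version and every
// critical/high finding is resolved.
interface Finding {
  id: string;
  severity: "critical" | "high" | "medium" | "low";
  resolved: boolean;
}

interface Review {
  planVersion: number;
  findings: Finding[];
}

// Returns human-readable blockers; an empty array means approval may proceed.
function approvalBlockers(review: Review, currentPlanVersion: number): string[] {
  const blockers: string[] = [];
  if (review.planVersion !== currentPlanVersion) {
    blockers.push(
      `stale review: plan is v${currentPlanVersion}, review covers v${review.planVersion}`,
    );
  }
  for (const f of review.findings) {
    if ((f.severity === "critical" || f.severity === "high") && !f.resolved) {
      blockers.push(`unresolved ${f.severity} finding ${f.id}`);
    }
  }
  return blockers;
}
```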

## Implementation & CI Flow

After plan approval, when the human says to implement:
@@ -107,17 +243,31 @@ After plan approval, when the human says to implement:
--input prNumber=<N> --json
```

4. **Wait 3 minutes for CI to start.** Use `sleep 180` — CI takes at least this
long, so there's no point polling earlier.

5. **Poll for CI results.** Call `ci_status` and check if checks are still
pending. If pending, wait 60 seconds and retry. Keep polling until all checks
have completed (passed or failed). Do NOT hand control back to the human
during this wait — stay in the loop.

```
swamp model method run issue-<N> ci_status --json
```
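The wait-then-poll decision reduces to a single rule: retry while any check is still pending, report once all have completed. A TypeScript sketch under that simplification (the real `ci_status` output is richer than this shape):

```typescript
// Decide the next action in the CI polling loop from check states alone.
type CheckState = "pending" | "passed" | "failed";

function nextPollAction(checks: CheckState[]): "wait-and-retry" | "report" {
  // Keep polling while anything is still running; report once every check
  // has completed, whether it passed or failed.
  return checks.some((c) => c === "pending") ? "wait-and-retry" : "report";
}
```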

6. **Show the CI results** to the human, grouped by reviewer and severity:
   - Which checks passed/failed
   - Review comments grouped by reviewer
   - Comments sorted by severity (critical first)

7. **If everything is green and approved**, the PR will auto-merge. Call
`complete` immediately — no need to ask the human:
```
swamp model method run issue-<N> complete --json
```

8. **If there are failures or review comments**, present them and wait for the
human's direction. Parse their instruction:
   - "fix the CRITICAL issues from adversarial review" ->
     `targetReview: "claude-adversarial-review"`, `targetSeverity: "critical"`
   - "address all the review comments" -> no filters
@@ -130,11 +280,14 @@ After plan approval, when the human says to implement:
--input targetSeverity="<severity>" --json
```
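The mapping from human instruction to fix filters can be sketched as a small parser. The substring heuristics below are illustrative assumptions, not the skill's actual parsing logic:

```typescript
// Derive optional fix filters from a free-form human instruction.
interface FixTarget {
  targetReview?: string;
  targetSeverity?: string;
}

function parseFixDirective(text: string): FixTarget {
  const target: FixTarget = {};
  const lower = text.toLowerCase();
  // Take the most severe level mentioned in the instruction, if any.
  for (const sev of ["critical", "high", "medium", "low"]) {
    if (lower.includes(sev)) {
      target.targetSeverity = sev;
      break;
    }
  }
  // "adversarial" scopes the fix to the adversarial review's comments.
  if (lower.includes("adversarial")) {
    target.targetReview = "claude-adversarial-review";
  }
  return target;
}
```

An instruction like "address all the review comments" mentions neither a severity nor a reviewer, so it yields no filters.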

9. **After pushing fixes, loop back to step 4.** Wait 3 minutes, poll for CI,
show results. Repeat until clean or the human says to stop.

**IMPORTANT:** Do not break out of this loop voluntarily. The human should never
have to manually check CI or come back to ask "what happened?" — the skill stays
in the conversation and drives through to completion. If the human wants to
break out (e.g. "I'll come back to this later", "pause", "stop"), respect that
immediately — but it must be their decision, never yours.

## Reviewing Plan History

28 changes: 28 additions & 0 deletions extensions/models/_lib/schemas.ts
@@ -61,6 +61,8 @@ export const TRANSITIONS: Record<string, Phase[]> = {
  approve: ["plan_generated"],
  implement: ["approved"],
  ci_status: ["implementing"],
  adversarial_review: ["plan_generated"],
  resolve_findings: ["plan_generated"],
  fix: ["ci_review"],
  complete: ["ci_review"],
};
@@ -161,6 +163,32 @@ export const CiResultsSchema = z.object({
  fetchedAt: z.string(),
});

export const AdversarialFindingSchema = z.object({
  id: z.string().describe("Unique finding identifier, e.g. ADV-1"),
  severity: z.enum(["critical", "high", "medium", "low"]),
  category: z.enum([
    "architecture",
    "scope",
    "risk",
    "testing",
    "complexity",
    "correctness",
  ]),
  description: z.string(),
  resolved: z.boolean().default(false),
  resolutionNote: z.string().optional(),
});

export const AdversarialReviewSchema = z.object({
  planVersion: z.number().describe(
    "The plan version this review applies to",
  ),
  findings: z.array(AdversarialFindingSchema),
  reviewedAt: z.string(),
});

export type AdversarialReviewData = z.infer<typeof AdversarialReviewSchema>;

export const FixDirectiveSchema = z.object({
  round: z.number(),
  directive: z.string(),