113 changes: 81 additions & 32 deletions ship/SKILL.md
@@ -1561,19 +1561,43 @@ For each item, note:
- The item text (verbatim or concise summary)
- Its category: CODE | TEST | MIGRATION | CONFIG | DOCS

### Verification Mode

Before judging completion, classify HOW each item can be verified. The diff alone cannot prove every kind of work. Items outside the current repo or system are structurally invisible to `git diff`.

- **DIFF-VERIFIABLE** — A code change in this repo would manifest in `git diff <base>...HEAD`. Examples: "add UserService" (file appears), "validate input X" (validation logic appears), "create users table" (migration file appears).
- **CROSS-REPO** — Item names a file or change in a sibling repo (e.g., `domain-hq/docs/dashboard.md`, `~/Development/<other-repo>/...`). The current diff CANNOT prove this.
- **EXTERNAL-STATE** — Item names state in an external system: Supabase config/RLS, Cloudflare DNS, Vercel env vars, OAuth provider allowlists, third-party SaaS, DNS records. The current diff CANNOT prove this.
- **CONTENT-SHAPE** — Item requires a file to follow a specific convention. If the file is in this repo: diff-verifiable. If in another repo or system: see CROSS-REPO / EXTERNAL-STATE.

**Verification dispatch:**

- **DIFF-VERIFIABLE** → cross-reference against diff (next section).
- **CROSS-REPO** → if the sibling repo is reachable on disk (try `~/Development/<repo>/`, `~/code/<repo>/`, the parent of the current repo), run `[ -f <path> ]` to check file existence. File exists → DONE (cite path). File missing → NOT DONE (cite path). Path unreachable → UNVERIFIABLE (cite what needs manual check).
- **EXTERNAL-STATE** → UNVERIFIABLE. Cite the system and the specific check the user must perform.
- **CONTENT-SHAPE in another repo** → if the file exists, run any project-detected validator (see "Validator detection" below) before falling back to UNVERIFIABLE. With a validator: pass → DONE; fail → NOT DONE (cite validator output). No validator available: classify UNVERIFIABLE and cite both the file path and the convention to confirm.

**Path concreteness rule.** If a plan item names a *concrete filesystem path* (absolute, `~/...`, or `<sibling-repo>/<file>`), it MUST be classified DONE or NOT DONE based on `[ -f <path> ]`. UNVERIFIABLE is only valid when the path is genuinely abstract ("Cloudflare DNS", "Supabase allowlist") or the sibling root is unreachable on this machine. "I don't want to check" is not unreachable.

**Validator detection.** Before falling back to UNVERIFIABLE on a CONTENT-SHAPE item, scan the target repo's `package.json` for any script matching `validate-*`, `lint-wiki`, `check-docs`, or similar. If found, invoke it with the relevant path argument (e.g., `npm run validate-wiki -- <path>`). For multi-target validators (e.g., `validate-wiki --all`), run once and reconcile per-item from the output. A passing validator promotes the item from UNVERIFIABLE to DONE; a failing one demotes to NOT DONE.

**Honesty rule.** Do NOT classify an item as DONE just because related code shipped. Code that *handles* a deliverable is not the deliverable. Shipping a markdown-extraction library is not the same as shipping the markdown file. When in doubt between DONE and UNVERIFIABLE, prefer UNVERIFIABLE — better to surface a confirmation prompt than silently miss a deliverable.

### Cross-Reference Against Diff

Run `git diff origin/<base>...HEAD` and `git log origin/<base>..HEAD --oneline` to understand what was implemented.

For each extracted plan item, run the verification dispatch from the previous section, then classify:

- **DONE** — Clear evidence the item shipped. For DIFF-VERIFIABLE items: cite the specific file(s) changed in the diff. For CROSS-REPO items with a reachable sibling repo: cite the path that exists.
- **PARTIAL** — Some work toward this item exists but is incomplete (e.g., model created but controller missing, function exists but edge cases not handled).
- **NOT DONE** — Verification ran and produced negative evidence (file missing, code absent in diff, sibling-repo file confirmed absent).
- **CHANGED** — The item was implemented using a different approach than the plan described, but the same goal is achieved. Note the difference.
- **UNVERIFIABLE** — The diff and any reachable sibling-repo checks cannot prove or disprove this. Always applies to EXTERNAL-STATE items and to CROSS-REPO items where the sibling repo isn't reachable. Cite the specific manual verification the user must perform (e.g., "check Cloudflare DNS shows DNS-only mode for dashboard.example.com", "confirm /docs/dashboard.md exists in domain-hq repo").

**Be conservative with DONE** — require clear evidence. A file being touched is not enough; the specific functionality described must be present.
**Be generous with CHANGED** — if the goal is met by different means, that counts as addressed.
**Be honest with UNVERIFIABLE** — better to surface 5 items the user must manually confirm than silently classify them DONE.

### Output Format

@@ -1583,56 +1607,81 @@ PLAN COMPLETION AUDIT
Plan: {plan file path}

## Implementation Items
[DONE] Create UserService — src/services/user_service.rb (+142 lines)
[PARTIAL] Add validation — model validates but missing controller checks
[NOT DONE] Add caching layer — no cache-related changes in diff
[CHANGED] "Redis queue" → implemented with Sidekiq instead

## Test Items
[DONE] Unit tests for UserService — test/services/user_service_test.rb
[NOT DONE] E2E test for signup flow

## Migration Items
[DONE] Create users table — db/migrate/20240315_create_users.rb

## Cross-Repo / External Items
[DONE] sibling-repo has /docs/dashboard.md — verified at ~/Development/sibling-repo/docs/dashboard.md
[UNVERIFIABLE] Cloudflare DNS-only on api.example.com — external system, manual check required
[UNVERIFIABLE] Supabase auth allowlist contains user email — external system, confirm in Supabase dashboard

─────────────────────────────────
COMPLETION: 4/10 DONE, 1 PARTIAL, 2 NOT DONE, 1 CHANGED, 2 UNVERIFIABLE
─────────────────────────────────
```

### Gate Logic

After producing the completion checklist, evaluate in priority order:

1. **Any NOT DONE items** (highest priority — known missing work). Use AskUserQuestion:
- Show the completion checklist above
- "{N} items from the plan are NOT DONE. These were part of the original plan but are missing from the implementation."
- RECOMMENDATION: depends on item count and severity. If 1-2 minor items (docs, config), recommend B. If core functionality is missing, recommend A.
- Options:
A) Stop — implement the missing items before shipping
B) Ship anyway — defer these to a follow-up (will create P1 TODOs in Step 5.5)
C) These items were intentionally dropped — remove from scope
- If A: STOP. List the missing items for the user to implement.
- If B: Continue. For each NOT DONE item, create a P1 TODO in Step 5.5 with "Deferred from plan: {plan file path}".
- If C: Continue. Note in PR body: "Plan items intentionally dropped: {list}."

2. **Any UNVERIFIABLE items** (silent gaps — the diff cannot prove them either way). Only fires after NOT DONE is resolved or absent.

**Per-item confirmation is mandatory.** Do NOT use a single AskUserQuestion to blanket-confirm all UNVERIFIABLE items. Blanket confirmation is the failure mode that surfaced in VAS-449 (user clicks A without opening any file). Instead:

- Loop through UNVERIFIABLE items one at a time.
- For each item, use AskUserQuestion with the item's *specific* manual check (e.g., "Confirm: does `~/Development/domain-hq/docs/dashboard.md` exist?", not "Have you checked all items?").
- Options per item:
Y) Confirmed done — cite what you verified (free-text, embedded in PR body)
N) Not done — block ship; treat as NOT DONE and re-enter the priority-1 gate
D) Intentionally dropped — note in PR body: "Plan item intentionally dropped: {item}"
- RECOMMENDATION per item: Y if the item is concrete and easily verified; N if it's critical-path (auth, DNS, deliverables to other repos) and the user shows hesitation.

**Exit conditions:**
- Any N: STOP. Surface the missing items, suggest re-running /ship after they're addressed.
- All Y or D: Continue. Embed `## Plan Completion — Manual Verifications` section in PR body listing each Y'd item with the user's free-text evidence and each D'd item with "intentionally dropped".

**Cap.** If there are more than 5 UNVERIFIABLE items, present them as a numbered list first and ask whether the user wants to (1) confirm each individually, (2) stop and reduce scope, or (3) explicitly accept blanket-confirmation with the warning that this is the VAS-449 failure shape. Default and recommended option is (1).

3. **Only PARTIAL items (no NOT DONE, no UNVERIFIABLE):** Continue with a note in the PR body. Not blocking.

4. **All DONE or CHANGED:** Pass. "Plan completion: PASS — all items addressed." Continue.
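The priority order above can be sketched as a single dispatch function (the type and function names are illustrative, not part of the skill contract):

```typescript
type ItemStatus = "DONE" | "PARTIAL" | "NOT DONE" | "CHANGED" | "UNVERIFIABLE";

interface PlanItem {
  text: string;
  status: ItemStatus;
}

// Return the first gate that fires, in the priority order above:
// NOT DONE beats UNVERIFIABLE beats PARTIAL; otherwise pass.
function firstGate(items: PlanItem[]): "NOT_DONE" | "UNVERIFIABLE" | "PARTIAL" | "PASS" {
  if (items.some((i) => i.status === "NOT DONE")) return "NOT_DONE";
  if (items.some((i) => i.status === "UNVERIFIABLE")) return "UNVERIFIABLE";
  if (items.some((i) => i.status === "PARTIAL")) return "PARTIAL";
  return "PASS"; // everything is DONE or CHANGED
}
```

Note that an UNVERIFIABLE item answered N re-enters as NOT DONE, so the dispatch is re-run after each gate resolves.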

**No plan file found:** Skip entirely. "No plan file detected — skipping plan completion audit."

**Include in PR body (Step 8):** Add a `## Plan Completion` section with the checklist summary.
>
> After your analysis, output a single JSON object on the LAST LINE of your response (no other text after it):
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"summary":"<markdown checklist for PR body>"}`
> `{"total_items":N,"done":N,"changed":N,"deferred":N,"unverifiable":N,"summary":"<markdown checklist for PR body>"}`

**Parent processing:**

1. Parse the LAST line of the subagent's output as JSON.
2. Store `done`, `deferred`, `unverifiable` for Step 20 metrics; use `summary` in PR body.
3. If `deferred > 0` or `unverifiable > 0` and no user override, present the items via the appropriate AskUserQuestion (see Gate Logic priority order above) before continuing.
4. Embed `summary` in PR body's `## Plan Completion` section (Step 19). If `unverifiable > 0` and every item was confirmed (Y) or dropped (D) in the UNVERIFIABLE gate, also embed `## Plan Completion — Manual Verifications` listing each user-confirmed item.
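Step 1's last-line parse can be sketched as follows (field names follow the JSON contract above; the function name and shape check are illustrative):

```typescript
interface AuditSummary {
  total_items: number;
  done: number;
  changed: number;
  deferred: number;
  unverifiable: number;
  summary: string;
}

// Parse the LAST line of the subagent's output as JSON.
// Returning null signals the parent to run the inline fallback.
function parseAuditOutput(output: string): AuditSummary | null {
  const lines = output.trim().split("\n");
  const last = lines[lines.length - 1];
  try {
    const parsed = JSON.parse(last);
    // Minimal shape check before trusting the numbers.
    return typeof parsed === "object" && parsed !== null && typeof parsed.total_items === "number"
      ? (parsed as AuditSummary)
      : null;
  } catch {
    return null;
  }
}
```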

**If the subagent fails or returns invalid JSON:** Fall back to running the audit inline (parent processes the same plan-extraction + classification logic). If the inline fallback also fails (e.g., plan file unreadable, parser error), do NOT silently pass — surface the failure as an explicit AskUserQuestion: "Plan Completion audit could not run ({reason}). Options: (A) Skip audit and ship anyway — record that the audit was skipped in PR body and Step 20 metrics; (B) Stop and fix the audit." Default and recommended option is (B). Silent fail-open is the failure shape that VAS-449 surfaced.

---

38 changes: 38 additions & 0 deletions test/ship-plan-completion-invariants.test.ts
@@ -0,0 +1,38 @@
import { describe, test, expect } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';

const SHIP_SKILL = path.join(__dirname, '..', 'ship', 'SKILL.md');

describe('ship/SKILL.md — Plan Completion gate invariants (VAS-449 remediation)', () => {
const skill = fs.readFileSync(SHIP_SKILL, 'utf8');

test('Path concreteness rule: filesystem-pathed items must be test -f checked', () => {
expect(skill).toContain('**Path concreteness rule.**');
expect(skill).toMatch(/concrete filesystem path/);
expect(skill).toMatch(/MUST be classified DONE or NOT DONE based on `\[ -f/);
});

test('Validator detection: project package.json validate-* scripts are auto-run', () => {
expect(skill).toContain('**Validator detection.**');
expect(skill).toMatch(/package\.json/);
expect(skill).toMatch(/validate-\*/);
});

test('Per-item UNVERIFIABLE confirmation: blanket-confirm is forbidden', () => {
expect(skill).toContain('**Per-item confirmation is mandatory.**');
expect(skill).toMatch(/Do NOT use a single AskUserQuestion to blanket-confirm/);
expect(skill).toMatch(/VAS-449/);
});

test('Subagent failure: fail-closed, not silent fail-open', () => {
expect(skill).not.toMatch(/Never block \/ship on subagent failure\.\s*$/m);
expect(skill).toMatch(/Silent fail-open is the failure shape that VAS-449 surfaced/);
expect(skill).toMatch(/Stop and fix the audit/);
});

test('CONTENT-SHAPE dispatch invokes validator before falling back to UNVERIFIABLE', () => {
expect(skill).toMatch(/CONTENT-SHAPE in another repo.*validator/s);
expect(skill).toMatch(/passing validator promotes the item from UNVERIFIABLE to DONE/);
});
});