From 93cc2d1b78d0033860afe1cecc98d1611c45483a Mon Sep 17 00:00:00 2001 From: CL Kao Date: Fri, 5 Jun 2026 14:06:41 -0700 Subject: [PATCH 1/2] test: net-remove the AC-3 sweep reader-axis taint machinery The reader-axis taint-FLOW layer (instructionTaintedNames/Fields, readsTaintedField, lvalueName, isStringyType, the transitive reader fixpoint, and the name-taint branches of exprInstructionTainted) plus the reader-axis planted-control tests (TestSweepDetectsEvasionShapes / TestHostneutralitySweepDetectsEvasionShapes and the assertRedThenGreen helpers, the multi-hop fixpoint cases) are removed. The sweep keeps the match-axis core: the named-reader allowlist, the recognized path-ident allowlist, the direct-read predicate (a read sink whose path arg carries a recognized instruction literal/segment/var, or a WalkDir .md collector) and the mutation control. A per-package go/ast scan structurally cannot see a cross-package read or a path built in another file, so the reader axis is documented as detached-adversarial-audit-backstopped (sweep docstrings + the validation-stage policy), not statically guarded. Per-file net-negative: HN 731->407 (-324), integration 794->430 (-364). Offline go test ./... green (1164 passed); workflow contract VALID. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/dev/README.md | 1 + internal/hostneutrality/nonac_marker_test.go | 478 +++--------------- skills/integration/nonac_marker_test.go | 506 +++---------------- 3 files changed, 149 insertions(+), 836 deletions(-) diff --git a/docs/dev/README.md b/docs/dev/README.md index b9bb93b5..1a6ef338 100644 --- a/docs/dev/README.md +++ b/docs/dev/README.md @@ -120,6 +120,7 @@ A task moves to validation after implementation is complete. The work here is to - **What it produces:** the auditor works on a separate throwaway checkout (never the implementation worktree) and never mutates the deliverable. It tries to REFUTE the validation — construct an adversarial edit that the deliverable's own tests should catch, and confirm they do. A test that stays green under an edit that breaks the claim is a hole. Findings come in two tiers — `Material:` (a real correctness or test-strength hole, e.g. an assertion that green-lights a regression) and `Polish:` (non-blocking). "Refuted nothing material" is itself a valid, recorded outcome. - **How it is recorded:** material findings route back through the normal validation→implementation feedback flow (a `### Feedback Cycles` entry naming the audit and its adversarial edit); the gate is not presented as clean until they are closed. A clean audit is noted in the gate's reviewer-findings block (or a one-line "detached audit: no material findings"). - **Why:** the audit catches the class of hole where the test passes but would also pass on a broken future edit — which validation, trusting its own green suite, cannot see. Real catches on the record: #262 (`binary-absent-fo-bootstrap`) — validation passed correct prose, the audit then found two test-strength holes in `contract_gate_test.go` (a `strings.Count(...) > 0` check that skipped on zero mentions, and a bare `strings.Contains` satisfied by a negated disclaimer); `1x` (`code-cleanups-0193`) AC-6 and `external-tracker-checkpoint` AC-6 (a self-referential "verified by review of this entity's own section" that can never fail); `7h` (`release-notes-local-summary`) AC-3 (validation passed, the audit found the tag-cut folded the notes block into the tag subject instead of the body). This is read-only refutation, not a second implementation pass. + - **AC-3 sweep reader axis — this audit IS the backstop:** the offline AC-3 sweeps (`TestNoUndeclaredTautologicalProof`, `TestNoUndeclaredHostneutralityTautology`) statically guard the MATCH axis — once a read of a recognized instruction file is detected, the test must self-classify (`markNonAC`/`markCodeBoundInvariant`) regardless of the inspection idiom. They deliberately do NOT guard the READER axis — whether an undeclared instruction-file read hides via an undiscovered read SHAPE (a path built across statements/params/fields, a transitive helper chain, a range-element flow, a cross-package reader, a package var in another file, or an unrecognized surface like `AGENTS.md`/`mods/*.md`). A per-package `go/ast` scan structurally cannot see a cross-package read or a path built elsewhere, so chasing reader-shape completeness is the enumeration trap the proof policy itself warns against. The reader axis is covered by THIS detached adversarial audit at the high-stakes gate, not by the static sweep — a reader-shape evasion is a documented, audited boundary, not a silent gap. ### `done` diff --git a/internal/hostneutrality/nonac_marker_test.go b/internal/hostneutrality/nonac_marker_test.go index 4b271e21..40416722 100644 --- a/internal/hostneutrality/nonac_marker_test.go +++ b/internal/hostneutrality/nonac_marker_test.go @@ -36,13 +36,13 @@ func markCodeBoundInvariant(t *testing.T, source string) { } } -// instructionFileReaders are the helpers that read a markdown instruction file the -// model ingests (a skill or contract) — the seed of the reader set the sweep grows -// to a fixpoint. A test that calls one is reading an ingested file (the READ alone -// triggers the must-declare rule; how it then inspects the bytes is irrelevant). -// Tests that scan CODE (host_neutrality_test.go's scanFile over .go files via -// parser.ParseFile, not a content read sink) are NOT in this set, so the sweep does -// not flag the legitimate go/parser code invariants. +// instructionFileReaders are the named helpers that read a markdown instruction file +// the model ingests (a skill or contract) — the allowlist of recognized readers. A +// test that calls one is reading an ingested file (the READ alone triggers the +// must-declare rule; how it then inspects the bytes is irrelevant). Tests that scan +// CODE (host_neutrality_test.go's scanFile over .go files via parser.ParseFile, not a +// content read sink) are NOT in this set, so the sweep does not flag the legitimate +// go/parser code invariants. var instructionFileReaders = map[string]bool{ "readSkill": true, "readText": true, @@ -54,13 +54,13 @@ var instructionFileReaders = map[string]bool{ } // instructionPathIdents are the package-level path variables that resolve to a -// markdown instruction file. A test that reads one of these via a read sink -// (os.ReadFile/os.Open/io.ReadAll/bufio) is reading an ingested file — the read -// triggers the must-declare rule regardless of how the bytes are then inspected. -// (Code-scanning tests reference ../dispatch, ../status package dirs, never these.) -// Path vars defined in ANOTHER file of this package are an out-of-scope flow (M-C, -// tracked in sweep-guard-reader-axis-invert) only insofar as the var name is not in -// this list; the listed names are recognized wherever read. +// markdown instruction file — a declared allowlist of recognized path-carrying vars. +// A test that reads one of these via a read sink (os.ReadFile/os.Open/io.ReadAll/ +// bufio) is reading an ingested file — the read triggers the must-declare rule +// regardless of how the bytes are then inspected. (Code-scanning tests reference +// ../dispatch, ../status package dirs, never these.) A path var NOT in this list is +// not recognized; whether such a read should declare is an instance of the +// reader-axis bound the detached adversarial audit backstops (see the sweep doc). var instructionPathIdents = map[string]bool{ "foCorePath": true, "ensignCorePath": true, @@ -78,37 +78,39 @@ var instructionPathIdents = map[string]bool{ // TestNoUndeclaredHostneutralityTautology is the AC-3 sweep for this package, // re-runnable offline. It parses every *_test.go and flags any test function that // READS a recognized markdown INSTRUCTION file's content — via a named reader -// helper, a tainted os.ReadFile/os.Open/io read, or a WalkDir-collected `.md`, -// through the reader-axis flow shapes the taint covers (below) — UNLESS it -// self-classifies via markNonAC or markCodeBoundInvariant. The go/parser code-scan -// invariants (host_neutrality_test.go's scanFile over .go source via parser.ParseFile, -// NOT a content read sink) and the spanHostQualified unit test are NOT flagged: they -// read no instruction file. The undeclared-offender count is the AC-3 metric; it -// must be zero. +// helper, a DIRECT os.ReadFile/os.Open/io read of a recognized instruction +// literal/segment/path-ident, or a WalkDir-collected `.md` — UNLESS it self-classifies +// via markNonAC or markCodeBoundInvariant. The go/parser code-scan invariants +// (host_neutrality_test.go's scanFile over .go source via parser.ParseFile, NOT a +// content read sink) and the spanHostQualified unit test are NOT flagged: they read no +// instruction file. The undeclared-offender count is the AC-3 metric; it must be zero. // -// What the guard actually guarantees (two axes — one closed, one bounded): +// What the guard guarantees, and what it deliberately does NOT (two axes): // -// - MATCH axis (closed, universal, load-bearing): the sweep keys on the READ, not -// on how the bytes are inspected. ONCE a read of a recognized instruction file -// is detected, the test MUST declare regardless of the inspection idiom -// (strings.Contains/Index/EqualFold, bytes.*, regexp.Regexp.Match, a bare ==) — -// the trigger is the ingest, not the match, so the whole match class is closed. +// - MATCH axis (closed, universal, load-bearing — STATICALLY GUARDED): the sweep +// keys on the READ, not on how the bytes are inspected. ONCE a read of a +// recognized instruction file is detected, the test MUST declare regardless of +// the inspection idiom (strings.Contains/Index/EqualFold, bytes.*, +// regexp.Regexp.Match, a bare ==) — the trigger is the ingest, not the match, so +// the whole match class is closed. // -// - READER axis (covered flow shapes, NOT exhaustive): a read is detected for an -// in-package read of a RECOGNIZED instruction path (a skill-tree/contract -// segment or an instructionPathIdent package var) reaching a read sink through a -// bare-`string` param, a `:=`/`=` local, a struct field, a method receiver, or a -// closure capture; path built by `+` / strings.Join / filepath.Join / -// fmt.Sprintf; transitive helper chains followed to a fixpoint. -// -// KNOWN OUT-OF-SCOPE (tracked in sweep-guard-reader-axis-invert, id -// 4qnn7dbzkyh9qv65t618vtxy, backstopped by the detached adversarial audit before -// merge — NOT silently dropped): `[]string`/`...string`-param + range/slice-element -// flow (M-D), cross-package reads (M-B), a path in a package var defined in another -// file (M-C), and unrecognized surfaces such as AGENTS.md / mods/*.md (M-A). These -// are the recurring enumerated-shape reader-flow class cycles 1-3 closed instances -// of; the follow-up weighs an invert/positive predicate and a go/types+SSA taint -// that closes the class definitionally. +// - READER axis (does an undeclared instruction-file read hide via an undiscovered +// read SHAPE? — DETACHED-AUDIT-BACKSTOPPED, NOT statically guarded): a read is +// detected only when it is DIRECT — an in-package read sink whose path arg subtree +// carries a recognized instruction literal/segment or an instructionPathIdent +// package var, a named instructionFileReaders helper, or a WalkDir `.md` +// collector. A read reached through any other shape — param/local/field/method/ +// closure taint flow, a transitive helper chain, a `[]string`/range-element flow, +// a cross-package reader, a package var defined in another file, or an +// unrecognized surface (AGENTS.md, mods/*.md) — is NOT statically flagged. This is +// a deliberate, documented boundary: a per-package go/ast scan structurally cannot +// see a cross-package read or a path built in another file, so chasing reader-shape +// completeness was the enumeration trap the proof policy itself warns against. The +// reader axis is covered by the detached adversarial audit required at every +// high-stakes-surface gate (the validation-stage policy), NOT by this sweep. The +// prior cycles' M-A/B/C/D shapes AND the two declared range-var/cross-statement HN +// reads (TestDevDisciplinesSurviveInDevHomes, TestLiveScenarioRecommendedPractice- +// Present — both carry markNonAC) are this audited boundary, not silent gaps. func TestNoUndeclaredHostneutralityTautology(t *testing.T) { offenders := sweepHostneutralityTautologies(t, ".") for _, o := range offenders { @@ -139,51 +141,30 @@ func sweepHostneutralityTautologies(t *testing.T, dir string) []string { files = append(files, f) } - // First pass: discover this package's instruction-file reader helpers, then grow - // the set to a fixpoint so a read cannot hide behind a helper chain. Seeded with - // the named readers; a func is ALSO a reader if it ingests instruction content - // directly (readsInstructionContent — a tainted ReadFile/Open/io read, or a - // WalkDir-collected `.md`) OR (transitive) it calls a known reader. Methods are - // NOT skipped: a reader can be a method on a fixture struct (the s.path / - // method-receiver flow). The code-scan helper scanFile is NOT a reader: it uses - // parser.ParseFile (not a content read sink) over a `../dispatch` path (no - // instruction taint). - taintedFields := instructionTaintedFields(files) + // First pass: the recognized reader set is the named instructionFileReaders + // allowlist plus any non-test helper that DIRECTLY reads a recognized instruction + // file (readsInstructionContent — a read sink whose path arg carries a recognized + // instruction literal/segment/path-ident, or a WalkDir-collected `.md`). The + // code-scan helper scanFile is NOT a reader: it uses parser.ParseFile (not a + // content read sink) over a `../dispatch` path (no instruction literal). readers := map[string]bool{} for r := range instructionFileReaders { readers[r] = true } - helperCalls := map[string]map[string]bool{} for _, f := range files { for _, decl := range f.Decls { fn, ok := decl.(*ast.FuncDecl) if !ok || strings.HasPrefix(fn.Name.Name, "Test") { continue } - helperCalls[fn.Name.Name] = collectCalls(fn) - if readsInstructionContent(fn, taintedFields) { + if readsInstructionContent(fn) { readers[fn.Name.Name] = true } } } - for grew := true; grew; { - grew = false - for name, calls := range helperCalls { - if readers[name] { - continue - } - for r := range readers { - if calls[r] { - readers[name] = true - grew = true - break - } - } - } - } // Second pass: a test is an offender if it ingests instruction-file content — - // directly (readsInstructionContent) or via a discovered reader helper — and does + // directly (readsInstructionContent) or via a recognized reader helper — and does // NOT declare its proof standing. The sweep keys on the READ, not a match-func // allowlist. var offenders []string @@ -194,7 +175,7 @@ func sweepHostneutralityTautologies(t *testing.T, dir string) []string { continue } calls := collectCalls(fn) - readsInstruction := readsInstructionContent(fn, taintedFields) + readsInstruction := readsInstructionContent(fn) for r := range readers { if calls[r] { readsInstruction = true @@ -238,23 +219,15 @@ var readSinks = map[string]bool{ "NewReader": true, // bufio.NewReader } -// readsInstructionContent reports whether fn ingests a recognized instruction -// file's content through the reader-axis flow shapes the taint COVERS — the -// positive/taint replacement for the Cycle-1/2 allow-lists, covering a bounded set -// of flows, not an exhaustive one. It taints a string derived from a recognized -// instruction-file path (a skill-tree/contract segment via isInstructionPathLiteral, -// or an instructionPathIdent package var) reaching a read sink (ReadFile/Open/ -// ReadAll/bufio) through a bare-`string` param, a `:=`/`=` local, a package-wide -// struct field, or a method receiver; path built by +/strings.Join/filepath.Join/ -// fmt.Sprintf; plus a WalkDir-collected instruction `.md`. -// -// NOT covered (tracked in sweep-guard-reader-axis-invert, id -// 4qnn7dbzkyh9qv65t618vtxy, audit-backstopped): `[]string`/`...string`-param + -// range/slice-element flow (M-D), cross-package reader helpers (M-B), a package var -// defined in another file (M-C), unrecognized surfaces like AGENTS.md / mods/*.md -// (M-A). See TestNoUndeclaredHostneutralityTautology's doc for the full honest bound. -func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bool { - tainted := instructionTaintedNames(fn, taintedFields) +// readsInstructionContent reports whether fn DIRECTLY ingests a recognized +// instruction file's content: a read sink (ReadFile/Open/ReadAll/bufio) whose path +// arg subtree carries a recognized instruction literal/segment (isInstructionPathLiteral) +// or an instructionPathIdent package var, or a WalkDir-collected instruction `.md`. +// This is the direct-read predicate — no taint-flow tracking. A read reached only +// through a param/local/field/method/closure flow, a transitive helper chain, or a +// range-element flow is NOT detected here; that reader-shape axis is the +// detached-audit-backstopped boundary documented on TestNoUndeclaredHostneutralityTautology. +func readsInstructionContent(fn *ast.FuncDecl) bool { found := false ast.Inspect(fn, func(n ast.Node) bool { call, ok := n.(*ast.CallExpr) @@ -267,7 +240,7 @@ func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bo } if readSinks[sel.Sel.Name] { for _, arg := range call.Args { - if exprInstructionTainted(arg, tainted) || readsTaintedField(arg, taintedFields) { + if exprInstructionTainted(arg) { found = true } } @@ -280,112 +253,12 @@ func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bo return found } -// readsTaintedField reports whether expr reads a struct field whose name is in the -// package-wide instruction-tainted-field set — the s.path / method-receiver flow. -func readsTaintedField(expr ast.Expr, taintedFields map[string]bool) bool { - hit := false - ast.Inspect(expr, func(n ast.Node) bool { - if sel, ok := n.(*ast.SelectorExpr); ok && taintedFields[sel.Sel.Name] { - hit = true - } - return true - }) - return hit -} - -// instructionTaintedFields scans every struct composite literal and field -// assignment across the package, returning the set of FIELD NAMES ever assigned an -// instruction-file path. Keyed by field name (no type info at parse time) — an -// over-approximation that errs toward flagging, which the proof policy wants. -func instructionTaintedFields(files []*ast.File) map[string]bool { - fields := map[string]bool{} - for _, f := range files { - ast.Inspect(f, func(n ast.Node) bool { - switch node := n.(type) { - case *ast.KeyValueExpr: - if key, ok := node.Key.(*ast.Ident); ok && exprInstructionTainted(node.Value, nil) { - fields[key.Name] = true - } - case *ast.AssignStmt: - for i, rhs := range node.Rhs { - if i >= len(node.Lhs) { - break - } - if sel, ok := node.Lhs[i].(*ast.SelectorExpr); ok && exprInstructionTainted(rhs, nil) { - fields[sel.Sel.Name] = true - } - } - } - return true - }) - } - return fields -} - -// instructionTaintedNames computes the set of names (params, locals, recv.field -// selectors) in fn holding a string derived from an instruction-file path. Every -// string parameter is tainted (a helper that reads a string param is a path-arg -// reader — the caller supplies the .md path). It propagates through := / = to a -// fixpoint, including a local assigned from a package-wide tainted field. -func instructionTaintedNames(fn *ast.FuncDecl, taintedFields map[string]bool) map[string]bool { - tainted := map[string]bool{} - if fn.Type.Params != nil { - for _, field := range fn.Type.Params.List { - if id, ok := field.Type.(*ast.Ident); ok && id.Name == "string" { - for _, name := range field.Names { - tainted[name.Name] = true - } - } - } - } - for grew := true; grew; { - grew = false - ast.Inspect(fn, func(n ast.Node) bool { - assign, ok := n.(*ast.AssignStmt) - if !ok { - return true - } - for i, rhs := range assign.Rhs { - if i >= len(assign.Lhs) { - break - } - if !exprInstructionTainted(rhs, tainted) && !readsTaintedField(rhs, taintedFields) { - continue - } - if name := lvalueName(assign.Lhs[i]); name != "" && !tainted[name] { - tainted[name] = true - grew = true - } - } - return true - }) - } - return tainted -} - -// lvalueName renders an assignable target as a taint key: a bare ident or a -// selector `recv.field`. -func lvalueName(e ast.Expr) string { - switch x := e.(type) { - case *ast.Ident: - return x.Name - case *ast.SelectorExpr: - if inner, ok := x.X.(*ast.Ident); ok { - return inner.Name + "." + x.Sel.Name - } - return x.Sel.Name - } - return "" -} - -// exprInstructionTainted reports whether expr carries an instruction-file path -// taint anywhere in its subtree: a tainted name, an instruction-path literal/segment, -// or a known instructionPathIdent package var — so the +/strings.Join/filepath.Join/ -// fmt.Sprintf/string(...) path-build idioms (whose tainted operand is a subtree node) -// are covered. It does NOT cover a taint carried in a slice element or recovered via -// a range variable (M-D) — see readsInstructionContent's NOT-covered note and the -// follow-up sweep-guard-reader-axis-invert. -func exprInstructionTainted(expr ast.Expr, tainted map[string]bool) bool { +// exprInstructionTainted reports whether expr DIRECTLY carries an instruction-file +// path anywhere in its subtree: an instruction-path literal/segment, or a known +// instructionPathIdent package var — so the +/strings.Join/filepath.Join/fmt.Sprintf/ +// string(...) path-build idioms (whose instruction operand is a subtree node) are +// covered when their operands are literals or recognized vars. +func exprInstructionTainted(expr ast.Expr) bool { hit := false ast.Inspect(expr, func(n ast.Node) bool { switch x := n.(type) { @@ -394,11 +267,7 @@ func exprInstructionTainted(expr ast.Expr, tainted map[string]bool) bool { hit = true } case *ast.Ident: - if tainted[x.Name] || instructionPathIdents[x.Name] { - hit = true - } - case *ast.SelectorExpr: - if inner, ok := x.X.(*ast.Ident); ok && tainted[inner.Name+"."+x.Sel.Name] { + if instructionPathIdents[x.Name] { hit = true } } @@ -425,14 +294,14 @@ func fnFiltersInstructionMarkdown(fn *ast.FuncDecl) bool { // instructionPathSegments are the skill-tree / contract path segments that mark a // path literal as an instruction file. The RECOGNIZED-instruction-surface predicate // (a deliberate bound, not universal): a path carrying one of these listed segments -// is an instruction path, taint catching it even before a `.md` suffix is appended -// (closing the `.md`-suffix-only detection a Join/split-built suffix evaded). +// is an instruction path, recognized even before a `.md` suffix is appended (so a +// Join/split-built suffix still matches on the base segment). // -// KNOWN OUT-OF-SCOPE surface (M-A, tracked in sweep-guard-reader-axis-invert, id -// 4qnn7dbzkyh9qv65t618vtxy): a real instruction surface whose path carries NONE of -// these segments (e.g. AGENTS.md, mods/*.md) is not recognized and a read of it is -// not flagged. The follow-up weighs a predicate that recognizes the surface -// definitionally rather than by this enumerated list. +// A real instruction surface whose path carries NONE of these segments (e.g. +// AGENTS.md, mods/*.md) is not recognized and a read of it is not statically flagged. +// That reader-shape gap is the detached-audit-backstopped boundary documented on +// TestNoUndeclaredHostneutralityTautology — covered by the adversarial audit at the +// high-stakes gate, not by enumerating more segments here. var instructionPathSegments = map[string]bool{ "skills": true, "references": true, @@ -528,199 +397,6 @@ func TestCodeScanHN(t *T) { if !containsStrHN(off, "TestUndeclaredHN") { t.Fatalf("adding declared/codescan fixtures must not stop the sweep flagging the undeclared one; offenders=%v", off) } - - // A multi-hop-helper tautology — the read hidden one frame down behind a wrapper - // that calls the named reader — must also be flagged: the transitive reader - // fixpoint propagates reader-ness up the call chain. Before the fixpoint the HN - // sweep left this GREEN (the validation Cycle-1 finding 1). - multiHop := `package fixture -import "strings" -func wrapHop(t *T) string { return readSkill(t, foCorePath) } -func TestMultiHopUndeclaredHN(t *T) { - text := wrapHop(t) - if strings.Contains(text, "x") { _ = text } -} -` - dir2 := t.TempDir() - writeFixture(t, dir2+"/multihop_test.go", multiHop) - off = sweepHostneutralityTautologies(t, dir2) - if !containsStrHN(off, "TestMultiHopUndeclaredHN") { - t.Fatalf("sweep failed to flag a multi-hop-helper tautology (transitive reader fixpoint not working); offenders=%v", off) - } -} - -// TestHostneutralitySweepDetectsEvasionShapes is the planted-control mutation test -// for the reader-discovery evasion shapes the validation audit proved the HN sweep -// missed (the integration sweep already guarded these; this ports the guard). Each -// case plants a synthetic offender reaching an instruction file through a shape the -// naive named-reader/`.md`-literal detection cannot see, runs the sweep, asserts it -// REDs, then plants the declared form and asserts it GREENs. A regression removing a -// discovery mechanism leaves the matching case un-flagged, failing this control. -func TestHostneutralitySweepDetectsEvasionShapes(t *testing.T) { - // Shape 1 — multi-hop transitive helper: the tautology hides one hop down behind - // a wrapper that calls the named reader readSkill. The fixpoint must propagate - // reader-ness up the chain. This is the finding-1 control: before the fixpoint, - // the HN sweep left this GREEN. - multiHop := `package fixture -import "strings" -func wrapHop(t *T) string { - return readSkill(t, foCorePath) -} -func TestMultiHopHN(t *T) { - text := wrapHop(t) - if strings.Contains(text, "x") { _ = text } -} -` - assertRedThenGreenHN(t, "multi-hop transitive helper", "TestMultiHopHN", multiHop) - - // Shape 2 — path-arg reader: a NEW helper (not in the named map) os.ReadFile's a - // value built from its own path parameter; the `.md` literal lives in the caller. - pathArg := `package fixture -import ( - "os" - "strings" -) -func readArg(t *T, path string) string { - b, _ := os.ReadFile(path) - return string(b) -} -func TestPathArgHN(t *T) { - text := readArg(t, "../../skills/first-officer/references/first-officer-shared-core.md") - if strings.Contains(text, "x") { _ = text } -} -` - assertRedThenGreenHN(t, "path-arg reader", "TestPathArgHN", pathArg) - - // Shape 3 — WalkDir collector: a helper WalkDirs a tree returning `.md` paths the - // caller reads+matches. - walkDir := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func walkSkills(t *T, base string) []string { - var out []string - filepath.WalkDir(base, func(p string, d os.DirEntry, err error) error { - if !d.IsDir() && strings.HasSuffix(p, ".md") { out = append(out, p) } - return nil - }) - return out -} -func TestWalkDirHN(t *T) { - for _, p := range walkSkills(t, "../../skills") { - b, _ := os.ReadFile(p) - if strings.Contains(string(b), "x") { _ = b } - } -} -` - assertRedThenGreenHN(t, "WalkDir collector", "TestWalkDirHN", walkDir) - - // Shape 4 — split-".md" suffix: the read path is built as base + "." + "md", so - // no single literal carries the `.md` suffix; constStringConcat must rejoin it. - splitMD := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func TestSplitSuffixHN(t *T) { - p := filepath.Join("..", "..", "skills", "first-officer", "references", "first-officer-shared-core" + "." + "md") - b, _ := os.ReadFile(p) - if strings.Contains(string(b), "x") { _ = b } -} -` - assertRedThenGreenHN(t, "split-.md suffix", "TestSplitSuffixHN", splitMD) - - // Shape 5 (Cycle-3 M1, match axis) — the ingested bytes are inspected with a - // match idiom OUTSIDE the old matchFuncs allowlist. The positive rule keys on the - // READ (readSkill), so HOW the bytes are inspected is irrelevant. - matchIndex := `package fixture -import "strings" -func TestMatchIndexHN(t *T) { - text := readSkill(t, foCorePath) - if strings.Index(text, "HALT") < 0 { t.Error("x") } -} -` - assertRedThenGreenHN(t, "match via strings.Index (no Contains)", "TestMatchIndexHN", matchIndex) - - matchBytesRegexp := `package fixture -import "regexp" -func TestMatchBytesRegexpHN(t *T) { - text := readSkill(t, foCorePath) - re := regexp.MustCompile("HALT") - if !re.Match([]byte(text)) { t.Error("x") } -} -` - assertRedThenGreenHN(t, "match via regexp.Regexp.Match([]byte)", "TestMatchBytesRegexpHN", matchBytesRegexp) - - // Shape 6 (Cycle-3 M2, reader axis) — the `.md` path is built with strings.Join, - // not `+`. The base fragment carries a skill-tree segment so it taints before the - // suffix is appended. - joinPath := `package fixture -import ( - "os" - "strings" -) -func TestJoinPathHN(t *T) { - base := "../../skills/first-officer/references/first-officer-shared-core" - p := strings.Join([]string{base, "md"}, ".") - b, _ := os.ReadFile(p) - if strings.Index(string(b), "x") < 0 { t.Error("y") } -} -` - assertRedThenGreenHN(t, "strings.Join-built .md path", "TestJoinPathHN", joinPath) - - // Shape 7 (Cycle-3 M3, reader axis) — the `.md` path flows through a struct field - // and the read happens via a METHOD on that struct; discovery must include methods - // and taint the package-wide field. - structMethod := `package fixture -import ( - "os" - "strings" -) -type fixt struct { path string } -func (f *fixt) read(t *T) string { - b, _ := os.ReadFile(f.path) - return string(b) -} -func TestStructMethodHN(t *T) { - f := &fixt{path: "../../skills/first-officer/references/first-officer-shared-core.md"} - s := f.read(t) - if strings.Contains(s, "x") { _ = s } -} -` - assertRedThenGreenHN(t, "struct-field + method-receiver path flow", "TestStructMethodHN", structMethod) -} - -// assertRedThenGreenHN plants fixtureSrc, runs the HN sweep, requires offenderName -// flagged (RED on the evasion shape), then rewrites the test with a markNonAC -// declaration and requires it cleared (GREEN once declared). -func assertRedThenGreenHN(t *testing.T, shape, offenderName, fixtureSrc string) { - t.Helper() - dir := t.TempDir() - writeFixture(t, dir+"/evasion_test.go", fixtureSrc) - off := sweepHostneutralityTautologies(t, dir) - if !containsStrHN(off, offenderName) { - t.Fatalf("%s: HN sweep failed to flag the evasion offender %s; offenders=%v", shape, offenderName, off) - } - - declaredSrc := strings.Replace( - fixtureSrc, - "func "+offenderName+"(t *T) {", - "func "+offenderName+`(t *T) { - markNonAC(t, "declared evasion-shape fixture")`, - 1, - ) - if declaredSrc == fixtureSrc { - t.Fatalf("%s: could not inject markNonAC into the fixture for %s (signature not found)", shape, offenderName) - } - dir2 := t.TempDir() - writeFixture(t, dir2+"/evasion_test.go", declaredSrc) - off = sweepHostneutralityTautologies(t, dir2) - if containsStrHN(off, offenderName) { - t.Fatalf("%s: HN sweep still flagged %s after it declared markNonAC; offenders=%v", shape, offenderName, off) - } } func writeFixture(t *testing.T, path, content string) { diff --git a/skills/integration/nonac_marker_test.go b/skills/integration/nonac_marker_test.go index 5ae7e19c..14da3eff 100644 --- a/skills/integration/nonac_marker_test.go +++ b/skills/integration/nonac_marker_test.go @@ -53,17 +53,17 @@ func markCodeBoundInvariant(t *testing.T, source string) { } } -// ingestedFileReaders are the helper functions in this package that read an +// ingestedFileReaders are the named helper functions in this package that read an // instruction file the model ingests (a contract, a workflow README, a skill -// body) — the seed of the reader set the sweep grows to a fixpoint. A test that -// calls one of these (the READ — how it then inspects the bytes is irrelevant under -// the match-axis positive rule) is a presence/absence check over an ingested file, -// exactly the shape the proof policy bans as standalone behavioral proof. The AC-3 -// sweep treats such a test as tautological unless it declares markNonAC, or unless -// it binds the expected value to a code-side source (a re-bound Bucket-B invariant — -// markCodeBoundInvariant). The sweep distinguishes the two by the declaration: a -// re-bound invariant whose expectation diverges from the file declares -// markCodeBoundInvariant; a pure text-consistency lint declares markNonAC. +// body) — the recognized-reader allowlist. A test that calls one of these (the READ +// — how it then inspects the bytes is irrelevant under the match-axis positive rule) +// is a presence/absence check over an ingested file, exactly the shape the proof +// policy bans as standalone behavioral proof. The AC-3 sweep treats such a test as +// tautological unless it declares markNonAC, or unless it binds the expected value to +// a code-side source (a re-bound Bucket-B invariant — markCodeBoundInvariant). The +// sweep distinguishes the two by the declaration: a re-bound invariant whose +// expectation diverges from the file declares markCodeBoundInvariant; a pure +// text-consistency lint declares markNonAC. var ingestedFileReaders = map[string]bool{ "foCore": true, "foRuntime": true, @@ -77,44 +77,42 @@ var ingestedFileReaders = map[string]bool{ // TestNoUndeclaredTautologicalProof is the AC-3 sweep, re-runnable offline. It // parses every *_test.go in this package and flags any test function that READS a -// recognized instruction file's content — via a reader helper, a tainted -// os.ReadFile/os.Open, or a WalkDir-collected `.md`, through the flow shapes the -// reader-axis taint covers (below) — unless it self-classifies via markNonAC or +// recognized instruction file's content — via a named reader helper, a DIRECT +// os.ReadFile/os.Open of a recognized instruction literal/segment, or a +// WalkDir-collected `.md` — unless it self-classifies via markNonAC or // markCodeBoundInvariant. The count of undeclared offenders is the AC-3 metric: it // must be zero. // -// What the guard actually guarantees (two axes, with one closed and one bounded): +// What the guard guarantees, and what it deliberately does NOT (two axes): // -// - MATCH axis (closed, universal, load-bearing): the sweep keys on the READ, not -// on how the bytes are then inspected. ONCE a read of a recognized instruction -// file is detected, the test MUST declare regardless of the inspection idiom — -// strings.Contains/Index/EqualFold, bytes.*, regexp.Regexp.Match, len(Split)>1, -// a bare `==`, anything. Enumerating "match functions" was whack-a-mole; this -// rule closes the whole class because the trigger is the ingest, not the match. +// - MATCH axis (closed, universal, load-bearing — STATICALLY GUARDED): the sweep +// keys on the READ, not on how the bytes are then inspected. ONCE a read of a +// recognized instruction file is detected, the test MUST declare regardless of +// the inspection idiom — strings.Contains/Index/EqualFold, bytes.*, +// regexp.Regexp.Match, len(Split)>1, a bare `==`, anything. Enumerating "match +// functions" was whack-a-mole; this rule closes the whole class because the +// trigger is the ingest, not the match. // -// - READER axis (covered flow shapes, NOT exhaustive): a read is detected for an -// in-package read of a RECOGNIZED instruction path (a skill-tree/contract -// segment, isInstructionPathLiteral) reaching a read sink through these flows: a -// bare-`string` parameter, a `:=`/`=` local, a struct field, a method receiver, -// a closure capture; with the path built by `+` / strings.Join / filepath.Join / -// fmt.Sprintf. A transitive helper chain is followed to a fixpoint. -// -// KNOWN OUT-OF-SCOPE (tracked in the follow-up task sweep-guard-reader-axis-invert, -// id 4qnn7dbzkyh9qv65t618vtxy, backstopped by the detached adversarial audit before -// merge — NOT silently dropped): -// - M-A: unrecognized instruction surfaces (AGENTS.md, mods/*.md) — not in the -// instructionPathSegments predicate. -// - M-B: cross-package reads (a read whose reader helper lives in another package). -// - M-C: a path held in a package var defined in another file of this package. -// - M-D: `[]string`/`...string`-param + range/slice-element flow. -// These are the same recurring enumerated-shape reader-flow class cycles 1-3 each -// closed instances of; the follow-up weighs an invert/positive predicate and a -// go/types+SSA taint that closes the class definitionally. +// - READER axis (does an undeclared instruction-file read hide via an undiscovered +// read SHAPE? — DETACHED-AUDIT-BACKSTOPPED, NOT statically guarded): a read is +// detected only when it is DIRECT — an in-package read sink whose path arg subtree +// carries a recognized instruction literal/segment (isInstructionPathLiteral), a +// named ingestedFileReaders helper, or a WalkDir `.md` collector. A read reached +// through any other shape — param/local/field/method/closure taint flow, a +// transitive helper chain, a `[]string`/range-element flow, a cross-package +// reader, or a path in a package var defined in another file — is NOT statically +// flagged. This is a deliberate, documented boundary: a per-package go/ast scan +// structurally cannot see a cross-package read or a path built in another file, so +// chasing reader-shape completeness was the enumeration trap the proof policy +// itself warns against. The reader axis is covered by the detached adversarial +// audit required at every high-stakes-surface gate (the validation-stage policy), +// NOT by this sweep. The prior cycles' M-A (AGENTS.md, mods/*.md) / M-B / M-C / M-D +// shapes are this audited boundary, not silent gaps. // // This sweep is itself a code-side invariant over real parsed test source, not a -// text match over an instruction file — its expected value (which reads reach an -// instruction file) is independent of any contract prose, so it can fail when a -// future edit adds an undeclared ingest through a covered flow. +// text match over an instruction file — its expected value (which reads directly +// reach an instruction file) is independent of any contract prose, so it can fail +// when a future edit adds an undeclared DIRECT ingest. func TestNoUndeclaredTautologicalProof(t *testing.T) { offenders := sweepUndeclaredTautologies(t, ".") for _, o := range offenders { @@ -150,50 +148,30 @@ func sweepUndeclaredTautologies(t *testing.T, dir string) []string { files = append(files, f) } - // First pass: discover the package's instruction-file reader helpers, then grow - // the set to a fixpoint so a read cannot hide behind a helper chain. A func is a - // reader if it ingests instruction-file content directly (readsInstructionContent - // — a tainted os.ReadFile/io read, or a WalkDir-collected `.md`) OR (transitive) - // it calls a known reader. Methods are NOT skipped: a reader can be a method on a - // fixture struct (the s.path / method-receiver flow shape). The seeded named + // First pass: the recognized reader set is the named ingestedFileReaders allowlist + // plus any non-test helper that DIRECTLY reads an instruction file + // (readsInstructionContent — a read sink whose path arg carries a recognized + // instruction literal/segment, or a WalkDir-collected `.md`). The seeded named // helpers cover readers that return a non-`.md`-literal handle (vendoredSkillFiles // returns a map). - taintedFields := instructionTaintedFields(files) readers := map[string]bool{} for r := range ingestedFileReaders { readers[r] = true } - helperCalls := map[string]map[string]bool{} for _, f := range files { for _, decl := range f.Decls { fn, ok := decl.(*ast.FuncDecl) if !ok || strings.HasPrefix(fn.Name.Name, "Test") { continue } - helperCalls[fn.Name.Name] = collectCalls(fn) - if readsInstructionContent(fn, taintedFields) { + if readsInstructionContent(fn) { readers[fn.Name.Name] = true } } } - for grew := true; grew; { - grew = false - for name, calls := range helperCalls { - if readers[name] { - continue - } - for r := range readers { - if calls[r] { - readers[name] = true - grew = true - break - } - } - } - } // Second pass: a test is an offender if it ingests instruction-file content — - // directly (readsInstructionContent) or via a discovered reader helper — and does + // directly (readsInstructionContent) or via a recognized reader helper — and does // NOT declare its proof standing. The sweep keys on the READ, not on a match-func // allowlist: any inspection of ingested bytes (Contains/Index/EqualFold, bytes.*, // regexp.Match, a bare ==, …) is covered because the trigger is the ingest itself. @@ -205,7 +183,7 @@ func sweepUndeclaredTautologies(t *testing.T, dir string) []string { continue } calls := collectCalls(fn) - readsIngested := readsInstructionContent(fn, taintedFields) + readsIngested := readsInstructionContent(fn) for r := range readers { if calls[r] { readsIngested = true @@ -221,26 +199,16 @@ func sweepUndeclaredTautologies(t *testing.T, dir string) []string { return sortedUnique(offenders) } -// readsInstructionContent reports whether fn ingests a recognized instruction -// file's content through the reader-axis flow shapes the taint COVERS — it is the -// positive/taint replacement for the Cycle-1/2 allow-lists (readsParamPath + -// walksForMarkdown + constStringConcat-only `+` concat), but it covers a bounded -// set of flows, not an exhaustive one. It taints a string derived from a recognized -// instruction-file path (a skill-tree/contract segment, isInstructionPathLiteral) -// built by `+` / strings.Join / filepath.Join / fmt.Sprintf and flowed through a -// bare-`string` param, a `:=`/`=` local, a struct field, or a method receiver, and -// reports a read when a tainted path flows into a read sink (os.ReadFile/os.Open/ -// io.ReadAll/bufio scanner-reader), or when fn WalkDir/Walks a tree collecting -// instruction `.md` files (the reader-of-many shape its callers then read). -// -// NOT covered (tracked in sweep-guard-reader-axis-invert, id -// 4qnn7dbzkyh9qv65t618vtxy, audit-backstopped): `[]string`/`...string`-param + -// range/slice-element flow (M-D), cross-package reader helpers (M-B), a package var -// defined in another file of this package (M-C), and unrecognized surfaces like -// AGENTS.md / mods/*.md (M-A). See TestNoUndeclaredTautologicalProof's doc for the -// full honest bound. -func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bool { - tainted := instructionTaintedNames(fn, taintedFields) +// readsInstructionContent reports whether fn DIRECTLY ingests a recognized +// instruction file's content: a read sink (os.ReadFile/os.Open/io.ReadAll/bufio +// scanner-reader) whose path arg subtree carries a recognized instruction +// literal/segment (isInstructionPathLiteral), or a WalkDir/Walk over a tree +// collecting instruction `.md` files (the reader-of-many shape its callers read). +// This is the direct-read predicate — no taint-flow tracking. A read reached only +// through a param/local/field/method/closure flow, a transitive helper chain, or a +// range-element flow is NOT detected here; that reader-shape axis is the +// detached-audit-backstopped boundary documented on TestNoUndeclaredTautologicalProof. +func readsInstructionContent(fn *ast.FuncDecl) bool { found := false ast.Inspect(fn, func(n ast.Node) bool { call, ok := n.(*ast.CallExpr) @@ -253,7 +221,7 @@ func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bo } if readSinks[sel.Sel.Name] { for _, arg := range call.Args { - if exprInstructionTainted(arg, tainted) || readsTaintedField(arg, taintedFields) { + if exprInstructionTainted(arg) { found = true } } @@ -268,58 +236,6 @@ func readsInstructionContent(fn *ast.FuncDecl, taintedFields map[string]bool) bo return found } -// readsTaintedField reports whether expr reads a struct field whose name is in the -// package-wide instruction-tainted-field set — the s.path / method-receiver flow -// (Cycle-3 M3): the `.md` literal is assigned to the field in a constructor in one -// function and the read happens via a field selector in another (a method). Field -// taint is computed package-wide (instructionTaintedFields), so this catches the -// read even though the assigning literal is not in fn's own body. A generated-path -// field (runBuild's res.DispatchFilePath) is never instruction-assigned, so it is -// not in the set and not flagged. -func readsTaintedField(expr ast.Expr, taintedFields map[string]bool) bool { - hit := false - ast.Inspect(expr, func(n ast.Node) bool { - if sel, ok := n.(*ast.SelectorExpr); ok && taintedFields[sel.Sel.Name] { - hit = true - } - return true - }) - return hit -} - -// instructionTaintedFields scans every struct composite literal and every -// assignment to a field selector across the package, returning the set of FIELD -// NAMES ever assigned an instruction-file path. Keyed by field name (no type info -// at parse time) — a deliberate over-approximation that errs toward flagging, which -// the proof policy wants. A field only ever assigned a generated/temp path (a -// dispatch artifact) never enters the set. -func instructionTaintedFields(files []*ast.File) map[string]bool { - fields := map[string]bool{} - for _, f := range files { - ast.Inspect(f, func(n ast.Node) bool { - switch node := n.(type) { - case *ast.KeyValueExpr: - if key, ok := node.Key.(*ast.Ident); ok { - if exprInstructionTainted(node.Value, nil) { - fields[key.Name] = true - } - } - case *ast.AssignStmt: - for i, rhs := range node.Rhs { - if i >= len(node.Lhs) { - break - } - if sel, ok := node.Lhs[i].(*ast.SelectorExpr); ok && exprInstructionTainted(rhs, nil) { - fields[sel.Sel.Name] = true - } - } - } - return true - }) - } - return fields -} - // readSinks are the call selectors that ingest a file's content given a path: the // os reads, io.ReadAll over an opened handle, and the bufio scanner/reader // constructors. A tainted instruction path flowing into any of these is an ingest. @@ -331,98 +247,18 @@ var readSinks = map[string]bool{ "NewReader": true, // bufio.NewReader } -// instructionTaintedNames computes the set of identifier names (params, locals, -// struct-field selectors rendered as `recv.field`, range vars) in fn that hold a -// string derived from an instruction-file path. It seeds from instruction-path -// expressions (a `.md` skill-tree literal/segment, an instructionPathSegment, a -// known instruction ident) and propagates through := / = assignments and string -// conversions to a fixpoint, so a path built and then read in separate statements -// is still tainted at the read. -func instructionTaintedNames(fn *ast.FuncDecl, taintedFields map[string]bool) map[string]bool { - tainted := map[string]bool{} - // Seed: any parameter is a candidate taint carrier only if the CALLER supplies an - // instruction path; within fn we cannot see the caller, so a reader-helper whose - // path arg is a parameter is caught by the param-flow rule below (the parameter is - // tainted when fn itself also references an instruction literal, OR unconditionally - // for a single-string-param helper that reads it — the readSkill(t, path) shape). - // We treat every string parameter as tainted: a helper that ReadFiles a string - // param is, by construction, a path-arg reader (the caller supplies the .md path). - if fn.Type.Params != nil { - for _, field := range fn.Type.Params.List { - if isStringyType(field.Type) { - for _, name := range field.Names { - tainted[name.Name] = true - } - } - } - } - for grew := true; grew; { - grew = false - ast.Inspect(fn, func(n ast.Node) bool { - assign, ok := n.(*ast.AssignStmt) - if !ok { - return true - } - for i, rhs := range assign.Rhs { - if i >= len(assign.Lhs) { - break - } - // A local assigned from an instruction-tainted expr, OR from a read of a - // package-wide instruction-tainted field, carries the taint forward. - if !exprInstructionTainted(rhs, tainted) && !readsTaintedField(rhs, taintedFields) { - continue - } - if name := lvalueName(assign.Lhs[i]); name != "" && !tainted[name] { - tainted[name] = true - grew = true - } - } - return true - }) - } - return tainted -} - -// lvalueName renders an assignable target as a taint-tracking key: a bare ident, or -// a selector `recv.field` (the struct-field path-flow shape). -func lvalueName(e ast.Expr) string { - switch x := e.(type) { - case *ast.Ident: - return x.Name - case *ast.SelectorExpr: - if inner, ok := x.X.(*ast.Ident); ok { - return inner.Name + "." + x.Sel.Name - } - return x.Sel.Name - } - return "" -} - -// exprInstructionTainted reports whether an expression carries an instruction-file -// path taint: it references a tainted name (ident or `recv.field` selector), an -// instruction-path string literal/segment, or a known instruction path ident, -// anywhere in its subtree — so the `+` / strings.Join / filepath.Join / fmt.Sprintf -// path-build idioms (whose tainted operand is a node in the subtree) are covered. -// The over-approximation toward flagging is deliberate. It does NOT cover a taint -// carried in a slice element or recovered via a range variable (M-D) — see -// readsInstructionContent's NOT-covered note and the follow-up -// sweep-guard-reader-axis-invert. -func exprInstructionTainted(expr ast.Expr, tainted map[string]bool) bool { +// exprInstructionTainted reports whether an expression DIRECTLY carries an +// instruction-file path: it references an instruction-path string literal/segment +// (isInstructionPathLiteral) anywhere in its subtree — so the `+` / strings.Join / +// filepath.Join / fmt.Sprintf path-build idioms (whose instruction operand is a node +// in the subtree) are covered when their operands are literals. +func exprInstructionTainted(expr ast.Expr) bool { hit := false ast.Inspect(expr, func(n ast.Node) bool { - switch x := n.(type) { - case *ast.BasicLit: + if x, ok := n.(*ast.BasicLit); ok { if x.Kind == token.STRING && isInstructionPathLiteral(strings.Trim(x.Value, "`\"")) { hit = true } - case *ast.Ident: - if tainted[x.Name] { - hit = true - } - case *ast.SelectorExpr: - if inner, ok := x.X.(*ast.Ident); ok && tainted[inner.Name+"."+x.Sel.Name] { - hit = true - } } return true }) @@ -466,13 +302,6 @@ func collectCalls(fn *ast.FuncDecl) map[string]bool { return calls } -// isStringyType reports whether a parameter type node carries a path string: a bare -// `string` (the readSkill(t, path) shape) — the kind a path-arg reader takes. -func isStringyType(t ast.Expr) bool { - id, ok := t.(*ast.Ident) - return ok && id.Name == "string" -} - // instructionPathSegments are the skill-tree / contract path segments that mark a // path literal as targeting an instruction file the model ingests (a skill, // contract, agent, or runtime adapter) rather than a binary-parsed artifact (a @@ -483,16 +312,14 @@ func isStringyType(t ast.Expr) bool { // // This is the RECOGNIZED-instruction-surface predicate (a deliberate bound, not a // universal one): a path carrying one of these listed segments is an instruction -// path. A path fragment carrying a segment is instruction-tainted even before a -// `.md` suffix is appended, so strings.Join([]string{"…/first-officer-shared-core", -// "md"}, ".") taints on the segment in the base (closing the Cycle-1 -// `.md`-suffix-AND-segment pair a split/Join-built suffix evaded). +// path. A path fragment carrying a segment is recognized even before a `.md` suffix +// is appended, so a literal "…/first-officer-shared-core" matches on its segment. // -// KNOWN OUT-OF-SCOPE surfaces (M-A, tracked in sweep-guard-reader-axis-invert, id -// 4qnn7dbzkyh9qv65t618vtxy): a real instruction surface whose path carries NONE of -// these segments — e.g. AGENTS.md or mods/*.md — is not recognized and a read of it -// is not flagged. The follow-up weighs an invert/positive predicate that recognizes -// the instruction surface definitionally rather than by this enumerated list. +// A real instruction surface whose path carries NONE of these segments — e.g. +// AGENTS.md or mods/*.md — is not recognized and a read of it is not statically +// flagged. That reader-shape gap is the detached-audit-backstopped boundary +// documented on TestNoUndeclaredTautologicalProof — covered by the adversarial audit +// at the high-stakes gate, not by enumerating more segments here. var instructionPathSegments = map[string]bool{ "skills": true, "references": true, @@ -575,197 +402,6 @@ func TestCodeBoundFixture(t *T) { } } -// TestSweepDetectsEvasionShapes is the planted-control mutation test for the -// reader-discovery evasion shapes the validation audit proved the sweep missed. -// Each case plants a synthetic offender that reaches an instruction file through a -// shape the naive `.md`-literal-in-the-reader detection cannot see, runs the sweep, -// and asserts it REDs (flags the offender); then it plants the declared form and -// asserts it GREENs. A regression that removed a discovery mechanism would let the -// matching case go un-flagged, failing this control. -func TestSweepDetectsEvasionShapes(t *testing.T) { - // Shape 1 — path-arg reader: the helper os.ReadFile's a value built from its own - // path parameter; the `.md` literal lives in the CALLER (the readSkill(t,root,rel) - // shape). readsInstructionPath over the helper body sees no literal, so only - // parameter-flow detection catches it. - pathArg := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func readArg(t *T, root, rel string) string { - b, _ := os.ReadFile(filepath.Join(root, rel)) - return string(b) -} -func TestPathArgOffender(t *T) { - s := readArg(t, root, "first-officer/references/first-officer-shared-core.md") - if strings.Contains(s, "x") { _ = s } -} -` - assertRedThenGreen(t, "path-arg reader", "TestPathArgOffender", pathArg) - - // Shape 2 — WalkDir collector: the helper WalkDirs a tree collecting `.md` - // paths and RETURNS them; it never os.ReadFile's the `.md` itself. The caller - // reads+matches each returned path (the shippedSkillText shape). - walkDir := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func walkSkills(t *T, base string) []string { - var out []string - filepath.WalkDir(base, func(p string, d os.DirEntry, err error) error { - if !d.IsDir() && strings.HasSuffix(p, ".md") { out = append(out, p) } - return nil - }) - return out -} -func TestWalkDirOffender(t *T) { - for _, p := range walkSkills(t, root) { - b, _ := os.ReadFile(p) - if strings.Contains(string(b), "x") { _ = b } - } -} -` - assertRedThenGreen(t, "WalkDir collector", "TestWalkDirOffender", walkDir) - - // Shape 3 — split-".md" suffix: the read path is constructed as - // base + "." + "md", so no single literal carries the `.md` suffix. The - // constant-concatenation reconstruction must rejoin it before .md detection. - splitMD := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func TestSplitSuffixOffender(t *T) { - p := filepath.Join(root, "first-officer", "references", "first-officer-shared-core" + "." + "md") - b, _ := os.ReadFile(p) - if strings.Contains(string(b), "x") { _ = b } -} -` - assertRedThenGreen(t, "split-.md suffix", "TestSplitSuffixOffender", splitMD) - - // Shape 4 — multi-hop transitive helper: a tautology hidden two frames down - // (the test calls wrapHop, which calls readArg, which reads a param path). The - // reader fixpoint must propagate reader-ness up the call chain. This is the - // integration-side guard that the transitive fixpoint stays load-bearing. - multiHop := `package fixture -import ( - "os" - "path/filepath" - "strings" -) -func readArg2(t *T, root, rel string) string { - b, _ := os.ReadFile(filepath.Join(root, rel)) - return string(b) -} -func wrapHop(t *T, root string) string { - return readArg2(t, root, "ensign/references/ensign-shared-core.md") -} -func TestMultiHopOffender(t *T) { - s := wrapHop(t, root) - if strings.Contains(s, "x") { _ = s } -} -` - assertRedThenGreen(t, "multi-hop transitive helper", "TestMultiHopOffender", multiHop) - - // Shape 5 (Cycle-3 M1, match axis) — the ingested bytes are inspected with a - // match idiom OUTSIDE the old matchFuncs allowlist (strings.Index, regexp.Match - // over []byte). The positive rule keys on the READ, so HOW the bytes are inspected - // is irrelevant — the read of foCore alone must flag it regardless of the idiom. - matchIndex := `package fixture -import "strings" -func TestMatchIndexOffender(t *T) { - fo := foCore(t) - if strings.Index(fo, "Skill(skill=\"spacedock:present-gate\")") < 0 { t.Error("x") } -} -` - assertRedThenGreen(t, "match via strings.Index (no Contains)", "TestMatchIndexOffender", matchIndex) - - matchBytesRegexp := `package fixture -import "regexp" -func TestMatchBytesRegexpOffender(t *T) { - fo := foCore(t) - re := regexp.MustCompile("present-gate") - if !re.Match([]byte(fo)) { t.Error("x") } -} -` - assertRedThenGreen(t, "match via regexp.Regexp.Match([]byte)", "TestMatchBytesRegexpOffender", matchBytesRegexp) - - // Shape 6 (Cycle-3 M2, reader axis) — the `.md` path is built with strings.Join, - // not `+`. The base fragment carries an instruction segment so it taints before - // the suffix is appended; the constStringConcat-only-`+` design missed this. - joinPath := `package fixture -import ( - "os" - "strings" -) -func TestJoinPathOffender(t *T) { - base := "../../skills/first-officer/references/first-officer-shared-core" - p := strings.Join([]string{base, "md"}, ".") - b, _ := os.ReadFile(p) - if strings.Index(string(b), "x") < 0 { t.Error("y") } -} -` - assertRedThenGreen(t, "strings.Join-built .md path", "TestJoinPathOffender", joinPath) - - // Shape 7 (Cycle-3 M3, reader axis) — the `.md` path flows through a struct field - // and the read happens via a METHOD on that struct. readsParamPath tracked only - // string params and discovery skipped methods (fn.Recv != nil); taint over the - // field + method discovery must catch it. - structMethod := `package fixture -import ( - "os" - "strings" -) -type fixt struct { path string } -func (f *fixt) read(t *T) string { - b, _ := os.ReadFile(f.path) - return string(b) -} -func TestStructMethodOffender(t *T) { - f := &fixt{path: "skills/first-officer/references/first-officer-shared-core.md"} - s := f.read(t) - if strings.Contains(s, "x") { _ = s } -} -` - assertRedThenGreen(t, "struct-field + method-receiver path flow", "TestStructMethodOffender", structMethod) -} - -// assertRedThenGreen plants fixtureSrc in a fresh temp dir, runs the sweep, and -// requires offenderName to be flagged (RED on the evasion shape). It then rewrites -// the offending test with a markNonAC declaration and requires the offender to -// clear (GREEN once declared). The declared rewrite reuses the fixture verbatim -// with the marker inserted as the test body's first statement. -func assertRedThenGreen(t *testing.T, shape, offenderName, fixtureSrc string) { - t.Helper() - dir := t.TempDir() - writeFile(t, dir+"/evasion_test.go", fixtureSrc) - offenders := sweepUndeclaredTautologies(t, dir) - if !containsStr(offenders, offenderName) { - t.Fatalf("%s: sweep failed to flag the evasion offender %s; offenders=%v", shape, offenderName, offenders) - } - - declaredSrc := strings.Replace( - fixtureSrc, - "func "+offenderName+"(t *T) {", - "func "+offenderName+`(t *T) { - markNonAC(t, "declared evasion-shape fixture")`, - 1, - ) - if declaredSrc == fixtureSrc { - t.Fatalf("%s: could not inject markNonAC into the fixture for %s (signature not found)", shape, offenderName) - } - dir2 := t.TempDir() - writeFile(t, dir2+"/evasion_test.go", declaredSrc) - offenders = sweepUndeclaredTautologies(t, dir2) - if containsStr(offenders, offenderName) { - t.Fatalf("%s: sweep still flagged %s after it declared markNonAC; offenders=%v", shape, offenderName, offenders) - } -} - func containsStr(in []string, want string) bool { for _, s := range in { if s == want { From de652c0263bed14c399cd5b6fe95e80add6cff52 Mon Sep 17 00:00:00 2001 From: CL Kao Date: Fri, 5 Jun 2026 20:47:22 -0700 Subject: [PATCH 2/2] test: quarantine structural instruction reads --- docs/dev/README.md | 2 +- .../boundary_guard_control_test.go | 149 ++++++ internal/contractlint/boundary_guard_test.go | 168 +++++++ internal/contractlint/doc_test.go | 13 + .../instruction_read_detector_test.go | 135 ++++++ .../contractlint/structural_checks_test.go | 393 ++++++++++++++++ internal/hostneutrality/code_source_test.go | 103 ----- .../codex_runtime_contract_test.go | 90 ---- .../ensign_dev_leakage_locks_test.go | 182 -------- .../live_scenario_practice_test.go | 55 --- internal/hostneutrality/nonac_marker_test.go | 407 ----------------- .../prose_inflator_locks_test.go | 296 ------------ .../hostneutrality/prose_neutrality_test.go | 302 ------------ .../split_root_sync_contract_test.go | 132 ------ internal/status/external_proof.go | 5 +- internal/status/no_yaml_silent_drop_test.go | 2 +- .../codex_idle_notification_test.go | 144 +----- skills/integration/contract_gate_test.go | 244 ---------- .../integration/contract_status_path_test.go | 59 --- .../feedback_rejection_flow_test.go | 247 ---------- skills/integration/nonac_marker_test.go | 430 ------------------ skills/integration/portability_test.go | 181 -------- skills/integration/present_gate_test.go | 260 ----------- .../reconcile_session_contract_test.go | 99 ---- .../integration/ship_local_ceremony_test.go | 58 --- skills/integration/skill_surface_test.go | 374 --------------- skills/integration/skill_text_test.go | 297 ------------ .../integration/spacedock_vocabulary_test.go | 307 ------------- .../terminal_teardown_retry_test.go | 338 -------------- skills/integration/using_claude_team_test.go | 193 -------- skills/integration/working_principles_test.go | 68 --- 31 files changed, 863 insertions(+), 4870 deletions(-) create mode 100644 internal/contractlint/boundary_guard_control_test.go create mode 100644 internal/contractlint/boundary_guard_test.go create mode 100644 internal/contractlint/doc_test.go create mode 100644 internal/contractlint/instruction_read_detector_test.go create mode 100644 internal/contractlint/structural_checks_test.go delete mode 100644 internal/hostneutrality/code_source_test.go delete mode 100644 internal/hostneutrality/codex_runtime_contract_test.go delete mode 100644 internal/hostneutrality/ensign_dev_leakage_locks_test.go delete mode 100644 internal/hostneutrality/live_scenario_practice_test.go delete mode 100644 internal/hostneutrality/nonac_marker_test.go delete mode 100644 internal/hostneutrality/prose_inflator_locks_test.go delete mode 100644 internal/hostneutrality/prose_neutrality_test.go delete mode 100644 internal/hostneutrality/split_root_sync_contract_test.go delete mode 100644 skills/integration/contract_gate_test.go delete mode 100644 skills/integration/contract_status_path_test.go delete mode 100644 skills/integration/feedback_rejection_flow_test.go delete mode 100644 skills/integration/nonac_marker_test.go delete mode 100644 skills/integration/portability_test.go delete mode 100644 skills/integration/present_gate_test.go delete mode 100644 skills/integration/reconcile_session_contract_test.go delete mode 100644 skills/integration/ship_local_ceremony_test.go delete mode 100644 skills/integration/skill_surface_test.go delete mode 100644 skills/integration/skill_text_test.go delete mode 100644 skills/integration/spacedock_vocabulary_test.go delete mode 100644 skills/integration/terminal_teardown_retry_test.go delete mode 100644 skills/integration/using_claude_team_test.go delete mode 100644 skills/integration/working_principles_test.go diff --git a/docs/dev/README.md b/docs/dev/README.md index 1a6ef338..fee7cda8 100644 --- a/docs/dev/README.md +++ b/docs/dev/README.md @@ -120,7 +120,7 @@ A task moves to validation after implementation is complete. The work here is to - **What it produces:** the auditor works on a separate throwaway checkout (never the implementation worktree) and never mutates the deliverable. It tries to REFUTE the validation — construct an adversarial edit that the deliverable's own tests should catch, and confirm they do. A test that stays green under an edit that breaks the claim is a hole. Findings come in two tiers — `Material:` (a real correctness or test-strength hole, e.g. an assertion that green-lights a regression) and `Polish:` (non-blocking). "Refuted nothing material" is itself a valid, recorded outcome. - **How it is recorded:** material findings route back through the normal validation→implementation feedback flow (a `### Feedback Cycles` entry naming the audit and its adversarial edit); the gate is not presented as clean until they are closed. A clean audit is noted in the gate's reviewer-findings block (or a one-line "detached audit: no material findings"). - **Why:** the audit catches the class of hole where the test passes but would also pass on a broken future edit — which validation, trusting its own green suite, cannot see. Real catches on the record: #262 (`binary-absent-fo-bootstrap`) — validation passed correct prose, the audit then found two test-strength holes in `contract_gate_test.go` (a `strings.Count(...) > 0` check that skipped on zero mentions, and a bare `strings.Contains` satisfied by a negated disclaimer); `1x` (`code-cleanups-0193`) AC-6 and `external-tracker-checkpoint` AC-6 (a self-referential "verified by review of this entity's own section" that can never fail); `7h` (`release-notes-local-summary`) AC-3 (validation passed, the audit found the tag-cut folded the notes block into the tag subject instead of the body). This is read-only refutation, not a second implementation pass. - - **AC-3 sweep reader axis — this audit IS the backstop:** the offline AC-3 sweeps (`TestNoUndeclaredTautologicalProof`, `TestNoUndeclaredHostneutralityTautology`) statically guard the MATCH axis — once a read of a recognized instruction file is detected, the test must self-classify (`markNonAC`/`markCodeBoundInvariant`) regardless of the inspection idiom. They deliberately do NOT guard the READER axis — whether an undeclared instruction-file read hides via an undiscovered read SHAPE (a path built across statements/params/fields, a transitive helper chain, a range-element flow, a cross-package reader, a package var in another file, or an unrecognized surface like `AGENTS.md`/`mods/*.md`). A per-package `go/ast` scan structurally cannot see a cross-package read or a path built elsewhere, so chasing reader-shape completeness is the enumeration trap the proof policy itself warns against. The reader axis is covered by THIS detached adversarial audit at the high-stakes gate, not by the static sweep — a reader-shape evasion is a documented, audited boundary, not a silent gap. + - **Instruction-file read quarantine:** tests do not read prompt or instruction files except in `internal/contractlint`, and that package is limited to structural checks: reference closure, frontmatter validity, structural absence, dedup, and similar machine-checkable properties. Prose-grep is banned: a test that asserts a skill, contract, agent file, or this workflow README contains its own wording proves only that the wording is present. Code-bound prose checks are banned too: a prose-to-code consistency lint is not a behavior test, and it must never substitute for running the behavior. If a deleted prose/code-bound read exposed an untested behavior that still matters, record the owed behavior test instead of keeping the read. The boundary guard fails on instruction-file reads outside the quarantine; high-stakes validation still uses detached adversarial audit to refute whether the remaining behavior tests would catch a broken edit. ### `done` diff --git a/internal/contractlint/boundary_guard_control_test.go b/internal/contractlint/boundary_guard_control_test.go new file mode 100644 index 00000000..6f81b101 --- /dev/null +++ b/internal/contractlint/boundary_guard_control_test.go @@ -0,0 +1,149 @@ +// ABOUTME: Mutation control for the boundary guard — a planted out-of-quarantine +// ABOUTME: instruction read must RED the detector; a non-instruction read must not. +package contractlint + +import ( + "go/parser" + "go/token" + "os" + "path/filepath" + "testing" +) + +// TestBoundaryGuardDetectsAPlantedInstructionRead is the guard's mutation control: +// the guard is the boundary oracle, so it must be demonstrated to RED on the exact +// shape it bans (a test that reads an instruction file) and stay GREEN on a read +// that is not an instruction file. Without this, the guard could silently degrade +// to a no-op (e.g. the detector predicate inverted) and pass vacuously. It parses +// synthetic source through the same detector the real sweep uses. +func TestBoundaryGuardDetectsAPlantedInstructionRead(t *testing.T) { + dir := t.TempDir() + + // A planted out-of-quarantine instruction read: reads a skill body and inspects + // it. The retired marker model would have let this pass with a declaration; the + // guard bans it regardless. + instructionRead := `package fixture +func TestReadsInstruction(t *T) { + data, _ := os.ReadFile("../../skills/first-officer/references/first-officer-shared-core.md") + if strings.Contains(string(data), "x") { _ = data } +} +` + // A non-instruction read: reads a JSON manifest (the binary parses it) and a + // generated state entity. Neither is an instruction surface, so the guard must + // not flag it. + nonInstructionRead := `package fixture +func TestReadsManifest(t *T) { + a, _ := os.ReadFile("../../.agents/plugins/marketplace.json") + b, _ := os.ReadFile(filepath.Join(stateDir, "entity", "index.md")) + _ = a + _ = b +} +` + nonInstructionWalk := `package fixture +func TestWalksFixtureMarkdown(t *T) { + filepath.WalkDir("testdata", func(path string, d DirEntry, err error) error { + if strings.HasSuffix(path, ".md") { + _ = path + } + return nil + }) +} +` + writeFixture(t, filepath.Join(dir, "instruction_read_test.go"), instructionRead) + if got := instructionReadingTestFiles(t, dir); !contains(got, "instruction_read_test.go") { + t.Fatalf("guard failed to flag a planted instruction-file read; flagged=%v", got) + } + + writeFixture(t, filepath.Join(dir, "manifest_read_test.go"), nonInstructionRead) + writeFixture(t, filepath.Join(dir, "fixture_walk_test.go"), nonInstructionWalk) + got := instructionReadingTestFiles(t, dir) + if contains(got, "manifest_read_test.go") { + t.Fatalf("guard wrongly flagged a non-instruction read (json manifest / generated state entity); flagged=%v", got) + } + if contains(got, "fixture_walk_test.go") { + t.Fatalf("guard wrongly flagged a non-instruction markdown fixture walk; flagged=%v", got) + } + if !contains(got, "instruction_read_test.go") { + t.Fatalf("adding a non-instruction fixture must not stop the guard flagging the instruction one; flagged=%v", got) + } +} + +// TestBoundaryGuardSweepFlagsAPlantedDir drives the directory sweep itself (not just +// the per-file detector) against a planted repo layout: a policed dir outside the +// quarantine that reads an instruction file must appear as an offender, and the +// quarantine dir's own instruction read must NOT. +func TestBoundaryGuardSweepFlagsAPlantedDir(t *testing.T) { + root := t.TempDir() + + // A policed dir (not the quarantine) with an instruction read -> offender. + policedDir := filepath.Join(root, "internal", "hostneutrality") + if err := os.MkdirAll(policedDir, 0o755); err != nil { + t.Fatal(err) + } + writeFixture(t, filepath.Join(policedDir, "leak_test.go"), `package fixture +func TestLeak(t *T) { + data, _ := os.ReadFile("../../skills/ensign/references/ensign-shared-core.md") + _ = data +} +`) + + // The quarantine dir with an instruction read -> NOT an offender (the legal path). + quarantineDir := filepath.Join(root, quarantinePkg) + if err := os.MkdirAll(quarantineDir, 0o755); err != nil { + t.Fatal(err) + } + writeFixture(t, filepath.Join(quarantineDir, "legal_read_test.go"), `package fixture +func TestLegalRead(t *T) { + data, _ := os.ReadFile("../../skills/commission/SKILL.md") + _ = data +} +`) + // skills/integration must exist for the sweep's ReadDir; leave it empty (no reads). + if err := os.MkdirAll(filepath.Join(root, "skills", "integration"), 0o755); err != nil { + t.Fatal(err) + } + + offenders := sweepInstructionReadsOutsideQuarantine(t, root) + if !contains(offenders, filepath.Join("internal", "hostneutrality", "leak_test.go")) { + t.Fatalf("sweep failed to flag the out-of-quarantine instruction read; offenders=%v", offenders) + } + otherDir := filepath.Join(root, "cmd", "spacedock") + if err := os.MkdirAll(otherDir, 0o755); err != nil { + t.Fatal(err) + } + writeFixture(t, filepath.Join(otherDir, "leak_test.go"), `package fixture +func TestLeak(t *T) { + data, _ := os.ReadFile("../../skills/first-officer/SKILL.md") + _ = data +} +`) + offenders = sweepInstructionReadsOutsideQuarantine(t, root) + if !contains(offenders, filepath.Join("cmd", "spacedock", "leak_test.go")) { + t.Fatalf("sweep failed to flag an instruction read outside the historical packages; offenders=%v", offenders) + } + for _, o := range offenders { + if filepath.Dir(o) == quarantinePkg { + t.Fatalf("sweep wrongly flagged the quarantine package's own read %q; the quarantine is the legal read path", o) + } + } +} + +func contains(in []string, want string) bool { + for _, s := range in { + if s == want { + return true + } + } + return false +} + +func writeFixture(t *testing.T, path, content string) { + t.Helper() + if err := os.WriteFile(path, []byte(content), 0o644); err != nil { + t.Fatal(err) + } +} + +// ensure go/parser + token stay referenced for the fixture-shape sanity below. +var _ = parser.ParseFile +var _ = token.NewFileSet diff --git a/internal/contractlint/boundary_guard_test.go b/internal/contractlint/boundary_guard_test.go new file mode 100644 index 00000000..7f9cc119 --- /dev/null +++ b/internal/contractlint/boundary_guard_test.go @@ -0,0 +1,168 @@ +// ABOUTME: The instruction-file-read boundary guard — instruction/prompt files are +// ABOUTME: read in tests ONLY here in the quarantine package; this sweep bans them everywhere else. +package contractlint + +import ( + "go/ast" + "go/parser" + "go/token" + "os" + "path/filepath" + "strings" + "testing" +) + +// READING INSTRUCTION/PROMPT FILES IN A TEST IS BANNED BY DEFAULT. +// +// An instruction file is a markdown surface the model ingests — a skill body +// (skills/**/SKILL.md), a contract/runtime reference (skills/**/references/*.md), +// or an agent definition (agents/*.md). The ONLY legal place a test reads one is +// THIS quarantine package, and only for a STRUCTURAL check — a defect a machine +// can see without reading prose for meaning: +// +// - a @-reference resolves to a real file on disk (ref-closure), +// - YAML frontmatter parses and declares its required keys (frontmatter-validity), +// - a retired path is ABSENT from the shipped surface (structural-absence), +// - a fact appears in exactly one source / a count holds (dedup), +// - a file clears a size floor (line-floor). +// +// What is BANNED, here and everywhere: +// +// - PROSE-GREP — asserting an instruction file contains (or lacks) its own prose. +// The file is its own source of truth, so the assertion is a tautology: it +// cannot fail for a real reason, and a meaning-inverting paraphrase keeps every +// grepped token. "The skill says X" is never proof the system DOES X. +// - CODE-BOUND-AS-BEHAVIOR-SUBSTITUTE — asserting an instruction file's prose +// matches a code value (a const, a router subcommand, a seam name). That is a +// consistency lint, not a behavior test. If the behavior matters there must be a +// BEHAVIOR test that RUNS it; the prose⟷code check never proved the behavior. +// +// Behavior is proven by RUNNING it — a live scenario, an offline command-level +// drive, a code-side invariant over real parsed source — never by reading prose. +// The reader-shape axis (an undeclared read hiding via an undiscovered read shape) +// is backstopped by the detached adversarial audit at every high-stakes gate, not +// by enumerating read shapes here. +// +// This guard is the polarity flip of the retired AC-3 marker sweep: that sweep let +// a test read prose if it declared a marker (a permission slip); this guard bans +// the read outright outside this package. There is no marker. There is no taint +// flow. A read outside the quarantine is a failure, full stop. + +// quarantinePkg is the only package directory where an instruction-file read is +// legal. Paths are repo-relative (the guard walks from the repo root). +const quarantinePkg = "internal/contractlint" + +// TestNoInstructionReadsOutsideQuarantine is the boundary guard, re-runnable +// offline. It parses every *_test.go under the repo and FAILS if any test FILE +// outside the quarantine package contains a function that reads an instruction +// file's content. The quarantine package itself is exempt — it is the single legal +// read path for the structural checks. The count of out-of-quarantine reader files +// must be zero. +func TestNoInstructionReadsOutsideQuarantine(t *testing.T) { + offenders := sweepInstructionReadsOutsideQuarantine(t, repoRoot(t)) + for _, o := range offenders { + t.Errorf("%s reads an instruction file's content in a test — instruction/prompt files are read ONLY in %s for structural checks; behavior is proven by RUNNING it, never by reading prose (see the package doc)", o, quarantinePkg) + } + if len(offenders) > 0 { + t.Fatalf("boundary guard: %d file(s) read instruction content outside %s; the count must be zero", len(offenders), quarantinePkg) + } +} + +// sweepInstructionReadsOutsideQuarantine returns the repo-relative paths of +// *_test.go files OUTSIDE the quarantine package that contain a function reading +// an instruction file's content. Exported logic so the guard's mutation control +// can drive it against a planted fixture directory. +func sweepInstructionReadsOutsideQuarantine(t *testing.T, repoRootDir string) []string { + t.Helper() + var offenders []string + err := filepath.WalkDir(repoRootDir, func(path string, d os.DirEntry, err error) error { + if err != nil { + return err + } + rel, relErr := filepath.Rel(repoRootDir, path) + if relErr != nil { + return relErr + } + if d.IsDir() { + switch rel { + case ".git", ".worktrees", "docs/dev/.spacedock-state", "vendor", quarantinePkg: + return filepath.SkipDir + } + return nil + } + if !strings.HasSuffix(d.Name(), "_test.go") { + return nil + } + fset := token.NewFileSet() + f, parseErr := parser.ParseFile(fset, path, nil, 0) + if parseErr != nil { + return parseErr + } + if fileReadsInstructionContent(f) { + offenders = append(offenders, rel) + } + return nil + }) + if err != nil { + t.Fatalf("sweep instruction reads under %s: %v", repoRootDir, err) + } + return sortedUnique(offenders) +} + +// instructionReadingTestFiles returns the base names of *_test.go files in dir +// that contain at least one function reading an instruction file's content. +func instructionReadingTestFiles(t *testing.T, dir string) []string { + t.Helper() + fset := token.NewFileSet() + entries, err := os.ReadDir(dir) + if err != nil { + t.Fatalf("read package dir %s: %v", dir, err) + } + var files []string + for _, ent := range entries { + name := ent.Name() + if ent.IsDir() || !strings.HasSuffix(name, "_test.go") { + continue + } + f, err := parser.ParseFile(fset, filepath.Join(dir, name), nil, 0) + if err != nil { + t.Fatalf("parse %s: %v", name, err) + } + if fileReadsInstructionContent(f) { + files = append(files, name) + } + } + return sortedUnique(files) +} + +// fileReadsInstructionContent reports whether any function declared in f reads an +// instruction file's content. +func fileReadsInstructionContent(f *ast.File) bool { + for _, decl := range f.Decls { + fn, ok := decl.(*ast.FuncDecl) + if !ok { + continue + } + if directlyReadsInstructionFile(fn) { + return true + } + } + return false +} + +func sortedUnique(in []string) []string { + seen := map[string]bool{} + var out []string + for _, s := range in { + if !seen[s] { + seen[s] = true + out = append(out, s) + } + } + for i := 1; i < len(out); i++ { + for j := i; j > 0 && out[j-1] > out[j]; j-- { + out[j-1], out[j] = out[j], out[j-1] + } + } + return out +} diff --git a/internal/contractlint/doc_test.go b/internal/contractlint/doc_test.go new file mode 100644 index 00000000..c5640392 --- /dev/null +++ b/internal/contractlint/doc_test.go @@ -0,0 +1,13 @@ +// ABOUTME: Package-level policy for the instruction-file-read quarantine. +// ABOUTME: Structural checks live here; prose-grep and code-bound behavior substitutes do not. +package contractlint + +// This package is the instruction-file-read quarantine for tests. Tests outside +// this package must not read skill, contract, runtime-adapter, or agent markdown. +// Tests inside it may read those files only for structural facts a machine can +// verify without interpreting prose: reference closure, frontmatter validity, +// structural absence, deduplication, line/count floors, and portability markers. +// +// Do not add prose-grep checks here. Do not add prose-to-code consistency checks +// as behavior substitutes. If behavior matters, test it by running the behavior; +// if no behavior test exists yet, delete the read and report the owed test. diff --git a/internal/contractlint/instruction_read_detector_test.go b/internal/contractlint/instruction_read_detector_test.go new file mode 100644 index 00000000..de1d364b --- /dev/null +++ b/internal/contractlint/instruction_read_detector_test.go @@ -0,0 +1,135 @@ +// ABOUTME: The instruction-file-read detector the boundary guard keys on — a +// ABOUTME: direct-read predicate (read sink + recognized instruction path, or a WalkDir .md collector). +package contractlint + +import ( + "go/ast" + "go/token" + "os" + "path/filepath" + "strings" + "testing" +) + +// readSinks are the call selectors that ingest a file's content given a path. +var readSinks = map[string]bool{ + "ReadFile": true, // os.ReadFile + "Open": true, // os.Open + "ReadAll": true, // io.ReadAll + "NewScanner": true, // bufio.NewScanner + "NewReader": true, // bufio.NewReader +} + +// instructionPathSegments are the skill-tree / contract path segments that mark a +// path literal as an instruction file the model ingests (a skill, contract, agent, +// or runtime adapter). A path carrying one of these segments is an instruction +// path, recognized even before a `.md` suffix is appended. +var instructionPathSegments = map[string]bool{ + "skills": true, + "references": true, + "agents": true, + "first-officer": true, + "ensign": true, + "commission": true, + "present-gate": true, + "SKILL.md": true, +} + +// directlyReadsInstructionFile reports whether fn DIRECTLY ingests a recognized +// instruction file's content: a read sink (ReadFile/Open/ReadAll/bufio) whose path +// arg subtree carries a recognized instruction literal/segment, or a WalkDir/Walk +// whose root arg carries a recognized instruction literal/segment and whose +// function body collects `.md` files. This is the direct-read predicate — no +// data-flow tracking. A read reached only through a param/local/field/method/ +// closure flow, a transitive helper chain, a range-element flow, a cross-package +// reader, or an unrecognized surface is NOT detected here; that reader-shape axis +// is the detached-audit-backstopped boundary documented in the package guard doc. +func directlyReadsInstructionFile(fn *ast.FuncDecl) bool { + found := false + ast.Inspect(fn, func(n ast.Node) bool { + call, ok := n.(*ast.CallExpr) + if !ok { + return true + } + sel, ok := call.Fun.(*ast.SelectorExpr) + if !ok { + return true + } + if readSinks[sel.Sel.Name] { + for _, arg := range call.Args { + if exprCarriesInstructionPath(arg) { + found = true + } + } + } + if (sel.Sel.Name == "WalkDir" || sel.Sel.Name == "Walk") && + len(call.Args) > 0 && + exprCarriesInstructionPath(call.Args[0]) && + fnFiltersInstructionMarkdown(fn) { + found = true + } + return true + }) + return found +} + +// exprCarriesInstructionPath reports whether expr DIRECTLY carries an instruction-file +// path anywhere in its subtree: an instruction-path literal/segment — so the +/ +// strings.Join/filepath.Join/fmt.Sprintf/string(...) path-build idioms (whose +// instruction operand is a subtree node) are covered when their operands are +// literals. +func exprCarriesInstructionPath(expr ast.Expr) bool { + hit := false + ast.Inspect(expr, func(n ast.Node) bool { + if x, ok := n.(*ast.BasicLit); ok { + if x.Kind == token.STRING && isInstructionPathLiteral(strings.Trim(x.Value, "`\"")) { + hit = true + } + } + return true + }) + return hit +} + +// fnFiltersInstructionMarkdown reports whether fn's body filters paths by a `.md` +// suffix — the WalkDir-collector signal. +func fnFiltersInstructionMarkdown(fn *ast.FuncDecl) bool { + hit := false + ast.Inspect(fn, func(n ast.Node) bool { + if lit, ok := n.(*ast.BasicLit); ok && lit.Kind == token.STRING { + if strings.HasSuffix(strings.Trim(lit.Value, "`\""), ".md") { + hit = true + } + } + return true + }) + return hit +} + +// isInstructionPathLiteral reports whether a string literal is (a fragment of) an +// instruction-file path: it carries a skill-tree/contract segment. A `.json` +// manifest path carries none and is not instruction. +func isInstructionPathLiteral(s string) bool { + if strings.HasSuffix(s, ".json") { + return false + } + for seg := range instructionPathSegments { + if s == seg || strings.Contains(s, seg) { + return true + } + } + return false +} + +// repoRoot returns the repository root, derived from this package's source dir +// (internal/contractlint), so the guard's filesystem walk is independent of the +// test's working directory. +func repoRoot(t *testing.T) string { + t.Helper() + wd, err := os.Getwd() + if err != nil { + t.Fatalf("getwd: %v", err) + } + // go test runs with cwd = the package source dir (internal/contractlint). + return filepath.Clean(filepath.Join(wd, "..", "..")) +} diff --git a/internal/contractlint/structural_checks_test.go b/internal/contractlint/structural_checks_test.go new file mode 100644 index 00000000..3baba8f3 --- /dev/null +++ b/internal/contractlint/structural_checks_test.go @@ -0,0 +1,393 @@ +// ABOUTME: The ONLY legal instruction-file reads in tests — structural checks a +// ABOUTME: machine can see (ref-closure, frontmatter-validity, structural-absence, dedup, no-machine-dependency). +package contractlint + +import ( + "os" + "path/filepath" + "regexp" + "strings" + "testing" +) + +// This file is the single quarantined read path the boundary guard exempts. Every +// check here is STRUCTURAL — it catches a defect a machine can see without reading +// prose for meaning. No check asserts an instruction file contains its own prose +// (prose-grep), and none asserts the prose matches a code value as a stand-in for a +// behavior test (code-bound). See boundary_guard_test.go for the full policy. + +// userSkills is the published user-skill surface: each owns a SKILL.md the host +// discovers. The test-only `integration` package is deliberately absent. +var userSkills = []string{ + "commission", "debrief", "refit", "ensign", + "first-officer", "using-claude-team", "present-gate", "feedback-rejection-flow", +} + +// skillsRoot is the shipped skill tree under test. +func skillsRoot(t *testing.T) string { + t.Helper() + return filepath.Join(repoRoot(t), "skills") +} + +// frontmatter returns the YAML frontmatter block (between the leading `---` and the +// next `---`) and whether the document opened with one. +func frontmatter(doc string) (string, bool) { + if !strings.HasPrefix(doc, "---\n") { + return "", false + } + rest := doc[len("---\n"):] + end := strings.Index(rest, "\n---") + if end < 0 { + return "", false + } + return rest[:end], true +} + +// TestUserSkillsParseWithFrontmatter is a frontmatter-VALIDITY check: each user +// skill ships a SKILL.md whose YAML frontmatter block parses and declares a `name` +// and a `description` key. A malformed or keyless frontmatter block fails host +// discovery — a real structural defect, not a prose property. +func TestUserSkillsParseWithFrontmatter(t *testing.T) { + root := skillsRoot(t) + for _, skill := range userSkills { + path := filepath.Join(root, skill, "SKILL.md") + data, err := os.ReadFile(path) + if err != nil { + t.Errorf("user skill %q has no SKILL.md at %s: %v", skill, path, err) + continue + } + fm, ok := frontmatter(string(data)) + if !ok { + t.Errorf("%s/SKILL.md has no parseable YAML frontmatter block", skill) + continue + } + if !frontmatterHasKey(fm, "name") { + t.Errorf("%s/SKILL.md frontmatter declares no `name` key", skill) + } + if !frontmatterHasKey(fm, "description") { + t.Errorf("%s/SKILL.md frontmatter declares no `description` key", skill) + } + } +} + +// frontmatterHasKey reports whether a top-level `key:` line is declared in a parsed +// frontmatter block. Structural (a declared key), not a prose match on its value. +func frontmatterHasKey(fm, key string) bool { + prefix := key + ":" + for _, line := range strings.Split(fm, "\n") { + if strings.HasPrefix(line, prefix) { + return true + } + } + return false +} + +// referenceRe matches the two reference-include forms a SKILL.md uses: an +// `@references/foo.md` directive and a bare `references/foo.md` read path. +var referenceRe = regexp.MustCompile(`@?(references/[A-Za-z0-9_./-]+\.md)`) + +// TestUserSkillReferenceClosureResolves is a ref-CLOSURE check: every +// `@references/...md` / `references/...md` path mentioned in a user SKILL.md +// resolves to a real file on disk under that skill's directory. A dangling +// reference (a ported skill pointing at a path that does not exist) is a real +// structural defect the host would hit at load time. Brace-placeholder template +// paths resolve against their concrete siblings. +func TestUserSkillReferenceClosureResolves(t *testing.T) { + root := skillsRoot(t) + for _, skill := range userSkills { + skillDir := filepath.Join(root, skill) + data, err := os.ReadFile(filepath.Join(skillDir, "SKILL.md")) + if err != nil { + t.Errorf("%s: %v", skill, err) + continue + } + for _, m := range referenceRe.FindAllStringSubmatch(string(data), -1) { + rel := m[1] + if strings.Contains(rel, "{") { + parent := filepath.Join(skillDir, filepath.Dir(rel)) + glob, _ := filepath.Glob(filepath.Join(parent, "*.md")) + if len(glob) == 0 { + t.Errorf("%s: templated reference %q has no concrete .md under %s", skill, rel, parent) + } + continue + } + if _, err := os.Stat(filepath.Join(skillDir, rel)); err != nil { + t.Errorf("%s: dangling reference %q (resolved %s): %v", skill, rel, filepath.Join(skillDir, rel), err) + } + } + } +} + +// piRuntimeAdapters are the Pi runtime adapter references each host-aware skill +// must ship as a loadable file on disk. +var piRuntimeAdapters = []struct{ skill, ref string }{ + {skill: "first-officer", ref: "references/pi-first-officer-runtime.md"}, + {skill: "ensign", ref: "references/pi-ensign-runtime.md"}, +} + +// TestPiRuntimeAdaptersResolveOnDisk is a ref-CLOSURE check: each declared Pi +// runtime adapter resolves to a real file on disk. (The retired prose-grep half — +// asserting the SKILL.md advertises the ref string — is gone; that the file LOADS +// is the structural fact, proven by os.Stat.) +func TestPiRuntimeAdaptersResolveOnDisk(t *testing.T) { + root := skillsRoot(t) + for _, tc := range piRuntimeAdapters { + path := filepath.Join(root, tc.skill, tc.ref) + if _, err := os.Stat(path); err != nil { + t.Errorf("%s: Pi runtime adapter %s is not loadable on disk: %v", tc.skill, tc.ref, err) + } + } +} + +// retiredPluginPrivatePaths are the plugin-private status paths the shipped +// instruction surface must NEVER name once it calls `spacedock status`. A +// re-introduced reference silently breaks the launcher contract on a fresh install. +var retiredPluginPrivatePaths = []string{ + "skills/commission/bin/status", + "commission/bin/status", + "{spacedock_plugin_dir}", + ".agents/plugins/marketplace.json", +} + +// TestRetiredPluginPrivatePathsAbsent is a structural-ABSENCE check: no shipped +// instruction file (skills/**/*.md excluding the test-only integration dir, plus +// the canonical mods/) names a retired plugin-private status path. No positive +// behavioral seam can prove an absence; a re-introduced path is a real defect a +// fresh install would hit. (Consolidates the three prior near-duplicate +// plugin-private-path absence checks into one read path.) +func TestRetiredPluginPrivatePathsAbsent(t *testing.T) { + files := shippedInstructionMarkdown(t) + if len(files) == 0 { + t.Fatal("walked zero shipped instruction files — scope bug; the absence check would pass vacuously") + } + for _, path := range files { + data, err := os.ReadFile(path) + if err != nil { + t.Errorf("read %s: %v", path, err) + continue + } + content := string(data) + for _, p := range retiredPluginPrivatePaths { + if strings.Contains(content, p) { + t.Errorf("%s names retired plugin-private path %q — it does not exist on a fresh install", path, p) + } + } + } +} + +// TestNoUnexpectedModHookOrPRMergeIntroduced is a structural-ABSENCE check over +// the shipped surface: outside the pre-existing hook convention docs and the +// canonical pr-merge mod, instruction files must not introduce lifecycle-mod +// headings or PR-merge invocations. A new `## Hook:` mod or PR-merge command in a +// skill silently changes dispatch lifecycle, so the allowed files are explicit. +func TestNoUnexpectedModHookOrPRMergeIntroduced(t *testing.T) { + allowedHookFiles := map[string]bool{ + filepath.Join("mods", "pr-merge.md"): true, + filepath.Join("skills", "first-officer", "references", "claude-first-officer-runtime.md"): true, + filepath.Join("skills", "first-officer", "references", "first-officer-shared-core.md"): true, + } + allowedPRMergeFiles := map[string]bool{ + filepath.Join("mods", "pr-merge.md"): true, + } + prMergeMarkers := []string{"gh pr merge", "git merge --no-ff", "git merge --ff-only main"} + root := repoRoot(t) + for _, path := range shippedInstructionMarkdown(t) { + rel, err := filepath.Rel(root, path) + if err != nil { + t.Errorf("rel %s: %v", path, err) + continue + } + data, err := os.ReadFile(path) + if err != nil { + t.Errorf("read %s: %v", path, err) + continue + } + content := string(data) + if strings.Contains(content, "## Hook:") && !allowedHookFiles[rel] { + t.Errorf("%s introduces a lifecycle hook heading outside the allowed hook surfaces", rel) + } + for _, marker := range prMergeMarkers { + if strings.Contains(content, marker) && !allowedPRMergeFiles[rel] { + t.Errorf("%s introduces PR-merge invocation %q outside the canonical pr-merge mod", rel, marker) + } + } + } +} + +// TestStartupGateGuidanceHasSingleSource is a DEDUP / single-source check: the +// startup-gate abort guidance lives in exactly ONE prose file +// (first-officer-shared-core.md), and agents/first-officer.md delegates rather than +// mirroring it. A second copy would let the two surfaces drift — a real structural +// defect (not a prose property): the count of files carrying the gate markers must +// be exactly one. +func TestStartupGateGuidanceHasSingleSource(t *testing.T) { + root := repoRoot(t) + markers := []string{"Contract version gate", "per-class remedy", "spacedock doctor"} + + var sources []string + for _, tree := range []string{"skills", "agents"} { + base := filepath.Join(root, tree) + err := filepath.WalkDir(base, func(p string, d os.DirEntry, err error) error { + if err != nil { + return err + } + if d.IsDir() || filepath.Ext(p) != ".md" { + return nil + } + b, readErr := os.ReadFile(p) + if readErr != nil { + return readErr + } + text := string(b) + for _, m := range markers { + if strings.Contains(text, m) { + rel, _ := filepath.Rel(root, p) + sources = append(sources, rel) + return nil + } + } + return nil + }) + if err != nil { + t.Fatalf("walk %s: %v", tree, err) + } + } + + const sole = "skills/first-officer/references/first-officer-shared-core.md" + if len(sources) != 1 || sources[0] != sole { + t.Errorf("startup-gate prose has sources %v, want exactly [%s] (single source of truth)", sources, sole) + } +} + +// homeRootedClaudeRe matches only HOME-rooted personal-config forms: a `~/.claude` +// tilde path, or `$HOME` / `os.UserHomeDir` joined with `.claude` on the same line. +// It does NOT match a project-relative `.claude/` path, which exists in any checkout +// and is portable. +var homeRootedClaudeRe = regexp.MustCompile(`~/\.claude|\$HOME[^\n]*\.claude|os\.UserHomeDir[^\n]*\.claude`) + +// interpreterRe matches an interpreter-on-PATH dependency: a `python`/`python3` +// shell-out or a `commission/bin/...` plugin-private helper invocation. +var interpreterRe = regexp.MustCompile(`\bpython3?\b|commission/bin`) + +// machineDependentPaths are plugin-private absolute paths that do not exist on a +// fresh install. +var machineDependentPaths = []string{ + "skills/commission/bin/status", + "{spacedock_plugin_dir}", + ".agents/plugins/marketplace.json", +} + +// isClaudeAdapter reports whether a shipped file is a Claude-host coupling surface +// (a claude-*-runtime.md adapter or the Claude-only using-claude-team skill), where +// a `~/.claude/teams` read is the legitimate quarantined coupling. ONLY the +// personal-config check excludes these; the interpreter / machine-path checks apply. +func isClaudeAdapter(path string) bool { + base := filepath.Base(path) + if strings.HasPrefix(base, "claude-") && strings.HasSuffix(base, "-runtime.md") { + return true + } + return strings.Contains(path, filepath.Join("using-claude-team", "SKILL.md")) +} + +// TestShippedSurfaceHasNoHiddenMachineDependency is a no-MACHINE-DEPENDENCY +// structural-absence check: the shipped instruction surface names none of three +// non-portable markers — a HOME-rooted personal-config path, an interpreter-on-PATH +// shell-out, or a plugin-private absolute path. A clean install must run for any +// user; a re-introduced marker is a real defect that breaks a fresh install. The +// empty-walk guard keeps it from passing vacuously. +func TestShippedSurfaceHasNoHiddenMachineDependency(t *testing.T) { + files := shippedInstructionMarkdown(t) + if len(files) == 0 { + t.Fatal("walked zero shipped instruction files — scope bug; the portability check would pass vacuously") + } + for _, path := range files { + data, err := os.ReadFile(path) + if err != nil { + t.Errorf("read %s: %v", path, err) + continue + } + content := string(data) + if !isClaudeAdapter(path) { + if m := homeRootedClaudeRe.FindString(content); m != "" { + t.Errorf("%s carries a HOME-rooted personal-config dependency %q — a clean install has no such file", path, m) + } + } + if m := interpreterRe.FindString(content); m != "" { + t.Errorf("%s carries an interpreter-on-PATH dependency %q — the dispatch path must not assume an installed interpreter/helper", path, m) + } + for _, p := range machineDependentPaths { + if strings.Contains(content, p) { + t.Errorf("%s bakes in plugin-private path %q — it does not exist on a fresh install", path, p) + } + } + } +} + +// TestPortabilityCheckDiscriminatesHostSpecific is the DISCRIMINATOR control for +// the no-machine-dependency check: it proves — against the real shipped surface — +// that the legitimately host-specific forms are present yet not flagged, so the +// absence check is not vacuous. The Claude adapter's `~/.claude/teams` read is +// present (the adapter exclusion is load-bearing), and the project-relative +// `.claude/` paths are present yet the HOME-rooted regex does not match them. +func TestPortabilityCheckDiscriminatesHostSpecific(t *testing.T) { + files := shippedInstructionMarkdown(t) + var sawAdapterHomeClaude, sawProjectRelativeClaude bool + for _, path := range files { + data, err := os.ReadFile(path) + if err != nil { + t.Errorf("read %s: %v", path, err) + continue + } + content := string(data) + if isClaudeAdapter(path) && strings.Contains(content, "~/.claude") { + sawAdapterHomeClaude = true + } + if !isClaudeAdapter(path) { + for _, line := range strings.Split(content, "\n") { + if !strings.Contains(line, ".claude/") { + continue + } + if strings.Contains(line, "~/.claude") || strings.Contains(line, "$HOME") { + continue + } + sawProjectRelativeClaude = true + if homeRootedClaudeRe.MatchString(line) { + t.Errorf("%s: project-relative .claude line wrongly matched the HOME-rooted regex (false positive): %q", path, strings.TrimSpace(line)) + } + } + } + } + if !sawAdapterHomeClaude { + t.Error("positive control missing: no Claude adapter carries a ~/.claude read — the adapter exclusion is no longer load-bearing") + } + if !sawProjectRelativeClaude { + t.Error("positive control missing: no shipped file carries a project-relative .claude/ path — the discriminator has nothing to discriminate") + } +} + +// shippedInstructionMarkdown returns every markdown file under skills/ (excluding +// the test-only integration dir) plus the canonical mods/ — the full shipped +// instruction surface the structural-absence and portability checks walk. This is +// the single instruction-surface walk for the quarantine package. +func shippedInstructionMarkdown(t *testing.T) []string { + t.Helper() + root := repoRoot(t) + var out []string + walk := func(base string) { + filepath.WalkDir(base, func(p string, d os.DirEntry, err error) error { + if err != nil { + return err + } + if d.IsDir() && d.Name() == "integration" { + return filepath.SkipDir + } + if !d.IsDir() && strings.HasSuffix(p, ".md") { + out = append(out, p) + } + return nil + }) + } + walk(filepath.Join(root, "skills")) + walk(filepath.Join(root, "mods")) + return out +} diff --git a/internal/hostneutrality/code_source_test.go b/internal/hostneutrality/code_source_test.go deleted file mode 100644 index cec1f1d8..00000000 --- a/internal/hostneutrality/code_source_test.go +++ /dev/null @@ -1,103 +0,0 @@ -// ABOUTME: Code-derived sources the hostneutrality re-binds bind to — host env-var -// ABOUTME: names and dispatch subcommands AST-extracted from the binary, so a check's expectation has an independent source. -package hostneutrality - -import ( - "go/ast" - "go/parser" - "go/token" - "path/filepath" - "testing" -) - -// repoRoot is the project root (two levels up from this package's source dir). -func repoRoot() string { - return filepath.Join("..", "..") -} - -// hostEnvVar AST-extracts the host-derivation env-var name the binary reads from -// internal/dispatch/build.go (the `getenv("CODEX_THREAD_ID")` / "CLAUDECODE" -// selectors). It returns the name if the binary reads it, else "". This is the -// independent source for "the skill branches on the same env var the binary reads": -// if the binary stops reading the var, or the skill stops branching on it, the two -// diverge and a check binding to this reds. -func hostEnvVar(t *testing.T, name string) string { - t.Helper() - src := filepath.Join(repoRoot(), "internal", "dispatch", "build.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse build.go: %v", err) - } - found := "" - ast.Inspect(f, func(n ast.Node) bool { - lit, ok := n.(*ast.BasicLit) - if !ok || lit.Kind != token.STRING { - return true - } - if trimLit(lit.Value) == name { - found = name - return false - } - return true - }) - return found -} - -// dispatchSubcommands AST-extracts the dispatch subcommand names the binary routes -// from internal/dispatch/dispatch.go's Run switch — the independent source for the -// claude-helper / relocated-command checks (so a renamed subcommand shifts the set -// rather than the test self-matching a frozen literal). -func dispatchSubcommands(t *testing.T) map[string]bool { - t.Helper() - src := filepath.Join(repoRoot(), "internal", "dispatch", "dispatch.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse dispatch.go: %v", err) - } - subs := map[string]bool{} - ast.Inspect(f, func(n ast.Node) bool { - fn, ok := n.(*ast.FuncDecl) - if !ok || fn.Name.Name != "Run" { - return true - } - ast.Inspect(fn, func(m ast.Node) bool { - sw, ok := m.(*ast.SwitchStmt) - if !ok { - return true - } - tag, ok := sw.Tag.(*ast.IndexExpr) - if !ok { - return true - } - if id, ok := tag.X.(*ast.Ident); !ok || id.Name != "args" { - return true - } - for _, stmt := range sw.Body.List { - cc, ok := stmt.(*ast.CaseClause) - if !ok { - continue - } - for _, e := range cc.List { - if lit, ok := e.(*ast.BasicLit); ok && lit.Kind == token.STRING { - subs[trimLit(lit.Value)] = true - } - } - } - return false - }) - return false - }) - if len(subs) == 0 { - t.Fatal("extracted zero dispatch subcommands from dispatch.go") - } - return subs -} - -func trimLit(s string) string { - if len(s) >= 2 && (s[0] == '"' || s[0] == '`') { - return s[1 : len(s)-1] - } - return s -} diff --git a/internal/hostneutrality/codex_runtime_contract_test.go b/internal/hostneutrality/codex_runtime_contract_test.go deleted file mode 100644 index 716291e7..00000000 --- a/internal/hostneutrality/codex_runtime_contract_test.go +++ /dev/null @@ -1,90 +0,0 @@ -// ABOUTME: Codex runtime adapter contract tests — the skill surface loads -// ABOUTME: dedicated Codex FO/ensign adapters with mailbox wait semantics. -package hostneutrality - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// TestCodexRuntimeAdaptersAreLoadable is a code-bound invariant: each runtime -// SKILL.md branches on the SAME host env var the binary reads -// (CODEX_THREAD_ID, AST-extracted from internal/dispatch/build.go) and loads its -// Codex adapter file. The env-var expectation comes from the binary's -// host-derivation code, not a literal frozen against the skill — if the binary -// stops reading CODEX_THREAD_ID, or the skill stops branching on it, the two -// diverge and this reds. The adapter-content tokens are the remaining -// text-consistency portion. -func TestCodexRuntimeAdaptersAreLoadable(t *testing.T) { - markCodeBoundInvariant(t, "hostEnvVar CODEX_THREAD_ID (internal/dispatch/build.go host-derivation)") - envVar := hostEnvVar(t, "CODEX_THREAD_ID") - if envVar == "" { - t.Fatal("the binary no longer reads CODEX_THREAD_ID for host derivation — the env var the skill must branch on is gone") - } - root := filepath.Join("..", "..") - cases := []struct { - name string - skillPath string - adapter string - }{ - { - name: "first-officer", - skillPath: filepath.Join(root, "skills", "first-officer", "SKILL.md"), - adapter: filepath.Join(root, "skills", "first-officer", "references", "codex-first-officer-runtime.md"), - }, - { - name: "ensign", - skillPath: filepath.Join(root, "skills", "ensign", "SKILL.md"), - adapter: filepath.Join(root, "skills", "ensign", "references", "codex-ensign-runtime.md"), - }, - } - - for _, tc := range cases { - t.Run(tc.name, func(t *testing.T) { - skill := readText(t, tc.skillPath) - adapterBase := filepath.Base(tc.adapter) - if !strings.Contains(skill, envVar) || !strings.Contains(skill, adapterBase) { - t.Fatalf("%s SKILL.md must branch on %s (the binary's host-derivation env var) and load %s:\n%s", tc.name, envVar, adapterBase, skill) - } - - body := readText(t, tc.adapter) - for _, want := range []string{"## Awaiting Completion", "## Dispatch", "send_input", "## Completion Signal", "Codex declares none"} { - if !strings.Contains(body, want) { - t.Fatalf("%s adapter missing %q:\n%s", tc.name, want, body) - } - } - }) - } -} - -// TestCodexAwaitingCompletionPinsMailboxSemantics is a non-AC text-consistency -// lint: the Codex FO adapter carries the mailbox-wait clauses (async final-status -// notification, wait_agent-timeout-is-normal, do-not-poll). Per the proof policy -// this presence check does NOT prove the FO obeys the mailbox semantics; the -// behavior is exercised by the Codex live runner's awaiting-completion path -// (codex_live_runner_test.go / codex_idle_notification_test.go). This lint guards -// the clauses being dropped from the adapter. -func TestCodexAwaitingCompletionPinsMailboxSemantics(t *testing.T) { - markNonAC(t, "Codex live runner awaiting-completion path (internal/ensigncycle codex_live_runner + codex_idle_notification)") - body := readText(t, filepath.Join("..", "..", "skills", "first-officer", "references", "codex-first-officer-runtime.md")) - for _, want := range []string{ - "async final-status notification in the FO mailbox", - "wait_agent timeout return is normal", - "do not poll, re-dispatch, or close the worker", - } { - if !strings.Contains(body, want) { - t.Fatalf("Codex FO adapter must pin waiting clause %q:\n%s", want, body) - } - } -} - -func readText(t *testing.T, path string) string { - t.Helper() - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - return string(data) -} diff --git a/internal/hostneutrality/ensign_dev_leakage_locks_test.go b/internal/hostneutrality/ensign_dev_leakage_locks_test.go deleted file mode 100644 index 039a03a8..00000000 --- a/internal/hostneutrality/ensign_dev_leakage_locks_test.go +++ /dev/null @@ -1,182 +0,0 @@ -// ABOUTME: Lock-in oracle that the universal ensign contract carries no dev-only -// ABOUTME: authoring discipline, and that the dev disciplines survive in their dev homes. -package hostneutrality - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// devLeakageLiterals are dev-workflow-specific phrases that must NOT appear in -// the universal shared cores. They fingerprint the test-first authoring bullet -// and the code-substrate worktree noun that leaked into the portable ensign -// contract. "failing test" is deliberately absent — it appears legitimately at -// first-officer-shared-core.md ("enforced by the binary or a failing test"), -// and a flat all-files table would false-red on that legit FO usage. -var devLeakageLiterals = []struct { - literal string - caseSensitive bool -}{ - // AC-1: the test-first authoring bullet's unique fingerprints. - {"feature or bugfix", false}, - {"the test is what the gate judges", false}, - // AC-3: the code-substrate worktree noun, banned as the exact uppercased - // form it appears in today. - {"CODE only", true}, -} - -// devLeakageCorePaths are the universal cores swept for dev-leakage. The -// runtime adapters are policed separately by the AC-4 field-enumeration check. -var devLeakageCorePaths = []string{ - filepath.Join("..", "..", "skills", "first-officer", "references", "first-officer-shared-core.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "ensign-shared-core.md"), -} - -// TestNoDevLeakageInUniversalCore locks AC-1 and AC-3: the universal shared -// cores must not assert dev-only authoring discipline (test-first) nor name the -// code substrate in the worktree-isolation clause. A re-introduction of the -// banned literal fails the test (negative proof of lock-in). -func TestNoDevLeakageInUniversalCore(t *testing.T) { - markNonAC(t, "text-hygiene lint, NOT a behavioral claim — a property of the text (the universal core stays free of dev-discipline prose). No behavioral oracle: there is nothing for the FO/ensign to DO; the value is catching accidental dev-leakage back into the universal contract.") - for _, path := range devLeakageCorePaths { - t.Run(filepath.Base(path), func(t *testing.T) { - body, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - text := string(body) - lowerText := strings.ToLower(text) - - for _, banned := range devLeakageLiterals { - hit := false - if banned.caseSensitive { - hit = strings.Contains(text, banned.literal) - } else { - hit = strings.Contains(lowerText, strings.ToLower(banned.literal)) - } - // "feature or bugfix" and "the test is what the gate judges" - // uniquely fingerprint the ensign TDD bullet; "CODE only" is in - // both cores today. All three must be absent post-sweep. - if hit { - t.Errorf("%s contains banned dev-leakage literal %q — dev-only discipline re-homed into the universal core", path, banned.literal) - } - } - }) - } -} - -// TestWorktreeIsolationClauseSurvives locks AC-3's other half: removing the -// "CODE only" noun must NOT delete the worktree-isolation boundary — both cores -// must still carry an isolation clause naming the worktree. -func TestWorktreeIsolationClauseSurvives(t *testing.T) { - markNonAC(t, "text-hygiene lint, NOT a behavioral claim — a property of the text (an isolation clause survives in the cores). No behavioral oracle: the worktree-isolation BEHAVIOR is enforced by the dispatch worktree machinery, not this clause; the lint only guards the clause from being deleted when the substrate noun is neutralized.") - for _, path := range devLeakageCorePaths { - t.Run(filepath.Base(path), func(t *testing.T) { - body, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - lower := strings.ToLower(string(body)) - if !strings.Contains(lower, "worktree isolate") && !strings.Contains(lower, "isolate") { - t.Errorf("%s lost its worktree-isolation clause — neutralizing the substrate noun must not delete the boundary", path) - } - if !strings.Contains(lower, "worktree") { - t.Errorf("%s no longer mentions the worktree — isolation boundary gone", path) - } - }) - } -} - -// fieldEnumerationRe captures the runtime-adapter sentence that enumerates the -// always-present assignment fields, from "authoritative for all assignment -// fields:" to the end of the sentence. The AC-4 check scopes ONLY to this -// sentence so the legitimate conditional "if you were given a worktree path" -// at ensign-shared-core.md:17 stays untouched. -var fieldEnumerationRe = regexp.MustCompile(`(?i)authoritative for all assignment fields:[^.]*\.`) - -// runtimeAdapterFieldPaths are the two ensign runtime adapters whose -// field-enumeration sentence must use the neutral location vocabulary. -var runtimeAdapterFieldPaths = []string{ - filepath.Join("..", "..", "skills", "ensign", "references", "claude-ensign-runtime.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "codex-ensign-runtime.md"), -} - -// TestRuntimeAdaptersUseNeutralLocationVocabulary locks AC-4: each runtime -// adapter's assignment-field enumeration lists "workflow location" and NOT -// "worktree path". Scoped to the field-enumeration sentence — a file-wide ban -// would false-fail on the legitimate conditional usage elsewhere. -func TestRuntimeAdaptersUseNeutralLocationVocabulary(t *testing.T) { - markNonAC(t, "text-hygiene lint, NOT a behavioral claim — a property of the text (the field-enumeration sentence uses neutral location vocabulary). No behavioral oracle and no independent code source: the vocabulary choice is prose hygiene; the lint guards against the banned 'worktree path' wording creeping back.") - for _, path := range runtimeAdapterFieldPaths { - t.Run(filepath.Base(path), func(t *testing.T) { - body, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - loc := fieldEnumerationRe.FindString(string(body)) - if loc == "" { - t.Fatalf("%s has no assignment-field enumeration sentence anchored on 'authoritative for all assignment fields:'", path) - } - lower := strings.ToLower(loc) - if strings.Contains(lower, "worktree path") { - t.Errorf("%s field enumeration still lists 'worktree path' — use the neutral 'workflow location'; sentence=%q", path, loc) - } - if !strings.Contains(lower, "workflow location") { - t.Errorf("%s field enumeration does not list 'workflow location'; sentence=%q", path, loc) - } - }) - } -} - -// devHomePresence maps each dev home to a required clause proving the lift -// RELOCATED rather than DELETED the dev discipline. Inverse polarity of the -// banned-literal tables: this errors on ABSENCE. -var devHomePresence = []struct { - path string - required string - sectionRe *regexp.Regexp // when non-nil, the required clause must sit inside this section -}{ - { - path: filepath.Join("..", "..", "skills", "commission", "references", "templates", "development.md"), - // A body phrase from the relocated test-first guidance, NOT the - // "### Test-first authoring" heading word — so the relocate-not-delete - // guarantee rests on the guidance body, not the heading. - required: "watch it fail for the right reason", - sectionRe: regexp.MustCompile(`(?is)## Recommended practices \(opt-in\).*`), - }, - { - path: filepath.Join("..", "..", "docs", "dev", "README.md"), - required: "real, checkable change", - }, -} - -// TestDevDisciplinesSurviveInDevHomes locks AC-2: the re-homed dev guidance is -// present in its dev-specific home — development.md carries the test-first -// guidance BODY (a body phrase, not just the heading word) inside its -// Recommended-practices section, and docs/dev/README.md retains the "real, -// checkable change" deliverable-proof policy. Fails if a future edit strips a -// dev home's guidance. -func TestDevDisciplinesSurviveInDevHomes(t *testing.T) { - markNonAC(t, "text-hygiene lint, NOT a behavioral claim — a property of the text (the re-homed dev guidance survives in its dev home). No behavioral oracle and no independent code source: it is a relocate-not-delete prose consistency check, valued for catching a dev home's guidance being stripped.") - for _, h := range devHomePresence { - t.Run(filepath.Base(h.path), func(t *testing.T) { - body, err := os.ReadFile(h.path) - if err != nil { - t.Fatalf("read %s: %v", h.path, err) - } - scope := string(body) - if h.sectionRe != nil { - scope = h.sectionRe.FindString(scope) - if scope == "" { - t.Fatalf("%s missing the section that must hold %q", h.path, h.required) - } - } - if !strings.Contains(strings.ToLower(scope), strings.ToLower(h.required)) { - t.Errorf("%s lost the re-homed dev clause %q — the lift must relocate, not delete", h.path, h.required) - } - }) - } -} diff --git a/internal/hostneutrality/live_scenario_practice_test.go b/internal/hostneutrality/live_scenario_practice_test.go deleted file mode 100644 index 3a1303cb..00000000 --- a/internal/hostneutrality/live_scenario_practice_test.go +++ /dev/null @@ -1,55 +0,0 @@ -// ABOUTME: AC-3 presence lock — the dev template carries live-scenario-for- -// ABOUTME: runtime-claims guidance as a third opt-in recommended practice. -package hostneutrality - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// recommendedPracticesSectionRe scopes the presence check to the dev template's -// "## Recommended practices (opt-in)" section, the same scoping the existing -// External-proof / Detached-audit / Test-first locks use, so a stray mention of -// "live scenario" elsewhere in the file cannot satisfy the lock. -var recommendedPracticesSectionRe = regexp.MustCompile(`(?is)## Recommended practices \(opt-in\).*`) - -// TestLiveScenarioRecommendedPracticePresent locks AC-3: the dev template -// (skills/commission/references/templates/development.md) names the live- -// scenario pattern for runtime claims as a THIRD opt-in recommended practice -// beside External-proof and Detached-audit. The required clauses encode the -// pattern's load-bearing distinction (a recording proves the WATCHER not the -// producer; a contract-text check proves the WORDS not the behaviour), so the -// lock rests on the guidance's substance, not merely a heading word. Same kind -// of presence check that guards the existing recommended-practice blocks; the -// claim is about the text itself, so proof at the claim's own level is legit. -func TestLiveScenarioRecommendedPracticePresent(t *testing.T) { - markNonAC(t, "n/a — the claim is about the dev-template text itself (the live-scenario practice is documented); proof at the claim's own level") - path := filepath.Join("..", "..", "skills", "commission", "references", "templates", "development.md") - body, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - section := recommendedPracticesSectionRe.FindString(string(body)) - if section == "" { - t.Fatalf("%s missing the Recommended-practices section", path) - } - lower := strings.ToLower(section) - - // A heading naming the live-scenario practice — the third opt-in block. - if !strings.Contains(lower, "live scenario") { - t.Errorf("%s recommended-practices section does not name the live-scenario practice", path) - } - // The load-bearing distinction the pattern exists to teach. - for _, want := range []string{ - "recording proves the watcher", - "text", - "durable", - } { - if !strings.Contains(lower, want) { - t.Errorf("%s live-scenario block missing required substance %q", path, want) - } - } -} diff --git a/internal/hostneutrality/nonac_marker_test.go b/internal/hostneutrality/nonac_marker_test.go deleted file mode 100644 index 40416722..00000000 --- a/internal/hostneutrality/nonac_marker_test.go +++ /dev/null @@ -1,407 +0,0 @@ -// ABOUTME: The non-AC text-consistency marker + the AC-3 sweep meta-test for the -// ABOUTME: hostneutrality suite — a presence/absence check over an instruction file proves nothing unless it self-classifies. -package hostneutrality - -import ( - "go/ast" - "go/parser" - "go/token" - "os" - "strings" - "testing" -) - -// markNonAC declares a test a non-AC text-consistency lint (the prose carries a -// required clause / stays free of a banned token), NOT a behavioral proof, naming -// the behavioral oracle (a live drive, a code-side test, or "n/a — the claim is -// about the text"). The proof policy (f8b257cf) bans a string match over an -// instruction file the model reads as proof of any behavioral acceptance -// criterion. The AC-3 sweep (TestNoUndeclaredHostneutralityTautology) keys on this -// call; it does nothing at runtime. -func markNonAC(t *testing.T, oracle string) { - t.Helper() - if oracle == "" { - t.Fatal("markNonAC requires a non-empty behavioral oracle reference") - } -} - -// markCodeBoundInvariant declares a test's expectation comes from an independent -// code-side source (a Go const, an env-var token the binary defines, a dispatch -// subcommand, a DIFFERENT file's n-gram) that can DIVERGE from the file under -// test — a legitimate invariant, not a tautology. The sweep treats it as declared. -func markCodeBoundInvariant(t *testing.T, source string) { - t.Helper() - if source == "" { - t.Fatal("markCodeBoundInvariant requires a non-empty independent-source reference") - } -} - -// instructionFileReaders are the named helpers that read a markdown instruction file -// the model ingests (a skill or contract) — the allowlist of recognized readers. A -// test that calls one is reading an ingested file (the READ alone triggers the -// must-declare rule; how it then inspects the bytes is irrelevant). Tests that scan -// CODE (host_neutrality_test.go's scanFile over .go files via parser.ParseFile, not a -// content read sink) are NOT in this set, so the sweep does not flag the legitimate -// go/parser code invariants. -var instructionFileReaders = map[string]bool{ - "readSkill": true, - "readText": true, - // The markdown-span parsers read an instruction file internally; a test that - // drives one is reading an instruction file even though the os.ReadFile lives - // one frame down. - "parseSpans": true, - "parseProseSpansForOverlap": true, -} - -// instructionPathIdents are the package-level path variables that resolve to a -// markdown instruction file — a declared allowlist of recognized path-carrying vars. -// A test that reads one of these via a read sink (os.ReadFile/os.Open/io.ReadAll/ -// bufio) is reading an ingested file — the read triggers the must-declare rule -// regardless of how the bytes are then inspected. (Code-scanning tests reference -// ../dispatch, ../status package dirs, never these.) A path var NOT in this list is -// not recognized; whether such a read should declare is an instance of the -// reader-axis bound the detached adversarial audit backstops (see the sweep doc). -var instructionPathIdents = map[string]bool{ - "foCorePath": true, - "ensignCorePath": true, - "commissionSkillPath": true, - "sharedCorePath": true, - "claudeRuntimePath": true, - "contractProseFiles": true, - "sharedCorePaths": true, - "runtimeAdapterPaths": true, - "devLeakageCorePaths": true, - "runtimeAdapterFieldPaths": true, - "devHomePresence": true, -} - -// TestNoUndeclaredHostneutralityTautology is the AC-3 sweep for this package, -// re-runnable offline. It parses every *_test.go and flags any test function that -// READS a recognized markdown INSTRUCTION file's content — via a named reader -// helper, a DIRECT os.ReadFile/os.Open/io read of a recognized instruction -// literal/segment/path-ident, or a WalkDir-collected `.md` — UNLESS it self-classifies -// via markNonAC or markCodeBoundInvariant. The go/parser code-scan invariants -// (host_neutrality_test.go's scanFile over .go source via parser.ParseFile, NOT a -// content read sink) and the spanHostQualified unit test are NOT flagged: they read no -// instruction file. The undeclared-offender count is the AC-3 metric; it must be zero. -// -// What the guard guarantees, and what it deliberately does NOT (two axes): -// -// - MATCH axis (closed, universal, load-bearing — STATICALLY GUARDED): the sweep -// keys on the READ, not on how the bytes are inspected. ONCE a read of a -// recognized instruction file is detected, the test MUST declare regardless of -// the inspection idiom (strings.Contains/Index/EqualFold, bytes.*, -// regexp.Regexp.Match, a bare ==) — the trigger is the ingest, not the match, so -// the whole match class is closed. -// -// - READER axis (does an undeclared instruction-file read hide via an undiscovered -// read SHAPE? — DETACHED-AUDIT-BACKSTOPPED, NOT statically guarded): a read is -// detected only when it is DIRECT — an in-package read sink whose path arg subtree -// carries a recognized instruction literal/segment or an instructionPathIdent -// package var, a named instructionFileReaders helper, or a WalkDir `.md` -// collector. A read reached through any other shape — param/local/field/method/ -// closure taint flow, a transitive helper chain, a `[]string`/range-element flow, -// a cross-package reader, a package var defined in another file, or an -// unrecognized surface (AGENTS.md, mods/*.md) — is NOT statically flagged. This is -// a deliberate, documented boundary: a per-package go/ast scan structurally cannot -// see a cross-package read or a path built in another file, so chasing reader-shape -// completeness was the enumeration trap the proof policy itself warns against. The -// reader axis is covered by the detached adversarial audit required at every -// high-stakes-surface gate (the validation-stage policy), NOT by this sweep. The -// prior cycles' M-A/B/C/D shapes AND the two declared range-var/cross-statement HN -// reads (TestDevDisciplinesSurviveInDevHomes, TestLiveScenarioRecommendedPractice- -// Present — both carry markNonAC) are this audited boundary, not silent gaps. -func TestNoUndeclaredHostneutralityTautology(t *testing.T) { - offenders := sweepHostneutralityTautologies(t, ".") - for _, o := range offenders { - t.Errorf("%s reads a markdown instruction file's content without self-classifying — call markNonAC (with its behavioral oracle) or markCodeBoundInvariant (with its independent source); how the bytes are inspected does not matter", o) - } - if len(offenders) > 0 { - t.Fatalf("AC-3 sweep: %d undeclared tautological-behavioral-proof test(s) in hostneutrality; the count must be zero", len(offenders)) - } -} - -func sweepHostneutralityTautologies(t *testing.T, dir string) []string { - t.Helper() - fset := token.NewFileSet() - entries, err := os.ReadDir(dir) - if err != nil { - t.Fatalf("read package dir %s: %v", dir, err) - } - var files []*ast.File - for _, ent := range entries { - name := ent.Name() - if ent.IsDir() || !strings.HasSuffix(name, "_test.go") { - continue - } - f, err := parser.ParseFile(fset, dir+"/"+name, nil, 0) - if err != nil { - t.Fatalf("parse %s: %v", name, err) - } - files = append(files, f) - } - - // First pass: the recognized reader set is the named instructionFileReaders - // allowlist plus any non-test helper that DIRECTLY reads a recognized instruction - // file (readsInstructionContent — a read sink whose path arg carries a recognized - // instruction literal/segment/path-ident, or a WalkDir-collected `.md`). The - // code-scan helper scanFile is NOT a reader: it uses parser.ParseFile (not a - // content read sink) over a `../dispatch` path (no instruction literal). - readers := map[string]bool{} - for r := range instructionFileReaders { - readers[r] = true - } - for _, f := range files { - for _, decl := range f.Decls { - fn, ok := decl.(*ast.FuncDecl) - if !ok || strings.HasPrefix(fn.Name.Name, "Test") { - continue - } - if readsInstructionContent(fn) { - readers[fn.Name.Name] = true - } - } - } - - // Second pass: a test is an offender if it ingests instruction-file content — - // directly (readsInstructionContent) or via a recognized reader helper — and does - // NOT declare its proof standing. The sweep keys on the READ, not a match-func - // allowlist. - var offenders []string - for _, f := range files { - for _, decl := range f.Decls { - fn, ok := decl.(*ast.FuncDecl) - if !ok || !strings.HasPrefix(fn.Name.Name, "Test") { - continue - } - calls := collectCalls(fn) - readsInstruction := readsInstructionContent(fn) - for r := range readers { - if calls[r] { - readsInstruction = true - break - } - } - declared := calls["markNonAC"] || calls["markCodeBoundInvariant"] - if readsInstruction && !declared { - offenders = append(offenders, fn.Name.Name) - } - } - } - return sortedUniqueHN(offenders) -} - -// collectCalls walks a function body and returns the set of called function names -// (bare and selector trailing name), used to detect reader-helper calls and the -// markNonAC / markCodeBoundInvariant declarations. -func collectCalls(fn *ast.FuncDecl) map[string]bool { - calls := map[string]bool{} - ast.Inspect(fn, func(n ast.Node) bool { - if call, ok := n.(*ast.CallExpr); ok { - switch f := call.Fun.(type) { - case *ast.Ident: - calls[f.Name] = true - case *ast.SelectorExpr: - calls[f.Sel.Name] = true - } - } - return true - }) - return calls -} - -// readSinks are the call selectors that ingest a file's content given a path. -var readSinks = map[string]bool{ - "ReadFile": true, // os.ReadFile - "Open": true, // os.Open - "ReadAll": true, // io.ReadAll - "NewScanner": true, // bufio.NewScanner - "NewReader": true, // bufio.NewReader -} - -// readsInstructionContent reports whether fn DIRECTLY ingests a recognized -// instruction file's content: a read sink (ReadFile/Open/ReadAll/bufio) whose path -// arg subtree carries a recognized instruction literal/segment (isInstructionPathLiteral) -// or an instructionPathIdent package var, or a WalkDir-collected instruction `.md`. -// This is the direct-read predicate — no taint-flow tracking. A read reached only -// through a param/local/field/method/closure flow, a transitive helper chain, or a -// range-element flow is NOT detected here; that reader-shape axis is the -// detached-audit-backstopped boundary documented on TestNoUndeclaredHostneutralityTautology. -func readsInstructionContent(fn *ast.FuncDecl) bool { - found := false - ast.Inspect(fn, func(n ast.Node) bool { - call, ok := n.(*ast.CallExpr) - if !ok { - return true - } - sel, ok := call.Fun.(*ast.SelectorExpr) - if !ok { - return true - } - if readSinks[sel.Sel.Name] { - for _, arg := range call.Args { - if exprInstructionTainted(arg) { - found = true - } - } - } - if (sel.Sel.Name == "WalkDir" || sel.Sel.Name == "Walk") && fnFiltersInstructionMarkdown(fn) { - found = true - } - return true - }) - return found -} - -// exprInstructionTainted reports whether expr DIRECTLY carries an instruction-file -// path anywhere in its subtree: an instruction-path literal/segment, or a known -// instructionPathIdent package var — so the +/strings.Join/filepath.Join/fmt.Sprintf/ -// string(...) path-build idioms (whose instruction operand is a subtree node) are -// covered when their operands are literals or recognized vars. -func exprInstructionTainted(expr ast.Expr) bool { - hit := false - ast.Inspect(expr, func(n ast.Node) bool { - switch x := n.(type) { - case *ast.BasicLit: - if x.Kind == token.STRING && isInstructionPathLiteral(strings.Trim(x.Value, "`\"")) { - hit = true - } - case *ast.Ident: - if instructionPathIdents[x.Name] { - hit = true - } - } - return true - }) - return hit -} - -// fnFiltersInstructionMarkdown reports whether fn's body filters paths by a `.md` -// suffix — the WalkDir-collector signal. -func fnFiltersInstructionMarkdown(fn *ast.FuncDecl) bool { - hit := false - ast.Inspect(fn, func(n ast.Node) bool { - if lit, ok := n.(*ast.BasicLit); ok && lit.Kind == token.STRING { - if strings.HasSuffix(strings.Trim(lit.Value, "`\""), ".md") { - hit = true - } - } - return true - }) - return hit -} - -// instructionPathSegments are the skill-tree / contract path segments that mark a -// path literal as an instruction file. The RECOGNIZED-instruction-surface predicate -// (a deliberate bound, not universal): a path carrying one of these listed segments -// is an instruction path, recognized even before a `.md` suffix is appended (so a -// Join/split-built suffix still matches on the base segment). -// -// A real instruction surface whose path carries NONE of these segments (e.g. -// AGENTS.md, mods/*.md) is not recognized and a read of it is not statically flagged. -// That reader-shape gap is the detached-audit-backstopped boundary documented on -// TestNoUndeclaredHostneutralityTautology — covered by the adversarial audit at the -// high-stakes gate, not by enumerating more segments here. -var instructionPathSegments = map[string]bool{ - "skills": true, - "references": true, - "agents": true, - "first-officer": true, - "ensign": true, - "commission": true, - "present-gate": true, - "SKILL.md": true, -} - -// isInstructionPathLiteral reports whether a string literal is (a fragment of) an -// instruction-file path: it carries a skill-tree/contract segment. A `.json` -// manifest path carries none and is not instruction. -func isInstructionPathLiteral(s string) bool { - if strings.HasSuffix(s, ".json") { - return false - } - for seg := range instructionPathSegments { - if s == seg || strings.Contains(s, seg) { - return true - } - } - return false -} - -func sortedUniqueHN(in []string) []string { - seen := map[string]bool{} - var out []string - for _, s := range in { - if !seen[s] { - seen[s] = true - out = append(out, s) - } - } - for i := 1; i < len(out); i++ { - for j := i; j > 0 && out[j-1] > out[j]; j-- { - out[j-1], out[j] = out[j], out[j-1] - } - } - return out -} - -func containsStrHN(in []string, want string) bool { - for _, s := range in { - if s == want { - return true - } - } - return false -} - -// TestHostneutralitySweepDetectsAnUndeclaredTautology is the mutation control for -// the sweep: it must RED on the shape it polices and GREEN once that shape -// self-classifies. Writes synthetic fixtures to a temp dir and runs the sweep. -func TestHostneutralitySweepDetectsAnUndeclaredTautology(t *testing.T) { - dir := t.TempDir() - undeclared := `package fixture -import "strings" -func TestUndeclaredHN(t *T) { - text := readSkill(t, foCorePath) - if strings.Contains(text, "x") { _ = text } -} -` - declared := `package fixture -func TestDeclaredHN(t *T) { - markNonAC(t, "live split-root-halt scenario") - text := readSkill(t, foCorePath) - if strings.Contains(text, "x") { _ = text } -} -` - codeScan := `package fixture -func TestCodeScanHN(t *T) { - leaks := scanFile(t, "../dispatch/x.go") - if strings.Contains(leaks[0].text, ".claude") { _ = leaks } -} -` - writeFixture(t, dir+"/undeclared_test.go", undeclared) - off := sweepHostneutralityTautologies(t, dir) - if !containsStrHN(off, "TestUndeclaredHN") { - t.Fatalf("sweep failed to flag an undeclared instruction-file presence check; offenders=%v", off) - } - - writeFixture(t, dir+"/declared_test.go", declared) - writeFixture(t, dir+"/codescan_test.go", codeScan) - off = sweepHostneutralityTautologies(t, dir) - if containsStrHN(off, "TestDeclaredHN") { - t.Fatalf("sweep wrongly flagged a declared lint; offenders=%v", off) - } - if containsStrHN(off, "TestCodeScanHN") { - t.Fatalf("sweep wrongly flagged a code-scanning invariant (reads no instruction file); offenders=%v", off) - } - if !containsStrHN(off, "TestUndeclaredHN") { - t.Fatalf("adding declared/codescan fixtures must not stop the sweep flagging the undeclared one; offenders=%v", off) - } -} - -func writeFixture(t *testing.T, path, content string) { - t.Helper() - if err := os.WriteFile(path, []byte(content), 0o644); err != nil { - t.Fatal(err) - } -} diff --git a/internal/hostneutrality/prose_inflator_locks_test.go b/internal/hostneutrality/prose_inflator_locks_test.go deleted file mode 100644 index bb4c42f0..00000000 --- a/internal/hostneutrality/prose_inflator_locks_test.go +++ /dev/null @@ -1,296 +0,0 @@ -// ABOUTME: AC-4 prose-inflator lock-in oracles for the FO + ensign contract files. -// ABOUTME: Catches audit-trail-style exposition and cross-file restatement. -package hostneutrality - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// contractProseFiles is the four-file surface the AC-4 oracles police. -var contractProseFiles = []string{ - filepath.Join("..", "..", "skills", "first-officer", "references", "first-officer-shared-core.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "ensign-shared-core.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "claude-ensign-runtime.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "codex-ensign-runtime.md"), -} - -// sharedCorePaths are the universal cores. They must not restate one another -// against the per-runtime adapter cores. -var sharedCorePaths = []string{ - filepath.Join("..", "..", "skills", "first-officer", "references", "first-officer-shared-core.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "ensign-shared-core.md"), -} - -// runtimeAdapterPaths are the per-host ensign adapter cores tested against the -// shared cores for restatement. -var runtimeAdapterPaths = []string{ - filepath.Join("..", "..", "skills", "ensign", "references", "claude-ensign-runtime.md"), - filepath.Join("..", "..", "skills", "ensign", "references", "codex-ensign-runtime.md"), -} - -// auditTrailLiterals are exact phrases banned across all four contract files. -// These are the audit-trail-exposition class — history-as-meta-comment that -// inflates without conveying contract content. -var auditTrailLiterals = []string{ - "audit-trail", - "the audit said", - "the auditor flagged", - "the audit returned", - "now we do X because", -} - -// auditTrailRegexes catch the parameterized variants of the audit-trail class. -// The cycle-N pattern is scoped to audit/sweep context — a bare "cycle 3" in -// the feedback-flow contract clause (a load-bearing workflow threshold) is -// not audit-trail exposition; "cycle-1 audit" or "the cycle 2 sweep" is. -var auditTrailRegexes = []*regexp.Regexp{ - // cycle-N variants in audit context: "cycle-N audit", "the cycle N sweep", - // "cycleN reconcile", etc. The "audit"/"sweep"/"reconcile" qualifier is what - // makes it audit-trail prose; bare "cycle 3" stays allowed for contract use. - regexp.MustCompile(`(?i)cycle[-\s]?\d+\b[^.\n]{0,40}(audit|sweep|reconcile)\b`), - regexp.MustCompile(`(?i)\b(audit|sweep|reconcile)\b[^.\n]{0,40}cycle[-\s]?\d+\b`), - // w-prefix WfRunID-shaped task IDs (8-10 chars), the leak vector for - // specific audit task IDs in prose. Requires a digit somewhere after - // the leading w to exclude English words like "with", "want". - regexp.MustCompile(`\bw[a-z0-9]*\d[a-z0-9]{6,9}\b`), -} - -// TestNoAuditTrailExposition locks AC-4: the four contract prose files MUST -// NOT carry audit-trail-style exposition — phrases like "audit-trail", -// "cycle-N", "the auditor flagged", WfRunID-shaped task IDs ("wobgtdlsp"), -// or "now we do X because" restatements. These inflate the load every -// ensign dispatch re-pays without conveying contract content. -// -// Scope: full files per captain (Path B authorization). A passing test means -// the swept HEAD is clean; a deliberately-inserted regression of any banned -// phrase fails the test (positive proof of lock-in). -func TestNoAuditTrailExposition(t *testing.T) { - markNonAC(t, "text-hygiene lint, NOT a behavioral claim — a property of the text (the contract files stay free of audit-trail exposition). No behavioral oracle and no independent code source: it is prose-inflation hygiene, valued for catching history-as-meta-comment re-inflating the dispatch load.") - for _, path := range contractProseFiles { - t.Run(filepath.Base(path), func(t *testing.T) { - body, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - text := string(body) - lowerText := strings.ToLower(text) - - for _, banned := range auditTrailLiterals { - if strings.Contains(lowerText, strings.ToLower(banned)) { - t.Errorf("%s contains banned audit-trail literal %q — re-inflation of swept prose", path, banned) - } - } - for _, re := range auditTrailRegexes { - if loc := re.FindStringIndex(text); loc != nil { - match := text[loc[0]:loc[1]] - t.Errorf("%s contains banned audit-trail pattern %q (matched %q) — re-inflation of swept prose", - path, re.String(), match) - } - } - }) - } -} - -// TestNoCrossFileRestatement locks AC-4: a 12-word n-gram from a runtime -// adapter core (claude-ensign-runtime.md, codex-ensign-runtime.md) MUST NOT -// reappear verbatim in either shared core. Cross-file restatement inflates -// the dispatch-load (the FO/ensign reads BOTH shared core and a runtime -// adapter; restated content is double-paid). -// -// Exceptions: -// - heading-marked spans (e.g. `### Backstop (Claude)`, `## Sequencing rule`) -// - fenced code blocks (``` ... ```) -// - markdown tables (lines starting with `|`) -// - host-qualified contrast spans (per spanHostQualified — naming both Codex -// and Claude in the same span) -// -// Threshold: 12 contiguous words. -func TestNoCrossFileRestatement(t *testing.T) { - markCodeBoundInvariant(t, "12-word n-grams sourced from a DIFFERENT file (the runtime adapter cores) than the shared cores under test") - // Build the n-gram set from the runtime adapter cores, excluding - // exception spans. - adapterNGrams := map[string]string{} - for _, path := range runtimeAdapterPaths { - spans := parseProseSpansForOverlap(t, path) - for _, sp := range spans { - words := tokenizeForOverlap(sp.text) - for i := 0; i+12 <= len(words); i++ { - gram := strings.Join(words[i:i+12], " ") - if _, exists := adapterNGrams[gram]; !exists { - adapterNGrams[gram] = filepath.Base(path) - } - } - } - } - - // Walk the shared cores looking for any of those n-grams appearing in a - // non-exception span. - for _, path := range sharedCorePaths { - t.Run(filepath.Base(path), func(t *testing.T) { - spans := parseProseSpansForOverlap(t, path) - for _, sp := range spans { - words := tokenizeForOverlap(sp.text) - for i := 0; i+12 <= len(words); i++ { - gram := strings.Join(words[i:i+12], " ") - if source, hit := adapterNGrams[gram]; hit { - t.Errorf("%s span starting line %d restates a 12-word n-gram from %s: %q", - path, sp.startLine, source, gram) - } - } - } - }) - } -} - -// proseSpan is one span with location and exception-class info. -type proseSpan struct { - text string - startLine int - excludeOverlap bool -} - -// parseProseSpansForOverlap walks a markdown file and returns spans suitable for -// the n-gram restatement oracle. Spans inside fenced code blocks, markdown -// tables, host-qualified contrast spans, or under sections marked with -// known-headed exceptions are flagged with excludeOverlap=true, and the -// function filters them out — only the includable spans are returned. -func parseProseSpansForOverlap(t *testing.T, path string) []proseSpan { - t.Helper() - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - lines := strings.Split(string(data), "\n") - - var spans []proseSpan - var cur []string - curStart := 0 - inFence := false - headingMarksException := false - - flush := func() { - if len(cur) == 0 { - return - } - text := strings.Join(cur, "\n") - // Skip table-only spans (all non-blank lines start with `|`). - if isMarkdownTable(text) { - cur = nil - return - } - // Skip host-qualified contrast spans. - if spanHostQualified(text) { - cur = nil - return - } - // Skip spans under headings flagged as exception spans. - if headingMarksException { - cur = nil - return - } - spans = append(spans, proseSpan{text: text, startLine: curStart}) - cur = nil - } - - for i, line := range lines { - lineNo := i + 1 - trimmed := strings.TrimSpace(line) - - // Headings flip section-exception state but are not spans themselves. - if strings.HasPrefix(trimmed, "#") { - flush() - headingMarksException = isExceptionHeading(trimmed) - continue - } - - if strings.HasPrefix(trimmed, "```") { - // Fenced code blocks are exception spans — flush prose first, skip - // fence body until close, do not include it. - flush() - inFence = !inFence - continue - } - if inFence { - continue - } - - if trimmed == "" { - flush() - continue - } - - if len(cur) == 0 { - curStart = lineNo - } - cur = append(cur, line) - } - flush() - return spans -} - -// isMarkdownTable reports whether a span's non-blank lines are all table rows -// (start with `|`) — table rows often share `| Header | Header |` cells across -// files without being prose restatement. -func isMarkdownTable(text string) bool { - lines := strings.Split(text, "\n") - for _, l := range lines { - t := strings.TrimSpace(l) - if t == "" { - continue - } - if !strings.HasPrefix(t, "|") { - return false - } - } - return true -} - -// isExceptionHeading reports whether a heading marks a section whose contents -// are exempt from the cross-file restatement oracle. These are the named -// host-contrast sections where each runtime adapter is expected to mirror -// shared-core terminology by design. -func isExceptionHeading(heading string) bool { - exceptions := []string{ - "Backstop (Claude)", - "Backstop (Codex)", - "Standing teammates", - "Sequencing rule", - } - for _, ex := range exceptions { - if strings.Contains(heading, ex) { - return true - } - } - return false -} - -// tokenizeForOverlap splits a span into lowercase word tokens, stripping -// markdown formatting characters (`*`, `_`, backticks) and trailing -// punctuation. Code identifiers stay intact (`status --set`, `pull --rebase`). -func tokenizeForOverlap(text string) []string { - // Replace markdown emphasis and inline-code backticks with spaces, keep - // the content. Hyphens and dots inside identifiers stay as one token. - cleaned := text - for _, ch := range []string{"*", "_", "`", "(", ")", "[", "]"} { - cleaned = strings.ReplaceAll(cleaned, ch, " ") - } - // Strip trailing punctuation tokens like `.`, `,`, `;`, `:` by inserting - // spaces around them. - for _, ch := range []string{",", ";", ":", "—"} { - cleaned = strings.ReplaceAll(cleaned, ch, " ") - } - fields := strings.Fields(cleaned) - out := make([]string, 0, len(fields)) - for _, f := range fields { - f = strings.TrimRight(f, ".!?") - if f == "" { - continue - } - out = append(out, strings.ToLower(f)) - } - return out -} diff --git a/internal/hostneutrality/prose_neutrality_test.go b/internal/hostneutrality/prose_neutrality_test.go deleted file mode 100644 index 843f3a23..00000000 --- a/internal/hostneutrality/prose_neutrality_test.go +++ /dev/null @@ -1,302 +0,0 @@ -// ABOUTME: AC-3 prose-side host-neutrality oracle — a markdown-structure parse of -// ABOUTME: first-officer-shared-core.md flagging unqualified Claude-only helper tokens. -package hostneutrality - -import ( - "bufio" - "os" - "path/filepath" - "strconv" - "strings" - "testing" -) - -// sharedCorePath is the generic FO operating-contract core the prose oracle -// polices. Relative to this test's package dir (go test cwd = package source dir). -var sharedCorePath = filepath.Join("..", "..", "skills", "first-officer", "references", - "first-officer-shared-core.md") - -// claudeRuntimePath is the Claude adapter the relocated commands move into. The -// oracle does NOT police it — Claude-only tokens are expected there. -var claudeRuntimePath = filepath.Join("..", "..", "skills", "first-officer", "references", - "claude-first-officer-runtime.md") - -// claudeHelperProseTokens are the Claude-runtime helper names that have NO code -// subcommand source — ~/.claude-reading helper functions and the named drift -// classes of the reconcile helper. They are authored prose tokens; the -// code-sourced subcommands (context-budget, *-standing, reconcile) are added at -// test time from the dispatch router via claudeHelperTokens(t), so the banned set -// has an independent code source that shifts when the router changes. -var claudeHelperProseTokens = []string{ - "claude-team", - "member_exists", - "lookup_model", - // Class A/B/C/D/E are the named drift classes of the reconcile helper. - "Class A", - "Class B", - "Class C", - "Class D", - "Class E", -} - -// claudeHelperTokens combines the code-sourced Claude-coupled dispatch subcommands -// (derived from the router, qualified `spacedock dispatch reconcile` for the -// roster-reading helper) with the authored prose tokens. The subcommand half binds -// to dispatch.go, so the banned set diverges from the generic core when a -// subcommand is renamed in code — that divergence is what makes the unqualified- -// helper check a real invariant rather than a self-match. -func claudeHelperTokens(t *testing.T) []string { - t.Helper() - subs := dispatchSubcommands(t) - tokens := append([]string{}, claudeHelperProseTokens...) - for _, sub := range []string{"context-budget", "spawn-standing", "list-standing", "show-standing"} { - if !subs[sub] { - t.Fatalf("dispatch router no longer exposes Claude-coupled helper %q", sub) - } - tokens = append(tokens, sub) - } - if !subs["reconcile"] { - t.Fatal("dispatch router no longer exposes the reconcile subcommand") - } - tokens = append(tokens, "spacedock dispatch reconcile") - return tokens -} - -// A span counts as host-qualified only when it names BOTH runtimes by their -// capitalized product names — the `X on Codex, Y on Claude` contrast shape, where -// the Claude token is a qualified realization presented alongside the Codex -// alternative, not an unqualified generic step. Requiring the contrast (both -// names), rather than a lone "Codex"/"codex" anywhere, keeps a span that merely -// mentions a host in passing (e.g. a bare `codex exec`) from falsely qualifying an -// unqualified Claude-only helper. The names are matched capitalized so the -// lowercase `claude-` inside a helper token (claude-team) does not self-satisfy -// the Claude half of the contrast. -const ( - codexMarker = "Codex" - claudeMarker = "Claude" -) - -// TestSharedCoreHasNoUnqualifiedClaudeHelpers parses the generic core's markdown -// structure into spans (numbered/bulleted list items and paragraphs) and fails if -// any Claude-only helper token sits in a span lacking a host qualifier. This is a -// structural parse, NOT a substring count: re-introducing the line-142 unqualified -// `claude-team context-budget` reuse step makes it fail; the line-207 -// `send_input on Codex, SendMessage on Claude teams` span passes (host-qualified). -func TestSharedCoreHasNoUnqualifiedClaudeHelpers(t *testing.T) { - markCodeBoundInvariant(t, "dispatchSubcommands (dispatch.go) supplies the Claude-coupled helper subcommand tokens") - spans := parseSpans(t, sharedCorePath) - helperTokens := claudeHelperTokens(t) - var violations []string - for _, sp := range spans { - for _, tok := range helperTokens { - if !strings.Contains(sp.text, tok) { - continue - } - if spanHostQualified(sp.text) { - continue - } - violations = append(violations, - strings.TrimSpace( - sp.text[:min(len(sp.text), 120)])+ - " [unqualified token: "+tok+", first line "+strconv.Itoa(sp.startLine)+"]") - } - } - if len(violations) > 0 { - for _, v := range violations { - t.Errorf("unqualified Claude-only helper in generic core: %s", v) - } - t.Fatalf("found %d unqualified Claude-runtime helper reference(s) in %s; relocate them to %s", - len(violations), sharedCorePath, claudeRuntimePath) - } -} - -// TestClaudeAdapterOwnsRelocatedCommands is a code-bound invariant confirming the -// relocation landed: the Claude adapter names each relocated dispatch-helper -// subcommand. The required set is DERIVED from the dispatch router's actual -// subcommands (dispatchSubcommands over dispatch.go), not literals frozen against -// the adapter — so a subcommand renamed in the router shifts the expectation and -// reds here if the adapter still names the old one (or drops the new one). The -// commands did not vanish, they moved; this is the presence half of the -// relocation invariant. -func TestClaudeAdapterOwnsRelocatedCommands(t *testing.T) { - markCodeBoundInvariant(t, "dispatchSubcommands (internal/dispatch/dispatch.go router)") - data, err := os.ReadFile(claudeRuntimePath) - if err != nil { - t.Fatalf("read %s: %v", claudeRuntimePath, err) - } - body := string(data) - subs := dispatchSubcommands(t) - // The relocated Claude-coupled helpers — each must be a real router subcommand - // (so the anchor cannot name a command the binary does not route) and must - // appear in the adapter. - relocated := []string{"context-budget", "list-standing", "spawn-standing", "show-standing", "reconcile"} - for _, sub := range relocated { - if !subs[sub] { - t.Fatalf("the dispatch router no longer exposes %q — the relocated-command set diverged from the binary", sub) - } - want := sub - if sub == "reconcile" { - want = "spacedock dispatch reconcile" // the qualified form the adapter documents - } - if !strings.Contains(body, want) { - t.Errorf("Claude adapter %s does not name the relocated command %q", claudeRuntimePath, want) - } - } -} - -// TestSpanHostQualifiedRequiresContrast pins AC-4: a span counts as host-qualified -// only when it presents the Codex/Claude CONTRAST — naming both a Codex token and a -// Claude token — not when it merely mentions "codex" in passing. The bare-word -// markers wrongly qualified any span containing "Codex"/"codex"; the de-brittled -// oracle qualifies only the contrast. Each case names a Claude helper token so the -// qualification decision is what gates a violation. -func TestSpanHostQualifiedRequiresContrast(t *testing.T) { - cases := []struct { - name string - text string - qualified bool - }{ - { - name: "real-contrast", - text: "The concrete routing call is the runtime adapter's (`send_input` on Codex, `SendMessage` on Claude teams).", - qualified: true, - }, - { - name: "declares-contrast", - text: "The probe command is the adapter's (Codex declares none; Claude supplies one — see the context-budget section).", - qualified: true, - }, - { - name: "bare-codex-mention-no-claude", - text: "Run the context-budget probe before reuse (the codex exec flow does the same).", - qualified: false, - }, - { - name: "no-host-tokens", - text: "Before evaluating reuse conditions, consult the context-budget probe.", - qualified: false, - }, - } - for _, c := range cases { - t.Run(c.name, func(t *testing.T) { - if got := spanHostQualified(c.text); got != c.qualified { - t.Errorf("spanHostQualified(%q) = %v, want %v", c.text, got, c.qualified) - } - }) - } -} - -// span is one markdown block: a list item or a paragraph, with its first line -// number for diagnostics. A list item spans its bullet line plus indented -// continuation lines; a paragraph spans consecutive non-blank, non-heading lines. -type span struct { - text string - startLine int -} - -// parseSpans reads a markdown file and groups its lines into spans. Headings -// (lines starting with #) are span boundaries and are not themselves spans — -// section ledes that follow a heading are their own paragraph span. Fenced code -// blocks (```) are kept whole as one span so a token inside a code example is -// attributed to the enclosing block, not split. -func parseSpans(t *testing.T, path string) []span { - t.Helper() - f, err := os.Open(path) - if err != nil { - t.Fatalf("open %s: %v", path, err) - } - defer f.Close() - - var spans []span - var cur []string - curStart := 0 - inFence := false - lineNo := 0 - - flush := func() { - if len(cur) > 0 { - spans = append(spans, span{text: strings.Join(cur, "\n"), startLine: curStart}) - cur = nil - } - } - - sc := bufio.NewScanner(f) - sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) - for sc.Scan() { - lineNo++ - line := sc.Text() - trimmed := strings.TrimSpace(line) - - if strings.HasPrefix(trimmed, "```") { - // Toggle fence; keep the fence and its body in the current span so a - // token inside an example stays attributed to its block. - if len(cur) == 0 { - curStart = lineNo - } - cur = append(cur, line) - inFence = !inFence - continue - } - if inFence { - cur = append(cur, line) - continue - } - - if trimmed == "" { - flush() - continue - } - if strings.HasPrefix(trimmed, "#") { - // Heading is a boundary; do not include it in any span. - flush() - continue - } - - isListItem := isListItemStart(line) - isIndentCont := len(line) > 0 && (line[0] == ' ' || line[0] == '\t') - - if isListItem && len(cur) > 0 && !isIndentCont { - // A new top-level list item starts a new span unless it is an indented - // continuation of the prior one. - flush() - } - if len(cur) == 0 { - curStart = lineNo - } - cur = append(cur, line) - } - flush() - if err := sc.Err(); err != nil { - t.Fatalf("scan %s: %v", path, err) - } - return spans -} - -// isListItemStart reports whether line begins a markdown list item: an optional -// indent then `- `, `* `, or `N.` / `N) ` (ordered). Indented items are still -// list-item starts; the caller distinguishes continuation by leading whitespace. -func isListItemStart(line string) bool { - t := strings.TrimLeft(line, " \t") - if strings.HasPrefix(t, "- ") || strings.HasPrefix(t, "* ") { - return true - } - // Ordered: leading digits then `.` or `)` then space (or end). - i := 0 - for i < len(t) && t[i] >= '0' && t[i] <= '9' { - i++ - } - if i == 0 || i >= len(t) { - return false - } - return (t[i] == '.' || t[i] == ')') -} - -// spanHostQualified reports whether a span is explicitly host-qualified — it names -// BOTH runtimes, the `X on Codex, Y on Claude` contrast shape. Such a span may -// reference a Claude helper because it is presenting the Claude realization -// alongside the Codex alternative, not stating an unqualified generic step. A span -// that names only one runtime (a passing mention) is not the contrast and does not -// qualify. -func spanHostQualified(text string) bool { - return strings.Contains(text, codexMarker) && strings.Contains(text, claudeMarker) -} diff --git a/internal/hostneutrality/split_root_sync_contract_test.go b/internal/hostneutrality/split_root_sync_contract_test.go deleted file mode 100644 index be3e0d1f..00000000 --- a/internal/hostneutrality/split_root_sync_contract_test.go +++ /dev/null @@ -1,132 +0,0 @@ -// ABOUTME: B5/B6 contract skill-text tests — assert the FO halt-gate + push/pull -// ABOUTME: sync + rebase-conflict halt prose is present at the FO and ensign homes. -package hostneutrality - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// foCorePath is the FO shared-core contract. -var foCorePath = filepath.Join("..", "..", "skills", "first-officer", "references", - "first-officer-shared-core.md") - -// ensignCorePath is the ensign shared-core contract. -var ensignCorePath = filepath.Join("..", "..", "skills", "ensign", "references", - "ensign-shared-core.md") - -// commissionSkillPath is the commission SKILL.md. -var commissionSkillPath = filepath.Join("..", "..", "skills", "commission", "SKILL.md") - -// readSkill reads a skill file, failing the test on error. -func readSkill(t *testing.T, path string) string { - t.Helper() - b, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - return string(b) -} - -// assertAll fails with a per-token diagnostic for each token missing from text. -func assertAll(t *testing.T, name, text string, tokens []string) { - t.Helper() - for _, tok := range tokens { - if !strings.Contains(text, tok) { - t.Errorf("%s missing required contract token: %q", name, tok) - } - } -} - -// TestFOHaltGateProse is a non-AC text-consistency lint: it asserts the FO core -// carries the boot halt-gate prose keyed on the boot fields (split-root && -// entity_dir_present false → HALT, point at `spacedock state init`). Per the proof -// policy this presence check does NOT prove the FO halts; an inverted clause keeps -// every token. The MECHANISM is proven by command-level tests — the binary EMITS -// the halt signal (internal/status TestBootJSONStateBackendEntityDirAbsent observes -// `entity_dir_present: false` + `state_backend: split-root`) and the `spacedock -// state init` recovery WORKS (internal/cli TestStateInitResumesFreshClone) — but -// the OWED behavioral proof that the FO actually HALTs on that signal (rather than -// running state init silently and proceeding) is a live drive, tracked as task -// ev3e (fo-halt-sync-journey-live-drives). This lint guards the prose tokens until -// ev3e lands the live oracle. -func TestFOHaltGateProse(t *testing.T) { - markNonAC(t, "OWED live drive: task ev3e (fo-halt-sync-journey-live-drives). Mechanism today: internal/status TestBootJSONStateBackendEntityDirAbsent (binary emits the halt signal) + internal/cli TestStateInitResumesFreshClone (recovery works)") - text := readSkill(t, foCorePath) - assertAll(t, "FO core (B5 halt-gate)", text, []string{ - "state_backend", - "entity_dir_present", - "split-root", - "spacedock state init", - "HALT", - }) -} - -// TestFOSyncProse is a non-AC text-consistency lint: it asserts the FO core -// carries the B6 sync prose (pull --rebase, push origin, the M-3 rebase-conflict -// halt: abort + no force-push + no auto-resolve). Per the proof policy this -// presence check does NOT prove the FO performs the sync. The git MECHANICS are -// already oracle-covered by real two-writer e2e — internal/cli state_sync_test.go -// (TestTwoWriterSyncHappyPath: push → non-ff rejection → pull --rebase → re-push; -// TestTwoWriterSameEntityConflictHalts: CONFLICT → rebase --abort, no force-push) -// and internal/dispatch build_statecommit_test.go. The remaining behavioral -// proof — that the FO actually ISSUES this sync at the contract points — rides -// task ev3e's halt drive (fo-halt-sync-journey-live-drives), where ev3e's ideation -// folded the sync/journey residual into the halt scenario. This lint guards the -// prose tokens. -func TestFOSyncProse(t *testing.T) { - markNonAC(t, "behavioral-issuance rides task ev3e's halt drive (fo-halt-sync-journey-live-drives). Sync MECHANICS already oracle-covered: internal/cli state_sync_test.go (TestTwoWriterSyncHappyPath + TestTwoWriterSameEntityConflictHalts) + internal/dispatch build_statecommit_test.go (TestStateCommitGuidanceResolvesPaths)") - text := readSkill(t, foCorePath) - assertAll(t, "FO core (B6 sync)", text, []string{ - "pull --rebase", - "push origin", - "rebase --abort", - "--force", - "auto-resolve", - "must NOT", - }) -} - -// TestEnsignSyncProse is a non-AC text-consistency lint: it asserts the ensign -// core carries the B6 sync prose (push origin, pull --rebase, the M-3 -// rebase-conflict halt) alongside the path-scoped rule. Same disposition as the FO -// half: the git MECHANICS are oracle-covered by the real two-writer e2e in -// internal/cli + build_statecommit_test.go; the remaining behavioral proof that -// the ensign ISSUES this sync after its state commits rides task ev3e's halt drive -// (fo-halt-sync-journey-live-drives). This lint guards the tokens. -func TestEnsignSyncProse(t *testing.T) { - markNonAC(t, "behavioral-issuance rides task ev3e's halt drive (fo-halt-sync-journey-live-drives). Sync MECHANICS already oracle-covered: internal/cli state_sync_test.go (TestTwoWriterSyncHappyPath + TestTwoWriterSameEntityConflictHalts) + internal/dispatch build_statecommit_test.go (TestStateCommitGuidanceResolvesPaths)") - text := readSkill(t, ensignCorePath) - assertAll(t, "ensign core (B6 sync)", text, []string{ - "push origin", - "pull --rebase", - "rebase --abort", - "--force", - "auto-resolve", - }) -} - -// TestCommissionJourneyProse is a non-AC text-consistency lint: it asserts the -// commission SKILL.md carries the journey-1 orphan-branch mechanics (clear -// inherited tree, linked worktree, state-init pointer) and the journey-2 $inline -// prose. Per the proof policy this presence check does NOT prove the mechanics -// work. The orphan-birth/resume MECHANICS are oracle-covered by command-level -// tests — internal/cli state_new_test.go (TestStateNewBirthsSplitRoot) + -// state_init_test.go (TestCommissionOrphanBranchScaffolding: orphan branch with a -// cleared tree as a linked worktree; TestStateInitInlineNoOp: the $inline branch). -// The remaining behavioral proof that the commission FLOW drives these journeys -// rides task ev3e's halt drive (fo-halt-sync-journey-live-drives). This lint guards -// the prose tokens. -func TestCommissionJourneyProse(t *testing.T) { - markNonAC(t, "behavioral-issuance rides task ev3e's halt drive (fo-halt-sync-journey-live-drives). Journey MECHANICS already oracle-covered: internal/cli state_init_test.go (TestStateInitResumesFreshClone + TestCommissionOrphanBranchScaffolding + TestStateInitInlineNoOp) + state_new_test.go (TestStateNewBirthsSplitRoot)") - text := readSkill(t, commissionSkillPath) - assertAll(t, "commission SKILL.md (journeys)", text, []string{ - "checkout --orphan", - "clear", - "linked worktree", - "spacedock state init", - "$inline", - }) -} diff --git a/internal/status/external_proof.go b/internal/status/external_proof.go index 089a8277..8007265e 100644 --- a/internal/status/external_proof.go +++ b/internal/status/external_proof.go @@ -36,9 +36,8 @@ var acHeaderRe = regexp.MustCompile(`^\*\*AC-`) // clause taken to block end. var proofMarkerRe = regexp.MustCompile(`(?i)(verified by|oracle:|proof:|end state[:.])`) -// quotedSpanRe matches every double-quoted span (`"..."`) and backtick-fenced -// span (`` `...` ``) so a quoted example of the antipattern is excluded from -// the self-phrase match. +// quotedSpanRe matches every double-quoted span and every backtick-fenced span +// so a quoted example of the antipattern is excluded from the self-phrase match. var quotedSpanRe = regexp.MustCompile("\"[^\"]*\"|`[^`]*`") // selfPhraseRes are the case-insensitive self-reference phrases the diff --git a/internal/status/no_yaml_silent_drop_test.go b/internal/status/no_yaml_silent_drop_test.go index cfaad475..8180531d 100644 --- a/internal/status/no_yaml_silent_drop_test.go +++ b/internal/status/no_yaml_silent_drop_test.go @@ -13,7 +13,7 @@ import ( // TestNoSilentYAMLValueDrop walks every index.md fixture under testdata/, // extracts each top-level frontmatter line's raw post-`:` substring, parses the // file through ParseFrontmatter, and asserts that no key whose raw value is -// non-empty (and not an explicit empty quoted scalar `""` / `''`) decodes to +// non-empty (and not an explicit empty quoted scalar) decodes to // the empty string. The canonical failure is `pr: #N` — yaml.v3 treats `#N` as // a comment and drops the value silently. The writer-quoting policy in // mutate.go (needsExplicitQuoting + setScalarValue) prevents this on every diff --git a/skills/integration/codex_idle_notification_test.go b/skills/integration/codex_idle_notification_test.go index 3c77269f..7e87735f 100644 --- a/skills/integration/codex_idle_notification_test.go +++ b/skills/integration/codex_idle_notification_test.go @@ -1,4 +1,4 @@ -// ABOUTME: Contract tests for Codex no-wait completion evidence and runtime wording. +// ABOUTME: Contract tests for Codex no-wait completion evidence. // ABOUTME: Keeps queued notifications distinct from autonomous idle wake-up. package integration @@ -6,14 +6,10 @@ import ( "encoding/json" "os" "path/filepath" - "regexp" - "strings" "testing" "time" ) -const codexIdleNotificationProbePrompt = "You are a no-write Codex idle-wake probe. Do not read or write files, do not run tools, and do not modify state. Sleep for 30 seconds, then reply exactly: Done: idle-wake Codex notification probe completed." - var codexIdleNotificationClassifications = map[string]bool{ "foreground_wait": true, "queued_flush": true, @@ -21,113 +17,6 @@ var codexIdleNotificationClassifications = map[string]bool{ "no_notification_observed": true, } -// TestCodexIdleNotificationRuntimeContract is a non-AC text-consistency lint: it -// asserts the Codex runtime adapter's `## Awaiting Completion` section carries the -// three outcome headings + scheduling-priority clauses and stays free of -// blanket-foreground-wait wording. Per the proof policy this presence check does -// NOT prove the FO observes the idle-notification semantics; the behavior is -// proven by the captured idle-wake evidence (TestCodexIdleNotificationEvidenceSchema -// validates real recorded probe runs) and the Codex live runner's -// awaiting-completion path. This lint guards the adapter clauses. -func TestCodexIdleNotificationRuntimeContract(t *testing.T) { - markNonAC(t, "TestCodexIdleNotificationEvidenceSchema (captured idle-wake evidence) + Codex live runner awaiting-completion path") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "codex-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - region := sectionAfter(string(data), "## Awaiting Completion") - if region == "" { - t.Fatal("Codex runtime missing `## Awaiting Completion` section") - } - - for _, required := range []string{ - "### Foreground wait", - "### Queued notification flushed by later activity", - "### Autonomous idle FO wake-up", - } { - if !strings.Contains(region, required) { - t.Errorf("Awaiting Completion missing outcome heading %q", required) - } - } - - scheduling := paragraphContaining(region, "Before calling `wait_agent`") - if scheduling == "" { - t.Fatal("Awaiting Completion missing scheduling priority paragraph before `wait_agent`") - } - scheduling = squashWhitespace(scheduling) - for _, required := range []string{ - "ready final-status notifications", - "gate decisions", - "state transitions", - "newly dispatchable work", - "no other dispatchable or gate-processing work is available", - "unresolved worker completion is the next useful idle action", - } { - if !strings.Contains(scheduling, required) { - t.Errorf("wait_agent scheduling paragraph missing %q", required) - } - } - - lower := strings.ToLower(squashWhitespace(region)) - for _, forbidden := range []string{ - "must call `wait_agent` after every dispatch", - "must call wait_agent after every dispatch", - "always call `wait_agent`", - "always call wait_agent", - "foreground-wait after every dispatch", - "wait after every dispatch", - } { - if strings.Contains(lower, forbidden) { - t.Errorf("Awaiting Completion contains blanket foreground-wait wording %q", forbidden) - } - } -} - -func TestCodexIdleNotificationRecipeShape(t *testing.T) { - path := filepath.Join(repoRoot(t), "docs", "dev", "codex-idle-notification-probe.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read %s: %v", path, err) - } - content := string(data) - contentFlat := squashWhitespace(content) - - for _, heading := range []string{ - "## Foreground wait comparison", - "## No-wait idle probe", - "## Queued-notification flush check", - } { - if sectionAfter(content, heading) == "" { - t.Errorf("recipe missing section %q", heading) - } - } - if !strings.Contains(content, codexIdleNotificationProbePrompt) { - t.Errorf("recipe missing exact no-write worker prompt %q", codexIdleNotificationProbePrompt) - } - if idleWindowSeconds(content) < 90 { - t.Errorf("recipe idle window is less than 90 seconds") - } - for _, required := range []string{ - "avoid captain messages", - "shell-outs", - "terminal jobs", - "tool calls", - "retry the same handle", - "A queued notification flushed by later activity is `queued_flush`, not `autonomous_idle_wake`.", - } { - if !strings.Contains(contentFlat, required) { - t.Errorf("recipe missing %q", required) - } - } - for classification := range codexIdleNotificationClassifications { - if !strings.Contains(content, "`"+classification+"`") { - t.Errorf("recipe missing interpretation classification %q", classification) - } - } -} - func TestCodexIdleNotificationEvidenceSchema(t *testing.T) { dir := filepath.Join(repoRoot(t), "docs", "dev", "_evidence", "codex-idle-notification-probe") paths, err := filepath.Glob(filepath.Join(dir, "*.json")) @@ -184,37 +73,6 @@ func TestCodexIdleNotificationEvidenceSchema(t *testing.T) { } } -func paragraphContaining(text, needle string) string { - for _, paragraph := range strings.Split(text, "\n\n") { - if strings.Contains(paragraph, needle) { - return paragraph - } - } - return "" -} - -func idleWindowSeconds(text string) int { - re := regexp.MustCompile(`(?i)(?:minimum idle window|idle window)[^0-9\n]*(\d+)\s+seconds`) - best := 0 - for _, match := range re.FindAllStringSubmatch(text, -1) { - if len(match) != 2 { - continue - } - var n int - for _, r := range match[1] { - n = n*10 + int(r-'0') - } - if n > best { - best = n - } - } - return best -} - -func squashWhitespace(text string) string { - return strings.Join(strings.Fields(text), " ") -} - func requireString(t *testing.T, path string, record map[string]any, field string) string { t.Helper() value, ok := record[field] diff --git a/skills/integration/contract_gate_test.go b/skills/integration/contract_gate_test.go deleted file mode 100644 index 396e3543..00000000 --- a/skills/integration/contract_gate_test.go +++ /dev/null @@ -1,244 +0,0 @@ -// ABOUTME: AC-2 bracketing test over the vendored FO Startup contract — the -// ABOUTME: embedded contract-range literal must bracket the binary's CONTRACT_VERSION. -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" - - "github.com/spacedock-dev/spacedock/internal/contract" -) - -// foSharedCore reads the vendored first-officer shared core contract text. -func foSharedCore(t *testing.T) string { - t.Helper() - p := filepath.Join(skillsRoot(t), "first-officer", "references", "first-officer-shared-core.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read FO shared core: %v", err) - } - return string(b) -} - -// embeddedRangeRe matches the half-open contract range literal embedded in the -// Startup step-0 prose (e.g. `>=1,<2`). -var embeddedRangeRe = regexp.MustCompile(`>=\s*\d+\s*,\s*<\s*\d+`) - -// TestStartupEmbeddedRangeBracketsContractVersion locks the embedded-range -// bracketing invariant: the range literal embedded in the FO Startup prose -// brackets CONTRACT_VERSION. Both surfaces live in spacedock-v1, so a single go -// test closes the 4th-source-of-truth drift (the FO contract embeds its own -// expected range as a literal). -// -// Oracle: contract.ParseRange (the half-open-range parser) + the compiled -// contract.CONTRACT_VERSION constant. This is NOT bare prose-grep — it does not -// assert "the prose says X"; it parses the embedded literal and checks it -// brackets the binary's real contract version, catching FO/binary range drift. -// The contract-gate ORDERING behavior (gate runs before discover/boot) is owned -// behaviorally by internal/contract/gate_test.go, which drives a real spacedock -// stub --version and observes discover invoked 0×/1×. -func TestStartupEmbeddedRangeBracketsContractVersion(t *testing.T) { - markCodeBoundInvariant(t, "contract.CONTRACT_VERSION (the binary's contract version) — the embedded range must bracket the independent code value") - startup := sectionAfter(foSharedCore(t), "## Startup") - raw := embeddedRangeRe.FindString(startup) - if raw == "" { - t.Fatalf("Startup section has no embedded contract range literal (>=N,= 0 { - return rest[:j] - } - return rest -} - -// TestStartupAbortSplitsByBinaryPresence locks the binary-absent FO-startup -// hardening (entity binary-absent-fo-bootstrap): -// -// - AC-1 — the binary-absent / non-executable abort class carries both -// runnable install lines verbatim (released brew lane + source go-build lane) -// and routes to NO `spacedock doctor` within that class, since `doctor` is the -// same missing binary. -// - AC-2 — `spacedock doctor` survives, attached to the binary-present / -// contract-out-of-range class (Class B), so the fix is a per-class split and -// not a blanket deletion of the doctor route. -// -// Oracle: the abort-clause text of Startup step 1, isolated via startupStep1 and -// split by the two class markers. This is the legitimate doc-as-deliverable case -// recorded in the entity's AC-1 framing note: the binary is absent by definition -// of the failure mode, so no `spacedock` command can run to emit the install -// hint — the contract prose the FO loads IS the only artifact present at failure. -// The check is therefore a presence test (both install lines) plus a -// banned-token absence test (no doctor route inside Class A) over the real file, -// which is proof at the claim's own level, not a behavioral claim a code gate -// could enforce here. -func TestStartupAbortSplitsByBinaryPresence(t *testing.T) { - markNonAC(t, "n/a — doc-as-deliverable: the binary is absent by definition of this failure mode, so the contract prose IS the only artifact present; proof at the claim's own level") - step1 := startupStep1(t) - - const ( - classAMarker = "**Binary absent or non-executable**" - classBMarker = "**Binary present but contract out of range**" - brewLine = "brew install spacedock-dev/homebrew-tap/spacedock" - goBuildLine = "go build -o spacedock ./cmd/spacedock" - doctor = "spacedock doctor" - ) - - aStart := strings.Index(step1, classAMarker) - if aStart < 0 { - t.Fatalf("step 1 has no binary-absent class marker %q", classAMarker) - } - bStart := strings.Index(step1, classBMarker) - if bStart < 0 { - t.Fatalf("step 1 has no binary-present-out-of-range class marker %q", classBMarker) - } - if !(aStart < bStart) { - t.Fatalf("class markers out of order: Class A at %d, Class B at %d (Class A must precede Class B)", aStart, bStart) - } - - // Class A spans from its marker up to where Class B begins. - classA := step1[aStart:bStart] - // Class B spans from its marker to the end of step 1. - classB := step1[bStart:] - - // AC-1: both install lines present verbatim, inside the binary-absent class. - for _, line := range []string{brewLine, goBuildLine} { - if !strings.Contains(classA, line) { - t.Errorf("binary-absent class is missing runnable install line %q", line) - } - } - // AC-1: Class A must carry the no-doctor PROHIBITION and must not route to - // doctor. The prohibition string itself names `spacedock doctor` once; any - // further mention would be a route. Require the prohibition is present (so - // deleting the guidance fails, not skips) AND that the lone doctor mention is - // exactly that prohibition (so an added route fails). `doctor` is the same - // missing binary, so it can never be a live remedy in Class A. - const doctorProhibition = "Do NOT run `spacedock doctor`" - if !strings.Contains(classA, doctorProhibition) { - t.Errorf("binary-absent class is missing the no-doctor prohibition %q — Class A must carry the guidance that `doctor` is the same missing binary", doctorProhibition) - } - if n := strings.Count(classA, doctor); n != 1 { - t.Errorf("binary-absent class mentions %q %d time(s); want exactly 1 (the prohibition only) — any extra mention is a route, and `doctor` must not be a route in Class A", doctor, n) - } - - // AC-2: doctor survives as a LIVE ROUTE in the binary-present class. A bare - // mention is too weak — a disclaimer ("Historically we suggested spacedock - // doctor but no longer.") would satisfy it. Assert the active routing phrasing - // so a gutted route phrased as a disclaimer fails: the class must instruct to - // `run `spacedock doctor`` for `the per-class remedy`. - const ( - doctorRoute = "run `spacedock doctor`" - perClassVerb = "for the per-class remedy" - ) - if !strings.Contains(classB, doctorRoute) || !strings.Contains(classB, perClassVerb) { - t.Errorf("binary-present-out-of-range class no longer carries the live doctor route (%q + %q) — a blanket deletion or a disclaimer-only mention would hit this", doctorRoute, perClassVerb) - } -} - -// TestStartupGateGuidanceHasSingleProseSource locks AC-3: the startup-gate abort -// guidance lives in exactly ONE prose file (first-officer-shared-core.md), and -// agents/first-officer.md continues to delegate to the shared core rather than -// restating the gate. A second prose copy would let the two surfaces drift, so a -// single-file edit would silently fail to harden the mirror. -// -// Oracle: a walk of the skills/ and agents/ trees for the gate markers over -// markdown prose files only. The integration .go test files legitimately mention -// `spacedock doctor` in a comment (marketplace_manifest_test.go) and assert it -// here (this file); those are Go sources, not a prose mirror of the gate, so the -// walk is scoped to .md files. This is NOT a bare prose-grep AC: it does not -// assert "the prose says X"; it asserts a structural single-source invariant — -// that the gate guidance is NOT duplicated across prose files — which a code -// change can violate and this test would catch. -func TestStartupGateGuidanceHasSingleProseSource(t *testing.T) { - markNonAC(t, "n/a — structural single-source/dedup invariant over the .md surface; the contract-version gate behavior is proven by TestStartupEmbeddedRangeBracketsContractVersion + TestStartupAbortSplitsByBinaryPresence") - root := repoRoot(t) - // Markers unique to the startup-gate abort prose. A second .md file carrying - // any of these would be a drift-prone mirror of the single source of truth. - markers := []string{ - "Contract version gate", - "per-class remedy", - "spacedock doctor", - } - - var proseSources []string - for _, tree := range []string{"skills", "agents"} { - base := filepath.Join(root, tree) - err := filepath.Walk(base, func(p string, info os.FileInfo, err error) error { - if err != nil { - return err - } - if info.IsDir() || filepath.Ext(p) != ".md" { - return nil - } - b, err := os.ReadFile(p) - if err != nil { - return err - } - text := string(b) - for _, m := range markers { - if strings.Contains(text, m) { - rel, _ := filepath.Rel(root, p) - proseSources = append(proseSources, rel) - return nil - } - } - return nil - }) - if err != nil { - t.Fatalf("walk %s: %v", tree, err) - } - } - - const sole = "skills/first-officer/references/first-officer-shared-core.md" - if len(proseSources) != 1 || proseSources[0] != sole { - t.Errorf("startup-gate prose has sources %v, want exactly [%s] (single source of truth)", proseSources, sole) - } - - // agents/first-officer.md must still delegate, not mirror the gate prose. - agentPath := filepath.Join(root, "agents", "first-officer.md") - b, err := os.ReadFile(agentPath) - if err != nil { - t.Fatalf("read %s: %v", agentPath, err) - } - agent := string(b) - const delegation = "begin the Startup procedure from the shared core" - if !strings.Contains(agent, delegation) { - t.Errorf("agents/first-officer.md missing delegation line %q", delegation) - } - for _, m := range []string{"per-class remedy", "spacedock doctor"} { - if strings.Contains(agent, m) { - t.Errorf("agents/first-officer.md now mirrors gate prose marker %q (must delegate, not restate)", m) - } - } -} diff --git a/skills/integration/contract_status_path_test.go b/skills/integration/contract_status_path_test.go deleted file mode 100644 index a162670f..00000000 --- a/skills/integration/contract_status_path_test.go +++ /dev/null @@ -1,59 +0,0 @@ -// ABOUTME: AC-5a absence invariant — the vendored FO/ensign contracts carry -// ABOUTME: zero plugin-private status-path refs (no real seam can prove an absence). -package integration - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// pluginPrivateStatusRefs are the plugin-private status invocation forms the -// vendored skill surface must never reference once it calls `spacedock status`. -var pluginPrivateStatusRefs = []string{ - "skills/commission/bin/status", - "{spacedock_plugin_dir}", - "commission/bin/status", -} - -// TestNoPluginPrivateStatusPathInContracts locks the AC-5a absence invariant: -// NEITHER the FO nor the ensign contract references any plugin-private status -// path. The positive "the contract calls `spacedock status`" claim is owned -// behaviorally by the launcher smoke seam (launcher_smoke_test.go drives the -// real status binary for list/set/archive) and internal/status/*; that -// behavioral coverage makes a bare prose-grep over the FO text redundant. -// -// Oracle: the absence invariant over the vendored on-disk skill surface — the -// vendored skill tree must NOT re-introduce a retired plugin-private status -// path (skills/commission/bin/status, {spacedock_plugin_dir}, commission/bin/ -// status). No positive behavioral seam can prove an absence: a re-introduced -// plugin path would silently break the `spacedock status` launcher contract, -// and only this structural scan over the contract bytes catches it. This is NOT -// bare prose-grep — it asserts a structural negative the system depends on. -func TestNoPluginPrivateStatusPathInContracts(t *testing.T) { - markNonAC(t, "behavioral coverage: the launcher smoke seam (TestLauncherListSetArchive drives the real `spacedock status` binary for list/set/archive) + internal/status/* prove the positive `spacedock status` path; this is the structural-absence complement no positive seam can prove") - root := skillsRoot(t) - fo := readSkill(t, root, "first-officer/references/first-officer-shared-core.md") - ensign := readSkill(t, root, "ensign/references/ensign-shared-core.md") - - for name, content := range map[string]string{ - "first-officer-shared-core.md": fo, - "ensign-shared-core.md": ensign, - } { - for _, ref := range pluginPrivateStatusRefs { - if strings.Contains(content, ref) { - t.Errorf("%s references plugin-private status path %q", name, ref) - } - } - } -} - -func readSkill(t *testing.T, root, rel string) string { - t.Helper() - b, err := os.ReadFile(filepath.Join(root, rel)) - if err != nil { - t.Fatalf("read vendored skill %s: %v", rel, err) - } - return string(b) -} diff --git a/skills/integration/feedback_rejection_flow_test.go b/skills/integration/feedback_rejection_flow_test.go deleted file mode 100644 index 293fd9ab..00000000 --- a/skills/integration/feedback_rejection_flow_test.go +++ /dev/null @@ -1,247 +0,0 @@ -// ABOUTME: AC-1/AC-2 oracles for the feedback-rejection-flow extraction — the -// ABOUTME: 7-step rejection procedure lives in the skill, not the FO core, with a Skill() seam; always-on machinery stays. -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// feedbackProcedureFingerprints uniquely identifies the moved feedback-rejection -// procedure. Each literal is unique-1 in the pre-change first-officer-shared-core.md -// (ground-truthed 2026-06-04). AC-1(a) asserts presence in the skill; AC-1(b) -// asserts absence from the FO core. The four cover the load-bearing steps — the -// 3-cycle escalation, re-run-reviewer, re-enter-gate, and the FO ownership of -// `### Feedback Cycles`. -var feedbackProcedureFingerprints = map[string]string{ - "cycle-3-escalation": "On cycle 3, escalate to the human instead of another round", - "re-run-reviewer": "Re-run the reviewer after fixes", - "re-enter-gate-flow": "Re-enter the normal gate flow with the updated result", - "fo-owns-feedback-cycle": "The FO owns `### Feedback Cycles`", -} - -// feedbackFaithfulnessFingerprints are the AC-2 clauses whose loss would silently -// mis-route a rejection: the Codex `send_input` non-completion caveat and the -// `feedback-to` target-read clause. Asserted present in the skill body. -var feedbackFaithfulnessFingerprints = map[string]string{ - "send-input-non-completion": "do not treat the immediate `send_input` response as the new completion result", - "feedback-to-target-read": "the stage that receives the fix request, not the reviewer", -} - -// feedbackRejectionFlowSkill reads the new skill body under test. -func feedbackRejectionFlowSkill(t *testing.T) string { - t.Helper() - p := filepath.Join(skillsRoot(t), "feedback-rejection-flow", "SKILL.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read feedback-rejection-flow SKILL.md: %v", err) - } - return string(b) -} - -// TestFeedbackProcedurePresentInSkill is a non-AC text-consistency lint: the -// moved procedure fingerprints are present in -// skills/feedback-rejection-flow/SKILL.md — the prose MOVED here. Per the proof -// policy this presence check does NOT prove the FO follows the procedure; an -// inverted skill body keeps every fingerprint. The behavior — the FO observes a -// REJECTED report and routes the concrete finding back through implementation — is -// proven by the live rejection-flow scenario (runClaudeRejectionFlowScenario / -// runCodexRejectionFlowScenario, asserted by assertRejectionFlow) and its offline -// mutation control TestRejectionFlowNegativeMissingRoute. -func TestFeedbackProcedurePresentInSkill(t *testing.T) { - markNonAC(t, "live rejection-flow scenario (assertRejectionFlow) + TestRejectionFlowNegativeMissingRoute") - skill := feedbackRejectionFlowSkill(t) - for name, fp := range feedbackProcedureFingerprints { - if !strings.Contains(skill, fp) { - t.Errorf("feedback-rejection-flow SKILL.md missing %s fingerprint %q", name, fp) - } - } -} - -// TestFeedbackFaithfulnessClausesPresentInSkill is a non-AC text-consistency -// lint: the two mis-route-on-loss clauses — the Codex `send_input` non-completion -// caveat and the `feedback-to` target-read clause — are present in the skill body. -// This is text authoring, not behavioral proof; the behavior that the FO routes -// the fix to the feedback-to target (not the reviewer) is proven by the live -// rejection-flow scenario, which asserts the entity returns to status: -// implementation with the fix applied. -func TestFeedbackFaithfulnessClausesPresentInSkill(t *testing.T) { - markNonAC(t, "live rejection-flow scenario (assertRejectionFlow) + TestRejectionFlowNegativeMissingRoute") - skill := feedbackRejectionFlowSkill(t) - for name, fp := range feedbackFaithfulnessFingerprints { - if !strings.Contains(skill, fp) { - t.Errorf("feedback-rejection-flow SKILL.md missing %s faithfulness clause %q", name, fp) - } - } -} - -// TestFeedbackProcedureAbsentFromFOCore is a non-AC text-consistency lint (dedup): -// the moved procedure fingerprints (and the faithfulness clauses, which moved with -// the body) are NO LONGER present in first-officer-shared-core.md — moved, not -// duplicated. Whole-file (NOT region-scoped): region-scoping an absence check -// would false-pass content that moved elsewhere. This is a structural dedup -// property, not a behavioral claim; the FO's rejection behavior is proven by the -// live rejection-flow scenario. Re-inlining the procedure re-introduces a -// fingerprint and flips this RED. -func TestFeedbackProcedureAbsentFromFOCore(t *testing.T) { - markNonAC(t, "dedup lint; behavior via live rejection-flow scenario (assertRejectionFlow)") - fo := foCore(t) - for name, fp := range feedbackProcedureFingerprints { - if strings.Contains(fo, fp) { - t.Errorf("first-officer-shared-core.md still inlines %s fingerprint %q (moved, not duplicated)", name, fp) - } - } - for name, fp := range feedbackFaithfulnessFingerprints { - if strings.Contains(fo, fp) { - t.Errorf("first-officer-shared-core.md still inlines %s faithfulness clause %q (moved, not duplicated)", name, fp) - } - } -} - -// feedbackAtInclude matches an `@`-prefixed path token resolving toward the -// feedback-rejection-flow skill — `@feedback-rejection-flow`, -// `@../feedback-rejection-flow`, `@./feedback-rejection-flow/SKILL.md`, etc. The -// leading `@` plus any run of relative-path segments (`./`, `../`, bare) ending -// in a `feedback-rejection-flow` path component is the disproven cross-skill -// include the seam must NOT use. Structural (not a two-literal enum), and scanned -// whole-file: a stale `@feedback-rejection-flow` re-introduced in ANY core -// section — not just the detection region — is the disproven mechanism. A bare -// `@\S` whole-file ban would false-fire on a legitimate unrelated -// `@references/...` include the core may later carry; this name-targeted form -// only fires on the feedback-rejection-flow include family. -var feedbackAtInclude = regexp.MustCompile(`@(?:\.{1,2}/)*feedback-rejection-flow\b`) - -// TestFOCoreInvokesFeedbackRejectionSkill is a non-AC text-consistency lint: it -// asserts the FO core carries the Skill(skill="spacedock:feedback-rejection-flow") -// invocation literal at the rejection-detection point and no disproven -// cross-skill @-include. Per the proof policy this presence check does NOT prove -// the FO invokes the skill on a rejection: an inverted clause ("NEVER invoke -// feedback-rejection-flow; just wait") keeps the Skill(...) substring and passes -// (verified in ideation). The behavior — the FO routes a REJECTED finding back -// through implementation — is proven only by the live rejection-flow scenario -// (assertRejectionFlow). This lint guards the seam STRING and bans the @-include -// mechanism; it is the text half, not the behavioral proof. -func TestFOCoreInvokesFeedbackRejectionSkill(t *testing.T) { - markNonAC(t, "live rejection-flow scenario (assertRejectionFlow) + TestRejectionFlowNegativeMissingRoute") - fo := foCore(t) - region := sectionAfter(fo, "## Completion and Gates") - if region == "" { - t.Fatal("first-officer-shared-core.md has no `## Completion and Gates` section") - } - if !strings.Contains(region, `Skill(skill="spacedock:feedback-rejection-flow")`) { - t.Errorf("`## Completion and Gates` section does not invoke Skill(skill=\"spacedock:feedback-rejection-flow\")") - } - if m := feedbackAtInclude.FindString(fo); m != "" { - t.Errorf("first-officer-shared-core.md uses a disproven cross-skill @-include toward feedback-rejection-flow (token %q)", m) - } -} - -// TestAlwaysOnMachineryRetainedInFOCore is a non-AC text-consistency lint (the -// retention sibling of the dedup checks): the referenced always-on machinery did -// NOT move with the procedure — the FO Write Scope `### Feedback Cycles` bullet -// and the reuse-conditions block stay in the FO core. This is a structural -// retention property, not a behavioral claim; that the FO actually tracks feedback -// cycles is proven by the live rejection-flow scenario. Deleting either anchor -// reds this. -func TestAlwaysOnMachineryRetainedInFOCore(t *testing.T) { - markNonAC(t, "retention lint; behavior via live rejection-flow scenario (assertRejectionFlow)") - fo := foCore(t) - for name, anchor := range map[string]string{ - "feedback-cycles-write-scope": "**`### Feedback Cycles` section**", - "reuse-conditions": "Reuse conditions", - } { - if !strings.Contains(fo, anchor) { - t.Errorf("first-officer-shared-core.md no longer contains always-on anchor %s %q (must stay — the procedure references it by name)", name, anchor) - } - } -} - -// TestClaudeBareModeSeamStaysConsistent is a non-AC text-consistency lint: the -// Claude adapter's `## Feedback Rejection Flow (bare mode)` seam stays present -// with its sequential-dispatch and keep-reviewer-alive sentences — the seam is a -// Claude-runtime execution mode, not moved into the runtime-neutral skill. This -// is text authoring, not behavioral proof; the live rejection-flow scenario -// (Claude runner) exercises the bare-mode path for real. Removing the seam reds -// this. -func TestClaudeBareModeSeamStaysConsistent(t *testing.T) { - markNonAC(t, "live rejection-flow scenario, Claude runner (assertRejectionFlow)") - claude := vendoredSkillFiles(t)["first-officer/references/claude-first-officer-runtime.md"] - for name, fp := range map[string]string{ - "bare-mode-heading": "## Feedback Rejection Flow (bare mode)", - "sequential-dispatch": "the feedback rejection flow is sequential", - "keep-reviewer-alive": "Keep the reviewer alive", - } { - if !strings.Contains(claude, fp) { - t.Errorf("claude-first-officer-runtime.md missing bare-mode seam %s fingerprint %q (the adapter seam must stay consistent)", name, fp) - } - } -} - -// feedbackRejectionFrontmatterValue returns the trimmed scalar value of a -// top-level `key:` line in the feedback-rejection-flow SKILL.md frontmatter, with -// surrounding quotes stripped. The bool reports whether the key was found. -func feedbackRejectionFrontmatterValue(t *testing.T, key string) (string, bool) { - t.Helper() - fm, ok := frontmatter(feedbackRejectionFlowSkill(t)) - if !ok { - t.Fatal("feedback-rejection-flow SKILL.md has no YAML frontmatter block") - } - prefix := key + ":" - for _, line := range strings.Split(fm, "\n") { - if strings.HasPrefix(line, prefix) { - v := strings.TrimSpace(strings.TrimPrefix(line, prefix)) - v = strings.Trim(v, `"'`) - return v, true - } - } - return "", false -} - -// feedbackRejectionSeamLiteral is the seam name the FO contract invokes; the -// re-bound checks read the expected value from the contract, not from the skill -// file under test. -const feedbackRejectionSeamLiteral = "feedback-rejection-flow" - -// TestFeedbackRejectionSkillNameMatchesSeam is a code-bound invariant: the skill's -// frontmatter `name:` equals the seam name the FO CONTRACT invokes -// (Skill(skill="spacedock:NAME") in first-officer-shared-core.md). The expected -// value comes from the contract, not the skill file under test — renaming either -// side makes the two diverge and reds this, catching a renamed skill the FO -// invocation no longer reaches. -func TestFeedbackRejectionSkillNameMatchesSeam(t *testing.T) { - markCodeBoundInvariant(t, "FO contract Skill(skill=\"spacedock:feedback-rejection-flow\") invocation (first-officer-shared-core.md)") - name, ok := feedbackRejectionFrontmatterValue(t, "name") - if !ok { - t.Fatal("feedback-rejection-flow SKILL.md frontmatter has no name field") - } - seam := invokedSeamName(foCore(t), feedbackRejectionSeamLiteral) - if seam == "" { - t.Fatalf("FO contract does not invoke Skill(skill=\"spacedock:%s\") — the seam the skill name must match is gone", feedbackRejectionSeamLiteral) - } - if name != seam { - t.Errorf("feedback-rejection-flow SKILL.md frontmatter name is %q, but the FO contract invokes the seam %q — a renamed skill the FO invocation no longer reaches", name, seam) - } -} - -// TestFeedbackRejectionSkillIsFOInternal is a code-bound invariant binding the -// skill's `user-invocable` frontmatter to its ROLE: a skill the FO invokes mid-run -// via Skill(skill="spacedock:NAME") is FO-internal and MUST be -// `user-invocable: false`. The expected value is REQUIRED by the contract invoking -// the seam (an independent source), not a free literal; flipping the frontmatter -// to `true` while the FO still invokes the seam reds this. -func TestFeedbackRejectionSkillIsFOInternal(t *testing.T) { - markCodeBoundInvariant(t, "FO contract Skill(skill=\"spacedock:feedback-rejection-flow\") invocation implies FO-internal") - if invokedSeamName(foCore(t), feedbackRejectionSeamLiteral) == "" { - t.Fatalf("FO contract does not invoke the feedback-rejection-flow seam — the FO-internal premise no longer holds") - } - v, ok := feedbackRejectionFrontmatterValue(t, "user-invocable") - if !ok { - t.Fatal("feedback-rejection-flow SKILL.md frontmatter has no user-invocable field") - } - if v != "false" { - t.Errorf("feedback-rejection-flow SKILL.md frontmatter user-invocable is %q, but the FO contract invokes it as a mid-run seam — an FO-internal skill must be user-invocable: false", v) - } -} diff --git a/skills/integration/nonac_marker_test.go b/skills/integration/nonac_marker_test.go deleted file mode 100644 index 14da3eff..00000000 --- a/skills/integration/nonac_marker_test.go +++ /dev/null @@ -1,430 +0,0 @@ -// ABOUTME: The non-AC text-consistency marker + the AC-3 sweep meta-test — a -// ABOUTME: presence/absence check over an LLM-ingested instruction file is proof only if it declares its behavioral oracle. -package integration - -import ( - "go/ast" - "go/parser" - "go/token" - "os" - "strings" - "testing" -) - -// markNonAC is the explicit demotion seam required by the proof policy (f8b257cf): -// a string/substring/regex match over an instruction file the model reads NEVER -// satisfies a behavioral acceptance criterion. A test that matches such a file is -// legitimate ONLY as a text-consistency sanity check (the prose moved, a clause is -// present, a token is absent) — never as the proof of behavior. Calling -// markNonAC(t, oracle) declares that: -// - this test is a non-AC text-consistency lint, NOT a behavioral proof, and -// - `oracle` names where the behavior it touches is ACTUALLY proven (a live -// drive, a code-side invariant, or "n/a — pure text property" when the claim -// is itself about the text and has no behavior to drive). -// -// The AC-3 sweep meta-test (TestNoUndeclaredTautologicalProof) keys on this call: -// any test in this package that matches an ingested instruction file but does NOT -// call markNonAC is flagged as an undeclared tautology standing in for a -// behavioral claim. The call itself does nothing at runtime — its value is the -// declaration the sweep reads from the source. -func markNonAC(t *testing.T, oracle string) { - t.Helper() - if oracle == "" { - t.Fatal("markNonAC requires a non-empty behavioral oracle reference") - } -} - -// markCodeBoundInvariant is the second of the two explicit classifications the -// proof policy's litmus demands ("does the expected value come from a source OTHER -// than the file under test?"). A test calls markCodeBoundInvariant(t, source) to -// declare that its expectation is NOT a literal hardcoded against the file under -// test but is read from `source` — an independent code-side value (a shared Go -// const, the seam target the Skill() invocation uses, a manifest the binary -// parses) that can DIVERGE from the file. That divergence is exactly what makes -// the check able to fail as an invariant, so it is a legitimate AC-2 invariant, -// not a tautology. The AC-3 sweep treats a markCodeBoundInvariant test as -// declared (not an offender), the same as a markNonAC text-consistency lint — -// every text-matching test must self-classify as one or the other. The call does -// nothing at runtime; its value is the source-level declaration the sweep reads. -func markCodeBoundInvariant(t *testing.T, source string) { - t.Helper() - if source == "" { - t.Fatal("markCodeBoundInvariant requires a non-empty independent-source reference") - } -} - -// ingestedFileReaders are the named helper functions in this package that read an -// instruction file the model ingests (a contract, a workflow README, a skill -// body) — the recognized-reader allowlist. A test that calls one of these (the READ -// — how it then inspects the bytes is irrelevant under the match-axis positive rule) -// is a presence/absence check over an ingested file, exactly the shape the proof -// policy bans as standalone behavioral proof. The AC-3 sweep treats such a test as -// tautological unless it declares markNonAC, or unless it binds the expected value to -// a code-side source (a re-bound Bucket-B invariant — markCodeBoundInvariant). The -// sweep distinguishes the two by the declaration: a re-bound invariant whose -// expectation diverges from the file declares markCodeBoundInvariant; a pure -// text-consistency lint declares markNonAC. -var ingestedFileReaders = map[string]bool{ - "foCore": true, - "foRuntime": true, - "presentGateSkill": true, - "feedbackRejectionFlowSkill": true, - "usingClaudeTeamSkill": true, - "vendoredSkillFiles": true, - "presentGateFrontmatterValue": true, - "feedbackRejectionFrontmatterValue": true, -} - -// TestNoUndeclaredTautologicalProof is the AC-3 sweep, re-runnable offline. It -// parses every *_test.go in this package and flags any test function that READS a -// recognized instruction file's content — via a named reader helper, a DIRECT -// os.ReadFile/os.Open of a recognized instruction literal/segment, or a -// WalkDir-collected `.md` — unless it self-classifies via markNonAC or -// markCodeBoundInvariant. The count of undeclared offenders is the AC-3 metric: it -// must be zero. -// -// What the guard guarantees, and what it deliberately does NOT (two axes): -// -// - MATCH axis (closed, universal, load-bearing — STATICALLY GUARDED): the sweep -// keys on the READ, not on how the bytes are then inspected. ONCE a read of a -// recognized instruction file is detected, the test MUST declare regardless of -// the inspection idiom — strings.Contains/Index/EqualFold, bytes.*, -// regexp.Regexp.Match, len(Split)>1, a bare `==`, anything. Enumerating "match -// functions" was whack-a-mole; this rule closes the whole class because the -// trigger is the ingest, not the match. -// -// - READER axis (does an undeclared instruction-file read hide via an undiscovered -// read SHAPE? — DETACHED-AUDIT-BACKSTOPPED, NOT statically guarded): a read is -// detected only when it is DIRECT — an in-package read sink whose path arg subtree -// carries a recognized instruction literal/segment (isInstructionPathLiteral), a -// named ingestedFileReaders helper, or a WalkDir `.md` collector. A read reached -// through any other shape — param/local/field/method/closure taint flow, a -// transitive helper chain, a `[]string`/range-element flow, a cross-package -// reader, or a path in a package var defined in another file — is NOT statically -// flagged. This is a deliberate, documented boundary: a per-package go/ast scan -// structurally cannot see a cross-package read or a path built in another file, so -// chasing reader-shape completeness was the enumeration trap the proof policy -// itself warns against. The reader axis is covered by the detached adversarial -// audit required at every high-stakes-surface gate (the validation-stage policy), -// NOT by this sweep. The prior cycles' M-A (AGENTS.md, mods/*.md) / M-B / M-C / M-D -// shapes are this audited boundary, not silent gaps. -// -// This sweep is itself a code-side invariant over real parsed test source, not a -// text match over an instruction file — its expected value (which reads directly -// reach an instruction file) is independent of any contract prose, so it can fail -// when a future edit adds an undeclared DIRECT ingest. -func TestNoUndeclaredTautologicalProof(t *testing.T) { - offenders := sweepUndeclaredTautologies(t, ".") - for _, o := range offenders { - t.Errorf("%s reads an ingested instruction file's content without calling markNonAC or markCodeBoundInvariant — declare it a non-AC text-consistency lint (with its behavioral oracle) or re-bind its expectation to a code-side source; how the bytes are inspected does not matter", o) - } - if len(offenders) > 0 { - t.Fatalf("AC-3 sweep: %d undeclared tautological-behavioral-proof test(s); the count must be zero", len(offenders)) - } -} - -// sweepUndeclaredTautologies returns the sorted names of test functions in the -// package at dir that read an ingested instruction file, match it with a -// substring/regex call, and do NOT declare markNonAC. Exported as a helper so the -// sweep's own mutation control (TestSweepDetectsAnUndeclaredTautology) can call it -// against a synthetic fixture. -func sweepUndeclaredTautologies(t *testing.T, dir string) []string { - t.Helper() - fset := token.NewFileSet() - entries, err := os.ReadDir(dir) - if err != nil { - t.Fatalf("read package dir %s: %v", dir, err) - } - var files []*ast.File - for _, ent := range entries { - name := ent.Name() - if ent.IsDir() || !strings.HasSuffix(name, "_test.go") { - continue - } - f, err := parser.ParseFile(fset, dir+"/"+name, nil, 0) - if err != nil { - t.Fatalf("parse %s: %v", name, err) - } - files = append(files, f) - } - - // First pass: the recognized reader set is the named ingestedFileReaders allowlist - // plus any non-test helper that DIRECTLY reads an instruction file - // (readsInstructionContent — a read sink whose path arg carries a recognized - // instruction literal/segment, or a WalkDir-collected `.md`). The seeded named - // helpers cover readers that return a non-`.md`-literal handle (vendoredSkillFiles - // returns a map). - readers := map[string]bool{} - for r := range ingestedFileReaders { - readers[r] = true - } - for _, f := range files { - for _, decl := range f.Decls { - fn, ok := decl.(*ast.FuncDecl) - if !ok || strings.HasPrefix(fn.Name.Name, "Test") { - continue - } - if readsInstructionContent(fn) { - readers[fn.Name.Name] = true - } - } - } - - // Second pass: a test is an offender if it ingests instruction-file content — - // directly (readsInstructionContent) or via a recognized reader helper — and does - // NOT declare its proof standing. The sweep keys on the READ, not on a match-func - // allowlist: any inspection of ingested bytes (Contains/Index/EqualFold, bytes.*, - // regexp.Match, a bare ==, …) is covered because the trigger is the ingest itself. - var offenders []string - for _, f := range files { - for _, decl := range f.Decls { - fn, ok := decl.(*ast.FuncDecl) - if !ok || !strings.HasPrefix(fn.Name.Name, "Test") { - continue - } - calls := collectCalls(fn) - readsIngested := readsInstructionContent(fn) - for r := range readers { - if calls[r] { - readsIngested = true - break - } - } - declared := calls["markNonAC"] || calls["markCodeBoundInvariant"] - if readsIngested && !declared { - offenders = append(offenders, fn.Name.Name) - } - } - } - return sortedUnique(offenders) -} - -// readsInstructionContent reports whether fn DIRECTLY ingests a recognized -// instruction file's content: a read sink (os.ReadFile/os.Open/io.ReadAll/bufio -// scanner-reader) whose path arg subtree carries a recognized instruction -// literal/segment (isInstructionPathLiteral), or a WalkDir/Walk over a tree -// collecting instruction `.md` files (the reader-of-many shape its callers read). -// This is the direct-read predicate — no taint-flow tracking. A read reached only -// through a param/local/field/method/closure flow, a transitive helper chain, or a -// range-element flow is NOT detected here; that reader-shape axis is the -// detached-audit-backstopped boundary documented on TestNoUndeclaredTautologicalProof. -func readsInstructionContent(fn *ast.FuncDecl) bool { - found := false - ast.Inspect(fn, func(n ast.Node) bool { - call, ok := n.(*ast.CallExpr) - if !ok { - return true - } - sel, ok := call.Fun.(*ast.SelectorExpr) - if !ok { - return true - } - if readSinks[sel.Sel.Name] { - for _, arg := range call.Args { - if exprInstructionTainted(arg) { - found = true - } - } - } - // A WalkDir/Walk that filters on an instruction `.md` path is a reader-of-many: - // it collects the paths its callers read+inspect. - if (sel.Sel.Name == "WalkDir" || sel.Sel.Name == "Walk") && fnFiltersInstructionMarkdown(fn) { - found = true - } - return true - }) - return found -} - -// readSinks are the call selectors that ingest a file's content given a path: the -// os reads, io.ReadAll over an opened handle, and the bufio scanner/reader -// constructors. A tainted instruction path flowing into any of these is an ingest. -var readSinks = map[string]bool{ - "ReadFile": true, // os.ReadFile - "Open": true, // os.Open - "ReadAll": true, // io.ReadAll - "NewScanner": true, // bufio.NewScanner - "NewReader": true, // bufio.NewReader -} - -// exprInstructionTainted reports whether an expression DIRECTLY carries an -// instruction-file path: it references an instruction-path string literal/segment -// (isInstructionPathLiteral) anywhere in its subtree — so the `+` / strings.Join / -// filepath.Join / fmt.Sprintf path-build idioms (whose instruction operand is a node -// in the subtree) are covered when their operands are literals. -func exprInstructionTainted(expr ast.Expr) bool { - hit := false - ast.Inspect(expr, func(n ast.Node) bool { - if x, ok := n.(*ast.BasicLit); ok { - if x.Kind == token.STRING && isInstructionPathLiteral(strings.Trim(x.Value, "`\"")) { - hit = true - } - } - return true - }) - return hit -} - -// fnFiltersInstructionMarkdown reports whether fn's body filters paths by an -// instruction `.md` suffix — the WalkDir-collector signal. A `.md` HasSuffix check -// (or an instruction-`.md` literal) anywhere in a WalkDir helper marks it a -// reader-of-many over the instruction surface. -func fnFiltersInstructionMarkdown(fn *ast.FuncDecl) bool { - hit := false - ast.Inspect(fn, func(n ast.Node) bool { - if lit, ok := n.(*ast.BasicLit); ok && lit.Kind == token.STRING { - if strings.HasSuffix(strings.Trim(lit.Value, "`\""), ".md") { - hit = true - } - } - return true - }) - return hit -} - -// collectCalls walks a function body and returns the set of called function names -// (bare `foo(...)` and selector `pkg.Foo(...)`/`recv.Method(...)` trailing name). -// Used to detect calls to discovered reader helpers and the markNonAC / -// markCodeBoundInvariant declarations. -func collectCalls(fn *ast.FuncDecl) map[string]bool { - calls := map[string]bool{} - ast.Inspect(fn, func(n ast.Node) bool { - if call, ok := n.(*ast.CallExpr); ok { - switch f := call.Fun.(type) { - case *ast.Ident: - calls[f.Name] = true - case *ast.SelectorExpr: - calls[f.Sel.Name] = true - } - } - return true - }) - return calls -} - -// instructionPathSegments are the skill-tree / contract path segments that mark a -// path literal as targeting an instruction file the model ingests (a skill, -// contract, agent, or runtime adapter) rather than a binary-parsed artifact (a -// manifest .json) or a dev-only doc (docs/dev/*.md recipes are NOT an LLM -// instruction surface and are intentionally out of scope — Cycle-2 P1 divergence: -// the sweep scopes to the shipped skill/contract surface, not every `.md` in the -// repo). -// -// This is the RECOGNIZED-instruction-surface predicate (a deliberate bound, not a -// universal one): a path carrying one of these listed segments is an instruction -// path. A path fragment carrying a segment is recognized even before a `.md` suffix -// is appended, so a literal "…/first-officer-shared-core" matches on its segment. -// -// A real instruction surface whose path carries NONE of these segments — e.g. -// AGENTS.md or mods/*.md — is not recognized and a read of it is not statically -// flagged. That reader-shape gap is the detached-audit-backstopped boundary -// documented on TestNoUndeclaredTautologicalProof — covered by the adversarial audit -// at the high-stakes gate, not by enumerating more segments here. -var instructionPathSegments = map[string]bool{ - "skills": true, - "references": true, - "agents": true, - "first-officer": true, - "ensign": true, - "commission": true, - "present-gate": true, - "SKILL.md": true, -} - -// isInstructionPathLiteral reports whether a string literal is (a fragment of) an -// instruction-file path: it carries a skill-tree/contract segment. A `.json` -// manifest path or a docs/dev recipe path carries none and is not instruction. -func isInstructionPathLiteral(s string) bool { - if strings.HasSuffix(s, ".json") { - return false - } - for seg := range instructionPathSegments { - if s == seg || strings.Contains(s, seg) { - return true - } - } - return false -} - -// TestSweepDetectsAnUndeclaredTautology is the mutation control for the AC-3 -// sweep itself: the sweep is the AC-3 oracle, so it must be demonstrated to RED on -// the exact shape it polices and GREEN once that shape declares its demotion. -// Without this, the sweep could silently degrade to a no-op (e.g. an ingested-file -// reader renamed out of the map) and pass vacuously. It writes two synthetic test -// files to a temp dir and runs the sweep against it: -// - undeclared: a test that reads an ingested file (foCore) and matches it -// (strings.Contains) but never calls markNonAC -> MUST be flagged. -// - declared: the same shape plus a markNonAC call -> MUST NOT be flagged. -func TestSweepDetectsAnUndeclaredTautology(t *testing.T) { - dir := t.TempDir() - undeclared := `package fixture -import "strings" -func TestUndeclaredFixture(t *T) { - fo := foCore(t) - if strings.Contains(fo, "x") { _ = fo } -} -` - declared := `package fixture -func TestDeclaredFixture(t *T) { - markNonAC(t, "behavioral oracle: live gate-guardrail scenario") - fo := foCore(t) - if strings.Contains(fo, "x") { _ = fo } -} -` - writeFile(t, dir+"/undeclared_test.go", undeclared) - offenders := sweepUndeclaredTautologies(t, dir) - if !containsStr(offenders, "TestUndeclaredFixture") { - t.Fatalf("sweep failed to flag an undeclared presence-check over an ingested file; offenders=%v", offenders) - } - - writeFile(t, dir+"/declared_test.go", declared) - offenders = sweepUndeclaredTautologies(t, dir) - if containsStr(offenders, "TestDeclaredFixture") { - t.Fatalf("sweep wrongly flagged a declared (markNonAC) text-consistency lint; offenders=%v", offenders) - } - if !containsStr(offenders, "TestUndeclaredFixture") { - t.Fatalf("adding a declared fixture must not stop the sweep flagging the undeclared one; offenders=%v", offenders) - } - - // A code-bound invariant (expectation from an independent source) is the other - // valid self-classification and must clear the sweep too. - codeBound := `package fixture -func TestCodeBoundFixture(t *T) { - markCodeBoundInvariant(t, "shared const presentGateSeamName") - fo := foCore(t) - if strings.Contains(fo, presentGateSeamName) { _ = fo } -} -` - writeFile(t, dir+"/codebound_test.go", codeBound) - offenders = sweepUndeclaredTautologies(t, dir) - if containsStr(offenders, "TestCodeBoundFixture") { - t.Fatalf("sweep wrongly flagged a code-bound invariant; offenders=%v", offenders) - } -} - -func containsStr(in []string, want string) bool { - for _, s := range in { - if s == want { - return true - } - } - return false -} - -func sortedUnique(in []string) []string { - seen := map[string]bool{} - var out []string - for _, s := range in { - if !seen[s] { - seen[s] = true - out = append(out, s) - } - } - // simple insertion sort — the lists are tiny - for i := 1; i < len(out); i++ { - for j := i; j > 0 && out[j-1] > out[j]; j-- { - out[j-1], out[j] = out[j], out[j-1] - } - } - return out -} diff --git a/skills/integration/portability_test.go b/skills/integration/portability_test.go deleted file mode 100644 index 6f88b090..00000000 --- a/skills/integration/portability_test.go +++ /dev/null @@ -1,181 +0,0 @@ -// ABOUTME: Portability oracle — the shipped instruction surface assumes nothing -// ABOUTME: from the running user's machine (no HOME config, interpreter, or internal path). -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// A clean install must run for any user. The shipped instruction surface — the -// text a clean-room user actually reads and follows (skills/**/*.md excluding -// the test-only integration dir, plus mods/) — must therefore name none of three -// non-portable dependencies: -// -// 1. personal-config: a HOME-rooted ~/.claude / $HOME+.claude / -// os.UserHomeDir+.claude read (depends on the running user's home layout). -// 2. interpreter-on-PATH: a python / python3 shell-out, or a commission/bin -// helper invocation, on the dispatch critical path (depends on an -// interpreter or plugin-private script being installed). -// 3. internal-helper-path: a plugin-private absolute path baked into shipped -// instructions (skills/commission/bin/status, {spacedock_plugin_dir}, -// .agents/plugins/marketplace.json) — a path that does not exist on a fresh -// install. -// -// This is the shipped-instruction-surface complement to zs #246's two -// host-neutrality oracles (which police the Go source and the generic FO -// contract prose, one altitude down). The walk helper shippedSkillText is shared -// read-only with skill_surface_test.go; this file adds no new walk. - -// homeRootedClaudeRe matches only the HOME-rooted personal-config forms: a -// `~/.claude` tilde path, or `$HOME` / `os.UserHomeDir` joined with `.claude` on -// the same line. It deliberately does NOT match a project-relative `.claude/` -// path (e.g. commission's {project_root}/.claude/agents or debrief's -// .claude/worktrees prune note) — those exist in any checkout and are portable. -// That HOME-rooted-vs-project-relative distinction is the discriminator that -// keeps the personal-config check from false-positiving on legitimate usage. -var homeRootedClaudeRe = regexp.MustCompile(`~/\.claude|\$HOME[^\n]*\.claude|os\.UserHomeDir[^\n]*\.claude`) - -// interpreterRe matches an interpreter-on-PATH dependency on the dispatch path: -// a `python`/`python3` shell-out or a `commission/bin/...` helper invocation. -// `\bpython` avoids matching unrelated substrings; commission/bin is the -// pre-#246 plugin-private helper path the FO loop no longer needs. -var interpreterRe = regexp.MustCompile(`\bpython3?\b|commission/bin`) - -// internalHelperPaths are the plugin-private absolute paths that do not exist on -// a fresh install. Substring (not regex) — these are literal path fragments. -var internalHelperPaths = []string{ - "skills/commission/bin/status", - "{spacedock_plugin_dir}", - ".agents/plugins/marketplace.json", -} - -// isClaudeAdapter reports whether a shipped file is a Claude-host coupling -// surface: the Claude host-runtime adapters (claude-*-runtime.md) and the -// Claude-host-only using-claude-team skill. Per zs #246 a `~/.claude/teams` read -// is the legitimate, quarantined Claude coupling; it lives in the runtime -// adapters and — since the team-lifecycle extraction — in the using-claude-team -// skill, which is Claude-specific by construction (no Codex analog). The -// personal-config check — and ONLY that check — excludes these surfaces. The -// interpreter and internal-helper-path checks still apply to them. -func isClaudeAdapter(path string) bool { - base := filepath.Base(path) - if strings.HasPrefix(base, "claude-") && strings.HasSuffix(base, "-runtime.md") { - return true - } - return strings.Contains(path, filepath.Join("using-claude-team", "SKILL.md")) -} - -// TestShippedSurfaceHasNoHiddenMachineDependency locks AC-1: the shipped -// instruction surface names none of the three non-portable markers. It is -// falsifiable — reintroducing any marker into a shipped file turns it RED naming -// the file — and it is not a tautology: it guards against an empty walk so a -// future scope bug that empties shippedSkillText fails loudly rather than -// passing vacuously. -func TestShippedSurfaceHasNoHiddenMachineDependency(t *testing.T) { - markNonAC(t, "n/a — pure portability property of the shipped instruction surface (a clean install names no HOME-config/interpreter/plugin-private path). No positive behavioral seam can prove this absence; the empty-walk guard below keeps it from passing vacuously") - root := skillsRoot(t) - repo := repoRoot(t) - files := shippedSkillText(t, root, repo) - if len(files) == 0 { - t.Fatal("shippedSkillText walked zero files — scope bug; the portability oracle would pass vacuously") - } - for _, path := range files { - data, err := os.ReadFile(path) - if err != nil { - t.Errorf("read %s: %v", path, err) - continue - } - content := string(data) - - // personal-config: HOME-rooted ~/.claude, excluding the Claude adapters - // where a ~/.claude/teams read is the legitimate quarantined coupling. - if !isClaudeAdapter(path) { - if m := homeRootedClaudeRe.FindString(content); m != "" { - t.Errorf("%s carries a HOME-rooted personal-config dependency %q — a clean install has no such file", path, m) - } - } - - // interpreter-on-PATH and internal-helper-path apply to ALL shipped - // files, adapters included. - if m := interpreterRe.FindString(content); m != "" { - t.Errorf("%s carries an interpreter-on-PATH dependency %q — the dispatch path must not assume an installed interpreter/helper", path, m) - } - for _, p := range internalHelperPaths { - if strings.Contains(content, p) { - t.Errorf("%s bakes in plugin-private path %q — it does not exist on a fresh install", path, p) - } - } - } -} - -// TestPortabilityCheckDiscriminatesHostSpecific locks AC-2: the oracle is a -// discriminator, not a blunt absence test. It proves — against the REAL shipped -// surface — that the legitimately host-specific forms are present yet GREEN: -// -// - the Claude adapter's legitimate `~/.claude/teams` read is present in the -// walked surface (so the adapter exclusion is load-bearing, not vacuous), -// and -// - the project-relative `.claude/agents` / `.claude/worktrees` paths that -// commission/refit/debrief use are present in the walked surface yet the -// HOME-rooted regex does NOT match them (so the personal-config check does -// not false-positive on portable project-relative paths). -// -// If either positive control disappears from the surface, this test fails — -// catching a refactor that quietly removed the very thing the discriminator -// distinguishes, which would otherwise let TestShippedSurfaceHasNoHiddenMachineDependency -// pass for the wrong reason. -func TestPortabilityCheckDiscriminatesHostSpecific(t *testing.T) { - markNonAC(t, "n/a — pure portability property: the positive controls for the discriminator in TestShippedSurfaceHasNoHiddenMachineDependency (the Claude-adapter ~/.claude exclusion is load-bearing, the HOME-rooted regex does not false-positive on project-relative .claude/ paths). A property of the shipped surface, not a behavioral claim") - root := skillsRoot(t) - repo := repoRoot(t) - files := shippedSkillText(t, root, repo) - - var sawAdapterHomeClaude bool - var sawProjectRelativeClaude bool - for _, path := range files { - data, err := os.ReadFile(path) - if err != nil { - t.Errorf("read %s: %v", path, err) - continue - } - content := string(data) - - // Positive control 1: a Claude adapter carries a legitimate ~/.claude - // read that the personal-config check (correctly) excludes. - if isClaudeAdapter(path) && strings.Contains(content, "~/.claude") { - sawAdapterHomeClaude = true - } - - // Positive control 2: a non-adapter shipped file carries a - // project-relative .claude/ path. It must be present (proving the - // discriminator has something to discriminate) AND the HOME-rooted regex - // must not match it (proving no false positive). Scan line by line so a - // HOME-rooted hit elsewhere in the same file cannot mask a project- - // relative line. - if !isClaudeAdapter(path) { - for _, line := range strings.Split(content, "\n") { - if !strings.Contains(line, ".claude/") { - continue - } - if strings.Contains(line, "~/.claude") || strings.Contains(line, "$HOME") { - continue // a HOME-rooted line, not the project-relative form - } - sawProjectRelativeClaude = true - if homeRootedClaudeRe.MatchString(line) { - t.Errorf("%s: project-relative .claude line wrongly matched the HOME-rooted personal-config regex (false positive): %q", path, strings.TrimSpace(line)) - } - } - } - } - - if !sawAdapterHomeClaude { - t.Error("positive control missing: no Claude adapter carries a ~/.claude read — the adapter-exclusion in the personal-config check is no longer load-bearing") - } - if !sawProjectRelativeClaude { - t.Error("positive control missing: no shipped file carries a project-relative .claude/ path — the HOME-rooted-vs-project-relative discriminator has nothing to discriminate") - } -} diff --git a/skills/integration/present_gate_test.go b/skills/integration/present_gate_test.go deleted file mode 100644 index 51607807..00000000 --- a/skills/integration/present_gate_test.go +++ /dev/null @@ -1,260 +0,0 @@ -// ABOUTME: AC-1/AC-2 oracles for the present-gate extraction — the Gate -// ABOUTME: Presentation template + nine assembly rules live in the skill, not the FO core, with a Skill() seam. -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// assemblyRuleFingerprints uniquely identifies each of the nine captain-facing -// assembly rules moved into the present-gate skill. Each literal is unique-1 in -// the pre-change first-officer-shared-core.md (verified during ideation). The -// count is the teeth of AC-2: a dropped rule reds the absence of its -// fingerprint in the skill. -var assemblyRuleFingerprints = map[string]string{ - "lede-first/decision-last": "Lede first, decision last", - "chosen-direction-required": "Chosen direction is required as FO prose", - "cite-the-report": "Cite the Stage Report; render a one-line gist roll-up", - "reviewer-findings-in-tiers": "Reviewer findings render in priority tiers", - "recommendation-appears-once": "Recommendation appears exactly once", - "bounce-back-names-asks": "Bounce-back recommendations name the concrete asks", - "no-format-pedantry-asides": "No format-pedantry asides", - "one-sentence-worktree-heads-up": "One sentence of worktree heads-up when approval changes worktree state", - "target-15-25-lines": "Target length: 15-25 lines", -} - -// gatePresentationFingerprints identifies the moved Gate-Presentation content: -// the format template plus a representative subset of the assembly rules. AC-1(a) -// asserts presence in the skill; AC-1(b) asserts absence from the FO core. -var gatePresentationFingerprints = map[string]string{ - "template": "Gate review: {entity title}", - "lede-first/decision-last": "Lede first, decision last", - "chosen-direction-required": "Chosen direction is required as FO prose", - "no-format-pedantry-asides": "No format-pedantry asides", - "target-15-25-lines": "Target length: 15-25 lines", -} - -// presentGateSkill reads the new skill body under test. -func presentGateSkill(t *testing.T) string { - t.Helper() - p := filepath.Join(skillsRoot(t), "present-gate", "SKILL.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read present-gate SKILL.md: %v", err) - } - return string(b) -} - -// foCore reads the FO shared-core contract under test. -func foCore(t *testing.T) string { - t.Helper() - return vendoredSkillFiles(t)["first-officer/references/first-officer-shared-core.md"] -} - -// TestGatePresentationPresentInSkill is a non-AC text-consistency lint: it -// asserts the moved Gate-Presentation fingerprints (template + assembly rules) are -// present in skills/present-gate/SKILL.md — that the prose MOVED here, real -// authoring work. Per the proof policy (f8b257cf) this presence check does NOT -// prove the FO actually renders the gate from the skill: a meaning-inverted skill -// body keeps every fingerprint. The behavior — the FO loads present-gate via -// Skill() and presents the gate without self-approving — is proven by the live -// gate-guardrail scenario (internal/ensigncycle, runClaudeGateGuardrailScenario / -// runCodexGateGuardrailScenario, asserted by assertGateHeld) and its offline -// mutation control TestGateGuardrailNegativeBrokenStateTransition. This lint only -// guards against the fingerprints being dropped or the prose being deleted. -func TestGatePresentationPresentInSkill(t *testing.T) { - markNonAC(t, "live gate-guardrail scenario (assertGateHeld) + TestGateGuardrailNegativeBrokenStateTransition") - skill := presentGateSkill(t) - for name, fp := range gatePresentationFingerprints { - if !strings.Contains(skill, fp) { - t.Errorf("present-gate SKILL.md missing %s fingerprint %q", name, fp) - } - } -} - -// TestAllNineAssemblyRulesPresentInSkill is a non-AC text-consistency lint: it -// asserts the skill carries all nine captain-facing assembly-rule fingerprints -// (the count is the teeth — a dropped rule reds the absence of its fingerprint). -// Per the proof policy this is text authoring, not behavioral proof: an inverted -// rule body keeps the fingerprint. The behavior that the FO actually FOLLOWS the -// assembly rules when rendering a gate is proven by the live gate-guardrail -// scenario, not this presence check. -func TestAllNineAssemblyRulesPresentInSkill(t *testing.T) { - markNonAC(t, "live gate-guardrail scenario (assertGateHeld) + TestGateGuardrailNegativeBrokenStateTransition") - skill := presentGateSkill(t) - if len(assemblyRuleFingerprints) != 9 { - t.Fatalf("expected 9 assembly-rule fingerprints, have %d", len(assemblyRuleFingerprints)) - } - for name, fp := range assemblyRuleFingerprints { - if !strings.Contains(skill, fp) { - t.Errorf("present-gate SKILL.md missing assembly rule %s fingerprint %q", name, fp) - } - } -} - -// TestGatePresentationAbsentFromFOCore is a non-AC text-consistency lint (dedup): -// it asserts the moved fingerprints are NO LONGER present in -// first-officer-shared-core.md — moved, not duplicated. Whole-file (NOT -// region-scoped): region-scoping an absence check would false-pass content that -// moved elsewhere in the file. This is a structural dedup property, not a -// behavioral claim; the FO's gate behavior is proven by the live gate-guardrail -// scenario. The lint guards against the block being re-inlined (which would -// re-introduce a fingerprint and flip this RED). -func TestGatePresentationAbsentFromFOCore(t *testing.T) { - markNonAC(t, "dedup lint; behavior via live gate-guardrail scenario (assertGateHeld)") - fo := foCore(t) - for name, fp := range gatePresentationFingerprints { - if strings.Contains(fo, fp) { - t.Errorf("first-officer-shared-core.md still inlines %s fingerprint %q (moved, not duplicated)", name, fp) - } - } -} - -// presentGateAtInclude matches an `@`-prefixed path token that resolves toward -// the present-gate skill — `@present-gate`, `@./present-gate`, -// `@../present-gate/SKILL.md`, etc. The leading `@` plus any run of relative-path -// segments (`./`, `../`, bare) ending in a `present-gate` path component is the -// disproven cross-skill include the seam must NOT use. A structural scan, not an -// enumerated literal table — it catches the `@./present-gate/...` family the old -// enum missed. -var presentGateAtInclude = regexp.MustCompile(`@(?:\.{1,2}/)*present-gate\b`) - -// TestFOCoreInvokesPresentGateSkill is a non-AC text-consistency lint: it asserts -// the FO core's `## Completion and Gates` section carries the -// Skill(skill="spacedock:present-gate") invocation literal and no disproven -// cross-skill @-include. Per the proof policy this presence check does NOT prove -// the FO invokes the skill: a meaning-inverted clause ("NEVER invoke present-gate; -// self-approve silently") keeps the Skill(...) substring and passes (verified in -// ideation — the mutation harness left this GREEN under inversion). The behavior — -// the FO actually loads present-gate and presents the gate without self-approving -// — is proven only by the live gate-guardrail scenario (assertGateHeld) and its -// offline mutation control. This lint guards the seam STRING (so the skill name in -// the contract and the skill's own frontmatter cannot silently drift apart) and -// bans the @-include mechanism; it is the text half, not the behavioral proof. -func TestFOCoreInvokesPresentGateSkill(t *testing.T) { - markNonAC(t, "live gate-guardrail scenario (assertGateHeld) + TestGateGuardrailNegativeBrokenStateTransition") - fo := foCore(t) - region := sectionAfter(fo, "## Completion and Gates") - if region == "" { - t.Fatal("first-officer-shared-core.md has no `## Completion and Gates` section") - } - if !strings.Contains(region, `Skill(skill="spacedock:present-gate")`) { - t.Errorf("`## Completion and Gates` section does not invoke Skill(skill=\"spacedock:present-gate\")") - } - if m := presentGateAtInclude.FindString(region); m != "" { - t.Errorf("`## Completion and Gates` section uses the disproven cross-skill @-include %q", m) - } -} - -// presentGateBannedHelperPrefixes selects, from the code-derived spacedock -// vocabulary, the dispatch/status command PREFIXES the gate-presentation skill -// must not name — its prose is FO judgment/format, not shell wiring. It -// deliberately omits the stage-option keys (the skill legitimately references -// `{feedback-to target}` when describing a bounce-back decision), so it bans only -// the qualified command invocations. -func presentGateBannedHelperPrefixes(t *testing.T) []string { - t.Helper() - var out []string - for _, tok := range spacedockLeakageTokens(t) { - if tok == "spacedock dispatch" || tok == "spacedock status" { - out = append(out, tok) - } - } - return out -} - -// TestPresentGateSkillFreeOfDispatchHelperLeak is a code-bound invariant: the -// gate-presentation skill is free of the binary's `spacedock dispatch` / -// `spacedock status` command prefixes. The expected token set is DERIVED from the -// binary's registered command verbs (spacedockTopLevelCommands), not a literal -// frozen against the skill — so it diverges when a command verb is renamed in -// cli.go, which is what lets this fail as an invariant. A `spacedock dispatch` / -// `spacedock status` token leaking into the skill reds it. -func TestPresentGateSkillFreeOfDispatchHelperLeak(t *testing.T) { - markCodeBoundInvariant(t, "spacedockTopLevelCommands (cli.go Use: verbs)") - skill := presentGateSkill(t) - banned := presentGateBannedHelperPrefixes(t) - if len(banned) == 0 { - t.Fatal("derived zero command prefixes — the cli.go command surface diverged") - } - for _, b := range banned { - if strings.Contains(skill, b) { - t.Errorf("present-gate SKILL.md leaks spacedock command prefix %q (gate-presentation prose is FO judgment, not shell wiring)", b) - } - } -} - -// presentGateFrontmatterValue returns the trimmed scalar value of a top-level -// `key:` line in skills/present-gate/SKILL.md's YAML frontmatter, with any -// surrounding quotes stripped. The bool reports whether the key was found. -func presentGateFrontmatterValue(t *testing.T, key string) (string, bool) { - t.Helper() - fm, ok := frontmatter(presentGateSkill(t)) - if !ok { - t.Fatal("present-gate SKILL.md has no YAML frontmatter block") - } - prefix := key + ":" - for _, line := range strings.Split(fm, "\n") { - if strings.HasPrefix(line, prefix) { - v := strings.TrimSpace(strings.TrimPrefix(line, prefix)) - v = strings.Trim(v, `"'`) - return v, true - } - } - return "", false -} - -// presentGateSeamName is the seam target name the FO contract actually invokes. -// It is read from a DIFFERENT file than the skill under test — the FO shared core, -// the file that drives the FO — so the skill's frontmatter `name:` and the -// contract's `Skill(skill="spacedock:NAME")` invocation have independent sources -// that can diverge. -const presentGateSeamLiteral = "present-gate" - -// TestPresentGateSkillNameMatchesSeam is a code-bound invariant: the skill's -// frontmatter `name:` equals the seam name the FO CONTRACT invokes -// (Skill(skill="spacedock:NAME") in first-officer-shared-core.md). The expected -// value comes from the contract, not from the skill file under test — so renaming -// the skill's frontmatter, or renaming the contract's invocation, makes the two -// diverge and reds this. That is the seam-drift the check exists to catch: a -// renamed skill the FO's invocation no longer reaches. -func TestPresentGateSkillNameMatchesSeam(t *testing.T) { - markCodeBoundInvariant(t, "FO contract Skill(skill=\"spacedock:present-gate\") invocation (first-officer-shared-core.md)") - name, ok := presentGateFrontmatterValue(t, "name") - if !ok { - t.Fatal("present-gate SKILL.md frontmatter has no name field") - } - seam := invokedSeamName(foCore(t), presentGateSeamLiteral) - if seam == "" { - t.Fatalf("FO contract does not invoke Skill(skill=\"spacedock:%s\") — the seam the skill name must match is gone", presentGateSeamLiteral) - } - if name != seam { - t.Errorf("present-gate SKILL.md frontmatter name is %q, but the FO contract invokes the seam %q — a renamed skill the FO invocation no longer reaches", name, seam) - } -} - -// TestPresentGateSkillIsFOInternal is a code-bound invariant binding the skill's -// `user-invocable` frontmatter to its ROLE in the FO contract: a skill the FO -// invokes mid-run via Skill(skill="spacedock:NAME") is FO-internal and MUST be -// `user-invocable: false`, never a captain-facing user skill. The expected value -// is not a free literal — it is REQUIRED by the contract invoking the seam: the -// presence of the invocation (an independent source) is what makes -// `user-invocable: true` wrong. Flipping the frontmatter to `true` while the FO -// still invokes the seam reds this. -func TestPresentGateSkillIsFOInternal(t *testing.T) { - markCodeBoundInvariant(t, "FO contract Skill(skill=\"spacedock:present-gate\") invocation implies FO-internal") - if invokedSeamName(foCore(t), presentGateSeamLiteral) == "" { - t.Fatalf("FO contract does not invoke the present-gate seam — the FO-internal premise no longer holds") - } - v, ok := presentGateFrontmatterValue(t, "user-invocable") - if !ok { - t.Fatal("present-gate SKILL.md frontmatter has no user-invocable field") - } - if v != "false" { - t.Errorf("present-gate SKILL.md frontmatter user-invocable is %q, but the FO contract invokes it as a mid-run seam — an FO-internal skill must be user-invocable: false", v) - } -} diff --git a/skills/integration/reconcile_session_contract_test.go b/skills/integration/reconcile_session_contract_test.go deleted file mode 100644 index 3f46129d..00000000 --- a/skills/integration/reconcile_session_contract_test.go +++ /dev/null @@ -1,99 +0,0 @@ -// ABOUTME: AC-6 oracle over the FO event-loop step-0 reconcile prose — roster -// ABOUTME: reconciliation needs a team identity; bare reconcile is git-only D/E. -package integration - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// claudeFORuntime reads the vendored Claude first-officer runtime contract text. -func claudeFORuntime(t *testing.T) string { - t.Helper() - p := filepath.Join(skillsRoot(t), "first-officer", "references", "claude-first-officer-runtime.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read Claude FO runtime: %v", err) - } - return string(b) -} - -// reconcileStep0Region returns the body of the FO event-loop step-0 reconcile -// item: from the `0. **Reconcile sweep.**` list line up to (but excluding) the -// `1.` list item that follows. Scopes the AC-6 assertions to the one numbered -// step that documents reconcile, so the check is a structural region oracle, not -// a free-floating grep over the whole 34KB file. -func reconcileStep0Region(t *testing.T, text string) string { - t.Helper() - lines := strings.Split(text, "\n") - start := -1 - for i, line := range lines { - if strings.HasPrefix(line, "0. **Reconcile sweep.**") { - start = i - break - } - } - if start == -1 { - t.Fatal("FO runtime has no `0. **Reconcile sweep.**` event-loop step") - } - end := len(lines) - for i := start + 1; i < len(lines); i++ { - if strings.HasPrefix(lines[i], "1. ") { - end = i - break - } - } - return strings.Join(lines[start:end], "\n") -} - -// TestReconcileStep0RequiresTeamIdentityForRoster locks AC-6: the FO event-loop -// step-0 prose must state that the roster-derived classes (A/B/C) require a team -// identity — an explicit --team-name or a current-session match — and that bare -// reconcile without one is git-only (Class D/E). This is the prose half of the -// guarantee the AC-1/AC-4 code gates enforce; pairing them keeps the contract -// from silently re-describing bare reconcile as a safe default. -// -// Oracle: a structural region scan over the single step-0 list item. It asserts -// the region binds the team-identity precondition to the roster classes and the -// git-only fallback to bare invocation — not the mere presence of a phrase -// anywhere in the file. A paraphrase that drops the precondition (re-inviting the -// unsafe bare-fallback) fails this. -func TestReconcileStep0RequiresTeamIdentityForRoster(t *testing.T) { - markNonAC(t, "internal/dispatch reconcile_session_test.go (TestReconcileSessionMatchedDiscovery, TestReconcileExplicitTeamNameIgnoresSession, TestReconcileDegradeEmitsGitClasses) — the code gates enforcing the team-identity->roster-class behavior") - region := reconcileStep0Region(t, claudeFORuntime(t)) - - // The region must tie roster reconciliation to a required team identity. - if !strings.Contains(region, "team identity") { - t.Errorf("step-0 region does not state roster reconciliation needs a team identity:\n%s", region) - } - // It must name both ways to supply that identity: explicit --team-name and - // a current-session match. - if !strings.Contains(region, "--team-name") { - t.Errorf("step-0 region does not reference explicit --team-name:\n%s", region) - } - if !strings.Contains(region, "current-session") && !strings.Contains(region, "current session") { - t.Errorf("step-0 region does not reference the current-session match:\n%s", region) - } - // It must state that bare reconcile (no team identity) is git-only D/E. - if !strings.Contains(region, "git-only") { - t.Errorf("step-0 region does not state bare reconcile is git-only:\n%s", region) - } - if !strings.Contains(region, "D/E") { - t.Errorf("step-0 region does not name the git/filesystem classes D/E for the bare path:\n%s", region) - } -} - -// TestReconcileStep0DropsOptionalTeamNameFraming locks the second half of AC-6: -// the step-0 invocation line must no longer frame --team-name as a free-floating -// optional flag (`[--team-name {team_name}]`) whose omission silently falls back -// to an unsafe heuristic. The bracketed-optional form is exactly the wording the -// entity flagged as inviting the bare unsafe invocation. -func TestReconcileStep0DropsOptionalTeamNameFraming(t *testing.T) { - markNonAC(t, "internal/dispatch reconcile_session_test.go (TestReconcileExplicitTeamNameIgnoresSession + TestReconcileGateSuppressesEvenWithPopulatedRoster) — the code gates that make bare reconcile git-only") - region := reconcileStep0Region(t, claudeFORuntime(t)) - if strings.Contains(region, "[--team-name {team_name}]") { - t.Errorf("step-0 region still frames --team-name as a bracketed-optional flag, re-inviting the unsafe bare fallback:\n%s", region) - } -} diff --git a/skills/integration/ship_local_ceremony_test.go b/skills/integration/ship_local_ceremony_test.go deleted file mode 100644 index 147cbe50..00000000 --- a/skills/integration/ship_local_ceremony_test.go +++ /dev/null @@ -1,58 +0,0 @@ -// ABOUTME: Structural lint over the FO shared core's ship-local ceremony block — -// ABOUTME: the named subsection exists and references the merge-policy branch and sentinel. -package integration - -import ( - "strings" - "testing" -) - -// subsectionAfter returns the body of the markdown subsection beginning at the -// line equal to heading, up to (but excluding) the next heading at the same or a -// higher level (`### ` or `## `). It is `sectionAfter`'s `###`-aware sibling, -// needed because a `### ` subsection is otherwise swept past `### ` siblings. -func subsectionAfter(text, heading string) string { - lines := strings.Split(text, "\n") - start := -1 - for i, line := range lines { - if strings.TrimSpace(line) == heading { - start = i + 1 - break - } - } - if start == -1 { - return "" - } - end := len(lines) - for i := start; i < len(lines); i++ { - if strings.HasPrefix(lines[i], "## ") || strings.HasPrefix(lines[i], "### ") { - end = i - break - } - } - return strings.Join(lines[start:end], "\n") -} - -// TestShipLocalCeremonyBlockExists is a structural lint: the FO shared core -// carries a single named ship-local ceremony block that references the -// merge-policy branch and the local-merge sentinel. It scopes to the ceremony -// subsection — a real on-disk anchor whose absence (or a missing policy/sentinel -// reference) breaks the documented ship-local flow. The no-`--force` behavioral -// property of the terminal/local-merge transition is enforced by the real -// status codepath in internal/status (merge_policy_guard_test.go: -// TestMergeLocalNoSentinelTerminalSetSucceeds and siblings), so this lint does -// not grep the ceremony prose for force-related wording. -func TestShipLocalCeremonyBlockExists(t *testing.T) { - markNonAC(t, "internal/status merge_policy_guard_test.go (TestMergeLocalNoSentinelTerminalSetSucceeds and siblings)") - fo := vendoredSkillFiles(t)["first-officer/references/first-officer-shared-core.md"] - region := subsectionAfter(fo, "### Ship-Local Ceremony") - if region == "" { - t.Fatal("FO shared core missing the `### Ship-Local Ceremony` block (AC-5)") - } - if !strings.Contains(region, "merge: local") { - t.Error("ship-local ceremony must reference the `merge: local` policy branch") - } - if !strings.Contains(region, "local-merge:") { - t.Error("ship-local ceremony must name the local-merge sentinel") - } -} diff --git a/skills/integration/skill_surface_test.go b/skills/integration/skill_surface_test.go deleted file mode 100644 index 68e447fe..00000000 --- a/skills/integration/skill_surface_test.go +++ /dev/null @@ -1,374 +0,0 @@ -// ABOUTME: AC-1 skill-surface audit — the six user skills ship with valid -// ABOUTME: SKILL.md + resolvable reference closure, integration is test-only. -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// userSkills is the published user skill surface: the six skills the host -// discovers (each owns a SKILL.md). `integration` is deliberately absent — it -// holds only *_test.go and must not publish. -var userSkills = []string{"commission", "debrief", "refit", "ensign", "first-officer", "using-claude-team", "present-gate", "feedback-rejection-flow"} - -// TestUserSkillsPresentWithFrontmatter locks AC-1: each of the five user skills -// ships a SKILL.md whose YAML frontmatter declares a `name` and a `description`. -func TestUserSkillsPresentWithFrontmatter(t *testing.T) { - markNonAC(t, "n/a — structural config lint (each SKILL.md frontmatter declares name+description); the skill-discovery behavior is exercised by the host loading the surface") - root := skillsRoot(t) - for _, skill := range userSkills { - path := filepath.Join(root, skill, "SKILL.md") - data, err := os.ReadFile(path) - if err != nil { - t.Errorf("user skill %q has no SKILL.md at %s: %v", skill, path, err) - continue - } - fm, ok := frontmatter(string(data)) - if !ok { - t.Errorf("%s/SKILL.md has no YAML frontmatter block", skill) - continue - } - if !strings.Contains(fm, "name:") { - t.Errorf("%s/SKILL.md frontmatter missing a name field", skill) - } - if !strings.Contains(fm, "description:") { - t.Errorf("%s/SKILL.md frontmatter missing a description field", skill) - } - } -} - -// TestIntegrationIsTestOnlyAndExcluded locks AC-1's exclusion mechanism: the -// `skills/integration/` directory carries Go test files and NO SKILL.md, so the -// host's SKILL.md-keyed discovery omits it from the published skill set — no -// allow/deny list needed. The audit verifies the property rather than presuming -// it: integration has *_test.go and zero SKILL.md. -func TestIntegrationIsTestOnlyAndExcluded(t *testing.T) { - root := skillsRoot(t) - dir := filepath.Join(root, "integration") - entries, err := os.ReadDir(dir) - if err != nil { - t.Fatalf("read integration dir: %v", err) - } - sawTest := false - for _, e := range entries { - if e.IsDir() { - continue - } - if e.Name() == "SKILL.md" { - t.Errorf("skills/integration ships a SKILL.md — it would publish as a user skill") - } - if strings.HasSuffix(e.Name(), "_test.go") { - sawTest = true - } - } - if !sawTest { - t.Errorf("skills/integration has no *_test.go — expected the test-only surface") - } -} - -// referenceRe matches the two reference-include forms a SKILL.md uses: an -// `@references/foo.md` directive and a bare `references/foo.md` read path. The -// path captured in group 1 is resolved relative to the skill directory. -var referenceRe = regexp.MustCompile(`@?(references/[A-Za-z0-9_./-]+\.md)`) - -// TestUserSkillReferenceClosureResolves locks AC-1's closure half: every -// `@references/...md` / `references/...md` path mentioned in a user SKILL.md -// resolves to a real file under that skill's directory. A dangling reference -// (a ported skill pointing at a path that does not exist on `next`) fails here. -// Brace-placeholder template paths (e.g. references/templates/{name}.md) are -// resolved against their concrete siblings rather than the literal `{name}`. -func TestPiRuntimeAdaptersAreLoadable(t *testing.T) { - markCodeBoundInvariant(t, "os.Stat against the real skill tree (the adapter file must resolve on disk) — an independent filesystem source, not the SKILL.md text") - root := skillsRoot(t) - cases := []struct { - skill string - ref string - }{ - {skill: "first-officer", ref: "references/pi-first-officer-runtime.md"}, - {skill: "ensign", ref: "references/pi-ensign-runtime.md"}, - } - for _, tc := range cases { - skillDir := filepath.Join(root, tc.skill) - data, err := os.ReadFile(filepath.Join(skillDir, "SKILL.md")) - if err != nil { - t.Fatalf("%s: %v", tc.skill, err) - } - if !strings.Contains(string(data), tc.ref) { - t.Errorf("%s/SKILL.md does not advertise Pi runtime adapter %s", tc.skill, tc.ref) - } - if _, err := os.Stat(filepath.Join(skillDir, tc.ref)); err != nil { - t.Errorf("%s: Pi runtime adapter %s is not loadable: %v", tc.skill, tc.ref, err) - } - } -} - -func TestPiFirstOfficerRuntimeRequiresFreshSubagentContextForStages(t *testing.T) { - markNonAC(t, "Pi live runner (internal/ensigncycle TestLivePiSubagentEnsignSmoke exercises the Pi subagent dispatch path with fresh context)") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "pi-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read Pi first-officer runtime: %v", err) - } - content := string(data) - dispatch := sectionAfter(content, "## Dispatch") - if dispatch == "" { - t.Fatal("Pi first-officer runtime is missing the Dispatch section") - } - - required := []string{ - "Spacedock stage dispatches", - "pi-subagents", - "subagent(...)", - "context: \"fresh\"", - "Do not rely on the worker agent's default context", - "task prompt/dispatch content", - "workflow directory", - "entity file", - "completion checklist", - } - for _, want := range required { - if !strings.Contains(dispatch, want) { - t.Errorf("Pi first-officer Dispatch section does not contain required fresh-context invariant %q", want) - } - } -} - -func TestPiFirstOfficerRuntimeRequiresDispatchBuildArtifactForStages(t *testing.T) { - markNonAC(t, "Pi dispatch builder behavior is exercised by internal/dispatch build_pi_host tests and the Pi live runner; this is the shipped FO instruction-surface lint that requires FO to consume the generated dispatch artifact") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "pi-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read Pi first-officer runtime: %v", err) - } - dispatch := sectionAfter(string(data), "## Dispatch") - if dispatch == "" { - t.Fatal("Pi first-officer runtime is missing the Dispatch section") - } - - required := []string{ - "spacedock dispatch build", - "host: \"pi\"", - "assignment source of truth", - "entity slug/name", - "entity path", - "workflow directory", - "target stage", - "worktree path", - "completion checklist", - "emitted dispatch file prompt or dispatch file content", - "without composing a replacement assignment", - "context: \"fresh\"", - "phase", - "label", - "additive", - "must not redefine or replace", - } - for _, want := range required { - if !strings.Contains(dispatch, want) { - t.Errorf("Pi first-officer Dispatch section does not contain required dispatch-build invariant %q", want) - } - } -} - -func TestPiFirstOfficerRuntimeLimitsManualPromptFallback(t *testing.T) { - markNonAC(t, "Pi dispatch builder behavior is exercised by internal/dispatch build_pi_host tests and the Pi live runner; this is the shipped FO instruction-surface lint limiting manual prompt composition to break-glass fallback") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "pi-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read Pi first-officer runtime: %v", err) - } - dispatch := sectionAfter(string(data), "## Dispatch") - if dispatch == "" { - t.Fatal("Pi first-officer runtime is missing the Dispatch section") - } - - required := []string{ - "Manual Pi prompt composition", - "break-glass fallback only", - "spacedock dispatch build", - "unavailable", - "exits non-zero", - "explicitly debugging the dispatch builder", - "record the builder failure or unavailability reason", - "minimum canonical schema facts", - "auditable", - } - for _, want := range required { - if !strings.Contains(dispatch, want) { - t.Errorf("Pi first-officer Dispatch section does not contain required manual-fallback invariant %q", want) - } - } -} - -func TestPiFirstOfficerRuntimeForbidsSubagentAcceptanceForStages(t *testing.T) { - markNonAC(t, "Pi live runner (internal/ensigncycle TestLivePiSubagentEnsignSmoke exercises the Pi subagent dispatch path)") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "pi-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read Pi first-officer runtime: %v", err) - } - content := string(data) - dispatch := sectionAfter(content, "## Dispatch") - if dispatch == "" { - t.Fatal("Pi first-officer runtime is missing the Dispatch section") - } - - required := []string{ - "pi-subagents", - "subagent(... acceptance: ...)", - "do not use", - "task prompt/dispatch content", - "independent implementation-to-validation workflow", - "entity stage reports", - "product/state commits", - "independent validation", - "not same-agent acceptance finalization", - } - for _, want := range required { - if !strings.Contains(dispatch, want) { - t.Errorf("Pi first-officer Dispatch section does not contain required acceptance-contract invariant %q", want) - } - } -} - -func TestPiFirstOfficerRuntimeFollowupsAreFreshByDefault(t *testing.T) { - markNonAC(t, "Pi live runner (internal/ensigncycle TestLivePiSubagentEnsignSmoke exercises the Pi subagent dispatch path; fresh-redispatch is the default it drives)") - root := skillsRoot(t) - path := filepath.Join(root, "first-officer", "references", "pi-first-officer-runtime.md") - data, err := os.ReadFile(path) - if err != nil { - t.Fatalf("read Pi first-officer runtime: %v", err) - } - content := string(data) - followup := sectionAfter(content, "## Follow-up and Reuse") - if followup == "" { - t.Fatal("Pi first-officer runtime is missing the Follow-up and Reuse section") - } - - required := []string{ - "Fresh redispatch is the default safe behavior", - "Normal follow-up and retry dispatches are fresh assignment cycles", - "not context resumes", - "increment the epoch", - "previous completion must never satisfy the new epoch", - "non-fresh resume is only allowed as an explicit manual/debug exception", - "worker label", - "substrate", - "run/session handle", - "entity slug", - "stage", - "state", - "completion epoch", - } - for _, want := range required { - if !strings.Contains(followup, want) { - t.Errorf("Pi first-officer Follow-up and Reuse section does not contain required fresh-retry invariant %q", want) - } - } -} - -func TestUserSkillReferenceClosureResolves(t *testing.T) { - markCodeBoundInvariant(t, "os.Stat against the real skill tree (every @references/ path must resolve on disk) — an independent filesystem source") - root := skillsRoot(t) - for _, skill := range userSkills { - skillDir := filepath.Join(root, skill) - data, err := os.ReadFile(filepath.Join(skillDir, "SKILL.md")) - if err != nil { - t.Errorf("%s: %v", skill, err) - continue - } - for _, m := range referenceRe.FindAllStringSubmatch(string(data), -1) { - rel := m[1] - if strings.Contains(rel, "{") { - // A brace placeholder (references/templates/{name}.md): assert the - // parent directory exists and holds at least one concrete .md. - parent := filepath.Join(skillDir, filepath.Dir(rel)) - glob, _ := filepath.Glob(filepath.Join(parent, "*.md")) - if len(glob) == 0 { - t.Errorf("%s: templated reference %q has no concrete .md under %s", skill, rel, parent) - } - continue - } - if _, err := os.Stat(filepath.Join(skillDir, rel)); err != nil { - t.Errorf("%s: dangling reference %q (resolved %s): %v", skill, rel, filepath.Join(skillDir, rel), err) - } - } - } -} - -// TestNoPluginPrivateStatusPathInUserSkills locks AC-1/AC-2 reconciliation: no -// user skill (SKILL.md or its references) or the canonical pr-merge mod -// references the plugin-private status path or the {spacedock_plugin_dir} token. -// The reconciled surface calls `spacedock status`; a blind-copied python-era -// path fails here. -func TestNoPluginPrivateStatusPathInUserSkills(t *testing.T) { - markNonAC(t, "behavioral coverage: the launcher smoke seam (TestLauncherListSetArchive drives the real `spacedock status` binary) + internal/status/* prove the positive `spacedock status` path; this is the structural-absence complement over the shipped skill surface no positive seam can prove") - root := skillsRoot(t) - repo := repoRoot(t) - banned := []string{ - "skills/commission/bin/status", - "commission/bin/status", - "{spacedock_plugin_dir}", - ".agents/plugins/marketplace.json", - } - for _, path := range shippedSkillText(t, root, repo) { - data, err := os.ReadFile(path) - if err != nil { - t.Errorf("read %s: %v", path, err) - continue - } - content := string(data) - for _, b := range banned { - if strings.Contains(content, b) { - t.Errorf("%s references banned plugin-private path %q", path, b) - } - } - } -} - -// shippedSkillText returns every markdown file under skills/ (excluding the -// test-only integration dir) plus the canonical mods/, the full shipped -// instruction surface the banned-path audit walks. -func shippedSkillText(t *testing.T, skillsRootDir, repoRootDir string) []string { - t.Helper() - var out []string - walk := func(base string) { - filepath.WalkDir(base, func(p string, d os.DirEntry, err error) error { - if err != nil { - return err - } - if d.IsDir() && d.Name() == "integration" { - return filepath.SkipDir - } - if !d.IsDir() && strings.HasSuffix(p, ".md") { - out = append(out, p) - } - return nil - }) - } - walk(skillsRootDir) - walk(filepath.Join(repoRootDir, "mods")) - return out -} - -// frontmatter returns the YAML frontmatter block (between the leading `---` and -// the next `---`) and whether the document opened with one. -func frontmatter(doc string) (string, bool) { - if !strings.HasPrefix(doc, "---\n") { - return "", false - } - rest := doc[len("---\n"):] - end := strings.Index(rest, "\n---") - if end < 0 { - return "", false - } - return rest[:end], true -} diff --git a/skills/integration/skill_text_test.go b/skills/integration/skill_text_test.go deleted file mode 100644 index 05f59951..00000000 --- a/skills/integration/skill_text_test.go +++ /dev/null @@ -1,297 +0,0 @@ -// ABOUTME: Absence-invariant tests over the vendored FO/ensign skill surface — -// ABOUTME: AC-1 (no plugin status path) and AC-6 (no new PR-merge / `## Hook:` mod), via the structural scope-fence. -package integration - -import ( - "os" - "path/filepath" - "regexp" - "strings" - "testing" -) - -// skillsRoot is the vendored skill tree under test (the project skills/ dir -// this test package lives inside). -func skillsRoot(t *testing.T) string { - t.Helper() - p, err := filepath.Abs("..") - if err != nil { - t.Fatal(err) - } - return p -} - -// vendoredSkillFiles returns the vendored skill instruction surface: the FO and -// ensign reference markdown. -func vendoredSkillFiles(t *testing.T) map[string]string { - t.Helper() - root := skillsRoot(t) - rel := []string{ - "first-officer/references/first-officer-shared-core.md", - "first-officer/references/claude-first-officer-runtime.md", - "first-officer/references/codex-first-officer-runtime.md", - "ensign/references/ensign-shared-core.md", - "debrief/SKILL.md", - } - out := make(map[string]string, len(rel)) - for _, r := range rel { - p := filepath.Join(root, r) - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read vendored skill file %s: %v", p, err) - } - out[r] = string(b) - } - return out -} - -// TestNoPluginStatusPathInVendoredSkills is a non-AC structural absence lint: no -// file in the vendored skill instruction surface references the plugin-private -// status path or the {spacedock_plugin_dir} token. This is a structural negative -// over the on-disk surface, not a behavioral claim; the real status invocation -// path is exercised by internal/status. The lint catches a blind-copied python-era -// path being re-introduced into the instruction text. -func TestNoPluginStatusPathInVendoredSkills(t *testing.T) { - markNonAC(t, "n/a — structural absence over the instruction surface; status behavior via internal/status") - for name, content := range vendoredSkillFiles(t) { - if strings.Contains(content, "skills/commission/bin/status") { - t.Errorf("%s references plugin-private status path 'skills/commission/bin/status'", name) - } - if strings.Contains(content, "spacedock_plugin_dir") { - t.Errorf("%s still references {spacedock_plugin_dir} plugin root", name) - } - } -} - -// TestNoPRMergeOrModBehaviorIntroduced locks AC-6: the vendored skill surface -// introduces no new `## Hook:` mod heading and no PR-merge flow beyond the -// existing mod-block convention the surface already documents. The vendored -// files are copies of the plugin skill text plus the three amendments; the -// amendments add no new lifecycle hook or PR-merge command. This asserts the -// amendment regions do not introduce a `## Hook:` heading and that no PR-merge -// invocation (gh pr merge / git merge --no-ff into main) was added. -// -// Oracle: the amendment-region scope-fence — an absence invariant over the -// vendored on-disk skill surface (the ensign file and the FO Split-Root -// amendment region, scoped via sectionAfter). No positive behavioral seam can -// prove an absence of behavior: a re-introduced `## Hook:` lifecycle mod or a -// new PR-merge command would silently change the dispatch lifecycle, and only -// this structural scope-fence over the amendment regions catches it. This is -// NOT bare prose-grep — it asserts a structural negative over the amendments, -// not the presence of an instruction clause. -func TestNoPRMergeOrModBehaviorIntroduced(t *testing.T) { - markNonAC(t, "n/a — structural absence scope-fence over the amendment regions; merge/dispatch lifecycle behavior via internal/status guards") - files := vendoredSkillFiles(t) - - // The only `## Hook:` text legitimately present is the pre-existing Mod Hook - // Convention documentation in the FO shared core (describing startup/idle/ - // merge points). The amendments must not add a NEW `## Hook: {point}` mod - // declaration. Assert the ensign file (which the split-root amendment B - // touched) introduces no `## Hook:` heading at all. - ensign := files["ensign/references/ensign-shared-core.md"] - if strings.Contains(ensign, "## Hook:") { - t.Errorf("ensign reference unexpectedly introduces a `## Hook:` heading") - } - - // The amendment-introduced region in the FO file is the Split-Root Worktree - // Contract subsection. Assert that region introduces no `## Hook:` heading — - // the pre-existing Mod Hook Convention text lives in a different section. - fo := files["first-officer/references/first-officer-shared-core.md"] - if region := sectionAfter(fo, "### Split-Root Worktree Contract"); strings.Contains(region, "## Hook:") { - t.Errorf("FO split-root amendment region introduces a `## Hook:` heading") - } - - // No PR-merge invocation may be introduced anywhere in the vendored surface. - prMergeMarkers := []string{"gh pr merge", "git merge --no-ff", "git merge --ff-only main"} - for name, content := range files { - for _, m := range prMergeMarkers { - if strings.Contains(content, m) { - t.Errorf("%s introduces a PR-merge invocation %q (out of scope per AC-6)", name, m) - } - } - } -} - -// TestSkillSurfaceDocumentsSpacedockBinInvariant is a non-AC text-consistency -// lint: the FO/ensign/debrief surface documents the env-aware -// `${SPACEDOCK_BIN:-spacedock}` launcher token. The actual launcher-resolution -// behavior is exercised by the front-door tests in internal/cli (the binary path -// propagation), not by this presence check; this only guards that the documented -// invocation token is not silently dropped from the instruction surface. -func TestSkillSurfaceDocumentsSpacedockBinInvariant(t *testing.T) { - markNonAC(t, "internal/cli front-door tests (spacedock binary path propagation)") - files := vendoredSkillFiles(t) - for _, name := range []string{ - "first-officer/references/first-officer-shared-core.md", - "ensign/references/ensign-shared-core.md", - "debrief/SKILL.md", - } { - content := files[name] - if !strings.Contains(content, "${SPACEDOCK_BIN:-spacedock}") { - t.Errorf("%s does not document the env-aware spacedock launcher invariant", name) - } - } -} - -// TestFirstOfficerDispatchDocsUseFlagFileMode is a code-bound invariant on the -// dispatch-build ergonomics contract: the FO runtime docs must teach the -// file-backed dispatch-build flags the BINARY actually parses (derived from -// dispatch.go's isBuildRequestFlag via spacedockBuildRequestFlags), not inline -// shell JSON. The required-flag set comes from code, not a literal frozen against -// the docs, so renaming a build flag in the router makes the docs check diverge -// and red. The banned inline-JSON absence and the host-derivation tokens -// (CLAUDECODE / CODEX_THREAD_ID env vars) are documented alongside. -func TestFirstOfficerDispatchDocsUseFlagFileMode(t *testing.T) { - markCodeBoundInvariant(t, "spacedockBuildRequestFlags (dispatch.go isBuildRequestFlag)") - files := vendoredSkillFiles(t) - claude := files["first-officer/references/claude-first-officer-runtime.md"] - codex := files["first-officer/references/codex-first-officer-runtime.md"] - - // The primary file-backed flags the docs must teach are the intersection of the - // binary's build-request flags with the load-bearing trio the dispatch contract - // names; binding to the code source means a renamed flag reds here. - wantFlags := intersect(spacedockBuildRequestFlags(t), "--entity-path", "--stage", "--checklist-file") - if len(wantFlags) != 3 { - t.Fatalf("the binary no longer defines all of --entity-path/--stage/--checklist-file as build-request flags; derived %v", wantFlags) - } - for name, content := range map[string]string{ - "claude-first-officer-runtime.md": claude, - "codex-first-officer-runtime.md": codex, - } { - for _, want := range wantFlags { - if !strings.Contains(content, want) { - t.Errorf("%s primary dispatch docs do not mention %s", name, want) - } - } - for _, banned := range []string{"echo '' | spacedock dispatch build", "python3 <<", "PYEOF"} { - if strings.Contains(content, banned) { - t.Errorf("%s still teaches fragile inline JSON pattern %q", name, banned) - } - } - } - if !strings.Contains(claude, "CLAUDECODE") { - t.Errorf("Claude FO runtime does not document CLAUDECODE host derivation") - } - if !strings.Contains(codex, "CODEX_THREAD_ID") { - t.Errorf("Codex FO runtime does not document CODEX_THREAD_ID host derivation") - } - if !strings.Contains(codex, "must never forward a prompt containing `Skill(skill=\"spacedock:ensign\")`") { - t.Errorf("Codex FO runtime lacks the explicit no-Claude-shaped-prompt guard") - } -} - -// commissionStateBackendDecisionRows extracts the two state-backend decision rows -// from the commission SKILL.md decision table: each `- **{Label}** (...)` bullet -// directly under the `**State backend (...)**` decision lead-in, returned keyed by -// its bold branch label. It bounds each row to its own bullet line so an -// assertion can check what that row alone binds, not a free-floating substring -// anywhere in the file. -func commissionStateBackendDecisionRows(t *testing.T) map[string]string { - t.Helper() - p := filepath.Join(skillsRoot(t), "commission", "SKILL.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read commission SKILL.md: %v", err) - } - lines := strings.Split(string(b), "\n") - // Find the decision lead-in, then collect the contiguous `- **Label**` bullets - // that immediately follow (the decision-table rows), stopping at the first - // non-bullet line. - start := -1 - for i, line := range lines { - if strings.HasPrefix(line, "**State backend") { - start = i + 1 - break - } - } - if start == -1 { - return nil - } - rows := map[string]string{} - labelRow := regexp.MustCompile(`^- \*\*([^*]+)\*\*`) - for i := start; i < len(lines); i++ { - line := lines[i] - if strings.TrimSpace(line) == "" { - continue - } - m := labelRow.FindStringSubmatch(line) - if m == nil { - if len(rows) > 0 { - break // past the contiguous decision-table block - } - continue - } - rows[m[1]] = line - } - return rows -} - -// TestCommissionStateBackendDecisionRule asserts the structural shape of the -// commission SKILL.md state-backend decision table: two distinct labeled branches -// under one decision lead-in, each row binding its own frontmatter outcome — the -// split-root row binds `state: .spacedock-state`, the inline row binds the omit/ -// $inline outcome and NOT the split-root path. A scaffolding agent reads exactly -// this table to choose a backend, so the load-bearing property is that both -// branches exist as separate rows and neither collapses into the other. -// -// This DEMOTES the prior grep-over-prose check (it asserted three free-floating -// substrings exist anywhere in the file, which a meaning-inverting paraphrase — -// e.g. swapping which condition selects split-root — would keep green). REPLACE -// with a behavioral driver was not feasible: the commission flow is -// instruction-driven (the agent follows SKILL.md's "Write the README with ..." -// steps); there is no Go scaffolder function that takes a standalone-vs-code-repo -// input and emits frontmatter, so nothing behavioral is invocable from a test. -// The honest fallback per the task is this structural decision-table assertion, -// which pins the two-row shape and each row's bound outcome rather than prose -// wording, and FAILS if a branch is dropped, merged, or rebound to the wrong path. -func TestCommissionStateBackendDecisionRule(t *testing.T) { - markNonAC(t, "n/a — structural two-row decision-table self-consistency (no Go scaffolder takes standalone-vs-code-repo and emits frontmatter); the split-root behavior is proven by internal/cli TestStateNewBirthsSplitRoot") - rows := commissionStateBackendDecisionRows(t) - splitRoot, hasSplit := rows["Split-root"] - inline, hasInline := rows["Inline"] - if !hasSplit { - t.Fatal("commission SKILL.md state-backend table missing the `- **Split-root**` row") - } - if !hasInline { - t.Fatal("commission SKILL.md state-backend table missing the `- **Inline**` row") - } - // The split-root row must bind the orphan-state path; the inline row must NOT - // claim it (that is the binding that flips if the two branches are swapped or - // collapsed). - if !strings.Contains(splitRoot, "state: .spacedock-state") { - t.Errorf("split-root decision row does not bind `state: .spacedock-state`: %q", splitRoot) - } - if strings.Contains(inline, "state: .spacedock-state") { - t.Errorf("inline decision row wrongly binds the split-root path `state: .spacedock-state`: %q", inline) - } - // The inline row must bind its own outcome (the $inline value, or the omit - // guidance) so the two rows carry distinct frontmatter choices. - if !strings.Contains(inline, "$inline") && !strings.Contains(inline, "omit") { - t.Errorf("inline decision row binds no inline outcome ($inline or omit): %q", inline) - } -} - -// sectionAfter returns the body of the markdown section beginning at the line -// equal to heading, up to (but excluding) the next top-level `## ` heading, or -// "" when the heading is absent. Used to scope an assertion to one section. -func sectionAfter(text, heading string) string { - lines := strings.Split(text, "\n") - start := -1 - for i, line := range lines { - if strings.TrimSpace(line) == heading { - start = i + 1 - break - } - } - if start == -1 { - return "" - } - end := len(lines) - for i := start; i < len(lines); i++ { - if strings.HasPrefix(lines[i], "## ") { - end = i - break - } - } - return strings.Join(lines[start:end], "\n") -} diff --git a/skills/integration/spacedock_vocabulary_test.go b/skills/integration/spacedock_vocabulary_test.go deleted file mode 100644 index 1984906c..00000000 --- a/skills/integration/spacedock_vocabulary_test.go +++ /dev/null @@ -1,307 +0,0 @@ -// ABOUTME: Code-derived spacedock CLI vocabulary — the independent source the -// ABOUTME: leakage checks bind to, AST-extracted from the dispatch router + the status stage-option keys. -package integration - -import ( - "go/ast" - "go/parser" - "go/token" - "path/filepath" - "regexp" - "testing" -) - -// skillSeamRe captures the skill name from a `Skill(skill="spacedock:NAME")` -// invocation in an FO/ensign contract — the integration seam the FO actually -// invokes mid-run. The captured NAME is the independent source a skill's -// frontmatter `name:` binds to: if the contract's invocation and the skill's -// declared name drift apart, the seam breaks, and a check comparing the two REDs. -var skillSeamRe = regexp.MustCompile(`Skill\(skill="spacedock:([a-z0-9-]+)"\)`) - -// invokedSeamName returns the skill name the given contract text invokes for the -// expected target. It scans every Skill(skill="spacedock:NAME") invocation and -// returns the matching NAME, or "" if the contract never invokes that seam. The -// expected value is read from the CONTRACT (the file that drives the FO), not from -// the skill file under test, so the two have independent sources that can diverge. -func invokedSeamName(contract, want string) string { - for _, m := range skillSeamRe.FindAllStringSubmatch(contract, -1) { - if m[1] == want { - return m[1] - } - } - return "" -} - -// spacedockDispatchSubcommands AST-extracts the dispatch subcommand names the -// binary actually routes, from the `switch args[0] { case "..." }` in -// internal/dispatch/dispatch.go's Run. This is an independent code-side source: -// the binary parses these as commands, not from any instruction file the model -// reads, and a rename in the router shifts the set — which is exactly what lets a -// leakage check that binds to it diverge from a stale token frozen in a skill. -func spacedockDispatchSubcommands(t *testing.T) []string { - t.Helper() - src := filepath.Join(repoRoot(t), "internal", "dispatch", "dispatch.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse dispatch.go: %v", err) - } - var subs []string - ast.Inspect(f, func(n ast.Node) bool { - fn, ok := n.(*ast.FuncDecl) - if !ok || fn.Name.Name != "Run" { - return true - } - ast.Inspect(fn, func(m ast.Node) bool { - sw, ok := m.(*ast.SwitchStmt) - if !ok { - return true - } - // Only the subcommand switch (`switch args[0]`) yields command names. - tag, ok := sw.Tag.(*ast.IndexExpr) - if !ok { - return true - } - if id, ok := tag.X.(*ast.Ident); !ok || id.Name != "args" { - return true - } - for _, stmt := range sw.Body.List { - cc, ok := stmt.(*ast.CaseClause) - if !ok { - continue - } - for _, expr := range cc.List { - if lit, ok := expr.(*ast.BasicLit); ok && lit.Kind == token.STRING { - subs = append(subs, trimQuotes(lit.Value)) - } - } - } - return false - }) - return false - }) - if len(subs) == 0 { - t.Fatal("extracted zero dispatch subcommands from dispatch.go — the AST source diverged from the router shape") - } - return subs -} - -// spacedockBuildRequestFlags AST-extracts the dispatch-build request flag names -// the binary accepts, from the `case "--entity-path", ...` in dispatch.go's -// isBuildRequestFlag. These are the real flags the build path parses — an -// independent code-side source for the "docs teach file-backed dispatch input" -// invariant: a flag renamed in code shifts the set, so a docs check binding to it -// tracks the binary's actual flag surface rather than a frozen literal. -func spacedockBuildRequestFlags(t *testing.T) []string { - t.Helper() - src := filepath.Join(repoRoot(t), "internal", "dispatch", "dispatch.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse dispatch.go: %v", err) - } - var flags []string - ast.Inspect(f, func(n ast.Node) bool { - fn, ok := n.(*ast.FuncDecl) - if !ok || fn.Name.Name != "isBuildRequestFlag" { - return true - } - ast.Inspect(fn, func(m ast.Node) bool { - cc, ok := m.(*ast.CaseClause) - if !ok { - return true - } - for _, e := range cc.List { - if lit, ok := e.(*ast.BasicLit); ok && lit.Kind == token.STRING { - flags = append(flags, trimQuotes(lit.Value)) - } - } - return true - }) - return false - }) - if len(flags) == 0 { - t.Fatal("extracted zero build-request flags from isBuildRequestFlag in dispatch.go") - } - return flags -} - -// spacedockTopLevelCommands AST-extracts the binary's top-level command names -// from the `Use: " ..."` fields of the cobra commands in internal/cli/cli.go. -// The first word of each Use string is the command verb (`status`, `dispatch`, -// `claude`, ...). This is the independent source for the `spacedock dispatch` / -// `spacedock status` leakage prefixes: the binary registers these verbs, and a -// rename in cli.go shifts the set, so a leakage check that binds to it tracks the -// real command surface rather than a frozen literal. -func spacedockTopLevelCommands(t *testing.T) []string { - t.Helper() - src := filepath.Join(repoRoot(t), "internal", "cli", "cli.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse cli.go: %v", err) - } - var cmds []string - ast.Inspect(f, func(n ast.Node) bool { - kv, ok := n.(*ast.KeyValueExpr) - if !ok { - return true - } - key, ok := kv.Key.(*ast.Ident) - if !ok || key.Name != "Use" { - return true - } - lit, ok := kv.Value.(*ast.BasicLit) - if !ok || lit.Kind != token.STRING { - return true - } - use := trimQuotes(lit.Value) - if i := indexSpace(use); i >= 0 { - use = use[:i] - } - if use != "" && use != "spacedock" { - cmds = append(cmds, use) - } - return true - }) - if len(cmds) == 0 { - t.Fatal("extracted zero top-level commands from cli.go Use: fields") - } - return cmds -} - -func indexSpace(s string) int { - for i := 0; i < len(s); i++ { - if s[i] == ' ' { - return i - } - } - return -1 -} - -// spacedockStageOptionKeys AST-extracts the stage-option keys the binary parses, -// from the `[]string{"feedback-to", ...}` literal in internal/status/stages.go. -// These are real frontmatter keys the binary reads, not file prose — an -// independent source that shifts when a key is renamed in code. -func spacedockStageOptionKeys(t *testing.T) []string { - t.Helper() - src := filepath.Join(repoRoot(t), "internal", "status", "stages.go") - fset := token.NewFileSet() - f, err := parser.ParseFile(fset, src, nil, 0) - if err != nil { - t.Fatalf("parse stages.go: %v", err) - } - var keys []string - ast.Inspect(f, func(n ast.Node) bool { - cl, ok := n.(*ast.CompositeLit) - if !ok { - return true - } - arr, ok := cl.Type.(*ast.ArrayType) - if !ok { - return true - } - if id, ok := arr.Elt.(*ast.Ident); !ok || id.Name != "string" { - return true - } - var lits []string - for _, e := range cl.Elts { - lit, ok := e.(*ast.BasicLit) - if !ok || lit.Kind != token.STRING { - return true - } - lits = append(lits, trimQuotes(lit.Value)) - } - // The stage-option list is the one containing "feedback-to" — pin to that - // literal so an unrelated []string{...} does not contribute. - for _, l := range lits { - if l == "feedback-to" { - keys = lits - return false - } - } - return true - }) - if len(keys) == 0 { - t.Fatal("did not find the stage-option-keys []string{...} (feedback-to) in stages.go") - } - return keys -} - -// spacedockLeakageTokens is the canonical spacedock-specific token set the -// host-neutral / generic-skill bodies must NOT name. It is DERIVED from code (the -// dispatch router's subcommand surface + the status stage-option keys), not a -// literal frozen in a test, so the set and any skill body can diverge — that -// divergence is what makes a leakage check able to fail as an invariant rather -// than as a self-match. -// -// Only spacedock-SPECIFIC forms are included, so a generic-prose word never -// false-fires: the bare `spacedock dispatch` / `spacedock status` prefixes; the -// dispatch helper subcommands QUALIFIED with their `spacedock dispatch ` prefix -// (so a bare English `reconcile`/`build` in event-loop prose is fine, only the -// qualified invocation leaks); and the hyphenated stage-option keys (e.g. -// `feedback-to`), which are spacedock frontmatter vocabulary, not generic words — -// the single-word stage keys (agent/fresh/model) are excluded because they appear -// in legitimate generic prose. -func spacedockLeakageTokens(t *testing.T) []string { - t.Helper() - var tokens []string - // The leak-prone top-level prefixes: dispatch + status, derived from the - // binary's registered command verbs (not a frozen literal). - for _, cmd := range spacedockTopLevelCommands(t) { - if cmd == "dispatch" || cmd == "status" { - tokens = append(tokens, "spacedock "+cmd) - } - } - for _, sub := range spacedockDispatchSubcommands(t) { - // build/show-stage-def are the host-neutral surface a skill may name; the - // Claude-coupled helper subcommands are the leak-prone ones — banned only in - // their qualified `spacedock dispatch ` form so a generic English word - // (a bare `reconcile`) does not false-fire. - switch sub { - case "build", "show-stage-def": - default: - tokens = append(tokens, "spacedock dispatch "+sub) - } - } - for _, k := range spacedockStageOptionKeys(t) { - // Only the hyphenated, spacedock-specific stage keys (feedback-to) are - // leak-prone as a bare token; single-word keys appear in generic prose. - if isHyphenated(k) { - tokens = append(tokens, k) - } - } - return tokens -} - -func isHyphenated(s string) bool { - for i := 0; i < len(s); i++ { - if s[i] == '-' { - return true - } - } - return false -} - -// intersect returns the want values that appear in have — used to derive the -// load-bearing subset of a code-extracted set without hardcoding the full set. -func intersect(have []string, want ...string) []string { - present := map[string]bool{} - for _, h := range have { - present[h] = true - } - var out []string - for _, w := range want { - if present[w] { - out = append(out, w) - } - } - return out -} - -func trimQuotes(s string) string { - if len(s) >= 2 && (s[0] == '"' || s[0] == '`') { - return s[1 : len(s)-1] - } - return s -} diff --git a/skills/integration/terminal_teardown_retry_test.go b/skills/integration/terminal_teardown_retry_test.go deleted file mode 100644 index b632e48a..00000000 --- a/skills/integration/terminal_teardown_retry_test.go +++ /dev/null @@ -1,338 +0,0 @@ -// ABOUTME: Structural lint over the FO terminal-teardown contract — step 10 and the -// ABOUTME: Claude runtime require BOUNDED best-effort teardown then a verbatim marker + hold; the launcher owns the exit. -package integration - -import ( - "strconv" - "strings" - "testing" -) - -// terminalTeardownMarker is the contract-mandated terminal-status sentinel the FO -// emits on cap-exhaustion of the bounded teardown. It is the load-bearing -// discriminator the AC-2 offline grade and the AC-1 live grade key on: neither -// bug shape emits it (the pre-yy give-up ends the turn silently after one failed -// TeamDelete; the post-yy retry-loop retries TeamDelete past the cap and never -// reaches a marker). The contract MUST mandate this string verbatim so the -// streamwatcher has a fixed substring to grep — drop it from the contract and the -// oracle has nothing to key on. Kept in sync with the same constant in -// internal/ensigncycle (terminalTeardownMarker). -const terminalTeardownMarker = "TERMINAL_TEARDOWN_BOUNDED: best-effort teardown exhausted; member(s) stuck in registry; holding for launcher." - -// TestTerminalTeardownIsBoundedBestEffort is a structural lint pinning the -// REVERSAL of the sonnet-live-ci #282 contract: the FO terminal teardown is a -// BOUNDED best-effort — a cooperative shutdown, then a bounded set of TeamDelete -// attempts with an inter-attempt settle, then STOP, emit the verbatim -// terminal-status marker, and HOLD (no further teardown tool calls). The PROCESS -// EXIT is the launcher's responsibility, NOT the FO's. -// -// This INVERTS yy/#282's shipped mandate ("retry the team-teardown call to -// success" / "keep settling and retrying the teardown until it succeeds" / "Only -// a TeamDelete success lets the FO end its turn"). The live AC-1 confirmation -// proved retry-to-success is UNREACHABLE: the dispatched member approves shutdown -// and its session dies, but Claude Code never clears it from the team members[] -// (upstream #38116/#57681), so TeamDelete never succeeds; and claude -p will not -// self-exit while members[] is populated, so an FO-driven process exit is -// impossible. The contract therefore cannot demand a TeamDelete success OR an FO -// self-exit — it grades the FO's CORRECT bounded best-effort and hands the exit -// to the launcher. -// -// Inversion-resistant (two-sided): the lint asserts BOTH the bounded-best-effort -// mandate IS present (including the verbatim marker, the hold, and launcher-owned -// exit) AND the retry-to-success / FO-self-exit phrases are ABSENT, scoped to the -// terminal-teardown step only (shared-core step 10; the Claude `## Terminal Team -// Teardown` section). An edit re-introducing retry-to-success, or an FO self-exit, -// or dropping the marker mandate, turns it RED. -// -// Oracle honesty: this lint asserts the contract carries the bounded-best-effort -// + marker clause; it does NOT prove the live FO obeys it. The behavioral oracle -// is the live-e2e CI run (AC-1). The lint guards against the clause being dropped -// or re-inverted in a future edit. -func TestTerminalTeardownIsBoundedBestEffort(t *testing.T) { - markNonAC(t, "internal/ensigncycle teardown-grade drive (#285): TestTerminalTeardownGradePassesOnMarkerEmission + TestTerminalTeardownGradeFailsWhenMarkerNeverEmitted (expectTerminalTeardownGrade over real/synthetic streams) + the live-e2e CI run (AC-1)") - files := vendoredSkillFiles(t) - - // negatingPhrases are the inversion fingerprints. Two groups: - // - the ORIGINAL #275 give-up fingerprints (still forbidden — the terminal - // teardown must not silently abandon on the first failure with no marker). - // - the yy/#282 retry-to-success + FO-self-exit fingerprints now DISPROVEN - // (retry-to-success is unreachable; FO self-exit is impossible). Their - // presence in a terminal-teardown region means the contract was re-inverted - // back to a demand the runtime cannot satisfy. Lower-cased; regions matched - // lower. - negatingPhrases := []string{ - // #275 give-up inversions: give up on the first failure / no retry / no marker. - "end the turn on the first", // the give-up directive - "do not call teamdelete again", // the no-retry directive - "do not retry", // any flat no-retry directive in this region - "stop at the first", // give-up-on-first-failure phrasing - // yy/#282 retry-to-success + self-exit, now disproven by the live run. - "retry the team-teardown call to success", // retry-to-success (unreachable) - "until `teamdelete` succeeds", // retry-to-success (unreachable) - "only a `teamdelete` success lets the fo end", // retry-to-success exit gate - "keep settling and retrying the teardown until it", // unbounded retry obligation - "keep the settle-then-`teamdelete` loop going", // unbounded retry obligation - "hard-exit at the cap", // disproven FO self-exit - "residual cleanup at process death", // disproven FO self-exit - "the fo exits the process", // disproven FO self-exit - "the fo must keep the settle-then-`teamdelete` loop", // unbounded retry obligation - // audit-cycle-1 near-synonym re-introductions (M1): an unbounded - // retry-to-success that swaps the exact tokens for a "rounds-until-clears" - // paraphrase. These are the common rewordings that re-establish the - // retry-to-success the runtime CANNOT satisfy. - "continues issuing", // "continues issuing settle-then-TeamDelete rounds…" - "rounds until", // "…rounds until the registry finally clears" - "until the registry", // "…until the registry clears/settles/empties" - "finally clears", // "…until the registry finally clears" - // audit-cycle-1 FO-self-exit re-introductions (M2): the FO killing its own - // process group / runtime and demoting the launcher to a "backstop". Scoped - // to the self-exit FINGERPRINTS, not over-broad: "terminates its own" (not a - // bare "the fo terminates", which a legit "the FO terminates the attempts" - // would false-red) and "launcher is only a" (not a bare "backstop", which - // any legit launcher-as-backstop phrasing would false-red). - "kill -9", // self-kill via Bash - "$ppid", // kill the parent process (the harness) - "terminates its own", // "the FO terminates its own runtime…" - "launcher is only a", // "…the launcher is only a backstop" (demotes launcher-owned exit) - } - - // Shared core, step 10 in Merge and Cleanup. Scope to step 10 ALONE — the - // other nine steps legitimately contain no teardown language, and an inverting - // edit lands inside step 10, so a whole-section scope would dilute the signal. - // Then DROP the trailing delegation sentence ("…are the adapter's"): it NAMES - // the cap/settle/marker while delegating their realization to the adapter, so a - // required-phrase that matched only there would let a gut-edit of the - // behavioral mandate pass green. The load-bearing clauses must appear in the - // BEHAVIORAL prose, not the delegation tail. - core := files["first-officer/references/first-officer-shared-core.md"] - step10 := numberedStep(sectionAfter(core, "## Merge and Cleanup"), 10) - if step10 == "" { - t.Fatal("FO shared core missing Merge-and-Cleanup step 10 (terminal teardown)") - } - step10Behavioral := stripDelegationTail(step10) - // The verbatim marker must be present (asserted directly — it is NOT masked - // for its own presence check); the behavioral required phrases below are - // checked with the marker MASKED so a marker-substring cannot satisfy them. - if !strings.Contains(step10Behavioral, terminalTeardownMarker) { - t.Errorf("shared-core step 10 missing the verbatim terminal-status marker %q", terminalTeardownMarker) - } - assertDirectionalMandate(t, "shared-core step 10", step10Behavioral, negatingPhrases, - []string{ - "bounded best-effort", // the bounded framing (not retry-to-success) - "attempt cap", // bounded fast retries - "wait a short settle interval", // inter-attempt settle (load-bearing, not delegated) - // STOP-at-cap semantics the marker substring CANNOT satisfy: the FO STOPS - // the attempts at the cap (a near-synonym retry-to-success "continues - // issuing … rounds until … clears" cannot keep this). The post-marker HOLD - // is NOT required — the relaxed grade proved the real producer resumes a - // bounded best-effort on re-invoke; what the prose forbids is an UNBOUNDED - // retry loop that never reaches the marker. - "STOPS the teardown attempts", // the bound terminates the attempts - "unbounded retry loop", // the bounded framing: an unbounded loop is the forbidden shape - // Launcher-owned exit, distinct from the marker's "for launcher." (M2 fix): - // the prose must assert the launcher owns the exit AND it is not the FO's. - // A `kill -9 $PPID` self-exit that demotes the launcher to a "backstop" - // cannot keep this. - "launcher's** responsibility, not the FO's", // launcher owns the exit, not the FO - }) - - // The using-claude-team skill: the Awaiting-Completion section already bans - // retrying TeamDelete BEFORE the completion signal (the pre-completion wait - // phase). The terminal teardown is a DIFFERENT phase: a BOUNDED set of TeamDelete - // attempts with an inter-attempt settle, then STOP and emit the verbatim marker, - // with the exit owned by the launcher. Scope to the `## Terminal Team Teardown` - // section. The skill IS the adapter realization, so the behavioral phrases are - // asserted directly. - skill := usingClaudeTeamSkill(t) - teardown := sectionAfter(skill, "## Terminal Team Teardown") - if teardown == "" { - t.Fatal("using-claude-team skill missing the `## Terminal Team Teardown` section (the bounded-best-effort realization of shared-core step 10)") - } - if !strings.Contains(teardown, terminalTeardownMarker) { - t.Errorf("Terminal Team Teardown missing the verbatim terminal-status marker %q", terminalTeardownMarker) - } - assertDirectionalMandate(t, "Terminal Team Teardown", teardown, negatingPhrases, - []string{ - "TeamDelete", // names the call it attempts - "active member", // names the race it attempts past - "wait for the settle", // inter-attempt settle mandate - "sleep 2", // the concrete settle - "attempt cap", // bounded attempts - // STOP-at-cap semantics the marker substring CANNOT satisfy: on - // cap-exhaustion the FO STOPS calling TeamDelete (a "continues issuing … - // rounds until … clears" near-synonym cannot keep this). The post-marker - // HOLD is NOT required — the relaxed grade passes the real producer's - // bounded resume on re-invoke; what the prose forbids is an UNBOUNDED retry - // loop that never reaches the marker. - "STOPS calling", // the bound terminates the TeamDelete calls - "unbounded retry loop", // the bounded framing: an unbounded loop is the forbidden shape - // Launcher-owned exit, distinct from the marker's "for launcher." (M2 fix): - // the prose must explicitly forbid an FO self-exit. A `kill -9 $PPID` - // self-exit cannot keep "never an FO self-exit". - "never an FO self-exit", // the exit is the launcher's, never the FO's - }) -} - -// assertDirectionalMandate fails if any negating (inverted-mandate) phrase is -// present in region, or if any required positive-mandate phrase is missing. The -// two-sided check is what makes the lint inversion-resistant: an inverted edit -// either introduces a negating phrase or drops a positive one (or both), so it -// cannot pass by merely preserving grep tokens. -// -// Required phrases are checked against the region with the terminal-status MARKER -// MASKED OUT (markerStripped). The marker string itself contains the substrings -// "holding" and "for launcher.", so a naive `strings.Contains(region, "hold")` -// or `…, "launcher")` is satisfied by the marker MENTION alone — they add ZERO -// discriminating power and let a gut-edit that keeps the verbatim marker but -// guts the STOP/hold/launcher-exit BEHAVIORAL prose pass green (the audit-cycle-1 -// M1/M2 holes). Masking the marker forces the behavioral required phrases to be -// satisfied by the actionable prose, not the sentinel. The marker's own presence -// is asserted separately by the caller (it is a required phrase in its own right). -func assertDirectionalMandate(t *testing.T, label, region string, negating, required []string) { - t.Helper() - lower := strings.ToLower(region) - for _, neg := range negating { - if strings.Contains(lower, neg) { - t.Errorf("%s contains the inverted-mandate phrase %q — the terminal teardown is BOUNDED best-effort then a marker emission, NOT retry-to-success or an FO self-exit", label, neg) - } - } - masked := strings.ToLower(strings.Replace(region, terminalTeardownMarker, " ", -1)) - for _, req := range required { - if !strings.Contains(masked, strings.ToLower(req)) { - t.Errorf("%s missing required directional-mandate phrase %q (checked case-insensitively against prose with the marker masked, so a marker-substring does not count)", label, req) - } - } -} - -// stripDelegationTail removes the trailing "…are the adapter's" delegation -// sentence from a teardown step. That sentence NAMES the load-bearing concepts -// (settle interval, cap value, marker) while delegating their realization to the -// adapter — so a required-phrase that matched only there would let a gut-edit of -// the BEHAVIORAL mandate pass green. Dropping it forces the behavioral phrases to -// be asserted against the actionable prose. If the delegation sentence is absent, -// the region is returned unchanged. -func stripDelegationTail(region string) string { - // Target the FINAL "are the adapter's" — the trailing delegation sentence. - // Step 10 also says "(roster and decomposition are the adapter's)" near its - // start, so the first occurrence is the wrong one; the delegation tail is - // always the last. - marker := "are the adapter's" - idx := strings.LastIndex(region, marker) - if idx < 0 { - return region - } - // Cut back to the start of the sentence containing the marker. Sentences in - // this prose end with ". " or ".** "; find the last sentence boundary before - // the marker and drop everything from there. - head := region[:idx] - if cut := strings.LastIndex(head, ". "); cut >= 0 { - return region[:cut+1] - } - return region[:idx] -} - -// numberedStep returns the body of the markdown ordered-list item beginning -// "{n}. " in region, up to (but excluding) the next sibling item "{n+1}. " or a -// following blank-line+non-list boundary. It scopes the lint to a single -// numbered step so an inverting edit inside that step is not diluted by the rest -// of the section. Returns "" if the item is not found. -func numberedStep(region string, n int) string { - lines := strings.Split(region, "\n") - startPrefix := strconv.Itoa(n) + ". " - nextPrefix := strconv.Itoa(n+1) + ". " - start := -1 - for i, line := range lines { - if strings.HasPrefix(line, startPrefix) { - start = i - break - } - } - if start == -1 { - return "" - } - end := len(lines) - for i := start + 1; i < len(lines); i++ { - // Next sibling numbered item, or a markdown subheading, ends the step. - if strings.HasPrefix(lines[i], nextPrefix) || strings.HasPrefix(lines[i], "#") { - end = i - break - } - } - return strings.Join(lines[start:end], "\n") -} - -// TestAwaitingCompletionStillBansPreCompletionTeamDelete guards the boundary the -// fix must NOT erode: BEFORE the ensign's completion signal, the FO must still -// not emit TeamDelete (retrying it during the wait phase is the original -// premature-teardown bug). The terminal bounded-teardown clause is a separate -// phase; this lint ensures the Awaiting-Completion ban survives the reversal. -// -// Structural ban check, with an HONEST ceiling (audit-cycle-2 P2 correction): a -// bare substring-PRESENCE check on "emit `TeamDelete`" passes an INVERTED ban -// that keeps the token, and a CLOSED list of forbidden affirmative spellings -// passes any affirmative phrased OUTSIDE the list (e.g. "it is fine to go ahead -// and call `TeamDelete` right now"). This lint does two structural things that do -// NOT depend on an exact spelling: -// -// 1. Positive framing anchors — the ban must remain a `- emit `TeamDelete“ -// bullet under the `Do not:` prohibition, with the premature-teardown -// rationale. Flipping the ban into a directive drops the bullet form or the -// rationale. -// 2. Affirmative-permission CO-OCCURRENCE scan — any line that pairs a -// permission cue (fine to / go ahead / should / may / ok to / feel free / -// allowed to) with a `TeamDelete` reference is flagged, regardless of the -// exact spelling. This catches the open class of "go ahead and call -// TeamDelete" re-introductions, not just a fixed four. -// -// HONEST CEILING: prose lints are inherently reword-evadable — a sufficiently -// novel paraphrase of "you may tear down early" that uses none of the permission -// cues above will still pass. This lint raises the bar to "the common affirmative -// re-intros red AND the ban's positive framing is pinned"; it is NOT the -// behavioral proof. -// -// Behavioral coverage: every live team scenario (gate-guardrail, rejection-flow) -// dispatches a real ensign and awaits its completion, so a premature -// pre-completion TeamDelete would break those runs — the ban is exercised -// implicitly. There is NO dedicated mutation-controlled drive that ASSERTS the -// pre-completion-TeamDelete ban specifically (distinct from the terminal-teardown -// HANG covered by the #285 teardown-grade and TestSonnetTeamDeleteHangReplay). -// That dedicated drive is OWED and flagged to team-lead for a follow-up task; it -// is NOT silently capped here. -func TestAwaitingCompletionStillBansPreCompletionTeamDelete(t *testing.T) { - markNonAC(t, "OWED dedicated drive (flagged to team-lead, follow-up TBD): the pre-completion-TeamDelete ban. Implicit today: every live team scenario (gate-guardrail/rejection-flow) awaits a real ensign completion, so a premature teardown breaks the run") - skill := usingClaudeTeamSkill(t) - region := sectionAfter(skill, "## Awaiting Completion") - if region == "" { - t.Fatal("using-claude-team skill missing the `## Awaiting Completion` section") - } - if !strings.Contains(region, "emit `TeamDelete`") { - t.Error("Awaiting Completion must still ban emitting TeamDelete before the completion signal arrives") - } - // (1) Positive framing anchors. - if !strings.Contains(region, "Do not:") { - t.Error("Awaiting Completion must keep the negative `Do not:` framing that governs the TeamDelete ban") - } - if !strings.Contains(region, "- emit `TeamDelete`") { - t.Error("the TeamDelete ban must remain a `- emit `TeamDelete`` bullet under the `Do not:` list, not a free-standing (potentially affirmative) mention") - } - if !strings.Contains(region, "tearing down is premature") { - t.Error("the TeamDelete ban must keep its premature-teardown rationale (the semantic that pins it as a ban, not a directive)") - } - // (2) Affirmative-permission co-occurrence scan: any LINE pairing a permission - // cue with a TeamDelete reference is an affirmative re-intro, regardless of the - // exact wording. The legitimate `- emit `TeamDelete`` ban bullet carries NO - // permission cue, so it is not flagged; an inverted "it is fine to go ahead and - // call `TeamDelete` right now" line carries both and reds. - permissionCues := []string{"fine to", "go ahead", "should", "may ", "ok to", "okay to", "feel free", "allowed to"} - lower := strings.ToLower(region) - for _, line := range strings.Split(lower, "\n") { - if !strings.Contains(line, "teamdelete") { - continue - } - for _, cue := range permissionCues { - if strings.Contains(line, cue) { - t.Errorf("Awaiting Completion line pairs a permission cue %q with TeamDelete — the pre-completion ban was inverted into an affirmative directive: %q", strings.TrimSpace(cue), strings.TrimSpace(line)) - } - } - } -} diff --git a/skills/integration/using_claude_team_test.go b/skills/integration/using_claude_team_test.go deleted file mode 100644 index b9ffbdf8..00000000 --- a/skills/integration/using_claude_team_test.go +++ /dev/null @@ -1,193 +0,0 @@ -// ABOUTME: AC-1/AC-2 oracles for the using-claude-team extraction — the four -// ABOUTME: generic team-lifecycle blocks live in the skill, not the FO runtime, with a clean boundary. -package integration - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// foRuntimeLineBaseline is the pre-change line count of -// claude-first-officer-runtime.md (ground-truthed 2026-06-04, 305 lines). The -// four extracted blocks span ~90 lines; AC-1(d) asserts a material drop. -const foRuntimeLineBaseline = 305 - -// foRuntimeLineDropFloor is the AC-1(d) minimum line removal. The four blocks -// span ~90 lines (Team Creation generic ~22, Awaiting Completion ~29, Terminal -// Teardown ~12, Degraded Mode ~31); a ≥70-line drop makes "drops materially" -// checkable rather than vibes. -const foRuntimeLineDropFloor = 70 - -// genericBlockFingerprints uniquely identifies each of the four extracted -// generic team-lifecycle blocks. Each literal is unique-1 in the pre-change -// claude-first-officer-runtime.md (verified during ideation). AC-1(a) asserts -// presence in the skill; AC-1(b) asserts absence from the FO runtime. -var genericBlockFingerprints = map[string]string{ - "Team Creation": "TeamCreate failure recovery (priority-ordered ladder)", - "Awaiting Completion": "A new `system init` entry in the stream is NOT a completion signal", - "Terminal Team Teardown": "TERMINAL_TEARDOWN_BOUNDED", - "Degraded Mode": "Cooperative Shutdown Sweep", -} - -// usingClaudeTeamSkill reads the new skill body under test. -func usingClaudeTeamSkill(t *testing.T) string { - t.Helper() - p := filepath.Join(skillsRoot(t), "using-claude-team", "SKILL.md") - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read using-claude-team SKILL.md: %v", err) - } - return string(b) -} - -// foRuntime reads the FO Claude runtime adapter under test. -func foRuntime(t *testing.T) string { - t.Helper() - return vendoredSkillFiles(t)["first-officer/references/claude-first-officer-runtime.md"] -} - -// TestGenericBlocksPresentInSkill is a non-AC text-consistency lint: each of the -// four generic team-lifecycle blocks is present in -// skills/using-claude-team/SKILL.md (the blocks MOVED here). Per the proof policy -// this is text authoring, not behavioral proof; the behavior these blocks govern -// (the FO creates a team, dispatches workers, awaits completion, tears down) is -// exercised for real by every team-using live scenario (the gate-guardrail and -// rejection-flow runs both launch a real team via the FO runtime). This lint -// guards against a block fingerprint being dropped. -func TestGenericBlocksPresentInSkill(t *testing.T) { - markNonAC(t, "live team-using scenarios (gate-guardrail/rejection-flow launch a real team)") - skill := usingClaudeTeamSkill(t) - for block, fp := range genericBlockFingerprints { - if !strings.Contains(skill, fp) { - t.Errorf("using-claude-team SKILL.md missing %s block fingerprint %q", block, fp) - } - } -} - -// TestGenericBlocksAbsentFromFORuntime is a non-AC text-consistency lint (dedup): -// the four generic fingerprints are NO LONGER present in -// claude-first-officer-runtime.md — the blocks moved, not duplicated. Whole-file -// (NOT region-scoped) so content that moved elsewhere does not false-pass. This is -// a structural dedup property, not a behavioral claim; the team behavior is proven -// by the live team-using scenarios. Re-inlining any block re-introduces its -// fingerprint and flips this RED. -func TestGenericBlocksAbsentFromFORuntime(t *testing.T) { - markNonAC(t, "dedup lint; behavior via live team-using scenarios") - fo := foRuntime(t) - for block, fp := range genericBlockFingerprints { - if strings.Contains(fo, fp) { - t.Errorf("claude-first-officer-runtime.md still inlines %s block fingerprint %q (moved, not duplicated)", block, fp) - } - } -} - -// TestFORuntimeInvokesSkill is a non-AC text-consistency lint: it asserts the FO -// runtime's `## Team Creation` section carries the -// Skill(skill="spacedock:using-claude-team") invocation literal and no disproven -// cross-skill @-include. Per the proof policy this presence check does NOT prove -// the FO invokes the skill: an inverted clause keeps the substring. The behavior — -// the FO loads the team-harness discipline and runs a real team — is exercised by -// every team-using live scenario. This lint guards the seam STRING and bans the -// @-include mechanism; it is the text half, not the behavioral proof. -func TestFORuntimeInvokesSkill(t *testing.T) { - markNonAC(t, "live team-using scenarios (gate-guardrail/rejection-flow launch a real team)") - fo := foRuntime(t) - region := sectionAfter(fo, "## Team Creation") - if region == "" { - t.Fatal("claude-first-officer-runtime.md has no `## Team Creation` section") - } - if !strings.Contains(region, `Skill(skill="spacedock:using-claude-team")`) { - t.Errorf("`## Team Creation` section does not invoke Skill(skill=\"spacedock:using-claude-team\")") - } - for _, banned := range []string{"@../using-claude-team", "@using-claude-team"} { - if strings.Contains(region, banned) { - t.Errorf("`## Team Creation` section uses the disproven cross-skill @-include %q", banned) - } - } -} - -// TestFORuntimeDroppedMaterially is a non-AC structural lint: it asserts -// claude-first-officer-runtime.md dropped at least foRuntimeLineDropFloor lines vs -// a hardcoded pre-change baseline. This is a drift-prone numeric floor (a -// secondary signal to the fingerprint-absence dedup lints), not a behavioral -// claim; no behavior depends on the exact line count. Kept as a sanity check that -// the extraction actually removed bulk, not as proof of any AC. -func TestFORuntimeDroppedMaterially(t *testing.T) { - markNonAC(t, "n/a — structural line-count floor, no behavior to drive") - fo := foRuntime(t) - lines := strings.Count(fo, "\n") - if fo != "" && !strings.HasSuffix(fo, "\n") { - lines++ - } - dropped := foRuntimeLineBaseline - lines - if dropped < foRuntimeLineDropFloor { - t.Errorf("claude-first-officer-runtime.md dropped only %d lines (baseline %d, now %d); want ≥ %d removed", - dropped, foRuntimeLineBaseline, lines, foRuntimeLineDropFloor) - } -} - -// TestSkillFreeOfSpacedockTokens is a code-bound invariant: the generic -// team-harness skill is free of the spacedock-specific vocabulary. The banned set -// is DERIVED from code (spacedockLeakageTokens: the dispatch router's helper -// subcommands qualified with their `spacedock dispatch ` prefix, the -// `spacedock dispatch`/`spacedock status` command prefixes from cli.go, and the -// hyphenated stage-option keys like `feedback-to`), not a literal frozen against -// the skill — so the set shifts when the binary's command surface changes, which -// is what lets this fail as an invariant. A spacedock token leaking into the -// skill reds it. The qualified `spacedock dispatch ` forms mean a bare -// English word (a generic `reconcile` in event-loop prose) never false-fires. -func TestSkillFreeOfSpacedockTokens(t *testing.T) { - markCodeBoundInvariant(t, "spacedockLeakageTokens (dispatch router + cli.go verbs + status stage keys)") - skill := usingClaudeTeamSkill(t) - banned := spacedockLeakageTokens(t) - if len(banned) == 0 { - t.Fatal("derived zero leakage tokens — the code-side vocabulary source diverged") - } - for _, b := range banned { - if strings.Contains(skill, b) { - t.Errorf("using-claude-team SKILL.md leaks spacedock-specific token %q (must stay team-harness-generic)", b) - } - } -} - -// TestSpacedockDecisionsStayInFORuntime is a code-bound invariant: the spacedock -// decision points REMAIN in the FO contract. The required anchors are the -// QUALIFIED dispatch-helper invocations DERIVED from the dispatch router -// (`spacedock dispatch build`, `spacedock dispatch context-budget`, -// `spacedock dispatch reconcile`) rather than literals frozen against the file — -// the binary defines these subcommands, so the anchor set tracks the real command -// surface and a subcommand renamed in code shifts the expectation. A qualified -// decision call wrongly moved out of the FO contract reds this. -func TestSpacedockDecisionsStayInFORuntime(t *testing.T) { - markCodeBoundInvariant(t, "spacedockDispatchSubcommands (dispatch.go router)") - fo := foRuntime(t) - subs := spacedockDispatchSubcommands(t) - required := spacedockQualified(subs, "build", "context-budget", "reconcile") - if len(required) != 3 { - t.Fatalf("expected build/context-budget/reconcile in the dispatch router, derived %v from %v", required, subs) - } - for _, anchor := range required { - if !strings.Contains(fo, anchor) { - t.Errorf("claude-first-officer-runtime.md no longer contains spacedock decision anchor %q (must stay in the FO contract)", anchor) - } - } -} - -// spacedockQualified returns `spacedock dispatch ` for each `want` that the -// router actually exposes in subs — so a decision anchor cannot name a subcommand -// the binary does not route, and a renamed subcommand drops out of the set. -func spacedockQualified(subs []string, want ...string) []string { - have := map[string]bool{} - for _, s := range subs { - have[s] = true - } - var out []string - for _, w := range want { - if have[w] { - out = append(out, "spacedock dispatch "+w) - } - } - return out -} diff --git a/skills/integration/working_principles_test.go b/skills/integration/working_principles_test.go deleted file mode 100644 index 222a5f2a..00000000 --- a/skills/integration/working_principles_test.go +++ /dev/null @@ -1,68 +0,0 @@ -// ABOUTME: AC-1 text-presence audit — the team's proven working habits ship in -// ABOUTME: the three instruction files in plain language with zero insider jargon. -package integration - -import ( - "os" - "path/filepath" - "strings" - "testing" -) - -// shippedInstructionFiles is the trio of instruction surfaces this task encodes -// the working principles into: the workflow guide a captain reads, the FO -// operating contract, and the worker (ensign) contract. The map value is a -// human label used in failure messages. -func shippedInstructionFiles(t *testing.T) map[string]string { - t.Helper() - root := skillsRoot(t) - repo := repoRoot(t) - paths := map[string]string{ - "workflow guide (docs/dev/README.md)": filepath.Join(repo, "docs", "dev", "README.md"), - "FO contract (first-officer-shared-core.md)": filepath.Join(root, "first-officer", "references", "first-officer-shared-core.md"), - "ensign contract (ensign-shared-core.md)": filepath.Join(root, "ensign", "references", "ensign-shared-core.md"), - } - out := make(map[string]string, len(paths)) - for label, p := range paths { - b, err := os.ReadFile(p) - if err != nil { - t.Fatalf("read shipped instruction file %s (%s): %v", label, p, err) - } - out[label] = string(b) - } - return out -} - -// TestShippedInstructionsCarryNoInsiderJargon locks the plain-language guarantee -// of AC-1: the three shipped instruction files contain zero insider-jargon -// tokens. "oracle" is the named token — the design proposal that seeded this work -// uses it pervasively for "external check," and that jargon must not leak into the -// instructions a clean-room contributor reads. The check is case-insensitive so -// "Oracle"/"ORACLE" cannot sneak through. -func TestShippedInstructionsCarryNoInsiderJargon(t *testing.T) { - markNonAC(t, "n/a — the claim is about the shipped instruction text (plain language, no insider jargon); banned-token absence lint with no behavior to drive") - bannedJargon := []string{"oracle"} - for label, content := range shippedInstructionFiles(t) { - lower := strings.ToLower(content) - for _, token := range bannedJargon { - if strings.Contains(lower, token) { - t.Errorf("%s contains insider-jargon token %q — shipped instructions must be plain language", label, token) - } - } - } -} - -// TestFOContractCarriesWorkingPrinciplesSection is a structural lint: it asserts -// the `## Working Principles` section heading exists in the FO contract. The -// heading is a structural anchor — a real on-disk section that other instruction -// text and refits reference by name — so deleting it is a non-paraphrasable -// mutation the lint catches. It deliberately does NOT grep the section's prose: -// wording is doc review, not a Go assertion, and a paraphrase that keeps the -// heading is not something this lint should pass or fail on. -func TestFOContractCarriesWorkingPrinciplesSection(t *testing.T) { - markNonAC(t, "n/a — the claim is about the contract text (a structural section-heading anchor); no behavior to drive") - fo := shippedInstructionFiles(t)["FO contract (first-officer-shared-core.md)"] - if !strings.Contains(fo, "## Working Principles") { - t.Errorf("FO contract missing the `## Working Principles` section heading") - } -}