From f5bde9f7fc98a4200c88c186ec8350307799e5ba Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:04:51 -0400 Subject: [PATCH 01/21] docs: design + ADR 0003 for v6.3.0 pipeline hardening (#41/#58/#59/#60) --- ...3-implement-n-completion-trust-boundary.md | 59 +++++ ...06-01-pipeline-hardening-4issues-design.md | 222 ++++++++++++++++++ 2 files changed, 281 insertions(+) create mode 100644 decisions/0003-implement-n-completion-trust-boundary.md create mode 100644 docs/plans/2026-06-01-pipeline-hardening-4issues-design.md diff --git a/decisions/0003-implement-n-completion-trust-boundary.md b/decisions/0003-implement-n-completion-trust-boundary.md new file mode 100644 index 0000000..8e8a7b2 --- /dev/null +++ b/decisions/0003-implement-n-completion-trust-boundary.md @@ -0,0 +1,59 @@ +# 0003. Implement-N completion is a lead-verified trust boundary, not a hook-blocked invariant + +**Status:** Accepted +**Date:** 2026-06-01 +**Decision-makers:** autodev maintainers, autonomous pipeline +**Related:** issue #58; `skills/subagent-driven-development/SKILL.md`; `agents/team-conventions.md`; docs/plans/2026-06-01-pipeline-hardening-4issues-design.md + +## Context + +Across infra-admin v1 + v1.1 autonomous runs, `Implement: N` tasks were flipped to +`completed` by implementers or via a blockedBy-clear *before* the code-reviewer's +quality gate, violating the team-conventions contract ("only code-reviewer flips +Implement-N to completed"). In v1.1 this masked a non-compiling tree (uncommitted +helper) and a CI-failing hash regression — both reported "done"; only the lead's +independent `verification-before-completion` pass caught them. + +Issue #58 asked for **plugin/harness enforcement**: reject +`TaskUpdate(status=completed)` on `Implement: *` tasks unless `owner == +code-reviewer`. + +We investigated whether a deterministic plugin hook can enforce this. It cannot: + +- The PreToolUse payload for a `TaskUpdate` call carries the *tool input* + (`taskId`, `status`, `owner`) but **not** the task's current `subject` + ("Implement: N") nor the identity of the calling subagent. The task store is + harness state a bash hook cannot read (there is no `TaskList` available to a + hook). +- Therefore a hook cannot reliably answer the two questions the block requires — + "is this taskId an Implement task?" and "is the caller the code-reviewer?". The + only deterministic option, "block all `status=completed`", would break every + legitimate completion (spec-review, quality-review, and the orchestrator's own). + +## Decision + +We reject the infeasible hard-block and instead **shift the trust boundary**: a +flipped `Implement: N` is a *claim*, not *evidence*, and is **not trusted as done +until the lead runs `autodev:verification-before-completion`** (build + test from a +clean tree, CI green) before treating the task as complete or proceeding to +`finishing-a-development-branch`. + +We encode this in `skills/subagent-driven-development/SKILL.md` (the +"Completion is not trusted until lead-verified" rule) and restate the +implementer/code-reviewer conventions in `agents/team-conventions.md`. We do **not** +add an advisory `TaskUpdate` hook: it cannot block, it cannot identify Implement +tasks, and it would fire on every completion (noise) for no enforcement gain. + +## Consequences + +The harm #58 names — a premature "done" masking a broken tree — is addressed at the +point it actually bit: the lead's verification gate, which already caught both v1.1 +regressions. The convention ("code-reviewer is the sole flipper") remains as +team discipline but is no longer load-bearing for correctness; correctness rests on +lead verification, which does not depend on who flipped the checkbox. + +The limitation is documented so the infeasible hard-block is not re-proposed. If a +future harness exposes the task subject **and** the calling subagent identity in +the PreToolUse payload, the deterministic block becomes feasible and this ADR +should be revisited (the convention is then mechanically enforceable). Until then, +"green checkbox ≠ verified" is the operative rule. diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md new file mode 100644 index 0000000..0f9b37d --- /dev/null +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md @@ -0,0 +1,222 @@ +# Autodev Pipeline Hardening (v6.3.0) — 4 Recurring Gate-Miss Fixes — Design + +**Status:** Approved (autonomous — user pre-authorized full-pipeline execution; trigger v6.3.0 release after merge) +**Date:** 2026-06-01 +**Issues:** #41, #58, #59, #60 (GoCodeAlone/autonomous-dev-kit) +**ADR:** decisions/0003-implement-n-completion-trust-boundary.md (#58) + +## Problem + +Four open issues, all recurring gate-misses observed across infra-admin v1/v1.1 +autonomous runs + Codex compaction: + +- **#59** — plan-phase reviews can pass a design that says "behind X auth filter"/ + "RBAC enforced" while the plan wires a *weaker*, shape-matching gate (v1.1: + client-supplied `evidence.granted_permissions` made write-tier RBAC theater; + caught only on plan-review cycle 2). +- **#60** — a `run_in_background` pr-monitoring **Agent** told to poll-loop-with- + sleep instead returns after ~1 cycle and re-completes repeatedly without the + conclusive report (early-exited ~6× in v1.1). The lead fell back to self-polling + `gh pr checks`, which worked but lost fire-on-event semantics. +- **#41** — the PreCompact hook (via Codex) is reported as emitting invalid hook + JSON ("hook returned invalid PreCompact hook JSON output"), interrupting + compaction-time state handling. Root cause class: locale warnings / diagnostics + leaking onto the hook's stdout before/around its JSON. +- **#58** — `Implement: N` tasks get flipped to `completed` by implementers or via + blockedBy-clear *before* the code-reviewer's quality gate, violating the + team-conventions contract. In v1.1 this masked a non-compiling tree + a + CI-failing hash regression both reported "done"; only the lead's + verification-before-completion caught them. + +## Goals + +One coherent v6.3.0 release hardening the autonomous pipeline against these four. + +1. **#59** — add a plan-phase bug-class that walks the design's auth/authz chain + component-by-component vs the plan's wiring; flag any gate that is + client-asserted rather than server-enforced against an authenticated principal. +2. **#60** — make the **bash poll-loop** the sanctioned pr-monitoring wait pattern + (a `run_in_background` Bash sleep-loop that *blocks to completion* and + re-invokes the lead once on settle), documenting the subagent early-exit + failure mode. Keep the subagent monitor as a documented fallback. +3. **#41** — harden the **`run-hook.cmd` wrapper** (the single choke-point all + hooks run through) to (a) export a deterministic locale with portable fallback + and (b) enforce **stdout JSON discipline**: only valid-JSON-or-empty reaches + the host's hook parser; non-JSON noise is routed to stderr. Plus a regression + test for noisy/locale-warning output. +4. **#58** — fix the *real* harm via a **trust boundary**, since a hard hook-block + is infeasible (see Design): a flipped `Implement: N` is NOT trusted-done until + the lead runs `verification-before-completion` (build + test from a clean + tree). Strengthen `subagent-driven-development` + `team-conventions` + ADR. + +## Non-Goals + +- No deterministic hook that *blocks* `TaskUpdate(status=completed)` on `Implement:*` + — infeasible (see #58 design). No fragile heuristic that pretends to. +- No change to the hook event registrations in `hooks.json` (the wrapper change is + transparent to all of them). +- No new pr-monitoring tool; reuse the existing `Bash run_in_background` primitive. +- No change to GHA/other-platform behavior; these are skill-doc + hook + version + changes within the plugin. + +## Global Design Guidance + +`Guidance: none at docs/design-guidance.md; canon = README §Cross-LLM, +docs/plans/2026-04-25-cross-llm-portability-design.md, ADRs 0001/0002.` + +| guidance | response | +|---|---| +| Host-neutral / cross-LLM first | The wrapper fix (#41) is the host-neutral choke-point; the bash poll-loop (#60) and trust-boundary (#58) are host-described with Claude-Code/Codex variants. Bug-class (#59) is pure prose. | +| Checklist is the floor | #59 row joins the mandatory-scan plan-phase set. | +| Don't claim enforcement you can't deliver | #58 explicitly rejects the infeasible hard-block and documents why (ADR) — directly applies the Existence/runtime-validity discipline (#55). | +| Strict stdout discipline for hooks | #41 centralizes it in the wrapper, not per-hook. | + +## Design + +### #59 — Auth/authz chain-composition bug-class (plan phase) + +Add one row to the **plan-phase** checklist of `skills/adversarial-design-review/ +SKILL.md` (after `Config-validation schema rules`): + +``` +| **Auth/authz chain composition** | When the design names an auth/authz chain ("behind the X auth filter", "RBAC-enforced", "admin-only"), walk that chain component-by-component against the plan's actual wiring. For each gate, verify it is enforced **server-side against an authenticated principal**, not shape-matched by a client-asserted value. Flag any gate where the plan's check reads from request/client-supplied input (`evidence.granted_permissions`, a header, a body field) instead of an authenticated subject (`authz.Enforce(authenticatedSubject, …)`). A plan that wires a weaker gate than the design's chain implies = finding. | +``` + +(Plan-phase, not design-phase: chain *composition* is a plan-wiring concern — the +design states the intent; the plan is where a weaker gate slips in.) + +### #60 — Bash poll-loop as the sanctioned pr-monitoring pattern + +In `skills/pr-monitoring/SKILL.md`, add a **"Waiting for CI: the sanctioned +pattern"** section near the top of the process and make the bash poll-loop the +recommended default, demoting the long-lived background Agent to a documented +fallback: + +- **Recommended:** a `Bash` tool call with `run_in_background: true` running a + `for`/`until` sleep-loop that polls `gh pr checks` until no check is `pending` + (or a failure/timeout), then prints a settle line and exits. The harness + re-invokes the lead once on exit (≈0 tokens while sleeping). On settle the lead + reads the result and admin-merges on `failures=0`. +- **Why:** a `run_in_background` **Agent** instructed to sleep-loop tends to return + after ~1 cycle and re-complete repeatedly (the agent loop is not a blocking + sleep) — observed early-exiting ~6× (#60). The bash sleep-loop genuinely blocks. +- **Fallback:** the existing background-Agent monitor remains documented for + multi-PR review-comment handling where active fix-and-push is needed; note its + early-exit failure mode and the self-poll fallback. +- Cadence guidance: poll every 30–60s for fast checks; don't sleep past the + prompt-cache window unnecessarily; cap total wait. + +### #41 — Wrapper-level locale + stdout-JSON discipline + +Harden `hooks/run-hook.cmd` (Unix portion — the choke-point every hook runs +through): + +1. **Deterministic locale:** before exec, set `LC_ALL`/`LANG` to a locale that is + actually installed — prefer the existing valid locale; if the inherited locale + is `C.UTF-8`/unset and `C.UTF-8` is not installed, fall back to `C` (extends the + existing conditional fallback to also cover unset/empty + `LC_CTYPE`). +2. **Stdout JSON discipline:** run the hook with stdout captured. If `jq` is + available: when the captured stdout is **empty** → emit nothing; when it is + **valid JSON** → emit it verbatim; otherwise → route the whole captured stdout + to **stderr** (diagnostics) and emit nothing (a hook that emits non-JSON to + stdout is always a bug under the Claude-Code/Codex hook protocol, so suppressing + it from stdout cannot break a correct hook). Preserve the hook's exit code. + When `jq` is **absent**, pass stdout through unchanged (don't break hooks on + minimal hosts). + + Note: this changes `exec bash …` to a captured invocation. Stderr passes through + untouched. The wrapper must not add latency beyond the hook's own runtime. + +3. **Regression test:** extend `tests/hook-contracts.sh` (or add + `tests/hook-stdout-discipline.sh`) with cases: (a) a hook that prints a locale + warning then valid JSON → wrapper emits only the JSON; (b) a hook that prints + only noise → wrapper stdout empty, noise on stderr, exit code preserved; (c) a + hook that emits valid JSON → unchanged; (d) jq-absent path → pass-through. + +### #58 — Implement-N completion trust boundary (hard-block is infeasible) + +**Investigation (the key finding):** a deterministic plugin hook that *blocks* +`TaskUpdate(status=completed)` on `Implement:*` tasks unless `owner == +code-reviewer` is **not feasible**: +- The PreToolUse payload for a `TaskUpdate` call carries the *tool input* + (`taskId`, `status`, `owner`) but **not the task's current subject** ("Implement: + N") nor the calling subagent's identity. The task store is harness state the + hook cannot read (no `TaskList` from a bash hook). +- So the hook cannot reliably determine "is taskId an Implement task?" or "is the + caller the code-reviewer?" — the two facts the block needs. A heuristic (e.g. + block all `completed`) would break legitimate completions. + +**Feasible fix — shift the trust boundary (ADR 0003):** the harm in v1.1 was not +*who flipped the bit* but that a flipped `Implement: N` was **trusted as done** +while the tree didn't compile / CI failed. So: +- `skills/subagent-driven-development/SKILL.md`: add an explicit **"Completion is + not trusted until lead-verified"** rule — regardless of who/what flips an + `Implement: N` to `completed`, the lead MUST run + `autodev:verification-before-completion` (build + test from a clean tree, CI + green) before treating that task as truly done or proceeding to + `finishing-a-development-branch`. A green checkbox is a *claim*, not *evidence*. +- `agents/team-conventions.md`: restate the code-reviewer-sole-flipper convention + AND add the implementer rule "never self-complete an `Implement: N`; DM the + reviewer instead" + the lead's trust-boundary gate. +- ADR 0003 records: rejected the infeasible hard-block; chose the verification + trust-boundary; documented the hook-payload limitation so it isn't re-proposed. + +## Security Review + +- **#41** is security-adjacent: a hook that leaks non-JSON to stdout can corrupt + the host's view of a *block* decision (e.g. a scope-guard `{"decision":"block"}` + preceded by a locale warning could be ignored → a guarded action proceeds). + Centralizing JSON discipline in the wrapper makes every block decision reliably + delivered. No secrets touched; no network; the wrapper reads only the hook's own + stdout. +- **#59** *is* an auth/authz review-quality improvement — it strengthens detection + of client-asserted-permission theater in plans. +- **#58/#60** are process/doc; no runtime security surface. + +## Infrastructure Impact + +None at runtime. Plugin skill/hook/doc changes + a version bump. Release path is +the existing `release-tag.yml` (push to main touching `.claude-plugin/plugin.json` +→ version-check → tag → marketplace dispatch). + +## Multi-Component Validation + +- **#41 wrapper × hooks:** `tests/hook-stdout-discipline.sh` runs the **real** + `run-hook.cmd` against fixture hook scripts (noisy + clean + jq-absent) and + asserts stdout/stderr/exit-code — the real wrapper↔hook boundary, not a mock. + Also run the existing `tests/hook-contracts.sh` to confirm no regression to the + real hooks (session-start, scope-guard, completion-claim-guard, pre-compact). +- **#59/#60/#58:** `tests/skill-content-grep.sh` (host-neutral lint) + + `tests/skill-cross-refs.sh` pass on the edited skills; plan-phase reviewers will + enumerate the new class because the plan-phase checklist is embedded in their + dispatch prompt. + +## Assumptions + +| id | assumption | challenge | fallback | +|---|---|---|---| +| A1 | All autodev hooks emit JSON-or-nothing on stdout (Claude-Code/Codex protocol) | A hook might emit intentional plain-text stdout | Verified: every registered hook emits `{…}` JSON or nothing. If a future hook needs plain-text stdout it must opt out — but none do today. | +| A2 | The PreToolUse payload lacks task subject + caller identity | Harness could add it later | Verified against the documented hook payload; if it ever exposes both, the hard-block becomes feasible and ADR 0003 should be revisited. | +| A3 | A `run_in_background` Bash sleep-loop genuinely blocks + re-invokes the lead once on exit | Host could change background semantics | Directly observed working across this session's many CI waits ([[feedback_ci_wait_use_bash_poll_loop]]); documented as host-described. | +| A4 | The wrapper can capture stdout without breaking hooks that read stdin | Hooks read the harness JSON from stdin, not the wrapper | The wrapper only redirects the hook's *stdout*; stdin still flows from the host to the hook unchanged. | +| A5 | v6.3.0 is the right bump + the tag is free | Could collide (the #804 lesson) | Verified `git ls-remote --tags` shows v6.3.0 free; current 6.2.2 → 6.3.0 minor. | + +## Rollback + +- Revert the PR(s). #59/#60/#58 are additive prose; #41 reverts the wrapper to its + prior conditional-locale form (no migration). The version bump reverts with the + same commit; do not push the v6.3.0 tag if reverted pre-merge. +- **#41 runtime note:** the wrapper is on the hook hot-path. If the captured-stdout + change regresses a hook, rollback = revert `run-hook.cmd` to the `exec bash` form + + re-run `tests/hook-contracts.sh`. The regression test gates this pre-merge. + +## Self-Challenge + +- **Simplest alternative:** fix the PreCompact hook script alone (#41) instead of + the wrapper. Rejected — the issue explicitly wants the wrapper-level fix so + *every* hook gets stdout discipline, not just one. +- **Most fragile assumption:** A1 (hooks emit JSON-or-nothing). Mitigated by the + jq-absent pass-through + the fact that all current hooks comply. +- **YAGNI sweep:** no advisory `TaskUpdate` hook for #58 (it can't block, fires on + every completion → noise); no new monitor tool for #60; no per-hook locale edits + (#41 is wrapper-only). All rejected as surface the issues didn't ask for. From d68697575b1cf52f470402673f76dfd4624e7cb6 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:15:35 -0400 Subject: [PATCH 02/21] docs: revise v6.3.0 design per adversarial cycle 1 (C1 JSON-extract + I1/I2/I3 + m1/m2/m3) + add #61 reminder dedup --- ...3-implement-n-completion-trust-boundary.md | 7 + ...06-01-pipeline-hardening-4issues-design.md | 166 +++++++++++++----- 2 files changed, 127 insertions(+), 46 deletions(-) diff --git a/decisions/0003-implement-n-completion-trust-boundary.md b/decisions/0003-implement-n-completion-trust-boundary.md index 8e8a7b2..0f2c4dd 100644 --- a/decisions/0003-implement-n-completion-trust-boundary.md +++ b/decisions/0003-implement-n-completion-trust-boundary.md @@ -29,6 +29,13 @@ We investigated whether a deterministic plugin hook can enforce this. It cannot: "is this taskId an Implement task?" and "is the caller the code-reviewer?". The only deterministic option, "block all `status=completed`", would break every legitimate completion (spec-review, quality-review, and the orchestrator's own). +- The closest feasible hook approach — a `SubagentStop` hook reading + `transcript_path` to map `taskId → "Implement: N"` (via `TaskCreate` records) and + flag a matching `TaskUpdate(status=completed)` — was also rejected: it fires + *after* completion (warn, not block), sees only subagent completions (not a + lead-level `TaskUpdate`), depends on the unstable transcript JSONL shape, and is + racy across subagents. Fragile transcript-format coupling for marginal, + after-the-fact value. ## Decision diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md index 0f9b37d..662dfaf 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md @@ -1,9 +1,10 @@ -# Autodev Pipeline Hardening (v6.3.0) — 4 Recurring Gate-Miss Fixes — Design +# Autodev Pipeline Hardening (v6.3.0) — 5 Recurring Gate-Miss / Context-Waste Fixes — Design **Status:** Approved (autonomous — user pre-authorized full-pipeline execution; trigger v6.3.0 release after merge) **Date:** 2026-06-01 -**Issues:** #41, #58, #59, #60 (GoCodeAlone/autonomous-dev-kit) +**Issues:** #41, #58, #59, #60, #61 (GoCodeAlone/autonomous-dev-kit) **ADR:** decisions/0003-implement-n-completion-trust-boundary.md (#58) +**Adversarial review:** design cycle 1 = FAIL (1C/3I/3m) → all resolved this revision; #61 added (user-reported same session). ## Problem @@ -27,6 +28,9 @@ autonomous runs + Codex compaction: team-conventions contract. In v1.1 this masked a non-compiling tree + a CI-failing hash regression both reported "done"; only the lead's verification-before-completion caught them. +- **#61** — the `pretool-pr-review-reminder` hook injects its ~12-line gh-version + + Copilot-reviewer `` block on **every** `gh pr create` (observed 7× + in one session) — wasted context for advice the agent already has. ## Goals @@ -48,6 +52,9 @@ One coherent v6.3.0 release hardening the autonomous pipeline against these four is infeasible (see Design): a flipped `Implement: N` is NOT trusted-done until the lead runs `verification-before-completion` (build + test from a clean tree). Strengthen `subagent-driven-development` + `team-conventions` + ADR. +5. **#61** — emit the pr-review reminder **once per session** (deduped via a + `.claude/autodev-state` marker like `session-start`), reset by the PreCompact + hook so it re-emits once after a compaction. ## Non-Goals @@ -66,7 +73,7 @@ docs/plans/2026-04-25-cross-llm-portability-design.md, ADRs 0001/0002.` | guidance | response | |---|---| -| Host-neutral / cross-LLM first | The wrapper fix (#41) is the host-neutral choke-point; the bash poll-loop (#60) and trust-boundary (#58) are host-described with Claude-Code/Codex variants. Bug-class (#59) is pure prose. | +| Host-neutral / cross-LLM first | The wrapper fix (#41) is the host-neutral choke-point. #60 is explicitly **host-scoped**: bash poll-loop under ``, self-poll fallback documented for `` (no false cross-LLM claim). #58 trust-boundary + #59 bug-class + #61 dedup-marker are host-neutral prose/state. #61 degrades to emit-every-time on identity-less hosts. | | Checklist is the floor | #59 row joins the mandatory-scan plan-phase set. | | Don't claim enforcement you can't deliver | #58 explicitly rejects the infeasible hard-block and documents why (ADR) — directly applies the Existence/runtime-validity discipline (#55). | | Strict stdout discipline for hooks | #41 centralizes it in the wrapper, not per-hook. | @@ -76,7 +83,9 @@ docs/plans/2026-04-25-cross-llm-portability-design.md, ADRs 0001/0002.` ### #59 — Auth/authz chain-composition bug-class (plan phase) Add one row to the **plan-phase** checklist of `skills/adversarial-design-review/ -SKILL.md` (after `Config-validation schema rules`): +SKILL.md`. Place it **near the top** of the plan-phase table (right after +`Verification-class mismatch`), not after the plugin-layout/config-schema rows — +auth/authz is a security-class finding and ordering signals priority (m2): ``` | **Auth/authz chain composition** | When the design names an auth/authz chain ("behind the X auth filter", "RBAC-enforced", "admin-only"), walk that chain component-by-component against the plan's actual wiring. For each gate, verify it is enforced **server-side against an authenticated principal**, not shape-matched by a client-asserted value. Flag any gate where the plan's check reads from request/client-supplied input (`evidence.granted_permissions`, a header, a body field) instead of an authenticated subject (`authz.Enforce(authenticatedSubject, …)`). A plan that wires a weaker gate than the design's chain implies = finding. | @@ -88,50 +97,73 @@ design states the intent; the plan is where a weaker gate slips in.) ### #60 — Bash poll-loop as the sanctioned pr-monitoring pattern In `skills/pr-monitoring/SKILL.md`, add a **"Waiting for CI: the sanctioned -pattern"** section near the top of the process and make the bash poll-loop the -recommended default, demoting the long-lived background Agent to a documented -fallback: - -- **Recommended:** a `Bash` tool call with `run_in_background: true` running a - `for`/`until` sleep-loop that polls `gh pr checks` until no check is `pending` - (or a failure/timeout), then prints a settle line and exits. The harness - re-invokes the lead once on exit (≈0 tokens while sleeping). On settle the lead - reads the result and admin-merges on `failures=0`. +pattern"** section near the top of the process. The recommendation is **host-scoped** +(the bash poll-loop is a Claude-Code Bash-`run_in_background` construct; Codex/Cursor +don't expose it identically): + +- **`` — Recommended default:** a `Bash` tool call with + `run_in_background: true` running a bounded sleep-loop that polls `gh pr checks` + until no check is `pending` (or a failure), then prints a settle line and exits. + The harness re-invokes the lead once on exit (≈0 tokens while sleeping). On + settle the lead reads the result and admin-merges on `failures=0`. **Concrete + cap (m3):** bound the loop (e.g. `for i in $(seq 1 120)` × `sleep 30` = 60 min) + so it can never spin forever; on cap, print a timeout line and exit for the lead + to restart. - **Why:** a `run_in_background` **Agent** instructed to sleep-loop tends to return after ~1 cycle and re-complete repeatedly (the agent loop is not a blocking - sleep) — observed early-exiting ~6× (#60). The bash sleep-loop genuinely blocks. -- **Fallback:** the existing background-Agent monitor remains documented for - multi-PR review-comment handling where active fix-and-push is needed; note its - early-exit failure mode and the self-poll fallback. -- Cadence guidance: poll every 30–60s for fast checks; don't sleep past the - prompt-cache window unnecessarily; cap total wait. - -### #41 — Wrapper-level locale + stdout-JSON discipline + sleep) — observed early-exiting ~6× (#60). The bash sleep-loop genuinely blocks + to completion. +- **``:** these hosts keep the existing `` guidance (host's equivalent poll mechanism). Where no + blocking-background-bash exists, the sanctioned fallback is **self-poll on each + lead wakeup** — the lead runs `gh pr checks` once per turn and re-checks on the + next turn — which loses fire-on-event but is reliable. Document this explicitly + so the Codex path isn't left undefined. +- **Fallback (all hosts):** the existing background-Agent monitor stays documented + for multi-PR review-comment handling needing active fix-and-push; note its + early-exit failure mode. + +### #41 — Wrapper-level stdout-JSON discipline (+ keep existing locale logic) Harden `hooks/run-hook.cmd` (Unix portion — the choke-point every hook runs -through): - -1. **Deterministic locale:** before exec, set `LC_ALL`/`LANG` to a locale that is - actually installed — prefer the existing valid locale; if the inherited locale - is `C.UTF-8`/unset and `C.UTF-8` is not installed, fall back to `C` (extends the - existing conditional fallback to also cover unset/empty + `LC_CTYPE`). -2. **Stdout JSON discipline:** run the hook with stdout captured. If `jq` is - available: when the captured stdout is **empty** → emit nothing; when it is - **valid JSON** → emit it verbatim; otherwise → route the whole captured stdout - to **stderr** (diagnostics) and emit nothing (a hook that emits non-JSON to - stdout is always a bug under the Claude-Code/Codex hook protocol, so suppressing - it from stdout cannot break a correct hook). Preserve the hook's exit code. - When `jq` is **absent**, pass stdout through unchanged (don't break hooks on - minimal hosts). - - Note: this changes `exec bash …` to a captured invocation. Stderr passes through - untouched. The wrapper must not add latency beyond the hook's own runtime. - -3. **Regression test:** extend `tests/hook-contracts.sh` (or add - `tests/hook-stdout-discipline.sh`) with cases: (a) a hook that prints a locale - warning then valid JSON → wrapper emits only the JSON; (b) a hook that prints - only noise → wrapper stdout empty, noise on stderr, exit code preserved; (c) a - hook that emits valid JSON → unchanged; (d) jq-absent path → pass-through. +through). The **primary** fix is stdout discipline; the existing locale handling +is already correct and stays. + +1. **Locale (UNCHANGED):** the wrapper already falls back `C.UTF-8 → C` (covering + `LC_ALL`/`LC_CTYPE`/`LANG`) when `C.UTF-8` is inherited but not installed — + keep it as-is. We do **not** extend it to unset/empty locales (no evidence the + Codex failure is an unset locale, and forcing `C` on an otherwise-working + UTF-8 default could corrupt multibyte hook output). The real #41 cause is + diagnostics leaking onto stdout, addressed below — the locale fallback already + prevents the *bash* locale warning; stdout discipline catches whatever else + (perl/git/jq diagnostics, shell tracing) leaks. + +2. **Stdout JSON discipline (the fix):** run the hook with stdout captured (stderr + passes through untouched; stdin still flows from host → hook). Then, when `jq` + is available: + - **empty** stdout → emit nothing. + - stdout is **valid JSON as a whole** → emit it verbatim. + - otherwise → **extract** the hook's JSON rather than suppress it: take the last + line beginning with `{` (`printf '%s\n' "$out" | grep -E '^\{' | tail -1`); + if that line is valid JSON (`jq -e .`), emit **that** on stdout and route the + rest (the leading warning/diagnostic prefix) to **stderr**. If no JSON line is + found, route the whole capture to stderr and emit nothing. + - This is the C1 correction: a `decision:block` JSON preceded by a locale/perl + warning (`warning\n{"decision":"block"}`) is **not** valid-as-a-whole, so the + extraction path delivers the block instead of dropping it. + - Preserve the hook's exit code. When `jq` is **absent**, pass stdout through + unchanged (don't break hooks on minimal hosts). + - **Trailing-newline note (I3):** capturing via `out=$(…)` strips trailing + newlines; re-emitting via `printf '%s\n' "$out"` restores exactly one. JSON + parsers accept one trailing newline — a regression case asserts this. + +3. **Regression test** (`tests/hook-stdout-discipline.sh`, runs the real + `run-hook.cmd` against fixture hooks): (a) warning-then-`{"decision":"block"}` + → wrapper emits **only** the block JSON on stdout, warning on stderr, exit code + preserved; (b) only-noise → stdout empty, noise on stderr; (c) clean valid JSON + → unchanged, exactly one trailing newline; (d) jq-absent (PATH stub) → + pass-through verbatim. Also run `tests/hook-contracts.sh` for no real-hook + regression. ### #58 — Implement-N completion trust boundary (hard-block is infeasible) @@ -145,6 +177,16 @@ code-reviewer` is **not feasible**: - So the hook cannot reliably determine "is taskId an Implement task?" or "is the caller the code-reviewer?" — the two facts the block needs. A heuristic (e.g. block all `completed`) would break legitimate completions. +- **Closest feasible hook approach, also rejected (m1):** a `SubagentStop` hook + *does* receive `transcript_path` (used by `subagent-scope-guard` / + `completion-claim-guard`), so it could parse the transcript for `TaskCreate` + calls to map `taskId → "Implement: N"` subject and flag a matching + `TaskUpdate(status=completed)`. Rejected as too unreliable: it only sees + *subagent* completions (not a lead-level `TaskUpdate`), the transcript JSONL + shape is not a stable API surface, and event ordering across subagents is racy. + It would also only *warn after the fact* (SubagentStop fires post-completion), + not block. So it does not deliver the asked-for enforcement and adds a fragile + transcript-format dependency for marginal value. **Feasible fix — shift the trust boundary (ADR 0003):** the harm in v1.1 was not *who flipped the bit* but that a flipped `Implement: N` was **trusted as done** @@ -161,7 +203,34 @@ while the tree didn't compile / CI failed. So: - ADR 0003 records: rejected the infeasible hard-block; chose the verification trust-boundary; documented the hook-payload limitation so it isn't re-proposed. -## Security Review +### #61 — pr-review reminder: once per session, reset on compaction + +`hooks/pretool-pr-review-reminder` currently emits its `` block on every +`gh pr create`. Add session-dedup mirroring `hooks/session-start`'s +`session-start-seen` precedent: + +- Derive a session key from `transcript_path` basename (the same key + `pre-compact-snapshot`/`session-start` use). Marker file: + `${cwd}/.claude/autodev-state/pr-reminder-seen` containing the session key(s) + already reminded. +- On a matched `gh pr create`: if the current session key is already in the marker + → `exit 0` silently (no emit). Else → emit the reminder once and append the + session key to the marker. +- **Reset on compaction:** `hooks/pre-compact-snapshot` (PreCompact) removes the + current session's key from `pr-reminder-seen` (or deletes the marker) so the + next `gh pr create` after a compaction re-emits exactly once — the + post-compaction context lost the earlier reminder. +- Degrade gracefully: no `transcript_path` (hookless/identity-less host) → fall + back to current behavior (emit every time) rather than suppress, so a host that + can't dedup still gets the advice. +- **Match precision (minor, in-scope):** tighten the trigger so it matches an + actual `gh pr create` invocation, not the literal substring inside a quoted + argument (e.g. an issue/PR *body* that mentions `gh pr create`) — anchor on a + word boundary / command position. (Observed: this design's own tracking-issue + body tripped the hook.) +- **Regression test:** two `gh pr create` payloads with the same session key → + emitted once; a `pre-compact-snapshot` run between them → emitted again + (post-compact reset); no `transcript_path` → emits each time. - **#41** is security-adjacent: a hook that leaks non-JSON to stdout can corrupt the host's view of a *block* decision (e.g. a scope-guard `{"decision":"block"}` @@ -186,6 +255,10 @@ the existing `release-tag.yml` (push to main touching `.claude-plugin/plugin.jso asserts stdout/stderr/exit-code — the real wrapper↔hook boundary, not a mock. Also run the existing `tests/hook-contracts.sh` to confirm no regression to the real hooks (session-start, scope-guard, completion-claim-guard, pre-compact). +- **#61 reminder dedup:** the regression test runs the real + `pretool-pr-review-reminder` against two same-session `gh pr create` payloads + (emit once) + a `pre-compact-snapshot` between them (re-emit once) + a + no-`transcript_path` payload (emit each time) — the real hook↔state boundary. - **#59/#60/#58:** `tests/skill-content-grep.sh` (host-neutral lint) + `tests/skill-cross-refs.sh` pass on the edited skills; plan-phase reviewers will enumerate the new class because the plan-phase checklist is embedded in their @@ -197,7 +270,8 @@ the existing `release-tag.yml` (push to main touching `.claude-plugin/plugin.jso |---|---|---|---| | A1 | All autodev hooks emit JSON-or-nothing on stdout (Claude-Code/Codex protocol) | A hook might emit intentional plain-text stdout | Verified: every registered hook emits `{…}` JSON or nothing. If a future hook needs plain-text stdout it must opt out — but none do today. | | A2 | The PreToolUse payload lacks task subject + caller identity | Harness could add it later | Verified against the documented hook payload; if it ever exposes both, the hard-block becomes feasible and ADR 0003 should be revisited. | -| A3 | A `run_in_background` Bash sleep-loop genuinely blocks + re-invokes the lead once on exit | Host could change background semantics | Directly observed working across this session's many CI waits ([[feedback_ci_wait_use_bash_poll_loop]]); documented as host-described. | +| A3 | A `run_in_background` Bash sleep-loop genuinely blocks + re-invokes the lead once on exit (Claude Code) | Codex/Cursor lack this primitive | Directly observed working across this session's many CI waits ([[feedback_ci_wait_use_bash_poll_loop]]); scoped to ``, with the self-poll fallback documented for other hosts. | +| A6 | #41 last-JSON-line extraction recovers a block decision behind a warning | A hook could emit multi-line pretty-printed JSON (not single-line) | autodev hooks all emit single-line `jq -n` JSON (one object, one line); the `grep '^{' | tail -1` extraction matches. If a future hook pretty-prints, the whole-blob `jq .` path still handles the no-warning case; the test covers the warning-prefixed single-line case. | | A4 | The wrapper can capture stdout without breaking hooks that read stdin | Hooks read the harness JSON from stdin, not the wrapper | The wrapper only redirects the hook's *stdout*; stdin still flows from the host to the hook unchanged. | | A5 | v6.3.0 is the right bump + the tag is free | Could collide (the #804 lesson) | Verified `git ls-remote --tags` shows v6.3.0 free; current 6.2.2 → 6.3.0 minor. | From 0a23755d782f56e010c989d8e184cce6b4058e14 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:18:23 -0400 Subject: [PATCH 03/21] docs: revise v6.3.0 design per adversarial cycle 2 (I-NEW-1 pre-compact clear placement before early-exit) --- ...06-01-pipeline-hardening-4issues-design.md | 31 ++++++++++++------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md index 662dfaf..79381e8 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md @@ -4,7 +4,7 @@ **Date:** 2026-06-01 **Issues:** #41, #58, #59, #60, #61 (GoCodeAlone/autonomous-dev-kit) **ADR:** decisions/0003-implement-n-completion-trust-boundary.md (#58) -**Adversarial review:** design cycle 1 = FAIL (1C/3I/3m) → all resolved this revision; #61 added (user-reported same session). +**Adversarial review:** cycle 1 = FAIL (1C/3I/3m, all resolved); #61 added (user-reported same session). cycle 2 = FAIL (1 new Important: pre-compact-snapshot early-exit would skip the #61 marker-clear → fixed: clear placed unconditionally before the early-exit) → resolved this revision. ## Problem @@ -209,17 +209,25 @@ while the tree didn't compile / CI failed. So: `gh pr create`. Add session-dedup mirroring `hooks/session-start`'s `session-start-seen` precedent: -- Derive a session key from `transcript_path` basename (the same key - `pre-compact-snapshot`/`session-start` use). Marker file: - `${cwd}/.claude/autodev-state/pr-reminder-seen` containing the session key(s) - already reminded. +- Derive a session key from `transcript_path` basename — the same value + `pre-compact-snapshot` already computes as `session_key` for session-locks + attribution (note: this is *not* `session-start`'s primary dedup key, which is + `session_id:source_kind`; basename is a sufficient, stable per-session key for a + file marker). Marker file: `${cwd}/.claude/autodev-state/pr-reminder-seen` + containing the session key(s) already reminded. - On a matched `gh pr create`: if the current session key is already in the marker → `exit 0` silently (no emit). Else → emit the reminder once and append the session key to the marker. -- **Reset on compaction:** `hooks/pre-compact-snapshot` (PreCompact) removes the - current session's key from `pr-reminder-seen` (or deletes the marker) so the - next `gh pr create` after a compaction re-emits exactly once — the - post-compaction context lost the earlier reminder. +- **Reset on compaction (I-NEW-1 — placement is load-bearing):** + `hooks/pre-compact-snapshot` removes the current session's key from + `pr-reminder-seen` so the next `gh pr create` after a compaction re-emits once. + This clear MUST be placed **unconditionally near the top of pre-compact-snapshot + — immediately after `session_key`/`cwd_dir` are computed (≈ line 30) and BEFORE + the `[ -z "$state_section" ] && exit 0` no-locked-plans early-exit**. Co-locating + it with the emit block (after the early-exit) would silently skip the clear in + the common no-locked-plan case, leaving the reminder permanently silenced after + the first PR. The regression test asserts the clear fires even when there are no + locked plans. - Degrade gracefully: no `transcript_path` (hookless/identity-less host) → fall back to current behavior (emit every time) rather than suppress, so a host that can't dedup still gets the advice. @@ -229,8 +237,9 @@ while the tree didn't compile / CI failed. So: word boundary / command position. (Observed: this design's own tracking-issue body tripped the hook.) - **Regression test:** two `gh pr create` payloads with the same session key → - emitted once; a `pre-compact-snapshot` run between them → emitted again - (post-compact reset); no `transcript_path` → emits each time. + emitted once; a `pre-compact-snapshot` run between them **with no locked plans + present** (the early-exit case) → still clears the marker so the reminder is + emitted again (post-compact reset); no `transcript_path` → emits each time. - **#41** is security-adjacent: a hook that leaks non-JSON to stdout can corrupt the host's view of a *block* decision (e.g. a scope-guard `{"decision":"block"}` From 85f58ac8fa2a24c2f897d15bd3d3083cb1a612a8 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:20:30 -0400 Subject: [PATCH 04/21] docs: v6.3.0 design PASS adversarial cycle-3 (converged; empty-session_key guard) --- docs/plans/2026-06-01-pipeline-hardening-4issues-design.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md index 79381e8..cd570f1 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md @@ -4,7 +4,7 @@ **Date:** 2026-06-01 **Issues:** #41, #58, #59, #60, #61 (GoCodeAlone/autonomous-dev-kit) **ADR:** decisions/0003-implement-n-completion-trust-boundary.md (#58) -**Adversarial review:** cycle 1 = FAIL (1C/3I/3m, all resolved); #61 added (user-reported same session). cycle 2 = FAIL (1 new Important: pre-compact-snapshot early-exit would skip the #61 marker-clear → fixed: clear placed unconditionally before the early-exit) → resolved this revision. +**Adversarial review:** cycle 1 = FAIL (1C/3I/3m, all resolved); #61 added (user-reported same session). cycle 2 = FAIL (1 new Important: pre-compact-snapshot early-exit would skip the #61 marker-clear → fixed: clear placed unconditionally before the early-exit). cycle 3 = PASS (converged; applied the empty-session_key guard). ## Problem @@ -227,7 +227,9 @@ while the tree didn't compile / CI failed. So: it with the emit block (after the early-exit) would silently skip the clear in the common no-locked-plan case, leaving the reminder permanently silenced after the first PR. The regression test asserts the clear fires even when there are no - locked plans. + locked plans. Guard the clear with `[ -n "$session_key" ]` — when + `transcript_path` is absent (`session_key=""`), skip the clear (no blanket file + wipe), matching the reminder's emit-every-time degradation on identity-less hosts. - Degrade gracefully: no `transcript_path` (hookless/identity-less host) → fall back to current behavior (emit every time) rather than suppress, so a host that can't dedup still gets the advice. From 8875c9bc2474370d3880a2e6389b4f279cbc251a Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:24:01 -0400 Subject: [PATCH 05/21] docs: implementation plan for v6.3.0 pipeline hardening (#41/#58/#59/#60/#61) --- .../2026-06-01-pipeline-hardening-4issues.md | 462 ++++++++++++++++++ 1 file changed, 462 insertions(+) create mode 100644 docs/plans/2026-06-01-pipeline-hardening-4issues.md diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md b/docs/plans/2026-06-01-pipeline-hardening-4issues.md new file mode 100644 index 0000000..a2893d4 --- /dev/null +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md @@ -0,0 +1,462 @@ +# Autodev Pipeline Hardening (v6.3.0) Implementation Plan + +> **For the implementing agent:** REQUIRED SUB-SKILL: Use autodev:executing-plans to implement this plan task-by-task. + +**Goal:** Ship 5 recurring-gate-miss / context-waste fixes (#41/#58/#59/#60/#61) as one v6.3.0 release. + +**Architecture:** Skill-doc additions (#59 bug-class, #60 pr-monitoring pattern, #58 trust-boundary) + two hook changes through the choke-points (#41 `run-hook.cmd` stdout-JSON discipline; #61 `pretool-pr-review-reminder` session-dedup + `pre-compact-snapshot` marker-clear) + a CI workflow that runs the hook regression tests + a version bump that triggers `release-tag.yml`. + +**Tech Stack:** Markdown skill files; bash hooks + `jq`; bash test harness (`tests/hook-contracts.sh`); GitHub Actions. + +**Base branch:** main + +--- + +## Scope Manifest + +**PR Count:** 1 +**Tasks:** 7 +**Estimated Lines of Change:** ~320 (informational; not enforced) + +**Out of scope:** +- A deterministic hook that *blocks* `TaskUpdate(status=completed)` on `Implement:*` (infeasible — ADR 0003). +- Any advisory `TaskUpdate` hook for #58 (rejected — noisy, can't identify Implement tasks). +- Extending the wrapper locale logic to unset/empty locales (#41 keeps existing locale handling). +- Any new pr-monitoring tool / Codex-specific background primitive (#60 documents host-scoped patterns only). +- Changing hook event registrations in `hooks.json` (the wrapper change is transparent; only the new `hooks-check.yml` workflow is added). + +**PR Grouping:** + +| PR # | Title | Tasks | Branch | +|------|-------|-------|--------| +| 1 | feat: v6.3.0 pipeline hardening — auth/authz bug-class, pr-monitoring pattern, completion trust-boundary, hook stdout-JSON discipline, reminder dedup (#41/#58/#59/#60/#61) | Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Task 7 | feat/pipeline-hardening-4issues-v6.3.0 | + +**Status:** Draft + +--- + +## Project Design Guidance + +`Guidance: none at docs/design-guidance.md; canon = README §Cross-LLM + ADRs 0001/0002/0003.` Mapping: +- Host-neutral → #41 wrapper + #59/#58/#61 are host-neutral; #60 explicitly host-scoped (verified by `tests/skill-content-grep.sh` host-token lint). +- Don't claim enforcement you can't deliver → #58 (ADR 0003) + #41 (a test that runs in CI, Task 6). +- Checklist is the floor → #59 joins the mandatory plan-phase scan. + +--- + +### Task 1: #59 — auth/authz chain-composition bug-class (plan-phase) + +**Files:** +- Modify: `skills/adversarial-design-review/SKILL.md` (plan-phase table, insert right after the `Verification-class mismatch` row) + +**Step 1: Verify the insertion anchor** +Run: `grep -n "Verification-class mismatch" skills/adversarial-design-review/SKILL.md` +Expected: one match in the plan-phase table (lines ~104-113). The new row goes immediately after it. + +**Step 2: Insert the row** (exact, after the `Verification-class mismatch` row): +``` +| **Auth/authz chain composition** | When the design names an auth/authz chain ("behind the X auth filter", "RBAC-enforced", "admin-only"), walk that chain component-by-component against the plan's actual wiring. For each gate, verify it is enforced **server-side against an authenticated principal**, not shape-matched by a client-asserted value. Flag any gate where the plan's check reads from request/client-supplied input (`evidence.granted_permissions`, a header, a body field) instead of an authenticated subject (`authz.Enforce(authenticatedSubject, …)`). A plan that wires a weaker gate than the design's chain implies = finding. | +``` + +**Step 3: Verify placement + lint** +Run: `awk '/Bug-class checklist — plan phase/{p=1} p && /Auth\/authz chain composition/{print NR": ok"}' skills/adversarial-design-review/SKILL.md` +Expected: one line (row is in the plan-phase section). +Run: `bash tests/skill-content-grep.sh 2>&1 | tail -1` +Expected: `PASS: …`. + +**Step 4: Commit** +```bash +git add skills/adversarial-design-review/SKILL.md +git commit -m "feat(adversarial-design-review): add auth/authz chain-composition plan-phase bug-class (#59)" +``` +Rollback: revert commit (additive prose row). + +--- + +### Task 2: #60 — pr-monitoring bash poll-loop sanctioned pattern + +**Files:** +- Modify: `skills/pr-monitoring/SKILL.md` (add a "Waiting for CI: the sanctioned pattern" subsection near the top of the process, after the overview) + +**Step 1: Add the section** (host-scoped, per design §#60). Insert after the "## When to Use" / before "## The Process" (or near the top of the process): + +```markdown +## Waiting for CI: the sanctioned pattern + + +**Recommended default — a bash poll-loop, not a long-lived monitor Agent.** Issue a +`Bash` tool call with `run_in_background: true` running a **bounded** sleep-loop that +polls `gh pr checks ` until no check is `pending` (or a failure), prints a settle +line, and exits. The harness re-invokes the lead **once** on exit (≈0 tokens while it +sleeps); the lead then reads the result and admin-merges on `failures=0`. + +Bound the loop so it can never spin forever, e.g.: +```bash +for i in $(seq 1 120); do # 120 × 30s = 60 min cap + raw=$(gh pr checks --json bucket 2>/dev/null) + fail=$(echo "$raw" | jq '[.[]|select(.bucket=="fail")]|length') + pend=$(echo "$raw" | jq '[.[]|select(.bucket=="pending")]|length') + { [ "$fail" != 0 ] && [ -n "$fail" ]; } && { echo "FAILURES=$fail"; break; } + { [ "$pend" = 0 ] || [ -z "$pend" ]; } && { echo "SETTLED"; break; } + sleep 30 +done +``` +On the 120-iteration cap, print a timeout line and exit for the lead to restart. + +**Why not a background Agent:** a `run_in_background` **Agent** told to sleep-loop +tends to return after ~1 cycle and re-complete repeatedly (the agent loop is not a +blocking sleep) — observed early-exiting ~6× in one run. The bash sleep-loop genuinely +blocks to completion. + + + +Use your host's equivalent poll mechanism. Where no blocking-background-bash exists, +the sanctioned fallback is **self-poll on each lead wakeup**: run `gh pr checks ` +once per turn and re-check next turn. This loses fire-on-event but is reliable; do not +dispatch a background Agent expecting it to block on a sleep-loop. + + +**Fallback (all hosts):** the background-Agent monitor below remains documented for +multi-PR review-comment handling needing active fix-and-push; note its early-exit +failure mode and prefer the poll-loop / self-poll for pure CI-wait. +``` + +**Step 2: Verify lint + cross-refs** +Run: `bash tests/skill-content-grep.sh 2>&1 | tail -1` → `PASS` +Run: `bash tests/skill-cross-refs.sh 2>&1 | tail -1` → `PASS: all cross-skill references resolve.` + +**Step 3: Commit** +```bash +git add skills/pr-monitoring/SKILL.md +git commit -m "docs(pr-monitoring): sanction the bash poll-loop CI-wait pattern, host-scoped (#60)" +``` +Rollback: revert commit. + +--- + +### Task 3: #58 — Implement-N completion trust boundary + +**Files:** +- Modify: `skills/subagent-driven-development/SKILL.md` (add a "Completion is not trusted until lead-verified" rule) +- Modify: `agents/team-conventions.md` (restate the implementer/code-reviewer rules + the lead trust-boundary) +- (ADR `decisions/0003-implement-n-completion-trust-boundary.md` already committed with the design.) + +**Step 1: Add the trust-boundary rule to subagent-driven-development** — near the "Red Flags"/completion section, add: +```markdown +### Completion is not trusted until lead-verified + +A green `Implement: N` checkbox is a **claim, not evidence**. Regardless of who or +what flips an `Implement: N` task to `completed` (implementer, a blockedBy-clear, or +the code-reviewer), the lead MUST run `autodev:verification-before-completion` — +build + test from a clean tree, CI green — before treating that task as truly done or +invoking `finishing-a-development-branch`. The team-conventions "only code-reviewer +flips Implement-N" rule remains team discipline, but correctness rests on lead +verification, which does not depend on who flipped the bit. (A deterministic hook that +blocks the flip is infeasible — the PreToolUse payload lacks the task subject and +caller identity; see decisions/0003-implement-n-completion-trust-boundary.md.) +``` + +**Step 2: Add the rules to team-conventions.md** — under the Implementer + a Code-reviewer/Lead section: +- Implementer: "Never self-complete an `Implement: N` task. DM the spec-reviewer when ready; the code-reviewer is the sole flipper." +- Lead: "Treat any `completed` Implement-N as a claim; run `verification-before-completion` (clean build + tests + CI green) before accepting it or finishing the branch." + +**Step 3: Verify lint + cross-refs** +Run: `bash tests/skill-content-grep.sh 2>&1 | tail -1` → `PASS` +Run: `bash tests/skill-cross-refs.sh 2>&1 | tail -1` → `PASS` + +**Step 4: Commit** +```bash +git add skills/subagent-driven-development/SKILL.md agents/team-conventions.md +git commit -m "docs(subagent-driven-development): completion trust-boundary for Implement-N (#58, ADR 0003)" +``` +Rollback: revert commit. + +--- + +### Task 4: #41 — run-hook.cmd stdout-JSON discipline + test + +**Files:** +- Modify: `hooks/run-hook.cmd` (Unix portion — replace the final `exec bash …` with a capture-then-discipline block; keep the locale logic unchanged) +- Create: `tests/hook-stdout-discipline.sh` + +**Step 1: Write the failing test** (`tests/hook-stdout-discipline.sh`) — runs the REAL `run-hook.cmd` against fixture hooks: +```bash +#!/usr/bin/env bash +# tests/hook-stdout-discipline.sh — verify run-hook.cmd enforces stdout JSON discipline. +set -uo pipefail +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +WRAPPER="$REPO_ROOT/hooks/run-hook.cmd" +failures=0 +pass() { printf 'PASS: %s\n' "$*"; } +fail() { printf 'FAIL: %s\n' "$*" >&2; failures=$((failures+1)); } +command -v jq >/dev/null 2>&1 || { echo "SKIP: jq required"; exit 0; } + +tmp="$(mktemp -d)"; trap 'rm -rf "$tmp"' EXIT +mkfix() { printf '%s\n' "$1" > "$tmp/$2"; chmod +x "$tmp/$2"; } + +# Fixture A: locale warning to stderr-or-stdout then a block JSON on stdout. +mkfix '#!/usr/bin/env bash +echo "perl: warning: Setting locale failed." # leaks to stdout +printf "%s\n" "{\"decision\":\"block\",\"reason\":\"x\"}"' fix-warn-then-json +# Fixture B: only noise, no JSON. +mkfix '#!/usr/bin/env bash +echo "just a diagnostic line"' fix-noise +# Fixture C: clean single-line JSON. +mkfix '#!/usr/bin/env bash +printf "%s\n" "{\"hookSpecificOutput\":{\"hookEventName\":\"X\"}}"' fix-clean + +run() { local out err rc; out="$("$WRAPPER" "$1" 2>"$tmp/err")"; rc=$?; printf '%s' "$out" > "$tmp/out"; err="$(cat "$tmp/err")"; OUT="$out"; ERR="$err"; RC=$rc; } + +# Point the wrapper at the tmp fixtures by copying them next to run-hook.cmd is +# heavy; instead invoke the wrapper with a fixture in hooks/ via a temp symlink dir. +# Simplest: call the wrapper with HOOK_DIR override is not supported, so copy fixtures +# into hooks/ under a unique prefix and clean up. +for f in fix-warn-then-json fix-noise fix-clean; do cp "$tmp/$f" "$REPO_ROOT/hooks/$f"; done +trap 'rm -f "$REPO_ROOT"/hooks/fix-warn-then-json "$REPO_ROOT"/hooks/fix-noise "$REPO_ROOT"/hooks/fix-clean; rm -rf "$tmp"' EXIT + +# (a) warning + block JSON → stdout is ONLY the block JSON, valid, warning on stderr. +run fix-warn-then-json +if printf '%s' "$OUT" | jq -e '.decision=="block"' >/dev/null 2>&1 \ + && ! printf '%s' "$OUT" | grep -q 'perl: warning'; then + pass "(a) block JSON delivered, warning stripped from stdout" +else fail "(a) expected only block JSON on stdout, got: $OUT"; fi + +# (b) only noise → stdout empty. +run fix-noise +[ -z "$OUT" ] && pass "(b) noise suppressed from stdout" || fail "(b) expected empty stdout, got: $OUT" + +# (c) clean JSON → unchanged + valid. +run fix-clean +printf '%s' "$OUT" | jq -e '.hookSpecificOutput.hookEventName=="X"' >/dev/null 2>&1 \ + && pass "(c) clean JSON passthrough" || fail "(c) clean JSON broke, got: $OUT" + +echo ""; echo "Results: $failures failure(s)"; [ "$failures" -eq 0 ] +``` + +> Implementation note: the fixtures are copied into `hooks/` (the wrapper resolves +> scripts relative to its own dir) and cleaned up in the trap. The jq-absent path is +> covered by the early `command -v jq` SKIP (the wrapper's own jq-absent branch is +> exercised manually in the design's case (d); a PATH-stub case may be added). + +**Step 2: Run → FAIL** (wrapper still `exec bash`, doesn't strip the warning): +Run: `bash tests/hook-stdout-discipline.sh 2>&1 | tail -6` +Expected: `(a)` FAILs (stdout contains the perl warning + JSON). + +**Step 3: Implement the wrapper discipline** — replace the final `exec bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@"` line of `hooks/run-hook.cmd` with: +```bash +# Run the hook with stdout captured (stderr + stdin pass through untouched). +# Enforce stdout JSON discipline: only valid-JSON-or-empty reaches the host's hook +# parser; diagnostics that leak onto stdout (locale/perl/git warnings) are routed to +# stderr. A block decision preceded by a warning is recovered, not dropped (#41). +if command -v jq >/dev/null 2>&1; then + hook_out="$(bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@")" + hook_rc=$? + if [ -z "$hook_out" ]; then + : # nothing to emit + elif printf '%s' "$hook_out" | jq -e . >/dev/null 2>&1; then + printf '%s\n' "$hook_out" # valid JSON as a whole + else + json_line="$(printf '%s\n' "$hook_out" | grep -E '^\{' | tail -1)" + if [ -n "$json_line" ] && printf '%s' "$json_line" | jq -e . >/dev/null 2>&1; then + printf '%s\n' "$hook_out" | grep -vF "$json_line" >&2 # diagnostics → stderr + printf '%s\n' "$json_line" # recovered JSON → stdout + else + printf '%s\n' "$hook_out" >&2 # all noise → stderr + fi + fi + exit "$hook_rc" +else + exec bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@" # jq absent: pass through unchanged +fi +``` + +**Step 4: Run → PASS** +Run: `bash tests/hook-stdout-discipline.sh 2>&1 | tail -6` +Expected: `Results: 0 failure(s)` (all of a/b/c pass). +Run: `bash tests/hook-contracts.sh 2>&1 | tail -3` +Expected: existing hook contracts still pass (no regression to real hooks through the wrapper). + +**Step 5: Commit** +```bash +git add hooks/run-hook.cmd tests/hook-stdout-discipline.sh +git commit -m "fix(run-hook.cmd): enforce stdout JSON discipline, recover block decisions behind warnings (#41)" +``` +Rollback: revert `run-hook.cmd` to the `exec bash` form + re-run `tests/hook-contracts.sh`. Hook-path change — verified by the regression test (Task 6 CI-gates it). + +--- + +### Task 5: #61 — pr-review reminder dedup + pre-compact clear + +**Files:** +- Modify: `hooks/pretool-pr-review-reminder` (session-dedup + tighter match) +- Modify: `hooks/pre-compact-snapshot` (clear the marker, unconditionally, before the early-exit) +- Modify: `tests/hook-contracts.sh` (dedup + post-compact-reset cases) + +**Step 1: Add the dedup + post-compact tests to `tests/hook-contracts.sh`** (reuse `run_hook` + a tmp cwd): +```bash +test_pr_reminder_dedup() { + local tmp; tmp="$(mktemp -d)" + local sess='/x/transcripts/sess-abc.jsonl' + local payload='{"tool_name":"Bash","tool_input":{"command":"gh pr create --title t --body b"},"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' + # first call emits + local out1; out1="$(run_hook pretool-pr-review-reminder "$payload")" + printf '%s' "$out1" | jq -e '.hookSpecificOutput.additionalContext' >/dev/null 2>&1 \ + || fail "pr-reminder: first call should emit" + # second call (same session) is silent + local out2; out2="$(run_hook pretool-pr-review-reminder "$payload")" + [ -z "$out2" ] && pass "pr-reminder: deduped within session" \ + || fail "pr-reminder: second call should be silent, got: $out2" + # a PreCompact run with NO locked plans must still clear the marker + run_hook pre-compact-snapshot '{"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' >/dev/null 2>&1 || true + local out3; out3="$(run_hook pretool-pr-review-reminder "$payload")" + printf '%s' "$out3" | jq -e '.hookSpecificOutput.additionalContext' >/dev/null 2>&1 \ + && pass "pr-reminder: re-emits after PreCompact reset" \ + || fail "pr-reminder: should re-emit after compaction, got: $out3" + rm -rf "$tmp" +} +# (call test_pr_reminder_dedup in the main run sequence) +``` +Also: a no-`transcript_path` payload still emits each time (degrade gracefully). + +**Step 2: Run → FAIL** (reminder has no dedup yet): +Run: `bash tests/hook-contracts.sh 2>&1 | grep -i "pr-reminder" | head` +Expected: the "deduped within session" case FAILs (second call still emits). + +**Step 3: Implement dedup in `pretool-pr-review-reminder`** — after computing `cmd` and confirming `gh pr create`, before emitting: +- Tighten the match: `printf '%s' "$cmd" | grep -Eq '(^|[;&| ])gh +pr +create([ ]|$)'` (command position, not substring inside a quoted body). +- Compute `session_key`: `transcript_path=$(… jq -r .transcript_path …); session_key=""; [ -n "$transcript_path" ] && session_key=$(basename "$transcript_path")`. +- Marker: `marker="${cwd_dir}/.claude/autodev-state/pr-reminder-seen"`. +- If `[ -n "$session_key" ]` and `grep -qxF "$session_key" "$marker" 2>/dev/null` → `exit 0` (already reminded). Else emit, and if `[ -n "$session_key" ]` append: `mkdir -p "$(dirname "$marker")"; printf '%s\n' "$session_key" >> "$marker"`. +- No `session_key` (no transcript) → emit every time (current behavior). + +**Step 4: Implement the clear in `pre-compact-snapshot`** — UNCONDITIONALLY, immediately after `session_key` is computed (≈ line 30) and BEFORE the `[ -z "$state_section" ] && exit 0` early-exit: +```bash +# #61: clear this session's pr-review-reminder marker so the reminder re-emits once +# after compaction (the post-compaction context lost the earlier reminder). Must run +# before the no-locked-plans early-exit. Guard on a non-empty session key. +if [ -n "${session_key:-}" ]; then + reminder_marker="${cwd_dir}/.claude/autodev-state/pr-reminder-seen" + if [ -f "$reminder_marker" ]; then + grep -vxF "$session_key" "$reminder_marker" > "${reminder_marker}.tmp" 2>/dev/null \ + && mv "${reminder_marker}.tmp" "$reminder_marker" || rm -f "${reminder_marker}.tmp" + fi +fi +``` + +**Step 5: Run → PASS** +Run: `bash tests/hook-contracts.sh 2>&1 | tail -4` +Expected: all pass, including `pr-reminder: deduped within session` + `pr-reminder: re-emits after PreCompact reset`. + +**Step 6: Commit** +```bash +git add hooks/pretool-pr-review-reminder hooks/pre-compact-snapshot tests/hook-contracts.sh +git commit -m "fix(hooks): pr-review reminder once-per-session + PreCompact reset (#61)" +``` +Rollback: revert commit (hooks restore prior emit-every-time behavior). + +--- + +### Task 6: CI-gate the hook regression tests + +**Files:** +- Create: `.github/workflows/hooks-check.yml` + +**Rationale:** #41 is a hook-reliability fix; its regression test must actually run in CI (a test that never runs is theater — the existence/runtime-validity discipline). No workflow currently runs the hook tests. + +**Step 1: Add the workflow** +```yaml +name: Hooks Check +on: + push: + paths: + - 'hooks/**' + - 'tests/hook-contracts.sh' + - 'tests/hook-stdout-discipline.sh' + - '.github/workflows/hooks-check.yml' + pull_request: + paths: + - 'hooks/**' + - 'tests/hook-contracts.sh' + - 'tests/hook-stdout-discipline.sh' +jobs: + hooks: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Install jq + run: sudo apt-get update && sudo apt-get install -y jq + - name: Hook contract tests + run: bash tests/hook-contracts.sh + - name: Hook stdout discipline tests + run: bash tests/hook-stdout-discipline.sh +``` + +**Step 2: Verify both test scripts currently pass locally (the gate is real)** +Run: `bash tests/hook-contracts.sh 2>&1 | tail -2 && bash tests/hook-stdout-discipline.sh 2>&1 | tail -2` +Expected: both report 0 failures (after Tasks 4+5). + +**Step 3: Lint the workflow YAML** +Run: `python3 -c "import yaml; yaml.safe_load(open('.github/workflows/hooks-check.yml'))" && echo "YAML OK"` +Expected: `YAML OK`. + +**Step 4: Commit** +```bash +git add .github/workflows/hooks-check.yml +git commit -m "ci: run hook contract + stdout-discipline tests on hooks/tests changes (#41/#61)" +``` +Rollback: revert commit (removes the workflow). + +--- + +### Task 7: Version bump → v6.3.0 + RELEASE-NOTES + +**Files:** +- Modify: `.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json`, `.cursor-plugin/plugin.json` (via `scripts/bump-version.sh`) +- Modify: `RELEASE-NOTES.md` + +**Step 1: Confirm start version + the tag is free** +Run: `grep -m1 '"version"' .claude-plugin/plugin.json; git ls-remote --tags origin refs/tags/v6.3.0` +Expected: `6.2.2`; `git ls-remote` prints nothing (v6.3.0 free). + +**Step 2: Bump** +Run: `scripts/bump-version.sh 6.3.0` +Expected: `Bumping version: 6.2.2 → 6.3.0` across the 3 manifests. + +**Step 3: Consistency gate (the exact check `release-tag.yml` runs)** +Run: `bash tests/version-check.sh` +Expected: `OK: All version files agree on version 6.3.0`. + +**Step 4: Add RELEASE-NOTES entry** — a `## v6.3.0 — 2026-06-01` section at the top (below the title), summarizing the 5 fixes (#41/#58/#59/#60/#61). + +**Step 5: Commit** +```bash +git add .claude-plugin/plugin.json .claude-plugin/marketplace.json .cursor-plugin/plugin.json RELEASE-NOTES.md +git commit -m "chore: bump version to 6.3.0 (#41/#58/#59/#60/#61)" +``` +Rollback: `scripts/bump-version.sh 6.3.0 6.2.2` + revert; pre-merge revert prevents the v6.3.0 tag (release fires only on merge-to-main touching plugin.json). + +> After merge: `release-tag.yml` auto-tags v6.3.0 + dispatches the marketplace. Create the GH Release from the RELEASE-NOTES v6.3.0 section (tag-only is the workflow's behavior; v6.2.0/6.2.2 precedent). + +--- + +## Verification summary (change-class mapping) + +| Task | Change class | Verification | Expected | +|---|---|---|---| +| 1 | Documentation (skill) | skill-content-grep + awk placement | PASS; row in plan-phase section | +| 2 | Documentation (skill) | skill-content-grep + skill-cross-refs | PASS | +| 3 | Documentation (skill+agent) | skill-content-grep + skill-cross-refs | PASS | +| 4 | Hook (wrapper) | `tests/hook-stdout-discipline.sh` + `tests/hook-contracts.sh` | 0 failures; block JSON recovered behind warning | +| 5 | Hook (reminder+precompact) | `tests/hook-contracts.sh` dedup+reset cases | 0 failures; deduped + re-emits post-compact | +| 6 | CI workflow | YAML lint + both hook test scripts pass locally | YAML OK; 0 failures | +| 7 | Version pin | `tests/version-check.sh` | all 3 manifests agree on 6.3.0 | + +## Multi-Component / Integration proof + +The real boundaries: (a) wrapper↔hook — Task 4's `hook-stdout-discipline.sh` runs the +**real** `run-hook.cmd` against fixture hooks (warning+JSON, noise, clean) and asserts +stdout/stderr; `hook-contracts.sh` runs the real registered hooks through the wrapper +(no regression). (b) reminder↔state↔pre-compact — Task 5's tests run the **real** +`pretool-pr-review-reminder` + `pre-compact-snapshot` against a shared tmp +`.claude/autodev-state`, asserting dedup + post-compact reset. (c) Task 6 CI-gates both +so the hook fixes can't silently regress. From 97c42085521252a785fcb23784c731bfcdba76a0 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:31:18 -0400 Subject: [PATCH 06/21] docs: revise v6.3.0 plan per adversarial plan-phase (I1 quote-strip + I2 jq-absent test + I3 trap/no-transcript + m1) --- .../2026-06-01-pipeline-hardening-4issues.md | 61 ++++++++++++++----- 1 file changed, 46 insertions(+), 15 deletions(-) diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md b/docs/plans/2026-06-01-pipeline-hardening-4issues.md index a2893d4..c0e200f 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md @@ -191,7 +191,9 @@ pass() { printf 'PASS: %s\n' "$*"; } fail() { printf 'FAIL: %s\n' "$*" >&2; failures=$((failures+1)); } command -v jq >/dev/null 2>&1 || { echo "SKIP: jq required"; exit 0; } -tmp="$(mktemp -d)"; trap 'rm -rf "$tmp"' EXIT +tmp="$(mktemp -d)" +# Cleanup trap set BEFORE any fixture is copied into hooks/ (rm -f on absent files is safe). +trap 'rm -f "$REPO_ROOT"/hooks/fix-warn-then-json "$REPO_ROOT"/hooks/fix-noise "$REPO_ROOT"/hooks/fix-clean; rm -rf "$tmp"' EXIT mkfix() { printf '%s\n' "$1" > "$tmp/$2"; chmod +x "$tmp/$2"; } # Fixture A: locale warning to stderr-or-stdout then a block JSON on stdout. @@ -212,31 +214,42 @@ run() { local out err rc; out="$("$WRAPPER" "$1" 2>"$tmp/err")"; rc=$?; printf ' # Simplest: call the wrapper with HOOK_DIR override is not supported, so copy fixtures # into hooks/ under a unique prefix and clean up. for f in fix-warn-then-json fix-noise fix-clean; do cp "$tmp/$f" "$REPO_ROOT/hooks/$f"; done -trap 'rm -f "$REPO_ROOT"/hooks/fix-warn-then-json "$REPO_ROOT"/hooks/fix-noise "$REPO_ROOT"/hooks/fix-clean; rm -rf "$tmp"' EXIT -# (a) warning + block JSON → stdout is ONLY the block JSON, valid, warning on stderr. +# (a) warning + block JSON → stdout is ONLY the block JSON; warning ON stderr (m1). run fix-warn-then-json if printf '%s' "$OUT" | jq -e '.decision=="block"' >/dev/null 2>&1 \ - && ! printf '%s' "$OUT" | grep -q 'perl: warning'; then - pass "(a) block JSON delivered, warning stripped from stdout" -else fail "(a) expected only block JSON on stdout, got: $OUT"; fi + && ! printf '%s' "$OUT" | grep -q 'perl: warning' \ + && printf '%s' "$ERR" | grep -q 'perl: warning'; then + pass "(a) block JSON on stdout, warning routed to stderr" +else fail "(a) expected block JSON on stdout + warning on stderr, got OUT=[$OUT] ERR=[$ERR]"; fi -# (b) only noise → stdout empty. +# (b) only noise → stdout empty, noise on stderr. run fix-noise -[ -z "$OUT" ] && pass "(b) noise suppressed from stdout" || fail "(b) expected empty stdout, got: $OUT" +{ [ -z "$OUT" ] && printf '%s' "$ERR" | grep -q 'diagnostic'; } \ + && pass "(b) noise suppressed from stdout, routed to stderr" || fail "(b) expected empty stdout, got: $OUT" # (c) clean JSON → unchanged + valid. run fix-clean printf '%s' "$OUT" | jq -e '.hookSpecificOutput.hookEventName=="X"' >/dev/null 2>&1 \ && pass "(c) clean JSON passthrough" || fail "(c) clean JSON broke, got: $OUT" +# (d) jq-absent (I2) → wrapper passes stdout through VERBATIM (warning + JSON both present). +# Stub PATH so the wrapper's `command -v jq` fails; assert via grep (no jq needed here). +nojq="$tmp/nojq"; mkdir -p "$nojq" +OUTD="$(PATH="$nojq" "$WRAPPER" fix-warn-then-json 2>/dev/null)" +{ printf '%s' "$OUTD" | grep -q 'perl: warning' && printf '%s' "$OUTD" | grep -q '"decision":"block"'; } \ + && pass "(d) jq-absent → verbatim passthrough (no discipline applied)" \ + || fail "(d) expected verbatim passthrough with jq absent, got: $OUTD" + echo ""; echo "Results: $failures failure(s)"; [ "$failures" -eq 0 ] ``` -> Implementation note: the fixtures are copied into `hooks/` (the wrapper resolves -> scripts relative to its own dir) and cleaned up in the trap. The jq-absent path is -> covered by the early `command -v jq` SKIP (the wrapper's own jq-absent branch is -> exercised manually in the design's case (d); a PATH-stub case may be added). +> Implementation note: fixtures are copied into `hooks/` (the wrapper resolves +> scripts relative to its own dir) and cleaned up in the trap (set BEFORE the copy so +> a mid-test failure still cleans up). The top `command -v jq` guard SKIPs only when +> the **test machine** has no jq (can't run a/b/c); on CI (jq present) all four cases +> run — case (d) stubs PATH so the **wrapper** sees no jq and asserts verbatim +> passthrough via grep (no jq needed for that assertion). **Step 2: Run → FAIL** (wrapper still `exec bash`, doesn't strip the warning): Run: `bash tests/hook-stdout-discipline.sh 2>&1 | tail -6` @@ -296,6 +309,7 @@ Rollback: revert `run-hook.cmd` to the `exec bash` form + re-run `tests/hook-con ```bash test_pr_reminder_dedup() { local tmp; tmp="$(mktemp -d)" + trap 'rm -rf "$tmp"' RETURN # I3: repo-convention cleanup local sess='/x/transcripts/sess-abc.jsonl' local payload='{"tool_name":"Bash","tool_input":{"command":"gh pr create --title t --body b"},"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' # first call emits @@ -312,18 +326,35 @@ test_pr_reminder_dedup() { printf '%s' "$out3" | jq -e '.hookSpecificOutput.additionalContext' >/dev/null 2>&1 \ && pass "pr-reminder: re-emits after PreCompact reset" \ || fail "pr-reminder: should re-emit after compaction, got: $out3" - rm -rf "$tmp" + # I1: a quoted body that merely mentions 'gh pr create' must NOT emit + local fp='{"tool_name":"Bash","tool_input":{"command":"gh issue create --title t --body \"see gh pr create docs\""},"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' + local outfp; outfp="$(run_hook pretool-pr-review-reminder "$fp")" + [ -z "$outfp" ] && pass "pr-reminder: not tripped by 'gh pr create' inside a quoted body" \ + || fail "pr-reminder: false-positive on quoted body, got: $outfp" + # degrade gracefully: no transcript_path → emits every time (no marker write) + local tmp2; tmp2="$(mktemp -d)" + local np='{"tool_name":"Bash","tool_input":{"command":"gh pr create --title t"},"cwd":"'"$tmp2"'"}' + local na; na="$(run_hook pretool-pr-review-reminder "$np")" + local nb; nb="$(run_hook pretool-pr-review-reminder "$np")" + { printf '%s' "$na" | jq -e '.hookSpecificOutput' >/dev/null 2>&1 && printf '%s' "$nb" | jq -e '.hookSpecificOutput' >/dev/null 2>&1; } \ + && pass "pr-reminder: no transcript_path → emits every time" \ + || fail "pr-reminder: should emit each time without transcript_path" + rm -rf "$tmp2" } # (call test_pr_reminder_dedup in the main run sequence) ``` -Also: a no-`transcript_path` payload still emits each time (degrade gracefully). **Step 2: Run → FAIL** (reminder has no dedup yet): Run: `bash tests/hook-contracts.sh 2>&1 | grep -i "pr-reminder" | head` Expected: the "deduped within session" case FAILs (second call still emits). **Step 3: Implement dedup in `pretool-pr-review-reminder`** — after computing `cmd` and confirming `gh pr create`, before emitting: -- Tighten the match: `printf '%s' "$cmd" | grep -Eq '(^|[;&| ])gh +pr +create([ ]|$)'` (command position, not substring inside a quoted body). +- **Tighten the match via quote-stripping (I1 — mirror `pre-tool-scope-guard`'s precedent), NOT a boundary regex.** A boundary regex still matches `gh pr create` inside a quoted `--body`. Strip quoted segments first, then match on the stripped command: + ```bash + cmd_unquoted=$(printf '%s' "$cmd" | sed "s/'[^']*'//g; s/\"[^\"]*\"//g") + printf '%s' "$cmd_unquoted" | grep -q 'gh pr create' || exit 0 + ``` + So `gh issue create --body "… gh pr create …"` → stripped → `gh issue create --body ` → no match → no emit; a real `gh pr create --title t --body b` → stripped → `gh pr create --title t --body ` → matches. - Compute `session_key`: `transcript_path=$(… jq -r .transcript_path …); session_key=""; [ -n "$transcript_path" ] && session_key=$(basename "$transcript_path")`. - Marker: `marker="${cwd_dir}/.claude/autodev-state/pr-reminder-seen"`. - If `[ -n "$session_key" ]` and `grep -qxF "$session_key" "$marker" 2>/dev/null` → `exit 0` (already reminded). Else emit, and if `[ -n "$session_key" ]` append: `mkdir -p "$(dirname "$marker")"; printf '%s\n' "$session_key" >> "$marker"`. From 7a1c35c02f00b2ee9e7265c1535cc0ed52f97e43 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:34:49 -0400 Subject: [PATCH 07/21] docs: plan PASS adversarial plan-phase cycle-2 (converged; sed-order mirrors pre-tool-scope-guard) --- docs/plans/2026-06-01-pipeline-hardening-4issues.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md b/docs/plans/2026-06-01-pipeline-hardening-4issues.md index c0e200f..4fff7a4 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md @@ -351,7 +351,7 @@ Expected: the "deduped within session" case FAILs (second call still emits). **Step 3: Implement dedup in `pretool-pr-review-reminder`** — after computing `cmd` and confirming `gh pr create`, before emitting: - **Tighten the match via quote-stripping (I1 — mirror `pre-tool-scope-guard`'s precedent), NOT a boundary regex.** A boundary regex still matches `gh pr create` inside a quoted `--body`. Strip quoted segments first, then match on the stripped command: ```bash - cmd_unquoted=$(printf '%s' "$cmd" | sed "s/'[^']*'//g; s/\"[^\"]*\"//g") + cmd_unquoted=$(printf '%s' "$cmd" | sed "s/\"[^\"]*\"//g; s/'[^']*'//g") # double-first, matching pre-tool-scope-guard printf '%s' "$cmd_unquoted" | grep -q 'gh pr create' || exit 0 ``` So `gh issue create --body "… gh pr create …"` → stripped → `gh issue create --body ` → no match → no emit; a real `gh pr create --title t --body b` → stripped → `gh pr create --title t --body ` → matches. From 9a32eb7bbb834207a52c8b01db8b4a5c507f7945 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:35:54 -0400 Subject: [PATCH 08/21] chore: lock scope for v6.3.0 pipeline hardening (alignment passed) --- docs/plans/2026-06-01-pipeline-hardening-4issues.md | 2 +- docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) create mode 100644 docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md b/docs/plans/2026-06-01-pipeline-hardening-4issues.md index 4fff7a4..825971a 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md @@ -31,7 +31,7 @@ |------|-------|-------|--------| | 1 | feat: v6.3.0 pipeline hardening — auth/authz bug-class, pr-monitoring pattern, completion trust-boundary, hook stdout-JSON discipline, reminder dedup (#41/#58/#59/#60/#61) | Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Task 7 | feat/pipeline-hardening-4issues-v6.3.0 | -**Status:** Draft +**Status:** Locked 2026-06-01T05:35:54Z --- diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock b/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock new file mode 100644 index 0000000..87d988e --- /dev/null +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock @@ -0,0 +1 @@ +e93b292e724f4b39bd6d99a2fc1bff2317cd5c32ca35f6f6156bab322318f11c From 54bf62a9b262bdd0a02629dc0c4d641679491183 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:36:25 -0400 Subject: [PATCH 09/21] feat(adversarial-design-review): add auth/authz chain-composition plan-phase bug-class (#59) --- skills/adversarial-design-review/SKILL.md | 1 + 1 file changed, 1 insertion(+) diff --git a/skills/adversarial-design-review/SKILL.md b/skills/adversarial-design-review/SKILL.md index cdbbfc8..71778ba 100644 --- a/skills/adversarial-design-review/SKILL.md +++ b/skills/adversarial-design-review/SKILL.md @@ -106,6 +106,7 @@ inherits the design's blast radius) and adds: |---|---| | **Over-decomposition / under-decomposition** | Does task granularity match `writing-plans`'s 2–5-minute target per step? A 40-step plan for a CSV export is suspect. A 2-step plan for a schema migration is suspect. Flag both directions. | | **Verification-class mismatch** | For each task, does its verification step match its change class per the table in `skills/writing-plans/SKILL.md` ("Verification per change class")? A schema migration verified by unit tests = finding. An API endpoint with no curl invocation = finding. | +| **Auth/authz chain composition** | When the design names an auth/authz chain ("behind the X auth filter", "RBAC-enforced", "admin-only"), walk that chain component-by-component against the plan's actual wiring. For each gate, verify it is enforced **server-side against an authenticated principal**, not shape-matched by a client-asserted value. Flag any gate where the plan's check reads from request/client-supplied input (`evidence.granted_permissions`, a header, a body field) instead of an authenticated subject (`authz.Enforce(authenticatedSubject, …)`). A plan that wires a weaker gate than the design's chain implies = finding. | | **Hidden serial dependencies** | Tasks the plan claims are independent but actually share state (same file, same DB row, same config key). If executed in parallel, they'll collide. Flag any such pair. | | **Missing rollback wiring** | The design specifies a rollback story (per the design-phase class above). Is it actually implemented in the plan as a task or step? Or is it a paragraph nobody is going to write code for? | | **Missing integration proof** | For multi-component changes, does the plan include an end-to-end or integration verification that exercises the real boundary? If it only tests each component with mocks, flag it unless the design explicitly justified that as sufficient. | From cd1fbfe4744e12ef85b53a12f52d18107e4427eb Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:37:19 -0400 Subject: [PATCH 10/21] docs(pr-monitoring): sanction the bash poll-loop CI-wait pattern, host-scoped (#60) --- skills/pr-monitoring/SKILL.md | 44 +++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) diff --git a/skills/pr-monitoring/SKILL.md b/skills/pr-monitoring/SKILL.md index a395cd4..626a1ee 100644 --- a/skills/pr-monitoring/SKILL.md +++ b/skills/pr-monitoring/SKILL.md @@ -21,6 +21,50 @@ Invoked automatically by `finishing-a-development-branch` in autonomous mode aft When multiple PRs were created in the same session, prefer launching **one monitor agent that covers all PRs** to reduce GitHub API request volume. You may launch one agent per PR instead if the PRs are on unrelated codebases, if you anticipate heavy parallel CI load, or if a previous shared monitor was rate-limited. Default to the single-agent approach and only deviate with a reason. +## Waiting for CI: the sanctioned pattern + +For a pure CI-wait (poll until checks settle, then act), prefer a blocking poll-loop +over a long-lived monitor Agent. The pattern is host-scoped. + + +**Recommended default — a bash poll-loop, not a long-lived monitor Agent.** Issue a +`Bash` tool call with `run_in_background: true` running a **bounded** sleep-loop that +polls `gh pr checks ` until no check is `pending` (or a failure), prints a settle +line, and exits. The harness re-invokes the lead **once** on exit (≈0 tokens while it +sleeps); the lead then reads the result and admin-merges on `failures=0`. + +Bound the loop so it can never spin forever, e.g.: + +```bash +for i in $(seq 1 120); do # 120 × 30s = 60 min cap + raw=$(gh pr checks --json bucket 2>/dev/null) + fail=$(echo "$raw" | jq '[.[]|select(.bucket=="fail")]|length') + pend=$(echo "$raw" | jq '[.[]|select(.bucket=="pending")]|length') + { [ "$fail" != 0 ] && [ -n "$fail" ]; } && { echo "FAILURES=$fail"; break; } + { [ "$pend" = 0 ] || [ -z "$pend" ]; } && { echo "SETTLED"; break; } + sleep 30 +done +``` + +On the 120-iteration cap, print a timeout line and exit for the lead to restart. + +**Why not a background Agent for this:** a `run_in_background` **Agent** told to +sleep-loop tends to return after ~1 cycle and re-complete repeatedly (the agent loop is +not a blocking sleep) — observed early-exiting ~6× in one run. The bash sleep-loop +genuinely blocks to completion. + + + +Use your host's equivalent poll mechanism. Where no blocking-background-bash exists, the +sanctioned fallback is **self-poll on each lead wakeup**: run `gh pr checks ` once +per turn and re-check on the next turn. This loses fire-on-event but is reliable; do not +dispatch a background Agent expecting it to block on a sleep-loop. + + +**Fallback (all hosts):** the background-Agent monitor in "The Process" below remains the +tool for multi-PR review-comment handling that needs active fix-and-push; note its +early-exit failure mode and prefer the poll-loop / self-poll for a pure CI-wait. + ## The Process Run a `balanced`-tier agent that monitors PRs in a loop for up to **60 minutes**, polling every **10 minutes**, until all CI checks pass and no unresolved reviews remain. From 77149010ce881340477192465e8ebeea98adeffb Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:38:29 -0400 Subject: [PATCH 11/21] docs(subagent-driven-development): completion trust-boundary for Implement-N (#58, ADR 0003) --- agents/team-conventions.md | 15 +++++++++++++++ skills/subagent-driven-development/SKILL.md | 20 ++++++++++++++++++++ 2 files changed, 35 insertions(+) diff --git a/agents/team-conventions.md b/agents/team-conventions.md index 86c4ef0..dac0a5e 100644 --- a/agents/team-conventions.md +++ b/agents/team-conventions.md @@ -25,6 +25,9 @@ them. - Request code review per `skills/requesting-code-review/SKILL.md` using the adversarial-framing brief. - Address review per `skills/receiving-code-review/SKILL.md`. +- **Never self-complete an `Implement: N` task.** Do not flip your own Implement task + to `completed` and do not clear its `blockedBy` to back-door completion. DM the + spec-reviewer when ready; the code-reviewer is the sole flipper of Implement tasks. ## Spec reviewer @@ -46,6 +49,18 @@ them. `skills/requesting-code-review/SKILL.md` until verdict is SHIP-IT (or REVERT-AND-REWRITE after max rounds). - Reflexive approval is forbidden. +- **You are the sole role that flips an `Implement: N` task to `completed`**, and only + after quality review passes. Flip BOTH the "Review quality:" task and the + corresponding "Implement:" task. + +## Lead / orchestrator + +- **A `completed` Implement-N is a claim, not evidence.** Whoever flipped it, run + `skills/verification-before-completion/SKILL.md` (clean build + tests + CI green) + before accepting the task as done or invoking + `skills/finishing-a-development-branch/SKILL.md`. Correctness rests on this gate, not + on who flipped the checkbox. See + `decisions/0003-implement-n-completion-trust-boundary.md`. ## Modes diff --git a/skills/subagent-driven-development/SKILL.md b/skills/subagent-driven-development/SKILL.md index 807005d..dbe97d5 100644 --- a/skills/subagent-driven-development/SKILL.md +++ b/skills/subagent-driven-development/SKILL.md @@ -296,6 +296,26 @@ Codex subagents do not share a task list. Use these conventions instead: --- +## Completion is not trusted until lead-verified + +A green `Implement: N` checkbox is a **claim, not evidence**. Regardless of who or what +flips an `Implement: N` task to `completed` (an implementer, a `blockedBy`-clear, or the +code-reviewer), the lead MUST run `autodev:verification-before-completion` — build + test +from a clean tree, CI green — before treating that task as truly done or invoking +`finishing-a-development-branch`. Recurring failure mode (infra-admin v1.1): a +non-compiling tree and a CI-failing regression were both marked "done"; only the lead's +verification caught them. + +The team-conventions rule "only the code-reviewer flips `Implement: N` to completed" +remains team discipline, but **correctness rests on lead verification, which does not +depend on who flipped the bit**. A deterministic hook that *blocks* the flip is +infeasible — the pre-tool payload a hook sees carries the task id and status but not the +task's subject or the calling sub-agent's identity, so a hook cannot tell an `Implement` +task from any other or identify the caller. See +`decisions/0003-implement-n-completion-trust-boundary.md`. + +--- + ## Red Flags **Never:** From ff3489a36fb520dcf7eafe807ac8c43ad460e08a Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:41:41 -0400 Subject: [PATCH 12/21] fix(run-hook.cmd): enforce stdout JSON discipline, recover block decisions behind warnings (#41) --- hooks/run-hook.cmd | 25 +++++++++++++- tests/hook-stdout-discipline.sh | 61 +++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+), 1 deletion(-) create mode 100755 tests/hook-stdout-discipline.sh diff --git a/hooks/run-hook.cmd b/hooks/run-hook.cmd index 45955ea..380d182 100755 --- a/hooks/run-hook.cmd +++ b/hooks/run-hook.cmd @@ -55,4 +55,27 @@ if { [ "${LC_ALL:-}" = "C.UTF-8" ] || [ "${LC_CTYPE:-}" = "C.UTF-8" ] || [ "${LA [ "${LANG:-}" = "C.UTF-8" ] && export LANG=C fi -exec bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@" +# Run the hook with stdout captured (stderr + stdin pass through untouched). +# Enforce stdout JSON discipline: only valid-JSON-or-empty reaches the host's hook +# parser; diagnostics that leak onto stdout (locale/perl/git warnings) are routed to +# stderr. A block decision preceded by a warning is recovered, not dropped (#41). +if command -v jq >/dev/null 2>&1; then + hook_out="$(bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@")" + hook_rc=$? + if [ -z "$hook_out" ]; then + : # nothing to emit + elif printf '%s' "$hook_out" | jq -e . >/dev/null 2>&1; then + printf '%s\n' "$hook_out" # valid JSON as a whole + else + json_line="$(printf '%s\n' "$hook_out" | grep -E '^\{' | tail -1)" + if [ -n "$json_line" ] && printf '%s' "$json_line" | jq -e . >/dev/null 2>&1; then + printf '%s\n' "$hook_out" | grep -vF "$json_line" >&2 # diagnostics -> stderr + printf '%s\n' "$json_line" # recovered JSON -> stdout + else + printf '%s\n' "$hook_out" >&2 # all noise -> stderr + fi + fi + exit "$hook_rc" +else + exec bash "${SCRIPT_DIR}/${SCRIPT_NAME}" "$@" # jq absent: pass through unchanged +fi diff --git a/tests/hook-stdout-discipline.sh b/tests/hook-stdout-discipline.sh new file mode 100755 index 0000000..196859b --- /dev/null +++ b/tests/hook-stdout-discipline.sh @@ -0,0 +1,61 @@ +#!/usr/bin/env bash +# tests/hook-stdout-discipline.sh — verify run-hook.cmd enforces stdout JSON discipline. +set -uo pipefail +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +WRAPPER="$REPO_ROOT/hooks/run-hook.cmd" +failures=0 +pass() { printf 'PASS: %s\n' "$*"; } +fail() { printf 'FAIL: %s\n' "$*" >&2; failures=$((failures+1)); } +command -v jq >/dev/null 2>&1 || { echo "SKIP: jq required for a/b/c"; exit 0; } + +tmp="$(mktemp -d)" +# Cleanup trap set BEFORE any fixture is copied into hooks/ (rm -f on absent files is safe). +trap 'rm -f "$REPO_ROOT"/hooks/fix-warn-then-json "$REPO_ROOT"/hooks/fix-noise "$REPO_ROOT"/hooks/fix-clean; rm -rf "$tmp"' EXIT +mkfix() { printf '%s\n' "$1" > "$tmp/$2"; chmod +x "$tmp/$2"; } + +# Fixture A: a warning leaks to stdout, then a block JSON on stdout. +mkfix '#!/usr/bin/env bash +echo "perl: warning: Setting locale failed." +printf "%s\n" "{\"decision\":\"block\",\"reason\":\"x\"}"' fix-warn-then-json +# Fixture B: only noise, no JSON. +mkfix '#!/usr/bin/env bash +echo "just a diagnostic line"' fix-noise +# Fixture C: clean single-line JSON. +mkfix '#!/usr/bin/env bash +printf "%s\n" "{\"hookSpecificOutput\":{\"hookEventName\":\"X\"}}"' fix-clean + +for f in fix-warn-then-json fix-noise fix-clean; do cp "$tmp/$f" "$REPO_ROOT/hooks/$f"; done + +run() { OUT="$("$WRAPPER" "$1" 2>"$tmp/err")"; RC=$?; ERR="$(cat "$tmp/err")"; } + +# (a) warning + block JSON → stdout ONLY the block JSON; warning ON stderr. +run fix-warn-then-json +if printf '%s' "$OUT" | jq -e '.decision=="block"' >/dev/null 2>&1 \ + && ! printf '%s' "$OUT" | grep -q 'perl: warning' \ + && printf '%s' "$ERR" | grep -q 'perl: warning'; then + pass "(a) block JSON on stdout, warning routed to stderr" +else fail "(a) expected block JSON on stdout + warning on stderr, got OUT=[$OUT] ERR=[$ERR]"; fi + +# (b) only noise → stdout empty, noise on stderr. +run fix-noise +{ [ -z "$OUT" ] && printf '%s' "$ERR" | grep -q 'diagnostic'; } \ + && pass "(b) noise suppressed from stdout, routed to stderr" || fail "(b) expected empty stdout, got: $OUT" + +# (c) clean JSON → unchanged + valid. +run fix-clean +printf '%s' "$OUT" | jq -e '.hookSpecificOutput.hookEventName=="X"' >/dev/null 2>&1 \ + && pass "(c) clean JSON passthrough" || fail "(c) clean JSON broke, got: $OUT" + +# (d) jq-absent → wrapper passes stdout through VERBATIM (warning + JSON both present). +# Stub PATH so the wrapper's `command -v jq` fails, but `bash` is still resolvable +# (the wrapper's jq-absent branch `exec bash`s the fixture). assert via grep (no jq). +nojq="$tmp/nojq"; mkdir -p "$nojq" +# Symlink the externals the wrapper's jq-absent path needs (bash for `exec bash`, +# dirname for SCRIPT_DIR) but NOT jq, so `command -v jq` fails. +for bin in bash dirname; do ln -sf "$(command -v "$bin")" "$nojq/$bin"; done +OUTD="$(PATH="$nojq" "$WRAPPER" fix-warn-then-json 2>/dev/null)" +{ printf '%s' "$OUTD" | grep -q 'perl: warning' && printf '%s' "$OUTD" | grep -q '"decision":"block"'; } \ + && pass "(d) jq-absent → verbatim passthrough (no discipline applied)" \ + || fail "(d) expected verbatim passthrough with jq absent, got: $OUTD" + +echo ""; echo "Results: $failures failure(s)"; [ "$failures" -eq 0 ] From 2b5ddc48067f9f05561cb634daa280e14b4fa3d1 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:44:52 -0400 Subject: [PATCH 13/21] fix(hooks): pr-review reminder once-per-session + PreCompact reset (#61) --- hooks/pre-compact-snapshot | 17 ++++++++++++++++ hooks/pretool-pr-review-reminder | 25 ++++++++++++++++++++++-- tests/hook-contracts.sh | 33 ++++++++++++++++++++++++++++++++ 3 files changed, 73 insertions(+), 2 deletions(-) diff --git a/hooks/pre-compact-snapshot b/hooks/pre-compact-snapshot index a5284e2..92d5a0f 100755 --- a/hooks/pre-compact-snapshot +++ b/hooks/pre-compact-snapshot @@ -29,6 +29,23 @@ transcript_path=$(printf '%s' "$hook_input" | jq -r '.transcript_path // empty' session_key="" [ -n "$transcript_path" ] && session_key=$(basename "$transcript_path") +# ── #61: clear this session's pr-review-reminder marker ─────────────────────── +# So the reminder re-emits once after compaction (the post-compaction context lost the +# earlier reminder). MUST run before the no-locked-plans early-exit below. Guard on a +# non-empty session key (no blanket file wipe on identity-less hosts). +if [ -n "${session_key:-}" ]; then + reminder_marker="${cwd_dir}/.claude/autodev-state/pr-reminder-seen" + if [ -f "$reminder_marker" ]; then + # grep -v exits 1 when no lines remain (only our key) — tolerate it under set -e. + remaining="$(grep -vxF "$session_key" "$reminder_marker" 2>/dev/null || true)" + if [ -n "$remaining" ]; then + printf '%s\n' "$remaining" > "$reminder_marker" # keep other sessions' keys + else + rm -f "$reminder_marker" # only our key → remove marker + fi + fi +fi + # ── Find locked plans and collect their state ───────────────────────────────── plans_dir="${cwd_dir}/docs/plans" state_section="" diff --git a/hooks/pretool-pr-review-reminder b/hooks/pretool-pr-review-reminder index a74bbfa..500e8c8 100755 --- a/hooks/pretool-pr-review-reminder +++ b/hooks/pretool-pr-review-reminder @@ -26,8 +26,23 @@ tool_name=$(printf '%s' "$hook_input" | jq -r '.tool_name // empty' 2>/dev/null cmd=$(printf '%s' "$hook_input" | jq -r '.tool_input.command // empty' 2>/dev/null || true) [ -z "$cmd" ] && exit 0 -# Only act on PR creation commands. -printf '%s' "$cmd" | grep -q 'gh pr create' || exit 0 +# Only act on real PR-creation commands. Strip quoted segments first so a quoted +# --body that merely mentions "gh pr create" does not trip the hook (#61; mirrors +# pre-tool-scope-guard's quote-strip precedent). +cmd_unquoted=$(printf '%s' "$cmd" | sed "s/\"[^\"]*\"//g; s/'[^']*'//g") +printf '%s' "$cmd_unquoted" | grep -q 'gh pr create' || exit 0 + +# Session dedup (#61): emit once per session; reset by pre-compact-snapshot on +# compaction. Keyed by transcript basename; degrade to emit-every-time when absent. +cwd_dir=$(printf '%s' "$hook_input" | jq -r '.cwd // empty' 2>/dev/null || true) +[ -z "$cwd_dir" ] && cwd_dir="${PWD}" +transcript_path=$(printf '%s' "$hook_input" | jq -r '.transcript_path // empty' 2>/dev/null || true) +session_key="" +[ -n "$transcript_path" ] && session_key=$(basename "$transcript_path") +marker="${cwd_dir}/.claude/autodev-state/pr-reminder-seen" +if [ -n "$session_key" ] && grep -qxF "$session_key" "$marker" 2>/dev/null; then + exit 0 # already reminded this session +fi reminder=$(cat <<'REMINDER' @@ -52,4 +67,10 @@ emit_additional_context() { emit_additional_context "PreToolUse" "$reminder" +# Record this session so the reminder doesn't repeat (only when we have a session key). +if [ -n "$session_key" ]; then + mkdir -p "$(dirname "$marker")" 2>/dev/null || true + printf '%s\n' "$session_key" >> "$marker" 2>/dev/null || true +fi + exit 0 diff --git a/tests/hook-contracts.sh b/tests/hook-contracts.sh index 3023e5e..6defdf3 100755 --- a/tests/hook-contracts.sh +++ b/tests/hook-contracts.sh @@ -419,6 +419,38 @@ test_pretool_pr_review_json() { assert_hook_context_json "pretool-pr-review-reminder" "PreToolUse" "$output" } +test_pr_reminder_dedup() { + local tmp; tmp="$(mktemp -d)" + trap 'rm -rf "$tmp"' RETURN + local sess='/x/transcripts/sess-abc.jsonl' + local payload='{"tool_name":"Bash","tool_input":{"command":"gh pr create --title t --body b"},"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' + local out1; out1="$(run_hook pretool-pr-review-reminder "$payload")" + printf '%s' "$out1" | jq -e '.hookSpecificOutput.additionalContext' >/dev/null 2>&1 \ + && pass "pr-reminder: first call emits" || fail "pr-reminder: first call should emit" + local out2; out2="$(run_hook pretool-pr-review-reminder "$payload")" + [ -z "$out2" ] && pass "pr-reminder: deduped within session" \ + || fail "pr-reminder: second call should be silent, got: $out2" + # PreCompact with NO locked plans must still clear the marker. + run_hook pre-compact-snapshot '{"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' >/dev/null 2>&1 || true + local out3; out3="$(run_hook pretool-pr-review-reminder "$payload")" + printf '%s' "$out3" | jq -e '.hookSpecificOutput.additionalContext' >/dev/null 2>&1 \ + && pass "pr-reminder: re-emits after PreCompact reset" \ + || fail "pr-reminder: should re-emit after compaction, got: $out3" + # A quoted body that merely mentions 'gh pr create' must NOT emit. + local fp='{"tool_name":"Bash","tool_input":{"command":"gh issue create --title t --body \"see gh pr create docs\""},"cwd":"'"$tmp"'","transcript_path":"'"$sess"'"}' + local outfp; outfp="$(run_hook pretool-pr-review-reminder "$fp")" + [ -z "$outfp" ] && pass "pr-reminder: not tripped by 'gh pr create' inside a quoted body" \ + || fail "pr-reminder: false-positive on quoted body, got: $outfp" + # Degrade gracefully: no transcript_path -> emits every time. + local tmp2; tmp2="$(mktemp -d)" + local np='{"tool_name":"Bash","tool_input":{"command":"gh pr create --title t"},"cwd":"'"$tmp2"'"}' + local na nb; na="$(run_hook pretool-pr-review-reminder "$np")"; nb="$(run_hook pretool-pr-review-reminder "$np")" + { printf '%s' "$na" | jq -e '.hookSpecificOutput' >/dev/null 2>&1 && printf '%s' "$nb" | jq -e '.hookSpecificOutput' >/dev/null 2>&1; } \ + && pass "pr-reminder: no transcript_path -> emits every time" \ + || fail "pr-reminder: should emit each time without transcript_path" + rm -rf "$tmp2" +} + test_posttool_pr_created_json() { local output output="$(run_hook posttool-pr-created '{"tool_name":"Bash","tool_input":{"command":"gh pr create --title test --body test"},"tool_response":"https://github.com/owner/repo/pull/123","cwd":"'"$REPO_ROOT"'"}')" @@ -1778,6 +1810,7 @@ test_prompt_strict_ignores_single_workspace_lock_when_session_has_no_lock test_prompt_strict_uses_session_locked_plan_only test_prompt_strict_ignores_prose_mention_of_locked_status test_pretool_pr_review_json +test_pr_reminder_dedup test_posttool_pr_created_json test_pre_compact_snapshot_json test_wrapper_suppresses_pre_compact_locale_noise From 64bdda9bd70daa3108ebc403f6671bf7170b4411 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:45:05 -0400 Subject: [PATCH 14/21] ci: run hook contract + stdout-discipline tests on hooks/tests changes (#41/#61) --- .github/workflows/hooks-check.yml | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 .github/workflows/hooks-check.yml diff --git a/.github/workflows/hooks-check.yml b/.github/workflows/hooks-check.yml new file mode 100644 index 0000000..a881576 --- /dev/null +++ b/.github/workflows/hooks-check.yml @@ -0,0 +1,24 @@ +name: Hooks Check +on: + push: + paths: + - 'hooks/**' + - 'tests/hook-contracts.sh' + - 'tests/hook-stdout-discipline.sh' + - '.github/workflows/hooks-check.yml' + pull_request: + paths: + - 'hooks/**' + - 'tests/hook-contracts.sh' + - 'tests/hook-stdout-discipline.sh' +jobs: + hooks: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Install jq + run: sudo apt-get update && sudo apt-get install -y jq + - name: Hook contract tests + run: bash tests/hook-contracts.sh + - name: Hook stdout discipline tests + run: bash tests/hook-stdout-discipline.sh From 0a14316ac710905329574918414e842278f43cb7 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:45:43 -0400 Subject: [PATCH 15/21] chore: bump version to 6.3.0 (#41/#58/#59/#60/#61) --- .claude-plugin/marketplace.json | 2 +- .claude-plugin/plugin.json | 2 +- .cursor-plugin/plugin.json | 2 +- RELEASE-NOTES.md | 31 +++++++++++++++++++++++++++++++ 4 files changed, 34 insertions(+), 3 deletions(-) diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index f4dafd1..91db5b7 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -9,7 +9,7 @@ { "name": "autodev", "description": "Autonomous development workflow skills for coding agents", - "version": "6.2.2", + "version": "6.3.0", "source": "./", "author": { "name": "Jon Langevin", diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 9728fa1..940fae4 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "autodev", "description": "Autonomous development workflow skills for coding agents: design, review, planning, execution, monitoring, and retrospectives", - "version": "6.2.2", + "version": "6.3.0", "author": { "name": "Jon Langevin", "email": "jon@gocodealone.com" diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json index e77ff1f..47f5913 100644 --- a/.cursor-plugin/plugin.json +++ b/.cursor-plugin/plugin.json @@ -2,7 +2,7 @@ "name": "autodev", "displayName": "Autonomous Dev Kit", "description": "Autonomous development workflow skills for coding agents", - "version": "6.2.2", + "version": "6.3.0", "author": { "name": "Jon Langevin", "email": "jon@gocodealone.com" diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md index d6a71d7..a5e612b 100644 --- a/RELEASE-NOTES.md +++ b/RELEASE-NOTES.md @@ -1,5 +1,36 @@ # Autonomous Dev Kit Release Notes +## v6.3.0 — 2026-06-01 + +Pipeline-hardening release closing five recurring gate-miss / context-waste issues +observed across autonomous runs and Codex compaction. + +- **`adversarial-design-review` — auth/authz chain-composition bug-class (#59):** a new + plan-phase row that walks the design's auth/authz chain component-by-component against + the plan's wiring and flags any gate enforced by a *client-asserted* value + (`evidence.granted_permissions`, a header) instead of server-side against an + authenticated principal. +- **`pr-monitoring` — sanctioned bash poll-loop (#60):** documents the host-scoped + CI-wait pattern. Under Claude Code, a bounded `run_in_background` bash sleep-loop that + blocks to completion and re-invokes the lead once on settle (the prior background-Agent + monitor early-exited ~6× per run); Codex/Cursor use a self-poll-on-wakeup fallback. +- **`subagent-driven-development` / `team-conventions` — completion trust-boundary (#58, + ADR 0003):** a flipped `Implement: N` is a claim, not evidence — the lead must run + `verification-before-completion` before trusting it. A deterministic hook-block is + infeasible (the pre-tool payload lacks the task subject + caller identity), so + correctness rests on lead verification, not on who flipped the checkbox. +- **`run-hook.cmd` — stdout JSON discipline (#41):** the wrapper now captures each hook's + stdout and emits only valid-JSON-or-empty to the host's hook parser, recovering a block + decision even when a locale/diagnostic warning precedes it (previously such noise could + invalidate the hook's JSON). Diagnostics are routed to stderr; jq-absent hosts pass + through unchanged. New `tests/hook-stdout-discipline.sh`. +- **`pretool-pr-review-reminder` — once-per-session (#61):** the gh-version/Copilot + reviewer reminder now emits once per session (deduped via a `.claude/autodev-state` + marker, quote-strip-matched so a quoted `--body` mentioning `gh pr create` no longer + trips it) and is reset by `pre-compact-snapshot` so it re-emits once after a compaction. +- **CI:** new `hooks-check.yml` runs the hook contract + stdout-discipline tests on any + `hooks/`/test change, so these fixes are regression-gated. + ## v6.2.2 — 2026-05-31 New **Existence / runtime-validity** bug-class in `adversarial-design-review` From 308cecf30c2e49f351c64c6f13870fcb361ea00d Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:52:01 -0400 Subject: [PATCH 16/21] fix(hooks): grep -vxF full-line diagnostic routing + atomic marker rewrite (PR review) --- hooks/pre-compact-snapshot | 5 ++++- hooks/run-hook.cmd | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/hooks/pre-compact-snapshot b/hooks/pre-compact-snapshot index 92d5a0f..5bc19ef 100755 --- a/hooks/pre-compact-snapshot +++ b/hooks/pre-compact-snapshot @@ -39,7 +39,10 @@ if [ -n "${session_key:-}" ]; then # grep -v exits 1 when no lines remain (only our key) — tolerate it under set -e. remaining="$(grep -vxF "$session_key" "$reminder_marker" 2>/dev/null || true)" if [ -n "$remaining" ]; then - printf '%s\n' "$remaining" > "$reminder_marker" # keep other sessions' keys + # Atomic: write other sessions' keys to a temp file then rename. + printf '%s\n' "$remaining" > "${reminder_marker}.tmp" 2>/dev/null \ + && mv "${reminder_marker}.tmp" "$reminder_marker" \ + || rm -f "${reminder_marker}.tmp" else rm -f "$reminder_marker" # only our key → remove marker fi diff --git a/hooks/run-hook.cmd b/hooks/run-hook.cmd index 380d182..73f5872 100755 --- a/hooks/run-hook.cmd +++ b/hooks/run-hook.cmd @@ -69,7 +69,7 @@ if command -v jq >/dev/null 2>&1; then else json_line="$(printf '%s\n' "$hook_out" | grep -E '^\{' | tail -1)" if [ -n "$json_line" ] && printf '%s' "$json_line" | jq -e . >/dev/null 2>&1; then - printf '%s\n' "$hook_out" | grep -vF "$json_line" >&2 # diagnostics -> stderr + printf '%s\n' "$hook_out" | grep -vxF "$json_line" >&2 # diagnostics -> stderr (full-line) printf '%s\n' "$json_line" # recovered JSON -> stdout else printf '%s\n' "$hook_out" >&2 # all noise -> stderr From 5d00fec2bd96309df4ab59bd630b6fc3fc347f31 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:58:45 -0400 Subject: [PATCH 17/21] feat(adversarial-design-review): add Artifact-class precedent design-phase bug-class (#63) --- skills/adversarial-design-review/SKILL.md | 1 + 1 file changed, 1 insertion(+) diff --git a/skills/adversarial-design-review/SKILL.md b/skills/adversarial-design-review/SKILL.md index 71778ba..1f6ddc9 100644 --- a/skills/adversarial-design-review/SKILL.md +++ b/skills/adversarial-design-review/SKILL.md @@ -87,6 +87,7 @@ checklist is the floor, not the ceiling. Additional findings are welcome. | **Project-guidance conflicts** | Read `docs/design-guidance.md` or equivalent project guidance. Does the artifact violate product direction, architecture constraints, UX/domain principles, quality/security/ops requirements, or non-goals? Cite both the guidance path/section and the conflicting design/plan section. If guidance is missing and the design does not show Q&A capture, flag it. | | **Assumptions under attack** | Load-bearing claims, stated or unstated. "We assume the upstream API is idempotent." "We assume single-tenant." "We assume the user has admin." Challenge each assumption, name how it could be false, and flag any where a false assumption collapses the design. | | **Repo-precedent conflicts** | Does this design fight existing patterns, skills, or conventions in this repo? Cite the conflicting `path/file.md:line`. If the design proposes a new pattern that contradicts an established one, the design must justify the divergence. | +| **Artifact-class precedent** | Survey how the codebase already implements this *artifact class* — not just the *mechanism*. The Repo-precedent class above checks the HOW (e.g. how a plugin registers a module type); this checks the WHERE/shape (e.g. where a scenario stands up a server, where a test fixture lives, where a migration goes). Grep for sibling instances of the same kind of artifact (`ls scenarios/*/cmd/server/main.go`, sibling plugins, migrations, CLI commands, fixtures) and confirm the design follows the established shape — or explicitly justifies divergence. A design that puts a test fixture in the production engine repo when sibling scenarios own their `cmd/server/main.go` = finding. Run the decisive `ls`/`grep` for the artifact class, not just for the mechanism. | | **YAGNI violations** | Features in the design that aren't justified by stated requirements. Configuration knobs nobody asked for. Generality nobody needs. Future-proofing for cases that may never arrive. | | **Missing failure modes** | What fails first under partial failure, network partition, restart-mid-operation, malformed input, adversarial input, the dependency being down? If the design doesn't address it, flag it. | | **Security / privacy at architecture level** | Auth/authz boundaries, secret flow, least privilege, blast radius on compromise, PII exposure, log leakage, dependency/trust boundary, abuse case, CSRF/SSRF/auth-confused-deputy at the design level (not at the code level — that's `requesting-code-review`'s job). | From 4854f425f45f02b0a11d463fbb3fb8ace74422b6 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 01:58:45 -0400 Subject: [PATCH 18/21] fix(session-start): GNU stat -c %Y before BSD -f %m for Linux time-dedup + re-enable hook-contracts CI (#64) --- .github/workflows/hooks-check.yml | 4 ++-- hooks/session-start | 8 ++++++-- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/.github/workflows/hooks-check.yml b/.github/workflows/hooks-check.yml index a881576..d17e58e 100644 --- a/.github/workflows/hooks-check.yml +++ b/.github/workflows/hooks-check.yml @@ -18,7 +18,7 @@ jobs: - uses: actions/checkout@v4 - name: Install jq run: sudo apt-get update && sudo apt-get install -y jq - - name: Hook contract tests - run: bash tests/hook-contracts.sh - name: Hook stdout discipline tests run: bash tests/hook-stdout-discipline.sh + - name: Hook contract tests + run: bash tests/hook-contracts.sh diff --git a/hooks/session-start b/hooks/session-start index 43b5100..98369e3 100755 --- a/hooks/session-start +++ b/hooks/session-start @@ -81,9 +81,13 @@ LAST_EMIT_FILE="${STATE_DIR}/session-start-last-emit" DEDUP_WINDOW_SEC=5 if [ -f "$LAST_EMIT_FILE" ]; then now=$(date +%s 2>/dev/null || echo 0) - last=$(stat -f %m "$LAST_EMIT_FILE" 2>/dev/null \ - || stat -c %Y "$LAST_EMIT_FILE" 2>/dev/null \ + # GNU stat (-c %Y) first, then BSD (-f %m). BSD-first is wrong on Linux: `stat -f` + # means "file system status" there, succeeds (exit 0), and prints fs info instead + # of the mtime — breaking the numeric comparison and the dedup (#64). + last=$(stat -c %Y "$LAST_EMIT_FILE" 2>/dev/null \ + || stat -f %m "$LAST_EMIT_FILE" 2>/dev/null \ || echo 0) + case "$last" in (*[!0-9]*|'') last=0 ;; esac # guard: only a clean integer if [ "$now" -gt 0 ] && [ "$last" -gt 0 ] && [ $((now - last)) -lt "$DEDUP_WINDOW_SEC" ]; then exit 0 fi From 7f49e0e13a25c63a0e93a25a0d2382a9c80fec6b Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 02:00:56 -0400 Subject: [PATCH 19/21] =?UTF-8?q?chore:=20amend=20v6.3.0=20scope=20?= =?UTF-8?q?=E2=80=94=20fold=20in=20#63=20(artifact-class=20precedent)=20+?= =?UTF-8?q?=20#64=20(session-start=20Linux=20stat),=20user-approved=20(ADR?= =?UTF-8?q?=200004)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- .../0004-v6.3.0-scope-amendment-63-64.md | 48 ++++++++++++++++++ ...06-01-pipeline-hardening-4issues-design.md | 13 +++++ .../2026-06-01-pipeline-hardening-4issues.md | 50 +++++++++++++++++-- ...1-pipeline-hardening-4issues.md.scope-lock | 2 +- 4 files changed, 108 insertions(+), 5 deletions(-) create mode 100644 decisions/0004-v6.3.0-scope-amendment-63-64.md diff --git a/decisions/0004-v6.3.0-scope-amendment-63-64.md b/decisions/0004-v6.3.0-scope-amendment-63-64.md new file mode 100644 index 0000000..7698580 --- /dev/null +++ b/decisions/0004-v6.3.0-scope-amendment-63-64.md @@ -0,0 +1,48 @@ +# 0004. v6.3.0 scope amendment — fold in #63 (artifact-class precedent) and #64 (session-start Linux stat) + +**Status:** Accepted +**Date:** 2026-06-01 +**Decision-makers:** autodev maintainers, autonomous pipeline, user (explicit approval) +**Related:** issues #63, #64; docs/plans/2026-06-01-pipeline-hardening-4issues.md (Tasks 8, 9); ADR 0003 + +## Context + +The v6.3.0 pipeline-hardening plan was locked with 7 tasks (#41/#58/#59/#60/#61 + CI +gate + version bump). During execution two new items arose: + +- **#64 (bug, execution-discovered):** Task 6's new `hooks-check.yml` ran + `tests/hook-contracts.sh` on ubuntu for the first time and surfaced a **pre-existing** + Linux bug in `hooks/session-start` — BSD `stat -f %m` is tried before GNU `stat -c %Y`, + and on Linux `stat -f` succeeds-but-wrong (prints fs info), breaking the time-dedup. The + fix is a prerequisite for Task 6's deliverable (a green CI gate that runs + `hook-contracts.sh`). +- **#63 (enhancement, newly filed):** add a design-phase `adversarial-design-review` + check that surveys how the codebase implements the *artifact class* (where a scenario + stands up a server, where a fixture lives), not just the *mechanism*. Same skill the + release already edits for #59. + +The user explicitly approved addressing both ("if you found a new issue, go ahead and +address it"; "pick up the new filed issues and address them as well"). + +## Decision + +Amend the locked v6.3.0 manifest from 7 to 9 tasks (single PR unchanged): add Task 8 +(#64 session-start `stat` ordering + numeric guard + re-enable `hook-contracts.sh` in CI) +and Task 9 (#63 `Artifact-class precedent` design-phase bug-class row). Both are small, +additive, host-neutral, and thematically identical to the locked hardening scope. The +original `Locked` stamp is preserved here for audit; the plan re-stamps to `Amended` and +re-runs `alignment-check`. + +We fold them into v6.3.0 rather than deferring to a follow-on release because: #64 is +required for Task 6's CI gate to be green and real; #63 edits the same skill as #59; and +the v6.3.0 PR is still open, so one coherent release covers all the hardening the user +asked for. + +## Consequences + +v6.3.0 closes seven issues (#41/#58/#59/#60/#61/#63/#64). The `hooks-check.yml` CI gate +now runs both hook test suites green on Linux (session-start time-dedup works on Linux +for the first time — a real bug fix for all Linux autodev users, beyond the test gate). +The amendment is recorded so the scope expansion is durable, not silent. Future +scope-expansions during execution should follow the same path: surface → user approval → +ADR → manifest amend → re-align → re-lock. diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md index cd570f1..e211073 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues-design.md @@ -253,6 +253,19 @@ while the tree didn't compile / CI failed. So: of client-asserted-permission theater in plans. - **#58/#60** are process/doc; no runtime security surface. +## Amendment (2026-06-01, user-approved — see decisions/0004) + +Two items folded into v6.3.0 during execution, both user-approved (manifest 7 → 9 tasks, +PR count unchanged): +- **#64** — `hooks/session-start` time-dedup is broken on Linux (BSD `stat -f %m` tried + before GNU `stat -c %Y`; `stat -f` succeeds-but-wrong on Linux). Surfaced by Task 6's + `hooks-check.yml` running `hook-contracts.sh` on ubuntu for the first time. Fixed + (GNU-first + numeric guard) and `hook-contracts.sh` re-enabled in the CI gate. +- **#63** — new design-phase `Artifact-class precedent` bug-class: survey how the codebase + implements the *artifact class* (where a scenario stands up a server, where a fixture + lives), not just the *mechanism*; grep for sibling instances and follow the established + shape or justify divergence. + ## Infrastructure Impact None at runtime. Plugin skill/hook/doc changes + a version bump. Release path is diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md b/docs/plans/2026-06-01-pipeline-hardening-4issues.md index 825971a..878be0d 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues.md +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md @@ -15,8 +15,8 @@ ## Scope Manifest **PR Count:** 1 -**Tasks:** 7 -**Estimated Lines of Change:** ~320 (informational; not enforced) +**Tasks:** 9 +**Estimated Lines of Change:** ~340 (informational; not enforced) **Out of scope:** - A deterministic hook that *blocks* `TaskUpdate(status=completed)` on `Implement:*` (infeasible — ADR 0003). @@ -29,9 +29,9 @@ | PR # | Title | Tasks | Branch | |------|-------|-------|--------| -| 1 | feat: v6.3.0 pipeline hardening — auth/authz bug-class, pr-monitoring pattern, completion trust-boundary, hook stdout-JSON discipline, reminder dedup (#41/#58/#59/#60/#61) | Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Task 7 | feat/pipeline-hardening-4issues-v6.3.0 | +| 1 | feat: v6.3.0 pipeline hardening (#41/#58/#59/#60/#61/#63/#64) | Task 1, Task 2, Task 3, Task 4, Task 5, Task 6, Task 7, Task 8, Task 9 | feat/pipeline-hardening-4issues-v6.3.0 | -**Status:** Locked 2026-06-01T05:35:54Z +**Status:** Amended 2026-06-01T06:00:00Z (user-approved scope expansion to #63 + #64; see Amendment note + decisions/0004) --- @@ -470,6 +470,48 @@ Rollback: `scripts/bump-version.sh 6.3.0 6.2.2` + revert; pre-merge revert preve --- +### Task 8: #64 — session-start time-dedup Linux portability (amendment) + +**Discovered at execution:** Task 6's new `hooks-check.yml` ran `hook-contracts.sh` on +ubuntu for the first time and surfaced a **pre-existing** Linux bug: +`hooks/session-start` computes the last-emit mtime with BSD `stat -f %m` **before** GNU +`stat -c %Y`. On Linux `stat -f` means "file system status", succeeds (exit 0), and +prints fs info instead of the mtime → `[ "$last" -gt 0 ]` errors and the time-dedup +never suppresses re-fires. User-approved amendment (#64). + +**Files:** +- Modify: `hooks/session-start` (the `stat` ordering + numeric guard) +- Modify: `.github/workflows/hooks-check.yml` (re-enable `hook-contracts.sh` once fixed) + +**Step 1: Fix the `stat` ordering** — GNU first, then BSD, then guard numeric: +```bash +last=$(stat -c %Y "$LAST_EMIT_FILE" 2>/dev/null || stat -f %m "$LAST_EMIT_FILE" 2>/dev/null || echo 0) +case "$last" in (*[!0-9]*|'') last=0 ;; esac +``` +**Step 2: Re-enable `hook-contracts.sh` in `hooks-check.yml`** (the CI gate is now real on Linux). +**Step 3: Verify** — `bash tests/hook-contracts.sh` → `All hook contract tests passed.` +(macOS: `stat -c %Y` fails → falls to BSD `stat -f %m`; Linux: `stat -c %Y` works.) CI on +ubuntu confirms the time-dedup case now passes. +**Step 4: Commit** — `fix(session-start): GNU stat -c %Y before BSD -f %m … (#64)`. +Rollback: revert commit. + +### Task 9: #63 — Artifact-class precedent design-phase bug-class (amendment) + +**User-approved amendment (#63).** A design can pass mechanism-correctness review yet put +an artifact in the wrong *place/shape* (v1.1: a scenario test fixture inside the +production engine repo, when sibling scenarios own a `cmd/server/main.go`). + +**Files:** +- Modify: `skills/adversarial-design-review/SKILL.md` (design-phase table, after `Repo-precedent conflicts`) + +**Step 1: Insert the row** (after `Repo-precedent conflicts`): +``` +| **Artifact-class precedent** | Survey how the codebase already implements this *artifact class* — not just the *mechanism*. … Grep for sibling instances (`ls scenarios/*/cmd/server/main.go`, sibling plugins, migrations, CLI commands, fixtures) and confirm the design follows the established shape — or explicitly justifies divergence. … Run the decisive `ls`/`grep` for the artifact class, not just for the mechanism. | +``` +**Step 2: Verify** — awk confirms the row is in the **design-phase** section; `bash tests/skill-content-grep.sh` PASS. +**Step 3: Commit** — `feat(adversarial-design-review): add Artifact-class precedent design-phase bug-class (#63)`. +Rollback: revert commit (additive prose row). + ## Verification summary (change-class mapping) | Task | Change class | Verification | Expected | diff --git a/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock b/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock index 87d988e..f9c2454 100644 --- a/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock +++ b/docs/plans/2026-06-01-pipeline-hardening-4issues.md.scope-lock @@ -1 +1 @@ -e93b292e724f4b39bd6d99a2fc1bff2317cd5c32ca35f6f6156bab322318f11c +815ea55f53f1673b36497497a3141abf8c0fc4d2cb5291303ade1d123046498a From 38dcb5ec18cc1debd935a4e3bbca1686fa4dfe1b Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 02:01:26 -0400 Subject: [PATCH 20/21] docs: add #63 + #64 to v6.3.0 release notes --- RELEASE-NOTES.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md index a5e612b..a08af16 100644 --- a/RELEASE-NOTES.md +++ b/RELEASE-NOTES.md @@ -28,6 +28,15 @@ observed across autonomous runs and Codex compaction. reviewer reminder now emits once per session (deduped via a `.claude/autodev-state` marker, quote-strip-matched so a quoted `--body` mentioning `gh pr create` no longer trips it) and is reset by `pre-compact-snapshot` so it re-emits once after a compaction. +- **`adversarial-design-review` — artifact-class precedent (#63):** a new design-phase row + that surveys how the codebase already implements an *artifact class* (where a scenario + stands up a server, where a fixture lives — `ls scenarios/*/cmd/server/main.go`), not + just the *mechanism*; grep for sibling instances and follow the established shape or + justify divergence. +- **`session-start` — Linux time-dedup fix (#64):** the SessionStart hook tried BSD + `stat -f %m` before GNU `stat -c %Y`; on Linux `stat -f` succeeds-but-wrong (fs info), + so the time-based dedup never suppressed re-fires. Now GNU-first with a numeric guard — + fixing re-fire spam for all Linux autodev users. - **CI:** new `hooks-check.yml` runs the hook contract + stdout-discipline tests on any `hooks/`/test change, so these fixes are regression-gated. From c19accf2cf29bc8c285b2bb9eb09fca468b07ff6 Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Mon, 1 Jun 2026 02:04:09 -0400 Subject: [PATCH 21/21] ci(hooks-check): add explicit permissions block + workflow-self path filter on PR (Copilot/CodeQL review) --- .github/workflows/hooks-check.yml | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/.github/workflows/hooks-check.yml b/.github/workflows/hooks-check.yml index d17e58e..cee7beb 100644 --- a/.github/workflows/hooks-check.yml +++ b/.github/workflows/hooks-check.yml @@ -11,6 +11,11 @@ on: - 'hooks/**' - 'tests/hook-contracts.sh' - 'tests/hook-stdout-discipline.sh' + - '.github/workflows/hooks-check.yml' + +permissions: + contents: read + jobs: hooks: runs-on: ubuntu-latest