GoCodeAlone · intel352 · Jun 1, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -9,7 +9,7 @@
     {
       "name": "autodev",
       "description": "Autonomous development workflow skills for coding agents",
-      "version": "6.2.2",
+      "version": "6.3.0",
       "source": "./",
       "author": {
         "name": "Jon Langevin",

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "autodev",
   "description": "Autonomous development workflow skills for coding agents: design, review, planning, execution, monitoring, and retrospectives",
-  "version": "6.2.2",
+  "version": "6.3.0",
   "author": {
     "name": "Jon Langevin",
     "email": "jon@gocodealone.com"

diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
@@ -2,7 +2,7 @@
   "name": "autodev",
   "displayName": "Autonomous Dev Kit",
   "description": "Autonomous development workflow skills for coding agents",
-  "version": "6.2.2",
+  "version": "6.3.0",
   "author": {
     "name": "Jon Langevin",
     "email": "jon@gocodealone.com"

diff --git a/.github/workflows/hooks-check.yml b/.github/workflows/hooks-check.yml
@@ -0,0 +1,29 @@
+name: Hooks Check
+on:
+  push:
+    paths:
+      - 'hooks/**'
+      - 'tests/hook-contracts.sh'
+      - 'tests/hook-stdout-discipline.sh'
+      - '.github/workflows/hooks-check.yml'
+  pull_request:
+    paths:
+      - 'hooks/**'
+      - 'tests/hook-contracts.sh'
+      - 'tests/hook-stdout-discipline.sh'
+      - '.github/workflows/hooks-check.yml'
+
+permissions:
+  contents: read
+
+jobs:
+  hooks:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install jq
+        run: sudo apt-get update && sudo apt-get install -y jq
+      - name: Hook stdout discipline tests
+        run: bash tests/hook-stdout-discipline.sh
+      - name: Hook contract tests
+        run: bash tests/hook-contracts.sh
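Reviewer note (illustrative, not part of the patch): the stdout-discipline suite this workflow runs guards the `run-hook.cmd` rule described in the release notes, namely that only valid JSON or nothing reaches the host's hook parser, with diagnostics on stderr and a pass-through when jq is absent. A minimal sketch of that rule follows. The function name `emit_json_or_empty` and the scan-from-each-`{` strategy are assumptions made for illustration; the real wrapper may work differently.

```shell
# Hypothetical filter, NOT the real run-hook.cmd logic.
# Takes a hook's captured stdout as $1 and prints valid JSON or nothing.
emit_json_or_empty() {
  local raw="$1" rest candidate compact prefix
  # Without jq the output cannot be validated, so pass it through unchanged.
  if ! command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$raw"
    return 0
  fi
  rest="$raw"
  # Try each '{' in turn as the possible start of the JSON payload.
  while [[ "$rest" == *"{"* ]]; do
    rest="${rest#*\{}"        # drop everything up to and including the first '{'
    candidate="{$rest"
    # Accept the candidate only if it parses as exactly one JSON value.
    if compact=$(printf '%s' "$candidate" | jq -c -e -s 'select(length == 1) | .[0]' 2>/dev/null) \
        && [[ -n "$compact" ]]; then
      # Whatever preceded the payload is a diagnostic: route it to stderr.
      prefix="${raw%"$candidate"}"
      [[ -n "$prefix" ]] && printf '%s\n' "$prefix" >&2
      printf '%s\n' "$compact"
      return 0
    fi
  done
  # No valid JSON found: nothing on stdout, original text kept on stderr.
  [[ -n "$raw" ]] && printf '%s\n' "$raw" >&2
  return 0
}
```

This sketch only recovers a payload that is preceded by noise, which is the case the release notes name. Noise after the payload makes the candidate unparseable, so stdout stays empty.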
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
@@ -1,5 +1,45 @@
 # Autonomous Dev Kit Release Notes
 
+## v6.3.0 — 2026-06-01
+
+Pipeline-hardening release closing seven recurring gate-miss / context-waste issues
+observed across autonomous runs and Codex compaction.
+
+- **`adversarial-design-review` — auth/authz chain-composition bug-class (#59):** a new
+  plan-phase row that walks the design's auth/authz chain component-by-component against
+  the plan's wiring and flags any gate enforced by a *client-asserted* value
+  (`evidence.granted_permissions`, a header) instead of server-side against an
+  authenticated principal.
+- **`pr-monitoring` — sanctioned bash poll-loop (#60):** documents the host-scoped
+  CI-wait pattern. Under Claude Code, this is a bounded `run_in_background` bash sleep-loop
+  that blocks to completion and re-invokes the lead once on settle (the prior background-Agent
+  monitor early-exited ~6× per run); Codex/Cursor use a self-poll-on-wakeup fallback.
+- **`subagent-driven-development` / `team-conventions` — completion trust-boundary (#58,
+  ADR 0003):** a flipped `Implement: N` is a claim, not evidence — the lead must run
+  `verification-before-completion` before trusting it. A deterministic hook-block is
+  infeasible (the pre-tool payload lacks the task subject + caller identity), so
+  correctness rests on lead verification, not on who flipped the checkbox.
+- **`run-hook.cmd` — stdout JSON discipline (#41):** the wrapper now captures each hook's
+  stdout and emits only valid-JSON-or-empty to the host's hook parser, recovering a block
+  decision even when a locale/diagnostic warning precedes it (previously such noise could
+  invalidate the hook's JSON). Diagnostics are routed to stderr; on hosts without jq,
+  output passes through unchanged. New `tests/hook-stdout-discipline.sh`.
+- **`pretool-pr-review-reminder` — once-per-session (#61):** the gh-version/Copilot
+  reviewer reminder now emits once per session (deduped via a `.claude/autodev-state`
+  marker, quote-strip-matched so a quoted `--body` mentioning `gh pr create` no longer
+  trips it) and is reset by `pre-compact-snapshot` so it re-emits once after a compaction.
+- **`adversarial-design-review` — artifact-class precedent (#63):** a new design-phase row
+  that surveys how the codebase already implements an *artifact class* (where a scenario
+  stands up a server, where a fixture lives — `ls scenarios/*/cmd/server/main.go`), not
+  just the *mechanism*; the check is to grep for sibling instances and either follow the
+  established shape or justify the divergence.
+- **`session-start` — Linux time-dedup fix (#64):** the SessionStart hook tried BSD
+  `stat -f %m` before GNU `stat -c %Y`; on Linux `stat -f` succeeds but prints filesystem
+  info instead of an mtime, so the time-based dedup never suppressed re-fires. The hook is
+  now GNU-first with a numeric guard, which fixes re-fire spam for all Linux autodev users.
+- **CI:** new `hooks-check.yml` runs the hook contract and stdout-discipline tests on any
+  change to `hooks/`, those test scripts, or the workflow, so these fixes are regression-gated.
+
 ## v6.2.2 — 2026-05-31
 
 New **Existence / runtime-validity** bug-class in `adversarial-design-review`
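
Reviewer note (illustrative, not part of the patch): the #60 entry in the v6.3.0 notes describes a bounded bash sleep-loop that blocks until CI settles and then hands control back once. A minimal sketch of such a loop is below. `wait_until_settled` and its argument order are hypothetical, and the check command is a placeholder for whatever query the `pr-monitoring` skill actually documents.

```shell
# Hypothetical bounded poll loop, not the pattern text from the skill itself.
# Usage: wait_until_settled <max_attempts> <sleep_seconds> <check command...>
# The check command must exit 0 once CI has settled and non-zero while pending.
wait_until_settled() {
  local max="$1" delay="$2" attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    if "$@"; then
      echo "settled after $attempt attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    (( attempt < max )) && sleep "$delay"
  done
  # The bound was reached: report on stderr and fail so the caller can decide.
  echo "timed out after $max attempt(s)" >&2
  return 1
}
```

The attempt bound is what keeps the loop from waiting forever; the single return is what lets the caller be re-invoked once rather than on every poll.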

diff --git a/agents/team-conventions.md b/agents/team-conventions.md
@@ -25,6 +25,9 @@ them.
 - Request code review per `skills/requesting-code-review/SKILL.md`
   using the adversarial-framing brief.
 - Address review per `skills/receiving-code-review/SKILL.md`.
+- **Never self-complete an `Implement: N` task.** Do not flip your own Implement task
+  to `completed` and do not clear its `blockedBy` to back-door completion. DM the
+  spec-reviewer when ready; the code-reviewer is the sole flipper of Implement tasks.
 
 ## Spec reviewer
 
@@ -46,6 +49,18 @@ them.
   `skills/requesting-code-review/SKILL.md` until verdict is SHIP-IT
   (or REVERT-AND-REWRITE after max rounds).
 - Reflexive approval is forbidden.
+- **You are the sole role that flips an `Implement: N` task to `completed`**, and only
+  after quality review passes. Flip BOTH the "Review quality:" task and the
+  corresponding "Implement:" task.
+
+## Lead / orchestrator
+
+- **A `completed` Implement-N is a claim, not evidence.** Whoever flipped it, run
+  `skills/verification-before-completion/SKILL.md` (clean build + tests + CI green)
+  before accepting the task as done or invoking
+  `skills/finishing-a-development-branch/SKILL.md`. Correctness rests on this gate, not
+  on who flipped the checkbox. See
+  `decisions/0003-implement-n-completion-trust-boundary.md`.
 
 ## Modes
 

diff --git a/decisions/0003-implement-n-completion-trust-boundary.md b/decisions/0003-implement-n-completion-trust-boundary.md
@@ -0,0 +1,66 @@
+# 0003. Implement-N completion is a lead-verified trust boundary, not a hook-blocked invariant
+
+**Status:** Accepted
+**Date:** 2026-06-01
+**Decision-makers:** autodev maintainers, autonomous pipeline
+**Related:** issue #58; `skills/subagent-driven-development/SKILL.md`; `agents/team-conventions.md`; docs/plans/2026-06-01-pipeline-hardening-4issues-design.md
+
+## Context
+
+Across the infra-admin v1 and v1.1 autonomous runs, `Implement: N` tasks were flipped to
+`completed` by implementers or via a blockedBy-clear *before* the code-reviewer's
+quality gate, violating the team-conventions contract ("only code-reviewer flips
+Implement-N to completed"). In v1.1 this masked a non-compiling tree (uncommitted
+helper) and a CI-failing hash regression — both reported "done"; only the lead's
+independent `verification-before-completion` pass caught them.
+
+Issue #58 asked for **plugin/harness enforcement**: reject
+`TaskUpdate(status=completed)` on `Implement: *` tasks unless `owner ==
+code-reviewer`.
+
+We investigated whether a deterministic plugin hook can enforce this. It cannot:
+
+- The PreToolUse payload for a `TaskUpdate` call carries the *tool input*
+  (`taskId`, `status`, `owner`) but **not** the task's current `subject`
+  ("Implement: N") nor the identity of the calling subagent. The task store is
+  harness state a bash hook cannot read (there is no `TaskList` available to a
+  hook).
+- Therefore a hook cannot reliably answer the two questions the block requires —
+  "is this taskId an Implement task?" and "is the caller the code-reviewer?". The
+  only deterministic option, "block all `status=completed`", would break every
+  legitimate completion (spec-review, quality-review, and the orchestrator's own).
+- The closest feasible hook approach — a `SubagentStop` hook reading
+  `transcript_path` to map `taskId → "Implement: N"` (via `TaskCreate` records) and
+  flag a matching `TaskUpdate(status=completed)` — was also rejected: it fires
+  *after* completion (warn, not block), sees only subagent completions (not a
+  lead-level `TaskUpdate`), depends on the unstable transcript JSONL shape, and is
+  racy across subagents. That is fragile transcript-format coupling in exchange for
+  marginal, after-the-fact value.
+
+## Decision
+
+We reject the infeasible hard-block and instead **shift the trust boundary**: a
+flipped `Implement: N` is a *claim*, not *evidence*, and is **not trusted as done
+until the lead runs `autodev:verification-before-completion`** (build + test from a
+clean tree, CI green) before treating the task as complete or proceeding to
+`finishing-a-development-branch`.
+
+We encode this in `skills/subagent-driven-development/SKILL.md` (the
+"Completion is not trusted until lead-verified" rule) and restate the
+implementer/code-reviewer conventions in `agents/team-conventions.md`. We do **not**
+add an advisory `TaskUpdate` hook: it cannot block, it cannot identify Implement
+tasks, and it would fire on every completion (noise) for no enforcement gain.
+
+## Consequences
+
+The harm #58 names — a premature "done" masking a broken tree — is addressed at the
+point it actually bit: the lead's verification gate, which already caught both v1.1
+regressions. The convention ("code-reviewer is the sole flipper") remains as
+team discipline but is no longer load-bearing for correctness; correctness rests on
+lead verification, which does not depend on who flipped the checkbox.
+
+The limitation is documented so the infeasible hard-block is not re-proposed. If a
+future harness exposes the task subject **and** the calling subagent identity in
+the PreToolUse payload, the deterministic block becomes feasible and this ADR
+should be revisited (the convention is then mechanically enforceable). Until then,
+"green checkbox ≠ verified" is the operative rule.
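
Reviewer note (illustrative, not part of the patch): ADR 0003 moves the trust boundary from "who flipped the checkbox" to "did the lead's verification pass". A minimal sketch of such a gate is below. `lead_verify` is a hypothetical helper, and the check commands are supplied by the caller because this diff does not show the project's real build and test commands.

```shell
# Hypothetical gate, not the verification-before-completion skill itself.
# Usage: lead_verify '<check command>' ['<check command>' ...]
# A flipped task is accepted only if every check passes, in order.
lead_verify() {
  local cmd
  for cmd in "$@"; do
    # Run each check in a fresh shell so one check cannot leak state into the next.
    if ! bash -c "$cmd"; then
      echo "verification failed: $cmd" >&2
      return 1
    fi
  done
  return 0
}
```

A call might look like `lead_verify 'test -z "$(git status --porcelain)"' 'make build' 'make test'`, where the `make` targets are placeholders. The first check rejects a dirty tree, which is the kind of uncommitted-helper case the ADR's Context section describes.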
diff --git a/decisions/0004-v6.3.0-scope-amendment-63-64.md b/decisions/0004-v6.3.0-scope-amendment-63-64.md
@@ -0,0 +1,48 @@
+# 0004. v6.3.0 scope amendment — fold in #63 (artifact-class precedent) and #64 (session-start Linux stat)
+
+**Status:** Accepted
+**Date:** 2026-06-01
+**Decision-makers:** autodev maintainers, autonomous pipeline, user (explicit approval)
+**Related:** issues #63, #64; docs/plans/2026-06-01-pipeline-hardening-4issues.md (Tasks 8, 9); ADR 0003
+
+## Context
+
+The v6.3.0 pipeline-hardening plan was locked with 7 tasks (#41/#58/#59/#60/#61 + CI
+gate + version bump). During execution two new items arose:
+
+- **#64 (bug, execution-discovered):** Task 6's new `hooks-check.yml` ran
+  `tests/hook-contracts.sh` on ubuntu for the first time and surfaced a **pre-existing**
+  Linux bug in `hooks/session-start` — BSD `stat -f %m` is tried before GNU `stat -c %Y`,
+  and on Linux `stat -f` succeeds-but-wrong (prints fs info), breaking the time-dedup. The
+  fix is a prerequisite for Task 6's deliverable (a green CI gate that runs
+  `hook-contracts.sh`).
+- **#63 (enhancement, newly filed):** add a design-phase `adversarial-design-review`
+  check that surveys how the codebase implements the *artifact class* (where a scenario
+  stands up a server, where a fixture lives), not just the *mechanism*. Same skill the
+  release already edits for #59.
+
+The user explicitly approved addressing both ("if you found a new issue, go ahead and
+address it"; "pick up the new filed issues and address them as well").
+
+## Decision
+
+Amend the locked v6.3.0 manifest from 7 to 9 tasks (single PR unchanged): add Task 8
+(#64 session-start `stat` ordering + numeric guard + re-enable `hook-contracts.sh` in CI)
+and Task 9 (#63 `Artifact-class precedent` design-phase bug-class row). Both are small,
+additive, host-neutral, and thematically identical to the locked hardening scope. The
+original `Locked` stamp is preserved here for audit; the plan re-stamps to `Amended` and
+re-runs `alignment-check`.
+
+We fold them into v6.3.0 rather than deferring to a follow-on release because: #64 is
+required for Task 6's CI gate to be green and real; #63 edits the same skill as #59; and
+the v6.3.0 PR is still open, so one coherent release covers all the hardening the user
+asked for.
+
+## Consequences
+
+v6.3.0 closes seven issues (#41/#58/#59/#60/#61/#63/#64). The `hooks-check.yml` CI gate
+now runs both hook test suites green on Linux (session-start time-dedup works on Linux
+for the first time — a real bug fix for all Linux autodev users, beyond the test gate).
+The amendment is recorded so the scope expansion is durable, not silent. Future
+scope expansions during execution should follow the same path: surface → user approval →
+ADR → manifest amend → re-align → re-lock.
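
Reviewer note (illustrative, not part of the patch): Task 8 reorders the `stat` calls in `hooks/session-start` and adds a numeric guard. A minimal sketch of that ordering is below. `file_mtime` and `fired_recently` are hypothetical names, and the real hook may structure its dedup differently.

```shell
# Hypothetical helpers, not the real hooks/session-start code.
# Print a file's mtime in epoch seconds, or return 1 if it cannot be determined.
file_mtime() {
  local file="$1" out
  # GNU coreutils form first: `stat -c %Y` prints the mtime in epoch seconds.
  out=$(stat -c %Y "$file" 2>/dev/null) || out=""
  # Numeric guard: anything that is not all digits falls back to the BSD/macOS form.
  if ! [[ "$out" =~ ^[0-9]+$ ]]; then
    out=$(stat -f %m "$file" 2>/dev/null) || out=""
  fi
  # Guard again so filesystem info or an empty string never reaches the caller.
  [[ "$out" =~ ^[0-9]+$ ]] || return 1
  printf '%s\n' "$out"
}

# Return 0 (suppress the re-fire) when the marker was touched within $2 seconds.
fired_recently() {
  local marker="$1" window="$2" mtime now
  mtime=$(file_mtime "$marker") || return 1
  now=$(date +%s)
  (( now - mtime < window ))
}
```

Trying the GNU form first matters because, per the ADR, the BSD form does not fail cleanly on Linux. The digit check is the second line of defence if either form prints something unexpected.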