Merged
21 commits
f5bde9f
docs: design + ADR 0003 for v6.3.0 pipeline hardening (#41/#58/#59/#60)
Jun 1, 2026
d686975
docs: revise v6.3.0 design per adversarial cycle 1 (C1 JSON-extract +…
Jun 1, 2026
0a23755
docs: revise v6.3.0 design per adversarial cycle 2 (I-NEW-1 pre-compa…
Jun 1, 2026
85f58ac
docs: v6.3.0 design PASS adversarial cycle-3 (converged; empty-sessio…
Jun 1, 2026
8875c9b
docs: implementation plan for v6.3.0 pipeline hardening (#41/#58/#59/…
Jun 1, 2026
97c4208
docs: revise v6.3.0 plan per adversarial plan-phase (I1 quote-strip +…
Jun 1, 2026
7a1c35c
docs: plan PASS adversarial plan-phase cycle-2 (converged; sed-order …
Jun 1, 2026
9a32eb7
chore: lock scope for v6.3.0 pipeline hardening (alignment passed)
Jun 1, 2026
54bf62a
feat(adversarial-design-review): add auth/authz chain-composition pla…
Jun 1, 2026
cd1fbfe
docs(pr-monitoring): sanction the bash poll-loop CI-wait pattern, hos…
Jun 1, 2026
7714901
docs(subagent-driven-development): completion trust-boundary for Impl…
Jun 1, 2026
ff3489a
fix(run-hook.cmd): enforce stdout JSON discipline, recover block deci…
Jun 1, 2026
2b5ddc4
fix(hooks): pr-review reminder once-per-session + PreCompact reset (#61)
Jun 1, 2026
64bdda9
ci: run hook contract + stdout-discipline tests on hooks/tests change…
Jun 1, 2026
0a14316
chore: bump version to 6.3.0 (#41/#58/#59/#60/#61)
Jun 1, 2026
308cecf
fix(hooks): grep -vxF full-line diagnostic routing + atomic marker re…
Jun 1, 2026
5d00fec
feat(adversarial-design-review): add Artifact-class precedent design-…
Jun 1, 2026
4854f42
fix(session-start): GNU stat -c %Y before BSD -f %m for Linux time-de…
Jun 1, 2026
7f49e0e
chore: amend v6.3.0 scope — fold in #63 (artifact-class precedent) + …
Jun 1, 2026
38dcb5e
docs: add #63 + #64 to v6.3.0 release notes
Jun 1, 2026
c19accf
ci(hooks-check): add explicit permissions block + workflow-self path …
Jun 1, 2026
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -9,7 +9,7 @@
{
"name": "autodev",
"description": "Autonomous development workflow skills for coding agents",
-"version": "6.2.2",
+"version": "6.3.0",
"source": "./",
"author": {
"name": "Jon Langevin",
2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "autodev",
"description": "Autonomous development workflow skills for coding agents: design, review, planning, execution, monitoring, and retrospectives",
-"version": "6.2.2",
+"version": "6.3.0",
"author": {
"name": "Jon Langevin",
"email": "jon@gocodealone.com"
2 changes: 1 addition & 1 deletion .cursor-plugin/plugin.json
@@ -2,7 +2,7 @@
"name": "autodev",
"displayName": "Autonomous Dev Kit",
"description": "Autonomous development workflow skills for coding agents",
-"version": "6.2.2",
+"version": "6.3.0",
"author": {
"name": "Jon Langevin",
"email": "jon@gocodealone.com"
29 changes: 29 additions & 0 deletions .github/workflows/hooks-check.yml
@@ -0,0 +1,29 @@
name: Hooks Check
on:
  push:
    paths:
      - 'hooks/**'
      - 'tests/hook-contracts.sh'
      - 'tests/hook-stdout-discipline.sh'
      - '.github/workflows/hooks-check.yml'
  pull_request:
    paths:
      - 'hooks/**'
      - 'tests/hook-contracts.sh'
      - 'tests/hook-stdout-discipline.sh'
      - '.github/workflows/hooks-check.yml'

permissions:
  contents: read

jobs:
  hooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install jq
        run: sudo apt-get update && sudo apt-get install -y jq
      - name: Hook stdout discipline tests
        run: bash tests/hook-stdout-discipline.sh
      - name: Hook contract tests
        run: bash tests/hook-contracts.sh
40 changes: 40 additions & 0 deletions RELEASE-NOTES.md
@@ -1,5 +1,45 @@
# Autonomous Dev Kit Release Notes

## v6.3.0 — 2026-06-01

Pipeline-hardening release closing seven recurring gate-miss / context-waste issues
observed across autonomous runs and Codex compaction.

- **`adversarial-design-review` — auth/authz chain-composition bug-class (#59):** a new
plan-phase row that walks the design's auth/authz chain component-by-component against
the plan's wiring and flags any gate enforced by a *client-asserted* value
(`evidence.granted_permissions`, a header) instead of server-side against an
authenticated principal.
- **`pr-monitoring` — sanctioned bash poll-loop (#60):** documents the host-scoped
CI-wait pattern. Under Claude Code, this is a bounded `run_in_background` bash sleep-loop that
blocks to completion and re-invokes the lead once on settle (the prior background-Agent
monitor early-exited ~6× per run); Codex/Cursor use a self-poll-on-wakeup fallback.
- **`subagent-driven-development` / `team-conventions` — completion trust-boundary (#58,
ADR 0003):** a flipped `Implement: N` is a claim, not evidence — the lead must run
`verification-before-completion` before trusting it. A deterministic hook-block is
infeasible (the pre-tool payload lacks the task subject + caller identity), so
correctness rests on lead verification, not on who flipped the checkbox.
- **`run-hook.cmd` — stdout JSON discipline (#41):** the wrapper now captures each hook's
stdout and emits only valid-JSON-or-empty to the host's hook parser, recovering a block
decision even when a locale/diagnostic warning precedes it (previously such noise could
invalidate the hook's JSON). Diagnostics are routed to stderr; jq-absent hosts pass
through unchanged. New `tests/hook-stdout-discipline.sh`.
- **`pretool-pr-review-reminder` — once-per-session (#61):** the gh-version/Copilot
reviewer reminder now emits once per session (deduped via a `.claude/autodev-state`
marker, quote-strip-matched so a quoted `--body` mentioning `gh pr create` no longer
trips it) and is reset by `pre-compact-snapshot` so it re-emits once after a compaction.
- **`adversarial-design-review` — artifact-class precedent (#63):** a new design-phase row
that surveys how the codebase already implements an *artifact class* (where a scenario
stands up a server, where a fixture lives — `ls scenarios/*/cmd/server/main.go`), not
just the *mechanism*; grep for sibling instances and follow the established shape or
justify divergence.
- **`session-start` — Linux time-dedup fix (#64):** the SessionStart hook tried BSD
`stat -f %m` before GNU `stat -c %Y`; on Linux `stat -f` succeeds-but-wrong (fs info),
so the time-based dedup never suppressed re-fires. Now GNU-first with a numeric guard —
fixing re-fire spam for all Linux autodev users.
- **CI:** new `hooks-check.yml` runs the hook contract + stdout-discipline tests on any
`hooks/`/test change, so these fixes are regression-gated.
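
The stdout JSON discipline described for `run-hook.cmd` (#41) can be illustrated with a minimal sketch. This is not the wrapper's actual code: the function name `filter_hook_stdout` and the "last JSON object line wins" rule are assumptions made for illustration. It only shows the general shape of capturing a hook's stdout, emitting valid JSON or nothing, routing diagnostics to stderr, and passing output through when `jq` is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the real run-hook.cmd logic): reduce a hook's
# stdout to valid-JSON-or-empty and send everything else to stderr.
filter_hook_stdout() {
  local raw line json=""
  raw="$(cat)"                       # capture everything the hook printed

  # Nothing printed: emit nothing.
  [ -z "$raw" ] && return 0

  # Without jq we cannot validate, so pass the output through.
  if ! command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$raw"
    return 0
  fi

  # Fast path: the whole output already parses as JSON.
  if printf '%s' "$raw" | jq . >/dev/null 2>&1; then
    printf '%s\n' "$raw"
    return 0
  fi

  # Slow path (assumed rule): keep the last line that is a JSON object and
  # route every other line to stderr as a diagnostic.
  while IFS= read -r line || [ -n "$line" ]; do
    if printf '%s' "$line" | jq -e 'type == "object"' >/dev/null 2>&1; then
      if [ -n "$json" ]; then
        printf '%s\n' "$json" >&2    # superseded by a later object
      fi
      json="$line"
    else
      printf '%s\n' "$line" >&2
    fi
  done <<< "$raw"

  if [ -n "$json" ]; then
    printf '%s\n' "$json"
  fi
  return 0
}
```

A wrapper could pipe a hook through it, for example `some-hook | filter_hook_stdout`, where `some-hook` stands in for any hook script. With `jq` installed, a locale warning printed before a block decision ends up on stderr and only the decision reaches stdout.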

## v6.2.2 — 2026-05-31

New **Existence / runtime-validity** bug-class in `adversarial-design-review`
15 changes: 15 additions & 0 deletions agents/team-conventions.md
@@ -25,6 +25,9 @@ them.
- Request code review per `skills/requesting-code-review/SKILL.md`
using the adversarial-framing brief.
- Address review per `skills/receiving-code-review/SKILL.md`.
- **Never self-complete an `Implement: N` task.** Do not flip your own Implement task
to `completed` and do not clear its `blockedBy` to back-door completion. DM the
spec-reviewer when ready; the code-reviewer is the sole flipper of Implement tasks.

## Spec reviewer

@@ -46,6 +49,18 @@ them.
`skills/requesting-code-review/SKILL.md` until verdict is SHIP-IT
(or REVERT-AND-REWRITE after max rounds).
- Reflexive approval is forbidden.
- **You are the sole role that flips an `Implement: N` task to `completed`**, and only
after quality review passes. Flip BOTH the "Review quality:" task and the
corresponding "Implement:" task.

## Lead / orchestrator

- **A `completed` Implement-N is a claim, not evidence.** Whoever flipped it, run
`skills/verification-before-completion/SKILL.md` (clean build + tests + CI green)
before accepting the task as done or invoking
`skills/finishing-a-development-branch/SKILL.md`. Correctness rests on this gate, not
on who flipped the checkbox. See
`decisions/0003-implement-n-completion-trust-boundary.md`.

## Modes

66 changes: 66 additions & 0 deletions decisions/0003-implement-n-completion-trust-boundary.md
@@ -0,0 +1,66 @@
# 0003. Implement-N completion is a lead-verified trust boundary, not a hook-blocked invariant

**Status:** Accepted
**Date:** 2026-06-01
**Decision-makers:** autodev maintainers, autonomous pipeline
**Related:** issue #58; `skills/subagent-driven-development/SKILL.md`; `agents/team-conventions.md`; docs/plans/2026-06-01-pipeline-hardening-4issues-design.md

## Context

Across infra-admin v1 + v1.1 autonomous runs, `Implement: N` tasks were flipped to
`completed` by implementers or via a blockedBy-clear *before* the code-reviewer's
quality gate, violating the team-conventions contract ("only code-reviewer flips
Implement-N to completed"). In v1.1 this masked a non-compiling tree (uncommitted
helper) and a CI-failing hash regression — both reported "done"; only the lead's
independent `verification-before-completion` pass caught them.

Issue #58 asked for **plugin/harness enforcement**: reject
`TaskUpdate(status=completed)` on `Implement: *` tasks unless `owner ==
code-reviewer`.

We investigated whether a deterministic plugin hook can enforce this. It cannot:

- The PreToolUse payload for a `TaskUpdate` call carries the *tool input*
(`taskId`, `status`, `owner`) but **not** the task's current `subject`
("Implement: N") nor the identity of the calling subagent. The task store is
harness state a bash hook cannot read (there is no `TaskList` available to a
hook).
- Therefore a hook cannot reliably answer the two questions the block requires —
"is this taskId an Implement task?" and "is the caller the code-reviewer?". The
only deterministic option, "block all `status=completed`", would break every
legitimate completion (spec-review, quality-review, and the orchestrator's own).
- The closest feasible hook approach — a `SubagentStop` hook reading
`transcript_path` to map `taskId → "Implement: N"` (via `TaskCreate` records) and
flag a matching `TaskUpdate(status=completed)` — was also rejected: it fires
*after* completion (warn, not block), sees only subagent completions (not a
lead-level `TaskUpdate`), depends on the unstable transcript JSONL shape, and is
racy across subagents. That is fragile transcript-format coupling for marginal,
after-the-fact value.

## Decision

We reject the infeasible hard-block and instead **shift the trust boundary**: a
flipped `Implement: N` is a *claim*, not *evidence*, and is **not trusted as done
until the lead runs `autodev:verification-before-completion`** (build + test from a
clean tree, CI green) before treating the task as complete or proceeding to
`finishing-a-development-branch`.
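
As a rough illustration of "claim, not evidence", the lead-side gate can be sketched as a function that accepts a task only when every verification command succeeds. The name `verify_before_trust` and the idea of passing checks as command strings are assumptions for this sketch. The `verification-before-completion` skill itself is a documented procedure, not this script.

```shell
#!/usr/bin/env bash
# Hypothetical gate: a flipped checkbox is ignored entirely; the task is
# trusted only if every supplied check command exits 0.
verify_before_trust() {
  local check

  # Refuse to "verify" with no checks at all.
  [ "$#" -gt 0 ] || return 2

  for check in "$@"; do
    if ! bash -c "$check" >/dev/null 2>&1; then
      printf 'verification failed: %s\n' "$check" >&2
      return 1
    fi
  done
  return 0
}
```

A lead might call it as `verify_before_trust "make build" "make test"`, where both commands are placeholders for the project's real clean-build and test steps. Who flipped the task never enters the decision, which is the point of the ADR.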

We encode this in `skills/subagent-driven-development/SKILL.md` (the
"Completion is not trusted until lead-verified" rule) and restate the
implementer/code-reviewer conventions in `agents/team-conventions.md`. We do **not**
add an advisory `TaskUpdate` hook: it cannot block, it cannot identify Implement
tasks, and it would fire on every completion (noise) for no enforcement gain.

## Consequences

The harm #58 names — a premature "done" masking a broken tree — is addressed at the
point it actually bit: the lead's verification gate, which already caught both v1.1
regressions. The convention ("code-reviewer is the sole flipper") remains as
team discipline but is no longer load-bearing for correctness; correctness rests on
lead verification, which does not depend on who flipped the checkbox.

The limitation is documented so the infeasible hard-block is not re-proposed. If a
future harness exposes the task subject **and** the calling subagent identity in
the PreToolUse payload, the deterministic block becomes feasible and this ADR
should be revisited (the convention is then mechanically enforceable). Until then,
"green checkbox ≠ verified" is the operative rule.
48 changes: 48 additions & 0 deletions decisions/0004-v6.3.0-scope-amendment-63-64.md
@@ -0,0 +1,48 @@
# 0004. v6.3.0 scope amendment — fold in #63 (artifact-class precedent) and #64 (session-start Linux stat)

**Status:** Accepted
**Date:** 2026-06-01
**Decision-makers:** autodev maintainers, autonomous pipeline, user (explicit approval)
**Related:** issues #63, #64; docs/plans/2026-06-01-pipeline-hardening-4issues.md (Tasks 8, 9); ADR 0003

## Context

The v6.3.0 pipeline-hardening plan was locked with 7 tasks (#41/#58/#59/#60/#61 + CI
gate + version bump). During execution two new items arose:

- **#64 (bug, execution-discovered):** Task 6's new `hooks-check.yml` ran
`tests/hook-contracts.sh` on ubuntu for the first time and surfaced a **pre-existing**
Linux bug in `hooks/session-start` — BSD `stat -f %m` is tried before GNU `stat -c %Y`,
and on Linux `stat -f` succeeds-but-wrong (prints fs info), breaking the time-dedup. The
fix is a prerequisite for Task 6's deliverable (a green CI gate that runs
`hook-contracts.sh`).
- **#63 (enhancement, newly filed):** add a design-phase `adversarial-design-review`
check that surveys how the codebase implements the *artifact class* (where a scenario
stands up a server, where a fixture lives), not just the *mechanism*. Same skill the
release already edits for #59.
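
The #64 ordering problem and its fix can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code in `hooks/session-start`: the function names `file_mtime` and `fired_recently` and the marker-file dedup shape are hypothetical. It tries GNU `stat -c %Y` first, falls back to BSD `stat -f %m`, and accepts only an all-digit result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a file's mtime as epoch seconds, or fail.
file_mtime() {
  local f="$1" t

  # GNU coreutils first. On Linux, `stat -f` means "filesystem status",
  # so trying the BSD form first can produce non-numeric output.
  t="$(stat -c %Y "$f" 2>/dev/null)" || t=""

  # Fall back to the BSD/macOS form only if GNU gave nothing usable.
  case "$t" in
    ''|*[!0-9]*) t="$(stat -f %m "$f" 2>/dev/null)" || t="" ;;
  esac

  # Numeric guard: accept only a plain run of digits.
  case "$t" in
    ''|*[!0-9]*) return 1 ;;
  esac
  printf '%s\n' "$t"
}

# Hypothetical dedup check: succeed if the marker file was modified less
# than `window` seconds ago.
fired_recently() {
  local marker="$1" window="$2" mtime now
  mtime="$(file_mtime "$marker")" || return 1
  now="$(date +%s)"
  [ $(( now - mtime )) -lt "$window" ]
}
```

With this ordering, a hook could skip re-firing when `fired_recently "$marker" 60` succeeds, where the marker path and the 60-second window are placeholders. The numeric guard means a surprising `stat` result is treated as "no timestamp" rather than silently breaking the comparison.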

The user explicitly approved addressing both ("if you found a new issue, go ahead and
address it"; "pick up the new filed issues and address them as well").

## Decision

Amend the locked v6.3.0 manifest from 7 to 9 tasks (single PR unchanged): add Task 8
(#64 session-start `stat` ordering + numeric guard + re-enable `hook-contracts.sh` in CI)
and Task 9 (#63 `Artifact-class precedent` design-phase bug-class row). Both are small,
additive, host-neutral, and thematically identical to the locked hardening scope. The
original `Locked` stamp is preserved here for audit; the plan re-stamps to `Amended` and
re-runs `alignment-check`.

We fold them into v6.3.0 rather than deferring to a follow-on release because: #64 is
required for Task 6's CI gate to be green and real; #63 edits the same skill as #59; and
the v6.3.0 PR is still open, so one coherent release covers all the hardening the user
asked for.

## Consequences

v6.3.0 closes seven issues (#41/#58/#59/#60/#61/#63/#64). The `hooks-check.yml` CI gate
now runs both hook test suites green on Linux (session-start time-dedup works on Linux
for the first time — a real bug fix for all Linux autodev users, beyond the test gate).
The amendment is recorded so the scope expansion is durable, not silent. Future
scope-expansions during execution should follow the same path: surface → user approval →
ADR → manifest amend → re-align → re-lock.