From 52f0449e5a80b1ef8e5649c8b9fd08ee0ad816ef Mon Sep 17 00:00:00 2001 From: Jon Langevin Date: Wed, 3 Jun 2026 13:31:12 -0400 Subject: [PATCH] docs(retro): pipeline evidence + doc-sync hardening v6.4.0 (#69-72) --- ...-06-03-pipeline-evidence-doc-sync-retro.md | 74 +++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 docs/retros/2026-06-03-pipeline-evidence-doc-sync-retro.md diff --git a/docs/retros/2026-06-03-pipeline-evidence-doc-sync-retro.md b/docs/retros/2026-06-03-pipeline-evidence-doc-sync-retro.md new file mode 100644 index 0000000..bd5af74 --- /dev/null +++ b/docs/retros/2026-06-03-pipeline-evidence-doc-sync-retro.md @@ -0,0 +1,74 @@ +# Retro: Pipeline Evidence + Doc-Sync Hardening + +**PR:** #73 — Pipeline evidence + doc-sync hardening (#69 #70 #71 #72) → v6.4.0 +**Merged:** 2026-06-03 +**Branch:** feat/pipeline-evidence-doc-sync +**Design:** docs/plans/2026-06-03-pipeline-evidence-doc-sync-design.md +**Plan:** docs/plans/2026-06-03-pipeline-evidence-doc-sync.md +**Committed adversarial reports (dogfood of #69):** docs/plans/2026-06-03-pipeline-evidence-doc-sync-design-review.md · -plan-review.md +**Related ADRs:** none (no new non-trivial trade-off beyond what the design captured) + +## Adversarial-review findings, scored + +Scored directly from the committed review reports — the first end-to-end exercise of #69's new convention. + +| Phase | Finding | Severity | Outcome | +|---|---|---|---| +| design | D1 — invented `-adversarial-.md` name diverged from existing `-design-review.md`/`-plan-review.md` convention; "never written" premise overstated (2 files already existed) | Critical | Prescient — adopting the existing name was the right call; the whole feature reframed from "new artifact" to "systematize ad-hoc practice" | +| design | D2 — retro fix would leave the format template (`:99`) + Reads bullet still pointing at the kit-local script | Critical | Prescient — became the 3-edit-site requirement in Task 4; verified by the regression test's negative assertion | +| design | D3 — dogfood framing misleading (skill edits don't take effect until their task lands) | Important | Resolved upfront — design + Task 5 noted manual emulation; no downstream confusion | +| design | D4 — phase attribution conflated `ev:"skill"` vs `ev:"agent"` records | Important | Resolved upfront — clarified to key off `ev:"skill"` args | +| design | D5 — `Resolution` field per-cycle-mutable with no consumer (YAGNI) | Important | Resolved upfront — reframed optional/end-state + wired retro consumer | +| design | D6 — Step 1e judgment gate could silently self-pass (the user's "trap") | Important | Resolved upfront — narrowed trigger + PR-body accountability token | +| design (cycle 2) | N1 — stem-derivation ambiguous for plan files → could break D1↔D2 path contract | Critical | Prescient — deterministic two-case rule + worked examples; later test-guarded | +| design (cycle 2) | N2 — Step 1e absent from finishing's Autonomous Mode list → never fires autonomously | Important | Prescient — the single highest-value catch; without it the gate was dead-on-arrival in autonomous runs | +| design (cycle 2) | N3 — PR-body token had no wired retro consumer (aspirational) | Important | Resolved upfront — wired to retro Step 5 missed-activation row | +| plan | P1 — two test assertions pre-green (matched ambient "committed"/"in-progress.jsonl" prose) → weak RED | Important | Prescient — code review independently re-derived the precision need (M1); tightened both | +| plan | P2 — `skill-cross-refs.sh` run locally but not wired into CI | Important | Resolved upfront — wired into `skill-content-check.yml` for free | +| code | I1 — finding-ID numbering ambiguous (sequential-all vs per-severity) | Important | Resolved upfront — clarifying parenthetical added | +| code | M1 — D1↔D2 test assertion OR-branch too broad | Minor | Resolved upfront — narrowed to the specific load-bearing phrase + revert-restore proven | + +## Gate misses + +No gate misses this PR. Every code-review finding (I1, M1) was a refinement of contracts the design/plan adversarial passes had already established, not a class they missed; both were caught before merge. CI was green on the first push (CodeQL, skill-content-check, version-check, Analyze ×3) — no CI failure slipped any local gate. The plan-phase reviewer's P1 (weak RED assertions) and the code reviewer's M1 (broad OR) converged on the same test-precision issue from two angles — a healthy redundancy, not a miss. + +## Missed skill activations + +Full canonical chain fired (verified from session memory + the partial activation log; see the dogfood note below for why the log is partial): + +| Gate | Fired? | Notes | +|---|---|---| +| brainstorming | yes | logged in `in-progress.jsonl` | +| adversarial-design-review (design) | yes | 3 cycles → PASS; committed report exists | +| writing-plans | yes | | +| adversarial-design-review (plan) | yes | PASS; committed report exists | +| alignment-check | yes | PASS (1 drift item resolved) | +| scope-lock | yes | Locked + `.scope-lock` sha256 6cb6ebd9… | +| subagent-driven-development | yes | 1 implementer (sequential; tasks share files) | +| requesting-code-review (adversarial) | yes | APPROVE, 0 Critical | +| finishing-a-development-branch | yes | Step 1d PASS; Step 1e dogfooded (`Doc-reconciliation: clean`) | +| pr-monitoring | yes | bash poll-loop (sanctioned pattern), not a subagent | +| post-merge-retrospective | yes | this document | + +## What worked + +- **The phantom-dependency catch (design D1) reframed the whole feature.** The reviewer found 2 pre-existing `-design-review.md` files, disproving the "report is never written" premise. The fix became "systematize the ad-hoc practice under the existing name" instead of "invent a new artifact" — smaller, precedent-aligned, less bloat. This is the design-phase adversarial review doing exactly its job. +- **Cycle-2 N2 (Step 1e missing from the Autonomous Mode list) was the highest-value catch.** A doc-reconciliation gate that exists in the skill body but not the autonomous control-flow list would have been dead on arrival in every autonomous run — the precise "trap" the user warned about. Caught before a line of code. +- **#69 dogfooded end-to-end.** Both committed review reports rode the squash-merge to `main`; this retro scored findings by reading them — no transcript reconstruction. The feature proved itself on its own pipeline. +- **Bloat discipline held.** Net +~85 skill lines across 3 files, no new skill, no scanner — the user's explicit constraint. The two largest skills (writing-skills 740, TDD 522) were untouched. + +## What didn't + +- **#70's dogfood was only partial — and surfaced a residual the design under-weighted.** `record-activity` writes `/.claude/autodev-state/in-progress.jsonl`, but the pipeline executed in a git worktree that was removed on cleanup, and the lead's session-cwd log captured only a fraction of the chain. So the retro's activation table leaned on session memory, not purely the log — the exact reconstruction #70 set out to eliminate. The fix is real (no more "script does not exist"), but **worktree-based execution + cwd-scoping fragments the very log the retro now depends on.** Assumption A2 ("Skill-invoked gates are what the retro needs") held; the unspoken assumption "the log lives where the retro looks" did not. +- **No `docs/design-guidance.md` exists**, so durable lessons keep landing in retros + memory rather than a single inherited guidance file. Recurring across many retros now. + +## Plugin-level follow-ups + +1. **Activation-log location robustness (new, from #70's partial dogfood).** The retro reads `/.claude/autodev-state/in-progress.jsonl`, but `record-activity` writes to the hook payload's `.cwd`, which for worktree-based pipelines is often the session cwd or a since-removed worktree. Candidate fixes: have `record-activity` also/instead anchor to the git common-dir (`git rev-parse --git-common-dir`) so worktrees share one log; or have the retro union the logs from the repo root + any sibling worktrees. **File as a follow-up issue** — this is the natural #70 successor, not a regression of it. One occurrence so far (this retro); watch for a second before hard-committing to a design. +2. **`docs/design-guidance.md` bootstrap (recurring).** Multiple retros (this one, and prior 6.3.x retros) have noted the absence of a canonical guidance file. Consider seeding one with the durable principles that keep recurring (anti-bloat for skills; phantom-dependency / circular-logic class; "the log must live where the consumer reads it"). Not urgent; flagged as a trend. + +## Project guidance updates + +| Guidance file | Change | Reason | +|---|---|---| +| `docs/design-guidance.md` | no change (absent) | The durable lesson ("an evidence artifact is only useful if it lives where its consumer reads it — verify location, not just existence") is captured here + in memory; no guidance file exists to update. Bootstrapping one is follow-up #2 above, not in this PR's scope. |