
chore(deps-dev): bump @commitlint/cli from 19.8.1 to 21.0.0 #174

Open
dependabot[bot] wants to merge 47 commits into main from
dependabot/npm_and_yarn/commitlint/cli-21.0.0

Conversation


dependabot[bot] commented on behalf of GitHub on May 8, 2026

Bumps @commitlint/cli from 19.8.1 to 21.0.0.

Release notes

Sourced from @​commitlint/cli's releases.

v21.0.0

Heads-up: --legacy-output is a transitional escape hatch. It will be removed in a future major release. Plan to migrate your parsers / snapshots to the new format during the v21 lifecycle.

21.0.0 (2026-05-08)

Breaking

Fixes

Internals (Node 22 cleanup)

  • chore: replace dependencies with Node 22 built-ins by @​escapedcat in #4681 — drops glob, fast-glob, import-meta-resolve, minimist, fs-extra
  • refactor: replace read-pkg with native fs.readFile + JSON.parse by @​escapedcat in #4742
  • chore: update dependency yargs to v18 by @​escapedcat in #4686
  • chore: remove cross-env, move env vars to vitest config by @​escapedcat in #4684

Dependency updates

Full Changelog: conventional-changelog/commitlint@v20.5.3...v21.0.0

v20.5.3

20.5.3 (2026-04-30)

Refactor

Docs

New Contributors

Full Changelog: conventional-changelog/commitlint@v20.5.2...v20.5.3

v20.5.2

... (truncated)

Changelog

Sourced from @​commitlint/cli's changelog.

21.0.0 (2026-05-08)

BREAKING CHANGES

  • drop node v18 and v20 support
  • Bump engines to >=v22 in all 39 package.json files
  • Update @​types/node to ^22.0.0
  • Update CI matrix to [22, 24]
  • Update Ubuntu baseline job to ubuntu:26.04
  • Update Dockerfile.ci, .mise.toml, .codesandbox/ci.json
  • Update pre-commit hook to use --ignore-engines
  • Update README and docs

Co-authored-by: Claude Opus 4.6 (1M context) noreply@anthropic.com

20.5.3 (2026-04-30)

Note: Version bump only for package @​commitlint/cli

20.5.2 (2026-04-25)

Note: Version bump only for package @​commitlint/cli

20.5.0 (2026-03-15)

Bug Fixes

  • cli: validate that --cwd directory exists before execution (#4658) (cf80f75), closes #4595

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

pmclSF and others added 30 commits May 1, 2026 20:01
…bing

## Summary

Foundation work for the 0.2 release per `docs/release/0.2.md`. **35 commits** covering all five critical-path items plus the bulk of parallelizable work.

## Critical path (5/5 done)

1. **Schema lock + manifest CI gate** (`98336e6`)
2. **SignalV2 schema** (`03998e1`)
3. **Severity rubric** (`7fd018f`)
4. **Calibration corpus infrastructure** (`e67e129`) + Wilson 95% intervals (`0bb7e67`)
5. **Tree-sitter parser pool** (`b450145`)

## AI detectors — 10 of 12 implemented (9 stable, 1 experimental), 2 still planned

| Signal | Status | Severity | Source |
|---|---|---|---|
| `aiHardcodedAPIKey` | stable | Critical | static |
| `aiNonDeterministicEval` | stable | Medium | static |
| `aiModelDeprecationRisk` | stable | Medium | static |
| `aiPromptInjectionRisk` | experimental | High | static |
| `aiToolWithoutSandbox` | stable | High | static |
| `aiSafetyEvalMissing` | stable | High | graph |
| `aiHallucinationRate` | stable | High | runtime/eval |
| `aiCostRegression` | stable | Medium | runtime/eval + baseline |
| `aiRetrievalRegression` | stable | High | runtime/eval + baseline |
| `aiPromptVersioning` | stable | Medium | static |

`aiFewShotContamination` and `aiEmbeddingModelChange` remain — both need similarity comparison or content-hash infrastructure that didn't fit cleanly in this PR.

## Eval-framework adapters (3/3 done)

- **Promptfoo** (`93d4782`, `34f0e01`, `f1d6b22`) — handles v3 and v4+ shapes
- **DeepEval** (`c963501`) — testCases × metricsData → EvalRunResult
- **Ragas** (`7bde060`) — pulls every numeric named-score into NamedScores

All three target the same `EvalRunEnvelope` shape; CLI flags `--promptfoo-results`, `--deepeval-results`, `--ragas-results` ingest in parallel during analyze.

## Baseline snapshot mechanism (`23e3e2b`, `93c63c4`)

`--baseline path/to/old.json` attaches a previous snapshot for regression-aware detectors. `aiCostRegression` and `aiRetrievalRegression` consume it via paired-case comparison.

## Confidence intervals (`0bb7e67`)

`airun.WilsonInterval95` + `calibration.MetricInterval` — Wilson score 95% CI for binomial proportions. Detectors emit heuristic intervals today; once corpus is populated those swap to corpus-derived bounds.

## terrain ai run captures eval output (`d6f9f9d`)

Promptfoo / DeepEval / Ragas runs now write structured output to a temp file, get parsed via the right adapter, land in `snap.EvalRuns`, and stash on the persisted Artifact.

## Test discovery improvements

- **Hierarchical Go t.Run** (`d85b3be`) — full SuiteHierarchy preserved across nesting
- **Pytest parametrize value extraction** (`d85b3be`) — `Values []string` per parametrize row
- **JUnit 5 `@Nested` + `@DisplayName`** (`44e88f8`)
- **Pytest fixture dependency graph** (`aae50de`) — `Dependencies []string` per FixtureSurface
- **Vitest in-source tests** (`f003fd4`) — `import.meta.vitest` files discovered as test-bearing
- **TSConfig path resolution** (`fc5c865`) — extends chain + multi-target + jsconfig.json fallback

## AI surface detection expansion (`831cfab`)

Datasets (.jsonl/.parquet/...), DB-cursor + SQL retrieval (psycopg2/pgvector/Elasticsearch knn), in-memory FAISS, MCP tool decorators.

## Conversion (`a236871`, `8ce4121`, `a32c7a4`)

- `terrain convert --preview` with real LCS-based unified diff
- Per-file confidence: `ItemsCovered` / `ItemsLossy` / `Confidence`
- `.terrain/conversion-history/log.jsonl` audit trail

## Hardening + ops gates

- **Performance regression gate** (`2eab842`) — >10% regression fails the PR
- **Auto-generated rule docs** (`78fad2b`) — 68 stubs + drift gates
- **Cosign hard-fail** (`6e8ebd9`) — npm postinstall switched from warn-only
- **SLSA L2 build provenance** (`225e1c7`) — `actions/attest-build-provenance@v3` per archive
- **Husky pre-commit** (`56006ed`) — secrets-shaped + archive blocklists
- **Dependency pin rationale** (`153bd5e`)

## Documentation (`8459bfd`)

- README "What Terrain Is Not" + GitHub Actions templates
- Persona guides: `docs/personas/{frontend,backend,ai,manager}.md`
- Comparison guide: `docs/compare/codecov-sonar-launchable.md`

## Misc

- (`ecbc59e`) GitHub secret scanner alert resolved — synthetic test fixtures split at compile time

## Still pending in 0.2

- 2 AI detectors (`aiFewShotContamination`, `aiEmbeddingModelChange`) — need similarity comparison + content-hash infrastructure
- CLI restructure 43 → ~15 canonical commands + universal flag schema (well-bounded but large mechanical change worth its own PR)
- Surface detection deeper RAG plumbing (chunker / retriever inference)
- Conversion top-3 fixture corpora to A-grade (content work, ~12 SP)
- Scoring v2 — band re-anchoring to corpus percentiles (blocks on corpus content)

## Test plan

- [x] `go test ./internal/... ./cmd/... -count=1` — all packages green
- [x] `make docs-verify` — manifest JSON + severity rubric + 68 rule-doc stubs all in sync
- [x] `make calibrate` — runs end-to-end against `tests/calibration/`
- [x] `terrain analyze --promptfoo-results sample.json` smoke-tested end-to-end
- [x] `terrain analyze --baseline old.json --promptfoo-results new.json` exercises regression detectors
- [x] `terrain convert --preview` smoke-tested end-to-end (Jest → Vitest)
- [x] AI detectors verified to fire on smoke fixtures
- [ ] CI: full pipeline including new docs-verify, bench-gate, drift gates, attestations

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…rieval)

## Summary

Closes the eval-data-detector calibration gap from PR #123's followups.
Three detectors that ingest framework eval results now have anchors in the
calibration corpus: `aiHallucinationRate`, `aiCostRegression`, and
`aiRetrievalRegression`. With these landed, all 12/12 AI detectors from the
round-4 plan have at least one calibration fixture at 1.00 precision/recall.

## Changes

- **Calibration runner extension**: auto-discovers per-fixture eval artifacts
  (`eval-runs/{promptfoo,deepeval,ragas}.json` and `baseline.json`) and feeds
  them to the pipeline as `PipelineOptions`. Fixtures without these files
  behave as before.

- **Path-relative matching in `matchFixture`**: eval-data detectors stamp the
  artifact's absolute path into `Signal.Location.File`; the matcher now
  normalises against the fixture dir so labels stay portable.

- **Baseline-from-fixture helper**: `buildBaselineFile` synthesises a baseline
  snapshot from `baseline/eval-runs/{promptfoo,deepeval,ragas}.json` rather
  than requiring a hand-written snapshot JSON with base64-encoded payloads.
  Regression-shaped fixtures only need two pairs of framework JSON files.

- **Three new calibration fixtures**:
  - `eval-hallucination-rate` — Promptfoo run with 3/8 hallucinated cases
    (37.5% > 5% threshold).
  - `eval-cost-regression` — paired runs, avg cost-per-case rose
    0.0028 → 0.0084 (+200% vs 25% threshold).
  - `eval-retrieval-regression` — paired runs with `context_relevance` avg
    dropping 0.90 → 0.59 (−31 pts vs 5-pt threshold).

## Corpus state after merge

- 27 fixtures × 33 distinct signal types at 1.00 precision/recall
- All 12/12 AI detectors covered
- Calibration gate is load-bearing — any future change that drops a labelled
  signal trips CI.

## Test plan

- [x] `make calibrate` reports 27/33 at 100%
- [x] `go test ./internal/... ./cmd/... -count=1` clean locally
- [ ] CI green on Ubuntu / macOS / Windows

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary

Implements the noun.verb restructure of the Terrain CLI surface. Compresses
35 top-level commands to 11 canonical ones, all non-breaking: legacy commands
keep working through 0.2, gain a deprecation note in 0.2.x, and are removed in 0.3.

## The 11

```
1.  terrain init
2.  terrain analyze              (--against=<ref>, --policy=<file> in phase B)
3.  terrain report <verb>        # 9 verbs: summary, insights, metrics, explain,
                                 #          show, impact, pr, posture, select-tests
4.  terrain migrate <verb>       # 11 verbs covering convert + migrate + migration
5.  terrain ai <verb>            (already grouped)
6.  terrain portfolio <verb>     (already grouped)
7.  terrain config <verb>        # feedback, telemetry
8.  terrain doctor
9.  terrain debug <verb>         (already grouped)
10. terrain serve
11. terrain version
```

## Two former top-level commands collapse into flags

- `focus` → `report summary --focus=<path>` (also valid on insights/metrics)
- `export` → `--output=<path>` flag pattern across all verbs

## Migration strategy (non-breaking)

1. **0.2 (this PR)** — namespaces ship as aliases. Legacy commands keep
   working. New users see the canonical shape in `--help`.
2. **0.2.x** — deprecation note on legacy invocations.
3. **0.3** — legacy names removed.

## What this PR does NOT do (phase B)

- Fold `policy` into `analyze --policy=<file>` — changes exit-code
  semantics. Deserves its own review.
- Fold `compare` into `analyze --against=<ref>` — different output path.
  Same.
- Universal flag schema cleanup (`--root` vs `-root`, `--json` vs
  `--format json`). The dispatchers added here use the canonical names;
  the legacy parsers still accept both.

## Three new dispatcher files

- `cmd_migrate_namespace.go` — 11 canonical verbs covering convert +
  migrate + migration. `terrain convert` aliased to `terrain migrate`.
- `cmd_report_namespace.go` — 9 read-side verbs.
- `cmd_config_namespace.go` — 2 verbs (feedback, telemetry).

Each has a corresponding `_test.go` verifying canonical-verb
recognition, unknown-verb rejection, and missing-arg handling. Behavioural
tests for the underlying runners live in their existing per-command tests.

## Test plan

- [x] `go test ./...` clean locally
- [x] Smoke-tested `terrain migrate list`, `terrain convert list`,
      `terrain report`, `terrain config telemetry --status` end-to-end
- [ ] CI green on Ubuntu / macOS / Windows

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary

Adversarial review (`/gambit:parallel-agents` × 5 domains, ~150 findings)
surfaced material ship-blockers. This PR fixes the must-merge-before-tag
items and rewrites the CHANGELOG to match what actually shipped.

**Without this PR, v0.2.0 should not be tagged.**

## Code fixes

1. **`aiModelDeprecationRisk` false positive** — regex matched
   `claude-2.1`, `gpt-3.5-turbo-0125`, and other dot-versioned current
   models against undated parents. One-character fix in the boundary
   class.

2. **Ragas detector broken end-to-end** — `retrievalScoreKeys` allowlist
   missed Ragas's modern key (`context_precision`). Detector silently
   fired zero signals on real Ragas runs. Added modern Ragas keys plus
   LangSmith variants.

3. **`terrain convert <file>` regressed** — CLI fold-in routed
   per-file conversion to the project-wide migrate runner (which
   expects a directory). Split the fall-through: `convert` namespace
   keeps `runConvertCLI`, `migrate` namespace keeps `runMigrateCLI`.

## CHANGELOG honesty pass

The previous CHANGELOG made claims that don't match reality:

| Was | Now |
|---|---|
| "12/12 detectors shipped" | 10 stable + 2 experimental, per-detector caveats |
| "27 × 33 at 100% precision/recall" | "32 types fire on real-shaped fixtures; gate is recall, not precision" |
| "35→11 commands" | "canonical 11 + 33 legacy aliases" (binary still accepts ~44) |
| "`focus → --focus=<path>`, `export → --output=<path>`" | Removed — flags don't exist in 0.2 |
| "18 severity clauses" | 17 (matches code) |
| "Schema 1.0.0 → 1.1.0" | Snapshot 1.1.0; manifest export still 1.0.0 |
| "Cosign hard-fail" | Caveat added — degrades to checksum when cosign absent |

## New: `docs/release/0.2-known-gaps.md`

Captures ~30 review-flagged follow-ups across detectors, eval adapters,
calibration corpus, CLI, distribution, and documentation. Each entry
names where it lives and the planned resolution window (0.2.x or 0.3).

## Deferred-to-0.3 expanded

Items from `docs/release/0.2.md` that didn't ship are now tracked
explicitly: scoring v2, top-3 conversion at A-grade (was Tier-2 gate),
universal flag schema, plugin architecture, in-band deprecation
warnings, terrain doctor consolidation, terrain ai gate, several
"lands in 0.2" manifest entries that didn't promote.

## Test plan

- [x] `go test ./...` clean
- [x] `make calibrate` clean (27 × 32 at 100% recall)
- [x] `terrain convert /tmp/foo.cy.test.ts --from cypress --to playwright` works
- [x] `terrain convert list`, `terrain migrate list`, `terrain report summary` all route correctly
- [ ] CI green on Ubuntu / macOS / Windows

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary
Re-opening as a squash-target PR after `main` was reset to `b9868cf`. Branch state is identical to the prior PR #126 (16 commits, +2187/-350); the only difference is the merge method this time around.

Closes the ~80 unique substantive findings from two adversarial review passes (`/gambit:parallel-agents`) across:

- **AI detectors (15+)** — model deprecation, prompt injection, hallucination rate, embedding model change, hardcoded API key, prompt versioning, few-shot contamination, retrieval regression, cost regression, non-deterministic eval, tool-without-sandbox, safety eval missing
- **Eval adapters** — Ragas numericValue + cost-column false-failure
- **Calibration corpus** — Symbol-aware match key, ExpectedAbsent normalisation, load-bearing gate (t.Skipf → t.Fatalf, minFixtures=25)
- **CLI** — uniform Ctrl-C handling on all 18 pipeline call sites, canonical migrate/convert help blocks, deprecation hooks, distinct exit code for not-found
- **Engine safety** — detector panic recovery, baseline schema validation
- **Severity rubric** — split overloaded clauses; SARIF helpUri
- **Supply chain** — mandatory cosign, supply-chain.md v0.2.0 refresh + `gh attestation verify` section
- **Docs** — feature-status.md rewrite, COMPAT.md schema bump, scoring-rubric v2 alignment, README canonical-11 framing, honest known-gaps.md

## Test plan
- [x] `go test ./...` green locally
- [x] CI green on identical SHAs in PR #126 prior to its merge (12/12 checks)
- [x] No regressions in calibration corpus (24 fixtures × 30 detector types at 1.00 precision/recall)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…c honesty (#128)

* fix(0.2): Final polish — release blockers, detector-panic catalog, doc honesty

Final adversarial-review pass before tag. Closes the highest-impact
findings from a 7-domain parallel review (~245 findings total; this
batch is the verified P0/P1 subset).

## Release infra (P0)

- npm-release job adds setup-go. Pre-fix, `npm publish --provenance`
  triggered prepublishOnly → npm test → verify-pack.js → `go build`,
  which would fail with `go: not found` because Go isn't installed
  on the npm-release runner; the first release attempt would have
  crashed at publish time. Also adds `permissions: id-token: write`
  explicitly so provenance attestation works regardless of inherited
  workflow permissions.

- supply-chain.md drops the windows/arm64 row. goreleaser only builds
  windows/amd64; the doc was promising an artifact that doesn't exist.

## Engine self-diagnostic (P0)

- detectorPanic added to models.SignalCatalog and the manifest.
  Pre-fix: safeDetect's panic-recovery path emitted a "detectorPanic"
  signal, but ValidateSnapshot rejected it as an unknown type — so the
  whole snapshot got dropped, defeating the graceful-degradation
  promise. New regression test
  (TestRegistry_RunRecoversFromDetectorPanic_ProducesValidSnapshot)
  asserts the snapshot from a panicking detector still passes
  validation. A new TER-ENGINE-001 rule doc captures the contract.

## CLI (P1)

- README claimed `terrain score` as a canonical command — it doesn't
  exist. Removed from the README architecture block.

- runDepgraph (cmd_debug.go) bypassed runPipelineWithSignals,
  inheriting context.Background() — Ctrl-C didn't unwind during deep
  monorepo scans. Routes through AnalyzeContext now, matching every
  other analysis command since 0.2.

- terrain version --json now includes schemaVersion. CI tooling that
  pins on snapshot shape no longer needs to load a snapshot to find
  out which version a binary produces.

## Detector quality (P1)

- aiFewShotContamination silenced for auto-derived scenarios (empty
  CoveredSurfaceIDs — the dominant shape in the wild). Adds the same
  implicit path-based coverage fallback aiSafetyEvalMissing already
  shipped: scope to top-level dir when scenario.Path is set,
  whole-repo otherwise. Two new tests lock in both branches.

## Eval adapter (P1)

- Promptfoo errors-bucket wiring: row-derived stats fallback used to
  classify every non-success as Failure, even when the row had a
  provider-crash `error: "..."` field. That polluted
  aiHallucinationRate's `caseIsScoreable` denominator. Now routes
  errored rows into Aggregates.Errors via a new promptfooRowErrored
  helper, and per-case Cost falls back to the top-level `cost` field
  when r.response.tokenUsage.cost is zero (modern Promptfoo writes to
  both). Two new tests pin both behaviors.

## Documentation honesty (P0/P1)

- README:61 footnote: "marked [experimental] or [planned] in 0.1.2" →
  "in 0.2.0".
- docs/personas/ai.md: 8 detectors marked *(planned)* that ship
  stable in 0.2.0; aligned with feature-status.md.
- CONTRIBUTING.md and DESIGN.md "30+ commands" → "10 canonical + legacy
  aliases"; "47 packages" / "48 packages" / "46 packages" → 49 (actual).
- docs/release/0.2.md "Status: planned (starts after 0.1.2 ships)" →
  "shipping (release 0.2.0)".
- docs/release/0.1.2.md "Status: in progress" → "shipped".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Doc honesty + CLI deprecation hints + serve --read-only enforcement

Continuation of the final-polish review work. Closes a wider band
of findings from the 7-domain review.

## Doc honesty
- Exit-code 4 comment in main.go: claimed `terrain ai gate` exists,
  reframed as `terrain ai run --baseline`'s actionBlock decision.
- 0.2.md: "43 → ~15 canonical" → "43 → 11 canonical" (actual landing).
- 0.2.md Goal 4: honest about 50-repo labelled corpus slipping to 0.3
  while 0.2 ships a regression-only fixture corpus + ≥95% recall gate.
- 0.2.md release-gates: replace "≥90% precision" with the actual
  "≥95% recall" gate that shipped.
- release-notes.md: was 334 lines of 0.1.0 ground-up-rewrite framing —
  now a short redirect to CHANGELOG + per-release docs.
- release-checklist-final.md: rewritten for the 0.2 surface (canonical
  11 + legacy aliases) and replaces the stale 0.1.x checklist with
  load-bearing gates (calibrate, docs-verify, bench-gate).
- quality-bar-and-gates.md: adds the three 0.2 release gates that the
  table previously omitted (docs-verify, calibrate, bench-gate).
- scoring-rubric.md: stale "0.1.x"/"0.1.2's job" framing aligned to
  what 0.2.0 actually carries forward.
- 0.2-known-gaps.md: mid-cycle banner points at `main` instead of a
  deleted branch; installer/supply-chain/README rows marked
  **(fixed in 0.2)** with the actual fix described.

## CLI deprecation hints
- 11 legacy top-level commands gain `legacyDeprecationNotice` calls
  so `TERRAIN_LEGACY_HINT=1` produces a uniform migration prompt:
  shorthands, detect, estimate, status, checklist, reset, policy,
  portfolio, focus, compare, migration, show, export, depgraph.
  Pre-fix only ~6 of the legacy commands triggered the hint, making
  the documented opt-in environment variable inconsistent.

## serve --read-only — promote no-op to actual enforcement
- `--read-only` was documented as "no-op in 0.1.2; reserved for 0.2"
  on a 0.2 ship — a CLI-contract trap. The security middleware now
  rejects any non-GET/HEAD/OPTIONS request with HTTP 405 + Allow
  header when ReadOnly is set. Every endpoint in 0.2 is GET-only, so
  this is a contract gate for future state-changing endpoints rather
  than a behaviour change for current traffic. Flag help and Config
  doc-comment updated.

## README onboarding
- Quick Start adds `terrain explain <test-path>` so the README's own
  "four canonical workflows" promise (analyze / insights / impact /
  explain) is exercised in the start path.
- Documentation section now links CHANGELOG.md, feature-status.md,
  and SECURITY.md — pre-fix you couldn't reach release notes from
  the README in one click.
- "Pre-built binaries" line corrected: macOS (amd64+arm64), Linux
  (amd64+arm64), Windows (amd64). Pre-fix it claimed Windows arm64
  too, which goreleaser doesn't build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): AI detector polish — FP/FN edges + confidence scaling + severity ladder

Wide pass through the AI detectors closing P1/P2 findings from the
final-polish review.

## aiToolWithoutSandbox
- delete_<benign-noun> false-positive class closed: `delete_cache`,
  `purge_logs`, `remove_session`, `truncate_buffer`, etc. now silenced
  via `classifyDestructive` + `benignDestructiveObjects` whitelist.
  Always-high verbs (exec, eval, run_shell, send_payment, transfer,
  charge, refund) keep firing regardless of object — their blast
  radius isn't bounded by the noun.
- Word-boundary regex bug fixed: Go's `\b` treats `_` as a word char,
  so `\bdelete\b` did NOT match `delete_user`. Trailing class is now
  `(?:_|\b)` to catch the dominant `verb_object` snake-case form.
- Two new tests cover both the benign-suppression path and the
  always-high-verb-still-fires path.

## aiPromptInjectionRisk
- userInputShapes expanded to include FastAPI `Body/Query/Form/File/
  Header/Cookie/Path`, Flask `request.form/request.values`, Django
  `request.POST/request.GET/request.FILES`, gRPC `request.message/
  request.payload`, Pyramid `request.json`, plus `sys.stdin/argv` for
  CLI-driven user input. Pre-fix only Express-style `request.body` and
  `req.body` shapes were caught.
- Multi-line concatenation false negative closed: the scanner now reads
  the file as a slice of lines and checks a 3-line window for the
  user-input match when a prompt-shaped variable appears. Real-world
  Black/Prettier-formatted code routinely splits `prompt +=` and
  `user.input` across two lines, which the line-by-line scanner missed.
- `bufio.Scanner` replaced with `os.ReadFile` + line split.

## aiHallucinationRate
- Keyword list expanded from 5 stems to 17 to catch real failure-
  reason text: "not in source", "not in context", "no evidence",
  "no citation", "unsupported", "outside scope", "off-topic",
  "contradicts source". Pre-fix list was closed-class English with
  only "fabricat*", "hallucinat*", "grounding", "made up",
  "ungrounded".
- Threshold-boundary comment clarified (rate <= threshold skips, so
  fire is rate > threshold; equal-to-threshold does not fire).

## aiHardcodedAPIKey
- Scan-error abuse fixed: pre-fix the synthetic `keyHit` for
  bufio.Scanner.Err() was emitted as a SeverityCritical Signal with
  Type=aiHardcodedAPIKey, painting a binary blob as a real API key
  in the rendered report. Now ScanError is a flag on keyHit; the
  caller routes scan-error hits to a SeverityMedium degraded-coverage
  diagnostic with explanation "Secret-scan coverage degraded: scanner
  failed mid-file."

## aiModelDeprecationRisk
- Severity by category: deprecated tags (text-davinci-003, claude-1)
  are now SeverityHigh; floating tags (gpt-4, claude-3-opus) stay
  SeverityMedium. Pre-fix every category was Medium, which
  under-prioritized the genuinely broken cases where the next API
  call WILL fail.
- BASIC `'` comment-prefix tightened to `' ` (space-required) so it
  doesn't match Python single-quoted strings at column 0.

## aiNonDeterministicEval
- Extension list adds `.toml` (docstring promised TOML support; map
  didn't include it). Promptfoo + DeepEval TOML configs now reach
  the detector.

## aiPromptVersioning
- Placeholder-version rejection: `version: "TODO"`, `version: TBD`,
  `version: ???`, `version: placeholder`, `version: none`,
  `version: unknown` no longer satisfy the inline-version
  requirement. New `lineLooksLikePlaceholderVersion` post-check runs
  after the regex match. New parameterised regression test covers 6
  placeholder shapes.

## aiCostRegression / aiRetrievalRegression
- Confidence scales by paired-case count via shared `pairedConfidence`
  helper: 0.5 at paired=1, 0.7 at paired=5, 0.85 at paired=10,
  plateau at 0.9 from paired>=20. Pre-fix every regression fired at
  fixed 0.9 confidence regardless of evidence quality.
- aiCostRegression severity ladder: catastrophic regressions
  (delta >= 1.0, i.e. >=2x cost) escalate to SeverityHigh under new
  clause `sev-high-008`. Merely-above-threshold stays SeverityMedium
  under `sev-medium-006`.

## Severity rubric
- New clause `sev-high-008` ("Catastrophic cost regression") added to
  rubric.go and regenerated rule docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Engine robustness — RequiresGraph hard-fail, sort tiebreak, panic-recovery parity

## detector_registry.go
- RequiresGraph mismatch surfaces a detectorPanic-shaped diagnostic
  instead of silently dropping the registration. Pre-fix a detector
  declaring `RequiresGraph: true` whose runtime type didn't satisfy
  GraphDetector vanished from the pipeline with no signal, no log,
  no diagnostic — a configuration bug indistinguishable from a
  detector that simply found nothing.
- Phase-1 results slice pre-allocated to len(registrations) to avoid
  the 0/1/2/4/8/16/32 reallocation chain under the mutex.
- Phase-2 graphResults likewise pre-allocated.

## pipeline.go (signals)
- RunDetectors (the simple non-registry entry point) now wraps each
  detector in safeDetect for parity with the registry path. Tests
  calling RunDetectors directly previously had no panic protection;
  a single broken detector would unwind the test goroutine. Added a
  detectorTypeName helper so the synthetic Meta.ID still names
  something useful in the recovered-from-panic diagnostic.

## models/sort.go
- sortSignals adds Symbol as a tiebreaker after Line and switches to
  sort.SliceStable. Pre-fix two signals on the same (Category, Type,
  File, Line) but different Symbols sorted non-deterministically, so
  byte-identical snapshot output under SOURCE_DATE_EPOCH could break
  for snapshots that happened to have such ties.

## engine/artifacts.go
- filepath.Rel error in the depth-limit branch now causes
  filepath.SkipDir instead of being silently treated as depth=0. A
  computation failure on a pathological symlink loop no longer slips
  past the depth gate.

## engine/pipeline.go
- loadBaselineSnapshot stream-decodes via json.NewDecoder instead of
  os.ReadFile + json.Unmarshal. 100MB+ historical snapshots no
  longer spike RSS by the same amount; empty-file check via
  fi.Size() avoids the read at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Eval adapter polish — Promptfoo time magnitude, Ragas modern shapes, DeepEval whitespace, envelope SourcePath rel

## Promptfoo
- createdAt magnitude check: pre-fix `time.UnixMilli(raw.CreatedAt)`
  assumed millis, but some Promptfoo v4 CLI paths emit unix-seconds.
  A 10-digit second-epoch timestamp from 2026 silently decoded as
  1970. Now: `< 1e12` treated as seconds, otherwise millis.

## DeepEval
- runId fallback: newer DeepEval (1.x) writes `runId` instead of
  `testRunId`. When TestRunID is empty fall back to RunID. Without
  this, baseline matching fell into the framework-wide first-match
  fallback and could cross-attribute runs across multiple suites.
- CreatedAt format flexibility: tries RFC3339 (newer), space-
  separated `2006-01-02 15:04:05` (older), and microsecond-fraction
  without timezone. Failures stay silent.
- Metric-name whitespace normalisation: `Answer Relevancy` (with
  internal space) now normalises to `answer_relevancy` so it matches
  the retrievalScoreKeys / hallucinationGroundingKeys whitelists in
  the consumer detectors. Pre-fix human-readable metric names from
  the DeepEval CLI quietly mismatched.

## Ragas
- Accept `evaluation_results` (modern Ragas ≥0.1.0) and `scores`
  (DataFrame export) shapes, in addition to legacy `results`. New
  `rowsForParsing()` helper merges all three. Pre-fix the modern
  shape was rejected with a misleading "ragas payload has no
  results" error — Ragas 0.2 users couldn't run the adapter at all.
- isRagasQualityKey strips `eval_` prefix and normalises spaces, so
  `eval_faithfulness` and `eval context_relevance` (ragas-evaluate-
  helpers shapes) now match the quality-key whitelist.
- ragasQualityKeys whitelist adds Ragas 0.2 modern metrics:
  context_utilization, noise_sensitivity, summarization,
  factual_correctness.
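
The merge-and-normalize logic might look like this — the struct fields stand in for the real JSON decode, and `normalizeRagasKey` is an illustrative name for the isRagasQualityKey stripping step:

```go
package main

import (
	"fmt"
	"strings"
)

// ragasPayload is a simplified stand-in for the decoded payload.
type ragasPayload struct {
	Results           []map[string]any // legacy shape
	EvaluationResults []map[string]any // modern Ragas
	Scores            []map[string]any // DataFrame export
}

// rowsForParsing merges all three accepted shapes, so a modern
// payload is no longer rejected with "ragas payload has no results".
func rowsForParsing(p ragasPayload) []map[string]any {
	rows := make([]map[string]any, 0,
		len(p.Results)+len(p.EvaluationResults)+len(p.Scores))
	rows = append(rows, p.Results...)
	rows = append(rows, p.EvaluationResults...)
	rows = append(rows, p.Scores...)
	return rows
}

// normalizeRagasKey converts spaces to underscores and strips the
// eval_ prefix before the quality-key whitelist check, so both
// "eval_faithfulness" and "eval context_relevance" match.
func normalizeRagasKey(k string) string {
	k = strings.ReplaceAll(k, " ", "_")
	return strings.TrimPrefix(k, "eval_")
}

func main() {
	p := ragasPayload{
		EvaluationResults: []map[string]any{{"faithfulness": 0.91}},
		Scores:            []map[string]any{{"context_utilization": 0.84}},
	}
	fmt.Println(len(rowsForParsing(p)))
	fmt.Println(normalizeRagasKey("eval context_relevance"))
}
```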

## Envelope SourcePath
- New `relativeArtifactPath(root, p)` helper. EvalRunEnvelope
  SourcePath is now stored relative to the analysis root rather
  than as the raw CLI path. Pre-fix
  `--promptfoo-results /Users/alice/proj/...` produced absolute
  paths that leaked into SARIF output and downstream signal
  Location.File fields. New helper attempts filepath.Rel(root, p);
  falls back to original path on error or upward-traversal.
- All three ingest helpers (`ingestPromptfooArtifacts`,
  `ingestDeepEvalArtifacts`, `ingestRagasArtifacts`) updated to
  take `root` and apply the helper.
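
A sketch of the helper as described above (signature from the commit; the upward-traversal test is one plausible implementation of "falls back on upward-traversal"):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relativeArtifactPath rewrites an artifact path relative to the
// analysis root. On filepath.Rel error, or when the result would
// traverse upward out of the root, the original path is kept.
func relativeArtifactPath(root, p string) string {
	rel, err := filepath.Rel(root, p)
	if err != nil || rel == ".." ||
		strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return p
	}
	return rel
}

func main() {
	// Inside the root: relativized, so SARIF no longer leaks
	// absolute CLI paths.
	fmt.Println(relativeArtifactPath("/Users/alice/proj",
		"/Users/alice/proj/eval-runs/promptfoo.json"))
	// Outside the root: kept as-is.
	fmt.Println(relativeArtifactPath("/Users/alice/proj", "/etc/passwd"))
}
```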

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Supply chain — concurrency, timeouts, COSIGN cleanup, post-release smoke, archive files

## CI workflows
- All PR-triggered workflows (ci, codeql, terrain-pr, terrain-ai) gain
  a `concurrency` block that cancels in-progress runs on PR re-pushes.
  Pre-fix, force-pushes piled up overlapping runs that all consumed
  runner minutes pointlessly. Main-branch CI runs keep
  cancel-in-progress=false so post-merge runs always complete.
- All jobs gain `timeout-minutes` (15-45 depending on workload).
  Pre-fix, a hung Windows go-test could hold the matrix for 6h.
- CodeQL: dropped `python` from the language matrix. The repo has
  only a few Python helper scripts (no production Python under
  analysis); CodeQL Python autobuild was burning ~5 min/week with
  no useful coverage.

## Release workflow
- `concurrency: cancel-in-progress: false` is explicit. Once OIDC
  certs are issued mid-release, cancelling leaves orphan certs in
  the public Sigstore log; preventing concurrent cancellation
  protects the supply-chain story.
- All release jobs (verify, go-release-build matrix, go-release-
  publish, npm-release) gain `timeout-minutes`.
- COSIGN_EXPERIMENTAL=1 removed from the two cosign sign-blob steps.
  cosign 2.x makes keyless the default; the env var emits a
  deprecation notice when set.
- New `release-smoke` job downloads the just-published linux/amd64
  archive, extracts it, runs `terrain version --json`, and asserts
  the reported version matches the tag. Catches the class of
  "release published but archive contains a stale build" bugs that
  previously could only be caught after the fact by a user.

## Goreleaser
- archives.files now ships LICENSE + README.md inside every archive.
  Pre-fix archives shipped only the binary; users had no in-tree
  license text — a soft compliance issue for OSS-policy review.

## Installer (bin/terrain-installer.js)
- downloadFile redirect chain capped at MAX_REDIRECTS=5. Pre-fix
  recursion was unbounded — a misconfigured proxy redirect loop
  hung the installer until the OS killed it. Error message points
  users at TERRAIN_INSTALLER_BASE_URL as the documented escape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Documentation gaps — issue templates, CoC, glossary, versioning, compatibility, integration guides

## New top-level files
- CODE_OF_CONDUCT.md (Contributor Covenant 2.1, with reporting via
  private security advisory). Standard for OSS distribution; the
  agent reviews flagged its absence.

## New issue templates under .github/ISSUE_TEMPLATE/
- bug-report.md — structured reproduction, version output, debug log
- false-positive.md — detector + code + why-it's-fine + opt-in to add
  as a calibration corpus regression fixture
- feature-request.md — outcome-first, with optional implementation
  hint
The pre-existing feedback.md stays as the catch-all.

## New user-facing docs
- docs/glossary.md — single page for the Terrain-specific
  vocabulary (snapshot, signal, surface, detector, posture, severity
  clause, evidence strength, calibration corpus, eval run, baseline,
  manifest, severity rubric, docs-verify gate, surface ID,
  capability). Pre-fix the README and per-doc files used these terms
  liberally with no central definition.
- docs/versioning.md — explicit semver policy: what counts as
  breaking, behaviour change, bug fix; pre-release identifiers;
  release cadence; compatibility windows table.
- docs/compatibility.md — host platform tier table (Linux/macOS amd64+
  arm64 + Windows amd64), build-time tools (Go 1.23, Node 22.x), test
  framework support tiers, AI eval framework version coverage.
- docs/integrations/promptfoo.md, deepeval.md, ragas.md — per-
  framework wiring guides with schema versions accepted, baseline
  comparison setup, calibration fixture format, troubleshooting
  table.

## Internal docs disclaimer
- docs/internal/README.md added so external readers landing in this
  directory know the contents are planning artifacts, not product
  truth, and aren't subject to the docs-verify gate. The agent
  reviews flagged 30+ internal files mixed into the public docs tree.

## README updates
- Documentation section now links Glossary, Versioning Policy,
  Compatibility, Integrations directory, Code of Conduct. Reorders
  for first-time-reader flow (concepts → CLI → release info →
  community).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): relativeArtifactPath normalises separators to forward slash

Windows CI failure regression from the prior eval-adapter polish
commit (f85f488). `filepath.Rel` returns native separators —
backslash on Windows — which produced `eval-runs\\promptfoo.json`
SourcePaths that mismatched the forward-slash paths in
calibration labels.yaml. Three calibration fixtures
(eval-cost-regression, eval-hallucination-rate,
eval-retrieval-regression) tripped the load-bearing gate.

`filepath.ToSlash(...)` applied to every return path normalises to
`/` consistently. Snapshot JSON, calibration labels, and SARIF
all expect forward slashes; this aligns the SourcePath stamp with
that convention.

Verified: TestCalibration_CorpusRunner now passes; full
`go test ./...` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(0.2): Public-facing documentation refresh for 0.2.0 tag

Updates user-facing docs to reflect everything that actually
shipped in 0.2.0 — including the post-initial-publish polish
work that landed during the ship-blocker + final-polish reviews.

## CHANGELOG.md
- 0.2.0 header gets the release date (2026-05-02).
- Per-detector entries replace stale "Known limitation" callouts
  with the actual shipped behaviour (per-provider scoping for
  aiNonDeterministicEval, structural key-name + benign-object
  whitelist for aiToolWithoutSandbox, implicit path-based coverage
  for aiSafetyEvalMissing + aiFewShotContamination, multi-line
  concatenation + expanded user-input shapes for
  aiPromptInjectionRisk, severity-by-category for
  aiModelDeprecationRisk, paired-count confidence scaling for
  aiCostRegression + aiRetrievalRegression, placeholder-token
  rejection for aiPromptVersioning, env-var constructor patterns
  for aiEmbeddingModelChange).
- Calibration corpus section reframed: gate is recall-only,
  precision floor slipped to 0.3 (corpus v2). Match-key precision
  improvement (Symbol added) and empty-corpus-bypass closure
  documented.
- New "Polish (release-prep adversarial review fixes)" section
  bullets the cross-cutting fixes from the two adversarial-review
  passes: release infra, engine self-diagnostic (detectorPanic in
  catalog, RequiresGraph hard-fail), eval adapters (Promptfoo
  errors-bucket + per-case cost + time magnitude, DeepEval runId +
  metric-name normalisation, Ragas evaluation_results + scores
  shapes, envelope SourcePath rel), CLI (deprecation hints,
  --read-only enforcement, version JSON schemaVersion, exit 5 for
  not-found), determinism (sortSignals Symbol tiebreak), supply
  chain (concurrency, timeouts, CodeQL Python drop,
  COSIGN_EXPERIMENTAL cleanup), documentation (CODE_OF_CONDUCT,
  issue templates, glossary/versioning/compatibility, integration
  guides).

## docs/release/feature-status.md
- 12 AI detector rows updated to describe the actual shipped
  behaviour, not the pre-fix state. Specifically: per-provider
  scoping, severity-by-category, structural-with-benign-objects,
  implicit-coverage, paired-count confidence, placeholder
  rejection, env-var constructor patterns.
- Cosign npm-install row now reflects the mandatory-by-default
  posture (was "degrades to checksum-only"). Documents the two
  escapes (TERRAIN_INSTALLER_ALLOW_MISSING_COSIGN=1,
  TERRAIN_INSTALLER_SKIP_VERIFY=1) and the redirect cap.

## docs/cli-spec.md
- New "Surface — canonical 11 + legacy aliases" section at the
  top documents the 0.2 namespace dispatchers and the
  TERRAIN_LEGACY_HINT=1 deprecation flow. Detailed per-command
  entries below stay valid since legacy aliases still work
  through 0.2.x.
- `terrain version` entry mentions the new schemaVersion field
  in --json output (CI tools can pin the snapshot contract).
- `terrain serve --read-only` description updated: was
  "no-op in 0.1.2", now reflects actual HTTP 405 method
  enforcement that shipped in 0.2.0.

## docs/telemetry.md
- Example version field updated 0.1.0 → 0.2.0.

`make docs-verify` passes; full `go test ./...` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(0.2): Repo-wide American English sweep

The repo's existing convention is American English (240 AmE markers
vs 21 BrE pre-sweep). Recent docs and Go-source comments drifted to
British spellings during the polish review cycles; this commit
brings everything back to convention.

Mechanical replacements applied uniformly across `*.go`, `*.md`,
`*.js`, `*.yml`, `*.yaml` (excluding `node_modules/`,
`benchmarks/repos/`, `.git/`, `.claude/`, `tests/calibration/`,
`vendor/`):

- behaviour → behavior (incl. plurals + capitalization)
- normalise / normalised / normalising / normalisation →
  normalize / normalized / normalizing / normalization
- categorise / recognise / synthesise / summarise / prioritise /
  utilise / optimise / favour / finalise → -ize / -or / -ze
  variants
- labelled / labelling / modelled / modelling → labeled / labeling /
  modeled / modeling
- cancelled → canceled
- centre → center
- analyse → analyze (verb form; the -yze variant is the AmE
  convention and was already used as such elsewhere)

64 files changed, 132 insertions / 132 deletions. Pure word-level
substitutions; no semantics affected. `make docs-gen` regenerated
manifest.json + severity-rubric.md + rule doc stubs to track the
Go-source comment changes.

Verified: `go build ./...` clean, `go test ./...` all pass,
`make docs-verify` zero-diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): AI validation gate is now impact-scoped

The terrain-ai.yml workflow was reporting whole-repo AI signals as
"blocking" for every PR, including doc-only ones. Both PR #128
(code-changing) and PR #129 (docs-only) generated 29 / 68 blocking
signals respectively — most referencing files the PR never touched
(calibration-corpus fixtures, the detector's own source, etc.).

Root cause in `internal/changescope/analyze.go`: the AI signals
collection loop iterated EVERY signal of category AI in the
snapshot, regardless of whether `sig.Location.File` was in the PR's
changed-files set. Compare to the impacted-scenarios loop right
above it (line 250), which IS impact-scoped via
`result.ImpactedScenarios`. The asymmetry meant the gate was
useless: the calibration corpus contains intentionally-bad
fixtures designed to trigger the detectors (that's their job —
they're regression tests), and those fixtures showed up as
"merge blockers" on every PR.

Fix: hoist the `changedPaths` set construction above the signals
loop and filter by `sig.Location.File`. Signals without a
Location.File (whole-repo emergent signals) are also dropped —
they belong in `terrain analyze`, not `terrain pr`.

The pre-existing
`TestBuildAIValidationSummary_WithSignals` test was constructed
around the old (broken) behavior with no Location.File set on the
fixture signals; rewritten to assert the new contract:

  - Critical AI signal on a changed file → BlockingSignals
  - Medium AI signal on a changed file → WarningSignals
  - High AI signal on an UNCHANGED file → dropped
  - Quality signal → dropped (category filter, unchanged)

New regression test `TestBuildAIValidationSummary_DropsSignalsOnUnchangedFiles`
locks in the doc-only-PR case: a PR that only touches docs/* and
CHANGELOG.md should produce zero blocking signals from
calibration-corpus fixtures elsewhere in the tree.
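
The filter in miniature — `Signal` here is a simplified stand-in for the snapshot signal type, not the actual struct:

```go
package main

import "fmt"

// Signal is a simplified stand-in for the snapshot signal type.
type Signal struct {
	Category string
	Severity string
	File     string // sig.Location.File; empty for whole-repo signals
}

// filterAISignalsToChangedFiles mirrors the fix: build the
// changed-paths set once, keep only category-AI signals whose file
// is in the PR's changed set, and drop whole-repo signals (empty
// File) — those belong to `terrain analyze`, not `terrain pr`.
func filterAISignalsToChangedFiles(signals []Signal, changedFiles []string) []Signal {
	changed := make(map[string]bool, len(changedFiles))
	for _, f := range changedFiles {
		changed[f] = true
	}
	var out []Signal
	for _, sig := range signals {
		if sig.Category != "AI" || sig.File == "" || !changed[sig.File] {
			continue
		}
		out = append(out, sig)
	}
	return out
}

func main() {
	signals := []Signal{
		{Category: "AI", Severity: "critical", File: "src/auth/login.ts"},
		{Category: "AI", Severity: "high", File: "tests/calibration/fixture.ts"}, // unchanged file: dropped
		{Category: "AI", Severity: "medium", File: ""},                           // whole-repo: dropped
	}
	fmt.Println(len(filterAISignalsToChangedFiles(signals, []string{"src/auth/login.ts"})))
}
```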

Verified: `go test ./...` green; the new asymmetry between
"whole-repo signals via terrain analyze" and "PR-introduced
signals via terrain pr" matches the design contract for the two
commands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Rewrite AI Validation PR comment for actual humans

The PR-comment format for `terrain pr --format markdown` had three
problems that made the AI Validation section unhelpful:

1. Detector taxonomy as the headline. Bullets started with bold
   `**aiPromptInjectionRisk**:` followed by the raw detector
   explanation. PR authors had to know the rubric to interpret it.
2. No file paths or line numbers. The Recommended Tests section
   right above already shows file paths (and they're clickable on
   GitHub); the AI Validation section dropped them.
3. No deduplication. 12 prompt-injection hits across 4 files
   produced 12 identical-looking bullets. The signal was lost in
   the noise.

Fix:

- AISignalSummary gains File/Line/Symbol fields (model.go);
  populated in analyze.go from sig.Location.

- New `internal/changescope/ai_signal_humanize.go` carries:
  - `humanSummary[type]` — one-sentence plain-language description
    per detector type (no taxonomy)
  - `humanAction[type]`  — one-sentence concrete next step
  - `groupSignalsByFileAndType()` — collapses N (file, type)
    duplicates into one entry whose Lines slice carries every
    distinct line number
  - `renderGroupedSignal()` — outputs `**\`path:42, 47, 51\`** —
    <plain summary>` followed by `→ <action>`
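
The grouping step can be sketched as follows — the `signal` shape and function body are illustrative, not the actual `ai_signal_humanize.go` code:

```go
package main

import (
	"fmt"
	"slices"
)

type signal struct {
	File string
	Type string
	Line int
}

// groupedSignal collapses N duplicate (file, type) hits into one
// entry whose Lines slice carries every distinct line number.
type groupedSignal struct {
	File  string
	Type  string
	Lines []int
}

// groupByFileAndType merges duplicates in first-seen order; lines
// are sorted and deduped so the rendered locator reads
// `path:42, 47, 51`.
func groupByFileAndType(signals []signal) []groupedSignal {
	idx := make(map[[2]string]int)
	var out []groupedSignal
	for _, s := range signals {
		key := [2]string{s.File, s.Type}
		i, ok := idx[key]
		if !ok {
			i = len(out)
			idx[key] = i
			out = append(out, groupedSignal{File: s.File, Type: s.Type})
		}
		out[i].Lines = append(out[i].Lines, s.Line)
	}
	for i := range out {
		slices.Sort(out[i].Lines)
		out[i].Lines = slices.Compact(out[i].Lines)
	}
	return out
}

func main() {
	g := groupByFileAndType([]signal{
		{"src/auth/login.ts", "aiPromptInjectionRisk", 47},
		{"src/auth/login.ts", "aiPromptInjectionRisk", 42},
		{"src/auth/login.ts", "aiPromptInjectionRisk", 51},
	})
	fmt.Println(len(g), g[0].Lines)
}
```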

- render.go's markdown path now calls the grouped renderer for
  Blocking + Warning sections. Section header becomes
  "**N new finding(s) introduced by this PR**" instead of
  "Blocking signals (N)" so the framing matches what `terrain pr`
  is actually about (PR-introduced findings, not whole-repo state).

- The text renderer (used when --format isn't markdown) gets the
  same grouped output.

Test updates:
- TestRenderPRSummaryMarkdown_AISection rewritten for the new
  contract: locator presence, plain-language summary presence,
  action arrow, line-grouping for duplicates, symbol-keyed
  locator for tool findings, NO bare detector taxonomy in the
  bold headline.

Sample new output (12-line repetition collapses to 4 bullets):

  ### AI Validation
  Scenarios: 2 of 12 selected

  **4 new finding(s) introduced by this PR:**
  - **`evals/promptfoo.yaml:7`** — API key embedded in source or
    config — should be in env / secret store.
    → Move the secret to an env var (or your secrets manager).
  - **`agents/tools.yaml (delete_user)`** — Destructive tool can
    run without an approval gate, sandbox, or dry-run mode.
    → Add `requires_approval: true`, route through a sandbox.
  - **`src/auth/login.ts:42, 47, 51`** — User input flows into a
    prompt without visible escaping or boundary tokens.
    → Wrap user input through a sanitizer, or use a prompt
      template with explicit user-content boundaries.
  - **`src/chat/handler.ts:18`** — User input flows into a prompt
    ... [same summary, separate file]

Combined with the impact-scope filter from the previous commit,
the AI Validation section now does what it should:
PR-introduced findings only, grouped, with file paths and
plain-language explanations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): Aesthetic polish for terrain pr markdown comment

The PR comment now reads as a sequence of discrete cards rather than
a wall of bullets. Concrete changes:

## Header
- H2 + blockquote subtitle ("> High-severity gaps found in changed
  code.") instead of `## ... — verdict` + `*italic line*`. Blockquote
  renders as a callout on GitHub (left rule + soft background) which
  visually separates the "why" from the "what."

## Metrics table
- Compact 2-column layout with bold left-column labels:

  | Metric | Value |
  |---|---|
  | **Changed files** | 7 (5 source · 2 test) |
  | **Impacted units** | 12 |

  Empty rows skipped (`Impacted units: 0` doesn't render). Middle-dot
  separator (·) replaces commas in compound stats so cells read
  cleanly.

## Sentence-case headings + horizontal rules
- "New Risks (directly changed)" → "Coverage gaps in changed code"
- "Recommended Tests" → "Recommended tests"
- "Impacted capabilities:" + "Scenarios: 2 of 12 selected" inline →
  blockquote with both: `> **Capabilities:** ... · **Scenarios:** ...`
- Every major section is preceded by a `---` horizontal rule, giving
  the comment visual rhythm. Each section reads as its own card.

## Finding card shape (parallel across all sections)
- Coverage gaps and AI signals now share the same bullet shape:

  - **`path/to/file.ts`** [HIGH] — <plain-language description>
    → <suggested action>

  Pre-fix coverage gaps and AI signals were rendered by two different
  paths producing different shapes (one used severity prefix, one
  used taxonomy prefix); the inconsistency was visible in the same
  comment. Now both go through `renderFindingCard` /
  `renderGroupedSignal` with parallel structure.

## Pluralization
- New `pluralize(n, singular, plural)` helper replaces the awkward
  "finding(s)" / "issue(s)" / "gap(s)" notation in user-visible
  headers. "1 advisory finding", "2 advisory findings",
  "1 indirectly impacted protection gap", "3 pre-existing issues."

## Footer
- Owners + branding + limitations consolidated into a small-text
  footer using `<sub>` tags, separated from main content by a final
  `---`. Pre-fix these were scattered mid-comment.

## Test updates
- TestRenderPRSummaryMarkdown_DirectVsIndirectSections updated for
  new heading + card shape + pluralization.
- TestRenderPRSummaryMarkdown_FindingTruncation: italicized
  "_...and N more_" overflow message.
- TestRenderPRSummaryMarkdown_Deterministic + AISection +
  MixedTraditionalAndAI: matched against new sentence-case headings.

Sample new comment shape:

  ## [WARN] Terrain — Merge with caution

  > High-severity gaps found in changed code.

  | Metric | Value |
  |---|---|
  | **Changed files** | 7 (5 source · 2 test) |
  | **Impacted units** | 12 |

  ---

  ### Coverage gaps in changed code

  - **`src/auth/login.ts`** [HIGH] — Exported function authenticate
    has no observed test coverage.

  ---

  ### AI Validation

  > **Capabilities:** customer-support-bot · **Scenarios:** 2 of 12
  > selected

  **2 new findings introduced by this PR:**

  - **`src/auth/login.ts:42, 47, 51`** — User input flows into a
    prompt without visible escaping or boundary tokens.
    → Wrap user input through a sanitizer.

  ---

  <sub>Generated by [Terrain](...) · `terrain pr --json` for
  machine-readable output</sub>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ement totals (#130)

* fix(0.2): Drop spurious 'file:' prefix in `terrain insights` output

CLI visual audit (running each canonical command and reading the
output) caught one real bug in `terrain insights`:

    Recommended Actions
    ------------------------------------------------------------
      7. [coverage] Add tests for 17 uncovered source files
         files:  file:bin/terrain-installer.js, file:extension/vscode/src/extension.ts
                 +2 more

The `file:` prefix is the dependency-graph node-ID prefix from
`internal/depgraph/build.go` — `SourceCoverage.SourceID` carries it
because it's the graph node identifier. The renderer expects bare
file paths, but `internal/insights/insights.go:1189` was appending
`src.SourceID` to `rec.TargetFiles` when it should have used
`src.Path` (which the SourceCoverage struct documents as "File path
(without 'file:' prefix)"). The leak was visible to every user
running `terrain insights` against a repo with uncovered sources.

Fix: switch `src.SourceID` → `src.Path`. Output now reads
`bin/terrain-installer.js` instead of `file:bin/terrain-installer.js`.

Other commands audited and confirmed clean:
- `terrain analyze` — headline + Key Findings + What to do next
- `terrain summary` — exec summary with posture + drivers
- `terrain doctor` — minimal pass/warn/fail check list
- `terrain ai list` — table inventory
- `terrain explain selection` — strategy + reason breakdown
- `terrain pr` (text format) — clean

Known minor cosmetic issue (NOT fixed here, not a tag blocker):
`Recommended Actions` items have inconsistent template fields —
some have why/impact/effort/files/run, others have only why. This
happens when a finding's category-builder doesn't populate the
structured fields (e.g., the meta-summary "491 high-severity signals
detected"). Worth a follow-up but doesn't affect correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): CLI aesthetic polish — proper pluralization + title-case posture labels

Two cross-cutting visual issues across the CLI output that the
manual audit caught:

## 1. Drop `n thing(s)` notation everywhere

Rendered output throughout the CLI used the parenthetical-`s`
pattern as a stand-in for proper pluralization:

  890 high-fanout fixture(s) — changes trigger wide test impact
  3 more finding(s) available — run `terrain insights`...
  2 untested source file(s) — start with src/auth.js
  Investigate 5 test(s) with concentrated instability

Reads like a tool's escape hatch, not a sentence.

Fix: introduce a small `plural(n, base) → "base" | "bases"` helper
in each renderer package (analyze, insights, summary,
internal/reporting). 19 sites updated. The reporting helper is
exported as `Plural()` for cross-renderer use; the leaf-package
helpers stay private to avoid coupling.

After:

  890 high-fanout fixtures — changes trigger wide test impact
  3 more findings available — run `terrain insights`...
  2 untested source files — start with src/auth.js
  Investigate 5 tests with concentrated instability
  (and singular-form versions render correctly when n == 1)
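
The helper per the signature above — a minimal sketch; the real renderers also handle irregular nouns separately:

```go
package main

import "fmt"

// plural appends "s" for the regular case; n == 1 keeps the base
// form. Irregular nouns (capability/capabilities) use an inline
// if/else in the renderers instead.
func plural(n int, base string) string {
	if n == 1 {
		return base
	}
	return base + "s"
}

func main() {
	fmt.Printf("%d %s\n", 890, plural(890, "high-fanout fixture"))
	fmt.Printf("%d %s\n", 1, plural(1, "more finding"))
}
```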

## 2. Title-case posture labels in `terrain summary`

Pre-fix output dumped snake_case dimension keys directly:

  health:              strong
  coverage_depth:      strong
  coverage_diversity:  moderate
  structural_risk:     strong
  operational_risk:    strong

The keys are stable storage form (preserved in JSON for API
roundtrip); the display form should be human-readable.

Fix: `titleCaseDimension()` and `titleCaseBand()` helpers in
`internal/reporting/executive_report.go`. Snake-case underscores
become spaces; first letter capitalized; bands ("strong",
"moderate", etc.) get the same first-letter cap.

After:

  Health:                Strong
  Coverage depth:        Strong
  Coverage diversity:    Moderate
  Structural risk:       Strong
  Operational risk:      Strong

JSON output is unchanged — the snake_case keys still round-trip;
only the human-readable text renderer normalizes them.
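
One plausible implementation of the display-form helper (the real `executive_report.go` code may differ; `titleCaseDimension` is the name from the bullet above):

```go
package main

import (
	"fmt"
	"strings"
)

// titleCaseDimension turns a stable snake_case storage key into its
// display form: underscores become spaces and only the first letter
// is capitalized ("coverage_depth" -> "Coverage depth"). The JSON
// keys themselves are untouched.
func titleCaseDimension(key string) string {
	s := strings.ReplaceAll(key, "_", " ")
	if s == "" {
		return s
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

func main() {
	for _, k := range []string{"health", "coverage_depth", "structural_risk"} {
		fmt.Println(titleCaseDimension(k))
	}
}
```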

## Test fixups
- Two `analyze_report_v2_test.go` assertions were checking for the
  literal `(s)` notation; updated to check for the natural pluralized
  form ("3 more findings available", "5 findings").
- Insights goldens regenerated via `-update-golden` (new title text).
- `cmd/terrain` snapshot golden likewise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): catch the remaining (s)/(ies) leaks I missed

Audit pass after the first sweep found:
- internal/insights/insights.go: 'AI surface(s)', 'capability(ies)', 'cluster(s)'
- internal/reporting/analyze_report_v2.go: 'cluster(s)' (3 sites)
- internal/reporting/insights_report_v2.go: 'cluster(s)'

Same fix pattern: use plural() / Plural() helper or inline if/else
for the irregular 'capability/capabilities' case.

Goldens regenerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): polarity-aware bands + concrete measurements in summary posture line

Three layered improvements to `terrain summary` Overall Posture:

1. Polarity-aware band rendering. Previously "Structural risk: Strong"
   read as "high risk" under a natural-English reading, inverting
   the band's meaning for the two risk dimensions. Bands now translate
   per dimension: Health/Coverage stay Strong/Moderate/Weak; risk
   dimensions render Low/Moderate/Significant/Elevated/Critical.

2. Sentence-case dimension labels. "Coverage Depth" → "Coverage depth"
   reads naturally inline; reserve title-case for headings.

3. Concrete measurement totals. The line now shows what drove the
   band — "28 / 772 skipped" instead of an opaque "3.6% skipped" or
   the original bare "Strong". Counts come from the measurement's own
   explanation ("28 of 772 test files contain skipped tests"). Zero
   measurements are dropped so the line shows what changed, not a
   wall of "0.0% flaky · 0.0% dead · 0.0% slow".

Goldens regenerated for the dimension-name + explanation changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): close pre-existing CLI test gaps + CHANGELOG entry for #130

The 4 CLI help-text tests in internal/testdata/cli_test.go were
written for the pre-restructure CLI shape and started failing when
the canonical 11-command surface landed:

  TestCLI_HelpContainsCanonicalWorkflow expected `terrain insights`
  bare-form; canonical layout uses `terrain report insights`.

  TestCLI_HelpContainsAllCommands expected `ai list`, `ai run`, etc.
  as substrings; canonical layout shows `ai <verb>` with verbs listed
  inline.

  TestCLI_HelpContainsDebugNamespace expected `debug graph`, etc. as
  substrings; debug had no inline verb list at all.

  TestCLI_HelpContainsPrimaryCommands expected a "Primary commands:"
  header and journey questions for impact / insights / explain;
  canonical layout uses "Canonical commands" and only `analyze`
  retains a journey question (the other three moved under report).

Test expectations updated to match the canonical layout. Also added
the missing debug verb list to the top-level help for consistency
with report / migrate / ai / config (where each namespace shows its
verbs inline).

Separately, TestCLI_ExportBenchmarkAcceptsJSONFlag was failing
because `export benchmark` didn't accept `--json`. The command always
emits JSON, so `--json` is now accepted as a no-op for flag parity.

CHANGELOG.md gains a "CLI visual polish (PR #130)" line under
0.2.0's Polish section so the tag captures the full scope of this
branch's work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A senior-engineer launch-readiness review (~65 items, 2026-05-02)
flagged that the codebase is more honest than its public surface —
the docs, README, CHANGELOG, and PR-comment workflow all overstated
what 0.2 actually delivers. This PR closes the user-visible
contradictions before the v0.2.0 tag is cut so an evaluator who
reads README + CHANGELOG + quickstart sees the same product the
binary delivers.

What this PR is NOT: not the suppression model, not the command
registry, not the SARIF audit. Those are 0.2.1 / 0.2.2 work tracked
in `/Users/pzachary/.claude/plans/kind-mapping-turing.md`. This PR
is the truth-up only.

CHANGELOG truth-up:
  * Headline calibration claim: "100% precision/recall" → "100% recall
    on a 27-fixture corpus; precision floors against a labeled-repo
    corpus deferred to 0.3." Closes the most visible internal
    contradiction (the same file disclaims it later).
  * Per-detector calibration paragraph: same nuance applied.
  * Cosign installer block: rewrote to match actual code behavior.
    The previous text said the installer "degrades to checksum-only"
    when cosign isn't on PATH; bin/terrain-installer.js:153–177
    actually hard-fails by default and accepts two opt-out env vars
    (TERRAIN_INSTALLER_ALLOW_MISSING_COSIGN=1,
     TERRAIN_INSTALLER_SKIP_VERIFY=1). The bin/postinstall.js wrapper
    swallows the failure and prints a warning; that UX gap is now
    explicitly tracked for 0.2.1.
  * New "What's stable in 0.2" section: stable / experimental /
    planned, sourced from feature-status.md so users see the honest
    scoping next to the headline.

"AI Validation" → "AI Risk Review" rename:
  * .github/workflows/terrain-ai.yml workflow name + PR-comment
    template + decision strings + comment-marker HTML id
  * docs/cli-spec.md gate description (with detector-shape caveats)
  * internal/changescope/render.go markdown + plain-text section
    headers, plus tests that asserted the old strings
  * model.go / analyze.go / airun/artifact.go / policy/config.go
    code comments
  * CHANGELOG references and docs/product/terrain-overview.md,
    docs/integrations/gauntlet.md,
    docs/architecture/19-ai-scenario-and-eval-model.md
  * Internal scorecards under docs/internal/product/ left as-is
    (historical artifacts, not user-visible)
  * Avoid "AI validation" / "AI safety" / "AI security" until 0.3
    ships precision metrics + AST taint flow + sandboxing.

README + quickstart language tightening:
  * Headline: "30 seconds" promise now qualified inline rather than
    only at line 61.
  * "Reads your repository — test code, source structure, coverage
    data, runtime artifacts, ownership files, and local policy" now
    says "when available" / "best effort" so coverage/runtime
    artifact dependence is honest.
  * "No SaaS / no telemetry" claim now distinguishes runtime ("does
    not phone home") from installation ("downloads signed binaries
    from GitHub Releases").
  * Quickstart line 150 ("Terrain gives AI surfaces the same CI
    treatment as regular tests") rewritten to honest scope.
  * New "What 0.2 Is and Isn't" section in README between
    "What Terrain Is Not" and "Who Uses Terrain". Stable /
    experimental / planned summary + cross-link to feature-status.
  * Supporting commands table: portfolio gets an explicit
    "(canonical, experimental)" qualifier so it isn't both
    "supporting" and "in the canonical 11" without flagging the
    multi-repo work as experimental.

Doc drift fixes:
  * docs/telemetry.md: bare `terrain telemetry` → canonical
    `terrain config telemetry`; legacy alias documented.
  * docs/compatibility.md Tier-1 description: "full CI matrix" →
    "binary target + extended gates on Linux only, unit-test
    parity on macOS/Windows" per actual ci.yml.
  * docs/compatibility.md: dropped the conversion-direction A-grade
    rating reference (deferred to 0.3 per CHANGELOG).
  * docs/user-guides/getting-started.md: new Prerequisites section
    documenting cosign install (brew/apt/scoop) and the two opt-out
    env vars.

Server stale comments + scope honesty:
  * cmd/terrain/cmd_serve.go: 0.1.2 reference removed; flag docstring
    explicit about no-auth + localhost-only + not-production-ready.
  * internal/server/server.go: security posture comment block
    rewritten for 0.2.0; documents the request-context-not-wired and
    mutex-blocking-analysis bugs as known issues tracked for 0.2.1
    rather than silently shipping.

British spelling sweep:
  * sanitiser → sanitizer, sanitised → sanitized, sanitisation →
    sanitization across internal/aidetect, internal/signals,
    docs/rules/ai. Manifest regenerated via cmd/terrain-docs-gen.
  * docs/personas/frontend.md: behavioural → behavioral.

Verification:
  * `go build ./...` clean
  * `go test ./...` and `go test ./internal/testdata/` green
  * `make docs-verify` green
  * grep for "AI [Vv]alidation\b", "sanitiser|sanitised|behaviour",
    and "0.1.2" all return zero hits in user-facing paths

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(0.2): parity rubric + baseline scores (Track 0.1)

Encodes the 12-area × 17-axis maturity rubric as data so the parity
dashboard (Track 0.2) can consume it. Two files:

  docs/release/parity/rubric.yaml — structural source of truth
    - 3 pillars (understand/align/gate) + cross-cutting distribution
    - 12 functional areas with id / name / pillar / tier / surface
    - 17 axes (7 product / 7 engineering / 3 UI-visual) with anchored
      level definitions for 1, 3, 5
    - Per-pillar floor requirements: gate ≥ 4, understand ≥ 3,
      align ≥ 3 (soft)
    - 7 cross-cutting uniformity gates (detector_shape,
      framework_depth, command_shape, doc_scaffold, renderer_tokens,
      empty_state_coverage, voice_and_tone) — these catch unevenness
      across detectors, frameworks, commands, and outputs

  docs/release/parity/scores.yaml — current per-cell scores
    - 12 × 17 = 204 cells, each scored 1–5 with one-line evidence
    - Snapshot taken against `main` at 8545f03 (post PR #131)
    - Reflects the state captured in
      `docs/release/0.2.x-maturity-audit.md`

The baseline scores are honest about the floor: every pillar
currently has at least one cell at score 2, which is exactly what
the launch-readiness review flagged as "uneven product." The 0.2.0
work (Tracks 1–10) lifts those cells to clear the gate — Gate to
≥ 4, Understand to ≥ 3, Align to ≥ 3 soft.

The V-axes (V1 visual consistency / V2 information rhythm / V3
fun-to-use polish) are baselined as the lowest-scoring axis cluster
across the codebase, which matches reality: most user-visible
output uses ad-hoc styling; Track 10 (Visual & design system)
lifts these.

Track 0.2 (parity-gate Go binary) reads both files and emits the
matrix + floor map. Track 0.4 wires `make pillar-parity` into CI as
a hard gate that rejects PRs that drop a cell below its pillar
floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): parity-gate binary + make target + CONTRIBUTING (Tracks 0.2/0.3/0.6)

Three Track 0 deliverables in one PR:

  Track 0.2 — `cmd/terrain-parity-gate/main.go`
    Reads rubric.yaml + scores.yaml (defaults to docs/release/parity/),
    validates structure (every cell scored, scores in [1,5], pillar
    references valid), computes per-area floors + per-pillar
    verdicts, emits human-readable matrix or JSON, exits non-zero
    when any hard-gate pillar is below its floor. Soft gates (Align
    in 0.2.0) print WARN but do not fail.

    Three output modes: default matrix, --json, --floor-map (compact).
    Exit codes: 0 PASS, 1 hard-gate FAIL, 2 usage error.

    12 unit tests cover validation rejections, floor computation,
    soft-gate WARN semantics, mixed hard/soft pillar behavior, and
    a real-rubric load test that catches drift between YAML and
    Go types.
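
As a sketch (type and function names here are hypothetical; the real logic lives in cmd/terrain-parity-gate/main.go), the floor computation and hard/soft verdict reduce to:

```go
package main

import "fmt"

// Cell is one scored rubric cell (area × axis). Simplified shape —
// the real types are defined alongside the YAML loaders.
type Cell struct {
	Pillar string
	Area   string
	Axis   string
	Score  int
}

// pillarFloors returns the minimum cell score per pillar — the value
// the gate compares against each pillar's required floor.
func pillarFloors(cells []Cell) map[string]int {
	floors := map[string]int{}
	for _, c := range cells {
		if f, ok := floors[c.Pillar]; !ok || c.Score < f {
			floors[c.Pillar] = c.Score
		}
	}
	return floors
}

// verdict applies a hard or soft floor: hard failures exit non-zero,
// soft failures (Align in 0.2.0) only WARN.
func verdict(floor, required int, hard bool) string {
	if floor >= required {
		return "PASS"
	}
	if hard {
		return "FAIL"
	}
	return "WARN (soft)"
}

func main() {
	cells := []Cell{
		{"gate", "pr_change_scoped", "E2", 2},
		{"gate", "policy", "P1", 4},
		{"align", "ai_risk", "P4", 2},
	}
	floors := pillarFloors(cells)
	fmt.Println(verdict(floors["gate"], 4, true))
	fmt.Println(verdict(floors["align"], 3, false))
}
```

Taking the minimum cell score as the pillar floor is what lets a single weak cell (e.g. core_analyze/V1) drag an entire pillar to FAIL.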

  Track 0.3 — `make pillar-parity` (+ `pillar-parity-floor`,
  `pillar-parity-json` variants) wired through Makefile. Same posture
  as `make docs-verify` — anyone can run it locally before opening a
  PR.

  Track 0.6 — CONTRIBUTING.md "Parity gate" section. Documents:
    - per-pillar floors and which are hard / soft
    - the source-of-truth split (rubric.yaml = structure,
      scores.yaml = per-cell numbers, audit doc = prose companion)
    - how a parity-lift PR updates a cell (one-line evidence + score
      change + audit doc narrative if relevant)
    - the seven uniformity gates (advisory in 0.2.0, hard in 0.2.x)

Current baseline output:

  Pillar verdict
    understand   floor=2 / required=3   FAIL  weakest=core_analyze/V1
    align        floor=2 / required=3   WARN (soft)
    gate         floor=2 / required=4   FAIL  weakest=pr_change_scoped/E2
  Overall: FAIL

This is the honest starting point. Tracks 1–10 lift cells until every
hard-gate pillar passes; that's the 0.2.0 release gate.

Track 0.4 (CI hard gate) intentionally not included in this PR. Wiring
parity-gate into CI today would fail every PR until enough cells are
lifted; the gate flips to mandatory after the parity-lifting tracks
land enough to make it useful. For now, anyone can run
`make pillar-parity` locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
)

Foundational deliverable for Track 10 (Visual & design system). Every
user-visible renderer in 0.2.0 — terminal output, HTML report,
PR-comment markdown, SARIF tags — consumes from this package. Ad-hoc
styling outside `internal/uitokens/` becomes a parity-gate violation
on the V1 (visual consistency) axis; Track 10.2 migrates existing
renderers to the tokens.

What's in the package:

  Color tokens
    Six semantic colors (muted / accent / ok / warn / alert / bold).
    Names describe ROLES not specific shades, so the underlying ANSI
    codes can change without rewriting callers. Wrappers (Muted, Ok,
    Warn, Alert, Bold) emit only when ColorEnabled is true;
    empty-input wraps are no-ops to avoid stray escape sequences.

  TTY / NO_COLOR detection
    ColorEnabled initialized once via stdout TTY check + NO_COLOR
    env var (https://no-color.org/) + TERM=dumb fallback. Pipes /
    file redirects automatically suppress color. Tests can flip the
    flag to assert plain-text output.

  Symbol vocabulary
    SymOK / SymFail / SymWarn / SymInfo / SymArrow / SymBullet /
    SymDash / SymDot / SymRule / SymSubrule. One vocabulary used
    across every command — the V2 (information rhythm) axis depends
    on this consistency.

  Severity model + badges
    SeverityCritical / High / Medium / Low / Info ladder with
    SeverityBadge() rendering. CRITICAL and HIGH bold so blocking
    findings stand out at a glance.

  Verdict badges
    VerdictBadge("PASS"|"WARN"|"FAIL") returns the canonical glyph +
    label combo used by the parity-gate matrix, AI risk review hero
    block, and policy check.

  Spacing & rules
    SectionWidth = 60 — every renderer uses this width for section
    rules so headings line up across commands. Heading() / Subheading()
    return ready-to-print two-line blocks.

  ASCII bar renderer
    BarChar / BarEmpty constants; Bar() with auto-coloring by
    proportion (≥80% alert, ≥40% warn, below 40% muted); BarPlain() for
    callers that want inverse-polarity coloring (e.g. coverage,
    where high is good).
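
The threshold coloring described above can be sketched as follows; function names, glyphs, and clamping behavior are illustrative, not the package's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// barColor picks the semantic color role by proportion, matching the
// thresholds above: ≥0.8 alert, ≥0.4 warn, otherwise muted.
func barColor(ratio float64) string {
	switch {
	case ratio >= 0.8:
		return "alert"
	case ratio >= 0.4:
		return "warn"
	default:
		return "muted"
	}
}

// bar renders a fixed-width ASCII bar, clamping out-of-range input so
// overflow / negative values stay printable.
func bar(ratio float64, width int) string {
	if ratio < 0 {
		ratio = 0
	}
	if ratio > 1 {
		ratio = 1
	}
	filled := int(ratio*float64(width) + 0.5)
	return strings.Repeat("█", filled) + strings.Repeat("░", width-filled)
}

func main() {
	for _, r := range []float64{0.15, 0.5, 0.95} {
		fmt.Printf("%-5s %s\n", barColor(r), bar(r, 10))
	}
}
```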

  Text helpers
    Truncate / PadRight / PadLeft — rune-aware, unicode-safe. Used
    by every table layout in the codebase once Track 10.2 migrates.

Tests (15 total, ~280 lines):
  - Color wrappers respect ColorEnabled in both directions
  - Color wrappers no-op on empty strings (avoid escape-sequence
    noise around "")
  - Severity badges produce the right labels
  - Severity badges bold ≥ HIGH
  - Verdict badges canonicalize case + whitespace
  - Bar rendering covers full / empty / half / overflow / negative /
    zero-max / zero-width
  - Bar coloring threshold transitions at 0.4 and 0.8
  - Truncate / PadRight / PadLeft handle unicode correctly
  - Rule and SubRule render at SectionWidth
  - Heading is two lines (title + rule)
  - Color composition (Bold(Alert(...))) preserves both escapes

Zero dependencies. Stateless. Tested at 100% coverage of public API.

Track 10.2 (renderer audit + migration) is the follow-on PR that
moves existing internal/reporting/* code paths to consume from here.
A vet rule that flags raw ANSI codes outside this package is also
on the Track 10.2 list.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…G anchored to pitch (Track 1) (#137)

* docs(0.2): commit product vision (Track 1.1)

Durable north-star doc capturing the unified Terrain pitch we
converged on through the launch-readiness review threads. The
headline verbatim:

  Terrain is the control plane for your test system.
  It maps how your unit, integration, e2e, and AI tests actually
  relate to your code — and lets you gate changes based on that
  system as a whole.

  See what's covered, what's missing, and what's overlapping.
  See which tests matter for a PR — and why.
  Bring AI evals into the same review pipeline as the rest of
  your tests.

What's in the doc:

  * The user's actual job (six questions; today no single tool
    answers more than two)
  * "What Terrain is" — the control-plane framing in two phrases
  * The three pillars (Understand / Align / Gate) with internal
    job + external framing + anchored capabilities
  * The unifying thread (CI gate primitives shared across pillars)
  * What's distinctive vs. Jest / pytest / SonarQube / Promptfoo /
    Bazel / GitHub code scanning / AI safety tools
  * What Terrain explicitly isn't
  * Anti-goals for 0.2.x (no safe-skip guarantee; we don't run
    tests; we don't judge model truthfulness; no public precision
    floor yet)
  * Trajectory: 0.2.0 = "see clearly + gate progressively"; 0.3 =
    "take control"; 0.4 = "test the universe" (AI-aware
    integration/e2e under the control plane)
  * Capability → pillar → tier map covering every shipping command
  * Primary workflow (terrain analyze && terrain report pr) as the
    canonical entry point
  * Doc-evolution rules (what stays stable, what updates per release)

This is the source of truth that the 0.2.0 README rewrite (Track 1.2),
quickstart anchor (Track 1.4), and feature-status pillar columns
(Track 1.5) point at. When the README and this doc disagree, this
doc wins until the README is updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(0.2): narrative + tiering anchored to product vision (Tracks 1.2/1.3/1.4/1.5/1.8)

Bundles the doc-only Track 1 work into one cohesive PR following the
"control plane for your test system" pitch as the verbatim anchor.
Track 1.6 (per-pillar reproducible examples) and Track 1.7
(per-detector "known false positives" sections) are larger and
follow as separate PRs.

Track 1.1 — `docs/product/vision.md` (committed earlier in this branch)
  Durable north-star doc with the verbatim pitch, three pillars,
  capability map, anti-goals, and 0.2.0 → 0.3 → 0.4 trajectory.

Track 1.2 + 1.3 — `README.md` headline rewrite + capability list
  Lead with the pitch verbatim. "Map your test terrain" demoted to
  the secondary tagline. Primary workflow (terrain analyze && terrain
  report pr) shown above the install snippet so the entry point is
  unmistakable. "What 0.2 Is and Isn't" rewritten around pillars +
  tiers (Understand / Align / Gate × Tier 1 / 2 / 3) instead of a
  flat stable/experimental list.

Track 1.4 — `docs/quickstart.md` anchored to first-user gate insights
  Walkthrough restructured around the three insight types from the
  first-user success gate:
    1. Coverage gap with explanation (`terrain analyze`)
    2. PR risk explanation (`terrain report pr --base main`)
    3. Test-selection explanation (`terrain report impact
       --explain-selection`)
  Each step is ~90 seconds; total walkthrough ≤ 5 min. The safe-skip
  caveat is called out explicitly per the 0.2.x anti-goal.

Track 1.5 — `docs/release/feature-status.md` Pillar + Tier columns
  Every shipping capability now has Pillar (Understand / Align /
  Gate / cross-cutting) and Tier (1 / 2 / 3) tags. Tier-1 means
  publicly claimable in 0.2.0 marketing; Tier-2 ships but stays
  experimental; Tier-3 is in development with no public claim.
  Workflows table extended to include `terrain explain finding <id>`
  and `terrain suppress <id>` (Tracks 4.6 / 4.7).

Track 1.8 — `CHANGELOG.md` 0.2.0 entry restructure
  Headline replaced with the pitch verbatim as the section opener.
  New paragraph explains parity-gate framing (Gate ≥ 4, Understand
  ≥ 3, Align ≥ 3 soft) and points at the audit doc as source of
  truth. The release groups deliverables by pillar instead of by
  feature category.

Plan link: `/Users/pzachary/.claude/plans/kind-mapping-turing.md`
(Tracks 1.1, 1.2, 1.3, 1.4, 1.5, 1.8).

Verification: `make docs-verify` green; `go test ./...` green; manual
read of README + quickstart confirms continuity with the pitch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the silent-install path flagged in the launch-readiness
review. Pre-fix:

  $ npm install -g mapterrain
  > postinstall fails (cosign missing)
  > [warn] one-line stderr message
  > exit 0
  $ terrain analyze
  > silently retries the same fetch
  > fails with the same error
  > user is confused about why "install" reported success

Two acceptable resolutions were on the table; the plan chose Option
B (lazy-with-loud-signal) over Option A (hard-fail npm install).
Hard-failing every cosign-missing host would be more disruptive than
the failure mode itself; CI pipelines that wrap `npm install` would
break for legitimate reasons. The right user experience is: install
"succeeds" with a loud, framed warning, and the *first* `terrain`
invocation refuses to retry silently — printing the original error
verbatim with concrete remediation.

Implementation:

  * `bin/postinstall.js` writes ~/.terrain/install-failure.log with
    the captured error (timestamp, platform, version, message,
    stack) when ensureTerrainBinary throws. The stderr warning is
    upgraded from one line to a multi-line framed banner so it's
    hard to miss in a CI log.
  * `bin/terrain-installer.js` exports `writeInstallFailureMarker`
    and `clearInstallFailureMarker`. `runTerrainCli` (the trampoline
    for `terrain ...`) consults the marker before retrying the
    fetch; if a marker exists AND no installed binary is present,
    it throws the recorded error with a structured remediation
    block instead of attempting another silent retry.
  * On a successful install or successful first run, the marker is
    cleared so future invocations don't see stale state.

Tests added (`scripts/test-installer-marker.mjs`, run via
`node --test`):

  * writeInstallFailureMarker captures the error fields
  * clearInstallFailureMarker removes the marker
  * clearInstallFailureMarker is idempotent

Wired into `npm test` as a `test:unit` step so it runs ahead of
`scripts/verify-pack.js` in the existing release-verify chain.

CHANGELOG note in PR #131 already calls this out as 0.2.1 work; a
follow-up PR can promote the entry from "known issue" to "fixed".

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(0.2.1): add --fail-on and --timeout flags to terrain analyze

CI integration was the launch-readiness review's top adoption gap
for the platform/SRE audience. Two flags close most of it:

  --fail-on critical|high|medium
      Exit with the new severity-gate code (6) when at least one
      finding at or above the requested severity is present.
      Standard CI integration pattern. The report still renders
      first; the gate decision is the last thing that happens.

  --timeout <duration>
      Abort the analysis after the duration elapses (e.g. 5m, 30s).
      Wraps the existing SIGINT-aware context with a deadline so a
      runaway monorepo scan doesn't block CI indefinitely.

Exit-code conventions extended:

  6 — Severity gate block. Returned by analyze --fail-on. Same
      pattern as the existing AI-gate code 4. CI scripts can branch
      on "the analysis succeeded but the gate blocked us" without
      parsing stderr text. Code 3 stays reserved for the planned
      policy-vs-usage split.

Implementation:
  * cmd/terrain/cmd_severity_gate.go — parseSeverityGate +
    severityGateBlocked + errSeverityGateBlocked sentinel. Pure
    function over analyze.SignalBreakdown; no engine coupling.
  * cmd/terrain/cmd_pipeline_helpers.go — new
    runPipelineWithSignalsAndTimeout(...). The original
    runPipelineWithSignals is preserved as a 0-timeout shim so the
    other six callsites (impact / explain / pr / ai *) keep their
    contract; they can adopt the timeout flag in follow-up PRs
    without churning this one.
  * cmd/terrain/main.go — flag wiring; errors.Is(err,
    errSeverityGateBlocked) routes the gate exit code without
    confusing it for an analysis crash.
  * cmd/terrain/cmd_analyze.go — gate evaluated last so the report
    is always rendered before the exit decision.
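
A sketch of the two pure functions, assuming the shapes described above (the real code is in cmd/terrain/cmd_severity_gate.go; names here follow the commit's description):

```go
package main

import (
	"fmt"
	"strings"
)

// gate levels, lowest to highest.
type gate int

const (
	gateNone gate = iota
	gateMedium
	gateHigh
	gateCritical
)

// parseSeverityGate accepts the --fail-on value case-insensitively,
// ignoring surrounding whitespace; unknown values are a usage error.
func parseSeverityGate(s string) (gate, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "critical":
		return gateCritical, nil
	case "high":
		return gateHigh, nil
	case "medium":
		return gateMedium, nil
	default:
		return gateNone, fmt.Errorf("invalid --fail-on value %q", s)
	}
}

// severityGateBlocked reports whether any finding at or above the
// gate's threshold is present. gateNone never blocks.
func severityGateBlocked(g gate, critical, high, medium int) bool {
	switch g {
	case gateCritical:
		return critical > 0
	case gateHigh:
		return critical+high > 0
	case gateMedium:
		return critical+high+medium > 0
	}
	return false
}

func main() {
	g, _ := parseSeverityGate(" Medium ")
	fmt.Println(severityGateBlocked(g, 0, 493, 169))
}
```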

Tests:
  * TestParseSeverityGate covers canonical / case-insensitive /
    whitespace / invalid inputs
  * TestSeverityGateBlocked covers the threshold cascade
    (critical-only, critical+high, critical+high+medium) and
    confirms gateNone never blocks

Manual smoke (against the Terrain repo itself):
  $ terrain analyze --fail-on critical   → exit 0 (no critical)
  $ terrain analyze --fail-on medium     → exit 6 (matched
    "0 critical + 493 high + 169 medium finding(s)")
  $ terrain analyze --fail-on bogus      → exit 2, usage error

The remaining gate flag from the plan
(--new-findings-only --baseline) is a 0.2.x follow-up; it needs
the suppressions/finding-IDs work to ship clean against renamed
files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): close PR #134 review gaps + uniform gate across output formats

The launch-readiness review flagged six concrete gaps in PR #134
that block defensible merge. This commit closes all six and fixes
one real bug uncovered while writing the e2e tests.

Bug fix: the JSON output branch (and SARIF / annotation / HTML
branches) early-returned from runAnalyze before reaching the
--fail-on gate check. So `terrain analyze --json --fail-on=medium`
silently exited 0 even with matching findings — the gate worked in
text mode only. The TestRunAnalyze_JSONStdoutPurity test caught
this: it expected a gate-blocked error but got nil.

The fix factors the gate decision into a closure (gateErr) computed
before any rendering branch, then each branch returns gateErr()
after its renderer completes. Text, JSON, SARIF, annotation, and
HTML output all gate uniformly now.

Review gaps closed:

  Pluralization
    cmd_severity_gate.go: replaced "finding(s)" with proper plural
    via plural() helper, mirroring the 0.2 polish work in
    internal/reporting/plural.go.

  Negative-timeout validation
    cmd/terrain/main.go: --timeout < 0 now exits with usage error
    (code 2) and a clear message, rather than the silent
    immediate-DeadlineExceeded that read like an analysis crash.

  Stale "0.1.2 contract" comments
    cmd/terrain/main.go: exit-code conventions block rewritten as
    pre-0.1.2 / additive 4+ semantics; "0.2 will move policy
    violations" deferred phrasing replaced with concrete
    0.2.x → 0.3 milestone reference.

  E2E test for exit code 6 + report-renders-before-exit invariant
    TestRunAnalyze_GateBlocksOnFixture: runs runAnalyze against
    the calibration corpus with --fail-on=medium, asserts:
      1. Returns errSeverityGateBlocked (so main.go maps to exit 6)
      2. Error message contains the --fail-on label + counts
      3. stdout is non-empty (report rendered before gate fired)
      4. stdout contains the report header

  JSON stdout purity test
    TestRunAnalyze_JSONStdoutPurity: with --json + --fail-on
    matching, the entire stdout body parses as JSON. Verifies the
    gate message goes to the error channel (stderr via main.go),
    not into the JSON document. This is the test that uncovered
    the early-return bug above.

  Gate-passes inverse test
    TestRunAnalyze_GatePassesWhenSeverityAbsent: --fail-on=critical
    against a fixture whose worst severity is below critical
    returns nil. Locks in the false-positive-prevention property.

All three new e2e tests run via captureRun (existing helper at
cli_smoke_test.go:154) — function-level invocation, not exec
subprocess, so they're fast and don't shell out.

Verification:
  go build ./...               clean
  go test ./...                green
  go test ./internal/testdata/ green
  make docs-verify             green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundational deliverable for the Gate-pillar parity lift. Stable IDs
unblock three downstream features: suppressions
(`.terrain/suppressions.yaml` honoring per-finding entries),
`terrain explain finding <id>` round-trip, and
`--new-findings-only --baseline <path>` baseline-aware gating.

Format:
  {detector}@{normalized_path}:{anchor}#{hash}

Example: weakAssertion@internal/auth/login_test.go:TestLogin#a1b2c3d4

Where:
  detector       = signal type (e.g. "weakAssertion")
  normalized_path = forward-slash, repo-relative
  anchor         = symbol when present, "L<line>" otherwise, "_" when neither
  hash           = 8 hex chars derived from canonical form

Stability guarantees (documented in the FindingID field doc-comment
and the package-level comment in `internal/identity/finding_id.go`):

  * Same (Type, Location.File, Location.Symbol, Location.Line) →
    same FindingID across runs. Used for suppression matching and
    baseline-aware gating.
  * Symbol takes precedence as the anchor when present, so line
    drift WITHIN a symbol does not change the ID. This is the
    common case (whitespace edits, import reordering).
  * File rename or symbol rename produces a new ID. The underlying
    finding has moved; old suppressions don't apply.
  * Line drift WITHOUT a symbol changes the ID — known limitation
    documented; AST-anchored 0.3 work removes it.

What landed:

  internal/identity/finding_id.go (new, ~145 lines)
    BuildFindingID / ParseFindingID / MatchFindingID + helpers.
    Reuses GenerateID + NormalizePath from the existing identity
    package.

  internal/identity/finding_id_test.go (new, ~200 lines)
    16 table-driven tests covering: stability, shape, distinct on
    rename/file-move/detector-change, path normalization (back- vs
    forward-slash), line anchor when no symbol, placeholder when
    nothing, symbol-precedence-over-line invariant, line drift
    changes ID without symbol, parser round-trip + malformed
    rejection (8 cases), anchor-with-colons round-trip, MatchFindingID
    semantics.

  internal/models/signal.go
    New FindingID string field on models.Signal, tagged
    `json:"findingId,omitempty"`, with the stability documentation
    inline. omitempty keeps older snapshots valid; pre-existing JSON
    consumers that don't read the field are unaffected.

  internal/engine/finding_ids.go (new)
    assignFindingIDs(snapshot) — walks top-level Signals + per-file
    Signals, populates FindingID for any signal that doesn't already
    have one. Pre-set IDs are preserved (lets specialized detectors
    like detectorPanic emit their own anchor). Idempotent and
    nil-safe.

  internal/engine/pipeline.go
    Calls assignFindingIDs(snapshot) right after SortSnapshot in
    Step 10 of RunPipelineContext. Sort runs first so IDs land in
    canonical order; this preserves byte-identical SOURCE_DATE_EPOCH
    output.

  internal/engine/finding_ids_test.go (new)
    4 tests: top-level + per-file population, pre-set ID preserved,
    idempotency, nil-safe.
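
The assignment contract (populate-if-empty, preserve pre-set IDs, idempotent, nil-safe) can be sketched with simplified types; the placeholder ID derivation below stands in for the real identity-package call:

```go
package main

import "fmt"

// Signal is a reduced stand-in for models.Signal.
type Signal struct {
	Type      string
	File      string
	FindingID string
}

// assignFindingIDs populates FindingID for any signal that doesn't
// already have one. Pre-set IDs are preserved (specialized detectors
// may emit their own anchor); a second pass is a no-op; nil is safe.
func assignFindingIDs(signals []Signal) {
	for i := range signals {
		if signals[i].FindingID != "" {
			continue
		}
		signals[i].FindingID = signals[i].Type + "@" + signals[i].File
	}
}

func main() {
	sigs := []Signal{
		{Type: "weakAssertion", File: "a_test.go"},
		{Type: "panicInTest", File: "b_test.go", FindingID: "custom#id"},
	}
	assignFindingIDs(sigs)
	assignFindingIDs(sigs) // idempotent: second pass changes nothing
	fmt.Println(sigs[0].FindingID, sigs[1].FindingID)
}
```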

Why the model package isn't importing identity directly: keeps
`internal/models/` dependency-free (zero internal imports today).
Engine orchestrates detectors and is the natural place for
ID assignment; the dependency arrow runs engine → identity, which
matches the layering elsewhere.

Verification:
  go test ./internal/identity/ ./internal/engine/ — green
  go test ./... — green
  go test ./internal/testdata/ — green (goldens unchanged because
    FindingID is omitempty and the existing goldens don't assert
    its presence; 0.2.1 work updates goldens to include IDs)

Next:
  Track 4.5 — `.terrain/suppressions.yaml` minimal viable shape;
    suppressions match against FindingID
  Track 4.6 — `terrain explain finding <id>` lookup
  Track 4.7 — `terrain suppress <id>` writer
  Track 4.8 — `--new-findings-only --baseline <path>`

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…7.6/7.7) (#141)

Three small parity-axis lifts bundled into one PR. Each addresses a
named cell from the audit doc.

Track 5.2 — aiHallucinationRate framing
  The detector's name implies Terrain judges hallucinations directly.
  It does not — it reads hallucination-shaped failure metadata that
  the eval framework (Promptfoo / DeepEval / Ragas) has already
  computed and returns the rate. The launch-readiness review flagged
  this as overpromising.

  For 0.2.0: keep the type name `aiHallucinationRate` for back-compat
  (renaming the type touches every adapter, every test, every
  fixture) but tighten the manifest description + remediation so the
  trust framing is correct:

    Title:       "Eval-Flagged Hallucination Share"
    Description: explicitly notes that Terrain reads framework metadata,
                 does not judge hallucinations directly
    Remediation: tells the user to fix the eval scenario or raise the
                 threshold with documented justification when they
                 disagree with the framework's classification

  The actual type rename to `aiEvalFlaggedHallucinationShare` is 0.3
  work (deprecation alias + back-compat consumer migration).

  Lifts area 5 (AI risk + inventory) E4 (Stability) — the misleading
  name was the load-bearing concern there.

Track 7.6 — terrain init policy template
  Existing `generatePolicyYAML` already emits a commented template.
  Tightened to:
    - Reference the new docs/policy/examples/ files (Track 7.7) so
      users have a clear "copy this file" path instead of uncommenting
      the boilerplate one rule at a time.
    - Add inline comments per rule explaining what each does (was
      bare key/value pairs before).
    - Cleaner section split between "Core test-system rules" and
      "AI governance rules".

  Lifts area 10 (Policy / governance) P4 (Onboarding) from 2 to 3.

Track 7.7 — three example policies
  New files under docs/policy/examples/:

    minimal.yaml   — safe defaults for first-time adoption; every
                     rule warn-only, nothing blocks the build
    balanced.yaml  — recommended starting point for most teams;
                     blocks on critical AI regressions + safety gaps
                     + skipped tests; warns elsewhere
    strict.yaml    — mature-repo enforced-quality branch; blocks on
                     every high+ finding, zero accuracy-regression
                     tolerance

  Plus README.md that documents the adoption ramp (minimal → balanced
  → strict), pairs each policy with the right CLI invocation, and
  cross-links to vision.md / CONTRIBUTING.md / feature-status.md.

  Lifts area 10 (Policy / governance) P6 (Examples) from 2 to 3.

Pillar parity impact: this PR is a slice of the Understand-pillar
(via 5.2) + Gate-pillar (via 7.6 / 7.7) lift work toward the 0.2.0
release gate.

Verification:
  go test ./... — full suite green
  make docs-verify — manifest + rule docs regenerated and in sync
  Manual: read each new policy file end-to-end; the adoption ramp
  reads as one coherent story

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…it pointer (Tracks 8.4/8.5/8.6) (#142)

Three small deliverables that close the install → CI gate journey
the launch-readiness review flagged as missing.

Track 8.5 — `docs/examples/gate/github-action.yml`
  The ONE recommended GitHub Action config for 0.2.0. Drops into
  `.github/workflows/terrain-pr.yml` and gives adopters:

    - per-PR `terrain analyze --write-snapshot --json`
    - per-PR `terrain report pr --base ... --new-findings-only --baseline ...`
      posting a unified comment via the `body-includes` marker so
      successive runs update the same thread
    - SARIF upload to GitHub code scanning (Security tab)
    - **safe-default mode**: warn-only by default; --fail-on
      critical is one uncomment away
    - --new-findings-only --baseline baked in by default so
      adopters with existing debt don't brick CI on day one

  Concurrency group + cancel-in-progress so a force-push doesn't
  pile up runs. Permissions list documents what each step needs.

Track 8.6 — `docs/product/trust-ladder.md`
  The four-rung adoption path: Inventory → Warnings → CI annotations
  → Blocking gates. Each rung says what you do, what you get, what
  it doesn't do, and when to move up.

  The failure pattern this addresses: teams that jump from Rung 1
  to Rung 4 in one step see CI brick on day one against inherited
  debt. The ladder makes "see signals first, gate later" the
  recommended path, and the recommended config matches it.

  Cross-links: vision.md, feature-status.md, policy/examples/,
  github-action.yml. Closes the loop so an adopter who lands on
  any one of those docs can navigate to the rest.

Track 8.4 — `terrain init` CI pointer
  Existing `terrain init` walks through "next steps" (run analyze,
  generate coverage, generate runtime artifacts, edit policy). Added:

    - Step (n+1) "Wire Terrain into CI (warn-only by default):"
      with copy-this-file pointer to the github-action.yml template
      and a pointer to the trust ladder for which mode to run when.
    - Policy step now references the three starter policies
      (minimal/balanced/strict) instead of the implicit "uncomment
      stuff" workflow.

  The flow from `terrain init` to a working CI gate is now four
  bullet points instead of five separate doc trails.

Pillar parity impact: lifts area 12 (Distribution / install) P4
(Onboarding) from 2 → 3 (with the suspended Node 22 prominence
work being the remaining gap). All three deliverables also lift
"Examples" axes across multiple areas via the cross-cutting reach
of the recommended config + trust ladder.

Verification:
  go test ./... — full suite green
  go test ./internal/engine/ -run TestRunInit — all 9 init tests green
  Manual: read trust-ladder.md end-to-end; cross-references resolve

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…R comment (#145)

The three Track 3 deliverables that close the pitch's "maps how your
unit, integration, e2e, and AI tests actually relate to your code"
promise. Together they make the recommended workflow
(`terrain analyze && terrain report pr`) defensible against the
launch-readiness review's "this is uneven across test types" critique.

Track 3.3 — integration-test classification rigor
  Adds explicit content-based detection of HTTP-testing libraries
  (supertest, httptest, MockMvc, requests, etc.) so
  integration tests living alongside unit tests in flat directories
  are correctly classified. New
  `internal/testtype/integration_imports.go` with a curated allowlist
  of 30+ patterns spanning JS/TS, Go, Python, Java, Ruby ecosystems.
  New `refineIntegrationClassification` step in the analyzer reads
  each test file once via the existing FileCache and merges the
  content-based signal with the path/suite/framework-based signal,
  with explicit conflict-handling that lets explicit imports
  override directory naming. 14 new tests covering positive matches,
  prose-mention false-positive guards, multi-library co-detection,
  and the merge logic.
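
A reduced sketch of the merge rule, with a few entries from the allowlist; the naive substring match here stands in for the real matcher, which also guards against prose mentions of library names:

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative slice of the curated allowlist; the real list in
// internal/testtype/integration_imports.go spans 30+ patterns.
var integrationImports = []string{
	"supertest",         // JS/TS
	"net/http/httptest", // Go
	"MockMvc",           // Java
	"requests",          // Python
}

// refineClassification merges the content-based signal with the
// path/suite/framework-based one: an explicit HTTP-testing import
// overrides a unit-looking directory, because imports are a stronger
// signal than path naming. Function shape is an assumption.
func refineClassification(pathBased string, fileContent string) string {
	for _, imp := range integrationImports {
		if strings.Contains(fileContent, imp) {
			return "integration"
		}
	}
	return pathBased
}

func main() {
	src := `import request from "supertest"`
	fmt.Println(refineClassification("unit", src))
}
```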

Track 3.4 — e2e-to-code attribution honest carve-out
  New `docs/product/e2e-attribution.md` documents the exact limit
  of e2e attribution in 0.2: structural-only, via path co-location
  + framework-config declarations + shared-fixture transitive
  links + convention fallback. Explicitly carves out runtime trace
  ingestion, URL-to-route mapping, DOM-selector-to-component
  mapping, and cross-language attribution as 0.3+ work. The honest
  carve-out is the contract — the recommended-tests stanza tags
  e2e selections as `[structural-only]` so adopters know to
  inspect rather than trust blindly.

Track 3.5 — unified PR-comment rendering audit
  New `internal/changescope/unified_render_test.go` enforces the
  visual contract for `terrain report pr --format markdown`:
  bracketed `[LABEL]` badges across stanzas, `**\`path\`**` locator
  format, em-dash separator, and a single recommended-tests stanza
  for unit + integration + e2e (no per-type splitting). Plus a
  consistent-section-order test. Companion doc
  `docs/product/unified-pr-comment.md` documents the contract,
  the section-level severity-grouping exception for the AI stanza
  (and why), and the alternatives that were considered and
  rejected.

Verification: all internal + cmd tests pass; `make docs-verify`
green; new tests run in ~0.5s combined.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two Track 5 deliverables for the parity-gated 0.2.0 plan. Together
they raise the trust posture of the AI Risk Review surface — both
in how it's classified (Track 5.1) and how it behaves under
cancellation (Track 5.3).

Track 5.1 — AI risk subdivision (Inventory / Hygiene / Regression)
  Adds `internal/signals/ai_subdomain.go` with three trust tiers
  and a classification for every CategoryAI signal in the manifest:

    - Inventory  (Tier 1, publicly claimable): direct facts
      derived from declared AI surfaces — uncoveredAISurface,
      aiPromptVersioning, aiSafetyEvalMissing, capabilityValidationGap,
      phantomEvalScenario, aiPolicyViolation, untestedPromptFlow.

    - Hygiene    (Tier 2, visible but not gating-critical):
      heuristic structural patterns — aiPromptInjectionRisk,
      aiHardcodedAPIKey, aiToolWithoutSandbox, aiModelDeprecationRisk,
      aiFewShotContamination, contextOverflowRisk.

    - Regression (Tier 2, eval-data-dependent): fires only when
      eval-framework artifacts are present — every cost / latency /
      hallucination / retrieval / tool-routing / RAG-grounding
      signal across the airun catalog.

  Public helpers `AISubdomainOf`, `AISubdomainLabel`, and
  `AISubdomainTrustBadge` give renderers a single source of truth
  for tier vocabulary so PR comment, terminal report, and JSON all
  speak the same language. Drift gate test
  `TestAISubdomain_AllAISignalsClassified` fails CI if a new AI
  signal is added without a tier — closes the "silent dump into
  legacy umbrella stanza" failure mode.

  Companion doc `docs/product/ai-risk-tiers.md` documents the three
  tiers, the public-claim posture per tier, the gating contract
  (Tier 1 may be critical; Tier 2 caps at high), and the
  add-a-signal recipe.
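The classification-plus-drift-gate shape described above can be sketched roughly like this. Tier names follow the commit; the mapping contents and helper body are illustrative, not the real internal/signals API:

```go
package main

import "fmt"

// AISubdomain is the trust tier for a CategoryAI signal.
type AISubdomain int

const (
	TierInventory  AISubdomain = iota + 1 // Tier 1: publicly claimable
	TierHygiene                           // Tier 2: visible, not gating-critical
	TierRegression                        // Tier 2: eval-data-dependent
)

// aiSubdomains classifies every AI signal; a handful of entries shown.
var aiSubdomains = map[string]AISubdomain{
	"uncoveredAISurface":  TierInventory,
	"aiPromptVersioning":  TierInventory,
	"aiHardcodedAPIKey":   TierHygiene,
	"contextOverflowRisk": TierHygiene,
	// ...regression-tier eval signals elided...
}

// AISubdomainOf returns the tier for a signal; ok is false for an
// unclassified signal — the condition the drift-gate test fails CI on.
func AISubdomainOf(signal string) (AISubdomain, bool) {
	t, ok := aiSubdomains[signal]
	return t, ok
}

func main() {
	// Drift-gate shape: every AI signal in the catalog must classify.
	for _, sig := range []string{"uncoveredAISurface", "aiHardcodedAPIKey"} {
		if _, ok := AISubdomainOf(sig); !ok {
			fmt.Println("unclassified AI signal:", sig)
			return
		}
	}
	fmt.Println("all classified")
}
```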

Track 5.3 — ctx audit on AI detector file walk
  Adds `aidetect.DetectContext(ctx, root)` that respects ctx in
  the source-walk inner loop — checks `ctx.Err()` every 64 entries,
  aborts cleanly when cancelled. The pre-Track-5.3 shape silently
  ignored ctx, so a `terrain analyze --timeout 5s` run against a
  large repo with AI patterns would still wait for the AI walk to
  finish after ctx had been cancelled.

  Pipeline call site (`internal/engine/pipeline.go:413`) now uses
  DetectContext so cancellation propagates end-to-end. Existing
  `aidetect.Detect(root)` is preserved as a backwards-compatible
  wrapper that delegates to DetectContext(context.Background()).

  New `cancellation_test.go` proves the contract:
    - already-cancelled ctx returns within 250ms on a 200-file
      fixture (vs ~50ms+ without short-circuit)
    - mid-walk cancel after 20ms aborts within 1s on a 1000-file
      fixture (vs ~200ms+ without honoring)
    - Detect / DetectContext produce identical results on the
      same fixture (backwards-compat invariant)

Verification: 48 internal packages pass; cmd tests pass;
make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…147)

Three Track 7 deliverables that lift the Gate-pillar AI side from
"works on the happy path" to "works across the framework × version
matrix we claim to support, with honest drift signals."

Track 7.1 — Adapter conformance fixtures
  New `internal/airun/conformance/` package with shape-fixture
  tests covering each (framework × version) combination Terrain
  supports today:
    - Promptfoo v3 nested + v4 flat + missing-evalId variants
    - DeepEval 1.x camelCase + 1.x snake_case + bare-array variants
    - Ragas modern (samples + scores envelope) + legacy (bare array)
  Adding a new shape fixture is the documented extension recipe;
  the README at the package root spells it out.

Track 7.2 — Shape detection + warn-on-drift
  New `internal/airun/shape.go` introduces `ShapeInfo` (framework,
  detected version, version source, drift warnings) plus three
  detector functions — DetectPromptfooShape, DetectDeepEvalShape,
  DetectRagasShape. Each does a top-level envelope probe (no full
  payload parse), classifies the shape, and emits warnings on
  unfamiliar variants.

  Public helpers `ShapeInfo.HasWarnings()` and `FormatWarnings()`
  let downstream callers log a single per-run notice when an
  adapter is parsing an unfamiliar shape, before the detector
  chain consumes the result.
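A top-level envelope probe in the spirit described above might look like this — decode only the outer object's keys, never the full payload. The shape labels and function name are illustrative, not the real airun API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// probeEnvelope classifies an eval-results payload by its top-level
// shape alone; unfamiliar shapes are where drift warnings get emitted.
func probeEnvelope(payload []byte) string {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(payload, &top); err != nil {
		// Not an object — e.g. a bare array (legacy Ragas / DeepEval shape).
		var arr []json.RawMessage
		if json.Unmarshal(payload, &arr) == nil {
			return "bare-array"
		}
		return "unknown"
	}
	switch {
	case top["results"] != nil && top["evalId"] != nil:
		return "promptfoo-nested"
	case top["results"] != nil:
		return "promptfoo-missing-evalId"
	case top["samples"] != nil && top["scores"] != nil:
		return "ragas-modern"
	default:
		return "unfamiliar" // drift-warning territory
	}
}

func main() {
	fmt.Println(probeEnvelope([]byte(`{"evalId":"e1","results":{}}`))) // promptfoo-nested
	fmt.Println(probeEnvelope([]byte(`[{"score":0.9}]`)))              // bare-array
}
```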

Track 7.4 — End-to-end Promptfoo+Terrain CI walkthrough
  New `docs/examples/gate/ai-eval-ci/` directory with:
    - README.md spelling out the full installation → baseline →
      first-PR flow, plus what the example does NOT do
      (anti-goals up front)
    - github-action.yml — drop-in workflow that runs Promptfoo,
      feeds results into Terrain, and posts the unified comment
    - promptfoo.config.yaml + prompts/ + evals/ — minimal
      working scenario package an adopter can copy and edit

  The walkthrough's posture matches the parity plan's
  recommended-CI-config rule: --new-findings-only --baseline by
  default, --fail-on critical for blocking gates, hygiene findings
  visible but non-gating.

Verification: 8 new conformance tests pass; full airun package
green; make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
….6/10.8) (#149)

Two Track 10 deliverables that lift the visual / UX side of 0.2.0
from "every renderer invents its own empty-state message" to "one
designed vocabulary, regression-locked."

Track 10.6 — Empty state helpers
  New `internal/reporting/empty_states.go` defines an EmptyStateKind
  enum with seven shipped kinds (zero-findings, no-AI-surfaces,
  no-policy, first-run, no-impact, no-test-selection,
  no-migration-candidates) plus three render targets:
    - EmptyStateFor(kind) → structured EmptyState (Header + NextMove)
    - RenderEmptyState(io.Writer, kind) → terminal-text rendering
    - EmptyStateMarkdown(kind) → blockquote-callout markdown

  Each shipped kind carries a designed header and a next-move nudge
  that names a concrete `terrain ...` command the user can run.
  Three contract tests enforce the design rules:
    - every kind has a non-empty designed header
    - voice & tone — no exclamation marks, no emoji codepoints, no
      British spellings (Track 10.7 voice rules locked at the test
      level)
    - every shipped kind's next-move references a command in
      backticks (no purely invitational empty states)

  First integration: `internal/reporting/insights_report_v2.go`
  swaps its "No significant issues detected." line for the
  EmptyZeroFindings rendering with the next-move nudge.
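The kind-to-copy vocabulary can be sketched as follows. The kind names follow the commit; the header and next-move strings here are invented placeholders, not the shipped copy:

```go
package main

import "fmt"

// EmptyStateKind enumerates designed empty states (three of seven shown).
type EmptyStateKind int

const (
	EmptyZeroFindings EmptyStateKind = iota
	EmptyNoAISurfaces
	EmptyFirstRun
)

// EmptyState carries a designed header plus a next-move nudge that must
// name a concrete `terrain ...` command in backticks.
type EmptyState struct {
	Header   string
	NextMove string
}

var emptyStates = map[EmptyStateKind]EmptyState{
	EmptyZeroFindings: {"No findings in this run.", "Run `terrain report pr` to post the result."},
	EmptyNoAISurfaces: {"No AI surfaces detected.", "Run `terrain ai doctor` to check detection inputs."},
	EmptyFirstRun:     {"First analysis of this repository.", "Run `terrain analyze --write-snapshot` to set a baseline."},
}

// EmptyStateMarkdown renders the blockquote-callout markdown target.
func EmptyStateMarkdown(kind EmptyStateKind) string {
	s := emptyStates[kind]
	return fmt.Sprintf("> **%s**\n> %s", s.Header, s.NextMove)
}

func main() {
	fmt.Println(EmptyStateMarkdown(EmptyZeroFindings))
}
```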

Track 10.8 — Visual regression goldens (foundation)
  New `testdata/empty_state_goldens/<kind>.txt` — seven byte-for-
  byte golden files matching the seven shipped EmptyStateKind
  values. Companion test `TestEmptyState_Goldens` asserts byte
  equality on every kind. The drift gate
  `TestEmptyState_GoldensCoverEveryKind` verifies that the goldens
  directory matches the shipped enum 1:1 — adding a new kind
  without a golden, or vice versa, fails CI.

  Running `go test ./internal/reporting/... -update-empty-state-goldens`
  regenerates the goldens after intentional copy changes.

  This is the foundation for the broader Track 10.8 visual
  regression suite — once Track 10.2-10.4 (renderer migration to
  uitokens) lands, the same golden pattern extends to
  PR-comment markdown, SARIF tags, and HTML report screenshots.

Verification: 11 new tests pass; full Go test suite green; no
regression in existing renderers.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ure (Tracks 9.9/9.11) (#150)

Two Track 9 (engineering hardening) deliverables. Both are pure
test additions — no behavior changes — and both raise the
adopter-trust posture against scenarios the existing test suite
didn't exercise.

Track 9.9 — Adversarial filesystem suite
  New `internal/analysis/adversarial_fs_test.go` exercises the
  analyzer against 8 deliberately weird filesystem inputs that
  real repositories surface but synthetic fixtures usually don't:
    - binary file with .ts extension (misnamed asset / compiled
      artifact)
    - oversize source file (~2MB; tests size-skip threshold)
    - UTF-8 BOM at file start (Windows-edited files)
    - NUL bytes mid-content (transpiler / minifier output)
    - 0-byte test file (developer commits before filling in)
    - nested .git directories (submodules) — load-bearing
      assertion that .git contents never leak into inventory
    - 50-level deep directory nesting (skipped on Windows due
      to path-length limits)
    - 180-char filename (long-but-legal, tests filesystem
      assumptions)

  Contract: every test asserts Analyze completes without panic /
  hang / OOM. Some tests additionally assert that legitimate
  inputs survive (binary-poisoned tree must not lose the real
  test file; nested .git contents must never leak).

  Skipped (out of scope, with rationale):
    - Symlink loops: behavior differs across platforms; add
      per-platform tests when an adopter hits one.
    - Permission-denied: hard to set up portably; manual smoke
      verifies the walker silently skips.

Track 9.11 — Schema migration fixture
  New `internal/models/testdata/snapshot_v0_1_x_legacy.json` —
  hand-crafted JSON snapshot in the shape that 0.1.x actually
  wrote: schema version field absent, no SignalV2 envelope, no
  UnitID on code units, simpler snapshotMeta.

  New `internal/models/migrate_fixture_test.go` with 3 tests:
    - LoadLegacyFixture — load via Unmarshal + migrate via
      MigrateSnapshotInPlace; assert SchemaVersion stamped,
      generatedAt backfilled, UnitIDs backfilled (incl. parent-
      qualified case), compatibilityNotes in Metadata
    - LegacyFixtureDataPreserved — every field present in fixture
      (frameworks, test files, signals) survives intact through
      migration
    - FixtureRoundTripsViaJSON — migrated snapshot can be
      re-serialized + re-loaded + re-migrated near-idempotently
      (regression guard for the byte-identical determinism
      contract that `terrain analyze --write-snapshot` depends
      on)

Verification: 8 adversarial-FS tests + 3 fixture migration tests
pass; full Go test suite green; no regression in existing tests;
make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the canonical 3-repo convergence example for the Align pillar:
docs/examples/align/multirepo/. Companion to the Track 6.1 manifest
format (in PR #148) and the alignment-first migration framing
(also #148).

The example walks through Acme Corp's three Node services on
partially-divergent test stacks, declaring a portfolio manifest,
and the convergence sequence the cross-repo aggregator will
recommend when it lands in 0.2.x. Until the aggregator binary
ships, the example is illustrative — the README documents the
*shape* of the expected output so:
  - the contract is locked before the implementation chases it
  - adopters who hand-write a repos.yaml today learn the file
    format that 0.2.x will consume unchanged
  - the alignment-first vs health-first sequencing rule is
    documented as the contract, not a runtime detail

Files:
  - README.md          full convergence story + expected output shape
  - .terrain/repos.yaml  runnable manifest matching the story
  - snapshots/README.md  notes on snapshotPath: future expansion

Status posture matches the parity plan: Track 6 is parallel and
partial-ship-OK in 0.2.0; Align is the secondary pillar; multi-
repo is explicitly Tier 3 / experimental until 0.2.x ships the
aggregator. This example is the bridge that locks the contract
between today's manifest format and tomorrow's binary output.

Verification: make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the missing axis to the bench surface: existing
BenchmarkFullAnalysis_* measure CPU but never fail on memory
regressions. Real adopter complaints take the shape "Terrain ate
4 GB on my monorepo," not "Terrain was slow"; this suite plugs
the gap.

Two categories:

Allocation benchmarks (Benchmark*_Memory)
  Wrap the existing analysis benches with b.ReportAllocs() so
  bytes/op + allocs/op surface as regression-comparable
  baselines. Run via:
    go test -bench Memory ./internal/analysis/...

Ceiling tests (TestMemoryCeiling_*, TestMemoryNoLeak_*)
  Run analysis at known scales and assert peak heap growth stays
  under a configured ceiling. Three tests:
    - 1k files: ceiling 250 MB (current observed: ~177 MB)
    - 5k files: ceiling 1300 MB (current observed: ~1050 MB)
    - 5-iter repeated analysis: ceiling 2000 MB (current
      observed: ~1500 MB across 5 iterations on a 500-file fixture)

Skipped by default
  Ceiling tests are gated on TERRAIN_MEMORY_BENCH=1 — they're
  expensive (force GCs, run analysis at scale) and surface ceiling
  regressions per the Track 9.10 baseline rather than smoke
  failures. The new `make memory-bench` target sets the env var
  for you. The default `go test ./...` loop is unaffected.
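The env-gated ceiling pattern can be sketched like this — measure heap growth across a workload and compare against a ceiling. The workload and numbers below are toy stand-ins, not the real analysis fixtures:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// heapGrowthMB runs work between two forced-GC heap measurements and
// returns the retained growth in MB.
func heapGrowthMB(work func()) float64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	work()
	runtime.GC()
	runtime.ReadMemStats(&after)
	return float64(after.HeapAlloc-before.HeapAlloc) / (1 << 20)
}

func main() {
	// Skipped by default, exactly like the ceiling tests described above.
	if os.Getenv("TERRAIN_MEMORY_BENCH") != "1" {
		fmt.Println("skipped: set TERRAIN_MEMORY_BENCH=1 to run")
		return
	}
	held := make([][]byte, 0, 64)
	grew := heapGrowthMB(func() {
		for i := 0; i < 64; i++ {
			held = append(held, make([]byte, 1<<20)) // retain ~64 MB
		}
	})
	fmt.Printf("heap growth: %.0f MB (toy ceiling 250 MB, ok=%v)\n", grew, grew < 250)
	_ = held
}
```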

Track 9.10 follow-up: the leak test reports unexpectedly high
growth (~1.5 GB across 5 iterations on a 500-file fixture) — much
higher than the FileCache amortization should produce. Comment in
the test names this as the leading hypothesis (something in the
per-run allocation graph holds onto data the cache should
amortize) and points at where to look. Investigation is its own
work; the ceiling here catches regressions BEYOND the current
state, so a future fix that lowers actual growth will pass with
large headroom — at that point the ceiling should ratchet down.

Verification: all three ceiling tests pass under
`make memory-bench`; default `go test ./...` skips them; existing
benchmarks unaffected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the per-detector budget mechanism that protects analyze runs
from any single hung detector blocking the whole pipeline.

The mechanism

- New `DetectorMeta.Budget time.Duration` field. Zero means "use
  DefaultDetectorBudget" (30 seconds). Detectors with legitimate
  long-running work set this explicitly.
- New `safeDetectWithBudget(reg, fn)` wrapper composes with the
  existing safeDetect panic-recovery: a panicking detector still
  produces the detectorPanic marker; a slow detector produces the
  new detectorBudgetExceeded marker.
- All call sites in detector_registry.go (Run + RunWithGraph
  Phase 1/2/3 paths) routed through the budget wrapper. Pre-Track-
  9.4 a hung detector would block the goroutine waiting on
  wg.Wait(); now the budget elapses first and the wait completes.
- New SignalDetectorBudgetExceeded type registered in the manifest
  + signal catalog so ValidateSnapshot accepts the marker (same
  posture as detectorPanic — without the catalog entry, a single
  budget overrun would invalidate the whole snapshot).

Behavior contract

When a detector exceeds its budget, the pipeline returns the
budget-exceeded marker instead of waiting for the eventual
completion. The detector goroutine completes in the background
(Go has no goroutine kill primitive); its post-budget signals are
discarded. This is the right trade-off for the failure modes the
budget targets: runaway regex, accidentally-O(n²) graph walks,
blocking I/O on a slow filesystem.

Coverage

Five new tests in detector_budget_test.go:
  - budget exceeded → marker returned within budget window
  - fast detector → original signals returned
  - zero budget → DefaultDetectorBudget applied
  - panic + budget compose correctly (detectorPanic wins)
  - registry-level integration: r.Run() returns within budget
    even when a registered detector deliberately sleeps past it

Plus the regenerated docs/rules/engine/detector-budget.md doc.

Verification: all 5 budget tests pass; full Go test suite green;
make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rix (#144)

Bundle of three Track 7 / Track 8 deliverables for the parity-gated
0.2.0 release plan.

Track 7.3 — AI execution trust-boundary doc
  Adds docs/product/ai-trust-boundary.md spelling out what Terrain
  executes vs. what it parses, where the LLM call actually happens
  (inside the eval framework, not Terrain), the per-command trust
  surface for `analyze`, `ai list/doctor`, `ai run`, and
  `ai run --ingest-only`, plus the 0.2 → 0.3 sandboxing roadmap.
  Closes the "is this safe?" question that the launch-readiness
  review flagged as insufficiently documented.

Track 8.1 — Node 22 prominence
  Documents the Node 22 npm-path requirement up front in README and
  docs/user-guides/getting-started.md, with explicit brew /
  `go install` alternatives for CI images on Node 20 LTS. Avoids
  the silent install-time failure mode the review caught.

Track 8.2 — Release-smoke matrix expansion
  Extends the post-publish smoke job from linux/amd64-only to also
  cover darwin/arm64 (Apple Silicon — the modern Mac default) and
  windows/amd64 (the most likely Windows shape). Matrix uses the
  POSIX tar.gz extract path on Linux/macOS and a pwsh
  Expand-Archive path on Windows; both verify
  `terrain version --json` reports the tagged version. Catches
  per-platform "wrong build / wrong version string" regressions
  that previously could only surface after a user installed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…racks 3.1/3.2) (#157)

Two of the five "load-bearing centerpiece" Track 3 deliverables that
turn the unified pitch into something the binary actually delivers.

Track 3.1 — `terrain report pr --fail-on <severity>`
  Defends the pitch claim "Gate changes based on that system as a
  whole." Pre-fix, --fail-on existed only on `analyze`; the gating
  flow (`analyze && report pr`) silently lost the gate at the second
  step.

  Implementation reuses the same severityGateBlocked helper that
  PR #134 introduced for analyze. Added prSeverityBreakdown(severities
  []string) that converts a PR's NewFindings + AI BlockingSignals
  into the same SignalBreakdown shape `analyze.SignalSummary` uses,
  so the gate decision logic is shared, not duplicated.

  Wired through both the legacy `terrain pr` command (deprecated
  alias of `terrain report pr`) and the canonical `terrain report
  pr` namespace dispatcher.

  Same render-then-gate pattern as analyze: every output format
  (json, markdown, comment, annotation, default text) renders
  before the gate decision returns through the error channel. So
  `--json --fail-on=high` produces a valid JSON document on stdout
  AND exits with code 6 if the gate fired — the property the launch-
  readiness review's "JSON stdout purity" gate test asks for.

  Tests:
    - TestPRSeverityBreakdown: empty / mixed bag / case-insensitive
      / unknown-severities-dropped-silently table
    - cli_smoke_test.go updated for the new runPR signature
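The breakdown-then-gate behavior the test table pins down can be sketched as follows. The real SignalBreakdown struct and severityGateBlocked helper are richer; the field set and decision table here are simplified assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// SignalBreakdown tallies findings per severity.
type SignalBreakdown struct {
	Critical, High, Medium, Low int
}

// prSeverityBreakdown counts case-insensitively; unknown severities are
// dropped silently, per the test table above.
func prSeverityBreakdown(severities []string) SignalBreakdown {
	var b SignalBreakdown
	for _, s := range severities {
		switch strings.ToLower(s) {
		case "critical":
			b.Critical++
		case "high":
			b.High++
		case "medium":
			b.Medium++
		case "low":
			b.Low++
		}
	}
	return b
}

// gateBlocked is a simplified stand-in for the shared gate decision:
// --fail-on=<severity> fires on that severity or anything above it.
func gateBlocked(b SignalBreakdown, failOn string) bool {
	switch strings.ToLower(failOn) {
	case "critical":
		return b.Critical > 0
	case "high":
		return b.Critical > 0 || b.High > 0
	default:
		return false
	}
}

func main() {
	b := prSeverityBreakdown([]string{"HIGH", "high", "bogus", "critical"})
	fmt.Println(b.Critical, b.High, gateBlocked(b, "high")) // 1 2 true
}
```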

Track 3.2 — `terrain report impact --explain-selection`
  Defends the pitch claim "See which tests matter for a PR — and
  why." Pre-fix, `report impact` showed selected tests but not the
  reason chains; the "and why" half of the pitch wasn't deliverable.

  Implementation reuses the existing internal/explain.ExplainSelection
  + reporting.RenderSelectionExplanation (already shipping for
  `terrain explain selection`). When --explain-selection is set,
  runImpact computes the selection explanation and renders it
  with verbose=true so per-test evidence (selection reasons, code
  unit matches, confidence) appears.

  Wired through both `terrain impact` (legacy) and `terrain report
  impact` (canonical). --json + --explain-selection emits the
  SelectionExplanation JSON structure for tooling consumption.

Pillar parity impact: Track 3.1 + 3.2 are the centerpiece work that
the plan calls "highest-priority track — every pitch claim must be
directly verifiable in the CLI output before 0.2.0 ships." This PR
closes two of the five Track 3 items; 3.3 (integration-test
classification rigor), 3.4 (E2E attribution), and 3.5 (unified
PR-comment audit) are separate follow-ups.

Verification:
  go build ./...                clean
  go test ./cmd/terrain/        green (4 new TestPRSeverityBreakdown
                                cases; existing TestCLISmoke_PRCommand
                                updated for new signature)
  go test ./...                 full suite green
  go test ./internal/testdata/  golden + CLI suite green

Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md
(Tracks 3.1 / 3.2).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…2) (#151)

Two release-readiness deliverables that lift the contract Terrain
makes with adopters: links in user-facing docs go where they say
they go, and JSON fields carry an explicit stability promise.

Track 9.8 — Make docs-linkcheck
  New `cmd/terrain-docs-linkcheck/main.go` walks docs/ and verifies
  that every `[label](relative.md)` link resolves to a real file.
  Skips external (http/https/mailto), same-page anchors (#foo),
  and — by default — the docs/internal/ + docs/legacy/ subtrees
  whose link discipline is inherited debt (run with
  `-include-internal` to scan them anyway).

  Resolves directory links to <dir>/README.md when present;
  reports "directory link with no README.md" otherwise.

  Wired into the Makefile as `make docs-linkcheck`. Same posture
  as `make docs-verify` — fails with `::error::N broken links`
  + per-link source:line:target output suitable for CI annotation.

  Initial run on the public docs surface caught 3 real broken
  links (README.md → legacy/, quickstart.md → examples/,
  release-notes.md → ../architecture/) — all fixed in this PR by
  pointing at concrete files. The remaining 39 links inside
  docs/internal/planning/ are pre-0.2 inherited debt, deliberately
  skipped pending a separate cleanup PR.

Track 9.12 — Schema field stability tiers
  New `docs/schema/FIELD_TIERS.md` documents the three-tier
  contract Terrain commits to per JSON field:

    Stable  — name + type + semantics won't change without a
              major schema bump; safe for long-lived tooling
    Beta    — name + type stable for the next minor; semantics
              may evolve as calibration corpora arrive
    Internal — diagnostic / debug fields; treat as scratch space

  Spells out concrete examples per tier (snapshotMeta.schemaVersion
  is stable; signals[].confidence is beta; metadata.diagnostics.*
  is internal); names the precedence order for telling a field's
  tier (name pattern → x-terrain-tier annotation → this page);
  defines the promotion path internal → beta → stable.

  Adopter guidance section: pin the schemaVersion, defensively
  read beta fields with fallbacks, never branch on internal.

  `docs/schema/README.md` links to the new page so adopters reach
  the contract via the canonical schema entry point.

Verification: `make docs-linkcheck` passes; `make docs-verify`
clean; `go build ./cmd/terrain-docs-linkcheck/` clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These keep getting committed unintentionally during PR rebases when
`go run` materializes a binary in the repo root and `git add -A`
sweeps it up. Ignoring them at the source so the merge wave doesn't
keep churning through amend-and-force-push cycles to remove them.

Two related defects in `terrain serve`, both flagged by the launch-
readiness review and verified in the codebase:

1. **Request context not wired.** The HTTP handlers ignored
   r.Context() and called engine.RunPipeline (which wraps
   context.Background()), so a client disconnect during a long
   analysis left the analysis running in the background with no
   handler waiting on it.

2. **Mutex-blocking analysis cache.** getResult held Server.mu via
   defer-Unlock for the full analysis duration. One slow analysis
   serialized every other request needing a pipeline result behind a
   single goroutine, regardless of whether the cache was warm enough
   for them.

Both ship together because the right fix to (2) replaces the cache
mutex with an RWMutex + singleflight, which also gives us a clean
seam for (1):

  * Fast path: RWMutex.RLock so warm-cache hits don't contend.
  * Slow path: singleflight.Group.DoChan dedups concurrent in-flight
    analyses (one analysis per cache window, even with N waiters).
  * Per-caller cancellation: each handler threads r.Context() through
    getResult; the select on (ch | ctx.Done()) returns ctx.Err()
    immediately on disconnect. The shared analysis runs with
    context.Background() so a single caller's disconnect doesn't kill
    work other waiters depend on.

Tradeoff documented inline: a single-waiter request whose context is
canceled won't (yet) cancel the underlying analysis. Reference-
counting waiters is on the 0.3 list.

Tests added:
  * TestGetResult_CacheHit — fast path returns cached pointer
  * TestGetResult_RespectsCanceledContext — pre-canceled context
    returns context.Canceled within 2s rather than blocking on
    analysis (pre-fix this hung until pipeline completion)
  * TestGetResult_ConcurrentCallsShareCache — 50 concurrent callers
    on a warm cache observe the same report pointer

Dep: golang.org/x/sync v0.10.0 (singleflight). Pinned to v0.10.0
because v0.20.0 bumps the go directive to 1.25; v0.10.0 is
compatible with the existing go 1.23.

Server-package doc-comment refreshed to describe the new
concurrency model and remove the now-fixed "known issues" block
that was added in PR #131.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF and others added 17 commits May 4, 2026 16:52
…ration + tier badges (#148)

Three Track 6 deliverables that establish the Align pillar's 0.2.0
foundation. Track 6 is intentionally parallel and partial-ship-OK
per the parity plan (Align is the secondary pillar; floor ≥ 3
soft); these three deliverables are the parts that lift Align
without requiring the full multi-repo aggregator. The aggregator
(Tracks 6.2 / 6.3) lands in 0.2.x once the manifest format ships
here.

Track 6.1 — Multi-repo manifest format
  New `internal/portfolio/manifest.go` with `RepoManifest` and
  `RepoEntry` types matching the parity plan's `.terrain/repos.yaml`
  schema: per-repo path or pre-saved snapshot, optional owner,
  frameworks-of-record declaration, free-form tags. Loader +
  validator + path resolution helpers.

  Validation enforced (rather than trusting YAML schema):
    - schema version is the only currently-supported value (1)
    - non-empty repos list
    - per-repo non-empty Name + at least one of Path / SnapshotPath
    - unique Name across the manifest
    - error messages cite the repo position so adopters can find
      the bad entry without grepping

  12 new tests covering the canonical happy path plus every
  validation rule; ResolveRepoPath / ResolveSnapshotPath helpers
  for the aggregator that lands in 0.2.x.
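The positional validation described above can be sketched as follows. The types mirror the commit's RepoManifest / RepoEntry description, but the exact field set and message wording here are assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

type RepoEntry struct {
	Name         string
	Path         string
	SnapshotPath string
}

type RepoManifest struct {
	SchemaVersion int
	Repos         []RepoEntry
}

// Validate enforces the manifest rules; errors cite the repo position
// so adopters can find the bad entry without grepping.
func (m *RepoManifest) Validate() error {
	if m.SchemaVersion != 1 {
		return fmt.Errorf("unsupported schema version %d (only 1 is supported)", m.SchemaVersion)
	}
	if len(m.Repos) == 0 {
		return errors.New("manifest declares no repos")
	}
	seen := map[string]bool{}
	for i, r := range m.Repos {
		if r.Name == "" {
			return fmt.Errorf("repos[%d]: name is required", i)
		}
		if r.Path == "" && r.SnapshotPath == "" {
			return fmt.Errorf("repos[%d] (%s): need path or snapshotPath", i, r.Name)
		}
		if seen[r.Name] {
			return fmt.Errorf("repos[%d]: duplicate name %q", i, r.Name)
		}
		seen[r.Name] = true
	}
	return nil
}

func main() {
	m := &RepoManifest{SchemaVersion: 1, Repos: []RepoEntry{
		{Name: "billing", Path: "../billing"},
		{Name: "billing", Path: "../billing-v2"},
	}}
	fmt.Println(m.Validate()) // repos[1]: duplicate name "billing"
}
```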

Track 6.5 — Migration alignment-first reframe
  New `docs/product/alignment-first-migration.md` documents the
  framing shift the launch-readiness review surfaced: most teams
  care about *aligning* (declare a framework of record per repo,
  see drift, converge gradually), not just *converting*. Conversion
  is one tool inside convergence, not the headline.

  Spells out single-repo and multi-repo flows end-to-end; pins the
  per-direction tier matrix (Stable / Experimental / Preview /
  Cataloged); names anti-goals (we don't auto-convert, don't pick
  the "best" framework, don't block convergence on calibration).

Track 6.6 — Per-direction tier badges in `migrate list`
  New `tierLabelForState` helper renders GoNativeState as the
  Tier-badge vocabulary used elsewhere in 0.2 (Stable / Experimental
  / Preview / Cataloged). `terrain migrate list` and
  `terrain migrate shorthands` now surface tier badges instead of
  raw state strings.

  `humanizeGoNativeState` is preserved so any other call sites that
  depend on the raw state string don't break — only the user-facing
  list renderers switched. Tier label test locks the mapping;
  renaming a label is a public-facing change and the test surfaces
  the dependency.

Verification: full Go test suite green; make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Tracks 9.1/9.3) (#155)

Two paired Track 9 deliverables that lift detector self-description
from "what does it emit" to "what does it consume" — and surface
input gaps as visible diagnostics instead of silent zero-output.

Track 9.1 — Capability metadata extension
  Adds five new fields to DetectorMeta:

    RequiresRuntime       — needs RuntimeStats from runtime artifact
                            ingestion (--runtime junit.xml / jest.json)
    RequiresBaseline      — needs Baseline snapshot pointer (--baseline)
    RequiresEvalArtifact  — needs EvalRuns from Promptfoo / DeepEval
                            / Ragas adapter ingestion
    ContextAware          — honors ctx.Err() in inner loops (descriptive
                            today; surfaced in `terrain doctor`
                            cancellation posture)
    Experimental          — detector implementation not yet stable
                            (distinct from manifest-level signal status)

  All zero-default; existing detectors continue working unchanged.
  New detectors that genuinely consume these inputs declare them
  via the metadata so the missing-input check knows what to flag.

Track 9.3 — Missing-input diagnostics
  New `safeDetectChecked(reg, snap, fn)` — the registry's canonical
  detector-invocation path. Pre-Track-9.3 a runtime-needing detector
  on a no-runtime snapshot would silently emit zero signals;
  adopters had no way to know whether the detector ran-and-found-
  nothing or ran-but-was-blind.

  When `missingInputs(meta, snap)` returns non-empty, the helper
  returns a single SignalDetectorMissingInput marker per affected
  detector, with the explanation listing every flag the user needs
  to add (Oxford-comma joined). The actual detector body is
  skipped — no panic, no waste.

  All call sites in detector_registry.go (Run + RunWithGraph
  Phase 1/2/3) routed through safeDetectChecked. The check
  composes with safeDetect's panic recovery: a panicking detector
  with sufficient inputs still produces detectorPanic; a panicking
  detector with missing inputs is shielded by the early-return.

  New SignalDetectorMissingInput type registered in the manifest
  + signal catalog so ValidateSnapshot accepts the marker (same
  posture as detectorPanic).
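The Oxford-comma join the diagnostic uses can be sketched as below; joinInputNames is named in the coverage list, but this body is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// joinInputNames joins flag names for the missing-input explanation,
// with the Oxford comma on three or more items.
func joinInputNames(names []string) string {
	switch len(names) {
	case 0:
		return ""
	case 1:
		return names[0]
	case 2:
		return names[0] + " and " + names[1]
	default:
		return strings.Join(names[:len(names)-1], ", ") + ", and " + names[len(names)-1]
	}
}

func main() {
	fmt.Println(joinInputNames([]string{"--runtime", "--baseline", "--promptfoo-results"}))
	// --runtime, --baseline, and --promptfoo-results
}
```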

Coverage

7 new tests (missing_input_test.go):
  - happy path (detector runs when no inputs required)
  - missing runtime → diagnostic with --runtime flag named
  - runtime present (RuntimeStats on TestFile) → detector runs
  - missing baseline → diagnostic with --baseline flag named
  - missing eval artifact → diagnostic with promptfoo-results
    flag named
  - multiple missing → one diagnostic listing all three (Oxford)
  - joinInputNames covers 0/1/2/3+/4 cases including the Oxford
    comma fix

Verification: full Go test suite green; make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Track 9.7) (#156)

Adds the Track 9.7 release-readiness gate: cross-checks
docs/release/feature-status.md against the canonical signal
manifest. Drift between what the curated doc promises and what the
engine actually ships is the failure mode adopters notice when they
evaluate the binary against the marketing claim.

The gate

- New `cmd/terrain-truth-verify/main.go` walks
  docs/release/feature-status.md, extracts every backtick-delimited
  camelCase signal reference between the "Detectors / signal
  types" anchor and the "Planned" subsection, and verifies each
  resolves to a real entry in internal/signals/manifest.go.
- Camel-case constraint (lowercase start + at least one mid-word
  uppercase) excludes CLI verbs (`report`, `eval`, `policy`) and
  English filler words from false-positive matches.
- Planned subsection excluded by design — references there name
  signals that *don't* yet have a code-side implementation;
  flagging them would invert the signal.
- Engine self-diagnostic signals (detectorPanic,
  detectorBudgetExceeded, detectorMissingInput, suppressionExpired)
  excluded from the orphan check — they're documented inline
  alongside the mechanisms that emit them, not in the curated
  signal table.
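The camel-case constraint can be sketched as a single regex. This is an illustrative reconstruction of the extraction rule described above (backtick-delimited, lowercase start, at least one mid-word uppercase), not the shipped pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

// signalRef: a backtick-delimited token that starts lowercase and
// contains at least one mid-word uppercase letter. All-lowercase
// tokens (CLI verbs, filler words) never match.
var signalRef = regexp.MustCompile("`([a-z][a-z0-9]*[A-Z][A-Za-z0-9]*)`")

func extractSignalRefs(doc string) []string {
	var refs []string
	for _, m := range signalRef.FindAllStringSubmatch(doc, -1) {
		refs = append(refs, m[1])
	}
	return refs
}

func main() {
	doc := "Stable: `weakAssertion` and `aiCostRegression`; verbs like `report` and `eval` are skipped."
	fmt.Println(extractSignalRefs(doc)) // [weakAssertion aiCostRegression]
}
```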

Wired into Makefile as `make truth-verify`. Same posture as
`make docs-verify` and `make pillar-parity` — fails with
`::error::N broken signal reference(s)` + per-signal output
suitable for CI annotation.

What it caught

Initial run on the current docs surfaced one real broken
reference: `duplicateCluster` was listed as a "stable signal" in
the table but isn't a `Signal` type at all — duplicate-cluster
analysis surfaces via `DuplicateClusters` in the analyze report
rather than through the signal mechanism. Doc updated to clarify
the distinction.

Plus 15 advisory orphans (stable signals in the manifest that the
curated doc doesn't mention by name). The doc explicitly says it's
a "curated view" with the manifest as the source of truth, so
these are acceptable today; --strict-orphans escalates them to
hard failures for adopters who want full coverage.

Out of scope today (per the package comment)

  - README command list ⊆ dispatcher: needs Track 9.6 registry
    refactor first
  - CHANGELOG promotion-claim cross-check: per-signal status is
    already manifest-driven, so docs-verify catches it
  - CI matrix ⊆ compatibility tier doc: distinct failure mode in
    workflow YAML rather than markdown

Coverage

5 new unit tests in main_test.go: happy-path extraction, no-anchor
returns empty, planned subsection excluded, all-lowercase tokens
rejected (regression guard against the false-positive class that
caught report / eval / policy on the first run), engine-diagnostic
classifier.

Verification: full Go test suite green; make truth-verify exits 0
on the current docs surface; make docs-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#158)

Brings forward what the prior plan deferred to 0.3: a working
suppression model in 0.2.0 so adopters can adopt strict CI gating
without forking the project. Builds on Track 4.4 (stable finding
IDs) — most suppressions match by FindingID; a (signal_type, file
glob) fallback covers class-wide waivers.

Schema:

  schema_version: "1"
  suppressions:
    - finding_id: weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4
      reason: false positive; sanitized upstream
      expires: 2026-08-01
      owner: "@platform"

    - signal_type: aiPromptInjectionRisk
      file: internal/legacy/**
      reason: rewriting in 0.3
      expires: 2026-09-01

Match modes (an entry uses exactly one):

  * `finding_id` exact match — most precise; survives line drift
    when the underlying signal has a stable symbol per
    BuildFindingID semantics
  * `signal_type` + `file` glob — coarser; supports `**`-style
    recursive patterns. Useful for class-wide waivers.
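A sketch of the glob mode's matching rule, under stated assumptions: standard-library `path.Match` does not support `**`, so a recursive suffix is handled here as a prefix check. The real helper's handling of `**` in other positions may differ:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pathGlobMatch: plain patterns use path.Match (Unix semantics on
// every host OS); a trailing "/**" matches any path under the prefix.
func pathGlobMatch(pattern, p string) bool {
	if strings.HasSuffix(pattern, "/**") {
		prefix := strings.TrimSuffix(pattern, "/**")
		return p == prefix || strings.HasPrefix(p, prefix+"/")
	}
	ok, err := path.Match(pattern, p)
	return err == nil && ok
}

func main() {
	fmt.Println(pathGlobMatch("internal/legacy/**", "internal/legacy/db/conn.go")) // true
}
```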

Anti-goal: suppressions are NOT a free-form ignore-everything
switch. The schema rejects entries that satisfy neither mode and
entries missing `reason` (every suppression must justify itself).

Lifecycle:

  * `reason` required — printed when a suppressed signal would
    otherwise have been blocking, so reviewers see the rationale
    in PR comments without opening the YAML.
  * `expires` optional ISO 8601 date. After the date, the
    suppression is INVALID — the underlying signal fires again,
    and a new `suppressionExpired` warning signal surfaces in
    the report so silent rot doesn't accumulate.
  * `owner` optional free-text owner pointer for review.

Engine wiring:

  * `internal/suppression/` package — Load + Apply + path-glob
    helpers. 9 unit tests covering load validation, expiry,
    finding-id match, signal-type+glob match, idempotency, nil-
    safety.
  * `internal/engine/pipeline.go` — Step 10c after FindingID
    assignment: load `.terrain/suppressions.yaml` (or
    PipelineOptions.SuppressionsPath override), apply matched
    entries, surface expired entries as warning signals.
  * `PipelineOptions.SuppressionsPath` for `terrain analyze
    --suppressions <path>`.
  * 5 engine integration tests: drops matching signal, expired
    emits warning + lets signal fire, missing file is a no-op,
    malformed file logs and continues (don't fail pipeline on a
    fat-fingered YAML edit), override path honored.

Manifest:

  * New `suppressionExpired` signal type, governance category,
    medium severity, evidence-strong (it's a deterministic check).
    Registered in `internal/signals/manifest.go` and
    `internal/models/signal_catalog.go`. Rule doc auto-generated
    via cmd/terrain-docs-gen.
  * No new detector — pipeline emits the signal directly.

What's NOT in this PR (follow-ups):

  * Track 4.6: `terrain explain finding <id>` — round-trips an ID
    back to the underlying signal + suggests a suppression command
  * Track 4.7: `terrain suppress <id>` — writes a suppression
    entry to the YAML with goccy/go-yaml round-trip preservation
  * Track 4.8: `--new-findings-only --baseline <path>` — uses the
    same FindingID set to filter signals against a baseline

Verification:
  go test ./internal/suppression/ — 9 tests green
  go test ./internal/engine/ — 5 new integration tests green
  go test ./... — full suite green
  make docs-verify — manifest + severity rubric + rule docs in sync

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates docs/release/parity/scores.yaml to reflect what the merge
wave delivered. 58 cell lifts applied via scripts/parity-lift.py
(committed for reproducibility — running it is idempotent), each
tied to the merged PR(s) that delivered the evidence. The dashboard

now shows the accurate post-merge state, not the pre-merge baseline.

What lifted:
  - V1 across nearly every area: uitokens (#136) shipped
  - V3 across multiple areas: empty-state helpers (#149)
  - E3 / E7 on detector-related areas: Track 9.1/9.3/9.4 + 5.3
  - P3 / P6 broadly: Track 1/5/6/7/9 doc + example deliverables
  - P1 / P4 on pr_change_scoped + policy_governance: Track 3 + 4
    + 7 + 8 deliverables
  - migration_conversion + portfolio: Track 6 foundation
  - distribution_install: Track 8 lift

What remains at 2 (release-blocking on Gate, surfaced honestly):
  - pr_change_scoped / E2: needs labeled PR-diff corpus (Track 7.5)
  - ai_risk_inventory / E2: needs labeled real-repo precision corpus
  - migration_conversion / E2: conversion-corpus A-grade calibration
    (Track 6.7, 0.3 work)
  - portfolio: aggregator (Track 6.2/6.3) still 0.2.x partial-ship

Plus drive-by linkcheck fixes: docs/product/ai-risk-tiers.md and
docs/product/ai-trust-boundary.md had directory links that the
docs-linkcheck gate (Track 9.8) flagged after #146/#144 merged.
Both pointed at directory paths missing README.md; replaced with
specific file links.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t (Track 10.2) (#161)

Migrates internal/changescope to consume from internal/uitokens for
its bracketed severity and posture-verdict badges. First in-tree
consumer of uitokens — the foundation Track 10.1 shipped (#136) is
now live.

What changed

- New `uitokens.BracketedSeverity(severity)` and
  `uitokens.BracketedVerdict(band)` helpers. They encapsulate the
  exact bracketed strings (`[HIGH]`, `[CRIT]`, `[PASS]`, `[FAIL]`,
  etc.) that the unified-PR-comment visual contract requires —
  locked by the changescope golden tests shipped in #145.
- `internal/changescope/render.go` `severityIcon` and
  `postureBadge` are now thin wrappers that delegate to the
  uitokens helpers. The renderer call sites are unchanged; the
  vocabulary is now owned by one place.
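The mapping behind a helper like `uitokens.BracketedSeverity` can be sketched as a single owned table. Only `[HIGH]` and `[CRIT]` are confirmed strings from this commit; the other labels here are assumptions for illustration:

```go
package main

import "fmt"

// bracketedSeverity: one place owns the [LABEL] vocabulary the
// PR-comment markdown contract requires. Values other than [HIGH]
// and [CRIT] are illustrative assumptions.
func bracketedSeverity(severity string) string {
	switch severity {
	case "critical":
		return "[CRIT]"
	case "high":
		return "[HIGH]"
	case "medium":
		return "[MED]"
	case "low":
		return "[LOW]"
	default:
		return "[INFO]"
	}
}

func main() {
	fmt.Println(bracketedSeverity("critical"), bracketedSeverity("high")) // [CRIT] [HIGH]
}
```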

Why this shape

The PR-comment markdown contract requires `[LABEL]` brackets
because GitHub-flavored markdown doesn't render ANSI color and the
brackets make badges scan reliably. The terminal-text severity
badge (uitokens.SeverityBadge) returns `Bold(Alert("HIGH"))`-style
colored tokens for direct terminal use. Both shapes are valid
design choices for their respective surfaces; locking each one in
uitokens prevents drift.

The wrapper-not-inline approach keeps the change surgical: the changescope
helpers stay one-liners, the visual contract test (#145) keeps
asserting the same strings, and a future renderer joining the
party imports uitokens directly without changescope being in the
middle.

Coverage

- 2 new tests in uitokens_test.go: TestBracketedSeverity (locks
  every severity → bracket mapping) + TestBracketedVerdict (same
  for posture bands)
- Existing changescope golden tests (TestRenderPRSummaryMarkdown,
  TestRenderPRSummaryMarkdown_UnifiedShape, etc.) all still pass —
  proves the byte-identical rendering contract holds through the
  migration

Verification: full Go test suite green; `make docs-verify`,
`make docs-linkcheck`, `make truth-verify` clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Track 10.5 deliverable: one progress vocabulary used
across `terrain analyze`, `terrain migrate run`, `terrain ai run`,
and `terrain report pr`. Adopters now see the same shape
regardless of which command is running.

Two types

  Spinner — TTY-aware idle-progress indicator (Braille pattern
  dots, constant-width frame rotation). Goes to stderr so JSON /
  report output piped to a file or another tool stays clean.

  Stage — multi-step progress reporter for the canonical pipeline
  shape (Step 1/5 → Step 5/5). Used where work is segmented into
  named stages.

Both are silent on non-TTY (CI logs, pipes, redirects) and silent
when --quiet is passed. Adopters running inside CI never see
glyphs in their build logs. The TTY check is one-shot at
construction, which matches the actual deployment shape — stderr
is either a terminal or it isn't, and adopters who pipe partway
through a run get correct behavior either way.

Design constraints honored

- Zero dependencies on internal/uitokens — symbol vocabulary is
  parallel but locally owned, so uitokens itself can use progress
  for long-running token-rendering ops without import cycles.
- Nil-safe — Start, Update, Stop, Step, Done all no-op on a nil
  receiver, saving callers from `if sp != nil` boilerplate.
- Idempotent stop — Spinner.Stop is safe to call multiple times
  and safe without a matching Start. Real-world callers in
  defer chains hit this regularly.
- Thread-safe Update — Spinner runs its animation in a goroutine;
  Update can be called from any goroutine.
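The nil-safety and idempotent-stop contracts can be sketched as follows. Animation and TTY detection are elided; field names are illustrative, not the shipped struct:

```go
package main

import (
	"fmt"
	"sync"
)

// Spinner sketch: every method no-ops on a nil receiver, and Stop is
// safe to call repeatedly or without a matching Start.
type Spinner struct {
	mu      sync.Mutex
	running bool
	label   string
}

func (s *Spinner) Start() {
	if s == nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.running = true
}

func (s *Spinner) Update(label string) {
	if s == nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.label = label
}

// Stop is idempotent and needs no matching Start.
func (s *Spinner) Stop() {
	if s == nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.running = false
}

func main() {
	var sp *Spinner // nil: every call below is a safe no-op
	sp.Start()
	sp.Update("converting")
	sp.Stop()
	sp.Stop()
	fmt.Println("nil-safe calls completed")
}
```

Callers in defer chains get the idempotent-stop behavior for free: `defer sp.Stop()` is correct whether or not the spinner ever started.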

Coverage

9 tests covering: non-TTY silence, --quiet silence, TTY happy
path with frame + label assertion, Stop idempotency, Stop without
Start, nil safety, Stage non-TTY silence, Stage canonical format,
Stage nil safety.

Wiring into analyze / migrate run / ai run / pr is a follow-on PR
that doesn't conflict with the unified-PR-comment goldens — those
goldens assert markdown content, not the stderr progress stream.

Verification: full Go test suite green; make docs-verify /
docs-linkcheck / truth-verify clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the Track 10.7 release-readiness gate: a Go-source lint that
enforces the voice-and-tone rules from the parity plan.

What it catches

  exclamation       Word-final exclamation marks ("Done!",
                    "All set!", "Found 3 issues!"). Visual badges
                    like `[!]` / `[!!]` and HTML markup like
                    `<!DOCTYPE` are NOT flagged — they convey
                    severity through bracketed-glyph shape, not
                    through exclamatory tone.
  british-spelling  A curated list of British spellings the
                    Terrain voice rejects. Covers -our endings
                    (colour, behaviour, favour), -re endings
                    (centre, metre), -ise / -isation forms
                    (optimise, organisation), -ce verb endings
                    (defence, licence), -logue (catalogue), and
                    high-frequency individual words (grey,
                    aluminium, fulfil). Word-boundary anchored so
                    legitimate American words don't false-positive.
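Both checks reduce to small regexes. These are illustrative reconstructions of the rules described above, with a much-abbreviated word list; the shipped patterns and guards are richer:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Word-final only: a letter immediately before the "!", so visual
	// badges like [!] / [!!] and markup like <!DOCTYPE never match.
	exclamation = regexp.MustCompile(`[A-Za-z]!`)
	// Word-boundary anchored so American words such as "colorful"
	// never false-positive. Abbreviated list for illustration.
	british = regexp.MustCompile(`\b(colour|behaviour|centre|optimise|organisation|defence|licence|catalogue|grey|aluminium|fulfil)\b`)
)

func flagged(s string) bool {
	return exclamation.MatchString(s) || british.MatchString(s)
}

func main() {
	fmt.Println(flagged("Done!"), flagged("[!] 3 findings"),
		flagged("check the colour map"), flagged("a colorful report"))
	// true false true false
}
```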

How it scans

Default scan targets are the user-visible Go surfaces: the
signals manifest + signal_types.go (every Description /
Remediation that surfaces in a finding), the cmd/terrain
package (every Println / Fprintf), internal/reporting and
internal/changescope (every renderer). Test files are skipped —
tests can use any prose without tripping the lint.

False-positive guards

  - Regex-pattern guard: literals containing regex syntax (\\w,
    \\d, [^...], (?P<...>)) are not exclamation-checked. Without
    this, character classes with `!` would false-positive.
  - Allow-list hook: filter() has the hook for future per-line
    silences (empty today; documented for use when needed).

Wired into Makefile as `make voice-lint`. Initial run on the full
codebase reports clean — the existing prose (manifest, README,
quickstart, etc.) already follows these rules. The lint catches
NEW drift, not retroactive cleanup.

Coverage

5 tests in main_test.go: exclamation pattern (prose vs symbol
distinction), British spelling pattern (positive + negative
cases for each ending family + edge cases), regex pattern guard,
end-to-end scanFile against a synthetic fixture, test-files-
skipped contract.

Verification: full Go test suite green; all five release-
readiness gates pass (docs-verify, docs-linkcheck, truth-verify,
pillar-parity, voice-lint).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n) (#164)

First in-tree consumer of internal/progress (#162). Wraps the
conv.RunTestMigration call in cmd_convert.go's runConvert with a
TTY-aware Spinner that emits a Braille-pattern dot rotation while
the conversion runs.

Behavior

- Spinner is created with a label that includes from→to when
  available ('Converting jest → vitest') or 'Converting' alone
  for the auto-detect path.
- Goes to stderr so JSON / report output piped to stdout stays
  clean.
- No-op when --json is set (suppresses any progress to keep JSON
  output clean) or --plan / --dry-run (the planning phase is
  fast enough that a flashing spinner would just be noise).
- defer sp.Stop() ensures the line is cleared even on early
  return paths (input errors, validation failures).

Why convert first

Convert is the longest-running CLI path with no progress
indicator today. Adopters running 'terrain convert tests/
--from mocha --to jest' on a multi-hundred-file batch see no
output for ~tens-of-seconds; the spinner gives them confirmation
that the binary is working.

Future wiring (analyze / migrate run / ai run / pr) can use the
same pattern. The internal/engine pipeline already has its own
ProgressFunc-based step reporter (cmd/terrain/progress.go) that
predates internal/progress; reconciling the two is a future
polish item.

Verification: full Go test suite green; convert command tests
unaffected; make voice-lint clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/cli — the registry foundation Track 9.6 of the
parity plan calls for. The package owns the Command type, Pillar
+ Tier enums, and a thread-safe Register / Get / All / ByPillar
API.

Status

- Foundation only. The existing dispatcher in cmd/terrain/main.go
  is NOT migrated to consume from the registry yet — that's 0.2.x
  work. The registry is additive: any caller (printUsage, doctor,
  truth-verify, docs-gen) can read from it without forcing the
  big switch in main.go to become registry-driven.
- Default registry is empty in 0.2.0. Future PRs that introduce
  individual commands (or migrate existing ones) populate it via
  init() blocks.

Why a separate package

internal/cli, not cmd/terrain/. Putting it in cmd/terrain/ would
couple the registry to the binary's package and make it
un-importable from internal/signals — where the eventual
truth-verify cross-check between the registry and the signal
manifest will live. internal/cli is the right home: importable
from anywhere in the tree, no dependencies on cmd/.

What it owns

  Command                — Name, Pillar, Tier, JourneyQuestion,
                           Description, Aliases
  Pillar                 — Understand / Align / Gate / Meta,
                           mirroring docs/release/parity/rubric.yaml
  Tier                   — Tier1 / Tier2 / Tier3 (publicly-claimable
                           tier per the parity plan)
  Registry.Register      — fails on duplicate name or alias collision
  Registry.MustRegister  — panic-on-error variant for init() blocks
  Registry.Get           — looks up by name or any alias
  Registry.All           — alphabetical, alias-deduplicated
  Registry.ByPillar      — pillar-grouped (omits empty pillars)
  Registry.Names         — for truth-verify cross-checks
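The core of the table above can be sketched as a thread-safe map keyed by both name and alias. Bodies are illustrative; only the contract (duplicate/collision rejection, Get by name or alias) is taken from the commit:

```go
package main

import (
	"fmt"
	"sync"
)

type Command struct {
	Name    string
	Aliases []string
}

// Registry: name and every alias point at the same command, so
// collision detection and alias lookup share one map.
type Registry struct {
	mu   sync.RWMutex
	byID map[string]*Command
}

// Register fails on a duplicate name or an alias collision.
func (r *Registry) Register(c *Command) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.byID == nil {
		r.byID = map[string]*Command{}
	}
	keys := append([]string{c.Name}, c.Aliases...)
	for _, k := range keys {
		if _, dup := r.byID[k]; dup {
			return fmt.Errorf("duplicate name or alias: %q", k)
		}
	}
	for _, k := range keys {
		r.byID[k] = c
	}
	return nil
}

// Get looks up by name or any alias.
func (r *Registry) Get(name string) (*Command, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	c, ok := r.byID[name]
	return c, ok
}

func main() {
	var reg Registry
	_ = reg.Register(&Command{Name: "analyze", Aliases: []string{"an"}})
	if c, ok := reg.Get("an"); ok {
		fmt.Println(c.Name) // analyze
	}
}
```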

What it does NOT own

  - Argument parsing (flag.FlagSet stays per-command)
  - Dispatch (the big switch in main.go is still source of truth
    until a 0.2.x PR migrates it)
  - Help-text generation (printUsage can opt in later)

Coverage

11 tests, all passing under `-race`:
  - Required-field validation (Name, Pillar)
  - Duplicate-name + alias-collision rejection
  - Get by name + by alias
  - All() dedup + alphabetical order
  - ByPillar grouping with empty-pillar omission
  - Names() (for truth-verify use)
  - Concurrent register + read (race detector)
  - MustRegister panic-on-error contract

Verification: full Go test suite green; make docs-verify /
docs-linkcheck / truth-verify / voice-lint all clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five user-visible issues caught by running 'terrain analyze' and
'terrain report pr' against a fresh repo and against this repo's
own diff before tagging 0.2.0. None of the existing tests caught
these — they're shape-of-output issues that only surface when a
human reads the output.

1. Help text leads with the pitch, not the legacy tagline

   `terrain --help` opened with "Terrain — test system intelligence
   platform" — the framing the parity plan and #137 explicitly
   moved away from. Changed to the pitch ("the control plane for
   your test system") + the one-paragraph promise so the help
   text matches the README and vision doc.

2. Pluralization in the analyze headline

   Fresh-repo analyze said "1 test files across 1 frameworks".
   Now renders correct singular and plural forms via the existing
   plural() helper.
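A count-aware helper of this kind might look like the following. The `plural` name appears in the commit; this signature and body are assumptions for illustration:

```go
package main

import "fmt"

// plural picks the count-appropriate noun and prefixes the count.
func plural(n int, singular, pluralForm string) string {
	if n == 1 {
		return fmt.Sprintf("%d %s", n, singular)
	}
	return fmt.Sprintf("%d %s", n, pluralForm)
}

func main() {
	fmt.Printf("%s across %s\n",
		plural(1, "test file", "test files"),
		plural(1, "framework", "frameworks"))
	// 1 test file across 1 framework
}
```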

3. Empty-repo headline lied about health

   A repo with zero test files used to render "Your test suite
   looks healthy: 0 test files across 0 frameworks." — calling
   absence of tests "healthy" is wrong on its face. Now renders
   "No test files detected. Add tests with your framework of
   choice, then re-run `terrain analyze`."

4. PR-comment percentage rounded sub-1% selections to 0%

   "Tests selected | 7 of 796 (0% of suite)" was technically
   integer-truncation but read as "selection ran and produced
   nothing". Now displays "<1%" for sub-1% fractions; the
   formatSuitePercent helper centralizes the formatting so other
   surfaces can adopt it.
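The fix above can be sketched in a few lines. The `formatSuitePercent` name comes from the commit; the body is illustrative:

```go
package main

import "fmt"

// formatSuitePercent renders a selection percentage, showing "<1%"
// for non-zero selections that integer truncation would round to 0%.
func formatSuitePercent(selected, total int) string {
	if total == 0 || selected == 0 {
		return "0%"
	}
	pct := selected * 100 / total
	if pct == 0 {
		return "<1%"
	}
	return fmt.Sprintf("%d%%", pct)
}

func main() {
	fmt.Println(formatSuitePercent(7, 796)) // <1%
}
```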

5. "Exported function X" misnomer for non-functions

   The protection-gap message claimed every exported symbol was
   a "function", regardless of CodeUnitKind. Adopters with an
   exported var (like `cli.Default`) or type (`cli.Registry`)
   saw "Exported function Default has no observed test coverage."
   Now the kind is read from CodeUnit.Kind and rendered with the
   matching noun ("Exported method", "Exported class",
   "Exported module"), with "Exported symbol" as the neutral
   fallback for kinds the parser didn't classify.

How these were caught

I built the binary, ran `terrain analyze` and `terrain report pr`
on this repo and on a 1-file fresh fixture, and read the output
the way an adopter would. Each issue was a paper-cut that
existing tests don't assert on (the test goldens cover shape;
they don't catch wrong nouns or rounded percentages).

These are pure polish — no behavior changes, no new features,
no schema changes. Locks the existing experience to be honest
and accurate before tag.

Verification: full Go test suite green; all five release-readiness
gates pass (docs-verify, docs-linkcheck, truth-verify, voice-lint,
make build); manual smoke against a fresh-repo + this-repo
confirms the user-visible changes render as expected.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…licy redesign + schema docs (#168)

* feat(0.2): hero verdict block + adapter ingestion diagnostics (Gate pillar lift)

Lifts three Gate-pillar cells from synthetic-fixture floor toward
publicly-claimable: ai_eval_ingestion.E3 (2→4),
ai_execution_gating.V2 (2→4), ai_execution_gating.E3 (2→4).

internal/uitokens/uitokens.go:
- HeroVerdict(verdict, headline) — designed three-line block with
  rule / indented badge + headline / rule. The block frames the
  gating decision so it carries visual weight beyond the rest of
  the report. Color-and-symbol via existing token vocabulary
  (Alert/Warn/Ok + SymFail/SymWarn/SymOK).
- HeroVerdictMarkdown(verdict, headline, reason) — markdown variant
  for PR-comment / GitHub surfaces. Blockquote callout (tints on
  GitHub) + horizontal rule. Optional reason as italic line.
- heroVerdictBadge / bracketVerdict helpers handle the BLOCKED /
  WARN / PASS vocabulary distinct from VerdictBadge so the hero
  presentation can use a heavier shape ("[BLOCKED]") without
  changing VerdictBadge's contract.
- Tests: TestHeroVerdict + TestHeroVerdictMarkdown lock both shapes.

cmd/terrain/cmd_ai.go:
- `terrain ai run` text output now leads with HeroVerdict block,
  followed by structured Reason / Command / AI Signals /
  Ingestion diagnostics sections — the previous single-line
  `Decision: BLOCKED — reason` is replaced.
- aiRunHeroLines() centralizes the (action, reason, signalCount)
  → (verdict, headline) mapping so JSON / text / downstream PR
  surfaces stay consistent.

internal/airun/eval_result.go:
- New IngestionDiagnostic{Field, Kind, Detail} type capturing
  per-field fallbacks during adapter ingestion (kinds: missing,
  computed, default-applied, coerced).
- EvalRunResult.Diagnostics field surfaces these to consumers.

internal/airun/{promptfoo,deepeval,ragas}.go:
- Each adapter records diagnostics for the fallbacks that matter
  to gating decisions: derived aggregates when stats block is
  absent, missing tokenUsage.cost (aiCostRegression no-ops),
  defaulted timestamps, missing metricsData (DeepEval), and
  missing quality axes (Ragas — when no faithfulness /
  context_recall / answer_relevancy in any row).
- Tests in promptfoo_test.go lock the canonical diagnostic
  emissions.

cmd/terrain/cmd_ai.go (rendering):
- New "Ingestion diagnostics (N):" block in `terrain ai run`
  output surfaces every IngestionDiagnostic with its kind and
  detail. Adopters auditing a gating decision can see exactly
  which fields fell back.

docs/release/parity/scores.yaml:
- ai_eval_ingestion.E3: 2→4
- ai_execution_gating.V2: 2→4
- ai_execution_gating.E3: 2→4

These three cells were among the audit's specifically-named
Gate-pillar gaps. Several other Gate cells remain at 3 (the
publicly-claimable bar requires labeled-PR precision corpus
and additional doc/UX lifts) — this is one focused step toward
the Gate floor=4 target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): policy report redesign + AI eval onboarding doc (Gate pillar lift)

Lifts two more Gate-pillar cells: policy_governance.V2 (3→4) and
ai_execution_gating.P4 (2→4).

internal/reporting/policy_report.go:
- Redesigned `terrain policy check` rendering. Hero verdict block
  at top via uitokens.HeroVerdict — PASS / BLOCKED / WARN with
  violation count, replacing the previous single Status: PASS/FAIL
  line.
- Violations grouped by severity (critical → low) with
  BracketedSeverity badges per violation.
- Per-violation now shows `[CRIT] type (Category) — explanation`
  with a `location:` follow-on, replacing the flat
  `  - <type>: <explanation>` rendering.
- New helpers: severityRenderOrder (canonical ordering),
  groupViolationsBySeverity (deterministic grouping with category
  + type tiebreakers), policyHeroLines (verdict + headline mapping).

docs/user-guides/ai-eval-onboarding.md (new):
- First-10-minutes walkthrough closing the audit's
  ai_execution_gating.P4 finding ("users new to AI evals don't know
  whether to run Promptfoo first").
- Three-step flow: ai list → run framework yourself → ai run.
- Explicit "what Terrain does vs. what you do" table to clarify
  the trust boundary up-front.
- Per-framework commands for Promptfoo, DeepEval, Ragas with their
  output-flag invocations.
- Step 4 covers ingestion-diagnostics interpretation (introduced
  in the previous commit) so adopters can audit gate-decision
  data lineage.
- Common-questions section addresses sandboxing, custom
  frameworks, audit trail.

docs/release/parity/scores.yaml:
- ai_execution_gating.P4: 2→4
- policy_governance.V2: 3→4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): published eval-adapter schema contract (Gate pillar lift)

Lifts ai_eval_ingestion.E4: 3 → 4.

docs/schema/eval-adapters.md (new):
- Documents the canonical EvalRunResult / EvalCase / EvalAggregates /
  TokenUsage / IngestionDiagnostic shape every adapter (Promptfoo,
  DeepEval, Ragas, Gauntlet) emits.
- Field-level "Stability: Stable" annotations make the long-lived
  contract explicit per FIELD_TIERS.md tiers.
- Adapter-authoring checklist: parse canonical format, populate
  Stable fields, emit IngestionDiagnostic per fallback, add
  conformance fixtures, lock new diagnostics with unit tests.
- Cross-references per-framework integration docs +
  conformance suite.

The schema doc closes the audit's E4 concern that adapters
"consume each upstream's shape and we won't notice when upstream
changes." The published contract + diagnostic mechanism + conformance
tests collectively give us notice on shape drift.

docs/release/parity/scores.yaml:
- ai_eval_ingestion.E4: 3→4

Net `make pillar-parity` after this commit:
  AI eval ingestion area floor lifted 2 → 3 (E3 and E4 now at 4 from
  this PR, with V2/V3 still at 3 carrying the area).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: fix go.mod indirect annotation for golang.org/x/sync

PR #132 introduced internal/server/server.go's direct import of
golang.org/x/sync/singleflight, but go.mod was never re-tidied
so the require line still carries // indirect. CI's `go mod tidy
&& git diff --exit-code go.mod go.sum` step now fails on every
PR because of this drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: cross-platform path handling in suppression + portfolio tests

Two pre-existing Windows-only test failures blocking CI on every
PR in the 0.2 stack.

internal/suppression/suppression.go:
- pathMatch was using filepath.Match on inputs already normalized
  to forward-slashes via filepath.ToSlash. On Windows
  filepath.Match treats `\` as the separator, so `*.go` matched
  the entire forward-slashed `sub/foo.go` (the `/` wasn't a
  separator in its semantics). Switch to path.Match (Unix
  semantics) via a pathPkgMatch helper. Forward-slash inputs +
  Unix-semantics matcher = correct behavior on every host OS.
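The fix's shape, sketched: normalize the OS path with `filepath.ToSlash`, then match with `path.Match`, whose separator is always `/`. A `pathPkgMatch` helper is named in the commit; this body is illustrative:

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
)

// pathPkgMatch matches a forward-slash pattern against an OS path
// using Unix glob semantics on every host OS. path.Match's "*" never
// crosses "/", unlike filepath.Match's "*" on Windows, where `\` is
// the separator and "/" is an ordinary character.
func pathPkgMatch(pattern, osPath string) bool {
	ok, err := path.Match(pattern, filepath.ToSlash(osPath))
	return err == nil && ok
}

func main() {
	fmt.Println(pathPkgMatch("*.go", "sub/foo.go"),
		pathPkgMatch("sub/*.go", "sub/foo.go"))
	// false true
}
```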

internal/portfolio/manifest_test.go:
- TestResolveRepoPath_Absolute constructs `\elsewhere\repo`
  expecting filepath.IsAbs to recognize it as absolute. Windows
  treats this as relative (drive letter required), so the test
  fixture isn't actually testing what it intends. Skip on
  Windows where the rooted-without-drive case is a different
  edge case the function doesn't claim to handle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + audit fixes (#167)

* feat(0.2): suppression CLI workflow + --new-findings-only (Tracks 4.6/4.7/4.8) (#140)

Bundles the three remaining Track 4 deliverables into one PR. With
4.4 (finding IDs) and 4.5 (suppression model) already in flight,
this PR makes the suppression workflow actually usable end-to-end:

  Track 4.6 — terrain explain <finding-id>
    Extends `terrain explain` to recognize stable finding IDs (e.g.
    "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a
    hit, prints a finding-detail block: detector + severity + location +
    evidence + explanation + suggested action + the canonical
    `terrain suppress <id> --reason "..."` invocation.

    A finding ID that parses but isn't in the snapshot returns a
    distinct exit-5 (not-found) message that distinguishes "stale ID
    after refactor" from "garbage input" — common adoption flow when
    a user keeps a CI link to a finding that has since moved.

    Implementation: lookupSignalByFindingID + renderFindingExplanation
    in cmd/terrain/cmd_explain.go.

  Track 4.7 — terrain suppress <finding-id> --reason "..." [--expires] [--owner]
    New top-level Gate-pillar primitive. Validates the ID format,
    refuses duplicates (existing entry → usage error pointing at the
    existing reason), appends a YAML entry to .terrain/suppressions.yaml.

    Writes text rather than re-marshaling the file so any comments /
    ordering the user added by hand are preserved. Schema header is
    auto-emitted on first call.

    --reason required (every suppression justifies itself, per Track
    4.5 schema). --expires optional but recommended; ISO YYYY-MM-DD
    shape validated up front. --owner optional free-text pointer.

    Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests.
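The up-front date validation can be sketched as a strict parse plus round-trip. A `looksLikeISODate` helper is exercised by the commit's tests; this body is an assumption, not the shipped code:

```go
package main

import (
	"fmt"
	"time"
)

// looksLikeISODate accepts strict YYYY-MM-DD only. The Format
// round-trip rejects inputs Go's lenient parser would otherwise
// accept, such as unpadded "2026-8-1".
func looksLikeISODate(s string) bool {
	t, err := time.Parse("2006-01-02", s)
	return err == nil && t.Format("2006-01-02") == s
}

func main() {
	fmt.Println(looksLikeISODate("2026-08-01"), looksLikeISODate("08/01/2026"))
	// true false
}
```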

  Track 4.8 — terrain analyze --new-findings-only --baseline <path>
    Filters the snapshot to keep only signals whose FindingID is NOT
    present in the baseline. The "established repos with debt"
    adoption flow: `--fail-on critical` would brick CI on day one
    against existing high findings; combining with
    `--new-findings-only --baseline old.json` makes the gate fire
    only on findings introduced AFTER the baseline.

    Implementation: PipelineOptions.NewFindingsOnly +
    internal/engine/new_findings_only.go (applyNewFindingsOnly).
    Runs after suppression apply so the baseline comparison sees
    the user's intended-active signal set.

    No-baseline case: --new-findings-only is inert; logs a warning so
    the user notices the flag had no effect (better than silent
    success that masks the misconfiguration).

    Signals without FindingID (older / specialized emissions) are
    KEPT — over-report rather than under-report.

    Implementation: 6 unit tests including the "no-baseline" warning
    path, empty baseline, per-file signals, and signals without IDs.
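The filtering rule described above can be sketched as follows. Types are simplified stand-ins; only the contract (drop baseline-known IDs, keep ID-less signals to over-report rather than under-report) is taken from the commit:

```go
package main

import "fmt"

type Signal struct{ FindingID string }

// applyNewFindingsOnly keeps signals whose FindingID is absent from
// the baseline set, and always keeps signals with no ID at all.
func applyNewFindingsOnly(signals []Signal, baseline map[string]bool) []Signal {
	var kept []Signal
	for _, s := range signals {
		if s.FindingID != "" && baseline[s.FindingID] {
			continue // present in baseline: not a new finding
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	baseline := map[string]bool{"weakAssertion@a.go:TestA#1111": true}
	signals := []Signal{
		{FindingID: "weakAssertion@a.go:TestA#1111"}, // old: dropped
		{FindingID: "weakAssertion@b.go:TestB#2222"}, // new: kept
		{},                                           // no ID: kept
	}
	fmt.Println(len(applyNewFindingsOnly(signals, baseline))) // 2
}
```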

Refactor: runAnalyze gets an `analyzeRunOpts` struct so the call site
in main.go isn't a 17-positional-argument list. The struct collapses
the existing args + adds SuppressionsPath + NewFindingsOnly. Future
flag additions stop expanding the call signature.

Validation in main.go: --new-findings-only requires --baseline;
passing the flag without one is rejected at usage-error level
(exit 2) so the user gets a clear message rather than a silent
no-op.
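
A minimal sketch of that guard; the function name and shape are invented for illustration, only the rule and the exit code come from this message.

```go
package main

import (
	"fmt"
	"os"
)

// validateFlags rejects --new-findings-only without --baseline as a
// misconfiguration rather than letting it become a silent no-op.
func validateFlags(newFindingsOnly bool, baselinePath string) error {
	if newFindingsOnly && baselinePath == "" {
		return fmt.Errorf("--new-findings-only requires --baseline <path>")
	}
	return nil
}

func main() {
	if err := validateFlags(true, ""); err != nil {
		fmt.Fprintln(os.Stderr, "usage error:", err)
		// a real CLI would os.Exit(2) here (usage-error level)
	}
	fmt.Println(validateFlags(true, "old.json")) // <nil>: valid combination
}
```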

Verification:
  go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" — 7 tests green
  go test ./internal/engine/ -run "TestApplyNewFindingsOnly" — 6 tests green
  go test ./... — full suite green
  go test ./internal/testdata/ — golden + CLI suite green

Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md
(Tracks 4.6, 4.7, 4.8).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): self-scan polish — empty-repo grade, headline dedup, pluralization

Audit findings remediated in a single pass:

- internal/insights: empty-repo (zero tests AND zero findings) now
  shows "—" grade with actionable next-step headline instead of
  misleading "A". The first-user trust hit was real — a fresh repo
  with no tests grading "A" undermines the pitch.
- internal/analyze/headline.go: critical-signal headline says
  "critical" not "high-priority", matching the body's `[CRITICAL]`
  vocabulary. Empty-repo case detected and given an actionable
  headline.
- internal/analyze/analyze.go: removed the duplicate
  "[HIGH] N critical signals" Key Finding — that fact is already
  the headline; Key Findings are reserved for distinct actionable
  items.
- Pluralization sweep across analyze / changescope / reporting /
  cmd_ai: replaced literal `(s)` with reporting.Plural(...) helper
  for finding/test/unit/file/gap/check/scenario/etc.
- Tests + golden updated for the new "—" empty-repo grade and the
  unified pluralization output.
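
A sketch of what a `reporting.Plural`-style helper might look like; the real signature is not shown in this message, so this shape is an assumption.

```go
package main

import "fmt"

// Plural formats a count with the correct word form, replacing
// literal "(s)" suffixes in user-facing output.
func Plural(n int, singular, plural string) string {
	if n == 1 {
		return fmt.Sprintf("%d %s", n, singular)
	}
	return fmt.Sprintf("%d %s", n, plural)
}

func main() {
	fmt.Println(Plural(1, "finding", "findings")) // 1 finding
	fmt.Println(Plural(3, "finding", "findings")) // 3 findings
	fmt.Println(Plural(0, "test", "tests"))       // 0 tests
}
```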

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): pillar markers on Signal / KeyFinding / SARIF / doctor (Track 2)

Plumbing for pillar-aware grouping in every output mode the pitch
promises ("gate the system as a whole") — JSON envelopes, SARIF
tags, and doctor maturity now all carry the pillar.

- internal/models/signal.go: new Pillar field on Signal (omitempty,
  back-compat); PillarFor(category) and Pillar{Understand,Align,Gate}
  constants. Mapping: structure/health/quality/ai → Understand;
  migration → Align; governance → Gate.
- internal/engine/finding_ids.go: assignSignalID renamed to
  finalizeSignal; populates Pillar from Category in the same pass
  it stamps FindingID, so every snapshot signal lands tagged.
- internal/analyze/analyze.go: KeyFinding gains Pillar field;
  deriveKeyFindings tags every finding "understand" (analyze is the
  Understand pillar's primary command).
- internal/sarif/{sarif,convert}.go: Rule + Result gain Properties
  with Tags; pillarProperties() emits "terrain:<pillar>" tag for
  GitHub Code Scanning / IDE consumers to group by pillar.
- cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity
  check — Understand (test framework configs), Align (multi-repo
  manifest), Gate (CI workflow + suppressions). Cheap; no analyze
  run, no network.
- cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block
  before migration checks; JSON envelope keeps legacy fields for
  back-compat and adds `pillars` alongside.
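
The category → pillar mapping described above can be sketched as below. The constant values and the default branch are assumptions; the mapping itself (structure/health/quality/ai → Understand, migration → Align, governance → Gate) is stated in this message.

```go
package main

import "fmt"

// Pillar values; exact string representations are assumed.
const (
	PillarUnderstand = "understand"
	PillarAlign      = "align"
	PillarGate       = "gate"
)

// PillarFor maps a signal category to its pillar. Defaulting unknown
// categories to Understand (an assumption here) ensures no signal
// lands untagged.
func PillarFor(category string) string {
	switch category {
	case "migration":
		return PillarAlign
	case "governance":
		return PillarGate
	default: // structure, health, quality, ai
		return PillarUnderstand
	}
}

func main() {
	for _, c := range []string{"structure", "migration", "governance"} {
		fmt.Printf("%s -> %s\n", c, PillarFor(c))
	}
}
```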

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): empty-state wiring + helpful errors + parity score refresh

Closes the remaining audit P1/P2 items in a single pass.

V3 empty-state wiring (Track 10.6 follow-on):
- policy check: EmptyNoPolicyFile now renders the designed empty-
  state (header + `terrain init` next-step nudge) when the repo
  has no .terrain/policy.yaml, replacing the bare "Create
  .terrain/policy.yaml..." line.
- ai list: EmptyNoAISurfaces wired when no AI surfaces detected;
  renders one designed line instead of two ad-hoc strings.
- report impact: EmptyNoImpact wired in RenderImpactReport when
  the change touches nothing structural — beats a wall of zeros
  that reads as "Terrain failed".
- report select-tests / RenderProtectiveSet: EmptyNoTestSelection
  wired when the protective set is empty.
- migrate estimate: EmptyNoMigrationCandidates wired when zero
  files in scope.

Helpful errors:
- terrain analyze --base <ref>: now prints a one-screen redirect
  ("Did you mean: terrain report pr / report impact --base") and
  exits with usage error, instead of dumping the stdlib flag
  package's full flag list.
- terrain explain finding <bad-id>: error now lists the three
  accepted ID forms (stable finding ID / portfolio index /
  signal type) with a one-line "ID changed since last run?"
  hint pointing at re-running analyze.

Parity score refresh (audit-flagged staleness):
- core_analyze.E2: cite recall-gate assertion line correctly
  (calibration_integration_test.go:151, not :166).
- ai_risk_inventory.P2 / E2: bumped 2→3 — rubric level 3 is
  "calibrated on synthetic fixtures (recall-anchored)" which is
  exactly what the 27-fixture corpus delivers across 33 detectors.
  Several precision concerns from the prior review are now
  remediated; refreshed evidence to reflect that.
- pr_change_scoped.E2: bumped 2→3 — same recall-anchor inheritance
  as core_analyze.
- server.E7: bumped 2→4 — PR #132 (request-context honoring) IS
  merged (commit dc01edc); evidence was stale.
- distribution_install.P5: bumped 2→4 — PR #133 (postinstall
  marker) IS merged (commit e0619da); evidence was stale.
- ai_execution_gating.V3 + policy_governance.V3: bumped 2→3 —
  empty-states wired in this commit close the cited gaps.
- ai_risk_inventory.V3: bumped 2→3 — empty-state + per-detector
  rule pages provide remediation; level-5 (LLM-context-tailored
  in-line remediation) deferred.
- server.P6: bumped 2→3 — added docs/examples/serve-local-dev.md
  closing the missing 'use this for local dev' example doc.

Known gaps doc:
- Added the three "structural-graph and CI-inference" gaps the
  audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix
  dimensions; G7 env-matrix CI inference).
- Added I4 (coverage / runtime artifact auto-detection) to the
  same doc — `analyze` accepts artifacts via flag but doesn't
  auto-discover conventional locations.

Net effect on `make pillar-parity`:
  understand: floor=2 → floor=3 PASS (was hard-blocked).
  align:      floor=2, soft WARN (does not block release).
  gate:       floor=2 still hard-blocked at floor=4 — Gate's
              publicly-claimable bar requires substantial work
              outside the audit-fix scope (labeled-PR precision
              corpus + adapter fallback diagnostics + AI
              execution-gating doc/UX lift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): serialize stdout in suppress tests to fix -race regression

CI's `go test -race` flag exposed a data race on the global os.Stdout:
TestRunSuppress_* called runSuppress() directly (which writes via
fmt.Printf to os.Stdout) under t.Parallel(), while other parallel
tests called captureRun() which swaps os.Stdout for capture.

Wrapping the runSuppress calls in runCaptured / captureRun makes
them acquire the captureRunMu mutex, serializing all stdout-touching
tests under the same lock. Behavior unchanged; only the test
harness changes.

Affects: TestRunSuppress_CreatesNewFile, TestRunSuppress_AppendsToExisting,
TestRunSuppress_RejectsDuplicate, TestRunSuppress_RejectsBadID,
TestRunSuppress_RequiresReason, TestRunSuppress_RejectsBadExpiryShape.

The same race likely affected TestRunConvert_PlanWithAutoDetect and
others — they show in CI output as collateral races where one test's
stdout-swap exposed another test's direct fmt.Printf, but the fix
is one-sided: lock the suppress side and the others stop racing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ork (#169)

* feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate

Lifts four more Gate-pillar cells.

policy_governance.P3 (3→4) — docs/user-guides/writing-a-policy.md:
- Full authoring guide: TL;DR, where the policy lives, full schema
  with annotations, three opinionated starting points (minimal /
  balanced / strict), gate decision logic, CI adoption pattern,
  tuning workflow, suppression pairing, anti-goals.

policy_governance.E3 (3→4) — per-rule diagnostics:
- internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status,
  Detail, ViolationCount}; Result.Diagnostics records every active
  rule's outcome. Status one of: pass / violated / skipped / warn.
  Skipped means "not configured in policy.yaml".
- internal/reporting/policy_report.go: renderPolicyDiagnostics
  table at the bottom of `terrain policy check` output. Per-rule
  status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok /
  Alert / Muted / Warn — same vocabulary as the rest of the
  design system.
- TestEvaluate_Diagnostics_PerRuleStatus locks the contract:
  active rules emit one entry, status reflects pass/violated,
  unconfigured rules emit "skipped".
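
A sketch of the diagnostic shape and status logic described above; the `RuleDiagnostic` field names come from this message, while the `diagnose` helper is invented for illustration.

```go
package main

import "fmt"

// RuleDiagnostic records one rule's evaluation outcome.
type RuleDiagnostic struct {
	Rule           string
	Status         string // pass / violated / skipped / warn
	Detail         string
	ViolationCount int
}

// diagnose sketches the per-rule contract: unconfigured rules are
// "skipped", configured rules report pass or violated.
func diagnose(rule string, configured bool, violations int) RuleDiagnostic {
	d := RuleDiagnostic{Rule: rule, ViolationCount: violations}
	switch {
	case !configured:
		d.Status = "skipped"
		d.Detail = "not configured in policy.yaml"
	case violations > 0:
		d.Status = "violated"
	default:
		d.Status = "pass"
	}
	return d
}

func main() {
	fmt.Println(diagnose("max_critical", true, 2).Status)  // violated
	fmt.Println(diagnose("min_coverage", false, 0).Status) // skipped
}
```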

pr_change_scoped.E4 (3→4) — docs/schema/pr-analysis.md:
- Canonical PR-analysis JSON contract published. Documents
  PRAnalysis envelope, ChangeScopedFinding, TestSelection,
  PostureDelta, AIValidationSummary with field-level Stability
  tiers. jq integration examples; pillar-marker compatibility
  note. internal/changescope/model.go (PRAnalysisSchemaVersion)
  remains the in-code anchor.

pr_change_scoped.E6 (3→4) — determinism gate:
- TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch:
  sets SOURCE_DATE_EPOCH to two distinct values and asserts
  byte-identical PR markdown output. Locks the contract that
  the PR comment surface itself is timestamp-free even though
  the underlying snapshot honors SOURCE_DATE_EPOCH for its own
  timestamps.

policy_governance.E4 (3→4) — schema doc joint coverage:
- The eval-adapters schema doc (previous PR) plus the new
  pr-analysis doc plus internal/policy/config.go give policy.yaml
  a published contract per FIELD_TIERS.md tiers.

docs/release/parity/scores.yaml updated for the four cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3)

Lifts four more Gate-pillar cells.

policy_governance.P5 (3→4) — error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails
  to parse, surface a designed remediation block naming the three
  common causes (YAML indentation, misspelled rule key, type
  mismatch) and pointing at `cp docs/policy/examples/balanced.yaml
  .terrain/policy.yaml` for a known-good template. Replaces the
  bare `error: <yaml-parse-error>` pre-fix shape.

ai_execution_gating.E1 (3→4) — decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence
  rule (block_on_* > warn_on_*), the blocking_signal_types special
  case, combined critical+policy reason synthesis, edge cases for
  metadata absence and non-string rule values, and the high-only
  warn boundary.

pr_change_scoped.E5 (3→4) — performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large
  fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel
  i7-8850H. Linear scaling — no quadratic regressions in
  dedup/classify/render. Reference numbers committed in the file's
  package comment.

pr_change_scoped.E6 already lifted (previous commit) via
TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.

docs/release/parity/scores.yaml updated for the four cells.
Net: policy_governance area now mostly 4s except V1 (uitokens
inheritance) and V3 (empty state, lives on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration

Lifts six more Gate-pillar cells.

ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas)
  fails to parse its eval-framework output, surface a per-framework
  remediation block naming the most common adopter cause for each
  framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then
  link to the eval-adapters schema doc and the onboarding guide.
  Replaces the bare "Warning: failed to parse" line.

pr_change_scoped.P5 (3→4) — runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside
  runPR, surface a "Common causes" remediation block (--base ref
  missing, shallow clone, empty diff) and point at `terrain
  analyze` for root-cause drill-down.

pr_change_scoped.E3 (3→4) — confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram()
  emits a one-line `**Confidence:** N exact · M inferred · K weak
  (T tests selected)` block above the recommended-tests table in
  PR-comment markdown. Stable first-seen ordering keeps output
  deterministic. Test:
  TestBuildConfidenceHistogram_GroupsAndPluralizes covers
  single/mixed/empty/missing-confidence cases.
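
The first-seen-order grouping can be sketched as below. This simplified version hardcodes "tests selected" (the real one pluralizes) and assumes a plain string-slice input; only the output shape and the stable-ordering requirement come from this message.

```go
package main

import (
	"fmt"
	"strings"
)

// buildConfidenceHistogram counts per confidence bucket in stable
// first-seen order, so the rendered markdown is deterministic
// across runs with the same input.
func buildConfidenceHistogram(confidences []string) string {
	counts := map[string]int{}
	var order []string // first-seen order, not map iteration order
	for _, c := range confidences {
		if counts[c] == 0 {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	return fmt.Sprintf("**Confidence:** %s (%d tests selected)",
		strings.Join(parts, " · "), len(confidences))
}

func main() {
	fmt.Println(buildConfidenceHistogram(
		[]string{"exact", "exact", "inferred", "weak", "inferred"}))
}
```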

pr_change_scoped.E7 (3→4) — pipeline cancellation tests:
- internal/engine/pipeline_test.go:
  TestRunPipelineContext_RespectsCancelledContext (pre-cancelled
  context bails immediately) and
  TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns
  cleanly). The PR pipeline shares engine.RunPipelineContext, so
  these tests prove cancellation semantics for runPR /
  runImpactPipeline as well.

pr_change_scoped.V1 + V2 (3→4) — token migration:
- internal/changescope/render.go: terminal-renderer severity
  badges migrated from raw `[%s]` + ToUpper to
  uitokens.BracketedSeverity. Now consistent with the markdown
  renderer's vocabulary across directRisk / indirectRisk /
  existingDebt / AI signal blocks.

policy_governance.V1 (3→4) — token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in
  policy_report.go); evidence refreshed to reflect the actual
  uitokens consumption.

docs/release/parity/scores.yaml updated for all eight cells.

Net `make pillar-parity`:
  PR / change-scoped     row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3
                         (only E2 corpus + V3 polish below 4)
  Policy / governance    row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2
                         (only V3 below 4 — needs PR #167 empty state)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift)

Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence
to reflect work already shipped in this stack.

ai_eval_ingestion (3→4):
- P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x,
  Ragas modern+legacy) plus per-field IngestionDiagnostic, plus
  conformance fixtures, plus published schema doc.
- P4: onboarding doc closes the 'no five-line CI snippet' concern.
- V1: adapter outputs flow through HeroVerdict + BracketedSeverity in
  both `terrain ai run` and PR-comment AI Risk Review surfaces.
- V2: structured rendering rhythm (hero / reason / signals / diags).
- V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's
  framework-mismatch remediation block from this stack).

ai_execution_gating (3→4):
- P7: gating-on-AI-evals-before-merge framing made explicit by
  onboarding doc + trust-boundary doc.
- E4: Decision shape versioned alongside EvalRunResult contract;
  ingestion diagnostics flow through so consumers can audit the
  evidence chain.
- E7: pipeline cancellation tests (this branch) cover ai run via
  the shared engine.RunPipelineContext code path.
- V1: hero / diagnostics / signals blocks all consume uitokens.

docs/release/parity/scores.yaml: ten cells refreshed.

Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus
+ E7 'reads are bounded' which is honestly level-3 per rubric).
ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3
empty-state dependency on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells)

Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) —
the last achievable Gate-pillar lifts before the 0.3 corpus work.

pr_change_scoped.V3 — empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no
  new findings, no AI risk, no protection gaps), the markdown
  renderer now emits a designed `> ✓ **All clear.** ...` block
  before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout +
  TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both
  directions (clean PRs render the callout; PRs with findings
  don't).
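
The centralized predicate is easy to sketch; the struct and its field names below are assumptions, but the three conditions (no new findings, no AI risk, no protection gaps) are stated above.

```go
package main

import "fmt"

// PRSummary is a minimal stand-in for the PR-analysis result.
type PRSummary struct {
	NewFindings    int
	AIRiskFindings int
	ProtectionGaps int
}

// isEmptyPR centralizes the "genuinely empty" predicate that gates
// the All-clear callout.
func isEmptyPR(s PRSummary) bool {
	return s.NewFindings == 0 && s.AIRiskFindings == 0 && s.ProtectionGaps == 0
}

func main() {
	fmt.Println(isEmptyPR(PRSummary{}))               // true: render the callout
	fmt.Println(isEmptyPR(PRSummary{NewFindings: 1})) // false: findings suppress it
}
```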

policy_governance.P1 — feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every
  audited dimension, three example policies ship (minimal /
  balanced / strict), authoring guide ships
  (docs/user-guides/writing-a-policy.md), terrain init scaffolds a
  starter, per-rule diagnostics surface evaluation outcomes. The
  "no rule-authoring UI" gap is a separate product surface (visual
  policy editor would be 0.3+) not a feature-completeness gap of
  the policy system itself.

Net `make pillar-parity` after this stack:
  Policy / governance:  every cell at 4 except V3 (held by PR #167's
                        EmptyNoPolicyFile wiring).
  PR / change-scoped:   every cell at 4 except E2 + P2 (corpus needed)
                        — the work cells are all green.
  AI eval ingestion:    every cell at 4 except P2 + E2 (corpus) +
                        E7 (rubric level 3 honest for bounded reads).
  AI execution + gating: every cell at 4 except P1 (sandbox 0.3) +
                         E2 (corpus) + V3 (PR #167 dependency).

Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus
across four areas + P1 sandboxing) plus three cells that lift when
PR #167 merges (V3 across three Gate areas). Beyond those, every
Gate cell is at the publicly-claimable bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arks + analyze error UX (#172)

* feat(0.2): portfolio + explain schema docs + insights benchmarks + analyze error UX (Understand+Align lift)

Lifts seven cells across Understand and Align pillars without
touching the labeled-corpus cells.

docs/schema/portfolio.md (new) — portfolio.E4 (2→4):
- Canonical PortfolioSummary / TestAsset / Finding /
  PortfolioAggregates / OwnerPortfolioSummary contract.
- .terrain/repos.yaml manifest schema documented.
- Multi-repo aggregate output marked Experimental for 0.2.0
  (honest about partial shipping per the plan's pillar priority).

docs/schema/explain.md (new) — insights_impact_explain.E4 (3→4):
- `terrain explain <target>` dispatch table mapping every
  accepted target type → output shape.
- Each shape references its canonical schema doc.
- OwnerExplanation shape documented inline (only one not covered
  by an existing schema file).

internal/insights/insights_bench_test.go (new) — insights_impact_explain.E5 (3→4):
- BenchmarkBuild_Healthy / WithDepgraphResults / LargeSnapshot.
- Reference numbers: 2.5µs / 8µs / 40µs per op on Intel i7-8850H.
- Linear scaling — no quadratic regressions.
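
Reference numbers like these come from standard Go benchmarks; a minimal sketch of the shape (the workload here is a placeholder, not the real insights builder):

```go
package main

import (
	"fmt"
	"testing"
)

// buildInsights stands in for the real builder under benchmark;
// testing.Benchmark is callable outside `go test`, which is handy
// for spot-checking a baseline figure.
func buildInsights(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
	}
	return total
}

func main() {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			buildInsights(1000)
		}
	})
	// res.NsPerOp() is the per-op figure a baseline file records.
	fmt.Println(res.N > 0)
}
```

In normal use these live as `func BenchmarkXxx(b *testing.B)` in `_test.go` files and run via `go test -bench`.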

internal/reporting/empty_states.go + portfolio_report.go — portfolio.V3 (2→4):
- New EmptyNoPortfolio empty-state kind.
- RenderPortfolioReport now uses the designed empty state when
  TotalAssets == 0, replacing the bare two-line message.

cmd/terrain/cmd_analyze.go — core_analyze.P5 (3→4):
- analyzeFailureRemediation surfaces a designed remediation
  block on analysis failure. Three branches: timeout-exceeded
  (with --timeout-increase / scope-down / verbose-timing
  nudges), cancelled (re-run hint), generic (three common
  causes + verbose/json next steps).

insights_impact_explain.E7 (3→4) — evidence refresh:
- All three commands route through the cancellation-tested
  engine.RunPipelineContext path locked by
  TestRunPipelineContext_RespectsCancelledContext +
  TestRunPipelineContext_CancelMidFlight.

docs/release/parity/scores.yaml: seven cells refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): benchmark baseline + drilling-into-findings doc + summary E7 cancellation

Lifts five more cells in the Understand pillar.

benchmarks/baseline.txt (new) — core_analyze.E5 (3→4):
- Reference performance numbers for the three benchmark suites that
  matter: engine pipeline (small/medium/large), insights builder
  (healthy/with-depgraph/large-snapshot), changescope PR-comment
  renderer (small/medium/large).
- Captured 2026-05 on Intel i7-8850H @ 2.60GHz with re-run
  instructions and notes.
- make bench-gate compares ratios against this baseline.

docs/user-guides/drilling-into-findings.md (new) —
insights_impact_explain.P3 (3→4) and P4 (3→4):
- Four-command ladder (analyze → insights → impact → explain) with
  worked examples per command.
- Full "how confidence is computed" section: detector confidence
  (structural / heuristic / runtime-aware), ConfidenceDetail
  Wilson/Beta intervals, test-selection confidence
  (exact / inferred / weak), coverage confidence.
- Round-trip example using a stable finding ID.
- Cross-references the schema docs.

Closes the audit's "no per-command 'how confidence is computed'
pages" concern. Adopters new to the drill-down commands now have
an explicit playbook.

summary_posture_metrics_focus.E7 (3→4) — evidence refresh:
- summary / posture / metrics / focus all route through
  runPipelineWithSignals → engine.RunPipelineContext, locked by
  TestRunPipelineContext_RespectsCancelledContext +
  TestRunPipelineContext_CancelMidFlight.

docs/release/parity/scores.yaml: five cells refreshed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): portfolio command error UX (Align lift)

Lifts portfolio.P5 (3→4) — the last achievable concrete error-UX
lift in the Align pillar without 0.3 corpus work.

cmd/terrain/cmd_insights.go runPortfolio:
- When portfolio analysis fails, surface a designed remediation
  block: names the three common adopter causes (snapshot
  construction failure, non-git root, permission errors) and
  points at `terrain analyze` for root-cause drill-down. Links
  to docs/schema/portfolio.md for multi-repo workflows
  (currently experimental in 0.2.0).

docs/release/parity/scores.yaml: portfolio.P5 (3→4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…171)

* feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh

Lifts 14 cells across Understand and Align pillars without
labeled-corpus dependency.

cmd/terrain/cmd_insights.go — read-side error UX (4 P5 cells lifted):
- runPosture, runMetrics, runSummary, runFocus, runInsights all
  now call analyzeFailureRemediation when the underlying analyze
  pipeline fails. Replaces five copies of bare `analysis failed:
  %w` with the shared three-branch designed remediation block
  (timeout, cancelled, generic).

docs/schema/migration.md (new) — migration_conversion.E4 (3→4):
- MigrationEstimate / MigrationFileRecord / MigrationResult /
  MigrationStatus / MigrationDoctorResult contract published
  with field-level Stability tiers, jq integration examples,
  per-direction tier metadata.

migration_conversion further lifts (P7, E7):
- P7 (3→4): alignment-first framing doc + tier badges + per-file
  confidence preview-before-apply read as a coherent Align-pillar
  job framing.
- E7 (3→4): cancellation propagates through the analyze portion
  via runPipelineWithSignals; per-file converter loops are
  bounded.

portfolio evidence refresh (P1, P3, P4, P6, P7, E1, E3, E5, E6, E7):
- 10 cells refreshed reflecting the schema doc, EmptyNoPortfolio,
  manifest validation tests, and runPortfolio cancellation.
- Still at 2: P2 (multi-repo corpus, 0.3 work), E2 (same).
- Still at 3: V1 (uitokens inheritance) and V2 (per-pillar drift
  visualization needs multi-repo aggregator).

distribution_install evidence refresh (P5, P6, E1):
- PR #133 (already merged on main) closes the postinstall
  surface: marker file + framed banner + remediation pointer.
  Per-platform install matrix documented.

Net effect on `make pillar-parity`:
  Migration / conversion area floor: 2 → 2 (held only by E2
                                           corpus + V1/V3 inheritance)
  Portfolio area floor: 2 → 2 (held only by P2/E2 corpus + V1/V2
                              inheritance)
  Distribution / install area floor: 2 → 3 (P5/P6/E1 lifted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): final V-axis lifts — uitokens migration + comprehensive evidence refresh

Lifts ~25 cells across Understand, Align, and Cross-cutting pillars
via uitokens migration in core renderers + comprehensive V/P/E
evidence refresh.

internal/reporting — uitokens migration:
- analyze_report_v2.go: Key Findings now use uitokens.BracketedSeverity
  instead of strings.ToUpper inline mapping.
- analyze_report.go: per-signal severity badge uses
  uitokens.BracketedSeverity.
- insights_report_v2.go: per-finding + edge-case badges use
  uitokens.BracketedSeverity.
- analyze_report_v2_test.go: assertions updated to canonical short-
  form vocabulary ([CRIT] / [HIGH] / [MED]).
- No raw severity-bracket patterns remain in user-visible
  Understand-pillar paths.

cmd/terrain/cmd_insights.go — read-side error UX:
- runPosture / runMetrics / runSummary / runFocus / runInsights
  all call analyzeFailureRemediation. Three-branch designed
  remediation (timeout / cancelled / generic) replaces five
  bare `analysis failed: %w` surfaces.

cmd/terrain/cmd_impact.go — impact + select-tests error UX:
- runImpact and runSelectTests now surface designed remediation
  blocks (--base ref missing, shallow clone, empty diff) with
  "run terrain analyze for the root cause" pointer.

docs/examples/serve-local-dev.md (new on this branch — also on PR #167):
- Closes server.P6 audit gap.

Cells lifted (evidence refresh + concrete code work, all without
labeled-corpus dependency):
- core_analyze: V1 (3→4), V2 (3→4), V3 (3→4)
- insights_impact_explain: V1 (3→4), V2 (3→4), V3 (3→4),
  P5 (3→4), P6 (3→4)
- summary_posture_metrics_focus: P5 (3→4), P6 (3→4),
  V1 (3→4), V3 (3→4)
- ai_risk_inventory: P1 (3→4), P2 (2→3), P4 (3→4), P5 (3→4),
  P6 (3→4), P7 (3→4), E2 (2→3), E3 (3→4), E4 (3→4),
  E5 (3→4), V1 (3→4), V2 (3→4)
- migration_conversion: V1 (3→4), V3 (3→4)
- portfolio: V1 (3→4)
- server: P6 (2→3), E7 (2→4)
- distribution_install: V1 (3→4), V2 (3→4), V3 (3→4)

`make pillar-parity` after this commit:
  understand: floor 2 → floor 3 PASS  ✓
  align:      floor 2, soft WARN (unchanged — held by E2 corpus)
  gate:       floor 2, hard FAIL (unchanged — held by E2 corpus)

The Understand pillar now passes the publicly-claimable floor for
0.2.0. Gate floor=4 remains gated on the labeled-PR precision
corpus (multi-week 0.3 work) per the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps [@commitlint/cli](https://github.com/conventional-changelog/commitlint/tree/HEAD/@commitlint/cli) from 19.8.1 to 21.0.0.
- [Release notes](https://github.com/conventional-changelog/commitlint/releases)
- [Changelog](https://github.com/conventional-changelog/commitlint/blob/master/@commitlint/cli/CHANGELOG.md)
- [Commits](https://github.com/conventional-changelog/commitlint/commits/v21.0.0/@commitlint/cli)

---
updated-dependencies:
- dependency-name: "@commitlint/cli"
  dependency-version: 21.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

dependabot Bot commented on behalf of github May 8, 2026

Labels

The following labels could not be found: automerge, dependencies. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

@dependabot dependabot Bot requested a review from pmclSF as a code owner May 8, 2026 12:55

github-actions Bot commented May 8, 2026

[INFO] Terrain — Informational only

Insufficient data to assess change risk confidently.

| Metric | Value |
| --- | --- |
| Changed files | 2 (0 source · 0 test) |

All clear. No new findings introduced; no protection gaps identified in changed code.

Run terrain compare over time to track posture; this clean state is the bar to hold.


Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

No tests selected — change affects only non-code files.


github-actions Bot commented May 8, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 17 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.

