chore(deps-dev): bump the lint-format group with 3 updates by dependabot[bot] · Pull Request #173 · pmclSF/terrain

dependabot · 2026-05-08T12:55:02Z

Bumps the lint-format group with 3 updates: eslint, lint-staged and prettier.

Updates eslint from 8.57.1 to 10.3.0

Release notes

v10.3.0

Features

379571a feat: add suggestions for no-unused-private-class-members (#20773) (sethamus)

Bug Fixes

b6ae5cf fix: handle unavailable require cache (#20812) (Simon Podlipsky)

6fb3685 fix: rule suggestions cause continuation in class body (#20787) (Milos Djermanovic)

Documentation

32cc7ab docs: fix typos in docs and comments (#20809) (Tanuj Kanti)

7f47937 docs: Update README (GitHub Actions Bot)

Chores

d32235e ci: use pnpm in eslint-flat-config-utils type integration test (#20826) (Francesco Trotta)

3ffb14e chore: clean up typos in comments and JSDoc (#20821) (Pixel998)

22eb58a chore: add missing continue-on-error to ecosystem-tests.yml (#20818) (Josh Goldberg ✨)

88bf002 ci: bump pnpm/action-setup from 6.0.1 to 6.0.3 (#20815) (dependabot[bot])

97c8c33 chore: update ilshidur/action-discord action to v0.4.0 (#20811) (renovate[bot])

2f58136 chore: pin peter-evans/create-pull-request action to 5f6978f (#20810) (renovate[bot])

77add7f chore: add initial ecosystem plugin tests workflow (#19643) (Josh Goldberg ✨)

4023b55 test: Add unit tests for SuppressionsService.prune() (#20797) (kuldeep kumar)

54080da test: add unit tests for ForkContext (#20778) (kuldeep kumar)

f0e2bcc test: add unit tests for SuppressionsService.suppress() method (#20765) (kuldeep kumar)

a7f0b94 chore: update dependency prettier to v3.8.3 (#20782) (renovate[bot])

7bf93d9 chore: update TypeScript to v6 (#20677) (sethamus)

b42dd72 ci: bump pnpm/action-setup from 6.0.0 to 6.0.1 (#20781) (dependabot[bot])

2b252be test: add unit tests for IdGenerator (#20775) (kuldeep kumar)

v10.2.1

Bug Fixes

14be92b fix: model generator yield resumption paths in code path analysis (#20665) (sethamus)

84a19d2 fix: no-async-promise-executor false positives for shadowed Promise (#20740) (xbinaryx)

af764af fix: clarify language and processor validation errors (#20729) (Pixel998)

e251b89 fix: update eslint (#20715) (renovate[bot])

Documentation

ca92ca0 docs: reuse markdown-it instance for markdown filter (#20768) (Amaresh S M)

57d2ee2 docs: Enable Eleventy incremental mode for watch (#20767) (Amaresh S M)

c1621b9 docs: fix typos in code-path-analyzer.js (#20700) (Ayush Shukla)

1418d52 docs: Update README (GitHub Actions Bot)

39771e6 docs: Update README (GitHub Actions Bot)

71e0469 docs: fix incomplete JSDoc param description in no-shadow rule (#20728) (kuldeep kumar)

22119ce docs: clarify scope of for-direction rule with dead code examples (#20723) (Amaresh S M)

8f3fb77 docs: document meta.docs.dialects (#20718) (Pixel998)

Chores

7ddfea9 chore: update dependency prettier to v3.8.2 (#20770) (renovate[bot])

fac40e1 ci: bump pnpm/action-setup from 5.0.0 to 6.0.0 (#20763) (dependabot[bot])

7246f92 test: add tests for SuppressionsService.load() error handling (#20734) (kuldeep kumar)

4f34b1e chore: update pnpm/action-setup action to v5 (#20762) (renovate[bot])

... (truncated)

Commits

7889204 10.3.0
5b69b4f Build: changelog update for 10.3.0
d32235e ci: use pnpm in eslint-flat-config-utils type integration test (#20826)
b6ae5cf fix: handle unavailable require cache (#20812)
3ffb14e chore: clean up typos in comments and JSDoc (#20821)
6fb3685 fix: rule suggestions cause continuation in class body (#20787)
22eb58a chore: add missing continue-on-error to ecosystem-tests.yml (#20818)
88bf002 ci: bump pnpm/action-setup from 6.0.1 to 6.0.3 (#20815)
379571a feat: add suggestions for no-unused-private-class-members (#20773)
97c8c33 chore: update ilshidur/action-discord action to v0.4.0 (#20811)
Additional commits viewable in compare view

Updates lint-staged from 16.3.3 to 17.0.2

Release notes

Sourced from lint-staged's releases.

v17.0.2

Patch Changes

#1779 88670ca Thanks @iiroj! - Enable immutable GitHub releases

v17.0.1

Patch Changes

#1776 4a5664b Thanks @iiroj! - Adjust GitHub Actions workflow so that automatic publishing works with signed commits.

v17.0.0

Major Changes
#1745 e244adf Thanks @iiroj! - Node.js v20 is no longer supported, and the oldest supported version is now 22.22.1, which is an active LTS version at the time of this release. Node.js 20 will be EOL after April 2026. Please upgrade your Node.js version!

#1676 0584e0b Thanks @outslept! - Lint-staged now tries to verify the installed Git version is at least 2.32.0, released in 2021. If you're using an even older Git version, you need to upgrade it before running lint-staged!
#1745 2dcc40a Thanks @iiroj! - The dependency yaml is now marked as optional and probably won't be installed by default. If you're using a YAML configuration file you should install the package separately:
npm install --development yaml
If you're using .lintstagedrc as the config file name (without a file extension), it will be treated as a YAML file. If the content is JSON, consider renaming it to .lintstagedrc.json to avoid needing to install yaml.
Minor Changes

#1748 809d5ef Thanks @iiroj! - Add new option --hide-all for hiding all unstaged changes and untracked files, before running tasks. This makes it easier to run tools like Knip which check for unused code. Untracked files are included in the backup stash and restored automatically after running.

#1759 f13045a Thanks @iiroj! - Update dependencies, including tinyexec@1.1.1 to fix the following issues:

When using a Node.js version manager with multiple versions installed (nvm, n, for example), scripts with the #!/usr/bin/env node shebang (Prettier, ESLint, for example) were previously spawned using the default Node.js version configured by the version manager (the one which node points to) on POSIX systems. Now, they will be spawned with the same version that lint-staged itself was started with.

For example, if your default Node.js version is 24.14.1 but lint-staged is run with the latest version 25.9.0, the tasks spawned by lint-staged will now also use version 25.9.0. Previously they were spawned using 24.14.1.

When installing Node.js from the Ubuntu App Center (Snap store), the node executable available in PATH is a symlink pointing to Snap itself. The sandboxing features of Snap prevented lint-staged from spawning scripts with the #!/usr/bin/env node shebang, because it meant lint-staged tried to spawn Snap via the symlink. This resulted in an ENOENT error when trying to run prettier, for example. Now, since the real node executable's directory is available in the PATH, lint-staged will instead spawn the script with the real node binary succesfully.

#1761 d3251b1 Thanks @iiroj! - Lint-staged now runs git update-index --again after running tasks, instead of git add <originally staged files>. This should improve compatibility when using non-default indexes, for example when committing with a pathspec git commit -m "message" . instead of adding files to the index.

#1745 a9585ac Thanks @iiroj! - Remove commander as a dependency and use the built-in parseArgs from node:util to parse CLI flags.

Patch Changes

#1755 c82d30b Thanks @iiroj! - All tests now pass on the Bun runtime (latest).

#1750 a401818 Thanks @iiroj! - Remove manual handling for git stash --keep-index resurrecting deleted files, because the issue was fixed in Git 2.23.0 and lint-staged requires at least Git 2.32.0.

#1771 c4b8936 Thanks @iiroj! - Fix documentation about multiple config files and the --cwd option. When using it, all tasks will be run in the specified directory. For example, to run everything in the actual process.cwd(), use lint-staged --cwd=".".

v16.4.0

Minor Changes

#1739 687fc90 Thanks @hyperz111! - Replace micromatch with picomatch to reduce dependencies.

... (truncated)

Changelog

Sourced from lint-staged's changelog.

17.0.2

Patch Changes

#1779 88670ca Thanks @iiroj! - Enable immutable GitHub releases

17.0.1

Patch Changes

#1776 4a5664b Thanks @iiroj! - Adjust GitHub Actions workflow so that automatic publishing works with signed commits.

17.0.0

Major Changes
#1745 e244adf Thanks @iiroj! - Node.js v20 is no longer supported, and the oldest supported version is now 22.22.1, which is an active LTS version at the time of this release. Node.js 20 will be EOL after April 2026. Please upgrade your Node.js version!

#1676 0584e0b Thanks @outslept! - Lint-staged now tries to verify the installed Git version is at least 2.32.0, released in 2021. If you're using an even older Git version, you need to upgrade it before running lint-staged!
#1745 2dcc40a Thanks @iiroj! - The dependency yaml is now marked as optional and probably won't be installed by default. If you're using a YAML configuration file you should install the package separately:
npm install --development yaml
If you're using .lintstagedrc as the config file name (without a file extension), it will be treated as a YAML file. If the content is JSON, consider renaming it to .lintstagedrc.json to avoid needing to install yaml.
Minor Changes

#1748 809d5ef Thanks @iiroj! - Add new option --hide-all for hiding all unstaged changes and untracked files, before running tasks. This makes it easier to run tools like Knip which check for unused code. Untracked files are included in the backup stash and restored automatically after running.

#1759 f13045a Thanks @iiroj! - Update dependencies, including tinyexec@1.1.1 to fix the following issues:

When using a Node.js version manager with multiple versions installed (nvm, n, for example), scripts with the #!/usr/bin/env node shebang (Prettier, ESLint, for example) were previously spawned using the default Node.js version configured by the version manager (the one which node points to) on POSIX systems. Now, they will be spawned with the same version that lint-staged itself was started with.

For example, if your default Node.js version is 24.14.1 but lint-staged is run with the latest version 25.9.0, the tasks spawned by lint-staged will now also use version 25.9.0. Previously they were spawned using 24.14.1.

When installing Node.js from the Ubuntu App Center (Snap store), the node executable available in PATH is a symlink pointing to Snap itself. The sandboxing features of Snap prevented lint-staged from spawning scripts with the #!/usr/bin/env node shebang, because it meant lint-staged tried to spawn Snap via the symlink. This resulted in an ENOENT error when trying to run prettier, for example. Now, since the real node executable's directory is available in the PATH, lint-staged will instead spawn the script with the real node binary succesfully.

#1761 d3251b1 Thanks @iiroj! - Lint-staged now runs git update-index --again after running tasks, instead of git add <originally staged files>. This should improve compatibility when using non-default indexes, for example when committing with a pathspec git commit -m "message" . instead of adding files to the index.

#1745 a9585ac Thanks @iiroj! - Remove commander as a dependency and use the built-in parseArgs from node:util to parse CLI flags.

Patch Changes

#1755 c82d30b Thanks @iiroj! - All tests now pass on the Bun runtime (latest).

#1750 a401818 Thanks @iiroj! - Remove manual handling for git stash --keep-index resurrecting deleted files, because the issue was fixed in Git 2.23.0 and lint-staged requires at least Git 2.32.0.

#1771 c4b8936 Thanks @iiroj! - Fix documentation about multiple config files and the --cwd option. When using it, all tasks will be run in the specified directory. For example, to run everything in the actual process.cwd(), use lint-staged --cwd=".".

16.4.0

... (truncated)

Commits

0a3b253 Merge pull request #1780 from lint-staged/changeset-release/main
36cae27 chore(changeset): release
ff03baa Merge pull request #1779 from lint-staged/immutable-release
88670ca ci: force Changesets action to sign Git tag
7f4596b Merge pull request #1777 from lint-staged/changeset-release/main
697c9ae chore(changeset): release
7ed4125 Merge pull request #1778 from lint-staged/fix-changesets
bf400fe ci: publish to npm in same Changesets workflow so that a personal token isn't...
0c2e6e5 Merge pull request #1776 from lint-staged/fix-changesets
4a5664b ci: use personal access token with semantic-release so that it can trigger pu...
Additional commits viewable in compare view

Updates prettier from 3.8.1 to 3.8.3

Release notes

Sourced from prettier's releases.

3.8.3

SCSS: Prevent trailing comma in if() function (prettier/prettier#18471 by @kovsu)

🔗 Changelog

3.8.2

Support Angular v21.2

🔗 Changelog

Changelog

Sourced from prettier's changelog.

3.8.3

diff

SCSS: Prevent trailing comma in `if()` function (#18471 by `@kovsu`)

// Input
$value: if(sass(false): 1; else: -1);
// Prettier 3.8.2
$value: if(
sass(false): 1; else: -1,
);
// Prettier 3.8.3
$value: if(sass(false): 1; else: -1);

3.8.2

diff

Angular: Support Angular v21.2 (#18722, #19034 by `@fisker`)

Exhaustive typechecking with @default never;

<!-- Input -->
@switch (foo) {
  @case (1) {}
  @default never;
}
<!-- Prettier 3.8.1 -->
SyntaxError: Incomplete block "default never". If you meant to write the @ character, you should use the "&#64;" HTML entity instead. (3:3)
<!-- Prettier 3.8.2 -->
@switch (foo) {
@case (1) {}
@default never;
}

arrow function and instanceof expressions.

</tr></table>

... (truncated)

Commits

d7108a7 Release 3.8.3
177f908 Prevent trailing comma in SCSS if() function (#18471)
1cd4066 Release @prettier/plugin-oxc@0.1.4
a8700e2 Update oxc-parser to v0.125.0
752157c Fix tests
053fd41 Bump Prettier dependency to 3.8.2
904c636 Clean changelog_unreleased
dc1f7fc Update dependents count
b31557c Release 3.8.2
96bbaed Support Angular v21.2 (#18722)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore <dependency name> major version will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself)
@dependabot ignore <dependency name> minor version will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself)
@dependabot ignore <dependency name> will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself)
@dependabot unignore <dependency name> will remove all of the ignore conditions of the specified dependency
@dependabot unignore <dependency name> <ignore condition> will remove the ignore condition of the specified dependency and ignore conditions

…bing ## Summary Foundation work for the 0.2 release per `docs/release/0.2.md`. **35 commits** covering all five critical-path items plus the bulk of parallelizable work. ## Critical path (5/5 done) 1. **Schema lock + manifest CI gate** (`98336e6`) 2. **SignalV2 schema** (`03998e1`) 3. **Severity rubric** (`7fd018f`) 4. **Calibration corpus infrastructure** (`e67e129`) + Wilson 95% intervals (`0bb7e67`) 5. **Tree-sitter parser pool** (`b450145`) ## AI detectors — 10 of 12 stable, 1 experimental, 2 still planned | Signal | Status | Severity | Source | |---|---|---|---| | `aiHardcodedAPIKey` | stable | Critical | static | | `aiNonDeterministicEval` | stable | Medium | static | | `aiModelDeprecationRisk` | stable | Medium | static | | `aiPromptInjectionRisk` | experimental | High | static | | `aiToolWithoutSandbox` | stable | High | static | | `aiSafetyEvalMissing` | stable | High | graph | | `aiHallucinationRate` | stable | High | runtime/eval | | `aiCostRegression` | stable | Medium | runtime/eval + baseline | | `aiRetrievalRegression` | stable | High | runtime/eval + baseline | | `aiPromptVersioning` | stable | Medium | static | `aiFewShotContamination` and `aiEmbeddingModelChange` remain — both need similarity comparison or content-hash infrastructure that didn't fit cleanly in this PR. ## Eval-framework adapters (3/3 done) - **Promptfoo** (`93d4782`, `34f0e01`, `f1d6b22`) — handles v3 and v4+ shapes - **DeepEval** (`c963501`) — testCases × metricsData → EvalRunResult - **Ragas** (`7bde060`) — pulls every numeric named-score into NamedScores All three target the same `EvalRunEnvelope` shape; CLI flags `--promptfoo-results`, `--deepeval-results`, `--ragas-results` ingest in parallel during analyze. ## Baseline snapshot mechanism (`23e3e2b`, `93c63c4`) `--baseline path/to/old.json` attaches a previous snapshot for regression-aware detectors. `aiCostRegression` and `aiRetrievalRegression` consume it via paired-case comparison. ## Confidence intervals (`0bb7e67`) `airun.WilsonInterval95` + `calibration.MetricInterval` — Wilson score 95% CI for binomial proportions. Detectors emit heuristic intervals today; once corpus is populated those swap to corpus-derived bounds. ## terrain ai run captures eval output (`d6f9f9d`) Promptfoo / DeepEval / Ragas runs now write structured output to a temp file, get parsed via the right adapter, land in `snap.EvalRuns`, and stash on the persisted Artifact. ## Test discovery improvements - **Hierarchical Go t.Run** (`d85b3be`) — full SuiteHierarchy preserved across nesting - **Pytest parametrize value extraction** (`d85b3be`) — `Values []string` per parametrize row - **JUnit 5 `@Nested` + `@DisplayName`** (`44e88f8`) - **Pytest fixture dependency graph** (`aae50de`) — `Dependencies []string` per FixtureSurface - **Vitest in-source tests** (`f003fd4`) — `import.meta.vitest` files discovered as test-bearing - **TSConfig path resolution** (`fc5c865`) — extends chain + multi-target + jsconfig.json fallback ## AI surface detection expansion (`831cfab`) Datasets (.jsonl/.parquet/...), DB-cursor + SQL retrieval (psycopg2/pgvector/Elasticsearch knn), in-memory FAISS, MCP tool decorators. ## Conversion (`a236871`, `8ce4121`, `a32c7a4`) - `terrain convert --preview` with real LCS-based unified diff - Per-file confidence: `ItemsCovered` / `ItemsLossy` / `Confidence` - `.terrain/conversion-history/log.jsonl` audit trail ## Hardening + ops gates - **Performance regression gate** (`2eab842`) — >10% regression fails the PR - **Auto-generated rule docs** (`78fad2b`) — 68 stubs + drift gates - **Cosign hard-fail** (`6e8ebd9`) — npm postinstall switched from warn-only - **SLSA L2 build provenance** (`225e1c7`) — `actions/attest-build-provenance@v3` per archive - **Husky pre-commit** (`56006ed`) — secrets-shaped + archive blocklists - **Dependency pin rationale** (`153bd5e`) ## Documentation (`8459bfd`) - README "What Terrain Is Not" + GitHub Actions templates - Persona guides: `docs/personas/{frontend,backend,ai,manager}.md` - Comparison guide: `docs/compare/codecov-sonar-launchable.md` ## Misc - (`ecbc59e`) GitHub secret scanner alert resolved — synthetic test fixtures split at compile time ## Still pending in 0.2 - 2 AI detectors (`aiFewShotContamination`, `aiEmbeddingModelChange`) — need similarity comparison + content-hash infrastructure - CLI restructure 43 → ~15 canonical commands + universal flag schema (well-bounded but large mechanical change worth its own PR) - Surface detection deeper RAG plumbing (chunker / retriever inference) - Conversion top-3 fixture corpora to A-grade (content work, ~12 SP) - Scoring v2 — band re-anchoring to corpus percentiles (blocks on corpus content) ## Test plan - [x] `go test ./internal/... ./cmd/... -count=1` — all packages green - [x] `make docs-verify` — manifest JSON + severity rubric + 68 rule-doc stubs all in sync - [x] `make calibrate` — runs end-to-end against `tests/calibration/` - [x] `terrain analyze --promptfoo-results sample.json` smoke-tested end-to-end - [x] `terrain analyze --baseline old.json --promptfoo-results new.json` exercises regression detectors - [x] `terrain convert --preview` smoke-tested end-to-end (Jest → Vitest) - [x] AI detectors verified to fire on smoke fixtures - [ ] CI: full pipeline including new docs-verify, bench-gate, drift gates, attestations 🤖 Generated with [Claude Code](https://claude.com/claude-code)

…rieval) ## Summary Closes the eval-data-detector calibration gap from PR #123's followups. Three detectors that ingest framework eval results now have anchors in the calibration corpus: `aiHallucinationRate`, `aiCostRegression`, and `aiRetrievalRegression`. With these landed, all 12/12 AI detectors from the round-4 plan have at least one calibration fixture at 1.00 precision/recall. ## Changes - **Calibration runner extension**: auto-discovers per-fixture eval artifacts (`eval-runs/{promptfoo,deepeval,ragas}.json` and `baseline.json`) and feeds them to the pipeline as `PipelineOptions`. Fixtures without these files behave as before. - **Path-relative matching in `matchFixture`**: eval-data detectors stamp the artifact's absolute path into `Signal.Location.File`; the matcher now normalises against the fixture dir so labels stay portable. - **Baseline-from-fixture helper**: `buildBaselineFile` synthesises a baseline snapshot from `baseline/eval-runs/{promptfoo,deepeval,ragas}.json` rather than requiring a hand-written snapshot JSON with base64-encoded payloads. Regression-shaped fixtures only need two pairs of framework JSON files. - **Three new calibration fixtures**: - `eval-hallucination-rate` — Promptfoo run with 3/8 hallucinated cases (37.5% > 5% threshold). - `eval-cost-regression` — paired runs, avg cost-per-case rose 0.0028 → 0.0084 (+200% vs 25% threshold). - `eval-retrieval-regression` — paired runs with `context_relevance` avg dropping 0.90 → 0.59 (−31 pts vs 5-pt threshold). ## Corpus state after merge - 27 fixtures × 33 distinct signal types at 1.00 precision/recall - All 12/12 AI detectors covered - Calibration gate is load-bearing — any future change that drops a labelled signal trips CI. ## Test plan - [x] `make calibrate` reports 27/33 at 100% - [x] `go test ./internal/... ./cmd/... -count=1` clean locally - [ ] CI green on Ubuntu / macOS / Windows 🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary Implements the noun.verb restructure of the Terrain CLI surface. Compresses 35 top-level commands to 11 canonical, all non-breaking — legacy commands keep working through 0.2 with a deprecation note in 0.2.x and removal in 0.3. ## The 11 ``` 1. terrain init 2. terrain analyze (--against=<ref>, --policy=<file> in phase B) 3. terrain report <verb> # 9 verbs: summary, insights, metrics, explain, # show, impact, pr, posture, select-tests 4. terrain migrate <verb> # 11 verbs covering convert + migrate + migration 5. terrain ai <verb> (already grouped) 6. terrain portfolio <verb> (already grouped) 7. terrain config <verb> # feedback, telemetry 8. terrain doctor 9. terrain debug <verb> (already grouped) 10. terrain serve 11. terrain version ``` ## Two former top-level commands collapse into flags - `focus` → `report summary --focus=<path>` (also valid on insights/metrics) - `export` → `--output=<path>` flag pattern across all verbs ## Migration strategy (non-breaking) 1. **0.2 (this PR)** — namespaces ship as aliases. Legacy commands keep working. New users see the canonical shape in `--help`. 2. **0.2.x** — deprecation note on legacy invocations. 3. **0.3** — legacy names removed. ## What this PR does NOT do (phase B) - Fold `policy` into `analyze --policy=<file>` — changes exit-code semantics. Deserves its own review. - Fold `compare` into `analyze --against=<ref>` — different output path. Same. - Universal flag schema cleanup (`--root` vs `-root`, `--json` vs `--format json`). The dispatchers added here use the canonical names; the legacy parsers still accept both. ## Three new dispatcher files - `cmd_migrate_namespace.go` — 11 canonical verbs covering convert + migrate + migration. `terrain convert` aliased to `terrain migrate`. - `cmd_report_namespace.go` — 9 read-side verbs. - `cmd_config_namespace.go` — 2 verbs (feedback, telemetry). Each has a corresponding `_test.go` verifying canonical-verb recognition, unknown-verb rejection, and missing-arg handling. Behavioural tests for the underlying runners live in their existing per-command tests. ## Test plan - [x] `go test ./...` clean locally - [x] Smoke-tested `terrain migrate list`, `terrain convert list`, `terrain report`, `terrain config telemetry --status` end-to-end - [ ] CI green on Ubuntu / macOS / Windows 🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary Adversarial review (`/gambit:parallel-agents` × 5 domains, ~150 findings) surfaced material ship-blockers. This PR fixes the must-merge-before-tag items and rewrites the CHANGELOG to match what actually shipped. **Without this PR, v0.2.0 should not be tagged.** ## Code fixes 1. **`aiModelDeprecationRisk` false positive** — regex matched `claude-2.1`, `gpt-3.5-turbo-0125`, and other dot-versioned current models against undated parents. One-character fix in the boundary class. 2. **Ragas detector broken end-to-end** — `retrievalScoreKeys` allowlist missed Ragas's modern key (`context_precision`). Detector silently fired zero signals on real Ragas runs. Added modern Ragas keys plus LangSmith variants. 3. **`terrain convert <file>` regressed** — CLI fold-in routed per-file conversion to the project-wide migrate runner (which expects a directory). Split the fall-through: `convert` namespace keeps `runConvertCLI`, `migrate` namespace keeps `runMigrateCLI`. ## CHANGELOG honesty pass The previous CHANGELOG made claims that don't match reality: | Was | Now | |---|---| | "12/12 detectors shipped" | 10 stable + 2 experimental, per-detector caveats | | "27 × 33 at 100% precision/recall" | "32 types fire on real-shaped fixtures; gate is recall, not precision" | | "35→11 commands" | "canonical 11 + 33 legacy aliases" (binary still accepts ~44) | | "`focus → --focus=<path>`, `export → --output=<path>`" | Removed — flags don't exist in 0.2 | | "18 severity clauses" | 17 (matches code) | | "Schema 1.0.0 → 1.1.0" | Snapshot 1.1.0; manifest export still 1.0.0 | | "Cosign hard-fail" | Caveat added — degrades to checksum when cosign absent | ## New: `docs/release/0.2-known-gaps.md` Captures ~30 review-flagged follow-ups across detectors, eval adapters, calibration corpus, CLI, distribution, and documentation. Each entry names where it lives and the planned resolution window (0.2.x or 0.3). ## Deferred-to-0.3 expanded Items from `docs/release/0.2.md` that didn't ship are now tracked explicitly: scoring v2, top-3 conversion at A-grade (was Tier-2 gate), universal flag schema, plugin architecture, in-band deprecation warnings, terrain doctor consolidation, terrain ai gate, several "lands in 0.2" manifest entries that didn't promote. ## Test plan - [x] `go test ./...` clean - [x] `make calibrate` clean (27 × 32 at 100% recall) - [x] `terrain convert /tmp/foo.cy.test.ts --from cypress --to playwright` works - [x] `terrain convert list`, `terrain migrate list`, `terrain report summary` all route correctly - [ ] CI green on Ubuntu / macOS / Windows 🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary Re-opening as a squash-target PR after `main` was reset to `b9868cf`. Branch state is identical to the prior PR #126 (16 commits, +2187/-350); the only difference is the merge method this time around. Closes the ~80 unique substantive findings from two adversarial review passes (`/gambit:parallel-agents`) across: - **AI detectors (15+)** — model deprecation, prompt injection, hallucination rate, embedding model change, hardcoded API key, prompt versioning, few-shot contamination, retrieval regression, cost regression, non-deterministic eval, tool-without-sandbox, safety eval missing - **Eval adapters** — Ragas numericValue + cost-column false-failure - **Calibration corpus** — Symbol-aware match key, ExpectedAbsent normalisation, load-bearing gate (t.Skipf → t.Fatalf, minFixtures=25) - **CLI** — uniform Ctrl-C handling on all 18 pipeline call sites, canonical migrate/convert help blocks, deprecation hooks, distinct exit code for not-found - **Engine safety** — detector panic recovery, baseline schema validation - **Severity rubric** — split overloaded clauses; SARIF helpUri - **Supply chain** — mandatory cosign, supply-chain.md v0.2.0 refresh + `gh attestation verify` section - **Docs** — feature-status.md rewrite, COMPAT.md schema bump, scoring-rubric v2 alignment, README canonical-11 framing, honest known-gaps.md ## Test plan - [x] `go test ./...` green locally - [x] CI green on identical SHAs in PR #126 prior to its merge (12/12 checks) - [x] No regressions in calibration corpus (24 fixtures × 30 detector types at 1.00 precision/recall) 🤖 Generated with [Claude Code](https://claude.com/claude-code)

…c honesty (#128) * fix(0.2): Final polish — release blockers, detector-panic catalog, doc honesty Final adversarial-review pass before tag. Closes the highest-impact findings from a 7-domain parallel review (~245 findings total; this batch is the verified P0/P1 subset). ## Release infra (P0) - npm-release job adds setup-go: pre-fix, `npm publish --provenance` triggered prepublishOnly → npm test → verify-pack.js → `go build`, which would fail with `go: not found` because Go isn't installed on the npm-release runner. The first release attempt would crash at publish time. Adds permissions: id-token: write explicitly so provenance attestation works regardless of inherited workflow permissions. - supply-chain.md drops the windows/arm64 row. goreleaser only builds windows/amd64; the doc was promising an artifact that doesn't exist. ## Engine self-diagnostic (P0) - detectorPanic added to models.SignalCatalog and the manifest. Pre-fix: safeDetect's panic-recovery path emitted a "detectorPanic" signal, but ValidateSnapshot rejected it as an unknown type — so the whole snapshot got dropped, defeating the graceful-degradation promise. New regression test (TestRegistry_RunRecoversFromDetectorPanic_ProducesValidSnapshot) asserts the snapshot from a panicking detector still passes validation. A new TER-ENGINE-001 rule doc captures the contract. ## CLI (P1) - README claimed `terrain score` as a canonical command — it doesn't exist. Removed from the README architecture block. - runDepgraph (cmd_debug.go) bypassed runPipelineWithSignals, inheriting context.Background() — Ctrl-C didn't unwind during deep monorepo scans. Routes through AnalyzeContext now, matching every other analysis command since 0.2. - terrain version --json now includes schemaVersion. CI tooling that pins on snapshot shape no longer needs to load a snapshot to find out which version a binary produces. ## Detector quality (P1) - aiFewShotContamination silenced for auto-derived scenarios (empty CoveredSurfaceIDs — the dominant shape in the wild). Adds the same implicit path-based coverage fallback aiSafetyEvalMissing already shipped: scope to top-level dir when scenario.Path is set, whole-repo otherwise. Two new tests lock in both branches. ## Eval adapter (P1) - Promptfoo errors-bucket wiring: row-derived stats fallback used to classify every non-success as Failure, even when the row had a provider-crash `error: "..."` field. That polluted aiHallucinationRate's `caseIsScoreable` denominator. Now routes errored rows into Aggregates.Errors via a new promptfooRowErrored helper, and per-case Cost falls back to the top-level `cost` field when r.response.tokenUsage.cost is zero (modern Promptfoo writes to both). Two new tests pin both behaviors. ## Documentation honesty (P0/P1) - README:61 footnote: "marked [experimental] or [planned] in 0.1.2" → "in 0.2.0". - docs/personas/ai.md: 8 detectors marked *(planned)* that ship stable in 0.2.0; aligned with feature-status.md. - CONTRIBUTING.md and DESIGN.md "30+ commands" → "10 canonical + legacy aliases"; "47 packages" / "48 packages" / "46 packages" → 49 (actual). - docs/release/0.2.md "Status: planned (starts after 0.1.2 ships)" → "shipping (release 0.2.0)". - docs/release/0.1.2.md "Status: in progress" → "shipped". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Doc honesty + CLI deprecation hints + serve --read-only enforcement Continuation of the final-polish review work. Closes a wider band of findings from the 7-domain review. ## Doc honesty - Exit-code 4 comment in main.go: claimed `terrain ai gate` exists, reframed as `terrain ai run --baseline`'s actionBlock decision. - 0.2.md: "43 → ~15 canonical" → "43 → 11 canonical" (actual landing). - 0.2.md Goal 4: honest about 50-repo labelled corpus slipping to 0.3 while 0.2 ships a regression-only fixture corpus + ≥95% recall gate. - 0.2.md release-gates: replace "≥90% precision" with the actual "≥95% recall" gate that shipped. - release-notes.md: was 334 lines of 0.1.0 ground-up-rewrite framing — now a short redirect to CHANGELOG + per-release docs. - release-checklist-final.md: rewritten for the 0.2 surface (canonical 11 + legacy aliases) and replaces the stale 0.1.x checklist with load-bearing gates (calibrate, docs-verify, bench-gate). - quality-bar-and-gates.md: adds the three 0.2 release gates that the table previously omitted (docs-verify, calibrate, bench-gate). - scoring-rubric.md: stale "0.1.x"/"0.1.2's job" framing aligned to what 0.2.0 actually carries forward. - 0.2-known-gaps.md: mid-cycle banner points at `main` instead of a deleted branch; installer/supply-chain/README rows marked **(fixed in 0.2)** with the actual fix described. ## CLI deprecation hints - 11 legacy top-level commands gain `legacyDeprecationNotice` calls so `TERRAIN_LEGACY_HINT=1` produces a uniform migration prompt: shorthands, detect, estimate, status, checklist, reset, policy, portfolio, focus, compare, migration, show, export, depgraph. Pre-fix only ~6 of the legacy commands triggered the hint, making the documented opt-in environment variable inconsistent. ## serve --read-only — promote no-op to actual enforcement - `--read-only` was documented as "no-op in 0.1.2; reserved for 0.2" on a 0.2 ship — a CLI-contract trap. The security middleware now rejects any non-GET/HEAD/OPTIONS request with HTTP 405 + Allow header when ReadOnly is set. Every endpoint in 0.2 is GET-only, so this is a contract gate for future state-changing endpoints rather than a behaviour change for current traffic. Flag help and Config doc-comment updated. ## README onboarding - Quick Start adds `terrain explain <test-path>` so the README's own "four canonical workflows" promise (analyze / insights / impact / explain) is exercised in the start path. - Documentation section now links CHANGELOG.md, feature-status.md, and SECURITY.md — pre-fix you couldn't reach release notes from the README in one click. - "Pre-built binaries" line corrected: macOS (amd64+arm64), Linux (amd64+arm64), Windows (amd64). Pre-fix it claimed Windows arm64 too, which goreleaser doesn't build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): AI detector polish — FP/FN edges + confidence scaling + severity ladder Wide pass through the AI detectors closing P1/P2 findings from the final-polish review. ## aiToolWithoutSandbox - delete_<benign-noun> false-positive class closed: `delete_cache`, `purge_logs`, `remove_session`, `truncate_buffer`, etc. now silenced via `classifyDestructive` + `benignDestructiveObjects` whitelist. Always-high verbs (exec, eval, run_shell, send_payment, transfer, charge, refund) keep firing regardless of object — their blast radius isn't bounded by the noun. - Word-boundary regex bug fixed: Go's `\b` treats `_` as a word char, so `\bdelete\b` did NOT match `delete_user`. Trailing class is now `(?:_|\b)` to catch the dominant `verb_object` snake-case form. - Two new tests cover both the benign-suppression path and the always-high-verb-still-fires path. ## aiPromptInjectionRisk - userInputShapes expanded to include FastAPI `Body/Query/Form/File/ Header/Cookie/Path`, Flask `request.form/request.values`, Django `request.POST/request.GET/request.FILES`, gRPC `request.message/ request.payload`, Pyramid `request.json`, plus `sys.stdin/argv` for CLI-driven user input. Pre-fix only Express-style `request.body` and `req.body` shapes were caught. - Multi-line concatenation false-negative closed: scanner now reads the file as a slice of lines and looks at a 3-line window for the user-input match when a prompt-shaped variable appears. Real-world Black/Prettier-formatted code routinely splits `prompt += \n user. input` across two lines, which the line-by-line scanner missed. - `bufio.Scanner` replaced with `os.ReadFile` + line split. ## aiHallucinationRate - Keyword list expanded from 5 stems to 17 to catch real failure- reason text: "not in source", "not in context", "no evidence", "no citation", "unsupported", "outside scope", "off-topic", "contradicts source". Pre-fix list was closed-class English with only "fabricat*", "hallucinat*", "grounding", "made up", "ungrounded". - Threshold-boundary comment clarified (rate <= threshold skips, so fire is rate > threshold; equal-to-threshold does not fire). ## aiHardcodedAPIKey - Scan-error abuse fixed: pre-fix the synthetic `keyHit` for bufio.Scanner.Err() was emitted as a SeverityCritical Signal with Type=aiHardcodedAPIKey, painting a binary blob as a real API key in the rendered report. Now ScanError is a flag on keyHit; the caller routes scan-error hits to a SeverityMedium degraded-coverage diagnostic with explanation "Secret-scan coverage degraded: scanner failed mid-file." ## aiModelDeprecationRisk - Severity by category: deprecated tags (text-davinci-003, claude-1) are now SeverityHigh; floating tags (gpt-4, claude-3-opus) stay SeverityMedium. Pre-fix every category was Medium, which under- prioritized the genuinely-broken cases where the next API call WILL fail. - BASIC `'` comment-prefix tightened to `' ` (space-required) so it doesn't match Python single-quoted strings at column 0. ## aiNonDeterministicEval - Extension list adds `.toml` (docstring promised TOML support; map didn't include it). Promptfoo + DeepEval TOML configs now reach the detector. ## aiPromptVersioning - Placeholder-version rejection: `version: "TODO"`, `version: TBD`, `version: ???`, `version: placeholder`, `version: none`, `version: unknown` no longer satisfy the inline-version requirement. New `lineLooksLikePlaceholderVersion` post-check runs after the regex match. New parameterised regression test covers 6 placeholder shapes. ## aiCostRegression / aiRetrievalRegression - Confidence scales by paired-case count via shared `pairedConfidence` helper: 0.5 at paired=1, 0.7 at paired=5, 0.85 at paired=10, plateau at 0.9 from paired>=20. Pre-fix every regression fired at fixed 0.9 confidence regardless of evidence quality. - aiCostRegression severity ladder: catastrophic regressions (delta >= 1.0, i.e. >=2x cost) escalate to SeverityHigh under new clause `sev-high-008`. Merely-above-threshold stays SeverityMedium under `sev-medium-006`. ## Severity rubric - New clause `sev-high-008` ("Catastrophic cost regression") added to rubric.go and regenerated rule docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Engine robustness — RequiresGraph hard-fail, sort tiebreak, panic-recovery parity ## detector_registry.go - RequiresGraph mismatch surfaces a detectorPanic-shaped diagnostic instead of silently dropping the registration. Pre-fix a detector declaring `RequiresGraph: true` whose runtime type didn't satisfy GraphDetector vanished from the pipeline with no signal, no log, no diagnostic — a configuration bug indistinguishable from a detector that simply found nothing. - Phase-1 results slice pre-allocated to len(registrations) to avoid the 0/1/2/4/8/16/32 reallocation chain under the mutex. - Phase-2 graphResults likewise pre-allocated. ## pipeline.go (signals) - RunDetectors (the simple non-registry entry point) now wraps each detector in safeDetect for parity with the registry path. Tests calling RunDetectors directly previously had no panic protection; a single broken detector would unwind the test goroutine. Added a detectorTypeName helper so the synthetic Meta.ID still names something useful in the recovered-from-panic diagnostic. ## models/sort.go - sortSignals adds Symbol as a tiebreaker after Line and switches to sort.SliceStable. Pre-fix two signals on the same (Category, Type, File, Line) but different Symbols sorted non-deterministically, so byte-identical snapshot output under SOURCE_DATE_EPOCH could break for snapshots that happened to have such ties. ## engine/artifacts.go - filepath.Rel error in the depth-limit branch now causes filepath.SkipDir instead of being silently treated as depth=0. A computation failure on a pathological symlink loop no longer slips past the depth gate. ## engine/pipeline.go - loadBaselineSnapshot stream-decodes via json.NewDecoder instead of os.ReadFile + json.Unmarshal. 100MB+ historical snapshots no longer spike RSS by the same amount; empty-file check via fi.Size() avoids the read at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Eval adapter polish — Promptfoo time magnitude, Ragas modern shapes, DeepEval whitespace, envelope SourcePath rel ## Promptfoo - createdAt magnitude check: pre-fix `time.UnixMilli(raw.CreatedAt)` assumed millis, but some Promptfoo v4 CLI paths emit unix-seconds. A 10-digit second-epoch timestamp from 2026 silently decoded as 1970. Now: `< 1e12` treated as seconds, otherwise millis. ## DeepEval - runId fallback: newer DeepEval (1.x) writes `runId` instead of `testRunId`. When TestRunID is empty fall back to RunID. Without this, baseline matching fell into the framework-wide first-match fallback and could cross-attribute runs across multiple suites. - CreatedAt format flexibility: tries RFC3339 (newer), space- separated `2006-01-02 15:04:05` (older), and microsecond-fraction without timezone. Failures stay silent. - Metric-name whitespace normalisation: `Answer Relevancy` (with internal space) now normalises to `answer_relevancy` so it matches the retrievalScoreKeys / hallucinationGroundingKeys whitelists in the consumer detectors. Pre-fix human-readable metric names from the DeepEval CLI quietly mismatched. ## Ragas - Accept `evaluation_results` (modern Ragas ≥0.1.0) and `scores` (DataFrame export) shapes, in addition to legacy `results`. New `rowsForParsing()` helper merges all three. Pre-fix the modern shape was rejected with a misleading "ragas payload has no results" error — Ragas 0.2 users couldn't run the adapter at all. - isRagasQualityKey strips `eval_` prefix and normalises spaces, so `eval_faithfulness` and `eval context_relevance` (ragas-evaluate- helpers shapes) now match the quality-key whitelist. - ragasQualityKeys whitelist adds Ragas 0.2 modern metrics: context_utilization, noise_sensitivity, summarization, factual_correctness. ## Envelope SourcePath - New `relativeArtifactPath(root, p)` helper. EvalRunEnvelope SourcePath is now stored relative to the analysis root rather than as the raw CLI path. Pre-fix `--promptfoo-results /Users/alice/proj/...` produced absolute paths that leaked into SARIF output and downstream signal Location.File fields. New helper attempts filepath.Rel(root, p); falls back to original path on error or upward-traversal. - All three ingest helpers (`ingestPromptfooArtifacts`, `ingestDeepEvalArtifacts`, `ingestRagasArtifacts`) updated to take `root` and apply the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Supply chain — concurrency, timeouts, COSIGN cleanup, post-release smoke, archive files ## CI workflows - All PR-triggered workflows (ci, codeql, terrain-pr, terrain-ai) gain a `concurrency` block that cancels in-progress runs on PR re-pushes. Pre-fix force-pushes piled up overlapping runs that all consumed runner minutes pointlessly. Main-branch CI runs leave cancel-in-progress=false so post-merge runs always complete. - All jobs gain `timeout-minutes` (15-45 depending on workload). Pre- fix a hung windows go-test could hold the matrix for 6h. - CodeQL: dropped `python` from the language matrix. The repo has only a few Python helper scripts (no production Python under analysis); CodeQL Python autobuild was burning ~5 min/week with no useful coverage. ## Release workflow - `concurrency: cancel-in-progress: false` is explicit. Once OIDC certs are issued mid-release, cancelling leaves orphan certs in the public Sigstore log; preventing concurrent cancellation protects the supply-chain story. - All release jobs (verify, go-release-build matrix, go-release- publish, npm-release) gain `timeout-minutes`. - COSIGN_EXPERIMENTAL=1 removed from the two cosign sign-blob steps. cosign 2.x makes keyless the default; the env var emits a deprecation notice when set. - New `release-smoke` job downloads the just-published linux/amd64 archive, extracts it, runs `terrain version --json`, and asserts the reported version matches the tag. Catches the class of "release published but archive contains a stale build" bugs that previously could only be caught after the fact by a user. ## Goreleaser - archives.files now ships LICENSE + README.md inside every archive. Pre-fix archives shipped only the binary; users had no in-tree license text — a soft compliance issue for OSS-policy review. ## Installer (bin/terrain-installer.js) - downloadFile redirect chain capped at MAX_REDIRECTS=5. Pre-fix recursion was unbounded — a misconfigured proxy redirect loop hung the installer until the OS killed it. Error message points users at TERRAIN_INSTALLER_BASE_URL as the documented escape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Documentation gaps — issue templates, CoC, glossary, versioning, compatibility, integration guides ## New top-level files - CODE_OF_CONDUCT.md (Contributor Covenant 2.1, with reporting via private security advisory). Standard for OSS distribution; the agent reviews flagged its absence. ## New issue templates under .github/ISSUE_TEMPLATE/ - bug-report.md — structured reproduction, version output, debug log - false-positive.md — detector + code + why-it's-fine + opt-in to add as a calibration corpus regression fixture - feature-request.md — outcome-first, with optional implementation hint The pre-existing feedback.md stays as the catch-all. ## New user-facing docs - docs/glossary.md — single page for the Terrain-specific vocabulary (snapshot, signal, surface, detector, posture, severity clause, evidence strength, calibration corpus, eval run, baseline, manifest, severity rubric, docs-verify gate, surface ID, capability). Pre-fix the README and per-doc files used these terms liberally with no central definition. - docs/versioning.md — explicit semver policy: what counts as breaking, behaviour change, bug fix; pre-release identifiers; release cadence; compatibility windows table. - docs/compatibility.md — host platform tier table (Linux/macOS amd64+ arm64 + Windows amd64), build-time tools (Go 1.23, Node 22.x), test framework support tiers, AI eval framework version coverage. - docs/integrations/promptfoo.md, deepeval.md, ragas.md — per- framework wiring guides with schema versions accepted, baseline comparison setup, calibration fixture format, troubleshooting table. ## Internal docs disclaimer - docs/internal/README.md added so external readers landing in this directory know the contents are planning artifacts, not product truth, and aren't subject to the docs-verify gate. The agent reviews flagged 30+ internal files mixed into the public docs tree. ## README updates - Documentation section now links Glossary, Versioning Policy, Compatibility, Integrations directory, Code of Conduct. Reorders for first-time-reader flow (concepts → CLI → release info → community). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): relativeArtifactPath normalises separators to forward slash Windows CI failure regression from the prior eval-adapter polish commit (f85f488). `filepath.Rel` returns native separators — backslash on Windows — which produced `eval-runs\\promptfoo.json` SourcePaths that mismatched the forward-slash paths in calibration labels.yaml. Three calibration fixtures (eval-cost-regression, eval-hallucination-rate, eval-retrieval-regression) tripped the load-bearing gate. `filepath.ToSlash(...)` applied to every return path normalises to `/` consistently. Snapshot JSON, calibration labels, and SARIF all expect forward slashes; this aligns the SourcePath stamp with that convention. Verified: TestCalibration_CorpusRunner now passes; full `go test ./...` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(0.2): Public-facing documentation refresh for 0.2.0 tag Updates user-facing docs to reflect everything that actually shipped in 0.2.0 — including the post-initial-publish polish work that landed during the ship-blocker + final-polish reviews. ## CHANGELOG.md - 0.2.0 header gets the release date (2026-05-02). - Per-detector entries replace stale "Known limitation" callouts with the actual shipped behaviour (per-provider scoping for aiNonDeterministicEval, structural key-name + benign-object whitelist for aiToolWithoutSandbox, implicit path-based coverage for aiSafetyEvalMissing + aiFewShotContamination, multi-line concatenation + expanded user-input shapes for aiPromptInjectionRisk, severity-by-category for aiModelDeprecationRisk, paired-count confidence scaling for aiCostRegression + aiRetrievalRegression, placeholder-token rejection for aiPromptVersioning, env-var constructor patterns for aiEmbeddingModelChange). - Calibration corpus section reframed: gate is recall-only, precision floor slipped to 0.3 (corpus v2). Match-key precision improvement (Symbol added) and empty-corpus-bypass closure documented. - New "Polish (release-prep adversarial review fixes)" section bullets the cross-cutting fixes from the two adversarial-review passes: release infra, engine self-diagnostic (detectorPanic in catalog, RequiresGraph hard-fail), eval adapters (Promptfoo errors-bucket + per-case cost + time magnitude, DeepEval runId + metric-name normalisation, Ragas evaluation_results + scores shapes, envelope SourcePath rel), CLI (deprecation hints, --read-only enforcement, version JSON schemaVersion, exit 5 for not-found), determinism (sortSignals Symbol tiebreak), supply chain (concurrency, timeouts, CodeQL Python drop, COSIGN_EXPERIMENTAL cleanup), documentation (CODE_OF_CONDUCT, issue templates, glossary/versioning/compatibility, integration guides). ## docs/release/feature-status.md - 12 AI detector rows updated to describe the actual shipped behaviour, not the pre-fix state. Specifically: per-provider scoping, severity-by-category, structural-with-benign-objects, implicit-coverage, paired-count confidence, placeholder rejection, env-var constructor patterns. - Cosign npm-install row now reflects the mandatory-by-default posture (was "degrades to checksum-only"). Documents the two escapes (TERRAIN_INSTALLER_ALLOW_MISSING_COSIGN=1, TERRAIN_INSTALLER_SKIP_VERIFY=1) and the redirect cap. ## docs/cli-spec.md - New "Surface — canonical 11 + legacy aliases" section at the top documents the 0.2 namespace dispatchers and the TERRAIN_LEGACY_HINT=1 deprecation flow. Detailed per-command entries below stay valid since legacy aliases still work through 0.2.x. - `terrain version` entry mentions the new schemaVersion field in --json output (CI tools can pin the snapshot contract). - `terrain serve --read-only` description updated: was "no-op in 0.1.2", now reflects actual HTTP 405 method enforcement that shipped in 0.2.0. ## docs/telemetry.md - Example version field updated 0.1.0 → 0.2.0. `make docs-verify` passes; full `go test ./...` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(0.2): Repo-wide American English sweep The repo's existing convention is American English (240 AmE markers vs 21 BrE pre-sweep). Recent docs and Go-source comments drifted to British spellings during the polish review cycles; this commit brings everything back to convention. Mechanical replacements applied uniformly across `*.go`, `*.md`, `*.js`, `*.yml`, `*.yaml` (excluding `node_modules/`, `benchmarks/repos/`, `.git/`, `.claude/`, `tests/calibration/`, `vendor/`): - behaviour → behavior (incl. plurals + capitalization) - normalise / normalised / normalising / normalisation → normalize / normalized / normalizing / normalization - categorise / recognise / synthesise / summarise / prioritise / utilise / optimise / favour / finalise → -ize / -or / -ze variants - labelled / labelling / modelled / modelling → labeled / labeling / modeled / modeling - cancelled → canceled - centre → center - analyse → analyze (verb form; the -yze variant is the AmE convention and was already used as such elsewhere) 64 files changed, 132 insertions / 132 deletions. Pure word-level substitutions; no semantics affected. `make docs-gen` regenerated manifest.json + severity-rubric.md + rule doc stubs to track the Go-source comment changes. Verified: `go build ./...` clean, `go test ./...` all pass, `make docs-verify` zero-diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): AI validation gate is now impact-scoped The terrain-ai.yml workflow was reporting whole-repo AI signals as "blocking" for every PR, including doc-only ones. Both PR #128 (code-changing) and PR #129 (docs-only) generated 29 / 68 blocking signals respectively — most referencing files the PR never touched (calibration-corpus fixtures, the detector's own source, etc.). Root cause in `internal/changescope/analyze.go`: the AI signals collection loop iterated EVERY signal of category AI in the snapshot, regardless of whether `sig.Location.File` was in the PR's changed-files set. Compare to the impacted-scenarios loop right above it (line 250), which IS impact-scoped via `result.ImpactedScenarios`. The asymmetry meant the gate was useless: the calibration corpus contains intentionally-bad fixtures designed to trigger the detectors (that's their job — they're regression tests), and those fixtures showed up as "merge blockers" on every PR. Fix: hoist the `changedPaths` set construction above the signals loop and filter by `sig.Location.File`. Signals without a Location.File (whole-repo emergent signals) are also dropped — they belong in `terrain analyze`, not `terrain pr`. The pre-existing `TestBuildAIValidationSummary_WithSignals` test was constructed around the old (broken) behavior with no Location.File set on the fixture signals; rewritten to assert the new contract: - Critical AI signal on a changed file → BlockingSignals - Medium AI signal on a changed file → WarningSignals - High AI signal on an UNCHANGED file → dropped - Quality signal → dropped (category filter, unchanged) New regression test `TestBuildAIValidationSummary_DropsSignalsOnUnchangedFiles` locks in the doc-only-PR case: a PR that only touches docs/* and CHANGELOG.md should produce zero blocking signals from calibration-corpus fixtures elsewhere in the tree. Verified: `go test ./...` green; the new asymmetry between "whole-repo signals via terrain analyze" and "PR-introduced signals via terrain pr" matches the design contract for the two commands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Rewrite AI Validation PR comment for actual humans The PR-comment format for `terrain pr --format markdown` had three problems that made the AI Validation section unhelpful: 1. Detector taxonomy as the headline. Bullets started with bold `**aiPromptInjectionRisk**:` followed by the raw detector explanation. PR authors had to know the rubric to interpret it. 2. No file paths or line numbers. The Recommended Tests section right above already shows file paths (and they're clickable on GitHub); the AI Validation section dropped them. 3. No deduplication. 12 prompt-injection hits across 4 files produced 12 identical-looking bullets. The signal was lost in the noise. Fix: - AISignalSummary gains File/Line/Symbol fields (model.go); populated in analyze.go from sig.Location. - New `internal/changescope/ai_signal_humanize.go` carries: - `humanSummary[type]` — one-sentence plain-language description per detector type (no taxonomy) - `humanAction[type]` — one-sentence concrete next step - `groupSignalsByFileAndType()` — collapses N (file, type) duplicates into one entry whose Lines slice carries every distinct line number - `renderGroupedSignal()` — outputs `**\`path:42, 47, 51\`** — <plain summary>` followed by `→ <action>` - render.go's markdown path now calls the grouped renderer for Blocking + Warning sections. Section header becomes "**N new finding(s) introduced by this PR**" instead of "Blocking signals (N)" so the framing matches what `terrain pr` is actually about (PR-introduced findings, not whole-repo state). - The text renderer (used when --format isn't markdown) gets the same grouped output. Test updates: - TestRenderPRSummaryMarkdown_AISection rewritten for the new contract: locator presence, plain-language summary presence, action arrow, line-grouping for duplicates, symbol-keyed locator for tool findings, NO bare detector taxonomy in the bold headline. Sample new output (12-line repetition collapses to 4 bullets): ### AI Validation Scenarios: 2 of 12 selected **4 new finding(s) introduced by this PR:** - **`evals/promptfoo.yaml:7`** — API key embedded in source or config — should be in env / secret store. → Move the secret to an env var (or your secrets manager). - **`agents/tools.yaml (delete_user)`** — Destructive tool can run without an approval gate, sandbox, or dry-run mode. → Add `requires_approval: true`, route through a sandbox. - **`src/auth/login.ts:42, 47, 51`** — User input flows into a prompt without visible escaping or boundary tokens. → Wrap user input through a sanitizer, or use a prompt template with explicit user-content boundaries. - **`src/chat/handler.ts:18`** — User input flows into a prompt ... [same summary, separate file] Combined with the impact-scope filter from the previous commit, the AI Validation section now does what it should: PR-introduced findings only, grouped, with file paths and plain-language explanations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): Aesthetic polish for terrain pr markdown comment The PR comment now reads as a sequence of discrete cards rather than a wall of bullets. Concrete changes: ## Header - H2 + blockquote subtitle ("> High-severity gaps found in changed code.") instead of `## ... — verdict` + `*italic line*`. Blockquote renders as a callout on GitHub (left rule + soft background) which visually separates the "why" from the "what." ## Metrics table - Compact 2-column layout with bold left-column labels: | Metric | Value | |---|---| | **Changed files** | 7 (5 source · 2 test) | | **Impacted units** | 12 | Empty rows skipped (`Impacted units: 0` doesn't render). Middle-dot separator (·) replaces commas in compound stats so cells read cleanly. ## Sentence-case headings + horizontal rules - "New Risks (directly changed)" → "Coverage gaps in changed code" - "Recommended Tests" → "Recommended tests" - "Impacted capabilities:" + "Scenarios: 2 of 12 selected" inline → blockquote with both: `> **Capabilities:** ... · **Scenarios:** ...` - Every major section is preceded by a `---` horizontal rule, giving the comment visual rhythm. Each section reads as its own card. ## Finding card shape (parallel across all sections) - Coverage gaps and AI signals now share the same bullet shape: - **`path/to/file.ts`** [HIGH] — <plain-language description> → <suggested action> Pre-fix coverage gaps and AI signals were rendered by two different paths producing different shapes (one used severity prefix, one used taxonomy prefix); the inconsistency was visible in the same comment. Now both go through `renderFindingCard` / `renderGroupedSignal` with parallel structure. ## Pluralization - New `pluralize(n, singular, plural)` helper replaces the awkward "finding(s)" / "issue(s)" / "gap(s)" notation in user-visible headers. "1 advisory finding", "2 advisory findings", "1 indirectly impacted protection gap", "3 pre-existing issues." ## Footer - Owners + branding + limitations consolidated into a small-text footer using `<sub>` tags, separated from main content by a final `---`. Pre-fix these were scattered mid-comment. ## Test updates - TestRenderPRSummaryMarkdown_DirectVsIndirectSections updated for new heading + card shape + pluralization. - TestRenderPRSummaryMarkdown_FindingTruncation: italicized "_...and N more_" overflow message. - TestRenderPRSummaryMarkdown_Deterministic + AISection + MixedTraditionalAndAI: matched against new sentence-case headings. Sample new comment shape: ## [WARN] Terrain — Merge with caution > High-severity gaps found in changed code. | Metric | Value | |---|---| | **Changed files** | 7 (5 source · 2 test) | | **Impacted units** | 12 | --- ### Coverage gaps in changed code - **`src/auth/login.ts`** [HIGH] — Exported function authenticate has no observed test coverage. --- ### AI Validation > **Capabilities:** customer-support-bot · **Scenarios:** 2 of 12 > selected **2 new findings introduced by this PR:** - **`src/auth/login.ts:42, 47, 51`** — User input flows into a prompt without visible escaping or boundary tokens. → Wrap user input through a sanitizer. --- <sub>Generated by [Terrain](...) · `terrain pr --json` for machine-readable output</sub> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ement totals (#130) * fix(0.2): Drop spurious 'file:' prefix in `terrain insights` output CLI visual audit (running each canonical command and reading the output) caught one real bug in `terrain insights`: Recommended Actions ------------------------------------------------------------ 7. [coverage] Add tests for 17 uncovered source files files: file:bin/terrain-installer.js, file:extension/vscode/src/extension.ts +2 more The `file:` prefix is the dependency-graph node-ID prefix from `internal/depgraph/build.go` — `SourceCoverage.SourceID` carries it because it's the graph node identifier. The renderer expects bare file paths, but `internal/insights/insights.go:1189` was appending `src.SourceID` to `rec.TargetFiles` when it should have used `src.Path` (which the SourceCoverage struct documents as "File path (without 'file:' prefix)"). The leak was visible to every user running `terrain insights` against a repo with uncovered sources. Fix: switch `src.SourceID` → `src.Path`. Output now reads `bin/terrain-installer.js` instead of `file:bin/terrain-installer.js`. Other commands audited and confirmed clean: - `terrain analyze` — headline + Key Findings + What to do next - `terrain summary` — exec summary with posture + drivers - `terrain doctor` — minimal pass/warn/fail check list - `terrain ai list` — table inventory - `terrain explain selection` — strategy + reason breakdown - `terrain pr` (text format) — clean Known minor cosmetic issue (NOT fixed here, not a tag blocker): `Recommended Actions` items have inconsistent template fields — some have why/impact/effort/files/run, others have only why. Hits when a finding's category-builder doesn't populate the structured fields (e.g., the meta-summary "491 high-severity signals detected"). Worth a follow-up but doesn't affect correctness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): CLI aesthetic polish — proper pluralization + title-case posture labels Two cross-cutting visual issues across the CLI output that the manual audit caught: ## 1. Drop `n thing(s)` notation everywhere Rendered output throughout the CLI used the parenthetical-`s` pattern as a stand-in for proper pluralization: 890 high-fanout fixture(s) — changes trigger wide test impact 3 more finding(s) available — run `terrain insights`... 2 untested source file(s) — start with src/auth.js Investigate 5 test(s) with concentrated instability Reads like a tool's escape hatch, not a sentence. Fix: introduce a small `plural(n, base) → "base" | "bases"` helper in each renderer package (analyze, insights, summary, internal/reporting). 19 sites updated. The reporting helper is exported as `Plural()` for cross-renderer use; the leaf-package helpers stay private to avoid coupling. After: 890 high-fanout fixtures — changes trigger wide test impact 1 more finding available — run `terrain insights`... 2 untested source files — start with src/auth.js Investigate 5 tests with concentrated instability (and singular-form versions render correctly when n == 1) ## 2. Title-case posture labels in `terrain summary` Pre-fix output dumped snake_case dimension keys directly: health: strong coverage_depth: strong coverage_diversity: moderate structural_risk: strong operational_risk: strong The keys are stable storage form (preserved in JSON for API roundtrip); the display form should be human-readable. Fix: `titleCaseDimension()` and `titleCaseBand()` helpers in `internal/reporting/executive_report.go`. Snake-case underscores become spaces; first letter capitalized; bands ("strong", "moderate", etc.) get the same first-letter cap. After: Health: Strong Coverage depth: Strong Coverage diversity: Moderate Structural risk: Strong Operational risk: Strong JSON output is unchanged — the snake_case keys still round-trip; only the human-readable text renderer normalizes them. ## Test fixups - Two `analyze_report_v2_test.go` assertions were checking for the literal `(s)` notation; updated to check for the natural pluralized form ("3 more findings available", "5 findings"). - Insights goldens regenerated via `-update-golden` (new title text). - `cmd/terrain` snapshot golden likewise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): catch the remaining (s)/(ies) leaks I missed Audit pass after the first sweep found: - internal/insights/insights.go: 'AI surface(s)', 'capability(ies)', 'cluster(s)' - internal/reporting/analyze_report_v2.go: 'cluster(s)' (3 sites) - internal/reporting/insights_report_v2.go: 'cluster(s)' Same fix pattern: use plural() / Plural() helper or inline if/else for the irregular 'capability/capabilities' case. Goldens regenerated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): polarity-aware bands + concrete measurements in summary posture line Three layered improvements to `terrain summary` Overall Posture: 1. Polarity-aware band rendering. Previously "Structural risk: Strong" read as "high risk" on natural-English interpretation, inverting the band's meaning for the two risk dimensions. Bands now translate per dimension: Health/Coverage stay Strong/Moderate/Weak; risk dimensions render Low/Moderate/Significant/Elevated/Critical. 2. Sentence-case dimension labels. "Coverage Depth" → "Coverage depth" reads naturally inline; reserve title-case for headings. 3. Concrete measurement totals. The line now shows what drove the band — "28 / 772 skipped" instead of an opaque "3.6% skipped" or the original bare "Strong". Counts come from the measurement's own explanation ("28 of 772 test files contain skipped tests"). Zero measurements are dropped so the line shows what changed, not a wall of "0.0% flaky · 0.0% dead · 0.0% slow". Goldens regenerated for the dimension-name + explanation changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): close pre-existing CLI test gaps + CHANGELOG entry for #130 The 4 CLI help-text tests in internal/testdata/cli_test.go were written for the pre-restructure CLI shape and started failing when the canonical 11-command surface landed: TestCLI_HelpContainsCanonicalWorkflow expected `terrain insights` bare-form; canonical layout uses `terrain report insights`. TestCLI_HelpContainsAllCommands expected `ai list`, `ai run`, etc. as substrings; canonical layout shows `ai <verb>` with verbs listed inline. TestCLI_HelpContainsDebugNamespace expected `debug graph`, etc. as substrings; debug had no inline verb list at all. TestCLI_HelpContainsPrimaryCommands expected a "Primary commands:" header and journey questions for impact / insights / explain; canonical layout uses "Canonical commands" and only `analyze` retains a journey question (the other three moved under report). Test expectations updated to match the canonical layout. Also added the missing debug verb list to the top-level help for consistency with report / migrate / ai / config (where each namespace shows its verbs inline). Separately, TestCLI_ExportBenchmarkAcceptsJSONFlag was failing because `export benchmark` didn't accept `--json`. The command always emits JSON, so `--json` is now accepted as a no-op for flag parity. CHANGELOG.md gains a "CLI visual polish (PR #130)" line under 0.2.0's Polish section so the tag captures the full scope of this branch's work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A senior-engineer launch-readiness review (~65 items, 2026-05-02) flagged that the codebase is more honest than its public surface — the docs, README, CHANGELOG, and PR-comment workflow all overstated what 0.2 actually delivers. This PR closes the user-visible contradictions before the v0.2.0 tag is cut so an evaluator who reads README + CHANGELOG + quickstart sees the same product the binary delivers. What this PR is NOT: not the suppression model, not the command registry, not the SARIF audit. Those are 0.2.1 / 0.2.2 work tracked in `/Users/pzachary/.claude/plans/kind-mapping-turing.md`. This PR is the truth-up only. CHANGELOG truth-up: * Headline calibration claim: "100% precision/recall" → "100% recall on a 27-fixture corpus; precision floors against a labeled-repo corpus deferred to 0.3." Closes the most visible internal contradiction (the same file disclaims it later). * Per-detector calibration paragraph: same nuance applied. * Cosign installer block: rewrote to match actual code behavior. The previous text said the installer "degrades to checksum-only" when cosign isn't on PATH; bin/terrain-installer.js:153–177 actually hard-fails by default and accepts two opt-out env vars (TERRAIN_INSTALLER_ALLOW_MISSING_COSIGN=1, TERRAIN_INSTALLER_SKIP_VERIFY=1). The bin/postinstall.js wrapper swallows the failure and prints a warning; that UX gap is now explicitly tracked for 0.2.1. * New "What's stable in 0.2" section: stable / experimental / planned, sourced from feature-status.md so users see the honest scoping next to the headline. "AI Validation" → "AI Risk Review" rename: * .github/workflows/terrain-ai.yml workflow name + PR-comment template + decision strings + comment-marker HTML id * docs/cli-spec.md gate description (with detector-shape caveats) * internal/changescope/render.go markdown + plain-text section headers, plus tests that asserted the old strings * model.go / analyze.go / airun/artifact.go / policy/config.go code comments * CHANGELOG references and docs/product/terrain-overview.md, docs/integrations/gauntlet.md, docs/architecture/19-ai-scenario-and-eval-model.md * Internal scorecards under docs/internal/product/ left as-is (historical artifacts, not user-visible) * Avoid "AI validation" / "AI safety" / "AI security" until 0.3 ships precision metrics + AST taint flow + sandboxing. README + quickstart language tightening: * Headline: "30 seconds" promise now qualified inline rather than only at line 61. * "Reads your repository — test code, source structure, coverage data, runtime artifacts, ownership files, and local policy" now says "when available" / "best effort" so coverage/runtime artifact dependence is honest. * "No SaaS / no telemetry" claim now distinguishes runtime ("does not phone home") from installation ("downloads signed binaries from GitHub Releases"). * Quickstart line 150 ("Terrain gives AI surfaces the same CI treatment as regular tests") rewritten to honest scope. * New "What 0.2 Is and Isn't" section in README between "What Terrain Is Not" and "Who Uses Terrain". Stable / experimental / planned summary + cross-link to feature-status. * Supporting commands table: portfolio gets an explicit "(canonical, experimental)" qualifier so it isn't both "supporting" and "in the canonical 11" without flagging the multi-repo work as experimental. Doc drift fixes: * docs/telemetry.md: bare `terrain telemetry` → canonical `terrain config telemetry`; legacy alias documented. * docs/compatibility.md Tier-1 description: "full CI matrix" → "binary target + extended gates on Linux only, unit-test parity on macOS/Windows" per actual ci.yml. * docs/compatibility.md: dropped the conversion-direction A-grade rating reference (deferred to 0.3 per CHANGELOG). * docs/user-guides/getting-started.md: new Prerequisites section documenting cosign install (brew/apt/scoop) and the two opt-out env vars. Server stale comments + scope honesty: * cmd/terrain/cmd_serve.go: 0.1.2 reference removed; flag docstring explicit about no-auth + localhost-only + not-production-ready. * internal/server/server.go: security posture comment block rewritten for 0.2.0; documents the request-context-not-wired and mutex-blocking-analysis bugs as known issues tracked for 0.2.1 rather than silently shipping. British spelling sweep: * sanitiser → sanitizer, sanitised → sanitized, sanitisation → sanitization across internal/aidetect, internal/signals, docs/rules/ai. Manifest regenerated via cmd/terrain-docs-gen. * docs/personas/frontend.md: behavioural → behavioral. Verification: * `go build ./...` clean * `go test ./...` and `go test ./internal/testdata/` green * `make docs-verify` green * grep for "AI [Vv]alidation\b", "sanitiser|sanitised|behaviour", and "0.1.2" all return zero hits in user-facing paths Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): parity rubric + baseline scores (Track 0.1) Encodes the 12-area × 17-axis maturity rubric as data so the parity dashboard (Track 0.2) can consume it. Two files: docs/release/parity/rubric.yaml — structural source of truth - 3 pillars (understand/align/gate) + cross-cutting distribution - 12 functional areas with id / name / pillar / tier / surface - 17 axes (7 product / 7 engineering / 3 UI-visual) with anchored level definitions for 1, 3, 5 - Per-pillar floor requirements: gate ≥ 4, understand ≥ 3, align ≥ 3 (soft) - 7 cross-cutting uniformity gates (detector_shape, framework_depth, command_shape, doc_scaffold, renderer_tokens, empty_state_coverage, voice_and_tone) — these catch unevenness across detectors, frameworks, commands, and outputs docs/release/parity/scores.yaml — current per-cell scores - 12 × 17 = 204 cells, each scored 1–5 with one-line evidence - Snapshot taken against `main` at 8545f03 (post PR #131) - Reflects the state captured in `docs/release/0.2.x-maturity-audit.md` The baseline scores are honest about the floor: every pillar currently has at least one cell at score 2, which is exactly what the launch-readiness review flagged as "uneven product." The 0.2.0 work (Tracks 1–10) lifts those cells to clear the gate — Gate to ≥ 4, Understand to ≥ 3, Align to ≥ 3 soft. The V-axes (V1 visual consistency / V2 information rhythm / V3 fun-to-use polish) are baselined as the lowest-scoring axis cluster across the codebase, which matches reality: most user-visible output uses ad-hoc styling; Track 10 (Visual & design system) lifts these. Track 0.2 (parity-gate Go binary) reads both files and emits the matrix + floor map. Track 0.4 wires `make pillar-parity` into CI as a hard gate that rejects PRs that drop a cell below its pillar floor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): parity-gate binary + make target + CONTRIBUTING (Tracks 0.2/0.3/0.6) Three Track 0 deliverables in one PR: Track 0.2 — `cmd/terrain-parity-gate/main.go` Reads rubric.yaml + scores.yaml (defaults to docs/release/parity/), validates structure (every cell scored, scores in [1,5], pillar references valid), computes per-area floors + per-pillar verdicts, emits human-readable matrix or JSON, exits non-zero when any hard-gate pillar is below its floor. Soft gates (Align in 0.2.0) print WARN but do not fail. Three output modes: default matrix, --json, --floor-map (compact). Exit codes: 0 PASS, 1 hard-gate FAIL, 2 usage error. 12 unit tests cover validation rejections, floor computation, soft-gate WARN semantics, mixed hard/soft pillar behavior, and a real-rubric load test that catches drift between YAML and Go types. Track 0.3 — `make pillar-parity` (+ `pillar-parity-floor`, `pillar-parity-json` variants) wired through Makefile. Same posture as `make docs-verify` — anyone can run it locally before opening a PR. Track 0.6 — CONTRIBUTING.md "Parity gate" section. Documents: - per-pillar floors and which are hard / soft - the source-of-truth split (rubric.yaml = structure, scores.yaml = per-cell numbers, audit doc = prose companion) - how a parity-lift PR updates a cell (one-line evidence + score change + audit doc narrative if relevant) - the seven uniformity gates (advisory in 0.2.0, hard in 0.2.x) Current baseline output: Pillar verdict understand floor=2 / required=3 FAIL weakest=core_analyze/V1 align floor=2 / required=3 WARN (soft) gate floor=2 / required=4 FAIL weakest=pr_change_scoped/E2 Overall: FAIL This is the honest starting point. Tracks 1-10 lift cells until every hard-gate pillar passes; that's the 0.2.0 release gate. Track 0.4 (CI hard gate) intentionally not included in this PR. Wiring parity-gate into CI today would fail every PR until enough cells are lifted; the gate flips to mandatory after the parity-lifting tracks land enough to make it useful. For now, anyone can run `make pillar-parity` locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

) Foundational deliverable for Track 10 (Visual & design system). Every user-visible renderer in 0.2.0 — terminal output, HTML report, PR-comment markdown, SARIF tags — consumes from this package. Ad-hoc styling outside `internal/uitokens/` becomes a parity-gate violation on the V1 (visual consistency) axis; Track 10.2 migrates existing renderers to the tokens. What's in the package: Color tokens Six semantic colors (muted / accent / ok / warn / alert / bold). Names describe ROLES not specific shades, so the underlying ANSI codes can change without rewriting callers. Wrappers (Muted, Ok, Warn, Alert, Bold) emit only when ColorEnabled is true; empty- input wraps are no-ops to avoid stray escape sequences. TTY / NO_COLOR detection ColorEnabled initialized once via stdout TTY check + NO_COLOR env var (https://no-color.org/) + TERM=dumb fallback. Pipes / file redirects automatically suppress color. Tests can flip the flag to assert plain-text output. Symbol vocabulary SymOK / SymFail / SymWarn / SymInfo / SymArrow / SymBullet / SymDash / SymDot / SymRule / SymSubrule. One vocabulary used across every command — the V2 (information rhythm) axis depends on this consistency. Severity model + badges SeverityCritical / High / Medium / Low / Info ladder with SeverityBadge() rendering. CRITICAL and HIGH bold so blocking findings stand out at a glance. Verdict badges VerdictBadge("PASS"|"WARN"|"FAIL") returns the canonical glyph + label combo used by the parity-gate matrix, AI risk review hero block, and policy check. Spacing & rules SectionWidth = 60 — every renderer uses this width for section rules so headings line up across commands. Heading() / Subheading() return ready-to-print two-line blocks. ASCII bar renderer BarChar / BarEmpty constants; Bar() with auto-coloring by proportion (≥80% alert, ≥40% warn, < muted); BarPlain() for callers that want inverse-polarity coloring (e.g. coverage, where high is good). Text helpers Truncate / PadRight / PadLeft — rune-aware, unicode-safe. Used by every table layout in the codebase once Track 10.2 migrates. Tests (15 total, ~280 lines): - Color wrappers respect ColorEnabled in both directions - Color wrappers no-op on empty strings (avoid escape-sequence noise around "") - Severity badges produce the right labels - Severity badges bold ≥ HIGH - Verdict badges canonicalize case + whitespace - Bar rendering covers full / empty / half / overflow / negative / zero-max / zero-width - Bar coloring threshold transitions at 0.4 and 0.8 - Truncate / PadRight / PadLeft handle unicode correctly - Rule and SubRule render at SectionWidth - Heading is two lines (title + rule) - Color composition (Bold(Alert(...))) preserves both escapes Zero dependencies. Stateless. Tested at 100% coverage of public API. Track 10.2 (renderer audit + migration) is the follow-on PR that moves existing internal/reporting/* code paths to consume from here. A vet rule that flags raw ANSI codes outside this package is also on the Track 10.2 list. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…G anchored to pitch (Track 1) (#137) * docs(0.2): commit product vision (Track 1.1) Durable north-star doc capturing the unified Terrain pitch we converged on through the launch-readiness review threads. The headline verbatim: Terrain is the control plane for your test system. It maps how your unit, integration, e2e, and AI tests actually relate to your code — and lets you gate changes based on that system as a whole. See what's covered, what's missing, and what's overlapping. See which tests matter for a PR — and why. Bring AI evals into the same review pipeline as the rest of your tests. What's in the doc: * The user's actual job (six questions; today no single tool answers more than two) * "What Terrain is" — the control-plane framing in two phrases * The three pillars (Understand / Align / Gate) with internal job + external framing + anchored capabilities * The unifying thread (CI gate primitives shared across pillars) * What's distinctive vs. Jest / pytest / SonarQube / Promptfoo / Bazel / GitHub code scanning / AI safety tools * What Terrain explicitly isn't * Anti-goals for 0.2.x (no safe-skip guarantee; we don't run tests; we don't judge model truthfulness; no public precision floor yet) * Trajectory: 0.2.0 = "see clearly + gate progressively"; 0.3 = "take control"; 0.4 = "test the universe" (AI-aware integration/e2e under the control plane) * Capability → pillar → tier map covering every shipping command * Primary workflow (terrain analyze && terrain report pr) as the canonical entry point * Doc-evolution rules (what stays stable, what updates per release) This is the source of truth that the 0.2.0 README rewrite (Track 1.2), quickstart anchor (Track 1.4), and feature-status pillar columns (Track 1.5) point at. When the README and this doc disagree, this doc wins until the README is updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(0.2): narrative + tiering anchored to product vision (Tracks 1.2/1.3/1.4/1.5/1.8) Bundles the doc-only Track 1 work into one cohesive PR following the "control plane for your test system" pitch as the verbatim anchor. Track 1.6 (per-pillar reproducible examples) and Track 1.7 (per- detector "known false positives" sections) are larger and follow as separate PRs. Track 1.1 — `docs/product/vision.md` (committed earlier in this branch) Durable north-star doc with the verbatim pitch, three pillars, capability map, anti-goals, and 0.2.0 → 0.3 → 0.4 trajectory. Track 1.2 + 1.3 — `README.md` headline rewrite + capability list Lead with the pitch verbatim. "Map your test terrain" demoted to the secondary tagline. Primary workflow (terrain analyze && terrain report pr) shown above the install snippet so the entry point is unmistakable. "What 0.2 Is and Isn't" rewritten around pillars + tiers (Understand / Align / Gate × Tier 1 / 2 / 3) instead of a flat stable/experimental list. Track 1.4 — `docs/quickstart.md` anchored to first-user gate insights Walkthrough restructured around the three insight types from the first-user success gate: 1. Coverage gap with explanation (`terrain analyze`) 2. PR risk explanation (`terrain report pr --base main`) 3. Test-selection explanation (`terrain report impact --explain-selection`) Each step is ~90 seconds; total walkthrough ≤ 5 min. The safe-skip caveat is called out explicitly per the 0.2.x anti-goal. Track 1.5 — `docs/release/feature-status.md` Pillar + Tier columns Every shipping capability now has Pillar (Understand / Align / Gate / cross-cutting) and Tier (1 / 2 / 3) tags. Tier-1 means publicly claimable in 0.2.0 marketing; Tier-2 ships but stays experimental; Tier-3 is in development with no public claim. Workflows table extended to include `terrain explain finding <id>` and `terrain suppress <id>` (Tracks 4.6 / 4.7). Track 1.8 — `CHANGELOG.md` 0.2.0 entry restructure Headline replaced with the pitch verbatim as the section opener. New paragraph explains parity-gate framing (Gate ≥ 4, Understand ≥ 3, Align ≥ 3 soft) and points at the audit doc as source of truth. The release groups deliverables by pillar instead of by feature category. Plan link: `/Users/pzachary/.claude/plans/kind-mapping-turing.md` (Tracks 1.1, 1.2, 1.3, 1.4, 1.5, 1.8). Verification: `make docs-verify` green; `go test ./...` green; manual read of README + quickstart confirms continuity with the pitch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the silent-install path flagged in the launch-readiness review. Pre-fix: $ npm install -g mapterrain > postinstall fails (cosign missing) > [warn] one-line stderr message > exit 0 $ terrain analyze > silently retries the same fetch > fails with the same error > user is confused about why "install" reported success Two acceptable resolutions were on the table; the plan chose Option B (lazy-with-loud-signal) over Option A (hard-fail npm install). Hard-failing every cosign-missing host would be more disruptive than the failure mode itself; CI pipelines that wrap `npm install` would break for legitimate reasons. The right user experience is: install "succeeds" with a loud, framed warning, and the *first* `terrain` invocation refuses to retry silently — printing the original error verbatim with concrete remediation. Implementation: * `bin/postinstall.js` writes ~/.terrain/install-failure.log with the captured error (timestamp, platform, version, message, stack) when ensureTerrainBinary throws. The stderr warning is upgraded from one line to a multi-line framed banner so it's hard to miss in a CI log. * `bin/terrain-installer.js` exports `writeInstallFailureMarker` and `clearInstallFailureMarker`. `runTerrainCli` (the trampoline for `terrain ...`) consults the marker before retrying the fetch; if a marker exists AND no installed binary is present, it throws the recorded error with a structured remediation block instead of attempting another silent retry. * On a successful install or successful first run, the marker is cleared so future invocations don't see stale state. Tests added (`scripts/test-installer-marker.mjs`, run via `node --test`): * writeInstallFailureMarker captures the error fields * clearInstallFailureMarker removes the marker * clearInstallFailureMarker is idempotent Wired into `npm test` as a `test:unit` step so it runs ahead of `scripts/verify-pack.js` in the existing release-verify chain. CHANGELOG note in PR #131 already calls this out as 0.2.1 work; a follow-up PR can promote the entry from "known issue" to "fixed". Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2.1): add --fail-on and --timeout flags to terrain analyze CI integration was the launch-readiness review's top adoption gap for the platform/SRE audience. Two flags close most of it: --fail-on critical|high|medium Exit with the new severity-gate code (6) when at least one finding at or above the requested severity is present. Standard CI integration pattern. The report still renders first; the gate decision is the last thing that happens. --timeout <duration> Abort the analysis after the duration elapses (e.g. 5m, 30s). Wraps the existing SIGINT-aware context with a deadline so a runaway monorepo scan doesn't block CI indefinitely. Exit-code conventions extended: 6 — Severity gate block. Returned by analyze --fail-on. Same pattern as the existing AI-gate code 4. CI scripts can branch on "the analysis succeeded but the gate blocked us" without parsing stderr text. Code 3 stays reserved for the planned policy-vs-usage split. Implementation: * cmd/terrain/cmd_severity_gate.go — parseSeverityGate + severityGateBlocked + errSeverityGateBlocked sentinel. Pure function over analyze.SignalBreakdown; no engine coupling. * cmd/terrain/cmd_pipeline_helpers.go — new runPipelineWithSignalsAndTimeout(...). The original runPipelineWithSignals is preserved as a 0-timeout shim so the other six callsites (impact / explain / pr / ai *) keep their contract; they can adopt the timeout flag in follow-up PRs without churning this one. * cmd/terrain/main.go — flag wiring; errors.Is(err, errSeverityGateBlocked) routes the gate exit code without confusing it for an analysis crash. * cmd/terrain/cmd_analyze.go — gate evaluated last so the report is always rendered before the exit decision. Tests: * TestParseSeverityGate covers canonical / case-insensitive / whitespace / invalid inputs * TestSeverityGateBlocked covers the threshold cascade (critical-only, critical+high, critical+high+medium) and confirms gateNone never blocks Manual smoke (against the Terrain repo itself): $ terrain analyze --fail-on critical → exit 0 (no critical) $ terrain analyze --fail-on medium → exit 6 (matched "0 critical + 493 high + 169 medium finding(s)") $ terrain analyze --fail-on bogus → exit 2, usage error The remaining gate flag from the plan (--new-findings-only --baseline) is a 0.2.x follow-up; it needs the suppressions/finding-IDs work to ship clean against renamed files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): close PR #134 review gaps + uniform gate across output formats The launch-readiness review flagged six concrete gaps in PR #134 that block defensible merge. This commit closes all six and fixes one real bug uncovered while writing the e2e tests. Bug fix: the JSON output branch (and SARIF / annotation / HTML branches) early-returned from runAnalyze before reaching the --fail-on gate check. So `terrain analyze --json --fail-on=medium` silently exited 0 even with matching findings — the gate worked in text mode only. The TestRunAnalyze_JSONStdoutPurity test caught this: it expected a gate-blocked error but got nil. The fix factors the gate decision into a closure (gateErr) computed before any rendering branch, then each branch returns gateErr() after its renderer completes. Text, JSON, SARIF, annotation, and HTML output all gate uniformly now. Review gaps closed: Pluralization cmd_severity_gate.go: replaced "finding(s)" with proper plural via plural() helper, mirroring the 0.2 polish work in internal/reporting/plural.go. Negative-timeout validation cmd/terrain/main.go: --timeout < 0 now exits with usage error (code 2) and a clear message, rather than the silent immediate-DeadlineExceeded that read like an analysis crash. Stale "0.1.2 contract" comments cmd/terrain/main.go: exit-code conventions block rewritten as pre-0.1.2 / additive 4+ semantics; "0.2 will move policy violations" deferred phrasing replaced with concrete 0.2.x → 0.3 milestone reference. E2E test for exit code 6 + report-renders-before-exit invariant TestRunAnalyze_GateBlocksOnFixture: runs runAnalyze against the calibration corpus with --fail-on=medium, asserts: 1. Returns errSeverityGateBlocked (so main.go maps to exit 6) 2. Error message contains the --fail-on label + counts 3. stdout is non-empty (report rendered before gate fired) 4. stdout contains the report header JSON stdout purity test TestRunAnalyze_JSONStdoutPurity: with --json + --fail-on matching, the entire stdout body parses as JSON. Verifies the gate message goes to the error channel (stderr via main.go), not into the JSON document. This is the test that uncovered the early-return bug above. Gate-passes inverse test TestRunAnalyze_GatePassesWhenSeverityAbsent: --fail-on=critical against a fixture whose worst severity is below critical returns nil. Locks in the false-positive-prevention property. All three new e2e tests run via captureRun (existing helper at cli_smoke_test.go:154) — function-level invocation, not exec subprocess, so they're fast and don't shell out. Verification: go build ./... clean go test ./... green go test ./internal/testdata/ green make docs-verify green Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Foundational deliverable for the Gate-pillar parity lift. Stable IDs unblock three downstream features: suppressions (`.terrain/suppressions.yaml` honoring per-finding entries), `terrain explain finding <id>` round-trip, and `--new-findings-only --baseline <path>` baseline-aware gating. Format: {detector}@{normalized_path}:{anchor}#{hash} Example: weakAssertion@internal/auth/login_test.go:TestLogin#a1b2c3d4 Where: detector = signal type (e.g. "weakAssertion") normalized_path = forward-slash, repo-relative anchor = symbol when present, "L<line>" otherwise, "_" when neither hash = 8 hex chars derived from canonical form Stability guarantees (documented in the FindingID field doc-comment and the package-level comment in `internal/identity/finding_id.go`): * Same (Type, Location.File, Location.Symbol, Location.Line) → same FindingID across runs. Used for suppression matching and baseline-aware gating. * Symbol takes precedence as the anchor when present, so line drift WITHIN a symbol does not change the ID. This is the common case (whitespace edits, import reordering). * File rename or symbol rename produces a new ID. The underlying finding has moved; old suppressions don't apply. * Line drift WITHOUT a symbol changes the ID — known limitation documented; AST-anchored 0.3 work removes it. What landed: internal/identity/finding_id.go (new, ~145 lines) BuildFindingID / ParseFindingID / MatchFindingID + helpers. Reuses GenerateID + NormalizePath from the existing identity package. internal/identity/finding_id_test.go (new, ~200 lines) 16 table-driven tests covering: stability, shape, distinct on rename/file-move/detector-change, path normalization (back- vs forward-slash), line anchor when no symbol, placeholder when nothing, symbol-precedence-over-line invariant, line drift changes ID without symbol, parser round-trip + malformed rejection (8 cases), anchor-with-colons round-trip, MatchFindingID semantics. internal/models/signal.go New `FindingID string `json:"findingId,omitempty"` field on models.Signal with the stability documentation inline. omitempty so older snapshots remain valid; pre-existing JSON consumers that don't read the field are unaffected. internal/engine/finding_ids.go (new) assignFindingIDs(snapshot) — walks top-level Signals + per-file Signals, populates FindingID for any signal that doesn't already have one. Pre-set IDs are preserved (lets specialized detectors like detectorPanic emit their own anchor). Idempotent and nil-safe. internal/engine/pipeline.go Calls assignFindingIDs(snapshot) right after SortSnapshot in Step 10 of RunPipelineContext. Sort runs first so IDs land in canonical order; this preserves byte-identical SOURCE_DATE_EPOCH output. internal/engine/finding_ids_test.go (new) 4 tests: top-level + per-file population, pre-set ID preserved, idempotency, nil-safe. Why the model package isn't importing identity directly: keeps `internal/models/` dependency-free (zero internal imports today). Engine orchestrates detectors and is the natural place for ID assignment; the dependency arrow runs engine → identity, which matches the layering elsewhere. Verification: go test ./internal/identity/ ./internal/engine/ — green go test ./... — green go test ./internal/testdata/ — green (goldens unchanged because FindingID is omitempty and the existing goldens don't assert its presence; 0.2.1 work updates goldens to include IDs) Next: Track 4.5 — `.terrain/suppressions.yaml` minimal viable shape; suppressions match against FindingID Track 4.6 — `terrain explain finding <id>` lookup Track 4.7 — `terrain suppress <id>` writer Track 4.8 — `--new-findings-only --baseline <path>` Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…7.6/7.7) (#141) Three small parity-axis lifts bundled into one PR. Each addresses a named cell from the audit doc. Track 5.2 — aiHallucinationRate framing The detector's name implies Terrain judges hallucinations directly. It does not — it reads hallucination-shaped failure metadata that the eval framework (Promptfoo / DeepEval / Ragas) has already computed and returns the rate. The launch-readiness review flagged this as overpromising. For 0.2.0: keep the type name `aiHallucinationRate` for back-compat (renaming the type touches every adapter, every test, every fixture) but tighten the manifest description + remediation so the trust framing is correct: Title: "Eval-Flagged Hallucination Share" Description: explicitly notes that Terrain reads framework metadata, does not judge hallucinations directly Remediation: tells the user to fix the eval scenario or raise the threshold with documented justification when they disagree with the framework's classification The actual type rename to `aiEvalFlaggedHallucinationShare` is 0.3 work (deprecation alias + back-compat consumer migration). Lifts area 5 (AI risk + inventory) E4 (Stability) — the misleading name was the load-bearing concern there. Track 7.6 — terrain init policy template Existing `generatePolicyYAML` already emits a commented template. Tightened to: - Reference the new docs/policy/examples/ files (Track 7.7) so users have a clear "copy this file" path instead of uncommenting the boilerplate one rule at a time. - Add inline comments per rule explaining what each does (was bare key/value pairs before). - Cleaner section split between "Core test-system rules" and "AI governance rules". Lifts area 10 (Policy / governance) P4 (Onboarding) from 2 to 3. Track 7.7 — three example policies New files under docs/policy/examples/: minimal.yaml — safe defaults for first-time adoption; every rule warn-only, nothing blocks the build balanced.yaml — recommended starting point for most teams; blocks on critical AI regressions + safety gaps + skipped tests; warns elsewhere strict.yaml — mature-repo enforced-quality branch; blocks on every high+ finding, zero accuracy-regression tolerance Plus README.md that documents the adoption ramp (minimal → balanced → strict), pairs each policy with the right CLI invocation, and cross-links to vision.md / CONTRIBUTING.md / feature-status.md. Lifts area 10 (Policy / governance) P6 (Examples) from 2 to 3. Pillar parity impact: this PR is a slice of the Understand-pillar (via 5.2) + Gate-pillar (via 7.6 / 7.7) lift work toward the 0.2.0 release gate. Verification: go test ./... — full suite green make docs-verify — manifest + rule docs regenerated and in sync Manual: read each new policy file end-to-end; the adoption ramp reads as one coherent story Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…it pointer (Tracks 8.4/8.5/8.6) (#142) Three small deliverables that close the install → CI gate journey the launch-readiness review flagged as missing. Track 8.5 — `docs/examples/gate/github-action.yml` The ONE recommended GitHub Action config for 0.2.0. Drops into `.github/workflows/terrain-pr.yml` and gives adopters: - per-PR `terrain analyze --write-snapshot --json` - per-PR `terrain report pr --base ... --new-findings-only --baseline ...` posting a unified comment via the `body-includes` marker so successive runs update the same thread - SARIF upload to GitHub code scanning (Security tab) - **safe-default mode**: warn-only by default; --fail-on critical is one uncomment away - --new-findings-only --baseline baked in by default so adopters with existing debt don't brick CI on day one Concurrency group + cancel-in-progress so a force-push doesn't pile up runs. Permissions list documents what each step needs. Track 8.6 — `docs/product/trust-ladder.md` The four-rung adoption path: Inventory → Warnings → CI annotations → Blocking gates. Each rung says what you do, what you get, what it doesn't do, and when to move up. The fundamental pattern this addresses: teams that jump from Rung 1 to Rung 4 in one step have CI bricking on day one against inherited debt. The ladder makes "see signals first, gate later" the recommended path, with the recommended config matching it. Cross-links: vision.md, feature-status.md, policy/examples/, github-action.yml. Closes the loop so an adopter who lands on any one of those docs can navigate to the rest. Track 8.4 — `terrain init` CI pointer Existing `terrain init` walks through "next steps" (run analyze, generate coverage, generate runtime artifacts, edit policy). Added: - Step (n+1) "Wire Terrain into CI (warn-only by default):" with copy-this-file pointer to the github-action.yml template and a pointer to the trust ladder for which mode to run when. - Policy step now references the three starter policies (minimal/balanced/strict) instead of the implicit "uncomment stuff" workflow. The flow from `terrain init` to a working CI gate is now four bullet points instead of five separate doc trails. Pillar parity impact: lifts area 12 (Distribution / install) P4 (Onboarding) from 2 → 3 (with the suspended Node 22 prominence work being the remaining gap). All three deliverables also lift "Examples" axes across multiple areas via the cross-cutting reach of the recommended config + trust ladder. Verification: go test ./... — full suite green go test ./internal/engine/ -run TestRunInit — all 9 init tests green Manual: read trust-ladder.md end-to-end; cross-references resolve Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…R comment (#145) The three Track 3 deliverables that close the pitch's "maps how your unit, integration, e2e, and AI tests actually relate to your code" promise. Together they make the recommended workflow (`terrain analyze && terrain report pr`) defensible against the launch-readiness review's "this is uneven across test types" critique. Track 3.3 — integration-test classification rigor Adds explicit content-based detection of HTTP-testing libraries (supertest, httptest, MockMvc, supertest, requests, etc.) so integration tests living alongside unit tests in flat directories are correctly classified. New `internal/testtype/integration_imports.go` with a curated allowlist of 30+ patterns spanning JS/TS, Go, Python, Java, Ruby ecosystems. New `refineIntegrationClassification` step in the analyzer reads each test file once via the existing FileCache and merges the content-based signal with the path/suite/framework-based signal, with explicit conflict-handling that lets explicit imports override directory naming. 14 new tests covering positive matches, prose-mention false-positive guards, multi-library co-detection, and the merge logic. Track 3.4 — e2e-to-code attribution honest carve-out New `docs/product/e2e-attribution.md` documents the exact limit of e2e attribution in 0.2: structural-only, via path co-location + framework-config declarations + shared-fixture transitive links + convention fallback. Explicitly carves out runtime trace ingestion, URL-to-route mapping, DOM-selector-to-component mapping, and cross-language attribution as 0.3+ work. The honest carve-out is the contract — the recommended-tests stanza tags e2e selections as `[structural-only]` so adopters know to inspect rather than trust blindly. Track 3.5 — unified PR-comment rendering audit New `internal/changescope/unified_render_test.go` enforces the visual contract for `terrain report pr --format markdown`: bracketed `[LABEL]` badges across stanzas, `**\`path\`**` locator format, em-dash separator, and a single recommended-tests stanza for unit + integration + e2e (no per-type splitting). Plus a consistent-section-order test. Companion doc `docs/product/unified-pr-comment.md` documents the contract, the section-level severity-grouping exception for the AI stanza (and why), and the alternatives that were considered and rejected. Verification: all internal + cmd tests pass; `make docs-verify` green; new tests run in ~0.5s combined. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two Track 5 deliverables for the parity-gated 0.2.0 plan. Together they raise the trust posture of the AI Risk Review surface — both in how it's classified (Track 5.1) and how it behaves under cancellation (Track 5.3). Track 5.1 — AI risk subdivision (Inventory / Hygiene / Regression) Adds `internal/signals/ai_subdomain.go` with three trust tiers and a classification for every CategoryAI signal in the manifest: - Inventory (Tier 1, publicly claimable): direct facts derived from declared AI surfaces — uncoveredAISurface, aiPromptVersioning, aiSafetyEvalMissing, capabilityValidationGap, phantomEvalScenario, aiPolicyViolation, untestedPromptFlow. - Hygiene (Tier 2, visible but not gating-critical): heuristic structural patterns — aiPromptInjectionRisk, aiHardcodedAPIKey, aiToolWithoutSandbox, aiModelDeprecationRisk, aiFewShotContamination, contextOverflowRisk. - Regression (Tier 2, eval-data-dependent): fires only when eval-framework artifacts present — every cost / latency / hallucination / retrieval / tool-routing / RAG-grounding signal across the airun catalog. Public helpers `AISubdomainOf`, `AISubdomainLabel`, and `AISubdomainTrustBadge` give renderers a single source of truth for tier vocabulary so PR comment, terminal report, and JSON all speak the same language. Drift gate test `TestAISubdomain_AllAISignalsClassified` fails CI if a new AI signal is added without a tier — closes the "silent dump into legacy umbrella stanza" failure mode. Companion doc `docs/product/ai-risk-tiers.md` documents the three tiers, the public-claim posture per tier, the gating contract (Tier 1 may be critical; Tier 2 caps at high), and the add-a-signal recipe. Track 5.3 — ctx audit on AI detector file walk Adds `aidetect.DetectContext(ctx, root)` that respects ctx in the source-walk inner loop — checks `ctx.Err()` every 64 entries, aborts cleanly when cancelled. The pre-Track-5.3 shape silently ignored ctx, so a `terrain analyze --timeout 5s` run against a large repo with AI patterns would still wait for the AI walk to finish after ctx had been cancelled. Pipeline call site (`internal/engine/pipeline.go:413`) now uses DetectContext so cancellation propagates end-to-end. Existing `aidetect.Detect(root)` is preserved as a backwards-compatible wrapper that delegates to DetectContext(context.Background()). New `cancellation_test.go` proves the contract: - already-cancelled ctx returns within 250ms on a 200-file fixture (vs ~50ms+ without short-circuit) - mid-walk cancel after 20ms aborts within 1s on a 1000-file fixture (vs ~200ms+ without honoring) - Detect / DetectContext produce identical results on the same fixture (backwards-compat invariant) Verification: 48 internal packages pass; cmd tests pass; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…147) Three Track 7 deliverables that lift the Gate-pillar AI side from "works on the happy path" to "works across the framework × version matrix we claim to support, with honest drift signals." Track 7.1 — Adapter conformance fixtures New `internal/airun/conformance/` package with shape-fixture tests covering each (framework × version) combination Terrain supports today: - Promptfoo v3 nested + v4 flat + missing-evalId variants - DeepEval 1.x camelCase + 1.x snake_case + bare-array variants - Ragas modern (samples + scores envelope) + legacy (bare array) Adding a new shape fixture is the documented extension recipe; the README at the package root spells it out. Track 7.2 — Shape detection + warn-on-drift New `internal/airun/shape.go` introduces `ShapeInfo` (framework, detected version, version source, drift warnings) plus three detector functions — DetectPromptfooShape, DetectDeepEvalShape, DetectRagasShape. Each does a top-level envelope probe (no full payload parse), classifies the shape, and emits warnings on unfamiliar variants. Public helpers `ShapeInfo.HasWarnings()` and `FormatWarnings()` let downstream callers log a single per-run notice when an adapter is parsing an unfamiliar shape, before the detector chain consumes the result. Track 7.4 — End-to-end Promptfoo+Terrain CI walkthrough New `docs/examples/gate/ai-eval-ci/` directory with: - README.md spelling out the full installation → baseline → first-PR flow, plus what the example does NOT do (anti-goals up front) - github-action.yml — drop-in workflow that runs Promptfoo, feeds results into Terrain, and posts the unified comment - promptfoo.config.yaml + prompts/ + evals/ — minimal working scenario package an adopter can copy and edit The walkthrough's posture matches the parity plan's recommended-CI-config rule: --new-findings-only --baseline by default, --fail-on critical for blocking gates, hygiene findings visible but non-gating. Verification: 8 new conformance tests pass; full airun package green; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

….6/10.8) (#149) Two Track 10 deliverables that lift the visual / UX side of 0.2.0 from "every renderer invents its own empty-state message" to "one designed vocabulary, regression-locked." Track 10.6 — Empty state helpers New `internal/reporting/empty_states.go` defines an EmptyStateKind enum with seven shipped kinds (zero-findings, no-AI-surfaces, no-policy, first-run, no-impact, no-test-selection, no-migration-candidates) plus three render targets: - EmptyStateFor(kind) → structured EmptyState (Header + NextMove) - RenderEmptyState(io.Writer, kind) → terminal-text rendering - EmptyStateMarkdown(kind) → blockquote-callout markdown Each shipped kind carries a designed header and a next-move nudge that names a concrete `terrain ...` command the user can run. Three contract tests enforce the design rules: - every kind has a non-empty designed header - voice & tone — no exclamation marks, no emoji codepoints, no British spellings (Track 10.7 voice rules locked at the test level) - every shipped kind's next-move references a command in backticks (no purely invitational empty states) First integration: `internal/reporting/insights_report_v2.go` swaps its "No significant issues detected." line for the EmptyZeroFindings rendering with the next-move nudge. Track 10.8 — Visual regression goldens (foundation) New `testdata/empty_state_goldens/<kind>.txt` — seven byte-for- byte golden files matching the seven shipped EmptyStateKind values. Companion test `TestEmptyState_Goldens` asserts byte equality on every kind. The drift gate `TestEmptyState_GoldensCoverEveryKind` verifies that the goldens directory matches the shipped enum 1:1 — adding a new kind without a golden, or vice versa, fails CI. Running `go test ./internal/reporting/... -update-empty-state-goldens` regenerates the goldens after intentional copy changes. This is the foundation for the broader Track 10.8 visual regression suite — once Track 10.2-10.4 (renderer migration to uitokens) lands, the same golden pattern extends to PR-comment markdown, SARIF tags, and HTML report screenshots. Verification: 11 new tests pass; full Go test suite green; no regression in existing renderers. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ure (Tracks 9.9/9.11) (#150) Two Track 9 (engineering hardening) deliverables. Both are pure test additions — no behavior changes — and both raise the adopter-trust posture against scenarios the existing test suite didn't exercise. Track 9.9 — Adversarial filesystem suite New `internal/analysis/adversarial_fs_test.go` exercises the analyzer against 8 deliberately weird filesystem inputs that real repositories surface but synthetic fixtures usually don't: - binary file with .ts extension (misnamed asset / compiled artifact) - oversize source file (~2MB; tests size-skip threshold) - UTF-8 BOM at file start (Windows-edited files) - NUL bytes mid-content (transpiler / minifier output) - 0-byte test file (developer commits before filling in) - nested .git directories (submodules) — load-bearing assertion that .git contents never leak into inventory - 50-level deep directory nesting (skipped on Windows due to path-length limits) - 180-char filename (long-but-legal, tests filesystem assumptions) Contract: every test asserts Analyze completes without panic / hang / OOM. Some tests additionally assert that legitimate inputs survive (binary-poisoned tree must not lose the real test file; nested .git contents must never leak). Skipped (out of scope, with rationale): - Symlink loops: behavior differs across platforms; add per-platform tests when an adopter hits one. - Permission-denied: hard to set up portably; manual smoke verifies the walker silently skips. Track 9.11 — Schema migration fixture New `internal/models/testdata/snapshot_v0_1_x_legacy.json` — hand-crafted JSON snapshot in the shape that 0.1.x actually wrote: schema version field absent, no SignalV2 envelope, no UnitID on code units, simpler snapshotMeta. New `internal/models/migrate_fixture_test.go` with 3 tests: - LoadLegacyFixture — load via Unmarshal + migrate via MigrateSnapshotInPlace; assert SchemaVersion stamped, generatedAt backfilled, UnitIDs backfilled (incl. parent- qualified case), compatibilityNotes in Metadata - LegacyFixtureDataPreserved — every field present in fixture (frameworks, test files, signals) survives intact through migration - FixtureRoundTripsViaJSON — migrated snapshot can be re-serialized + re-loaded + re-migrated near-idempotently (regression guard for the byte-identical determinism contract that `terrain analyze --write-snapshot` depends on) Verification: 8 adversarial-FS tests + 3 fixture migration tests pass; full Go test suite green; no regression in existing tests; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the canonical 3-repo convergence example for the Align pillar: docs/examples/align/multirepo/. Companion to the Track 6.1 manifest format (in PR #148) and the alignment-first migration framing (also #148). The example walks through Acme Corp's three Node services on partially-divergent test stacks, declaring a portfolio manifest, and the convergence sequence the cross-repo aggregator will recommend when it lands in 0.2.x. Until the aggregator binary ships, the example is illustrative — the README documents the *shape* of the expected output so: - the contract is locked before the implementation chases it - adopters who hand-write a repos.yaml today learn the file format that 0.2.x will consume unchanged - the alignment-first vs health-first sequencing rule is documented as the contract, not a runtime detail Files: - README.md full convergence story + expected output shape - .terrain/repos.yaml runnable manifest matching the story - snapshots/README.md notes on snapshotPath: future expansion Status posture matches the parity plan: Track 6 is parallel and partial-ship-OK in 0.2.0; Align is the secondary pillar; multi- repo is explicitly Tier 3 / experimental until 0.2.x ships the aggregator. This example is the bridge that locks the contract between today's manifest format and tomorrow's binary output. Verification: make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the missing axis to the bench surface: existing BenchmarkFullAnalysis_* measure CPU but never fail on memory regressions. Real adopter complaints take the shape "Terrain ate 4 GB on my monorepo," not "Terrain was slow"; this suite plugs the gap. Two categories: Allocation benchmarks (Benchmark*_Memory) Wrap the existing analysis benches with b.ReportAllocs() so bytes/op + allocs/op surface as regression-comparable baselines. Run via: go test -bench Memory ./internal/analysis/... Ceiling tests (TestMemoryCeiling_*, TestMemoryNoLeak_*) Run analysis at known scales and assert peak heap growth stays under a configured ceiling. Three tests: - 1k files: ceiling 250 MB (current observed: ~177 MB) - 5k files: ceiling 1300 MB (current observed: ~1050 MB) - 5-iter repeated analysis: ceiling 2000 MB (current observed: ~1500 MB across 5 iterations on a 500-file fixture) Skipped by default Ceiling tests are gated on TERRAIN_MEMORY_BENCH=1 — they're expensive (force GCs, run analysis at scale) and surface ceiling regressions per the Track 9.10 baseline rather than smoke failures. The new `make memory-bench` target sets the env var for you. The default `go test ./...` loop is unaffected. Track 9.10 follow-up: the leak test reports unexpectedly high growth (~1.5 GB across 5 iterations on a 500-file fixture) — much higher than the FileCache amortization should produce. Comment in the test names this as the leading hypothesis (something in the per-run allocation graph holds onto data the cache should amortize) and points at where to look. Investigation is its own work; the ceiling here catches regressions BEYOND the current state, so a future fix that lowers actual growth will pass with large headroom — at that point the ceiling should ratchet down. Verification: all three ceiling tests pass under `make memory-bench`; default `go test ./...` skips them; existing benchmarks unaffected. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the per-detector budget mechanism that protects analyze runs from any single hung detector blocking the whole pipeline. The mechanism - New `DetectorMeta.Budget time.Duration` field. Zero means "use DefaultDetectorBudget" (30 seconds). Detectors with legitimate long-running work set this explicitly. - New `safeDetectWithBudget(reg, fn)` wrapper composes with the existing safeDetect panic-recovery: a panicking detector still produces the detectorPanic marker; a slow detector produces the new detectorBudgetExceeded marker. - All call sites in detector_registry.go (Run + RunWithGraph Phase 1/2/3 paths) routed through the budget wrapper. Pre-Track- 9.4 a hung detector would block the goroutine waiting on wg.Wait(); now the budget elapses first and the wait completes. - New SignalDetectorBudgetExceeded type registered in the manifest + signal catalog so ValidateSnapshot accepts the marker (same posture as detectorPanic — without the catalog entry, a single budget overrun would invalidate the whole snapshot). Behavior contract When a detector exceeds its budget, the pipeline returns the budget-exceeded marker instead of waiting for the eventual completion. The detector goroutine completes in the background (Go has no goroutine kill primitive); its post-budget signals are discarded. This is the right trade-off for the failure modes the budget targets: runaway regex, accidentally-O(n²) graph walks, blocking I/O on a slow filesystem. Coverage Five new tests in detector_budget_test.go: - budget exceeded → marker returned within budget window - fast detector → original signals returned - zero budget → DefaultDetectorBudget applied - panic + budget compose correctly (detectorPanic wins) - registry-level integration: r.Run() returns within budget even when a registered detector deliberately sleeps past it Plus the regenerated docs/rules/engine/detector-budget.md doc. Verification: all 5 budget tests pass; full Go test suite green; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rix (#144) Bundle of three Track 7 / Track 8 deliverables for the parity-gated 0.2.0 release plan. Track 7.3 — AI execution trust-boundary doc Adds docs/product/ai-trust-boundary.md spelling out what Terrain executes vs. what it parses, where the LLM call actually happens (inside the eval framework, not Terrain), the per-command trust surface for `analyze`, `ai list/doctor`, `ai run`, and `ai run --ingest-only`, plus the 0.2 → 0.3 sandboxing roadmap. Closes the "is this safe?" question that the launch-readiness review flagged as insufficiently documented. Track 8.1 — Node 22 prominence Documents the Node 22 npm-path requirement up front in README and docs/user-guides/getting-started.md, with explicit brew / `go install` alternatives for CI images on Node 20 LTS. Avoids the silent install-time failure mode the review caught. Track 8.2 — Release-smoke matrix expansion Extends the post-publish smoke job from linux/amd64-only to also cover darwin/arm64 (Apple Silicon — the modern Mac default) and windows/amd64 (the most likely Windows shape). Matrix uses the POSIX tar.gz extract path on Linux/macOS and a pwsh Expand-Archive path on Windows; both verify `terrain version --json` reports the tagged version. Catches per-platform "wrong build / wrong version string" regressions that previously could only surface after a user installed. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…racks 3.1/3.2) (#157) Two of the five "load-bearing centerpiece" Track 3 deliverables that turn the unified pitch into something the binary actually delivers. Track 3.1 — `terrain report pr --fail-on <severity>` Defends the pitch claim "Gate changes based on that system as a whole." Pre-fix, --fail-on existed only on `analyze`; the gating flow (`analyze && report pr`) silently lost the gate at the second step. Implementation reuses the same severityGateBlocked helper that PR #134 introduced for analyze. Added prSeverityBreakdown(severities []string) that converts a PR's NewFindings + AI BlockingSignals into the same SignalBreakdown shape `analyze.SignalSummary` uses, so the gate decision logic is shared, not duplicated. Wired through both the legacy `terrain pr` command (deprecated alias of `terrain report pr`) and the canonical `terrain report pr` namespace dispatcher. Same render-then-gate pattern as analyze: every output format (json, markdown, comment, annotation, default text) renders before the gate decision returns through the error channel. So `--json --fail-on=high` produces a valid JSON document on stdout AND exits with code 6 if the gate fired — the property the launch- readiness review's "JSON stdout purity" gate test asks for. Tests: - TestPRSeverityBreakdown: empty / mixed bag / case-insensitive / unknown-severities-dropped-silently table - cli_smoke_test.go updated for the new runPR signature Track 3.2 — `terrain report impact --explain-selection` Defends the pitch claim "See which tests matter for a PR — and why." Pre-fix, `report impact` showed selected tests but not the reason chains; the "and why" half of the pitch wasn't deliverable. Implementation reuses the existing internal/explain.ExplainSelection + reporting.RenderSelectionExplanation (already shipping for `terrain explain selection`). When --explain-selection is set, runImpact computes the selection explanation and renders it with verbose=true so per-test evidence (selection reasons, code unit matches, confidence) appears. Wired through both `terrain impact` (legacy) and `terrain report impact` (canonical). --json + --explain-selection emits the SelectionExplanation JSON structure for tooling consumption. Pillar parity impact: Track 3.1 + 3.2 are the centerpiece work that the plan calls "highest-priority track — every pitch claim must be directly verifiable in the CLI output before 0.2.0 ships." This PR closes two of the five Track 3 items; 3.3 (integration-test classification rigor), 3.4 (E2E attribution), and 3.5 (unified PR-comment audit) are separate follow-ups. Verification: go build ./... clean go test ./cmd/terrain/ green (4 new TestPRSeverityBreakdown cases; existing TestCLISmoke_PRCommand updated for new signature) go test ./... full suite green go test ./internal/testdata/ golden + CLI suite green Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md (Tracks 3.1 / 3.2). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…2) (#151) Two release-readiness deliverables that lift the contract Terrain makes with adopters: links in user-facing docs go where they say they go, and JSON fields carry an explicit stability promise. Track 9.8 — Make docs-linkcheck New `cmd/terrain-docs-linkcheck/main.go` walks docs/ and verifies that every `[label](relative.md)` link resolves to a real file. Skips external (http/https/mailto), same-page anchors (#foo), and — by default — the docs/internal/ + docs/legacy/ subtrees whose link discipline is inherited debt (run with `-include-internal` to scan them anyway). Resolves directory links to <dir>/README.md when present; reports "directory link with no README.md" otherwise. Wired into the Makefile as `make docs-linkcheck`. Same posture as `make docs-verify` — fails with `::error::N broken links` + per-link source:line:target output suitable for CI annotation. Initial run on the public docs surface caught 3 real broken links (README.md → legacy/, quickstart.md → examples/, release-notes.md → ../architecture/) — all fixed in this PR by pointing at concrete files. The remaining 39 links inside docs/internal/planning/ are pre-0.2 inherited debt, deliberately skipped pending a separate cleanup PR. Track 9.12 — Schema field stability tiers New `docs/schema/FIELD_TIERS.md` documents the three-tier contract Terrain commits to per JSON field: Stable — name + type + semantics won't change without a major schema bump; safe for long-lived tooling Beta — name + type stable for the next minor; semantics may evolve as calibration corpora arrive Internal — diagnostic / debug fields; treat as scratch space Spells out concrete examples per tier (snapshotMeta.schemaVersion is stable; signals[].confidence is beta; metadata.diagnostics.* is internal); names the precedence order for telling a field's tier (name pattern → x-terrain-tier annotation → this page); defines the promotion path internal → beta → stable. Adopter guidance section: pin the schemaVersion, defensively read beta fields with fallbacks, never branch on internal. `docs/schema/README.md` linked from the new page so adopters reach the contract via the canonical schema entry point. Verification: `make docs-linkcheck` passes; `make docs-verify` clean; `go build ./cmd/terrain-docs-linkcheck/` clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

These keep getting committed unintentionally during PR rebases when `go run` materializes a binary in the repo root and `git add -A` sweeps it up. Ignoring at the source so the merge wave doesn't keep churning amend-and-force-push to remove them.

) Two related defects in `terrain serve`, both flagged by the launch- readiness review and verified in the codebase: 1. **Request context not wired.** The HTTP handlers ignored r.Context() and called engine.RunPipeline (which wraps context.Background()), so a client disconnect during a long analysis left the analysis running in the background with no handler waiting on it. 2. **Mutex-blocking analysis cache.** getResult held Server.mu via defer-Unlock for the full analysis duration. One slow analysis serialized every other request needing a pipeline result behind a single goroutine, regardless of whether the cache was warm enough for them. Both ship together because the right fix to (2) replaces the cache mutex with an RWMutex + singleflight, which also gives us a clean seam for (1): * Fast path: RWMutex.RLock so warm-cache hits don't contend. * Slow path: singleflight.Group.DoChan dedups concurrent in-flight analyses (one analysis per cache window, even with N waiters). * Per-caller cancellation: each handler threads r.Context() through getResult; the select on (ch | ctx.Done()) returns ctx.Err() immediately on disconnect. The shared analysis runs with context.Background() so a single caller's disconnect doesn't kill work other waiters depend on. Tradeoff documented inline: a single-waiter request whose context is canceled won't (yet) cancel the underlying analysis. Reference- counting waiters is on the 0.3 list. Tests added: * TestGetResult_CacheHit — fast path returns cached pointer * TestGetResult_RespectsCanceledContext — pre-canceled context returns context.Canceled within 2s rather than blocking on analysis (pre-fix this hung until pipeline completion) * TestGetResult_ConcurrentCallsShareCache — 50 concurrent callers on a warm cache observe the same report pointer Dep: golang.org/x/sync v0.10.0 (singleflight). Pinned to v0.10.0 because v0.20.0 bumps the go directive to 1.25; v0.10.0 is compatible with the existing go 1.23. Server-package doc-comment refreshed to describe the new concurrency model and remove the now-fixed "known issues" block that was added in PR #131. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ration + tier badges (#148) Three Track 6 deliverables that establish the Align pillar's 0.2.0 foundation. Track 6 is intentionally parallel and partial-ship-OK per the parity plan (Align is the secondary pillar; floor ≥ 3 soft); these three deliverables are the parts that lift Align without requiring the full multi-repo aggregator. The aggregator (Tracks 6.2 / 6.3) lands in 0.2.x once the manifest format ships here. Track 6.1 — Multi-repo manifest format New `internal/portfolio/manifest.go` with `RepoManifest` and `RepoEntry` types matching the parity plan's `.terrain/repos.yaml` schema: per-repo path or pre-saved snapshot, optional owner, frameworks-of-record declaration, free-form tags. Loader + validator + path resolution helpers. Validation enforced (rather than trusting YAML schema): - schema version is the only currently-supported value (1) - non-empty repos list - per-repo non-empty Name + at least one of Path / SnapshotPath - unique Name across the manifest - error messages cite the repo position so adopters can find the bad entry without grepping 12 new tests covering the canonical happy path plus every validation rule; ResolveRepoPath / ResolveSnapshotPath helpers for the aggregator that lands in 0.2.x. Track 6.5 — Migration alignment-first reframe New `docs/product/alignment-first-migration.md` documents the framing shift the launch-readiness review surfaced: most teams care about *aligning* (declare a framework of record per repo, see drift, converge gradually), not just *converting*. Conversion is one tool inside convergence, not the headline. Spells out single-repo and multi-repo flows end-to-end; pins the per-direction tier matrix (Stable / Experimental / Preview / Cataloged); names anti-goals (we don't auto-convert, don't pick the "best" framework, don't block convergence on calibration). Track 6.6 — Per-direction tier badges in `migrate list` New `tierLabelForState` helper renders GoNativeState as the Tier-badge vocabulary used elsewhere in 0.2 (Stable / Experimental / Preview / Cataloged). `terrain migrate list` and `terrain migrate shorthands` now surface tier badges instead of raw state strings. `humanizeGoNativeState` is preserved so any other call sites that depend on the raw state string don't break — only the user-facing list renderers switched. Tier label test locks the mapping; renaming a label is a public-facing change and the test surfaces the dependency. Verification: full Go test suite green; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Tracks 9.1/9.3) (#155) Two paired Track 9 deliverables that lift detector self-description from "what does it emit" to "what does it consume" — and surface input gaps as visible diagnostics instead of silent zero-output. Track 9.1 — Capability metadata extension Adds five new fields to DetectorMeta: RequiresRuntime — needs RuntimeStats from runtime artifact ingestion (--runtime junit.xml / jest.json) RequiresBaseline — needs Baseline snapshot pointer (--baseline) RequiresEvalArtifact — needs EvalRuns from Promptfoo / DeepEval / Ragas adapter ingestion ContextAware — honors ctx.Err() in inner loops (descriptive today; surfaced in `terrain doctor` cancellation posture) Experimental — detector implementation not yet stable (distinct from manifest-level signal status) All zero-default; existing detectors continue working unchanged. New detectors that genuinely consume these inputs declare them via the metadata so the missing-input check knows what to flag. Track 9.3 — Missing-input diagnostics New `safeDetectChecked(reg, snap, fn)` — the registry's canonical detector-invocation path. Pre-Track-9.3 a runtime-needing detector on a no-runtime snapshot would silently emit zero signals; adopters had no way to know whether the detector ran-and-found- nothing or ran-but-was-blind. When `missingInputs(meta, snap)` returns non-empty, the helper returns a single SignalDetectorMissingInput marker per affected detector, with the explanation listing every flag the user needs to add (Oxford-comma joined). The actual detector body is skipped — no panic, no waste. All call sites in detector_registry.go (Run + RunWithGraph Phase 1/2/3) routed through safeDetectChecked. The check composes with safeDetect's panic recovery: a panicking detector with sufficient inputs still produces detectorPanic; a panicking detector with missing inputs is shielded by the early-return. New SignalDetectorMissingInput type registered in the manifest + signal catalog so ValidateSnapshot accepts the marker (same posture as detectorPanic). Coverage 7 new tests (missing_input_test.go): - happy path (detector runs when no inputs required) - missing runtime → diagnostic with --runtime flag named - runtime present (RuntimeStats on TestFile) → detector runs - missing baseline → diagnostic with --baseline flag named - missing eval artifact → diagnostic with promptfoo-results flag named - multiple missing → one diagnostic listing all three (Oxford) - joinInputNames covers 0/1/2/3+/4 cases including the Oxford comma fix Verification: full Go test suite green; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…Track 9.7) (#156) Adds the Track 9.7 release-readiness gate: cross-checks docs/release/feature-status.md against the canonical signal manifest. Drift between what the curated doc promises and what the engine actually ships is the failure mode adopters notice when they evaluate the binary against the marketing claim. The gate - New `cmd/terrain-truth-verify/main.go` walks docs/release/feature-status.md, extracts every backtick-delimited camelCase signal reference between the "Detectors / signal types" anchor and the "Planned" subsection, and verifies each resolves to a real entry in internal/signals/manifest.go. - Camel-case constraint (lowercase start + at least one mid-word uppercase) excludes CLI verbs (`report`, `eval`, `policy`) and English filler words from false-positive matches. - Planned subsection excluded by design — references there name signals that *don't* yet have a code-side implementation; flagging them would invert the signal. - Engine self-diagnostic signals (detectorPanic, detectorBudgetExceeded, detectorMissingInput, suppressionExpired) excluded from the orphan check — they're documented inline alongside the mechanisms that emit them, not in the curated signal table. Wired into Makefile as `make truth-verify`. Same posture as `make docs-verify` and `make pillar-parity` — fails with `::error::N broken signal reference(s)` + per-signal output suitable for CI annotation. What it caught Initial run on the current docs surfaced one real broken reference: `duplicateCluster` was listed as a "stable signal" in the table but isn't a `Signal` type at all — duplicate-cluster analysis surfaces via `DuplicateClusters` in the analyze report rather than through the signal mechanism. Doc updated to clarify the distinction. Plus 15 advisory orphans (stable signals in the manifest that the curated doc doesn't mention by name). The doc explicitly says it's a "curated view" with the manifest as the source of truth, so these are acceptable today; --strict-orphans escalates them to hard failures for adopters who want full coverage. Out of scope today (per the package comment) - README command list ⊆ dispatcher: needs Track 9.6 registry refactor first - CHANGELOG promotion-claim cross-check: per-signal status is already manifest-driven, so docs-verify catches it - CI matrix ⊆ compatibility tier doc: distinct failure mode in workflow YAML rather than markdown Coverage 5 new unit tests in main_test.go: happy-path extraction, no-anchor returns empty, planned subsection excluded, all-lowercase tokens rejected (regression guard against the false-positive class that caught report / eval / policy on the first run), engine-diagnostic classifier. Verification: full Go test suite green; make truth-verify exits 0 on the current docs surface; make docs-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#158) Brings forward what the prior plan deferred to 0.3: a working suppression model in 0.2.0 so adopters can adopt strict CI gating without forking the project. Builds on Track 4.4 (stable finding IDs) — most suppressions match by FindingID; a (signal_type, file glob) fallback covers class-wide waivers. Schema: schema_version: "1" suppressions: - finding_id: weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4 reason: false positive; sanitized upstream expires: 2026-08-01 owner: "@platform" - signal_type: aiPromptInjectionRisk file: internal/legacy/** reason: rewriting in 0.3 expires: 2026-09-01 Match modes (an entry uses exactly one): * `finding_id` exact match — most precise; survives line drift when the underlying signal has a stable symbol per BuildFindingID semantics * `signal_type` + `file` glob — coarser; supports `**`-style recursive patterns. Useful for class-wide waivers. Anti-goal: suppressions are NOT a free-form ignore-everything switch. The schema rejects entries that satisfy neither mode and entries missing `reason` (every suppression must justify itself). Lifecycle: * `reason` required — printed when a suppressed signal would otherwise have been blocking, so reviewers see the rationale in PR comments without opening the YAML. * `expires` optional ISO 8601 date. After the date, the suppression is INVALID — the underlying signal fires again, and a new `suppressionExpired` warning signal surfaces in the report so silent rot doesn't accumulate. * `owner` optional free-text owner pointer for review. Engine wiring: * `internal/suppression/` package — Load + Apply + path-glob helpers. 9 unit tests covering load validation, expiry, finding-id match, signal-type+glob match, idempotency, nil- safety. * `internal/engine/pipeline.go` — Step 10c after FindingID assignment: load `.terrain/suppressions.yaml` (or PipelineOptions.SuppressionsPath override), apply matched entries, surface expired entries as warning signals. * `PipelineOptions.SuppressionsPath` for `terrain analyze --suppressions <path>`. * 5 engine integration tests: drops matching signal, expired emits warning + lets signal fire, missing file is a no-op, malformed file logs and continues (don't fail pipeline on a fat-fingered YAML edit), override path honored. Manifest: * New `suppressionExpired` signal type, governance category, medium severity, evidence-strong (it's a deterministic check). Registered in `internal/signals/manifest.go` and `internal/models/signal_catalog.go`. Rule doc auto-generated via cmd/terrain-docs-gen. * No new detector — pipeline emits the signal directly. What's NOT in this PR (follow-ups): * Track 4.6: `terrain explain finding <id>` — round-trips an ID back to the underlying signal + suggests a suppression command * Track 4.7: `terrain suppress <id>` — writes a suppression entry to the YAML with goccy/go-yaml round-trip preservation * Track 4.8: `--new-findings-only --baseline <path>` — uses the same FindingID set to filter signals against a baseline Verification: go test ./internal/suppression/ — 9 tests green go test ./internal/engine/ — 5 new integration tests green go test ./... — full suite green make docs-verify — manifest + severity rubric + rule docs in sync Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Updates docs/release/parity/scores.yaml to reflect what the merge wave delivered. 58 cell lifts applied via scripts/parity-lift.py (committed for reproducibility — running it is idempotent), each tied to the merged PR(s) that delivered the evidence. The dashboard now shows accurate honest state, not pre-merge baseline. What lifted: - V1 across nearly every area: uitokens (#136) shipped - V3 across multiple areas: empty-state helpers (#149) - E3 / E7 on detector-related areas: Track 9.1/9.3/9.4 + 5.3 - P3 / P6 broadly: Track 1/5/6/7/9 doc + example deliverables - P1 / P4 on pr_change_scoped + policy_governance: Track 3 + 4 + 7 + 8 deliverables - migration_conversion + portfolio: Track 6 foundation - distribution_install: Track 8 lift What remains at 2 (release-blocking on Gate, surfaced honestly): - pr_change_scoped / E2: needs labeled PR-diff corpus (Track 7.5) - ai_risk_inventory / E2: needs labeled real-repo precision corpus - migration_conversion / E2: conversion-corpus A-grade calibration (Track 6.7, 0.3 work) - portfolio: aggregator (Track 6.2/6.3) still 0.2.x partial-ship Plus drive-by linkcheck fixes: docs/product/ai-risk-tiers.md and docs/product/ai-trust-boundary.md had directory links that the docs-linkcheck gate (Track 9.8) flagged after #146/#144 merged. Both pointed at directory paths missing README.md; replaced with specific file links. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…t (Track 10.2) (#161) Migrates internal/changescope to consume from internal/uitokens for its bracketed severity and posture-verdict badges. First in-tree consumer of uitokens — the foundation Track 10.1 shipped (#136) is now live. What changed - New `uitokens.BracketedSeverity(severity)` and `uitokens.BracketedVerdict(band)` helpers. They encapsulate the exact bracketed strings (`[HIGH]`, `[CRIT]`, `[PASS]`, `[FAIL]`, etc.) that the unified-PR-comment visual contract requires — locked by the changescope golden tests shipped in #145. - `internal/changescope/render.go` `severityIcon` and `postureBadge` are now thin wrappers that delegate to the uitokens helpers. The renderer call sites are unchanged; the vocabulary is now owned by one place. Why this shape The PR-comment markdown contract requires `[LABEL]` brackets because GitHub-flavored markdown doesn't render ANSI color and the brackets make badges scan reliably. The terminal-text severity badge (uitokens.SeverityBadge) returns `Bold(Alert("HIGH"))`-style colored tokens for direct terminal use. Both shapes are valid design choices for their respective surfaces; locking each one in uitokens prevents drift. The wrapper-not-inline approach keeps the surgical: the changescope helpers stay one-liners, the visual contract test (#145) keeps asserting the same strings, and a future renderer joining the party imports uitokens directly without changescope being in the middle. Coverage - 2 new tests in uitokens_test.go: TestBracketedSeverity (locks every severity → bracket mapping) + TestBracketedVerdict (same for posture bands) - Existing changescope golden tests (TestRenderPRSummaryMarkdown, TestRenderPRSummaryMarkdown_UnifiedShape, etc.) all still pass — proves the byte-identical rendering contract holds through the migration Verification: full Go test suite green; `make docs-verify`, `make docs-linkcheck`, `make truth-verify` clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the Track 10.5 deliverable: one progress vocabulary used across `terrain analyze`, `terrain migrate run`, `terrain ai run`, and `terrain report pr`. Adopters now see the same shape regardless of which command is running. Two types Spinner — TTY-aware idle-progress indicator (Braille pattern dots, constant-width frame rotation). Goes to stderr so JSON / report output piped to a file or another tool stays clean. Stage — multi-step progress reporter for the canonical pipeline shape (Step 1/5 → Step 5/5). Used where work is segmented into named stages. Both are silent on non-TTY (CI logs, pipes, redirects) and silent when --quiet is passed. Adopters running inside CI never see glyphs in their build logs. The TTY check is one-shot at construction, which matches the actual deployment shape — stderr is either a terminal or it isn't, and adopters who pipe partway through a run get correct behavior either way. Design constraints honored - Zero dependencies on internal/uitokens — symbol vocabulary is parallel but locally owned, so uitokens itself can use progress for long-running token-rendering ops without import cycles. - Nil-safe — Start, Update, Stop, Step, Done all no-op on a nil receiver, saving callers from `if sp != nil` boilerplate. - Idempotent stop — Spinner.Stop is safe to call multiple times and safe without a matching Start. Real-world callers in defer chains hit this regularly. - Thread-safe Update — Spinner runs its animation in a goroutine; Update can be called from any goroutine. Coverage 9 tests covering: non-TTY silence, --quiet silence, TTY happy path with frame + label assertion, Stop idempotency, Stop without Start, nil safety, Stage non-TTY silence, Stage canonical format, Stage nil safety. Wiring in to analyze / migrate run / ai run / pr is a follow-on PR that doesn't conflict with the unified-PR-comment goldens — those goldens assert markdown content, not the stderr progress stream. Verification: full Go test suite green; make docs-verify / docs-linkcheck / truth-verify clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the Track 10.7 release-readiness gate: a Go-source lint that enforces the voice-and-tone rules from the parity plan. What it catches exclamation Word-final exclamation marks ("Done!", "All set!", "Found 3 issues!"). Visual badges like `[!]` / `[!!]` and HTML markup like `<!DOCTYPE` are NOT flagged — they convey severity through bracketed-glyph shape, not through exclamatory tone. british-spelling A curated list of British spellings the Terrain voice rejects. Covers -our endings (colour, behaviour, favour), -re endings (centre, metre), -ise / -isation forms (optimise, organisation), -ce verb endings (defence, licence), -logue (catalogue), and high-frequency individual words (grey, aluminium, fulfil). Word-boundary anchored so legitimate American words don't false-positive. How it scans Default scan targets are the user-visible Go surfaces: the signals manifest + signal_types.go (every Description / Remediation that surfaces in a finding), the cmd/terrain package (every Println / Fprintf), internal/reporting and internal/changescope (every renderer). Test files are skipped — tests can use any prose without tripping the lint. False-positive guards - Regex-pattern guard: literals containing regex syntax (\\w, \\d, [^...], (?P<...>) are not exclamation-checked. Without this, character classes with `!` would false-positive. - Allow-list hook: filter() has the hook for future per-line silences (empty today; documented for use when needed). Wired into Makefile as `make voice-lint`. Initial run on the full codebase reports clean — the existing prose (manifest, README, quickstart, etc.) already follows these rules. The lint catches NEW drift, not retroactive cleanup. Coverage 5 tests in main_test.go: exclamation pattern (prose vs symbol distinction), British spelling pattern (positive + negative cases for each ending family + edge cases), regex pattern guard, end-to-end scanFile against a synthetic fixture, test-files- skipped contract. Verification: full Go test suite green; all five release- readiness gates pass (docs-verify, docs-linkcheck, truth-verify, pillar-parity, voice-lint). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n) (#164) First in-tree consumer of internal/progress (#162). Wraps the conv.RunTestMigration call in cmd_convert.go's runConvert with a TTY-aware Spinner that emits a Braille-pattern dot rotation while the conversion runs. Behavior - Spinner is created with a label that includes from→to when available ('Converting jest → vitest') or 'Converting' alone for the auto-detect path. - Goes to stderr so JSON / report output piped to stdout stays clean. - No-op when --json is set (suppresses any progress to keep JSON output clean) or --plan / --dry-run (the planning phase is fast enough that a flashing spinner would just be noise). - defer sp.Stop() ensures the line is cleared even on early return paths (input errors, validation failures). Why convert first Convert is the longest-running CLI path with no progress indicator today. Adopters running 'terrain convert tests/ --from mocha --to jest' on a multi-hundred-file batch see no output for ~tens-of-seconds; the spinner gives them confirmation that the binary is working. Future wiring (analyze / migrate run / ai run / pr) can use the same pattern. The internal/engine pipeline already has its own ProgressFunc-based step reporter (cmd/terrain/progress.go) that predates internal/progress; reconciling the two is a future polish item. Verification: full Go test suite green; convert command tests unaffected; make voice-lint clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds internal/cli — the registry foundation Track 9.6 of the parity plan calls for. The package owns the Command type, Pillar + Tier enums, and a thread-safe Register / Get / All / ByPillar API. Status - Foundation only. The existing dispatcher in cmd/terrain/main.go is NOT migrated to consume from the registry yet — that's 0.2.x work. The registry is additive: any caller (printUsage, doctor, truth-verify, docs-gen) can read from it without forcing the big switch in main.go to become registry-driven. - Default registry is empty in 0.2.0. Future PRs that introduce individual commands (or migrate existing ones) populate it via init() blocks. Why a separate package internal/cli, not cmd/terrain/. Putting it in cmd/terrain/ would couple the registry to the binary's package and make it un-importable from internal/signals — where the eventual truth-verify cross-check between the registry and the signal manifest will live. internal/cli is the right home: importable from anywhere in the tree, no dependencies on cmd/. What it owns Command — Name, Pillar, Tier, JourneyQuestion, Description, Aliases Pillar — Understand / Align / Gate / Meta, mirroring docs/release/parity/rubric.yaml Tier — Tier1 / Tier2 / Tier3 (publicly-claimable tier per the parity plan) Registry.Register — fails on duplicate name or alias collision Registry.MustRegister — panic-on-error variant for init() blocks Registry.Get — looks up by name or any alias Registry.All — alphabetical, alias-deduplicated Registry.ByPillar — pillar-grouped (omits empty pillars) Registry.Names — for truth-verify cross-checks What it does NOT own - Argument parsing (flag.FlagSet stays per-command) - Dispatch (the big switch in main.go is still source of truth until a 0.2.x PR migrates it) - Help-text generation (printUsage can opt in later) Coverage 11 tests, all passing under `-race`: - Required-field validation (Name, Pillar) - Duplicate-name + alias-collision rejection - Get by name + by alias - All() dedup + alphabetical order - ByPillar grouping with empty-pillar omission - Names() (for truth-verify use) - Concurrent register + read (race detector) - MustRegister panic-on-error contract Verification: full Go test suite green; make docs-verify / docs-linkcheck / truth-verify / voice-lint all clean. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five user-visible issues caught by running 'terrain analyze' and 'terrain report pr' against a fresh repo and against this repo's own diff before tagging 0.2.0. None of the existing tests caught these — they're shape-of-output issues that only surface when a human reads the output. 1. Help text leads with the pitch, not the legacy tagline `terrain --help` opened with "Terrain — test system intelligence platform" — the framing the parity plan and #137 explicitly moved away from. Changed to the pitch ("the control plane for your test system") + the one-paragraph promise so the help text matches the README and vision doc. 2. Pluralization in the analyze headline Fresh-repo analyze said "1 test files across 1 frameworks". Now properly pluralizes singular and plural via the existing plural() helper. 3. Empty-repo headline lied about health A repo with zero test files used to render "Your test suite looks healthy: 0 test files across 0 frameworks." — calling absence of tests "healthy" is wrong on its face. Now renders "No test files detected. Add tests with your framework of choice, then re-run `terrain analyze`." 4. PR-comment percentage rounded sub-1% selections to 0% "Tests selected | 7 of 796 (0% of suite)" was technically integer-truncation but read as "selection ran and produced nothing". Now displays "<1%" for sub-1% fractions; the formatSuitePercent helper centralizes the formatting so other surfaces can adopt it. 5. "Exported function X" misnomer for non-functions The protection-gap message claimed every exported symbol was a "function", regardless of CodeUnitKind. Adopters with an exported var (like `cli.Default`) or type (`cli.Registry`) saw "Exported function Default has no observed test coverage." Now the kind is read from CodeUnit.Kind and rendered with the matching noun ("Exported method", "Exported class", "Exported module"), with "Exported symbol" as the neutral fallback for kinds the parser didn't classify. How these were caught I built the binary, ran `terrain analyze` and `terrain report pr` on this repo and on a 1-file fresh fixture, and read the output the way an adopter would. Each issue was a paper-cut that existing tests don't assert on (the test goldens cover shape; they don't catch wrong nouns or rounded percentages). These are pure polish — no behavior changes, no new features, no schema changes. Locks the existing experience to be honest and accurate before tag. Verification: full Go test suite green; all five release-readiness gates pass (docs-verify, docs-linkcheck, truth-verify, voice-lint, make build); manual smoke against a fresh-repo + this-repo confirms the user-visible changes render as expected. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…licy redesign + schema docs (#168) * feat(0.2): hero verdict block + adapter ingestion diagnostics (Gate pillar lift) Lifts three Gate-pillar cells from synthetic-fixture floor toward publicly-claimable: ai_eval_ingestion.E3 (2→4), ai_execution_gating.V2 (2→4), ai_execution_gating.E3 (2→4). internal/uitokens/uitokens.go: - HeroVerdict(verdict, headline) — designed three-line block with rule / indented badge + headline / rule. The block frames the gating decision so it carries visual weight beyond the rest of the report. Color-and-symbol via existing token vocabulary (Alert/Warn/Ok + SymFail/SymWarn/SymOK). - HeroVerdictMarkdown(verdict, headline, reason) — markdown variant for PR-comment / GitHub surfaces. Blockquote callout (tints on GitHub) + horizontal rule. Optional reason as italic line. - heroVerdictBadge / bracketVerdict helpers handle the BLOCKED / WARN / PASS vocabulary distinct from VerdictBadge so the hero presentation can use a heavier shape ("[BLOCKED]") without changing VerdictBadge's contract. - Tests: TestHeroVerdict + TestHeroVerdictMarkdown lock both shapes. cmd/terrain/cmd_ai.go: - `terrain ai run` text output now leads with HeroVerdict block, followed by structured Reason / Command / AI Signals / Ingestion diagnostics sections — the previous single-line `Decision: BLOCKED — reason` is replaced. - aiRunHeroLines() centralizes the (action, reason, signalCount) → (verdict, headline) mapping so JSON / text / downstream PR surfaces stay consistent. internal/airun/eval_result.go: - New IngestionDiagnostic{Field, Kind, Detail} type capturing per-field fallbacks during adapter ingestion (kinds: missing, computed, default-applied, coerced). - EvalRunResult.Diagnostics field surfaces these to consumers. internal/airun/{promptfoo,deepeval,ragas}.go: - Each adapter records diagnostics for the fallbacks that matter to gating decisions: derived aggregates when stats block is absent, missing tokenUsage.cost (aiCostRegression no-ops), defaulted timestamps, missing metricsData (DeepEval), and missing quality axes (Ragas — when no faithfulness / context_recall / answer_relevancy in any row). - Tests in promptfoo_test.go lock the canonical diagnostic emissions. cmd/terrain/cmd_ai.go (rendering): - New "Ingestion diagnostics (N):" block in `terrain ai run` output surfaces every IngestionDiagnostic with its kind and detail. Adopters auditing a gating decision can see exactly which fields fell back. docs/release/parity/scores.yaml: - ai_eval_ingestion.E3: 2→4 - ai_execution_gating.V2: 2→4 - ai_execution_gating.E3: 2→4 These three cells were among the audit's specifically-named Gate-pillar gaps. Several other Gate cells remain at 3 (the publicly-claimable bar requires labeled-PR precision corpus and additional doc/UX lifts) — this is one focused step toward the Gate floor=4 target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): policy report redesign + AI eval onboarding doc (Gate pillar lift) Lifts two more Gate-pillar cells: policy_governance.V2 (3→4) and ai_execution_gating.P4 (2→4). internal/reporting/policy_report.go: - Redesigned `terrain policy check` rendering. Hero verdict block at top via uitokens.HeroVerdict — PASS / BLOCKED / WARN with violation count, replacing the previous single Status: PASS/FAIL line. - Violations grouped by severity (critical → low) with BracketedSeverity badges per violation. - Per-violation now shows `[CRIT] type (Category) — explanation` with a `location:` follow-on, replacing the flat ` - <type>: <explanation>` rendering. - New helpers: severityRenderOrder (canonical ordering), groupViolationsBySeverity (deterministic grouping with category + type tiebreakers), policyHeroLines (verdict + headline mapping). docs/user-guides/ai-eval-onboarding.md (new): - First-10-minutes walkthrough closing the audit's ai_execution_gating.P4 finding ("users new to AI evals don't know whether to run Promptfoo first"). - Three-step flow: ai list → run framework yourself → ai run. - Explicit "what Terrain does vs. what you do" table to clarify the trust boundary up-front. - Per-framework commands for Promptfoo, DeepEval, Ragas with their output-flag invocations. - Step 4 covers ingestion-diagnostics interpretation (introduced in the previous commit) so adopters can audit gate-decision data lineage. - Common-questions section addresses sandboxing, custom frameworks, audit trail. docs/release/parity/scores.yaml: - ai_execution_gating.P4: 2→4 - policy_governance.V2: 3→4 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): published eval-adapter schema contract (Gate pillar lift) Lifts ai_eval_ingestion.E4: 3 → 4. docs/schema/eval-adapters.md (new): - Documents the canonical EvalRunResult / EvalCase / EvalAggregates / TokenUsage / IngestionDiagnostic shape every adapter (Promptfoo, DeepEval, Ragas, Gauntlet) emits. - Field-level "Stability: Stable" annotations make the long-lived contract explicit per FIELD_TIERS.md tiers. - Adapter-authoring checklist: parse canonical format, populate Stable fields, emit IngestionDiagnostic per fallback, add conformance fixtures, lock new diagnostics with unit tests. - Cross-references per-framework integration docs + conformance suite. The schema doc closes the audit's E4 concern that adapters "consume each upstream's shape and we won't notice when upstream changes." The published contract + diagnostic mechanism + conformance tests collectively give us notice on shape drift. docs/release/parity/scores.yaml: - ai_eval_ingestion.E4: 3→4 Net `make pillar-parity` after this commit: AI eval ingestion area floor lifted 2 → 3 (from cells E3=4 + E4=4 this PR plus V2/V3 still at 3 carrying the area). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: fix go.mod indirect annotation for golang.org/x/sync PR #132 introduced internal/server/server.go's direct import of golang.org/x/sync/singleflight, but go.mod was never re-tidied so the require line still carries // indirect. CI's `go mod tidy && git diff --exit-code go.mod go.sum` step now fails on every PR because of this drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: cross-platform path handling in suppression + portfolio tests Two pre-existing Windows-only test failures blocking CI on every PR in the 0.2 stack. internal/suppression/suppression.go: - pathMatch was using filepath.Match on inputs already normalized to forward-slashes via filepath.ToSlash. On Windows filepath.Match treats `\` as the separator, so `*.go` matched the entire forward-slashed `sub/foo.go` (the `/` wasn't a separator in its semantics). Switch to path.Match (Unix semantics) via a pathPkgMatch helper. Forward-slash inputs + Unix-semantics matcher = correct behavior on every host OS. internal/portfolio/manifest_test.go: - TestResolveRepoPath_Absolute constructs `\elsewhere\repo` expecting filepath.IsAbs to recognize it as absolute. Windows treats this as relative (drive letter required), so the test fixture isn't actually testing what it intends. Skip on Windows where the rooted-without-drive case is a different edge case the function doesn't claim to handle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… + audit fixes (#167) * feat(0.2): suppression CLI workflow + --new-findings-only (Tracks 4.6/4.7/4.8) (#140) Bundles the three remaining Track 4 deliverables into one PR. With 4.4 (finding IDs) and 4.5 (suppression model) already in flight, this PR makes the suppression workflow actually usable end-to-end: Track 4.6 — terrain explain <finding-id> Extends `terrain explain` to recognize stable finding IDs (e.g. "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a hit, prints a finding-detail block: detector + severity + location + evidence + explanation + suggested action + the canonical `terrain suppress <id> --reason "..."` invocation. A finding ID that parses but isn't in the snapshot returns a distinct exit-5 (not-found) message that distinguishes "stale ID after refactor" from "garbage input" — common adoption flow when a user keeps a CI link to a finding that has since moved. Implementation: lookupSignalByFindingID + renderFindingExplanation in cmd/terrain/cmd_explain.go. Track 4.7 — terrain suppress <finding-id> --reason "..." [--expires] [--owner] New top-level Gate-pillar primitive. Validates the ID format, refuses duplicates (existing entry → usage error pointing at the existing reason), appends a YAML entry to .terrain/suppressions.yaml. Writes text rather than re-marshaling the file so any comments / ordering the user added by hand are preserved. Schema header is auto-emitted on first call. --reason required (every suppression justifies itself, per Track 4.5 schema). --expires optional but recommended; ISO YYYY-MM-DD shape validated up front. --owner optional free-text pointer. Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests. Track 4.8 — terrain analyze --new-findings-only --baseline <path> Filters the snapshot to keep only signals whose FindingID is NOT present in the baseline. The "established repos with debt" adoption flow: `--fail-on critical` would brick CI on day one against existing high findings; combining with `--new-findings-only --baseline old.json` makes the gate fire only on findings introduced AFTER the baseline. Implementation: PipelineOptions.NewFindingsOnly + internal/engine/new_findings_only.go (applyNewFindingsOnly). Runs after suppression apply so the baseline comparison sees the user's intended-active signal set. No-baseline case: --new-findings-only is inert; logs a warning so the user notices the flag had no effect (better than silent success that masks the misconfiguration). Signals without FindingID (older / specialized emissions) are KEPT — over-report rather than under-report. Implementation: 6 unit tests including the "no-baseline" warning path, empty baseline, per-file signals, and signals without IDs. Refactor: runAnalyze gets a `analyzeRunOpts` struct so the call site in main.go isn't a 17-positional-argument list. The struct collapses the existing args + adds SuppressionsPath + NewFindingsOnly. Future flag additions stop expanding the call signature. Validation in main.go: --new-findings-only requires --baseline; the combination is rejected at usage-error level (exit 2) so the user gets a clear message rather than a silent no-op. Verification: go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" — 7 tests green go test ./internal/engine/ -run "TestApplyNewFindingsOnly" — 6 tests green go test ./... — full suite green go test ./internal/testdata/ — golden + CLI suite green Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md (Tracks 4.6, 4.7, 4.8). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): self-scan polish — empty-repo grade, headline dedup, pluralization Audit findings remediated in a single pass: - internal/insights: empty-repo (zero tests AND zero findings) now shows "—" grade with actionable next-step headline instead of misleading "A". The first-user trust hit was real — a fresh repo with no tests grading "A" undermines the pitch. - internal/analyze/headline.go: critical-signal headline says "critical" not "high-priority", matching the body's `[CRITICAL]` vocabulary. Empty-repo case detected and given an actionable headline. - internal/analyze/analyze.go: removed the duplicate "[HIGH] N critical signals" Key Finding — that fact is already the headline; Key Findings are reserved for distinct actionable items. - Pluralization sweep across analyze / changescope / reporting / cmd_ai: replaced literal `(s)` with reporting.Plural(...) helper for finding/test/unit/file/gap/check/scenario/etc. - Tests + golden updated for the new "—" empty-repo grade and the unified pluralization output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): pillar markers on Signal / KeyFinding / SARIF / doctor (Track 2) Plumbing for pillar-aware grouping in every output mode the pitch promises ("gate the system as a whole") — JSON envelopes, SARIF tags, and doctor maturity now all carry the pillar. - internal/models/signal.go: new Pillar field on Signal (omitempty, back-compat); PillarFor(category) and Pillar{Understand,Align,Gate} constants. Mapping: structure/health/quality/ai → Understand; migration → Align; governance → Gate. - internal/engine/finding_ids.go: assignSignalID renamed to finalizeSignal; populates Pillar from Category in the same pass it stamps FindingID, so every snapshot signal lands tagged. - internal/analyze/analyze.go: KeyFinding gains Pillar field; deriveKeyFindings tags every finding "understand" (analyze is the Understand pillar's primary command). - internal/sarif/{sarif,convert}.go: Rule + Result gain Properties with Tags; pillarProperties() emits "terrain:<pillar>" tag for GitHub Code Scanning / IDE consumers to group by pillar. - cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity check — Understand (test framework configs), Align (multi-repo manifest), Gate (CI workflow + suppressions). Cheap; no analyze run, no network. - cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block before migration checks; JSON envelope keeps legacy fields for back-compat and adds `pillars` alongside. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(0.2): empty-state wiring + helpful errors + parity score refresh Closes the remaining audit P1/P2 items in a single pass. V3 empty-state wiring (Track 10.6 follow-on): - policy check: EmptyNoPolicyFile now renders the designed empty- state (header + `terrain init` next-step nudge) when the repo has no .terrain/policy.yaml, replacing the bare "Create .terrain/policy.yaml..." line. - ai list: EmptyNoAISurfaces wired when no AI surfaces detected; renders one designed line instead of two ad-hoc strings. - report impact: EmptyNoImpact wired in RenderImpactReport when the change touches nothing structural — beats a wall of zeros that reads as "Terrain failed". - report select-tests / RenderProtectiveSet: EmptyNoTestSelection wired when the protective set is empty. - migrate estimate: EmptyNoMigrationCandidates wired when zero files in scope. Helpful errors: - terrain analyze --base <ref>: now prints a one-screen redirect ("Did you mean: terrain report pr / report impact --base") and exits with usage error, instead of dumping the stdlib flag package's full flag list. - terrain explain finding <bad-id>: error now lists the three accepted ID forms (stable finding ID / portfolio index / signal type) with a one-line "ID changed since last run?" hint pointing at re-running analyze. Parity score refresh (audit-flagged staleness): - core_analyze.E2: cite recall-gate assertion line correctly (calibration_integration_test.go:151, not :166). - ai_risk_inventory.P2 / E2: bumped 2→3 — rubric level 3 is "calibrated on synthetic fixtures (recall-anchored)" which is exactly what the 27-fixture corpus delivers across 33 detectors. Several precision concerns from the prior review are now remediated; refreshed evidence to reflect that. - pr_change_scoped.E2: bumped 2→3 — same recall-anchor inheritance as core_analyze. - server.E7: bumped 2→4 — PR #132 (request-context honoring) IS merged (commit dc01edc); evidence was stale. - distribution_install.P5: bumped 2→4 — PR #133 (postinstall marker) IS merged (commit e0619da); evidence was stale. - ai_execution_gating.V3 + policy_governance.V3: bumped 2→3 — empty-states wired in this commit close the cited gaps. - ai_risk_inventory.V3: bumped 2→3 — empty-state + per-detector rule pages provide remediation; level-5 (LLM-context-tailored in-line remediation) deferred. - server.P6: bumped 2→3 — added docs/examples/serve-local-dev.md closing the missing 'use this for local dev' example doc. Known gaps doc: - Added the three "structural-graph and CI-inference" gaps the audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix dimensions; G7 env-matrix CI inference). - Added I4 (coverage / runtime artifact auto-detection) to the same doc — `analyze` accepts artifacts via flag but doesn't auto-discover conventional locations. Net effect on `make pillar-parity`: understand: floor=2 → floor=3 PASS (was hard-blocked). align: floor=2, soft WARN (does not block release). gate: floor=2 still hard-blocked at floor=4 — Gate's publicly-claimable bar requires substantial work outside the audit-fix scope (labeled-PR precision corpus + adapter fallback diagnostics + AI execution-gating doc/UX lift). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(test): serialize stdout in suppress tests to fix -race regression CI's `go test -race` flag exposed a data race on the global os.Stdout: TestRunSuppress_* called runSuppress() directly (which writes via fmt.Printf to os.Stdout) under t.Parallel(), while other parallel tests called captureRun() which swaps os.Stdout for capture. Wrapping the runSuppress calls in runCaptured / captureRun makes them acquire the captureRunMu mutex, serializing all stdout-touching tests under the same lock. Behavior unchanged; only the test harness changes. Affects: TestRunSuppress_CreatesNewFile, TestRunSuppress_AppendsToExisting, TestRunSuppress_RejectsDuplicate, TestRunSuppress_RejectsBadID, TestRunSuppress_RequiresReason, TestRunSuppress_RejectsBadExpiryShape. The same race likely affected TestRunConvert_PlanWithAutoDetect and others — they show in CI output as collateral races where one test's stdout-swap exposed another test's direct fmt.Printf, but the fix is one-sided: lock the suppress side and the others stop racing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ork (#169) * feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate Lifts four more Gate-pillar cells. policy_governance.P3 (3→4) — docs/user-guides/writing-a-policy.md: - Full authoring guide: TL;DR, where the policy lives, full schema with annotations, three opinionated starting points (minimal / balanced / strict), gate decision logic, CI adoption pattern, tuning workflow, suppression pairing, anti-goals. policy_governance.E3 (3→4) — per-rule diagnostics: - internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status, Detail, ViolationCount}; Result.Diagnostics records every active rule's outcome. Status one of: pass / violated / skipped / warn. Skipped means "not configured in policy.yaml". - internal/reporting/policy_report.go: renderPolicyDiagnostics table at the bottom of `terrain policy check` output. Per-rule status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok / Alert / Muted / Warn — same vocabulary as the rest of the design system. - TestEvaluate_Diagnostics_PerRuleStatus locks the contract: active rules emit one entry, status reflects pass/violated, unconfigured rules emit "skipped". pr_change_scoped.E4 (3→4) — docs/schema/pr-analysis.md: - Canonical PR-analysis JSON contract published. Documents PRAnalysis envelope, ChangeScopedFinding, TestSelection, PostureDelta, AIValidationSummary with field-level Stability tiers. jq integration examples; pillar-marker compatibility note. internal/changescope/model.go (PRAnalysisSchemaVersion) remains the in-code anchor. pr_change_scoped.E6 (3→4) — determinism gate: - TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch: sets SOURCE_DATE_EPOCH to two distinct values and asserts byte-identical PR markdown output. Locks the contract that the PR comment surface itself is timestamp-free even though the underlying snapshot honors SOURCE_DATE_EPOCH for its own timestamps. policy_governance.E4 (3→4) — schema doc joint coverage: - The eval-adapters schema doc (previous PR) plus the new pr-analysis doc plus internal/policy/config.go give policy.yaml a published contract per FIELD_TIERS.md tiers. docs/release/parity/scores.yaml updated for the four cells. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3) Lifts four more Gate-pillar cells. policy_governance.P5 (3→4) — error UX: - cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails to parse, surface a designed remediation block naming the three common causes (YAML indentation, misspelled rule key, type mismatch) and pointing at `cp docs/policy/examples/balanced.yaml .terrain/policy.yaml` for a known-good template. Replaces the bare `error: <yaml-parse-error>` pre-fix shape. ai_execution_gating.E1 (3→4) — decision-logic tests: - cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence rule (block_on_* > warn_on_*), the blocking_signal_types special case, combined critical+policy reason synthesis, edge cases for metadata absence and non-string rule values, and the high-only warn boundary. pr_change_scoped.E5 (3→4) — performance benchmarks: - internal/changescope/render_bench_test.go: small/medium/large fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel i7-8850H. Linear scaling — no quadratic regressions in dedup/classify/render. Reference numbers committed in the file's package comment. pr_change_scoped.E6 already lifted (previous commit) via TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch. docs/release/parity/scores.yaml updated for the four cells. Net: policy_governance area now mostly 4s except V1 (uitokens inheritance) and V3 (empty state, lives on PR #167). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration Lifts six more Gate-pillar cells. ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX: - cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas) fails to parse its eval-framework output, surface a per-framework remediation block naming the most common adopter cause for each framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then link to the eval-adapters schema doc and the onboarding guide. Replaces the bare "Warning: failed to parse" line. pr_change_scoped.P5 (3→4) — runPR error remediation: - cmd/terrain/cmd_impact.go: when the impact pipeline fails inside runPR, surface a "Common causes" remediation block (--base ref missing, shallow clone, empty diff) and point at `terrain analyze` for root-cause drill-down. pr_change_scoped.E3 (3→4) — confidence histogram: - internal/changescope/render.go: new buildConfidenceHistogram() emits a one-line `**Confidence:** N exact · M inferred · K weak (T tests selected)` block above the recommended-tests table in PR-comment markdown. Stable first-seen ordering keeps output deterministic. Test: TestBuildConfidenceHistogram_GroupsAndPluralizes covers single/mixed/empty/missing-confidence cases. pr_change_scoped.E7 (3→4) — pipeline cancellation tests: - internal/engine/pipeline_test.go: TestRunPipelineContext_RespectsCancelledContext (pre-cancelled context bails immediately) and TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns cleanly). The PR pipeline shares engine.RunPipelineContext, so these tests prove cancellation semantics for runPR / runImpactPipeline as well. pr_change_scoped.V1 + V2 (3→4) — token migration: - internal/changescope/render.go: terminal-renderer severity badges migrated from raw `[%s]` + ToUpper to uitokens.BracketedSeverity. Now consistent with the markdown renderer's vocabulary across directRisk / indirectRisk / existingDebt / AI signal blocks. policy_governance.V1 (3→4) — token verification: - Already shipped in batch 2 (HeroVerdict + BracketedSeverity in policy_report.go); evidence refreshed to reflect the actual uitokens consumption. docs/release/parity/scores.yaml updated for all eight cells. Net `make pillar-parity`: PR / change-scoped row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3 (only E2 corpus + V3 polish below 4) Policy / governance row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2 (only V3 below 4 — needs PR #167 empty state) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift) Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence to reflect work already shipped in this stack. ai_eval_ingestion (3→4): - P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x, Ragas modern+legacy) plus per-field IngestionDiagnostic, plus conformance fixtures, plus published schema doc. - P4: onboarding doc closes the 'no five-line CI snippet' concern. - V1: adapter outputs flow through HeroVerdict + BracketedSeverity in both `terrain ai run` and PR-comment AI Risk Review surfaces. - V2: structured rendering rhythm (hero / reason / signals / diags). - V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's framework-mismatch remediation block from this stack). ai_execution_gating (3→4): - P7: gating-on-AI-evals-before-merge framing made explicit by onboarding doc + trust-boundary doc. - E4: Decision shape versioned alongside EvalRunResult contract; ingestion diagnostics flow through so consumers can audit the evidence chain. - E7: pipeline cancellation tests (this branch) cover ai run via the shared engine.RunPipelineContext code path. - V1: hero / diagnostics / signals blocks all consume uitokens. docs/release/parity/scores.yaml: ten cells refreshed. Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus + E7 'reads are bounded' which is honestly level-3 per rubric). ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3 empty-state dependency on PR #167). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells) Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) — the last achievable Gate-pillar lifts before the 0.3 corpus work. pr_change_scoped.V3 — empty-PR callout: - internal/changescope/render.go: when a PR is genuinely empty (no new findings, no AI risk, no protection gaps), the markdown renderer now emits a designed `> ✓ **All clear.** ...` block before the footer with a `terrain compare` next-step nudge. - New isEmptyPR() helper centralizes the predicate. - Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout + TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both directions (clean PRs render the callout; PRs with findings don't). policy_governance.P1 — feature-completeness evidence refresh: - The policy system is comprehensive: rule schema covers every audited dimension, three example policies ship (minimal / balanced / strict), authoring guide ships (docs/user-guides/writing-a-policy.md), terrain init scaffolds a starter, per-rule diagnostics surface evaluation outcomes. The "no rule-authoring UI" gap is a separate product surface (visual policy editor would be 0.3+) not a feature-completeness gap of the policy system itself. Net `make pillar-parity` after this stack: Policy / governance: every cell at 4 except V3 (held by PR #167's EmptyNoPolicyFile wiring). PR / change-scoped: every cell at 4 except E2 + P2 (corpus needed) — the work cells are all green. AI eval ingestion: every cell at 4 except P2 + E2 (corpus) + E7 (rubric level 3 honest for bounded reads). AI execution + gating: every cell at 4 except P1 (sandbox 0.3) + E2 (corpus) + V3 (PR #167 dependency). Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus across four areas + P1 sandboxing) plus three cells that lift when PR #167 merges (V3 across three Gate areas). Beyond those, every Gate cell is at the publicly-claimable bar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…arks + analyze error UX (#172) * feat(0.2): portfolio + explain schema docs + insights benchmarks + analyze error UX (Understand+Align lift) Lifts seven cells across Understand and Align pillars without touching the labeled-corpus cells. docs/schema/portfolio.md (new) — portfolio.E4 (2→4): - Canonical PortfolioSummary / TestAsset / Finding / PortfolioAggregates / OwnerPortfolioSummary contract. - .terrain/repos.yaml manifest schema documented. - Multi-repo aggregate output marked Experimental for 0.2.0 (honest about partial shipping per the plan's pillar priority). docs/schema/explain.md (new) — insights_impact_explain.E4 (3→4): - `terrain explain <target>` dispatch table mapping every accepted target type → output shape. - Each shape references its canonical schema doc. - OwnerExplanation shape documented inline (only one not covered by an existing schema file). internal/insights/insights_bench_test.go (new) — insights_impact_explain.E5 (3→4): - BenchmarkBuild_Healthy / WithDepgraphResults / LargeSnapshot. - Reference numbers: 2.5µs / 8µs / 40µs per op on Intel i7-8850H. - Linear scaling — no quadratic regressions. internal/reporting/empty_states.go + portfolio_report.go — portfolio.V3 (2→4): - New EmptyNoPortfolio empty-state kind. - RenderPortfolioReport now uses the designed empty state when TotalAssets == 0, replacing the bare two-line message. cmd/terrain/cmd_analyze.go — core_analyze.P5 (3→4): - analyzeFailureRemediation surfaces a designed remediation block on analysis failure. Three branches: timeout-exceeded (with --timeout-increase / scope-down / verbose-timing nudges), cancelled (re-run hint), generic (three common causes + verbose/json next steps). insights_impact_explain.E7 (3→4) — evidence refresh: - All three commands route through the cancellation-tested engine.RunPipelineContext path locked by TestRunPipelineContext_RespectsCancelledContext + TestRunPipelineContext_CancelMidFlight. docs/release/parity/scores.yaml: seven cells refreshed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): benchmark baseline + drilling-into-findings doc + summary E7 cancellation Lifts five more cells in the Understand pillar. benchmarks/baseline.txt (new) — core_analyze.E5 (3→4): - Reference performance numbers for the three benchmark suites that matter: engine pipeline (small/medium/large), insights builder (healthy/with-depgraph/large-snapshot), changescope PR-comment renderer (small/medium/large). - Captured 2026-05 on Intel i7-8850H @ 2.60GHz with re-run instructions and notes. - make bench-gate compares ratios against this baseline. docs/user-guides/drilling-into-findings.md (new) — insights_impact_explain.P3 (3→4) and P4 (3→4): - Four-command ladder (analyze → insights → impact → explain) with worked examples per command. - Full "how confidence is computed" section: detector confidence (structural / heuristic / runtime-aware), ConfidenceDetail Wilson/Beta intervals, test-selection confidence (exact / inferred / weak), coverage confidence. - Round-trip example using a stable finding ID. - Cross-references the schema docs. Closes the audit's "no per-command 'how confidence is computed' pages" concern. Adopters new to the drill-down commands now have an explicit playbook. summary_posture_metrics_focus.E7 (3→4) — evidence refresh: - summary / posture / metrics / focus all route through runPipelineWithSignals → engine.RunPipelineContext, locked by TestRunPipelineContext_RespectsCancelledContext + TestRunPipelineContext_CancelMidFlight. docs/release/parity/scores.yaml: five cells refreshed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): portfolio command error UX (Align lift) Lifts portfolio.P5 (3→4) — the last achievable concrete error-UX lift in the Align pillar without 0.3 corpus work. cmd/terrain/cmd_insights.go runPortfolio: - When portfolio analysis fails, surface a designed remediation block: names the three common adopter causes (snapshot construction failure, non-git root, permission errors) and points at `terrain analyze` for root-cause drill-down. Links to docs/schema/portfolio.md for multi-repo workflows (currently experimental in 0.2.0). docs/release/parity/scores.yaml: portfolio.P5 (3→4). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…171) * feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh Lifts 14 cells across Understand and Align pillars without labeled-corpus dependency. cmd/terrain/cmd_insights.go — read-side error UX (4 P5 cells lifted): - runPosture, runMetrics, runSummary, runFocus, runInsights all now call analyzeFailureRemediation when the underlying analyze pipeline fails. Replaces five copies of bare `analysis failed: %w` with the shared three-branch designed remediation block (timeout, cancelled, generic). docs/schema/migration.md (new) — migration_conversion.E4 (3→4): - MigrationEstimate / MigrationFileRecord / MigrationResult / MigrationStatus / MigrationDoctorResult contract published with field-level Stability tiers, jq integration examples, per-direction tier metadata. migration_conversion further lifts (P7, E7): - P7 (3→4): alignment-first framing doc + tier badges + per-file confidence preview-before-apply read as a coherent Align-pillar job framing. - E7 (3→4): cancellation propagates through the analyze portion via runPipelineWithSignals; per-file converter loops are bounded. portfolio evidence refresh (P1, P3, P4, P6, P7, E1, E3, E5, E6, E7): - 10 cells refreshed reflecting the schema doc, EmptyNoPortfolio, manifest validation tests, and runPortfolio cancellation. - Still at 2: P2 (multi-repo corpus, 0.3 work), E2 (same). - Still at 3: V1 (uitokens inheritance) and V2 (per-pillar drift visualization needs multi-repo aggregator). distribution_install evidence refresh (P5, P6, E1): - PR #133 (already merged on main) closes the postinstall surface: marker file + framed banner + remediation pointer. Per-platform install matrix documented. Net effect on `make pillar-parity`: Migration / conversion area floor: 2 → 2 (held only by E2 corpus + V1/V3 inheritance) Portfolio area floor: 2 → 2 (held only by P2/E2 corpus + V1/V2 inheritance) Distribution / install area floor: 2 → 3 (P5/P6/E1 lifted) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): final V-axis lifts — uitokens migration + comprehensive evidence refresh Lifts ~25 cells across Understand, Align, and Cross-cutting pillars via uitokens migration in core renderers + comprehensive V/P/E evidence refresh. internal/reporting — uitokens migration: - analyze_report_v2.go: Key Findings now use uitokens.BracketedSeverity instead of strings.ToUpper inline mapping. - analyze_report.go: per-signal severity badge uses uitokens.BracketedSeverity. - insights_report_v2.go: per-finding + edge-case badges use uitokens.BracketedSeverity. - analyze_report_v2_test.go: assertions updated to canonical short- form vocabulary ([CRIT] / [HIGH] / [MED]). - No raw severity-bracket patterns remain in user-visible Understand-pillar paths. cmd/terrain/cmd_insights.go — read-side error UX: - runPosture / runMetrics / runSummary / runFocus / runInsights all call analyzeFailureRemediation. Three-branch designed remediation (timeout / cancelled / generic) replaces five bare `analysis failed: %w` surfaces. cmd/terrain/cmd_impact.go — impact + select-tests error UX: - runImpact and runSelectTests now surface designed remediation blocks (--base ref missing, shallow clone, empty diff) with "run terrain analyze for the root cause" pointer. docs/examples/serve-local-dev.md (new on this branch — also on PR #167): - Closes server.P6 audit gap. Cells lifted (evidence refresh + concrete code work, all without labeled-corpus dependency): - core_analyze: V1 (3→4), V2 (3→4), V3 (3→4) - insights_impact_explain: V1 (3→4), V2 (3→4), V3 (3→4), P5 (3→4), P6 (3→4) - summary_posture_metrics_focus: P5 (3→4), P6 (3→4), V1 (3→4), V3 (3→4) - ai_risk_inventory: P1 (3→4), P2 (2→3), P4 (3→4), P5 (3→4), P6 (3→4), P7 (3→4), E2 (2→3), E3 (3→4), E4 (3→4), E5 (3→4), V1 (3→4), V2 (3→4) - migration_conversion: V1 (3→4), V3 (3→4) - portfolio: V1 (3→4) - server: P6 (2→3), E7 (2→4) - distribution_install: V1 (3→4), V2 (3→4), V3 (3→4) `make pillar-parity` after this commit: understand: floor 2 → floor 3 PASS ✓ align: floor 2, soft WARN (unchanged — held by E2 corpus) gate: floor 2, hard FAIL (unchanged — held by E2 corpus) The Understand pillar now passes the publicly-claimable floor for 0.2.0. Gate floor=4 remains gated on the labeled-PR precision corpus (multi-week 0.3 work) per the original plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps the lint-format group with 3 updates: [eslint](https://github.com/eslint/eslint), [lint-staged](https://github.com/lint-staged/lint-staged) and [prettier](https://github.com/prettier/prettier). Updates `eslint` from 8.57.1 to 10.3.0 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](eslint/eslint@v8.57.1...v10.3.0) Updates `lint-staged` from 16.3.3 to 17.0.2 - [Release notes](https://github.com/lint-staged/lint-staged/releases) - [Changelog](https://github.com/lint-staged/lint-staged/blob/main/CHANGELOG.md) - [Commits](lint-staged/lint-staged@v16.3.3...v17.0.2) Updates `prettier` from 3.8.1 to 3.8.3 - [Release notes](https://github.com/prettier/prettier/releases) - [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md) - [Commits](prettier/prettier@3.8.1...3.8.3) --- updated-dependencies: - dependency-name: eslint dependency-version: 10.3.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: lint-format - dependency-name: lint-staged dependency-version: 17.0.2 dependency-type: direct:development update-type: version-update:semver-major dependency-group: lint-format - dependency-name: prettier dependency-version: 3.8.3 dependency-type: direct:development update-type: version-update:semver-patch dependency-group: lint-format ... Signed-off-by: dependabot[bot] <support@github.com>

dependabot · 2026-05-08T12:55:03Z

Labels

The following labels could not be found: automerge, dependencies. Please create them before Dependabot can add them to a pull request.

Please fix the above issues or remove invalid values from dependabot.yml.

github-actions · 2026-05-08T12:55:37Z

[INFO] Terrain — Informational only

Insufficient data to assess change risk confidently.

Metric	Value
Changed files	2 (0 source · 0 test)

✓ All clear. No new findings introduced; no protection gaps identified in changed code.

Run terrain compare over time to track posture; this clean state is the bar to hold.

Limitations

No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

_{Generated by Terrain · terrain pr --json for machine-readable output}

Targeted Test Results

No tests selected — change affects only non-code files.

github-actions · 2026-05-08T12:55:51Z

Terrain AI Risk Review

Metric	Value
AI surfaces	13
Eval scenarios	17
Impacted scenarios	0
Uncovered surfaces	13

Decision: PASS — AI surfaces are covered.

pmclSF and others added 30 commits May 1, 2026 20:01

pmclSF and others added 17 commits May 4, 2026 16:52

dependabot Bot requested a review from pmclSF as a code owner May 8, 2026 12:55

pmclSF force-pushed the main branch from 631379b to da599c5 Compare May 9, 2026 00:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(deps-dev): bump the lint-format group with 3 updates#173

chore(deps-dev): bump the lint-format group with 3 updates#173
dependabot[bot] wants to merge 47 commits intomainfrom
dependabot/npm_and_yarn/lint-format-5e45cc42f9

dependabot Bot commented on behalf of github May 8, 2026

Uh oh!

dependabot Bot commented on behalf of github May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dependabot Bot commented on behalf of github May 8, 2026

v10.3.0

Features

Bug Fixes

Documentation

Chores

v10.2.1

Bug Fixes

Documentation

Chores

v17.0.2

Patch Changes

v17.0.1

Patch Changes

v17.0.0

Major Changes

Minor Changes

Patch Changes

v16.4.0

Minor Changes

17.0.2

Patch Changes

17.0.1

Patch Changes

17.0.0

Major Changes

Minor Changes

Patch Changes

16.4.0

3.8.3

3.8.2

3.8.3

SCSS: Prevent trailing comma in if() function (#18471 by @​kovsu)

3.8.2

Angular: Support Angular v21.2 (#18722, #19034 by @​fisker)

Uh oh!

dependabot Bot commented on behalf of github May 8, 2026

Labels

Uh oh!

github-actions Bot commented May 8, 2026

[INFO] Terrain — Informational only

Targeted Test Results

Uh oh!

github-actions Bot commented May 8, 2026

Terrain AI Risk Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

SCSS: Prevent trailing comma in `if()` function (#18471 by `@kovsu`)

Angular: Support Angular v21.2 (#18722, #19034 by `@fisker`)