fix(0.2): PR #140 recovery + Track 2 pillar markers + V3 empty-states + audit fixes #167
Merged
Terrain AI Risk Review
Decision: PASS — AI surfaces are covered.
[RISK] Terrain — Merge with caution
Coverage gaps in changed code
25 pre-existing issues on changed files
Recommended tests
77 test(s) with exact coverage of 69 impacted unit(s). 29 impacted unit(s) have no covering tests in the selected set.
Owners: PMCLSF
Limitations
Generated by Terrain · Targeted Test Results
Terrain selected 77 test(s) instead of the full suite.
This was referenced May 5, 2026
pmclSF added a commit that referenced this pull request on May 7, 2026
…ift batch 3)
Lifts four more Gate-pillar cells.
policy_governance.P5 (3→4) — error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails
to parse, surface a designed remediation block naming the three
common causes (YAML indentation, misspelled rule key, type
mismatch) and pointing at
`cp docs/policy/examples/balanced.yaml .terrain/policy.yaml` for a
known-good template. Replaces the bare `error: <yaml-parse-error>`
pre-fix shape.
ai_execution_gating.E1 (3→4) — decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence
rule (block_on_* > warn_on_*), the blocking_signal_types special
case, combined critical+policy reason synthesis, edge cases for
metadata absence and non-string rule values, and the high-only
warn boundary.
pr_change_scoped.E5 (3→4) — performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large
fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel
i7-8850H. Linear scaling — no quadratic regressions in
dedup/classify/render. Reference numbers committed in the file's
package comment.
pr_change_scoped.E6 already lifted (previous commit) via
TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.
docs/release/parity/scores.yaml updated for the four cells.
Net: policy_governance area now mostly 4s except V1 (uitokens
inheritance) and V3 (empty state, lives on PR #167).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
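The precedence rule covered by the new decision-logic tests (block_on_* > warn_on_*) can be illustrated with a minimal sketch. The `Policy` and `Decision` types and the `decide` function below are hypothetical stand-ins, not the code in cmd/terrain/cmd_ai.go; the sketch only assumes that a blocking match always outranks a warning match.

```go
package main

import "fmt"

// Decision is a hypothetical gate outcome; the real type may differ.
type Decision string

const (
	Pass  Decision = "PASS"
	Warn  Decision = "WARN"
	Block Decision = "BLOCK"
)

// Policy is an assumed shape: sets of severities that block or warn.
type Policy struct {
	BlockOn map[string]bool
	WarnOn  map[string]bool
}

// decide returns Block as soon as any severity matches a block rule,
// Warn if only warn rules matched, and Pass otherwise.
func decide(p Policy, severities []string) Decision {
	d := Pass
	for _, s := range severities {
		if p.BlockOn[s] {
			return Block // a block rule outranks any warn rule
		}
		if p.WarnOn[s] {
			d = Warn
		}
	}
	return d
}

func main() {
	p := Policy{
		BlockOn: map[string]bool{"critical": true},
		WarnOn:  map[string]bool{"critical": true, "high": true},
	}
	fmt.Println(decide(p, []string{"high", "critical"}))
	fmt.Println(decide(p, []string{"high"}))
	fmt.Println(decide(p, []string{"low"}))
}
```

Listing "critical" under both maps mirrors the precedence case: the severity matches both rule families and the block result is the one returned.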
pmclSF added a commit that referenced this pull request on May 7, 2026
…ipeline cancel tests + token migration
Lifts six more Gate-pillar cells.
ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas)
fails to parse its eval-framework output, surface a per-framework
remediation block naming the most common adopter cause for each
framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then
link to the eval-adapters schema doc and the onboarding guide.
Replaces the bare "Warning: failed to parse" line.
pr_change_scoped.P5 (3→4) — runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside
runPR, surface a "Common causes" remediation block (--base ref
missing, shallow clone, empty diff) and point at `terrain
analyze` for root-cause drill-down.
pr_change_scoped.E3 (3→4) — confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram()
emits a one-line `**Confidence:** N exact · M inferred · K weak
(T tests selected)` block above the recommended-tests table in
PR-comment markdown. Stable first-seen ordering keeps output
deterministic. Test:
TestBuildConfidenceHistogram_GroupsAndPluralizes covers
single/mixed/empty/missing-confidence cases.
pr_change_scoped.E7 (3→4) — pipeline cancellation tests:
- internal/engine/pipeline_test.go:
TestRunPipelineContext_RespectsCancelledContext (pre-cancelled
context bails immediately) and
TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns
cleanly). The PR pipeline shares engine.RunPipelineContext, so
these tests prove cancellation semantics for runPR /
runImpactPipeline as well.
pr_change_scoped.V1 + V2 (3→4) — token migration:
- internal/changescope/render.go: terminal-renderer severity
badges migrated from raw `[%s]` + ToUpper to
uitokens.BracketedSeverity. Now consistent with the markdown
renderer's vocabulary across directRisk / indirectRisk /
existingDebt / AI signal blocks.
policy_governance.V1 (3→4) — token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in
policy_report.go); evidence refreshed to reflect the actual
uitokens consumption.
docs/release/parity/scores.yaml updated for all eight cells.
Net `make pillar-parity`:
PR / change-scoped row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3
(only E2 corpus + V3 polish below 4)
Policy / governance row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2
(only V3 below 4 — needs PR #167 empty state)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
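As a rough illustration of the confidence histogram described in this commit, the sketch below groups labels in first-seen order and renders the one-line block. It is not the code in internal/changescope/render.go: the function name `buildHistogram`, the "unknown" bucket for a missing confidence, and the empty-string result for no tests are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHistogram renders a one-line confidence summary. Labels are
// emitted in the order they were first seen, so the same input always
// produces the same output without sorting or map iteration.
func buildHistogram(confidences []string) string {
	if len(confidences) == 0 {
		return "" // assumption: nothing is rendered when no tests were selected
	}
	counts := map[string]int{}
	var order []string
	for _, c := range confidences {
		if c == "" {
			c = "unknown" // assumption: bucket name for a missing confidence
		}
		if _, seen := counts[c]; !seen {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	noun := "tests"
	if len(confidences) == 1 {
		noun = "test"
	}
	return fmt.Sprintf("**Confidence:** %s (%d %s selected)",
		strings.Join(parts, " · "), len(confidences), noun)
}

func main() {
	fmt.Println(buildHistogram([]string{"exact", "inferred", "exact", "weak"}))
}
```

Tracking a separate `order` slice is what keeps the output deterministic; ranging over the map directly would not, because Go randomizes map iteration order.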
pmclSF added a commit that referenced this pull request on May 7, 2026
…e pillar lift)
Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence
to reflect work already shipped in this stack.
ai_eval_ingestion (3→4):
- P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x,
Ragas modern+legacy) plus per-field IngestionDiagnostic, plus
conformance fixtures, plus published schema doc.
- P4: onboarding doc closes the 'no five-line CI snippet' concern.
- V1: adapter outputs flow through HeroVerdict + BracketedSeverity
in both `terrain ai run` and PR-comment AI Risk Review surfaces.
- V2: structured rendering rhythm (hero / reason / signals / diags).
- V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's
framework-mismatch remediation block from this stack).
ai_execution_gating (3→4):
- P7: gating-on-AI-evals-before-merge framing made explicit by
onboarding doc + trust-boundary doc.
- E4: Decision shape versioned alongside EvalRunResult contract;
ingestion diagnostics flow through so consumers can audit the
evidence chain.
- E7: pipeline cancellation tests (this branch) cover ai run via the
shared engine.RunPipelineContext code path.
- V1: hero / diagnostics / signals blocks all consume uitokens.
docs/release/parity/scores.yaml: ten cells refreshed.
Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus +
E7 'reads are bounded' which is honestly level-3 per rubric).
ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3
empty-state dependency on PR #167).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request on May 7, 2026
…e cells)
Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) — the
last achievable Gate-pillar lifts before the 0.3 corpus work.
pr_change_scoped.V3 — empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no
new findings, no AI risk, no protection gaps), the markdown
renderer now emits a designed `> ✓ **All clear.** ...` block
before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout +
TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both
directions (clean PRs render the callout; PRs with findings
don't).
policy_governance.P1 — feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every
audited dimension, three example policies ship (minimal /
balanced / strict), authoring guide ships
(docs/user-guides/writing-a-policy.md), terrain init scaffolds a
starter, per-rule diagnostics surface evaluation outcomes. The
"no rule-authoring UI" gap is a separate product surface (visual
policy editor would be 0.3+) not a feature-completeness gap of
the policy system itself.
Net `make pillar-parity` after this stack:
Policy / governance: every cell at 4 except V3 (held by PR #167's
EmptyNoPolicyFile wiring).
PR / change-scoped: every cell at 4 except E2 + P2 (corpus needed)
— the work cells are all green.
AI eval ingestion: every cell at 4 except P2 + E2 (corpus) + E7
(rubric level 3 honest for bounded reads).
AI execution + gating: every cell at 4 except P1 (sandbox 0.3) + E2
(corpus) + V3 (PR #167 dependency).
Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus
across four areas + P1 sandboxing) plus three cells that lift when
PR #167 merges (V3 across three Gate areas). Beyond those, every
Gate cell is at the publicly-claimable bar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
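A small sketch of what "centralizes the predicate" could look like. The `PRSummary` struct and its fields are hypothetical; only the three conditions (no new findings, no AI risk, no protection gaps) and the `> ✓ **All clear.**` prefix come from the commit message, and the rest of the callout text is elided there, so it is left out here as well.

```go
package main

import "fmt"

// PRSummary is an assumed, pared-down view of a PR analysis result.
type PRSummary struct {
	NewFindings    int
	AIRiskSignals  int
	ProtectionGaps int
}

// isEmptyPR reports whether the PR has nothing to flag. Keeping the
// check in one helper means every renderer agrees on what "empty" is.
func isEmptyPR(s PRSummary) bool {
	return s.NewFindings == 0 && s.AIRiskSignals == 0 && s.ProtectionGaps == 0
}

// callout returns the all-clear prefix for empty PRs and nothing otherwise.
func callout(s PRSummary) string {
	if !isEmptyPR(s) {
		return ""
	}
	return "> ✓ **All clear.**"
}

func main() {
	fmt.Println(callout(PRSummary{}))
	fmt.Println(callout(PRSummary{NewFindings: 1}) == "")
}
```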
…/4.7/4.8) (#140)
Bundles the three remaining Track 4 deliverables into one PR. With
4.4 (finding IDs) and 4.5 (suppression model) already in flight,
this PR makes the suppression workflow actually usable end-to-end:
Track 4.6 — terrain explain <finding-id>
Extends `terrain explain` to recognize stable finding IDs (e.g.
"weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a
hit, prints a finding-detail block: detector + severity + location +
evidence + explanation + suggested action + the canonical
`terrain suppress <id> --reason "..."` invocation.
A finding ID that parses but isn't in the snapshot returns a
distinct exit-5 (not-found) message that distinguishes "stale ID
after refactor" from "garbage input" — common adoption flow when a
user keeps a CI link to a finding that has since moved.
Implementation: lookupSignalByFindingID + renderFindingExplanation
in cmd/terrain/cmd_explain.go.
Track 4.7 — terrain suppress <finding-id> --reason "..." [--expires] [--owner]
New top-level Gate-pillar primitive. Validates the ID format,
refuses duplicates (existing entry → usage error pointing at the
existing reason), appends a YAML entry to
.terrain/suppressions.yaml. Writes text rather than re-marshaling
the file so any comments / ordering the user added by hand are
preserved. Schema header is auto-emitted on first call.
- --reason required (every suppression justifies itself, per Track
4.5 schema).
- --expires optional but recommended; ISO YYYY-MM-DD shape validated
up front.
- --owner optional free-text pointer.
Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests.
Track 4.8 — terrain analyze --new-findings-only --baseline <path>
Filters the snapshot to keep only signals whose FindingID is NOT
present in the baseline.
The "established repos with debt" adoption flow: `--fail-on
critical` would brick CI on day one against existing high findings;
combining with `--new-findings-only --baseline old.json` makes the
gate fire only on findings introduced AFTER the baseline.
Implementation: PipelineOptions.NewFindingsOnly +
internal/engine/new_findings_only.go (applyNewFindingsOnly). Runs
after suppression apply so the baseline comparison sees the user's
intended-active signal set.
No-baseline case: --new-findings-only is inert; logs a warning so
the user notices the flag had no effect (better than silent success
that masks the misconfiguration).
Signals without FindingID (older / specialized emissions) are KEPT —
over-report rather than under-report.
Tests: 6 unit tests including the "no-baseline" warning path, empty
baseline, per-file signals, and signals without IDs.
Refactor: runAnalyze gets an `analyzeRunOpts` struct so the call
site in main.go isn't a 17-positional-argument list. The struct
collapses the existing args + adds SuppressionsPath +
NewFindingsOnly. Future flag additions stop expanding the call
signature.
Validation in main.go: --new-findings-only requires --baseline; the
combination is rejected at usage-error level (exit 2) so the user
gets a clear message rather than a silent no-op.
Verification:
go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" — 7 tests green
go test ./internal/engine/ -run "TestApplyNewFindingsOnly" — 6 tests green
go test ./... — full suite green
go test ./internal/testdata/ — golden + CLI suite green
Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md
(Tracks 4.6, 4.7, 4.8).
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
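The Track 4.8 filter rules (drop signals already in the baseline, keep signals without an ID, stay inert without a baseline) can be sketched as follows. This is a hedged illustration, not internal/engine/new_findings_only.go: the `Signal` struct is pared down, `filterNewFindings` is a hypothetical name, a nil baseline slice is used to model "no baseline loaded", and the warning text is a placeholder.

```go
package main

import "fmt"

// Signal is a pared-down, hypothetical stand-in for a snapshot signal.
type Signal struct {
	FindingID string
	Type      string
}

// filterNewFindings keeps only signals whose FindingID is absent from
// the baseline. A nil baseline models the no-baseline case: the input
// is returned unchanged together with a warning for the caller to log.
func filterNewFindings(current, baseline []Signal) ([]Signal, string) {
	if baseline == nil {
		return current, "warning: --new-findings-only had no effect (no baseline)"
	}
	known := make(map[string]struct{}, len(baseline))
	for _, s := range baseline {
		if s.FindingID != "" {
			known[s.FindingID] = struct{}{}
		}
	}
	kept := make([]Signal, 0, len(current))
	for _, s := range current {
		if s.FindingID == "" {
			kept = append(kept, s) // no ID: keep, over-report rather than under-report
			continue
		}
		if _, old := known[s.FindingID]; !old {
			kept = append(kept, s)
		}
	}
	return kept, ""
}

func main() {
	baseline := []Signal{{FindingID: "a", Type: "weakAssertion"}}
	current := []Signal{
		{FindingID: "a", Type: "weakAssertion"},
		{FindingID: "b", Type: "weakAssertion"},
		{Type: "legacy"},
	}
	kept, warn := filterNewFindings(current, baseline)
	fmt.Println(len(kept), warn == "")
}
```

An empty but non-nil baseline keeps every current signal, which matches the "empty baseline" case the unit tests are said to cover.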
…ization
Audit findings remediated in a single pass:
- internal/insights: empty-repo (zero tests AND zero findings) now
shows "—" grade with actionable next-step headline instead of
misleading "A". The first-user trust hit was real — a fresh repo
with no tests grading "A" undermines the pitch.
- internal/analyze/headline.go: critical-signal headline says
"critical" not "high-priority", matching the body's `[CRITICAL]`
vocabulary. Empty-repo case detected and given an actionable
headline.
- internal/analyze/analyze.go: removed the duplicate "[HIGH] N
critical signals" Key Finding — that fact is already the headline;
Key Findings are reserved for distinct actionable items.
- Pluralization sweep across analyze / changescope / reporting /
cmd_ai: replaced literal `(s)` with reporting.Plural(...) helper
for finding/test/unit/file/gap/check/scenario/etc.
- Tests + golden updated for the new "—" empty-repo grade and the
unified pluralization output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
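For readers who want a feel for the two behaviors in this commit, here is a minimal sketch. The signature of reporting.Plural is not shown in the commit message, so `plural` below is a hypothetical helper that only handles regular "-s" plurals, and `displayGrade` is an assumed wrapper that substitutes the "—" grade for an empty repo.

```go
package main

import "fmt"

// plural formats a count with a regular noun, replacing literal "(s)".
// Irregular plurals are out of scope for this sketch.
func plural(n int, singular string) string {
	if n == 1 {
		return fmt.Sprintf("%d %s", n, singular)
	}
	return fmt.Sprintf("%d %ss", n, singular)
}

// displayGrade returns "—" for a repo with zero tests and zero
// findings, and otherwise passes the computed grade through.
func displayGrade(tests, findings int, computed string) string {
	if tests == 0 && findings == 0 {
		return "—"
	}
	return computed
}

func main() {
	fmt.Println(plural(1, "test"), "/", plural(3, "finding"))
	fmt.Println(displayGrade(0, 0, "A"))
}
```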
…ack 2)
Plumbing for pillar-aware grouping in every output mode the pitch
promises ("gate the system as a whole") — JSON envelopes, SARIF
tags, and doctor maturity now all carry the pillar.
- internal/models/signal.go: new Pillar field on Signal (omitempty,
back-compat); PillarFor(category) and Pillar{Understand,Align,Gate}
constants. Mapping: structure/health/quality/ai → Understand;
migration → Align; governance → Gate.
- internal/engine/finding_ids.go: assignSignalID renamed to
finalizeSignal; populates Pillar from Category in the same pass
it stamps FindingID, so every snapshot signal lands tagged.
- internal/analyze/analyze.go: KeyFinding gains Pillar field;
deriveKeyFindings tags every finding "understand" (analyze is the
Understand pillar's primary command).
- internal/sarif/{sarif,convert}.go: Rule + Result gain Properties
with Tags; pillarProperties() emits "terrain:<pillar>" tag for
GitHub Code Scanning / IDE consumers to group by pillar.
- cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity
check — Understand (test framework configs), Align (multi-repo
manifest), Gate (CI workflow + suppressions). Cheap; no analyze
run, no network.
- cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block
before migration checks; JSON envelope keeps legacy fields for
back-compat and adds `pillars` alongside.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
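The category-to-pillar mapping and the SARIF tag format in this commit can be sketched like this. The mapping and the "terrain:<pillar>" format come from the commit message; the string values "align" and "gate", the empty result for an unknown category, and the helper name `pillarTag` are assumptions rather than the contents of internal/models/signal.go.

```go
package main

import "fmt"

// Pillar names the product pillar a signal belongs to.
type Pillar string

const (
	PillarUnderstand Pillar = "understand"
	PillarAlign      Pillar = "align" // assumed string value
	PillarGate       Pillar = "gate"  // assumed string value
)

// PillarFor maps a signal category to its pillar. Unknown categories
// map to the empty pillar, which an omitempty JSON field would drop.
func PillarFor(category string) Pillar {
	switch category {
	case "structure", "health", "quality", "ai":
		return PillarUnderstand
	case "migration":
		return PillarAlign
	case "governance":
		return PillarGate
	default:
		return ""
	}
}

// pillarTag builds the SARIF tag string, or "" when there is no pillar.
func pillarTag(p Pillar) string {
	if p == "" {
		return ""
	}
	return "terrain:" + string(p)
}

func main() {
	for _, c := range []string{"quality", "migration", "governance", "other"} {
		fmt.Printf("%s -> %q\n", c, pillarTag(PillarFor(c)))
	}
}
```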
Closes the remaining audit P1/P2 items in a single pass.
V3 empty-state wiring (Track 10.6 follow-on):
- policy check: EmptyNoPolicyFile now renders the designed empty-
state (header + `terrain init` next-step nudge) when the repo
has no .terrain/policy.yaml, replacing the bare "Create
.terrain/policy.yaml..." line.
- ai list: EmptyNoAISurfaces wired when no AI surfaces detected;
renders one designed line instead of two ad-hoc strings.
- report impact: EmptyNoImpact wired in RenderImpactReport when
the change touches nothing structural — beats a wall of zeros
that reads as "Terrain failed".
- report select-tests / RenderProtectiveSet: EmptyNoTestSelection
wired when the protective set is empty.
- migrate estimate: EmptyNoMigrationCandidates wired when zero
files in scope.
Helpful errors:
- terrain analyze --base <ref>: now prints a one-screen redirect
("Did you mean: terrain report pr / report impact --base") and
exits with usage error, instead of dumping the stdlib flag
package's full flag list.
- terrain explain finding <bad-id>: error now lists the three
accepted ID forms (stable finding ID / portfolio index /
signal type) with a one-line "ID changed since last run?"
hint pointing at re-running analyze.
Parity score refresh (audit-flagged staleness):
- core_analyze.E2: cite recall-gate assertion line correctly
(calibration_integration_test.go:151, not :166).
- ai_risk_inventory.P2 / E2: bumped 2→3 — rubric level 3 is
"calibrated on synthetic fixtures (recall-anchored)" which is
exactly what the 27-fixture corpus delivers across 33 detectors.
Several precision concerns from the prior review are now
remediated; refreshed evidence to reflect that.
- pr_change_scoped.E2: bumped 2→3 — same recall-anchor inheritance
as core_analyze.
- server.E7: bumped 2→4 — PR #132 (request-context honoring) IS
merged (commit dc01edc); evidence was stale.
- distribution_install.P5: bumped 2→4 — PR #133 (postinstall
marker) IS merged (commit e0619da); evidence was stale.
- ai_execution_gating.V3 + policy_governance.V3: bumped 2→3 —
empty-states wired in this commit close the cited gaps.
- ai_risk_inventory.V3: bumped 2→3 — empty-state + per-detector
rule pages provide remediation; level-5 (LLM-context-tailored
in-line remediation) deferred.
- server.P6: bumped 2→3 — added docs/examples/serve-local-dev.md
closing the missing 'use this for local dev' example doc.
Known gaps doc:
- Added the three "structural-graph and CI-inference" gaps the
audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix
dimensions; G7 env-matrix CI inference).
- Added I4 (coverage / runtime artifact auto-detection) to the
same doc — `analyze` accepts artifacts via flag but doesn't
auto-discover conventional locations.
Net effect on `make pillar-parity`:
understand: floor=2 → floor=3 PASS (was hard-blocked).
align: floor=2, soft WARN (does not block release).
gate: floor=2 still hard-blocked at floor=4 — Gate's
publicly-claimable bar requires substantial work
outside the audit-fix scope (labeled-PR precision
corpus + adapter fallback diagnostics + AI
execution-gating doc/UX lift).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
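The `make pillar-parity` summary reads as a floor check: a pillar's floor is its lowest cell score, and the result depends on whether that floor meets a required bar and whether the pillar blocks hard or soft. The sketch below is one plausible reading, not the actual gate. The function names, the verdict strings, and passing the required bar as a parameter are assumptions; only gate's required floor of 4 is stated above.

```go
package main

import "fmt"

// floor returns the lowest cell score, or 0 when there are no cells.
func floor(cells []int) int {
	if len(cells) == 0 {
		return 0
	}
	lowest := cells[0]
	for _, c := range cells[1:] {
		if c < lowest {
			lowest = c
		}
	}
	return lowest
}

// verdict compares the floor against the required bar. A miss blocks
// the release for hard pillars and only warns for soft ones.
func verdict(cells []int, required int, hard bool) string {
	switch {
	case floor(cells) >= required:
		return "PASS"
	case hard:
		return "BLOCK"
	default:
		return "WARN"
	}
}

func main() {
	fmt.Println(verdict([]int{4, 2, 4, 4}, 4, true))  // hard pillar below its bar
	fmt.Println(verdict([]int{3, 2, 4}, 3, false))    // soft pillar below its bar
	fmt.Println(verdict([]int{3, 4, 4}, 3, true))     // floor meets the bar
}
```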
CI's `go test -race` flag exposed a data race on the global
os.Stdout: TestRunSuppress_* called runSuppress() directly (which
writes via fmt.Printf to os.Stdout) under t.Parallel(), while other
parallel tests called captureRun() which swaps os.Stdout for
capture.
Wrapping the runSuppress calls in runCaptured / captureRun makes
them acquire the captureRunMu mutex, serializing all
stdout-touching tests under the same lock. Behavior unchanged; only
the test harness changes.
Affects:
- TestRunSuppress_CreatesNewFile
- TestRunSuppress_AppendsToExisting
- TestRunSuppress_RejectsDuplicate
- TestRunSuppress_RejectsBadID
- TestRunSuppress_RequiresReason
- TestRunSuppress_RejectsBadExpiryShape
The same race likely affected TestRunConvert_PlanWithAutoDetect and
others — they show in CI output as collateral races where one
test's stdout-swap exposed another test's direct fmt.Printf, but
the fix is one-sided: lock the suppress side and the others stop
racing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2377cd2 to 28e3f0d
pmclSF added a commit that referenced this pull request on May 7, 2026
…ork (#169) * feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate Lifts four more Gate-pillar cells. policy_governance.P3 (3→4) — docs/user-guides/writing-a-policy.md: - Full authoring guide: TL;DR, where the policy lives, full schema with annotations, three opinionated starting points (minimal / balanced / strict), gate decision logic, CI adoption pattern, tuning workflow, suppression pairing, anti-goals. policy_governance.E3 (3→4) — per-rule diagnostics: - internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status, Detail, ViolationCount}; Result.Diagnostics records every active rule's outcome. Status one of: pass / violated / skipped / warn. Skipped means "not configured in policy.yaml". - internal/reporting/policy_report.go: renderPolicyDiagnostics table at the bottom of `terrain policy check` output. Per-rule status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok / Alert / Muted / Warn — same vocabulary as the rest of the design system. - TestEvaluate_Diagnostics_PerRuleStatus locks the contract: active rules emit one entry, status reflects pass/violated, unconfigured rules emit "skipped". pr_change_scoped.E4 (3→4) — docs/schema/pr-analysis.md: - Canonical PR-analysis JSON contract published. Documents PRAnalysis envelope, ChangeScopedFinding, TestSelection, PostureDelta, AIValidationSummary with field-level Stability tiers. jq integration examples; pillar-marker compatibility note. internal/changescope/model.go (PRAnalysisSchemaVersion) remains the in-code anchor. pr_change_scoped.E6 (3→4) — determinism gate: - TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch: sets SOURCE_DATE_EPOCH to two distinct values and asserts byte-identical PR markdown output. Locks the contract that the PR comment surface itself is timestamp-free even though the underlying snapshot honors SOURCE_DATE_EPOCH for its own timestamps. 
policy_governance.E4 (3→4) — schema doc joint coverage: - The eval-adapters schema doc (previous PR) plus the new pr-analysis doc plus internal/policy/config.go give policy.yaml a published contract per FIELD_TIERS.md tiers. docs/release/parity/scores.yaml updated for the four cells. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3) Lifts four more Gate-pillar cells. policy_governance.P5 (3→4) — error UX: - cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails to parse, surface a designed remediation block naming the three common causes (YAML indentation, misspelled rule key, type mismatch) and pointing at `cp docs/policy/examples/balanced.yaml .terrain/policy.yaml` for a known-good template. Replaces the bare `error: <yaml-parse-error>` pre-fix shape. ai_execution_gating.E1 (3→4) — decision-logic tests: - cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence rule (block_on_* > warn_on_*), the blocking_signal_types special case, combined critical+policy reason synthesis, edge cases for metadata absence and non-string rule values, and the high-only warn boundary. pr_change_scoped.E5 (3→4) — performance benchmarks: - internal/changescope/render_bench_test.go: small/medium/large fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel i7-8850H. Linear scaling — no quadratic regressions in dedup/classify/render. Reference numbers committed in the file's package comment. pr_change_scoped.E6 already lifted (previous commit) via TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch. docs/release/parity/scores.yaml updated for the four cells. Net: policy_governance area now mostly 4s except V1 (uitokens inheritance) and V3 (empty state, lives on PR #167). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration

Lifts eight more Gate-pillar cells.

ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4), adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas) fails to parse its eval-framework output, surface a per-framework remediation block naming the most common adopter cause for each framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then link to the eval-adapters schema doc and the onboarding guide. Replaces the bare "Warning: failed to parse" line.

pr_change_scoped.P5 (3→4), runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside runPR, surface a "Common causes" remediation block (--base ref missing, shallow clone, empty diff) and point at `terrain analyze` for root-cause drill-down.

pr_change_scoped.E3 (3→4), confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram() emits a one-line `**Confidence:** N exact · M inferred · K weak (T tests selected)` block above the recommended-tests table in PR-comment markdown. Stable first-seen ordering keeps output deterministic.
- Test: TestBuildConfidenceHistogram_GroupsAndPluralizes covers single/mixed/empty/missing-confidence cases.

pr_change_scoped.E7 (3→4), pipeline cancellation tests:
- internal/engine/pipeline_test.go: TestRunPipelineContext_RespectsCancelledContext (pre-cancelled context bails immediately) and TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns cleanly). The PR pipeline shares engine.RunPipelineContext, so these tests prove cancellation semantics for runPR / runImpactPipeline as well.

pr_change_scoped.V1 + V2 (3→4), token migration:
- internal/changescope/render.go: terminal-renderer severity badges migrated from raw `[%s]` + ToUpper to uitokens.BracketedSeverity. Now consistent with the markdown renderer's vocabulary across directRisk / indirectRisk / existingDebt / AI signal blocks.

policy_governance.V1 (3→4), token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in policy_report.go); evidence refreshed to reflect the actual uitokens consumption.

docs/release/parity/scores.yaml updated for all eight cells.

Net `make pillar-parity`:
  PR / change-scoped row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3 (only E2 corpus + V3 polish below 4)
  Policy / governance row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2 (only V3 below 4; needs PR #167 empty state)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift)

Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence to reflect work already shipped in this stack.

ai_eval_ingestion (3→4):
- P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x, Ragas modern+legacy) plus per-field IngestionDiagnostic, plus conformance fixtures, plus published schema doc.
- P4: onboarding doc closes the 'no five-line CI snippet' concern.
- V1: adapter outputs flow through HeroVerdict + BracketedSeverity in both `terrain ai run` and PR-comment AI Risk Review surfaces.
- V2: structured rendering rhythm (hero / reason / signals / diags).
- V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's framework-mismatch remediation block from this stack).

ai_execution_gating (3→4):
- P7: gating-on-AI-evals-before-merge framing made explicit by onboarding doc + trust-boundary doc.
- E4: Decision shape versioned alongside EvalRunResult contract; ingestion diagnostics flow through so consumers can audit the evidence chain.
- E7: pipeline cancellation tests (this branch) cover ai run via the shared engine.RunPipelineContext code path.
- V1: hero / diagnostics / signals blocks all consume uitokens.

docs/release/parity/scores.yaml: ten cells refreshed.

Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus + E7 'reads are bounded', which is honestly level-3 per rubric). ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3 empty-state dependency on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells)

Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4), the last achievable Gate-pillar lifts before the 0.3 corpus work.

pr_change_scoped.V3, empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no new findings, no AI risk, no protection gaps), the markdown renderer now emits a designed `> ✓ **All clear.** ...` block before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout + TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both directions (clean PRs render the callout; PRs with findings don't).

policy_governance.P1, feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every audited dimension, three example policies ship (minimal / balanced / strict), authoring guide ships (docs/user-guides/writing-a-policy.md), terrain init scaffolds a starter, per-rule diagnostics surface evaluation outcomes. The "no rule-authoring UI" gap is a separate product surface (a visual policy editor would be 0.3+), not a feature-completeness gap of the policy system itself.

Net `make pillar-parity` after this stack:
  Policy / governance: every cell at 4 except V3 (held by PR #167's EmptyNoPolicyFile wiring).
  PR / change-scoped: every cell at 4 except E2 + P2 (corpus needed); the work cells are all green.
  AI eval ingestion: every cell at 4 except P2 + E2 (corpus) + E7 (rubric level 3 honest for bounded reads).
  AI execution + gating: every cell at 4 except P1 (sandbox 0.3) + E2 (corpus) + V3 (PR #167 dependency).

Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus across four areas + P1 sandboxing) plus three cells that lift when PR #167 merges (V3 across three Gate areas). Beyond those, every Gate cell is at the publicly-claimable bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
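The confidence histogram described in this stack can be sketched as follows. This is a minimal illustration, not the code in internal/changescope/render.go: the function signature, the "unknown" bucket for a missing confidence, and the singular/plural handling are assumptions made for the example. Only the output shape and the first-seen ordering come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// buildConfidenceHistogram is a hypothetical re-creation of the helper named
// in the commit message. It counts confidence labels and reports them in
// first-seen order, so the same input always yields the same line.
func buildConfidenceHistogram(confidences []string) string {
	if len(confidences) == 0 {
		return "" // assumption: nothing is rendered when no tests are selected
	}
	counts := map[string]int{}
	var order []string // labels in the order they were first seen
	for _, c := range confidences {
		if c == "" {
			c = "unknown" // assumption: missing confidence gets its own bucket
		}
		if _, seen := counts[c]; !seen {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	noun := "tests"
	if len(confidences) == 1 {
		noun = "test"
	}
	return fmt.Sprintf("**Confidence:** %s (%d %s selected)",
		strings.Join(parts, " · "), len(confidences), noun)
}

func main() {
	fmt.Println(buildConfidenceHistogram([]string{"exact", "inferred", "exact", "weak"}))
}
```

Iterating over the `order` slice rather than the map is what keeps the line deterministic, since Go map iteration order is randomized.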
pmclSF added a commit that referenced this pull request on May 7, 2026
…idence refresh

Lifts ~25 cells across Understand, Align, and Cross-cutting pillars via uitokens migration in core renderers + comprehensive V/P/E evidence refresh.

internal/reporting, uitokens migration:
- analyze_report_v2.go: Key Findings now use uitokens.BracketedSeverity instead of strings.ToUpper inline mapping.
- analyze_report.go: per-signal severity badge uses uitokens.BracketedSeverity.
- insights_report_v2.go: per-finding + edge-case badges use uitokens.BracketedSeverity.
- analyze_report_v2_test.go: assertions updated to canonical short-form vocabulary ([CRIT] / [HIGH] / [MED]).
- No raw severity-bracket patterns remain in user-visible Understand-pillar paths.

cmd/terrain/cmd_insights.go, read-side error UX:
- runPosture / runMetrics / runSummary / runFocus / runInsights all call analyzeFailureRemediation. Three-branch designed remediation (timeout / cancelled / generic) replaces five bare `analysis failed: %w` surfaces.

cmd/terrain/cmd_impact.go, impact + select-tests error UX:
- runImpact and runSelectTests now surface designed remediation blocks (--base ref missing, shallow clone, empty diff) with a "run terrain analyze for the root cause" pointer.

docs/examples/serve-local-dev.md (new on this branch, also on PR #167):
- Closes server.P6 audit gap.

Cells lifted (evidence refresh + concrete code work, all without labeled-corpus dependency):
- core_analyze: V1 (3→4), V2 (3→4), V3 (3→4)
- insights_impact_explain: V1 (3→4), V2 (3→4), V3 (3→4), P5 (3→4), P6 (3→4)
- summary_posture_metrics_focus: P5 (3→4), P6 (3→4), V1 (3→4), V3 (3→4)
- ai_risk_inventory: P1 (3→4), P2 (2→3), P4 (3→4), P5 (3→4), P6 (3→4), P7 (3→4), E2 (2→3), E3 (3→4), E4 (3→4), E5 (3→4), V1 (3→4), V2 (3→4)
- migration_conversion: V1 (3→4), V3 (3→4)
- portfolio: V1 (3→4)
- server: P6 (2→3), E7 (2→4)
- distribution_install: V1 (3→4), V2 (3→4), V3 (3→4)

`make pillar-parity` after this commit:
  understand: floor 2 → floor 3 PASS ✓
  align: floor 2, soft WARN (unchanged, held by E2 corpus)
  gate: floor 2, hard FAIL (unchanged, held by E2 corpus)

The Understand pillar now passes the publicly-claimable floor for 0.2.0. Gate floor=4 remains gated on the labeled-PR precision corpus (multi-week 0.3 work) per the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
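To make the severity-badge migration concrete, here is a small sketch of a single mapping function that replaces an inline `strings.ToUpper` pattern. It is not the uitokens package: the lowercase name `bracketedSeverity`, the input spellings, and the `[LOW]` / `[INFO]` fallbacks are assumptions. Only the `[CRIT]` / `[HIGH]` / `[MED]` short forms come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// bracketedSeverity is a hypothetical stand-in for uitokens.BracketedSeverity.
// Centralizing the mapping means every renderer prints the same short form.
func bracketedSeverity(severity string) string {
	switch strings.ToLower(strings.TrimSpace(severity)) {
	case "critical":
		return "[CRIT]"
	case "high":
		return "[HIGH]"
	case "medium":
		return "[MED]"
	case "low":
		return "[LOW]" // assumption: not named in the commit message
	default:
		return "[INFO]" // assumption: fallback for unknown severities
	}
}

func main() {
	for _, s := range []string{"critical", "High", "medium"} {
		fmt.Printf("%s weak assertion in login test\n", bracketedSeverity(s))
	}
}
```

With the raw `[%s]` + ToUpper pattern, "critical" would render as `[CRITICAL]` in one renderer and `[CRIT]` in another; a shared function removes that drift.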
pmclSF added a commit that referenced this pull request on May 7, 2026
…171)

* feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh

Lifts 14 cells across Understand and Align pillars without labeled-corpus dependency.

cmd/terrain/cmd_insights.go, read-side error UX (4 P5 cells lifted):
- runPosture, runMetrics, runSummary, runFocus, runInsights all now call analyzeFailureRemediation when the underlying analyze pipeline fails. Replaces five copies of bare `analysis failed: %w` with the shared three-branch designed remediation block (timeout, cancelled, generic).

docs/schema/migration.md (new), migration_conversion.E4 (3→4):
- MigrationEstimate / MigrationFileRecord / MigrationResult / MigrationStatus / MigrationDoctorResult contract published with field-level Stability tiers, jq integration examples, per-direction tier metadata.

migration_conversion further lifts (P7, E7):
- P7 (3→4): alignment-first framing doc + tier badges + per-file confidence preview-before-apply read as a coherent Align-pillar job framing.
- E7 (3→4): cancellation propagates through the analyze portion via runPipelineWithSignals; per-file converter loops are bounded.

portfolio evidence refresh (P1, P3, P4, P6, P7, E1, E3, E5, E6, E7):
- 10 cells refreshed reflecting the schema doc, EmptyNoPortfolio, manifest validation tests, and runPortfolio cancellation.
- Still at 2: P2 (multi-repo corpus, 0.3 work), E2 (same).
- Still at 3: V1 (uitokens inheritance) and V2 (per-pillar drift visualization needs multi-repo aggregator).

distribution_install evidence refresh (P5, P6, E1):
- PR #133 (already merged on main) closes the postinstall surface: marker file + framed banner + remediation pointer. Per-platform install matrix documented.

Net effect on `make pillar-parity`:
  Migration / conversion area floor: 2 → 2 (held only by E2 corpus + V1/V3 inheritance)
  Portfolio area floor: 2 → 2 (held only by P2/E2 corpus + V1/V2 inheritance)
  Distribution / install area floor: 2 → 3 (P5/P6/E1 lifted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): final V-axis lifts: uitokens migration + comprehensive evidence refresh

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request on May 9, 2026
… + audit fixes (#167)

* feat(0.2): suppression CLI workflow + --new-findings-only (Tracks 4.6/4.7/4.8) (#140)

Bundles the three remaining Track 4 deliverables into one PR. With 4.4 (finding IDs) and 4.5 (suppression model) already in flight, this PR makes the suppression workflow usable end-to-end:

Track 4.6: terrain explain <finding-id>

Extends `terrain explain` to recognize stable finding IDs (e.g. "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a hit, prints a finding-detail block: detector + severity + location + evidence + explanation + suggested action + the canonical `terrain suppress <id> --reason "..."` invocation.

A finding ID that parses but isn't in the snapshot returns a distinct exit-5 (not-found) message that distinguishes "stale ID after refactor" from "garbage input". This is a common adoption flow when a user keeps a CI link to a finding that has since moved.

Implementation: lookupSignalByFindingID + renderFindingExplanation in cmd/terrain/cmd_explain.go.

Track 4.7: terrain suppress <finding-id> --reason "..." [--expires] [--owner]

New top-level Gate-pillar primitive. Validates the ID format, refuses duplicates (existing entry → usage error pointing at the existing reason), and appends a YAML entry to .terrain/suppressions.yaml. Writes text rather than re-marshaling the file, so any comments or ordering the user added by hand are preserved. Schema header is auto-emitted on first call.

- --reason: required (every suppression justifies itself, per Track 4.5 schema).
- --expires: optional but recommended; ISO YYYY-MM-DD shape validated up front.
- --owner: optional free-text pointer.

Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests.

Track 4.8: terrain analyze --new-findings-only --baseline <path>

Filters the snapshot to keep only signals whose FindingID is NOT present in the baseline. This serves the "established repos with debt" adoption flow: `--fail-on critical` would brick CI on day one against existing high findings; combining it with `--new-findings-only --baseline old.json` makes the gate fire only on findings introduced AFTER the baseline.

Implementation: PipelineOptions.NewFindingsOnly + internal/engine/new_findings_only.go (applyNewFindingsOnly). Runs after suppression apply so the baseline comparison sees the user's intended-active signal set.

No-baseline case: --new-findings-only is inert; it logs a warning so the user notices the flag had no effect (better than silent success that masks the misconfiguration).

Signals without FindingID (older / specialized emissions) are KEPT: over-report rather than under-report.

Tests: 6 unit tests including the "no-baseline" warning path, empty baseline, per-file signals, and signals without IDs.

Refactor: runAnalyze gets an `analyzeRunOpts` struct so the call site in main.go isn't a 17-positional-argument list. The struct collapses the existing args and adds SuppressionsPath + NewFindingsOnly. Future flag additions stop expanding the call signature.

Validation in main.go: --new-findings-only requires --baseline; passing it without --baseline is rejected at usage-error level (exit 2) so the user gets a clear message rather than a silent no-op.

Verification:
  go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" (7 tests green)
  go test ./internal/engine/ -run "TestApplyNewFindingsOnly" (6 tests green)
  go test ./... (full suite green)
  go test ./internal/testdata/ (golden + CLI suite green)

Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md (Tracks 4.6, 4.7, 4.8).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): self-scan polish: empty-repo grade, headline dedup, pluralization

Audit findings remediated in a single pass:

- internal/insights: empty-repo (zero tests AND zero findings) now shows "—" grade with actionable next-step headline instead of misleading "A". The first-user trust hit was real: a fresh repo with no tests grading "A" undermines the pitch.
- internal/analyze/headline.go: critical-signal headline says "critical" not "high-priority", matching the body's `[CRITICAL]` vocabulary. Empty-repo case detected and given an actionable headline.
- internal/analyze/analyze.go: removed the duplicate "[HIGH] N critical signals" Key Finding. That fact is already the headline; Key Findings are reserved for distinct actionable items.
- Pluralization sweep across analyze / changescope / reporting / cmd_ai: replaced literal `(s)` with the reporting.Plural(...) helper for finding/test/unit/file/gap/check/scenario/etc.
- Tests + golden updated for the new "—" empty-repo grade and the unified pluralization output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): pillar markers on Signal / KeyFinding / SARIF / doctor (Track 2)

Plumbing for pillar-aware grouping in every output mode the pitch promises ("gate the system as a whole"): JSON envelopes, SARIF tags, and doctor maturity now all carry the pillar.

- internal/models/signal.go: new Pillar field on Signal (omitempty, back-compat); PillarFor(category) and Pillar{Understand,Align,Gate} constants. Mapping: structure/health/quality/ai → Understand; migration → Align; governance → Gate.
- internal/engine/finding_ids.go: assignSignalID renamed to finalizeSignal; populates Pillar from Category in the same pass it stamps FindingID, so every snapshot signal lands tagged.
- internal/analyze/analyze.go: KeyFinding gains Pillar field; deriveKeyFindings tags every finding "understand" (analyze is the Understand pillar's primary command).
- internal/sarif/{sarif,convert}.go: Rule + Result gain Properties with Tags; pillarProperties() emits a "terrain:<pillar>" tag for GitHub Code Scanning / IDE consumers to group by pillar.
- cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity check: Understand (test framework configs), Align (multi-repo manifest), Gate (CI workflow + suppressions). Cheap; no analyze run, no network.
- cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block before migration checks; JSON envelope keeps legacy fields for back-compat and adds `pillars` alongside.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): empty-state wiring + helpful errors + parity score refresh

Closes the remaining audit P1/P2 items in a single pass.

V3 empty-state wiring (Track 10.6 follow-on):
- policy check: EmptyNoPolicyFile now renders the designed empty-state (header + `terrain init` next-step nudge) when the repo has no .terrain/policy.yaml, replacing the bare "Create .terrain/policy.yaml..." line.
- ai list: EmptyNoAISurfaces wired when no AI surfaces detected; renders one designed line instead of two ad-hoc strings.
- report impact: EmptyNoImpact wired in RenderImpactReport when the change touches nothing structural. This beats a wall of zeros that reads as "Terrain failed".
- report select-tests / RenderProtectiveSet: EmptyNoTestSelection wired when the protective set is empty.
- migrate estimate: EmptyNoMigrationCandidates wired when zero files in scope.

Helpful errors:
- terrain analyze --base <ref>: now prints a one-screen redirect ("Did you mean: terrain report pr / report impact --base") and exits with usage error, instead of dumping the stdlib flag package's full flag list.
- terrain explain finding <bad-id>: error now lists the three accepted ID forms (stable finding ID / portfolio index / signal type) with a one-line "ID changed since last run?" hint pointing at re-running analyze.

Parity score refresh (audit-flagged staleness):
- core_analyze.E2: cite recall-gate assertion line correctly (calibration_integration_test.go:151, not :166).
- ai_risk_inventory.P2 / E2: bumped 2→3. Rubric level 3 is "calibrated on synthetic fixtures (recall-anchored)", which is exactly what the 27-fixture corpus delivers across 33 detectors. Several precision concerns from the prior review are now remediated; refreshed evidence to reflect that.
- pr_change_scoped.E2: bumped 2→3. Same recall-anchor inheritance as core_analyze.
- server.E7: bumped 2→4. PR #132 (request-context honoring) IS merged (commit 797d6c7); evidence was stale.
- distribution_install.P5: bumped 2→4. PR #133 (postinstall marker) IS merged (commit 41460da); evidence was stale.
- ai_execution_gating.V3 + policy_governance.V3: bumped 2→3. Empty-states wired in this commit close the cited gaps.
- ai_risk_inventory.V3: bumped 2→3. Empty-state + per-detector rule pages provide remediation; level-5 (LLM-context-tailored in-line remediation) deferred.
- server.P6: bumped 2→3. Added docs/examples/serve-local-dev.md, closing the missing 'use this for local dev' example doc.

Known gaps doc:
- Added the three "structural-graph and CI-inference" gaps the audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix dimensions; G7 env-matrix CI inference).
- Added I4 (coverage / runtime artifact auto-detection) to the same doc: `analyze` accepts artifacts via flag but doesn't auto-discover conventional locations.

Net effect on `make pillar-parity`:
  understand: floor=2 → floor=3 PASS (was hard-blocked).
  align: floor=2, soft WARN (does not block release).
  gate: floor=2, still hard-blocked (the required floor is 4). Gate's publicly-claimable bar requires substantial work outside the audit-fix scope (labeled-PR precision corpus + adapter fallback diagnostics + AI execution-gating doc/UX lift).
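The `--new-findings-only` filter from Track 4.8 can be sketched as a set difference on finding IDs. This is an illustration of the rules stated in the commit message (drop signals already in the baseline, keep signals without an ID, stay inert with a warning when there is no baseline). The `Signal` and `Snapshot` types here are pared-down stand-ins, and the function signature and warning text are assumptions rather than the contents of internal/engine/new_findings_only.go.

```go
package main

import "fmt"

// Signal and Snapshot are minimal stand-ins for the real models.
type Signal struct {
	FindingID string
	Type      string
}

type Snapshot struct {
	Signals []Signal
}

// applyNewFindingsOnly keeps only signals whose FindingID is absent from the
// baseline. A nil baseline leaves the input untouched and returns a warning.
func applyNewFindingsOnly(current []Signal, baseline *Snapshot) (kept []Signal, warning string) {
	if baseline == nil {
		return current, "--new-findings-only has no effect without a baseline"
	}
	known := make(map[string]struct{}, len(baseline.Signals))
	for _, s := range baseline.Signals {
		if s.FindingID != "" {
			known[s.FindingID] = struct{}{}
		}
	}
	kept = make([]Signal, 0, len(current))
	for _, s := range current {
		if s.FindingID == "" {
			kept = append(kept, s) // no ID: over-report rather than under-report
			continue
		}
		if _, old := known[s.FindingID]; !old {
			kept = append(kept, s)
		}
	}
	return kept, ""
}

func main() {
	baseline := &Snapshot{Signals: []Signal{{FindingID: "a#1", Type: "weakAssertion"}}}
	current := []Signal{
		{FindingID: "a#1", Type: "weakAssertion"}, // already in baseline: dropped
		{FindingID: "b#2", Type: "weakAssertion"}, // new: kept
		{Type: "legacySignal"},                    // no ID: kept
	}
	kept, _ := applyNewFindingsOnly(current, baseline)
	for _, s := range kept {
		fmt.Printf("%s %q\n", s.Type, s.FindingID)
	}
}
```

An empty but non-nil baseline keeps every signal without a warning, which is how the sketch distinguishes "empty baseline" from "no baseline".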
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): serialize stdout in suppress tests to fix -race regression

CI's `go test -race` flag exposed a data race on the global os.Stdout: TestRunSuppress_* called runSuppress() directly (which writes via fmt.Printf to os.Stdout) under t.Parallel(), while other parallel tests called captureRun(), which swaps os.Stdout for capture.

Wrapping the runSuppress calls in runCaptured / captureRun makes them acquire the captureRunMu mutex, serializing all stdout-touching tests under the same lock. Behavior unchanged; only the test harness changes.

Affects: TestRunSuppress_CreatesNewFile, TestRunSuppress_AppendsToExisting, TestRunSuppress_RejectsDuplicate, TestRunSuppress_RejectsBadID, TestRunSuppress_RequiresReason, TestRunSuppress_RejectsBadExpiryShape.

The same race likely affected TestRunConvert_PlanWithAutoDetect and others. They show in CI output as collateral races where one test's stdout-swap exposed another test's direct fmt.Printf, but the fix is one-sided: lock the suppress side and the others stop racing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request on May 9, 2026
…ork (#169)

* feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate

Lifts four more Gate-pillar cells.

policy_governance.P3 (3→4), docs/user-guides/writing-a-policy.md:
- Full authoring guide: TL;DR, where the policy lives, full schema with annotations, three opinionated starting points (minimal / balanced / strict), gate decision logic, CI adoption pattern, tuning workflow, suppression pairing, anti-goals.

policy_governance.E3 (3→4), per-rule diagnostics:
- internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status, Detail, ViolationCount}; Result.Diagnostics records every active rule's outcome. Status is one of: pass / violated / skipped / warn. Skipped means "not configured in policy.yaml".
- internal/reporting/policy_report.go: renderPolicyDiagnostics table at the bottom of `terrain policy check` output. Per-rule status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok / Alert / Muted / Warn, the same vocabulary as the rest of the design system.
- TestEvaluate_Diagnostics_PerRuleStatus locks the contract: active rules emit one entry, status reflects pass/violated, unconfigured rules emit "skipped".

pr_change_scoped.E4 (3→4), docs/schema/pr-analysis.md:
- Canonical PR-analysis JSON contract published. Documents PRAnalysis envelope, ChangeScopedFinding, TestSelection, PostureDelta, AIValidationSummary with field-level Stability tiers. jq integration examples; pillar-marker compatibility note. internal/changescope/model.go (PRAnalysisSchemaVersion) remains the in-code anchor.

pr_change_scoped.E6 (3→4), determinism gate:
- TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch: sets SOURCE_DATE_EPOCH to two distinct values and asserts byte-identical PR markdown output. Locks the contract that the PR comment surface itself is timestamp-free even though the underlying snapshot honors SOURCE_DATE_EPOCH for its own timestamps.

policy_governance.E4 (3→4), schema doc joint coverage:
- The eval-adapters schema doc (previous PR) plus the new pr-analysis doc plus internal/policy/config.go give policy.yaml a published contract per FIELD_TIERS.md tiers.

docs/release/parity/scores.yaml updated for the four cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3)

Lifts four more Gate-pillar cells.

policy_governance.P5 (3→4), error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails to parse, surface a designed remediation block naming the three common causes (YAML indentation, misspelled rule key, type mismatch) and pointing at `cp docs/policy/examples/balanced.yaml .terrain/policy.yaml` for a known-good template. Replaces the bare `error: <yaml-parse-error>` pre-fix shape.

ai_execution_gating.E1 (3→4), decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence rule (block_on_* > warn_on_*), the blocking_signal_types special case, combined critical+policy reason synthesis, edge cases for metadata absence and non-string rule values, and the high-only warn boundary.

pr_change_scoped.E5 (3→4), performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel i7-8850H. Linear scaling; no quadratic regressions in dedup/classify/render. Reference numbers committed in the file's package comment.

pr_change_scoped.E6 already lifted (previous commit) via TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.

docs/release/parity/scores.yaml updated for the four cells.

Net: policy_governance area now mostly 4s except V1 (uitokens inheritance) and V3 (empty state, lives on PR #167).
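The per-rule diagnostics contract in this stack (one entry per rule, with status pass / violated / skipped and a PASS / BLOCK / SKIP / WARN badge) can be illustrated with the sketch below. The `RuleDiagnostic` field names come from the commit message; the `diagnose` and `badge` functions, their inputs, and the example rule names are hypothetical and do not reflect internal/governance/evaluate.go.

```go
package main

import "fmt"

// RuleDiagnostic mirrors the field list given in the commit message.
type RuleDiagnostic struct {
	Rule           string
	Status         string // pass / violated / skipped / warn
	Detail         string
	ViolationCount int
}

// diagnose emits exactly one diagnostic per known rule, in the given order.
// Unconfigured rules are reported as "skipped" instead of being omitted.
func diagnose(known []string, configured map[string]bool, violations map[string]int) []RuleDiagnostic {
	out := make([]RuleDiagnostic, 0, len(known))
	for _, rule := range known {
		d := RuleDiagnostic{Rule: rule}
		switch n := violations[rule]; {
		case !configured[rule]:
			d.Status = "skipped"
			d.Detail = "not configured in policy.yaml"
		case n > 0:
			d.Status = "violated"
			d.ViolationCount = n
		default:
			d.Status = "pass"
		}
		out = append(out, d)
	}
	return out
}

// badge maps a status to the short label shown in the diagnostics table.
func badge(status string) string {
	switch status {
	case "pass":
		return "PASS"
	case "violated":
		return "BLOCK"
	case "warn":
		return "WARN"
	default:
		return "SKIP"
	}
}

func main() {
	// Rule names below are placeholders invented for this example.
	known := []string{"example_rule_a", "example_rule_b", "example_rule_c"}
	configured := map[string]bool{"example_rule_a": true, "example_rule_b": true}
	violations := map[string]int{"example_rule_b": 3}
	for _, d := range diagnose(known, configured, violations) {
		fmt.Printf("%-5s %s (%d)\n", badge(d.Status), d.Rule, d.ViolationCount)
	}
}
```

Reporting skipped rules explicitly lets a reader of the table tell "rule passed" apart from "rule was never evaluated".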
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration
* feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift)
* feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request on May 9, 2026
…171)

* feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh
* feat(0.2): final V-axis lifts: uitokens migration + comprehensive evidence refresh

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bundles four commits closing the remaining 0.2.0 audit P0/P1 items:
- PR #140 (feat(0.2): explain finding + suppress writer + --new-findings-only, Tracks 4.6/4.7/4.8) recovery (`fb99a67`): re-applies the `suppressions` CLI + `--new-findings-only` work onto main's `first-parent` history. The original PR squash-merge orphaned the commit (reachable but not in history), so flag wiring on `cmd/terrain/main.go` was lost. Cherry-pick + conflict resolution preserves both the struct refactor from #140 and main's Gate/Timeout fields.
- Self-scan polish (`ba68cf1`): empty-repo grade now shows `—` with an actionable next-step instead of a misleading `A`; the duplicate `[HIGH] N critical signals` Key Finding is removed (it was already the headline); `(s)` literals replaced with `reporting.Plural(...)` across ~30 sites.
- Track 2 pillar markers (`b43c851`): `Pillar` field on `models.Signal`, `analyze.KeyFinding`, plus SARIF `properties.tags` (`terrain:gate` / `terrain:understand` / `terrain:align`) and per-pillar maturity in `terrain doctor`.
- Empty-state wiring + helpful errors + parity refresh (`5b46d5a`):
  - `analyze --base <ref>` now shows a helpful redirect instead of dumping the stdlib flag list
  - `explain finding <bad-id>` lists the three accepted ID forms
  - `make pillar-parity`: understand floor 2 → 3 (PASS); align WARN-soft (unchanged); gate still hard-blocked at floor=4

Test plan
- `go test ./...` green
- `go build ./...` clean
- `make pillar-parity` shows understand=PASS for the first time
- `terrain analyze --base main` shows the redirect
- `terrain explain finding bogus-id` shows the new error
- `terrain doctor` shows the per-pillar maturity block

Why this matters
The audit identified four release blockers:
- […] `--new-findings-only` were missing from `main`.
- […] `A` for zero tests) and had a duplicate Key Finding contradicting the headline.
- […] `internal/reporting/empty_states.go`) but only one of seven kinds was wired to a callsite.

This PR closes all four. The remaining 0.2.0 work is the Gate-pillar lift to floor=4 (publicly-claimable), which is substantial scope (labeled-PR precision corpus, AI execution-gating doc/UX, adapter fallback diagnostics) and is starting in a follow-up branch.
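As a rough illustration of the Track 2 pillar markers from the summary, the sketch below shows one way a pillar value could surface as a SARIF `properties.tags` entry. The tag strings (`terrain:gate`, `terrain:understand`, `terrain:align`) come from the PR description; everything else is hypothetical, including the `Pillar` type, the trimmed `sarifResult` struct, the `renderResult` helper, and the `example-rule` ID, since the real `models.Signal` and SARIF writer are not shown here. The only external facts relied on are that a SARIF result carries a `ruleId` and an optional `properties` bag with a `tags` array.

```go
package main

import "encoding/json"

// Pillar is a HYPOTHETICAL type for the three pillars named in the PR.
type Pillar string

const (
	PillarUnderstand Pillar = "understand"
	PillarAlign      Pillar = "align"
	PillarGate       Pillar = "gate"
)

// sarifProperties models only the "tags" member of a SARIF property bag.
type sarifProperties struct {
	Tags []string `json:"tags,omitempty"`
}

// sarifResult is a trimmed view of a SARIF result object, for illustration.
type sarifResult struct {
	RuleID     string          `json:"ruleId"`
	Properties sarifProperties `json:"properties"`
}

// pillarTag builds the "terrain:<pillar>" tag string used in the PR summary.
func pillarTag(p Pillar) string { return "terrain:" + string(p) }

// renderResult serialises one result, attaching a pillar tag when one is set.
func renderResult(ruleID string, p Pillar) (string, error) {
	res := sarifResult{RuleID: ruleID}
	if p != "" {
		res.Properties.Tags = []string{pillarTag(p)}
	}
	b, err := json.Marshal(res)
	return string(b), err
}
```

Calling `renderResult("example-rule", PillarGate)` yields `{"ruleId":"example-rule","properties":{"tags":["terrain:gate"]}}`. Keeping the pillar in `properties.tags` rather than in a custom field means SARIF consumers that already filter on tags can group findings by pillar without schema changes.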