fix(0.2): PR #140 recovery + Track 2 pillar markers + V3 empty-states + audit fixes #167

Merged
pmclSF merged 5 commits into main from fix/0.2-recover-pr140-and-audit-fixes
May 7, 2026

@pmclSF pmclSF commented May 5, 2026

Summary

Bundles four commits closing the remaining 0.2.0 audit P0/P1 items:

  1. PR #140 recovery (fb99a67): re-applies the suppressions
    CLI + --new-findings-only work onto main's first-parent
    history. The original PR squash-merge orphaned the commit
    (reachable but not in history), so flag wiring on
    cmd/terrain/main.go was lost. Cherry-pick + conflict
    resolution preserves both the struct refactor from #140 and
    main's Gate/Timeout fields.

  2. Self-scan polish (ba68cf1): empty-repo grade now shows "—"
    with an actionable next step instead of a misleading A;
    the duplicate "[HIGH] N critical signals" Key Finding is
    removed (it was already the headline); "(s)" literals replaced
    with reporting.Plural(...) across ~30 sites.

  3. Track 2 pillar markers (b43c851) — Pillar field on
    models.Signal, analyze.KeyFinding, plus SARIF properties.tags
    (terrain:gate / terrain:understand / terrain:align) and
    per-pillar maturity in terrain doctor.

  4. Empty-state wiring + helpful errors + parity refresh (5b46d5a):

Test plan

  • go test ./... green
  • go build ./... clean
  • make pillar-parity shows understand=PASS for the first time
  • CI green on this PR
  • Manual smoke: terrain analyze --base main shows the redirect
  • Manual smoke: terrain explain finding bogus-id shows the new error
  • Manual smoke: terrain doctor shows the per-pillar maturity block

Why this matters

The audit identified four release blockers:

  • PR #140 was orphaned post-squash; suppressions + --new-findings-only were missing from main.
  • Self-scan output was misleading on empty repos (grade A for zero tests) and had a duplicate Key Finding contradicting the headline.
  • Pillar tags were promised in the plan but absent from every output mode.
  • Several empty-state paths existed in code (internal/reporting/empty_states.go) but only one of seven kinds was wired to a callsite.

This PR closes all four. The remaining 0.2.0 work is the Gate-pillar lift to floor=4 (publicly-claimable), which is substantial scope (labeled-PR precision corpus, AI execution-gating doc/UX, adapter fallback diagnostics) and is starting in a follow-up branch.

github-actions Bot commented May 5, 2026

Terrain AI Risk Review

| Metric | Value |
| --- | --- |
| AI surfaces | 13 |
| Eval scenarios | 17 |
| Impacted scenarios | 0 |
| Uncovered surfaces | 13 |

Decision: PASS — AI surfaces are covered.

github-actions Bot commented May 5, 2026

[RISK] Terrain — Merge with caution

High-severity gaps found in changed code.

| Metric | Value |
| --- | --- |
| Changed files | 36 (23 source · 8 test) |
| Impacted units | 98 |
| Protection gaps | 28 |
| Tests selected | 77 of 798 (9% of suite) |

Coverage gaps in changed code

  • cmd/terrain/cmd_ai.go [LOW] — cmd_ai.go has no observed test coverage.
    → Add unit tests for cmd_ai.go.
  • cmd/terrain/cmd_analyze.go [LOW] — cmd_analyze.go has no observed test coverage.
    → Add unit tests for cmd_analyze.go.
  • cmd/terrain/cmd_convert.go [MED] — Exported method Error has no observed test coverage.
    → Add unit tests for exported method Error — this is public API surface.
  • cmd/terrain/cmd_doctor_pillars.go [LOW] — cmd_doctor_pillars.go has no observed test coverage.
    → Add unit tests for cmd_doctor_pillars.go.
  • cmd/terrain/cmd_explain.go [LOW] — cmd_explain.go has no observed test coverage.
    → Add unit tests for cmd_explain.go.
  • cmd/terrain/cmd_insights.go [LOW] — cmd_insights.go has no observed test coverage.
    → Add unit tests for cmd_insights.go.
  • cmd/terrain/cmd_suppress.go [LOW] — cmd_suppress.go has no observed test coverage.
    → Add unit tests for cmd_suppress.go.
  • cmd/terrain/cmd_workflow.go [LOW] — cmd_workflow.go has no observed test coverage.
    → Add unit tests for cmd_workflow.go.
  • cmd/terrain/main.go [LOW] — main.go has no observed test coverage.
    → Add unit tests for main.go.
  • internal/analyze/analyze.go [MED] — Exported class ManualCoverageSummary has no observed test coverage.
    → Add unit tests for exported class ManualCoverageSummary — this is public API surface.
  • ...and 18 more (15 medium, 3 low)

25 pre-existing issues on changed files
  • cmd/terrain/ai_workflow_test.go [HIGH] — [staticSkippedTest] 13 of 14 tests statically skipped (93%) in cmd/terrain/ai_workflow_test.go.
  • cmd/terrain/cmd_ai.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 176 tests (176 direct, 0 indirect). High blast radius increases regression risk.
  • cmd/terrain/cmd_analyze.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 176 tests (176 direct, 0 indirect). High blast radius increases regression risk.
  • cmd/terrain/cmd_convert.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 176 tests (176 direct, 0 indirect). High blast radius increases regression risk.
  • cmd/terrain/cmd_doctor_pillars.go [HIGH] — [blastRadiusHotspot] Changes to this file propagate to 176 tests (176 direct, 0 indirect). High blast radius increases regression risk.
  • ...and 20 more

Recommended tests

77 test(s) with exact coverage of 69 impacted unit(s). 29 impacted unit(s) have no covering tests in the selected set.

| Package | Tests | Sample |
| --- | --- | --- |
| internal/engine | 10 | internal/engine/artifacts_test.go ... |
| internal/reporting | 10 | internal/reporting/analyze_report_test.go ... |
| cmd/terrain | 6 | cmd/terrain/ai_workflow_test.go ... |
| internal/signals | 6 | internal/signals/ai_subdomain_test.go ... |
| internal/analyze | 5 | internal/analyze/actions_test.go ... |
| internal/testdata | 5 | internal/testdata/adversarial_test.go ... |
| internal/models | 4 | internal/models/signal_v2_test.go ... |
| internal/changescope | 3 | internal/changescope/changescope_test.go ... |
| internal/aidetect | 2 | internal/aidetect/embedding_model_change_test.go ... |
| internal/insights | 2 | internal/insights/insights_golden_test.go ... |
| internal/ownership | 2 | internal/ownership/aggregate_test.go ... |
| internal/quality | 2 | internal/quality/snapshot_heavy_test.go ... |
| internal/benchmark | 1 | internal/benchmark/export_test.go |
| internal/calibration | 1 | internal/calibration/runner_test.go |
| internal/comparison | 1 | internal/comparison/compare_test.go |
| internal/explain | 1 | internal/explain/explain_test.go |
| internal/gauntlet | 1 | internal/gauntlet/ingest_test.go |
| internal/governance | 1 | internal/governance/evaluate_test.go |
| internal/graph | 1 | internal/graph/graph_test.go |
| internal/heatmap | 1 | internal/heatmap/heatmap_test.go |
| internal/measurement | 1 | internal/measurement/measurement_test.go |
| internal/metrics | 1 | internal/metrics/metrics_test.go |
| internal/migration | 1 | internal/migration/readiness_test.go |
| internal/sarif | 1 | internal/sarif/convert_test.go |
| internal/scoring | 1 | internal/scoring/risk_engine_test.go |
| internal/server | 1 | internal/server/server_test.go |
| internal/severity | 1 | internal/severity/rubric_test.go |
| internal/skipstats | 1 | internal/skipstats/summary_test.go |
| internal/stability | 1 | internal/stability/cluster_test.go |
| internal/structural | 1 | internal/structural/structural_test.go |
| internal/summary | 1 | internal/summary/executive_test.go |
| internal/suppression | 1 | internal/suppression/suppression_test.go |

Owners: PMCLSF

Limitations
  • No coverage artifacts provided; protection gaps reflect missing data, not measured absence. Provide --coverage to improve accuracy.
  • Mixed test cultures reduce cross-framework optimization confidence. Consider standardizing on fewer frameworks.

Generated by Terrain · terrain pr --json for machine-readable output

Targeted Test Results

Terrain selected 77 test(s) instead of the full suite.

  • Go tests: passed

pmclSF added a commit that referenced this pull request May 7, 2026
…ift batch 3)

Lifts four more Gate-pillar cells.

policy_governance.P5 (3→4) — error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails
  to parse, surface a designed remediation block naming the three
  common causes (YAML indentation, misspelled rule key, type
  mismatch) and pointing at `cp docs/policy/examples/balanced.yaml
  .terrain/policy.yaml` for a known-good template. Replaces the
  bare `error: <yaml-parse-error>` pre-fix shape.

ai_execution_gating.E1 (3→4) — decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence
  rule (block_on_* > warn_on_*), the blocking_signal_types special
  case, combined critical+policy reason synthesis, edge cases for
  metadata absence and non-string rule values, and the high-only
  warn boundary.

pr_change_scoped.E5 (3→4) — performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large
  fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel
  i7-8850H. Linear scaling — no quadratic regressions in
  dedup/classify/render. Reference numbers committed in the file's
  package comment.

pr_change_scoped.E6 already lifted (previous commit) via
TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.

docs/release/parity/scores.yaml updated for the four cells.
Net: policy_governance area now mostly 4s except V1 (uitokens
inheritance) and V3 (empty state, lives on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…ipeline cancel tests + token migration

Lifts six more Gate-pillar cells.

ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas)
  fails to parse its eval-framework output, surface a per-framework
  remediation block naming the most common adopter cause for each
  framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then
  link to the eval-adapters schema doc and the onboarding guide.
  Replaces the bare "Warning: failed to parse" line.

pr_change_scoped.P5 (3→4) — runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside
  runPR, surface a "Common causes" remediation block (--base ref
  missing, shallow clone, empty diff) and point at `terrain
  analyze` for root-cause drill-down.

pr_change_scoped.E3 (3→4) — confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram()
  emits a one-line `**Confidence:** N exact · M inferred · K weak
  (T tests selected)` block above the recommended-tests table in
  PR-comment markdown. Stable first-seen ordering keeps output
  deterministic. Test:
  TestBuildConfidenceHistogram_GroupsAndPluralizes covers
  single/mixed/empty/missing-confidence cases.
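
The histogram line described above could be built roughly as follows. This is only a sketch: the input shape (a slice of confidence labels), the "unknown" label for a missing confidence, and the empty-string result for zero tests are assumptions, not the contents of internal/changescope/render.go.

```go
package main

import (
	"fmt"
	"strings"
)

// buildConfidenceHistogram is a hypothetical reconstruction. It counts
// labels in first-seen order so the output is deterministic, then renders
// the one-line block quoted in the commit message.
func buildConfidenceHistogram(confidences []string) string {
	if len(confidences) == 0 {
		return "" // assumption: nothing is rendered when no tests are selected
	}
	counts := map[string]int{}
	var order []string
	for _, c := range confidences {
		if c == "" {
			c = "unknown" // assumption: label used when confidence is missing
		}
		if _, seen := counts[c]; !seen {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	noun := "tests"
	if len(confidences) == 1 {
		noun = "test"
	}
	return fmt.Sprintf("**Confidence:** %s (%d %s selected)",
		strings.Join(parts, " · "), len(confidences), noun)
}

func main() {
	fmt.Println(buildConfidenceHistogram([]string{"exact", "inferred", "exact", "weak"}))
	fmt.Println(buildConfidenceHistogram([]string{"exact"}))
}
```

Iterating over a separate `order` slice rather than the map is what keeps the output stable, since Go map iteration order is randomized.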

pr_change_scoped.E7 (3→4) — pipeline cancellation tests:
- internal/engine/pipeline_test.go:
  TestRunPipelineContext_RespectsCancelledContext (pre-cancelled
  context bails immediately) and
  TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns
  cleanly). The PR pipeline shares engine.RunPipelineContext, so
  these tests prove cancellation semantics for runPR /
  runImpactPipeline as well.
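
A minimal sketch of the two cancellation cases named above, assuming a pipeline that checks its context before each stage. `runStages` is a hypothetical stand-in for engine.RunPipelineContext, whose real signature is not shown in this commit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// runStages (hypothetical) checks the context before every stage and
// reports how many stages actually ran.
func runStages(ctx context.Context, stages []func()) (int, error) {
	for i, stage := range stages {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		stage()
	}
	return len(stages), nil
}

// demoPreCancelled runs the pipeline with an already-cancelled context.
func demoPreCancelled() (ran int, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	n, err := runStages(ctx, []func(){func() {}, func() {}})
	return n, errors.Is(err, context.Canceled)
}

// demoMidFlight cancels from inside the second stage, so the third
// stage must not run.
func demoMidFlight() (ran int, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	n, err := runStages(ctx, []func(){func() {}, cancel, func() {}})
	return n, errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println(demoPreCancelled())
	fmt.Println(demoMidFlight())
}
```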

pr_change_scoped.V1 + V2 (3→4) — token migration:
- internal/changescope/render.go: terminal-renderer severity
  badges migrated from raw `[%s]` + ToUpper to
  uitokens.BracketedSeverity. Now consistent with the markdown
  renderer's vocabulary across directRisk / indirectRisk /
  existingDebt / AI signal blocks.

policy_governance.V1 (3→4) — token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in
  policy_report.go); evidence refreshed to reflect the actual
  uitokens consumption.

docs/release/parity/scores.yaml updated for all eight cells.

Net `make pillar-parity`:
  PR / change-scoped     row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3
                         (only E2 corpus + V3 polish below 4)
  Policy / governance    row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2
                         (only V3 below 4 — needs PR #167 empty state)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…e pillar lift)

Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence
to reflect work already shipped in this stack.

ai_eval_ingestion (3→4):
- P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x,
  Ragas modern+legacy) plus per-field IngestionDiagnostic, plus
  conformance fixtures, plus published schema doc.
- P4: onboarding doc closes the 'no five-line CI snippet' concern.
- V1: adapter outputs flow through HeroVerdict + BracketedSeverity in
  both `terrain ai run` and PR-comment AI Risk Review surfaces.
- V2: structured rendering rhythm (hero / reason / signals / diags).
- V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's
  framework-mismatch remediation block from this stack).

ai_execution_gating (3→4):
- P7: gating-on-AI-evals-before-merge framing made explicit by
  onboarding doc + trust-boundary doc.
- E4: Decision shape versioned alongside EvalRunResult contract;
  ingestion diagnostics flow through so consumers can audit the
  evidence chain.
- E7: pipeline cancellation tests (this branch) cover ai run via
  the shared engine.RunPipelineContext code path.
- V1: hero / diagnostics / signals blocks all consume uitokens.

docs/release/parity/scores.yaml: ten cells refreshed.

Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus
+ E7 'reads are bounded' which is honestly level-3 per rubric).
ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3
empty-state dependency on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…e cells)

Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) —
the last achievable Gate-pillar lifts before the 0.3 corpus work.

pr_change_scoped.V3 — empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no
  new findings, no AI risk, no protection gaps), the markdown
  renderer now emits a designed `> ✓ **All clear.** ...` block
  before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout +
  TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both
  directions (clean PRs render the callout; PRs with findings
  don't).
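
The predicate and the two test directions could look roughly like this. The `prSummary` type and its field names are assumptions made for illustration, and only the known prefix of the callout is reproduced because the commit message elides the rest.

```go
package main

import (
	"fmt"
	"strings"
)

// prSummary is a hypothetical reduction of the data the renderer sees.
type prSummary struct {
	NewFindings    int
	AIRiskSignals  int
	ProtectionGaps int
}

// isEmptyPR centralizes the "nothing to report" predicate.
func isEmptyPR(s prSummary) bool {
	return s.NewFindings == 0 && s.AIRiskSignals == 0 && s.ProtectionGaps == 0
}

// renderSummary emits the callout before the footer only for empty PRs.
func renderSummary(s prSummary) string {
	var b strings.Builder
	fmt.Fprintf(&b, "New findings: %d\n", s.NewFindings)
	if isEmptyPR(s) {
		b.WriteString("> ✓ **All clear.**\n") // remaining wording elided in the source
	}
	b.WriteString("Generated by Terrain\n")
	return b.String()
}

// hasCallout reports whether the rendered output contains the callout.
func hasCallout(s prSummary) bool {
	return strings.Contains(renderSummary(s), "All clear")
}

func main() {
	fmt.Print(renderSummary(prSummary{}))
	fmt.Print(renderSummary(prSummary{NewFindings: 2}))
}
```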

policy_governance.P1 — feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every
  audited dimension, three example policies ship (minimal /
  balanced / strict), authoring guide ships
  (docs/user-guides/writing-a-policy.md), terrain init scaffolds a
  starter, per-rule diagnostics surface evaluation outcomes. The
  "no rule-authoring UI" gap is a separate product surface (visual
  policy editor would be 0.3+) not a feature-completeness gap of
  the policy system itself.

Net `make pillar-parity` after this stack:
  Policy / governance:  every cell at 4 except V3 (held by PR #167's
                        EmptyNoPolicyFile wiring).
  PR / change-scoped:   every cell at 4 except E2 + P2 (corpus needed)
                        — the work cells are all green.
  AI eval ingestion:    every cell at 4 except P2 + E2 (corpus) +
                        E7 (rubric level 3 honest for bounded reads).
  AI execution + gating: every cell at 4 except P1 (sandbox 0.3) +
                         E2 (corpus) + V3 (PR #167 dependency).

Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus
across four areas + P1 sandboxing) plus three cells that lift when
PR #167 merges (V3 across three Gate areas). Beyond those, every
Gate cell is at the publicly-claimable bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF and others added 5 commits May 6, 2026 17:14
…/4.7/4.8) (#140)

Bundles the three remaining Track 4 deliverables into one PR. With
4.4 (finding IDs) and 4.5 (suppression model) already in flight,
this PR makes the suppression workflow actually usable end-to-end:

  Track 4.6 — terrain explain <finding-id>
    Extends `terrain explain` to recognize stable finding IDs (e.g.
    "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a
    hit, prints a finding-detail block: detector + severity + location +
    evidence + explanation + suggested action + the canonical
    `terrain suppress <id> --reason "..."` invocation.

    A finding ID that parses but isn't in the snapshot returns a
    distinct exit-5 (not-found) message that distinguishes "stale ID
    after refactor" from "garbage input" — common adoption flow when
    a user keeps a CI link to a finding that has since moved.

    Implementation: lookupSignalByFindingID + renderFindingExplanation
    in cmd/terrain/cmd_explain.go.

  Track 4.7 — terrain suppress <finding-id> --reason "..." [--expires] [--owner]
    New top-level Gate-pillar primitive. Validates the ID format,
    refuses duplicates (existing entry → usage error pointing at the
    existing reason), appends a YAML entry to .terrain/suppressions.yaml.

    Writes text rather than re-marshaling the file so any comments /
    ordering the user added by hand are preserved. Schema header is
    auto-emitted on first call.

    --reason required (every suppression justifies itself, per Track
    4.5 schema). --expires optional but recommended; ISO YYYY-MM-DD
    shape validated up front. --owner optional free-text pointer.

    Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests.
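
As an illustration of the append-as-text approach, here is a minimal sketch. The YAML field names, the header text, and the function names are assumptions; the actual schema and code in cmd/terrain/cmd_suppress.go are not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// looksLikeISODate accepts only a strict YYYY-MM-DD calendar date.
func looksLikeISODate(s string) bool {
	_, err := time.Parse("2006-01-02", s)
	return err == nil
}

// appendSuppression (hypothetical) validates its inputs, refuses
// duplicates, and appends text so hand-written comments survive.
func appendSuppression(path, id, reason, expires string) error {
	if reason == "" {
		return errors.New("--reason is required")
	}
	if expires != "" && !looksLikeISODate(expires) {
		return fmt.Errorf("--expires must be YYYY-MM-DD, got %q", expires)
	}
	existing, err := os.ReadFile(path)
	if err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	if strings.Contains(string(existing), fmt.Sprintf("id: %q", id)) {
		return fmt.Errorf("finding %s is already suppressed", id)
	}
	var b strings.Builder
	if len(existing) == 0 {
		// Assumed header; emitted only when the file is new or empty.
		b.WriteString("# Terrain suppressions\nsuppressions:\n")
	} else if !strings.HasSuffix(string(existing), "\n") {
		b.WriteString("\n")
	}
	fmt.Fprintf(&b, "  - id: %q\n    reason: %q\n", id, reason)
	if expires != "" {
		fmt.Fprintf(&b, "    expires: %s\n", expires)
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(b.String())
	return err
}

func main() {
	dir, err := os.MkdirTemp("", "terrain-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "suppressions.yaml")
	id := "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"
	fmt.Println(appendSuppression(path, id, "tracked in backlog", "2026-12-31"))
	fmt.Println(appendSuppression(path, id, "second attempt", "")) // duplicate is refused
	out, _ := os.ReadFile(path)
	fmt.Print(string(out))
}
```

Appending with `os.O_APPEND` never rewrites earlier bytes, which is why comments and ordering in the existing file are left untouched.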

  Track 4.8 — terrain analyze --new-findings-only --baseline <path>
    Filters the snapshot to keep only signals whose FindingID is NOT
    present in the baseline. The "established repos with debt"
    adoption flow: `--fail-on critical` would brick CI on day one
    against existing high findings; combining with
    `--new-findings-only --baseline old.json` makes the gate fire
    only on findings introduced AFTER the baseline.

    Implementation: PipelineOptions.NewFindingsOnly +
    internal/engine/new_findings_only.go (applyNewFindingsOnly).
    Runs after suppression apply so the baseline comparison sees
    the user's intended-active signal set.

    No-baseline case: --new-findings-only is inert; logs a warning so
    the user notices the flag had no effect (better than silent
    success that masks the misconfiguration).

    Signals without FindingID (older / specialized emissions) are
    KEPT — over-report rather than under-report.

    Implementation: 6 unit tests including the "no-baseline" warning
    path, empty baseline, per-file signals, and signals without IDs.
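
The filtering rules above (drop findings already in the baseline, keep signals without an ID, stay inert with a warning when no baseline is loaded) can be sketched as follows. The `Signal` shape and the function signature are simplified assumptions rather than the code in internal/engine/new_findings_only.go.

```go
package main

import (
	"fmt"
	"os"
)

// Signal is a reduced, hypothetical version of the snapshot signal.
type Signal struct {
	Type      string
	FindingID string
}

// filterNewFindings keeps only signals whose FindingID is absent from
// the baseline. Signals without an ID are kept (over-report).
func filterNewFindings(current, baseline []Signal, haveBaseline bool) []Signal {
	if !haveBaseline {
		fmt.Fprintln(os.Stderr, "warning: --new-findings-only has no effect without a baseline")
		return current
	}
	known := make(map[string]struct{}, len(baseline))
	for _, s := range baseline {
		if s.FindingID != "" {
			known[s.FindingID] = struct{}{}
		}
	}
	kept := make([]Signal, 0, len(current))
	for _, s := range current {
		if s.FindingID == "" {
			kept = append(kept, s)
			continue
		}
		if _, old := known[s.FindingID]; !old {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	baseline := []Signal{{Type: "weakAssertion", FindingID: "a"}}
	current := []Signal{
		{Type: "weakAssertion", FindingID: "a"}, // pre-existing: dropped
		{Type: "weakAssertion", FindingID: "b"}, // new: kept
		{Type: "legacy"},                        // no ID: kept
	}
	fmt.Println(filterNewFindings(current, baseline, true))
}
```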

Refactor: runAnalyze gets an `analyzeRunOpts` struct so the call site
in main.go isn't a 17-positional-argument list. The struct collapses
the existing args + adds SuppressionsPath + NewFindingsOnly. Future
flag additions stop expanding the call signature.

Validation in main.go: --new-findings-only requires --baseline; the
combination is rejected at usage-error level (exit 2) so the user
gets a clear message rather than a silent no-op.

Verification:
  go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" — 7 tests green
  go test ./internal/engine/ -run "TestApplyNewFindingsOnly" — 6 tests green
  go test ./... — full suite green
  go test ./internal/testdata/ — golden + CLI suite green

Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md
(Tracks 4.6, 4.7, 4.8).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ization

Audit findings remediated in a single pass:

- internal/insights: empty-repo (zero tests AND zero findings) now
  shows "—" grade with actionable next-step headline instead of
  misleading "A". The first-user trust hit was real — a fresh repo
  with no tests grading "A" undermines the pitch.
- internal/analyze/headline.go: critical-signal headline says
  "critical" not "high-priority", matching the body's `[CRITICAL]`
  vocabulary. Empty-repo case detected and given an actionable
  headline.
- internal/analyze/analyze.go: removed the duplicate
  "[HIGH] N critical signals" Key Finding — that fact is already
  the headline; Key Findings are reserved for distinct actionable
  items.
- Pluralization sweep across analyze / changescope / reporting /
  cmd_ai: replaced literal `(s)` with reporting.Plural(...) helper
  for finding/test/unit/file/gap/check/scenario/etc.
- Tests + golden updated for the new "—" empty-repo grade and the
  unified pluralization output.
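
For readers unfamiliar with the helper, a pluralization function of this kind might look like the sketch below. The signature is an assumption: the real reporting.Plural is not shown in this PR and may differ (for example, in how it handles irregular nouns).

```go
package main

import "fmt"

// Plural is a hypothetical stand-in for reporting.Plural. It returns the
// count followed by the noun, adding "s" unless n is exactly 1.
func Plural(n int, noun string) string {
	if n == 1 {
		return fmt.Sprintf("%d %s", n, noun)
	}
	return fmt.Sprintf("%d %ss", n, noun)
}

func main() {
	fmt.Println(Plural(1, "finding"))
	fmt.Println(Plural(3, "test"))
	fmt.Println(Plural(0, "gap"))
}
```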

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ack 2)

Plumbing for pillar-aware grouping in every output mode the pitch
promises ("gate the system as a whole") — JSON envelopes, SARIF
tags, and doctor maturity now all carry the pillar.

- internal/models/signal.go: new Pillar field on Signal (omitempty,
  back-compat); PillarFor(category) and Pillar{Understand,Align,Gate}
  constants. Mapping: structure/health/quality/ai → Understand;
  migration → Align; governance → Gate.
- internal/engine/finding_ids.go: assignSignalID renamed to
  finalizeSignal; populates Pillar from Category in the same pass
  it stamps FindingID, so every snapshot signal lands tagged.
- internal/analyze/analyze.go: KeyFinding gains Pillar field;
  deriveKeyFindings tags every finding "understand" (analyze is the
  Understand pillar's primary command).
- internal/sarif/{sarif,convert}.go: Rule + Result gain Properties
  with Tags; pillarProperties() emits "terrain:<pillar>" tag for
  GitHub Code Scanning / IDE consumers to group by pillar.
- cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity
  check — Understand (test framework configs), Align (multi-repo
  manifest), Gate (CI workflow + suppressions). Cheap; no analyze
  run, no network.
- cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block
  before migration checks; JSON envelope keeps legacy fields for
  back-compat and adds `pillars` alongside.
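
The category-to-pillar mapping and the SARIF tag format listed above can be sketched as follows. The type layout and the empty result for an unrecognized category are assumptions; only the mapping itself and the "terrain:<pillar>" tag shape come from this commit message.

```go
package main

import "fmt"

// Pillar names one of the three product pillars.
type Pillar string

const (
	PillarUnderstand Pillar = "understand"
	PillarAlign      Pillar = "align"
	PillarGate       Pillar = "gate"
)

// PillarFor maps a signal category to its pillar. Returning "" for an
// unknown category is an assumption of this sketch.
func PillarFor(category string) Pillar {
	switch category {
	case "structure", "health", "quality", "ai":
		return PillarUnderstand
	case "migration":
		return PillarAlign
	case "governance":
		return PillarGate
	}
	return ""
}

// pillarTag builds the SARIF properties.tags entry for a pillar.
func pillarTag(p Pillar) string {
	if p == "" {
		return ""
	}
	return "terrain:" + string(p)
}

func main() {
	for _, c := range []string{"quality", "migration", "governance"} {
		fmt.Println(c, "->", pillarTag(PillarFor(c)))
	}
}
```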

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the remaining audit P1/P2 items in a single pass.

V3 empty-state wiring (Track 10.6 follow-on):
- policy check: EmptyNoPolicyFile now renders the designed empty-
  state (header + `terrain init` next-step nudge) when the repo
  has no .terrain/policy.yaml, replacing the bare "Create
  .terrain/policy.yaml..." line.
- ai list: EmptyNoAISurfaces wired when no AI surfaces detected;
  renders one designed line instead of two ad-hoc strings.
- report impact: EmptyNoImpact wired in RenderImpactReport when
  the change touches nothing structural — beats a wall of zeros
  that reads as "Terrain failed".
- report select-tests / RenderProtectiveSet: EmptyNoTestSelection
  wired when the protective set is empty.
- migrate estimate: EmptyNoMigrationCandidates wired when zero
  files in scope.

Helpful errors:
- terrain analyze --base <ref>: now prints a one-screen redirect
  ("Did you mean: terrain report pr / report impact --base") and
  exits with usage error, instead of dumping the stdlib flag
  package's full flag list.
- terrain explain finding <bad-id>: error now lists the three
  accepted ID forms (stable finding ID / portfolio index /
  signal type) with a one-line "ID changed since last run?"
  hint pointing at re-running analyze.

Parity score refresh (audit-flagged staleness):
- core_analyze.E2: cite recall-gate assertion line correctly
  (calibration_integration_test.go:151, not :166).
- ai_risk_inventory.P2 / E2: bumped 2→3 — rubric level 3 is
  "calibrated on synthetic fixtures (recall-anchored)" which is
  exactly what the 27-fixture corpus delivers across 33 detectors.
  Several precision concerns from the prior review are now
  remediated; refreshed evidence to reflect that.
- pr_change_scoped.E2: bumped 2→3 — same recall-anchor inheritance
  as core_analyze.
- server.E7: bumped 2→4 — PR #132 (request-context honoring) IS
  merged (commit dc01edc); evidence was stale.
- distribution_install.P5: bumped 2→4 — PR #133 (postinstall
  marker) IS merged (commit e0619da); evidence was stale.
- ai_execution_gating.V3 + policy_governance.V3: bumped 2→3 —
  empty-states wired in this commit close the cited gaps.
- ai_risk_inventory.V3: bumped 2→3 — empty-state + per-detector
  rule pages provide remediation; level-5 (LLM-context-tailored
  in-line remediation) deferred.
- server.P6: bumped 2→3 — added docs/examples/serve-local-dev.md
  closing the missing 'use this for local dev' example doc.

Known gaps doc:
- Added the three "structural-graph and CI-inference" gaps the
  audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix
  dimensions; G7 env-matrix CI inference).
- Added I4 (coverage / runtime artifact auto-detection) to the
  same doc — `analyze` accepts artifacts via flag but doesn't
  auto-discover conventional locations.

Net effect on `make pillar-parity`:
  understand: floor=2 → floor=3 PASS (was hard-blocked).
  align:      floor=2, soft WARN (does not block release).
  gate:       floor=2 still hard-blocked at floor=4 — Gate's
              publicly-claimable bar requires substantial work
              outside the audit-fix scope (labeled-PR precision
              corpus + adapter fallback diagnostics + AI
              execution-gating doc/UX lift).
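
One way to read the floor arithmetic above: a pillar's floor is the lowest score among its cells, and the verdict depends on whether that floor meets the pillar's required bar and whether the bar is hard or soft. The sketch below encodes that reading. The type names, the "BLOCK" label, and the required floor of 3 used for understand and align are assumptions; only gate's bar of 4 is stated in this message.

```go
package main

import "fmt"

// requirement is a hypothetical per-pillar release bar.
type requirement struct {
	Floor int  // minimum acceptable floor
	Hard  bool // true: a miss blocks release; false: a miss only warns
}

// pillarFloor returns the lowest cell score, or 0 when there are no cells.
func pillarFloor(cells map[string]int) int {
	lowest, first := 0, true
	for _, score := range cells {
		if first || score < lowest {
			lowest, first = score, false
		}
	}
	return lowest
}

// verdict compares a floor against the pillar's requirement.
func verdict(floor int, req requirement) string {
	switch {
	case floor >= req.Floor:
		return "PASS"
	case req.Hard:
		return "BLOCK"
	default:
		return "WARN"
	}
}

func main() {
	gate := map[string]int{"P1": 4, "E2": 2, "V3": 3}
	f := pillarFloor(gate)
	fmt.Println("gate floor:", f, verdict(f, requirement{Floor: 4, Hard: true}))
}
```

Because the floor is a minimum, a single low cell holds the whole pillar back, which matches the way individual cells are described as "holding" a floor elsewhere in this stack.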

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's `go test -race` flag exposed a data race on the global os.Stdout:
TestRunSuppress_* called runSuppress() directly (which writes via
fmt.Printf to os.Stdout) under t.Parallel(), while other parallel
tests called captureRun() which swaps os.Stdout for capture.

Wrapping the runSuppress calls in runCaptured / captureRun makes
them acquire the captureRunMu mutex, serializing all stdout-touching
tests under the same lock. Behavior unchanged; only the test
harness changes.

Affects: TestRunSuppress_CreatesNewFile, TestRunSuppress_AppendsToExisting,
TestRunSuppress_RejectsDuplicate, TestRunSuppress_RejectsBadID,
TestRunSuppress_RequiresReason, TestRunSuppress_RejectsBadExpiryShape.

The same race likely affected TestRunConvert_PlanWithAutoDetect and
others — they show in CI output as collateral races where one test's
stdout-swap exposed another test's direct fmt.Printf, but the fix
is one-sided: lock the suppress side and the others stop racing.
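
The locking pattern described above might look roughly like this. `captureStdout`, `captureMu`, and `emit` are hypothetical stand-ins for captureRun, captureRunMu, and runSuppress; the real helpers are not shown in this commit.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// captureMu serializes every test that swaps or writes os.Stdout.
var captureMu sync.Mutex

// captureStdout swaps os.Stdout for a pipe while fn runs and returns
// whatever fn printed. Holding captureMu for the whole swap is what
// removes the race between parallel tests.
func captureStdout(fn func()) (string, error) {
	captureMu.Lock()
	defer captureMu.Unlock()

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	orig := os.Stdout
	os.Stdout = w
	defer func() { os.Stdout = orig }() // runs before the unlock above

	// Drain concurrently so a large output cannot fill the pipe and block fn.
	done := make(chan []byte, 1)
	go func() {
		b, _ := io.ReadAll(r)
		done <- b
	}()

	fn()
	w.Close()
	out := <-done
	r.Close()
	return string(out), nil
}

// emit stands in for a command that prints directly to os.Stdout.
func emit() { fmt.Println("suppressed") }

func main() {
	var wg sync.WaitGroup
	results := make([]string, 3)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = captureStdout(emit)
		}(i)
	}
	wg.Wait()
	fmt.Printf("%q\n", results)
}
```

The lock only helps if every stdout-touching caller goes through the helper; a function that prints without taking the mutex can still race with a swap in progress, which is the situation the commit fixes.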

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pmclSF pmclSF force-pushed the fix/0.2-recover-pr140-and-audit-fixes branch from 2377cd2 to 28e3f0d Compare May 7, 2026 00:15
@pmclSF pmclSF merged commit cc15f23 into main May 7, 2026
11 checks passed
@pmclSF pmclSF deleted the fix/0.2-recover-pr140-and-audit-fixes branch May 7, 2026 00:20
pmclSF added a commit that referenced this pull request May 7, 2026
…e cells)

Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) —
the last achievable Gate-pillar lifts before the 0.3 corpus work.

pr_change_scoped.V3 — empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no
  new findings, no AI risk, no protection gaps), the markdown
  renderer now emits a designed `> ✓ **All clear.** ...` block
  before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout +
  TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both
  directions (clean PRs render the callout; PRs with findings
  don't).

policy_governance.P1 — feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every
  audited dimension, three example policies ship (minimal /
  balanced / strict), authoring guide ships
  (docs/user-guides/writing-a-policy.md), terrain init scaffolds a
  starter, per-rule diagnostics surface evaluation outcomes. The
  "no rule-authoring UI" gap is a separate product surface (visual
  policy editor would be 0.3+) not a feature-completeness gap of
  the policy system itself.

Net `make pillar-parity` after this stack:
  Policy / governance:  every cell at 4 except V3 (held by PR #167's
                        EmptyNoPolicyFile wiring).
  PR / change-scoped:   every cell at 4 except E2 + P2 (corpus needed)
                        — the work cells are all green.
  AI eval ingestion:    every cell at 4 except P2 + E2 (corpus) +
                        E7 (rubric level 3 honest for bounded reads).
  AI execution + gating: every cell at 4 except P1 (sandbox 0.3) +
                         E2 (corpus) + V3 (PR #167 dependency).

Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus
across four areas + P1 sandboxing) plus three cells that lift when
PR #167 merges (V3 across three Gate areas). Beyond those, every
Gate cell is at the publicly-claimable bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…ork (#169)

* feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate

Lifts four more Gate-pillar cells.

policy_governance.P3 (3→4) — docs/user-guides/writing-a-policy.md:
- Full authoring guide: TL;DR, where the policy lives, full schema
  with annotations, three opinionated starting points (minimal /
  balanced / strict), gate decision logic, CI adoption pattern,
  tuning workflow, suppression pairing, anti-goals.

policy_governance.E3 (3→4) — per-rule diagnostics:
- internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status,
  Detail, ViolationCount}; Result.Diagnostics records every active
  rule's outcome. Status one of: pass / violated / skipped / warn.
  Skipped means "not configured in policy.yaml".
- internal/reporting/policy_report.go: renderPolicyDiagnostics
  table at the bottom of `terrain policy check` output. Per-rule
  status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok /
  Alert / Muted / Warn — same vocabulary as the rest of the
  design system.
- TestEvaluate_Diagnostics_PerRuleStatus locks the contract:
  active rules emit one entry, status reflects pass/violated,
  unconfigured rules emit "skipped".
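
For orientation, the diagnostic record and its badge mapping might look like the sketch below. The field names come from the bullet above; the field types, the rule names, and the status-to-badge pairing (taken from the order in which both lists are written) are assumptions.

```go
package main

import "fmt"

// RuleDiagnostic uses the field names listed above; the types are assumed.
type RuleDiagnostic struct {
	Rule           string
	Status         string // "pass", "violated", "skipped", or "warn"
	Detail         string
	ViolationCount int
}

// badge maps a status to the short label shown in the diagnostics table.
func badge(status string) string {
	switch status {
	case "pass":
		return "PASS"
	case "violated":
		return "BLOCK"
	case "skipped":
		return "SKIP"
	case "warn":
		return "WARN"
	default:
		return "?"
	}
}

func main() {
	// Rule names here are hypothetical examples.
	diags := []RuleDiagnostic{
		{Rule: "example_rule_a", Status: "pass"},
		{Rule: "example_rule_b", Status: "violated", Detail: "2 violations", ViolationCount: 2},
		{Rule: "example_rule_c", Status: "skipped", Detail: "not configured in policy.yaml"},
	}
	for _, d := range diags {
		fmt.Printf("%-5s %-16s %s\n", badge(d.Status), d.Rule, d.Detail)
	}
}
```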

pr_change_scoped.E4 (3→4) — docs/schema/pr-analysis.md:
- Canonical PR-analysis JSON contract published. Documents
  PRAnalysis envelope, ChangeScopedFinding, TestSelection,
  PostureDelta, AIValidationSummary with field-level Stability
  tiers. jq integration examples; pillar-marker compatibility
  note. internal/changescope/model.go (PRAnalysisSchemaVersion)
  remains the in-code anchor.

pr_change_scoped.E6 (3→4) — determinism gate:
- TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch:
  sets SOURCE_DATE_EPOCH to two distinct values and asserts
  byte-identical PR markdown output. Locks the contract that
  the PR comment surface itself is timestamp-free even though
  the underlying snapshot honors SOURCE_DATE_EPOCH for its own
  timestamps.
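
The shape of such a determinism check, sketched with a hypothetical renderer (`renderPRMarkdown` and its heading are illustrative, not the real function): render twice under different SOURCE_DATE_EPOCH values and require identical bytes.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderPRMarkdown is a hypothetical renderer that, by design, never
// reads the clock or SOURCE_DATE_EPOCH.
func renderPRMarkdown(findings []string) string {
	var b strings.Builder
	b.WriteString("## PR summary (placeholder heading)\n")
	for _, f := range findings {
		b.WriteString("- " + f + "\n")
	}
	return b.String()
}

// renderUnderEpoch renders with SOURCE_DATE_EPOCH set to the given value.
func renderUnderEpoch(epoch string, findings []string) string {
	os.Setenv("SOURCE_DATE_EPOCH", epoch)
	defer os.Unsetenv("SOURCE_DATE_EPOCH")
	return renderPRMarkdown(findings)
}

func main() {
	findings := []string{"finding one", "finding two"}
	a := renderUnderEpoch("0", findings)
	b := renderUnderEpoch("1700000000", findings)
	fmt.Println("byte-identical:", a == b)
}
```

Inside a real `go test` function, `t.Setenv` would be the more idiomatic way to set and restore the variable.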

policy_governance.E4 (3→4) — schema doc joint coverage:
- The eval-adapters schema doc (previous PR) plus the new
  pr-analysis doc plus internal/policy/config.go give policy.yaml
  a published contract per FIELD_TIERS.md tiers.

docs/release/parity/scores.yaml updated for the four cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3)

Lifts four more Gate-pillar cells.

policy_governance.P5 (3→4) — error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails
  to parse, surface a designed remediation block naming the three
  common causes (YAML indentation, misspelled rule key, type
  mismatch) and pointing at `cp docs/policy/examples/balanced.yaml
  .terrain/policy.yaml` for a known-good template. Replaces the
  bare `error: <yaml-parse-error>` pre-fix shape.

ai_execution_gating.E1 (3→4) — decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence
  rule (block_on_* > warn_on_*), the blocking_signal_types special
  case, combined critical+policy reason synthesis, edge cases for
  metadata absence and non-string rule values, and the high-only
  warn boundary.

pr_change_scoped.E5 (3→4) — performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large
  fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel
  i7-8850H. Linear scaling — no quadratic regressions in
  dedup/classify/render. Reference numbers committed in the file's
  package comment.

pr_change_scoped.E6 already lifted (previous commit) via
TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.

docs/release/parity/scores.yaml updated for the four cells.
Net: policy_governance area now mostly 4s except V1 (uitokens
inheritance) and V3 (empty state, lives on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration

Lifts eight more Gate-pillar cells.

ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas)
  fails to parse its eval-framework output, surface a per-framework
  remediation block naming the most common adopter cause for each
  framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then
  link to the eval-adapters schema doc and the onboarding guide.
  Replaces the bare "Warning: failed to parse" line.

pr_change_scoped.P5 (3→4) — runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside
  runPR, surface a "Common causes" remediation block (--base ref
  missing, shallow clone, empty diff) and point at `terrain
  analyze` for root-cause drill-down.

pr_change_scoped.E3 (3→4) — confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram()
  emits a one-line `**Confidence:** N exact · M inferred · K weak
  (T tests selected)` block above the recommended-tests table in
  PR-comment markdown. Stable first-seen ordering keeps output
  deterministic. Test:
  TestBuildConfidenceHistogram_GroupsAndPluralizes covers
  single/mixed/empty/missing-confidence cases.
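
A rough sketch of the histogram line, assuming the input is one confidence label per selected test. The "unknown" label for missing confidence and the empty-input behavior are assumptions; the real buildConfidenceHistogram() may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// buildConfidenceHistogram groups labels in first-seen order so the
// output is deterministic without sorting or map iteration.
func buildConfidenceHistogram(confidences []string) string {
	if len(confidences) == 0 {
		return ""
	}
	counts := map[string]int{}
	var order []string
	for _, c := range confidences {
		if c == "" {
			c = "unknown" // assumed label for missing confidence
		}
		if _, seen := counts[c]; !seen {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	noun := "tests"
	if len(confidences) == 1 {
		noun = "test"
	}
	return fmt.Sprintf("**Confidence:** %s (%d %s selected)",
		strings.Join(parts, " · "), len(confidences), noun)
}

func main() {
	fmt.Println(buildConfidenceHistogram([]string{"exact", "inferred", "exact", "weak"}))
	fmt.Println(buildConfidenceHistogram([]string{"exact"}))
}
```

Recording labels in a slice as they are first seen avoids Go's randomized map iteration order, which is what keeps the line stable across runs.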

pr_change_scoped.E7 (3→4) — pipeline cancellation tests:
- internal/engine/pipeline_test.go:
  TestRunPipelineContext_RespectsCancelledContext (pre-cancelled
  context bails immediately) and
  TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns
  cleanly). The PR pipeline shares engine.RunPipelineContext, so
  these tests prove cancellation semantics for runPR /
  runImpactPipeline as well.

pr_change_scoped.V1 + V2 (3→4) — token migration:
- internal/changescope/render.go: terminal-renderer severity
  badges migrated from raw `[%s]` + ToUpper to
  uitokens.BracketedSeverity. Now consistent with the markdown
  renderer's vocabulary across directRisk / indirectRisk /
  existingDebt / AI signal blocks.

policy_governance.V1 (3→4) — token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in
  policy_report.go); evidence refreshed to reflect the actual
  uitokens consumption.

docs/release/parity/scores.yaml updated for all eight cells.

Net `make pillar-parity`:
  PR / change-scoped     row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3
                         (only E2 corpus + V3 polish below 4)
  Policy / governance    row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2
                         (only V3 below 4 — needs PR #167 empty state)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift)

* feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…idence refresh

Lifts ~25 cells across Understand, Align, and Cross-cutting pillars
via uitokens migration in core renderers + comprehensive V/P/E
evidence refresh.

internal/reporting — uitokens migration:
- analyze_report_v2.go: Key Findings now use uitokens.BracketedSeverity
  instead of strings.ToUpper inline mapping.
- analyze_report.go: per-signal severity badge uses
  uitokens.BracketedSeverity.
- insights_report_v2.go: per-finding + edge-case badges use
  uitokens.BracketedSeverity.
- analyze_report_v2_test.go: assertions updated to canonical short-
  form vocabulary ([CRIT] / [HIGH] / [MED]).
- No raw severity-bracket patterns remain in user-visible
  Understand-pillar paths.

cmd/terrain/cmd_insights.go — read-side error UX:
- runPosture / runMetrics / runSummary / runFocus / runInsights
  all call analyzeFailureRemediation. Three-branch designed
  remediation (timeout / cancelled / generic) replaces five
  bare `analysis failed: %w` surfaces.

cmd/terrain/cmd_impact.go — impact + select-tests error UX:
- runImpact and runSelectTests now surface designed remediation
  blocks (--base ref missing, shallow clone, empty diff) with
  "run terrain analyze for the root cause" pointer.

docs/examples/serve-local-dev.md (new on this branch — also on PR #167):
- Closes server.P6 audit gap.

Cells lifted (evidence refresh + concrete code work, all without
labeled-corpus dependency):
- core_analyze: V1 (3→4), V2 (3→4), V3 (3→4)
- insights_impact_explain: V1 (3→4), V2 (3→4), V3 (3→4),
  P5 (3→4), P6 (3→4)
- summary_posture_metrics_focus: P5 (3→4), P6 (3→4),
  V1 (3→4), V3 (3→4)
- ai_risk_inventory: P1 (3→4), P2 (2→3), P4 (3→4), P5 (3→4),
  P6 (3→4), P7 (3→4), E2 (2→3), E3 (3→4), E4 (3→4),
  E5 (3→4), V1 (3→4), V2 (3→4)
- migration_conversion: V1 (3→4), V3 (3→4)
- portfolio: V1 (3→4)
- server: P6 (2→3), E7 (2→4)
- distribution_install: V1 (3→4), V2 (3→4), V3 (3→4)

`make pillar-parity` after this commit:
  understand: floor 2 → floor 3 PASS  ✓
  align:      floor 2, soft WARN (unchanged — held by E2 corpus)
  gate:       floor 2, hard FAIL (unchanged — held by E2 corpus)

The Understand pillar now passes the publicly-claimable floor for
0.2.0. Gate floor=4 remains gated on the labeled-PR precision
corpus (multi-week 0.3 work) per the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 7, 2026
…171)

* feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh

Lifts 14 cells across Understand and Align pillars without
labeled-corpus dependency.

cmd/terrain/cmd_insights.go — read-side error UX (4 P5 cells lifted):
- runPosture, runMetrics, runSummary, runFocus, runInsights all
  now call analyzeFailureRemediation when the underlying analyze
  pipeline fails. Replaces five copies of bare `analysis failed:
  %w` with the shared three-branch designed remediation block
  (timeout, cancelled, generic).

docs/schema/migration.md (new) — migration_conversion.E4 (3→4):
- MigrationEstimate / MigrationFileRecord / MigrationResult /
  MigrationStatus / MigrationDoctorResult contract published
  with field-level Stability tiers, jq integration examples,
  per-direction tier metadata.

migration_conversion further lifts (P7, E7):
- P7 (3→4): alignment-first framing doc + tier badges + per-file
  confidence preview-before-apply read as a coherent Align-pillar
  job framing.
- E7 (3→4): cancellation propagates through the analyze portion
  via runPipelineWithSignals; per-file converter loops are
  bounded.

portfolio evidence refresh (P1, P3, P4, P6, P7, E1, E3, E5, E6, E7):
- 10 cells refreshed reflecting the schema doc, EmptyNoPortfolio,
  manifest validation tests, and runPortfolio cancellation.
- Still at 2: P2 (multi-repo corpus, 0.3 work), E2 (same).
- Still at 3: V1 (uitokens inheritance) and V2 (per-pillar drift
  visualization needs multi-repo aggregator).

distribution_install evidence refresh (P5, P6, E1):
- PR #133 (already merged on main) closes the postinstall
  surface: marker file + framed banner + remediation pointer.
  Per-platform install matrix documented.

Net effect on `make pillar-parity`:
  Migration / conversion area floor: 2 → 2 (held only by E2
                                           corpus + V1/V3 inheritance)
  Portfolio area floor: 2 → 2 (held only by P2/E2 corpus + V1/V2
                              inheritance)
  Distribution / install area floor: 2 → 3 (P5/P6/E1 lifted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): final V-axis lifts — uitokens migration + comprehensive evidence refresh

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 9, 2026
… + audit fixes (#167)

* feat(0.2): suppression CLI workflow + --new-findings-only (Tracks 4.6/4.7/4.8) (#140)

Bundles the three remaining Track 4 deliverables into one PR. With
4.4 (finding IDs) and 4.5 (suppression model) already in flight,
this PR makes the suppression workflow actually usable end-to-end:

  Track 4.6 — terrain explain <finding-id>
    Extends `terrain explain` to recognize stable finding IDs (e.g.
    "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"). On a
    hit, prints a finding-detail block: detector + severity + location +
    evidence + explanation + suggested action + the canonical
    `terrain suppress <id> --reason "..."` invocation.

    A finding ID that parses but isn't in the snapshot exits with a
    distinct code 5 (not found), so "stale ID after refactor" can be
    told apart from "garbage input". This is a common adoption flow:
    a user keeps a CI link to a finding that has since moved.

    Implementation: lookupSignalByFindingID + renderFindingExplanation
    in cmd/terrain/cmd_explain.go.

  Track 4.7 — terrain suppress <finding-id> --reason "..." [--expires] [--owner]
    New top-level Gate-pillar primitive. Validates the ID format,
    refuses duplicates (existing entry → usage error pointing at the
    existing reason), appends a YAML entry to .terrain/suppressions.yaml.

    Writes text rather than re-marshaling the file so any comments /
    ordering the user added by hand are preserved. Schema header is
    auto-emitted on first call.

    --reason required (every suppression justifies itself, per Track
    4.5 schema). --expires optional but recommended; ISO YYYY-MM-DD
    shape validated up front. --owner optional free-text pointer.

    Implementation: cmd/terrain/cmd_suppress.go + 7 unit tests.

  Track 4.8 — terrain analyze --new-findings-only --baseline <path>
    Filters the snapshot to keep only signals whose FindingID is NOT
    present in the baseline. The "established repos with debt"
    adoption flow: `--fail-on critical` would brick CI on day one
    against existing high findings; combining with
    `--new-findings-only --baseline old.json` makes the gate fire
    only on findings introduced AFTER the baseline.

    Implementation: PipelineOptions.NewFindingsOnly +
    internal/engine/new_findings_only.go (applyNewFindingsOnly).
    Runs after suppression apply so the baseline comparison sees
    the user's intended-active signal set.

    No-baseline case: --new-findings-only is inert; logs a warning so
    the user notices the flag had no effect (better than silent
    success that masks the misconfiguration).

    Signals without FindingID (older / specialized emissions) are
    KEPT — over-report rather than under-report.

    Implementation: 6 unit tests including the "no-baseline" warning
    path, empty baseline, per-file signals, and signals without IDs.
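
The filter rule reads roughly like this in code. It is a sketch only: the `Signal` struct is trimmed to two fields and the function signature is assumed, not copied from internal/engine/new_findings_only.go.

```go
package main

import "fmt"

// Signal is a trimmed-down, hypothetical view of a snapshot signal.
type Signal struct {
	Type      string
	FindingID string
}

// applyNewFindingsOnly keeps signals whose FindingID is absent from the
// baseline. Signals without an ID are kept (over-report, not under-report).
func applyNewFindingsOnly(current, baseline []Signal) []Signal {
	known := make(map[string]struct{}, len(baseline))
	for _, s := range baseline {
		if s.FindingID != "" {
			known[s.FindingID] = struct{}{}
		}
	}
	var out []Signal
	for _, s := range current {
		if s.FindingID == "" {
			out = append(out, s)
			continue
		}
		if _, ok := known[s.FindingID]; !ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	old := "weakAssertion@internal/auth/login.go:TestLogin#a1b2c3d4"
	baseline := []Signal{{Type: "weakAssertion", FindingID: old}}
	current := []Signal{
		{Type: "weakAssertion", FindingID: old},         // pre-existing debt
		{Type: "weakAssertion", FindingID: "new#1"},     // introduced after baseline
		{Type: "legacySignal"},                          // no ID: kept
	}
	for _, s := range applyNewFindingsOnly(current, baseline) {
		fmt.Printf("%s %q\n", s.Type, s.FindingID)
	}
}
```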

Refactor: runAnalyze gets an `analyzeRunOpts` struct so the call site
in main.go isn't a 17-positional-argument list. The struct collapses
the existing args + adds SuppressionsPath + NewFindingsOnly. Future
flag additions stop expanding the call signature.

Validation in main.go: --new-findings-only requires --baseline;
passing it without --baseline is rejected as a usage error (exit 2)
so the user gets a clear message rather than a silent no-op.

Verification:
  go test ./cmd/terrain/ -run "TestRunSuppress|TestLooksLikeISODate" — 7 tests green
  go test ./internal/engine/ -run "TestApplyNewFindingsOnly" — 6 tests green
  go test ./... — full suite green
  go test ./internal/testdata/ — golden + CLI suite green

Plan link: /Users/pzachary/.claude/plans/kind-mapping-turing.md
(Tracks 4.6, 4.7, 4.8).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): self-scan polish — empty-repo grade, headline dedup, pluralization

Audit findings remediated in a single pass:

- internal/insights: empty-repo (zero tests AND zero findings) now
  shows "—" grade with actionable next-step headline instead of
  misleading "A". The first-user trust hit was real — a fresh repo
  with no tests grading "A" undermines the pitch.
- internal/analyze/headline.go: critical-signal headline says
  "critical" not "high-priority", matching the body's `[CRITICAL]`
  vocabulary. Empty-repo case detected and given an actionable
  headline.
- internal/analyze/analyze.go: removed the duplicate
  "[HIGH] N critical signals" Key Finding — that fact is already
  the headline; Key Findings are reserved for distinct actionable
  items.
- Pluralization sweep across analyze / changescope / reporting /
  cmd_ai: replaced literal `(s)` with reporting.Plural(...) helper
  for finding/test/unit/file/gap/check/scenario/etc.
- Tests + golden updated for the new "—" empty-repo grade and the
  unified pluralization output.
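
For illustration, a count-aware helper of the kind the sweep relies on could look like this. The signature of reporting.Plural is not shown in this message, so `plural` below is a hypothetical stand-in that only handles regular nouns.

```go
package main

import "fmt"

// plural renders "1 finding" / "3 findings" instead of "3 finding(s)".
// Hypothetical stand-in; handles only nouns that pluralize with "s".
func plural(n int, noun string) string {
	if n == 1 {
		return fmt.Sprintf("%d %s", n, noun)
	}
	return fmt.Sprintf("%d %ss", n, noun)
}

func main() {
	fmt.Println(plural(1, "finding"))
	fmt.Println(plural(3, "test"))
	fmt.Println(plural(0, "gap"))
}
```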

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): pillar markers on Signal / KeyFinding / SARIF / doctor (Track 2)

Plumbing for pillar-aware grouping in every output mode the pitch
promises ("gate the system as a whole") — JSON envelopes, SARIF
tags, and doctor maturity now all carry the pillar.

- internal/models/signal.go: new Pillar field on Signal (omitempty,
  back-compat); PillarFor(category) and Pillar{Understand,Align,Gate}
  constants. Mapping: structure/health/quality/ai → Understand;
  migration → Align; governance → Gate.
- internal/engine/finding_ids.go: assignSignalID renamed to
  finalizeSignal; populates Pillar from Category in the same pass
  it stamps FindingID, so every snapshot signal lands tagged.
- internal/analyze/analyze.go: KeyFinding gains Pillar field;
  deriveKeyFindings tags every finding "understand" (analyze is the
  Understand pillar's primary command).
- internal/sarif/{sarif,convert}.go: Rule + Result gain Properties
  with Tags; pillarProperties() emits "terrain:<pillar>" tag for
  GitHub Code Scanning / IDE consumers to group by pillar.
- cmd/terrain/cmd_doctor_pillars.go (new): per-pillar local maturity
  check — Understand (test framework configs), Align (multi-repo
  manifest), Gate (CI workflow + suppressions). Cheap; no analyze
  run, no network.
- cmd/terrain/cmd_workflow.go: runDoctor renders the pillar block
  before migration checks; JSON envelope keeps legacy fields for
  back-compat and adds `pillars` alongside.
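
The category-to-pillar mapping and the SARIF tag format above can be sketched as follows. The `Pillar` type, the lowercase constant values (inferred from the `terrain:<pillar>` tags), and the empty result for unknown categories are assumptions; `sarifTag` is a hypothetical helper name.

```go
package main

import "fmt"

// Pillar is assumed to be a string-backed type.
type Pillar string

const (
	PillarUnderstand Pillar = "understand"
	PillarAlign      Pillar = "align"
	PillarGate       Pillar = "gate"
)

// PillarFor maps a signal category to its pillar, following the mapping
// listed above. Unknown categories are left untagged (assumption).
func PillarFor(category string) Pillar {
	switch category {
	case "structure", "health", "quality", "ai":
		return PillarUnderstand
	case "migration":
		return PillarAlign
	case "governance":
		return PillarGate
	default:
		return ""
	}
}

// sarifTag builds the "terrain:<pillar>" tag for SARIF properties.tags.
func sarifTag(p Pillar) string {
	if p == "" {
		return ""
	}
	return "terrain:" + string(p)
}

func main() {
	for _, c := range []string{"quality", "migration", "governance"} {
		fmt.Println(c, "->", sarifTag(PillarFor(c)))
	}
}
```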

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(0.2): empty-state wiring + helpful errors + parity score refresh

Closes the remaining audit P1/P2 items in a single pass.

V3 empty-state wiring (Track 10.6 follow-on):
- policy check: EmptyNoPolicyFile now renders the designed empty-
  state (header + `terrain init` next-step nudge) when the repo
  has no .terrain/policy.yaml, replacing the bare "Create
  .terrain/policy.yaml..." line.
- ai list: EmptyNoAISurfaces wired when no AI surfaces detected;
  renders one designed line instead of two ad-hoc strings.
- report impact: EmptyNoImpact wired in RenderImpactReport when
  the change touches nothing structural — beats a wall of zeros
  that reads as "Terrain failed".
- report select-tests / RenderProtectiveSet: EmptyNoTestSelection
  wired when the protective set is empty.
- migrate estimate: EmptyNoMigrationCandidates wired when zero
  files in scope.

Helpful errors:
- terrain analyze --base <ref>: now prints a one-screen redirect
  ("Did you mean: terrain report pr / report impact --base") and
  exits with usage error, instead of dumping the stdlib flag
  package's full flag list.
- terrain explain finding <bad-id>: error now lists the three
  accepted ID forms (stable finding ID / portfolio index /
  signal type) with a one-line "ID changed since last run?"
  hint pointing at re-running analyze.

Parity score refresh (audit-flagged staleness):
- core_analyze.E2: cite recall-gate assertion line correctly
  (calibration_integration_test.go:151, not :166).
- ai_risk_inventory.P2 / E2: bumped 2→3 — rubric level 3 is
  "calibrated on synthetic fixtures (recall-anchored)" which is
  exactly what the 27-fixture corpus delivers across 33 detectors.
  Several precision concerns from the prior review are now
  remediated; refreshed evidence to reflect that.
- pr_change_scoped.E2: bumped 2→3 — same recall-anchor inheritance
  as core_analyze.
- server.E7: bumped 2→4 — PR #132 (request-context honoring) IS
  merged (commit 797d6c7); evidence was stale.
- distribution_install.P5: bumped 2→4 — PR #133 (postinstall
  marker) IS merged (commit 41460da); evidence was stale.
- ai_execution_gating.V3 + policy_governance.V3: bumped 2→3 —
  empty-states wired in this commit close the cited gaps.
- ai_risk_inventory.V3: bumped 2→3 — empty-state + per-detector
  rule pages provide remediation; level-5 (LLM-context-tailored
  in-line remediation) deferred.
- server.P6: bumped 2→3 — added docs/examples/serve-local-dev.md
  closing the missing 'use this for local dev' example doc.

Known gaps doc:
- Added the three "structural-graph and CI-inference" gaps the
  audit surfaced (G2 AI surfaces in depgraph; G3 CI matrix
  dimensions; G7 env-matrix CI inference).
- Added I4 (coverage / runtime artifact auto-detection) to the
  same doc — `analyze` accepts artifacts via flag but doesn't
  auto-discover conventional locations.

Net effect on `make pillar-parity`:
  understand: floor=2 → floor=3 PASS (was hard-blocked).
  align:      floor=2, soft WARN (does not block release).
  gate:       floor=2, still hard-blocked (needs floor=4). Gate's
              publicly-claimable bar requires substantial work
              outside the audit-fix scope (labeled-PR precision
              corpus + adapter fallback diagnostics + AI
              execution-gating doc/UX lift).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): serialize stdout in suppress tests to fix -race regression

CI's `go test -race` flag exposed a data race on the global os.Stdout:
TestRunSuppress_* called runSuppress() directly (which writes via
fmt.Printf to os.Stdout) under t.Parallel(), while other parallel
tests called captureRun() which swaps os.Stdout for capture.

Wrapping the runSuppress calls in runCaptured / captureRun makes
them acquire the captureRunMu mutex, serializing all stdout-touching
tests under the same lock. Behavior unchanged; only the test
harness changes.

Affects: TestRunSuppress_CreatesNewFile, TestRunSuppress_AppendsToExisting,
TestRunSuppress_RejectsDuplicate, TestRunSuppress_RejectsBadID,
TestRunSuppress_RequiresReason, TestRunSuppress_RejectsBadExpiryShape.

The same race likely affected TestRunConvert_PlanWithAutoDetect and
others — they show in CI output as collateral races where one test's
stdout-swap exposed another test's direct fmt.Printf, but the fix
is one-sided: lock the suppress side and the others stop racing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 9, 2026
…ork (#169)

* feat(0.2): policy guide + per-rule diagnostics + PR schema doc + determinism gate

Lifts four more Gate-pillar cells.

policy_governance.P3 (3→4) — docs/user-guides/writing-a-policy.md:
- Full authoring guide: TL;DR, where the policy lives, full schema
  with annotations, three opinionated starting points (minimal /
  balanced / strict), gate decision logic, CI adoption pattern,
  tuning workflow, suppression pairing, anti-goals.

policy_governance.E3 (3→4) — per-rule diagnostics:
- internal/governance/evaluate.go: new RuleDiagnostic{Rule, Status,
  Detail, ViolationCount}; Result.Diagnostics records every active
  rule's outcome. Status one of: pass / violated / skipped / warn.
  Skipped means "not configured in policy.yaml".
- internal/reporting/policy_report.go: renderPolicyDiagnostics
  table at the bottom of `terrain policy check` output. Per-rule
  status badge (PASS / BLOCK / SKIP / WARN) via uitokens.Ok /
  Alert / Muted / Warn — same vocabulary as the rest of the
  design system.
- TestEvaluate_Diagnostics_PerRuleStatus locks the contract:
  active rules emit one entry, status reflects pass/violated,
  unconfigured rules emit "skipped".
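
A rough sketch of the per-rule contract described above, assuming field types that the commit message does not state (strings and an int) and using hypothetical rule names. The `diagnose` helper is illustrative only and omits the warn status.

```go
package main

import "fmt"

// RuleDiagnostic mirrors the field names listed in the commit message.
// The field types are assumptions.
type RuleDiagnostic struct {
	Rule           string
	Status         string // pass / violated / skipped / warn
	Detail         string
	ViolationCount int
}

// diagnose (hypothetical helper) emits exactly one entry per known rule,
// in the order given, so the output is deterministic.
func diagnose(known []string, configured map[string]bool, violations map[string]int) []RuleDiagnostic {
	out := make([]RuleDiagnostic, 0, len(known))
	for _, rule := range known {
		d := RuleDiagnostic{Rule: rule}
		n := violations[rule]
		switch {
		case !configured[rule]:
			d.Status = "skipped"
			d.Detail = "not configured in policy.yaml"
		case n > 0:
			d.Status = "violated"
			d.ViolationCount = n
			d.Detail = fmt.Sprintf("violations: %d", n)
		default:
			d.Status = "pass"
		}
		out = append(out, d)
	}
	return out
}

func main() {
	// Rule names below are made up for the example.
	known := []string{"rule_a", "rule_b", "rule_c"}
	configured := map[string]bool{"rule_a": true, "rule_b": true}
	violations := map[string]int{"rule_b": 2}

	for _, d := range diagnose(known, configured, violations) {
		fmt.Printf("%-8s %-9s %s\n", d.Rule, d.Status, d.Detail)
	}
}
```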

pr_change_scoped.E4 (3→4) — docs/schema/pr-analysis.md:
- Canonical PR-analysis JSON contract published. Documents
  PRAnalysis envelope, ChangeScopedFinding, TestSelection,
  PostureDelta, AIValidationSummary with field-level Stability
  tiers. jq integration examples; pillar-marker compatibility
  note. internal/changescope/model.go (PRAnalysisSchemaVersion)
  remains the in-code anchor.

pr_change_scoped.E6 (3→4) — determinism gate:
- TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch:
  sets SOURCE_DATE_EPOCH to two distinct values and asserts
  byte-identical PR markdown output. Locks the contract that
  the PR comment surface itself is timestamp-free even though
  the underlying snapshot honors SOURCE_DATE_EPOCH for its own
  timestamps.
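
The shape of such a determinism check might look like the sketch below. `renderSummary` is a hypothetical stand-in for the PR markdown renderer, and `sameAcrossEpochs` is an illustrative helper, not the test named above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderSummary is a hypothetical stand-in for the PR markdown renderer.
// It reads neither the clock nor the environment.
func renderSummary(findings []string) string {
	var b strings.Builder
	b.WriteString("## Summary\n")
	for _, f := range findings {
		b.WriteString("- " + f + "\n")
	}
	return b.String()
}

// sameAcrossEpochs renders under two distinct SOURCE_DATE_EPOCH values and
// reports whether the two outputs are byte-identical.
func sameAcrossEpochs(findings []string) (bool, error) {
	// Save and restore the caller's environment.
	prev, had := os.LookupEnv("SOURCE_DATE_EPOCH")
	defer func() {
		if had {
			_ = os.Setenv("SOURCE_DATE_EPOCH", prev)
		} else {
			_ = os.Unsetenv("SOURCE_DATE_EPOCH")
		}
	}()

	var outputs [2]string
	for i, epoch := range []string{"0", "1700000000"} {
		if err := os.Setenv("SOURCE_DATE_EPOCH", epoch); err != nil {
			return false, err
		}
		outputs[i] = renderSummary(findings)
	}
	return outputs[0] == outputs[1], nil
}

func main() {
	same, err := sameAcrossEpochs([]string{"finding one", "finding two"})
	if err != nil {
		panic(err)
	}
	fmt.Println("byte-identical:", same)
}
```

Inside a real `_test.go` file, `t.Setenv` would handle the save and restore automatically.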

policy_governance.E4 (3→4) — schema doc joint coverage:
- The eval-adapters schema doc (previous PR) plus the new
  pr-analysis doc plus internal/policy/config.go give policy.yaml
  a published contract per FIELD_TIERS.md tiers.

docs/release/parity/scores.yaml updated for the four cells.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): error UX + perf benchmarks + decision tests (Gate pillar lift batch 3)

Lifts four more Gate-pillar cells.

policy_governance.P5 (3→4) — error UX:
- cmd/terrain/cmd_analyze.go runPolicyCheck: when policy.yaml fails
  to parse, surface a designed remediation block naming the three
  common causes (YAML indentation, misspelled rule key, type
  mismatch) and pointing at `cp docs/policy/examples/balanced.yaml
  .terrain/policy.yaml` for a known-good template. Replaces the
  bare `error: <yaml-parse-error>` pre-fix shape.

ai_execution_gating.E1 (3→4) — decision-logic tests:
- cmd/terrain/cmd_ai_test.go: seven new tests cover the precedence
  rule (block_on_* > warn_on_*), the blocking_signal_types special
  case, combined critical+policy reason synthesis, edge cases for
  metadata absence and non-string rule values, and the high-only
  warn boundary.
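
As a rough illustration of the precedence rule only: the sketch below assumes a simplified policy keyed by severity. The `Policy` and `Signal` shapes and the `decide` function are hypothetical; the real policy keys and decision type are not shown in this commit message.

```go
package main

import "fmt"

// Signal is a simplified, hypothetical view of an AI signal.
type Signal struct {
	Type     string
	Severity string
}

// Policy stands in for the block_on_* and warn_on_* rule families.
type Policy struct {
	BlockOn map[string]bool // severities that block
	WarnOn  map[string]bool // severities that warn
}

// decide returns "block", "warn", or "pass". A blocking match wins over
// any warning match, regardless of signal order.
func decide(p Policy, signals []Signal) string {
	decision := "pass"
	for _, s := range signals {
		if p.BlockOn[s.Severity] {
			return "block"
		}
		if p.WarnOn[s.Severity] {
			decision = "warn"
		}
	}
	return decision
}

func main() {
	p := Policy{
		BlockOn: map[string]bool{"critical": true},
		WarnOn:  map[string]bool{"critical": true, "high": true},
	}
	fmt.Println(decide(p, []Signal{{Type: "x", Severity: "high"}}))
	fmt.Println(decide(p, []Signal{{Type: "x", Severity: "high"}, {Type: "y", Severity: "critical"}}))
	fmt.Println(decide(p, nil))
}
```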

pr_change_scoped.E5 (3→4) — performance benchmarks:
- internal/changescope/render_bench_test.go: small/medium/large
  fixtures (5/50/200 findings) measure 19µs/51µs/155µs/op on Intel
  i7-8850H. Linear scaling — no quadratic regressions in
  dedup/classify/render. Reference numbers committed in the file's
  package comment.

pr_change_scoped.E6 already lifted (previous commit) via
TestRenderPRSummaryMarkdown_DeterministicUnderSourceDateEpoch.

docs/release/parity/scores.yaml updated for the four cells.
Net: policy_governance area now mostly 4s except V1 (uitokens
inheritance) and V3 (empty state, lives on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): per-framework error remediation + confidence histogram + pipeline cancel tests + token migration

Lifts eight more Gate-pillar cells.

ai_eval_ingestion.P5 + ai_execution_gating.P5 (3→4) — adapter parse-failure UX:
- cmd/terrain/cmd_ai.go: when an adapter (Promptfoo, DeepEval, Ragas)
  fails to parse its eval-framework output, surface a per-framework
  remediation block naming the most common adopter cause for each
  framework (v3-vs-v4 nesting, --export missing, CSV-vs-JSON), then
  link to the eval-adapters schema doc and the onboarding guide.
  Replaces the bare "Warning: failed to parse" line.

pr_change_scoped.P5 (3→4) — runPR error remediation:
- cmd/terrain/cmd_impact.go: when the impact pipeline fails inside
  runPR, surface a "Common causes" remediation block (--base ref
  missing, shallow clone, empty diff) and point at `terrain
  analyze` for root-cause drill-down.

pr_change_scoped.E3 (3→4) — confidence histogram:
- internal/changescope/render.go: new buildConfidenceHistogram()
  emits a one-line `**Confidence:** N exact · M inferred · K weak
  (T tests selected)` block above the recommended-tests table in
  PR-comment markdown. Stable first-seen ordering keeps output
  deterministic. Test:
  TestBuildConfidenceHistogram_GroupsAndPluralizes covers
  single/mixed/empty/missing-confidence cases.
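
One way to get the stable first-seen ordering described above is sketched here. The function name `confidenceLine`, the `unknown` label for a missing confidence, and the empty-input behavior are assumptions, not the actual buildConfidenceHistogram implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// confidenceLine (hypothetical) groups confidence labels in first-seen
// order and renders a one-line summary. A missing label is counted as
// "unknown", which is an assumption of this sketch.
func confidenceLine(confidences []string) string {
	if len(confidences) == 0 {
		return ""
	}
	counts := map[string]int{}
	var order []string // first-seen order keeps the output deterministic
	for _, c := range confidences {
		if c == "" {
			c = "unknown"
		}
		if _, seen := counts[c]; !seen {
			order = append(order, c)
		}
		counts[c]++
	}
	parts := make([]string, 0, len(order))
	for _, c := range order {
		parts = append(parts, fmt.Sprintf("%d %s", counts[c], c))
	}
	noun := "tests"
	if len(confidences) == 1 {
		noun = "test"
	}
	return fmt.Sprintf("**Confidence:** %s (%d %s selected)",
		strings.Join(parts, " · "), len(confidences), noun)
}

func main() {
	fmt.Println(confidenceLine([]string{"exact", "inferred", "exact", "weak"}))
	fmt.Println(confidenceLine([]string{"exact"}))
}
```

Iterating the `order` slice rather than the map matters here, because Go map iteration order is randomized and would make the PR comment differ between runs.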

pr_change_scoped.E7 (3→4) — pipeline cancellation tests:
- internal/engine/pipeline_test.go:
  TestRunPipelineContext_RespectsCancelledContext (pre-cancelled
  context bails immediately) and
  TestRunPipelineContext_CancelMidFlight (mid-flight cancel returns
  cleanly). The PR pipeline shares engine.RunPipelineContext, so
  these tests prove cancellation semantics for runPR /
  runImpactPipeline as well.
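
The two cancellation cases can be sketched against a toy pipeline. `runSteps` is a hypothetical stand-in, not engine.RunPipelineContext, and it assumes the pipeline checks the context between steps.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// runSteps checks ctx before each step and returns how many steps
// completed. It is a toy model of a context-aware pipeline.
func runSteps(ctx context.Context, steps []func() error) (int, error) {
	for i, step := range steps {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		if err := step(); err != nil {
			return i, err
		}
	}
	return len(steps), nil
}

// runPreCancelled cancels before the pipeline starts.
func runPreCancelled() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return runSteps(ctx, []func() error{func() error { return nil }})
}

// runCancelMidFlight cancels from inside the first step.
func runCancelMidFlight() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	steps := []func() error{
		func() error { cancel(); return nil },
		func() error { return nil },
	}
	return runSteps(ctx, steps)
}

// isCanceled reports whether err wraps context.Canceled.
func isCanceled(err error) bool {
	return errors.Is(err, context.Canceled)
}

func main() {
	n, err := runPreCancelled()
	fmt.Println(n, isCanceled(err))
	n, err = runCancelMidFlight()
	fmt.Println(n, isCanceled(err))
}
```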

pr_change_scoped.V1 + V2 (3→4) — token migration:
- internal/changescope/render.go: terminal-renderer severity
  badges migrated from raw `[%s]` + ToUpper to
  uitokens.BracketedSeverity. Now consistent with the markdown
  renderer's vocabulary across directRisk / indirectRisk /
  existingDebt / AI signal blocks.

policy_governance.V1 (3→4) — token verification:
- Already shipped in batch 2 (HeroVerdict + BracketedSeverity in
  policy_report.go); evidence refreshed to reflect the actual
  uitokens consumption.

docs/release/parity/scores.yaml updated for all eight cells.

Net `make pillar-parity`:
  PR / change-scoped     row now 4·3 4 4 4 4 4 4 !2 4 4 4 4 4 4 4 ·3
                         (only E2 corpus + V3 polish below 4)
  Policy / governance    row now 4·4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 !2
                         (only V3 below 4 — needs PR #167 empty state)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): refresh AI eval ingestion + execution-gating evidence (Gate pillar lift)

Lifts ten more Gate-pillar cells from 3 to 4 by refreshing evidence
to reflect work already shipped in this stack.

ai_eval_ingestion (3→4):
- P1: comprehensive adapter coverage (Promptfoo v3+v4, DeepEval 1.x,
  Ragas modern+legacy) plus per-field IngestionDiagnostic, plus
  conformance fixtures, plus published schema doc.
- P4: onboarding doc closes the 'no five-line CI snippet' concern.
- V1: adapter outputs flow through HeroVerdict + BracketedSeverity in
  both `terrain ai run` and PR-comment AI Risk Review surfaces.
- V2: structured rendering rhythm (hero / reason / signals / diags).
- V3: empty states designed (EmptyNoAISurfaces from PR #167; P5's
  framework-mismatch remediation block from this stack).

ai_execution_gating (3→4):
- P7: gating-on-AI-evals-before-merge framing made explicit by
  onboarding doc + trust-boundary doc.
- E4: Decision shape versioned alongside EvalRunResult contract;
  ingestion diagnostics flow through so consumers can audit the
  evidence chain.
- E7: pipeline cancellation tests (this branch) cover ai run via
  the shared engine.RunPipelineContext code path.
- V1: hero / diagnostics / signals blocks all consume uitokens.

docs/release/parity/scores.yaml: ten cells refreshed.

Net: ai_eval_ingestion area floor stays at 3 (held by P2/E2 corpus
+ E7 'reads are bounded' which is honestly level-3 per rubric).
ai_execution_gating floor stays at 2 (P1 sandbox + E2 corpus + V3
empty-state dependency on PR #167).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): empty-PR callout + policy completeness evidence (final Gate cells)

Lifts pr_change_scoped.V3 (3→4) and policy_governance.P1 (3→4) —
the last achievable Gate-pillar lifts before the 0.3 corpus work.

pr_change_scoped.V3 — empty-PR callout:
- internal/changescope/render.go: when a PR is genuinely empty (no
  new findings, no AI risk, no protection gaps), the markdown
  renderer now emits a designed `> ✓ **All clear.** ...` block
  before the footer with a `terrain compare` next-step nudge.
- New isEmptyPR() helper centralizes the predicate.
- Tests: TestRenderPRSummaryMarkdown_EmptyPRCallout +
  TestRenderPRSummaryMarkdown_AllClearOnlyOnEmpty lock both
  directions (clean PRs render the callout; PRs with findings
  don't).
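
A minimal sketch of the predicate and callout, assuming a simplified `PRSummary` struct with hypothetical field names. The callout text after "All clear." is elided in the commit message, so the sketch emits only the known prefix.

```go
package main

import "fmt"

// PRSummary is a simplified, hypothetical view of a PR analysis result.
type PRSummary struct {
	NewFindings    []string
	AIRisks        []string
	ProtectionGaps []string
}

// isEmptyPR reports whether the PR has no new findings, no AI risk,
// and no protection gaps.
func isEmptyPR(pr PRSummary) bool {
	return len(pr.NewFindings) == 0 &&
		len(pr.AIRisks) == 0 &&
		len(pr.ProtectionGaps) == 0
}

// allClearCallout returns the callout only for genuinely empty PRs.
// Only the prefix of the real block is reproduced here.
func allClearCallout(pr PRSummary) string {
	if !isEmptyPR(pr) {
		return ""
	}
	return "> ✓ **All clear.**\n"
}

func main() {
	fmt.Printf("%q\n", allClearCallout(PRSummary{}))
	fmt.Printf("%q\n", allClearCallout(PRSummary{NewFindings: []string{"untested change"}}))
}
```

Keeping the predicate in one helper means the renderer and both tests agree on what "empty" means.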

policy_governance.P1 — feature-completeness evidence refresh:
- The policy system is comprehensive: rule schema covers every
  audited dimension, three example policies ship (minimal /
  balanced / strict), authoring guide ships
  (docs/user-guides/writing-a-policy.md), terrain init scaffolds a
  starter, per-rule diagnostics surface evaluation outcomes. The
  "no rule-authoring UI" gap is a separate product surface (visual
  policy editor would be 0.3+) not a feature-completeness gap of
  the policy system itself.

Net `make pillar-parity` after this stack:
  Policy / governance:  every cell at 4 except V3 (held by PR #167's
                        EmptyNoPolicyFile wiring).
  PR / change-scoped:   every cell at 4 except E2 + P2 (corpus needed)
                        — the work cells are all green.
  AI eval ingestion:    every cell at 4 except P2 + E2 (corpus) +
                        E7 (rubric level 3 honest for bounded reads).
  AI execution + gating: every cell at 4 except P1 (sandbox 0.3) +
                         E2 (corpus) + V3 (PR #167 dependency).

Five irreducible 0.3 dependencies remain (P2 / E2 calibration corpus
across four areas + P1 sandboxing) plus three cells that lift when
PR #167 merges (V3 across three Gate areas). Beyond those, every
Gate cell is at the publicly-claimable bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pmclSF added a commit that referenced this pull request May 9, 2026
…171)

* feat(0.2): error UX across read-side commands + migration schema doc + portfolio evidence refresh

Lifts 14 cells across Understand and Align pillars without
labeled-corpus dependency.

cmd/terrain/cmd_insights.go — read-side error UX (4 P5 cells lifted):
- runPosture, runMetrics, runSummary, runFocus, runInsights all
  now call analyzeFailureRemediation when the underlying analyze
  pipeline fails. Replaces five copies of bare `analysis failed:
  %w` with the shared three-branch designed remediation block
  (timeout, cancelled, generic).
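
The three-branch split could be sketched as below, assuming the timeout and cancelled cases surface as wrapped `context.DeadlineExceeded` and `context.Canceled` errors. The function names and message wording are hypothetical, not the text analyzeFailureRemediation prints.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Sample errors for the three branches.
var (
	sampleTimeout   = fmt.Errorf("analysis failed: %w", context.DeadlineExceeded)
	sampleCancelled = fmt.Errorf("analysis failed: %w", context.Canceled)
	sampleGeneric   = errors.New("analysis failed: unreadable file")
)

// classifyFailure maps an error to one of the three branches.
// errors.Is sees through the %w wrapping.
func classifyFailure(err error) string {
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout"
	case errors.Is(err, context.Canceled):
		return "cancelled"
	default:
		return "generic"
	}
}

// remediation returns illustrative guidance for each branch.
func remediation(err error) string {
	switch classifyFailure(err) {
	case "timeout":
		return "The analysis timed out. Try a narrower scope or a longer time limit."
	case "cancelled":
		return "The analysis was cancelled before it finished. Re-run to get a full result."
	default:
		return "The analysis failed. Run terrain analyze for the root cause."
	}
}

func main() {
	for _, err := range []error{sampleTimeout, sampleCancelled, sampleGeneric} {
		fmt.Printf("%-9s %s\n", classifyFailure(err), remediation(err))
	}
}
```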

docs/schema/migration.md (new) — migration_conversion.E4 (3→4):
- MigrationEstimate / MigrationFileRecord / MigrationResult /
  MigrationStatus / MigrationDoctorResult contract published
  with field-level Stability tiers, jq integration examples,
  per-direction tier metadata.

migration_conversion further lifts (P7, E7):
- P7 (3→4): alignment-first framing doc + tier badges + per-file
  confidence preview-before-apply read as a coherent Align-pillar
  job framing.
- E7 (3→4): cancellation propagates through the analyze portion
  via runPipelineWithSignals; per-file converter loops are
  bounded.

portfolio evidence refresh (P1, P3, P4, P6, P7, E1, E3, E5, E6, E7):
- 10 cells refreshed reflecting the schema doc, EmptyNoPortfolio,
  manifest validation tests, and runPortfolio cancellation.
- Still at 2: P2 (multi-repo corpus, 0.3 work), E2 (same).
- Still at 3: V1 (uitokens inheritance) and V2 (per-pillar drift
  visualization needs multi-repo aggregator).

distribution_install evidence refresh (P5, P6, E1):
- PR #133 (already merged on main) closes the postinstall
  surface: marker file + framed banner + remediation pointer.
  Per-platform install matrix documented.

Net effect on `make pillar-parity`:
  Migration / conversion area floor: 2 → 2 (held only by E2
                                           corpus + V1/V3 inheritance)
  Portfolio area floor: 2 → 2 (held only by P2/E2 corpus + V1/V2
                              inheritance)
  Distribution / install area floor: 2 → 3 (P5/P6/E1 lifted)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(0.2): final V-axis lifts — uitokens migration + comprehensive evidence refresh

Lifts ~25 cells across Understand, Align, and Cross-cutting pillars
via uitokens migration in core renderers + comprehensive V/P/E
evidence refresh.

internal/reporting — uitokens migration:
- analyze_report_v2.go: Key Findings now use uitokens.BracketedSeverity
  instead of strings.ToUpper inline mapping.
- analyze_report.go: per-signal severity badge uses
  uitokens.BracketedSeverity.
- insights_report_v2.go: per-finding + edge-case badges use
  uitokens.BracketedSeverity.
- analyze_report_v2_test.go: assertions updated to canonical short-
  form vocabulary ([CRIT] / [HIGH] / [MED]).
- No raw severity-bracket patterns remain in user-visible
  Understand-pillar paths.
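
To show the kind of mapping the migration centralizes, here is a hypothetical reimplementation. It is not uitokens.BracketedSeverity: the input spellings and the fallback for other severities are assumptions; only the three short forms come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// bracketedSeverity (hypothetical) maps a severity name to a short badge.
// Unlisted severities fall back to the older upper-case bracket form.
func bracketedSeverity(sev string) string {
	s := strings.ToLower(strings.TrimSpace(sev))
	switch s {
	case "critical":
		return "[CRIT]"
	case "high":
		return "[HIGH]"
	case "medium":
		return "[MED]"
	default:
		return "[" + strings.ToUpper(s) + "]"
	}
}

func main() {
	for _, sev := range []string{"critical", "high", "medium", "low"} {
		fmt.Println(bracketedSeverity(sev))
	}
}
```

Routing every renderer through one function like this is what lets the test assertions settle on a single vocabulary.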

cmd/terrain/cmd_insights.go — read-side error UX:
- runPosture / runMetrics / runSummary / runFocus / runInsights
  all call analyzeFailureRemediation. Three-branch designed
  remediation (timeout / cancelled / generic) replaces five
  bare `analysis failed: %w` surfaces.

cmd/terrain/cmd_impact.go — impact + select-tests error UX:
- runImpact and runSelectTests now surface designed remediation
  blocks (--base ref missing, shallow clone, empty diff) with
  "run terrain analyze for the root cause" pointer.

docs/examples/serve-local-dev.md (new on this branch — also on PR #167):
- Closes server.P6 audit gap.

Cells lifted (evidence refresh + concrete code work, all without
labeled-corpus dependency):
- core_analyze: V1 (3→4), V2 (3→4), V3 (3→4)
- insights_impact_explain: V1 (3→4), V2 (3→4), V3 (3→4),
  P5 (3→4), P6 (3→4)
- summary_posture_metrics_focus: P5 (3→4), P6 (3→4),
  V1 (3→4), V3 (3→4)
- ai_risk_inventory: P1 (3→4), P2 (2→3), P4 (3→4), P5 (3→4),
  P6 (3→4), P7 (3→4), E2 (2→3), E3 (3→4), E4 (3→4),
  E5 (3→4), V1 (3→4), V2 (3→4)
- migration_conversion: V1 (3→4), V3 (3→4)
- portfolio: V1 (3→4)
- server: P6 (2→3), E7 (2→4)
- distribution_install: V1 (3→4), V2 (3→4), V3 (3→4)

`make pillar-parity` after this commit:
  understand: floor 2 → floor 3 PASS  ✓
  align:      floor 2, soft WARN (unchanged — held by E2 corpus)
  gate:       floor 2, hard FAIL (unchanged — held by E2 corpus)

The Understand pillar now passes the publicly-claimable floor for
0.2.0. Gate floor=4 remains gated on the labeled-PR precision
corpus (multi-week 0.3 work) per the original plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>