GovVerdict

npm · License: MIT · Language: TypeScript · Local-only

A merged verdict layer for the agent-gov toolchain. GovVerdict reads canonical JSON reports from ScopeTrail, PolicyMesh, CapabilityEcho, TaskBound, SessionTrail, and related tools, dedupes their findings, and renders one review.

One PR can produce five useful reports. Five separate comments are easy to ignore. GovVerdict collapses them into a single severity-ranked verdict so reviewers see the worst cross-tool signal first.

flowchart LR
    Scope["ScopeTrail<br/>config drift"] --> Gov
    Policy["PolicyMesh<br/>policy contradictions"] --> Gov
    Echo["CapabilityEcho<br/>code capability drift"] --> Gov
    Bound["TaskBound<br/>scope creep"] --> Gov
    Session["SessionTrail<br/>runtime behavior"] --> Gov
    Gov[("GovVerdict<br/>merge + dedupe + render")] --> Review["One PR review<br/>terminal · markdown · JSON"]

    classDef input fill:#1e293b,stroke:#334155,color:#e2e8f0
    classDef engine fill:#0f172a,stroke:#1e293b,color:#e2e8f0,stroke-width:2px
    classDef output fill:#0c4a6e,stroke:#0369a1,color:#e0f2fe
    class Scope,Policy,Echo,Bound,Session input
    class Gov engine
    class Review output

See also: agent-gov-core for the shared Finding schema · agent-gov-demo for the end-to-end sample PR.

Why this exists

When every detector posts separately, reviewers tune out. The same widened permission can appear in ScopeTrail and PolicyMesh. A low-severity transcript oddity can sit above the critical workflow permission that actually matters. Invalid reports can disappear into CI logs.

GovVerdict exists to make the suite feel like one reviewer: read every canonical report, dedupe stable fingerprints, surface invalid inputs, roll up the aggregate rating, and render one decision-ready summary.

What it merges

| Input | Role |
| --- | --- |
| ScopeTrail | PR-level agent config drift. |
| PolicyMesh | Contradictory current policy surfaces. |
| CapabilityEcho | New executable capability in code, manifests, workflows, and Dockerfiles. |
| TaskBound | Stated task vs. actual diff. |
| SessionTrail | Runtime behavior from JSONL transcripts. |
| AgentPulse / future tools | Any compatible canonical Report envelope. |

Quickstart

# Assumes the consumer tools already wrote canonical reports to ./reports/
npx govverdict@latest review --reports "reports/*.json" --format term

For the GitHub Action wiring, see examples/agent-gov-review.yml. It runs the suite tools, collects their reports, and hands the glob to Conalh/GovVerdict@v0.2.1.

Example output

Run against the fixture reports in this repo (test/fixtures/*-report.json):

$ npx govverdict review --reports "test/fixtures/*-report.json" --format term

GovVerdict: CRITICAL
====================

Sources: 5 report(s) — capability_echo, policy_mesh, scope_trail, session_trail, task_bound
Findings: 6 unique (deduped 0, dropped below threshold 0)

[CRITICAL] 1
  - capability_echo.workflow_permission_write: CI workflow grants contents: write. (.github/workflows/ci.yml:5)

[HIGH] 1
  - scope_trail.permission_allow_widened: Claude permission allowlist now includes Bash(npm *). (.claude/settings.json:12)

[MEDIUM] 3
  - capability_echo.high_capability_dep_added: New dependency puppeteer at version 22.0.0. (package.json:18)
  - policy_mesh.mcp_command_mismatch: MCP server fileserver disagrees across .mcp.json and .claude/mcp.json. (.mcp.json:4)
  - session_trail.unusual_runtime_tool: Agent invoked WebFetch 8 times — outlier for this conversation. (session.jsonl:142)

[LOW] 1
  - task_bound.out_of_scope_file: Modified docs/CHANGELOG.md outside declared scope src/auth/** (docs/CHANGELOG.md:1)

--format json emits a validated MergedReport envelope; --format md produces a collapsible Markdown summary ready for $GITHUB_STEP_SUMMARY.
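
The terminal layout above can be approximated by grouping findings by severity, worst first. The following is a minimal sketch, not GovVerdict's renderer: the `Finding` shape, the `location` field, and the alphabetical ordering inside each group are assumptions chosen to match the sample output.

```typescript
// Hypothetical shape for illustration; the real Finding type lives in agent-gov-core.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  ruleId: string;   // e.g. "scope_trail.permission_allow_widened"
  severity: Severity;
  message: string;
  location: string; // assumed pre-formatted as "path:line"
}

// Worst severity first, so the most important group is printed at the top.
const DESCENDING: Severity[] = ["critical", "high", "medium", "low"];

function renderTerm(findings: Finding[]): string {
  const lines: string[] = [];
  for (const severity of DESCENDING) {
    // filter() returns a new array, so sorting it does not mutate the input.
    const group = findings
      .filter((f) => f.severity === severity)
      .sort((a, b) => a.ruleId.localeCompare(b.ruleId)); // assumed ordering
    if (group.length === 0) continue;
    if (lines.length > 0) lines.push(""); // blank line between groups
    lines.push(`[${severity.toUpperCase()}] ${group.length}`);
    for (const f of group) {
      lines.push(`  - ${f.ruleId}: ${f.message} (${f.location})`);
    }
  }
  return lines.join("\n");
}
```

Empty severity groups are skipped rather than printed with a zero count, which is also an assumption based on the sample.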

How it works

  • Local-only. Reads JSON files off disk, writes one file or stdout. No network, no telemetry, no API keys. The only runtime dependency is agent-gov-core.
  • Substrate, not orchestrator. Wires existing agent-gov-core helpers: validateReport, mergeFindings, applyExceptions, generateWorkflowSummary, emitFindingAnnotation, anyAtOrAbove.
  • Dedup by fingerprint. Each consumer tool sets a stable Finding.fingerprint; findings with the same fingerprint collapse to one row, and the merged report's duplicateCollapsed counter records the collapse.
  • Never silently drop. Unreadable files, invalid envelopes, and individual malformed findings surface on invalidReports[] / invalidFindings[].
  • Exceptions with expiry. --exceptions baseline.jsonc suppresses active rules; expired rules re-surface as [EXPIRED WHITELIST] low-severity findings so baselines cannot quietly rot.
  • GitHub-aware. When GITHUB_ACTIONS=true, emits ::warning / ::error annotations and writes rating, findings-count, invalid-reports-count, and merged-report-path to $GITHUB_OUTPUT.
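
The "dedup by fingerprint" and "never silently drop" behaviors can be sketched together. This is an illustration under assumptions, not the `mergeFindings` helper from agent-gov-core: the `Finding` fields other than `fingerprint`, the `isFinding` check, and the `mergeByFingerprint` name are all hypothetical.

```typescript
// Hypothetical shapes; only `fingerprint`, `duplicateCollapsed`, and
// `invalidFindings` are names taken from the README.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  fingerprint: string; // stable ID set by the emitting tool
  ruleId: string;      // assumed field name
  severity: Severity;
  message: string;
}

interface MergeResult {
  findings: Finding[];
  duplicateCollapsed: number;
  invalidFindings: unknown[];
}

const SEVERITIES: string[] = ["low", "medium", "high", "critical"];

// Minimal structural check standing in for real schema validation.
function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const f = value as Record<string, unknown>;
  return (
    typeof f.fingerprint === "string" &&
    f.fingerprint.length > 0 &&
    typeof f.ruleId === "string" &&
    typeof f.message === "string" &&
    typeof f.severity === "string" &&
    SEVERITIES.indexOf(f.severity) !== -1
  );
}

function mergeByFingerprint(candidates: unknown[]): MergeResult {
  const seen = new Map<string, Finding>();
  const invalidFindings: unknown[] = [];
  let duplicateCollapsed = 0;

  for (const candidate of candidates) {
    if (!isFinding(candidate)) {
      invalidFindings.push(candidate); // surfaced to the caller, not discarded
      continue;
    }
    if (seen.has(candidate.fingerprint)) {
      duplicateCollapsed += 1; // keep the first copy, count the repeat
      continue;
    }
    seen.set(candidate.fingerprint, candidate);
  }

  return {
    findings: Array.from(seen.values()),
    duplicateCollapsed,
    invalidFindings,
  };
}
```

Keeping the first occurrence is an arbitrary choice for the sketch; the point is that every input ends up either in `findings`, in the duplicate count, or in `invalidFindings`.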

Design choices worth flagging

  • One observable outcome per input. A bad report is itself a reportable condition, not invisible noise.
  • Worst finding wins. The aggregate rating follows the highest surviving severity after thresholding and exceptions.
  • Dedup without erasing context. Fingerprints collapse duplicate findings while keeping counts so repeated tool agreement remains visible.
  • Thin layer. Cross-tool primitives belong in agent-gov-core; GovVerdict is the merge/render layer.
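
As a rough sketch of "worst finding wins", the rollup below filters by threshold, takes the highest surviving severity as the rating, and derives an exit code in the spirit of `--fail-on`. The `rollUp` function, the `Verdict` shape, and the `"none"` rating for an empty result are assumptions made for illustration; exceptions are left out to keep the example short.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Index in this array doubles as the severity rank.
const ORDER: Severity[] = ["low", "medium", "high", "critical"];
const rank = (s: Severity): number => ORDER.indexOf(s);

interface Finding {
  ruleId: string; // assumed field name
  severity: Severity;
}

interface Verdict {
  rating: Severity | "none"; // "none" is a hypothetical label for "nothing survived"
  surviving: Finding[];
  droppedBelowThreshold: number;
  exitCode: 0 | 1;
}

function rollUp(findings: Finding[], threshold?: Severity, failOn?: Severity): Verdict {
  // No threshold keeps everything; no failOn never fails.
  const minRank = threshold === undefined ? 0 : rank(threshold);
  const failRank = failOn === undefined ? Infinity : rank(failOn);

  const surviving = findings
    .filter((f) => rank(f.severity) >= minRank)
    .sort((a, b) => rank(b.severity) - rank(a.severity)); // worst first

  const worst = surviving[0];
  const fails = surviving.some((f) => rank(f.severity) >= failRank);

  return {
    rating: worst ? worst.severity : "none",
    surviving,
    droppedBelowThreshold: findings.length - surviving.length,
    exitCode: fails ? 1 : 0,
  };
}
```

Because the rating is computed after filtering, raising the threshold can only remove findings below it and never changes a rating that was already at or above the threshold.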

Options

CLI flags (govverdict review --reports <glob> [options]):

| Flag | Required | Default | Description |
| --- | --- | --- | --- |
| --reports <glob> | yes | | Glob of report JSON files. * and ? in basename only. A directory path expands to its *.json children. |
| --threshold <sev> | no | | Drop findings below this severity. One of low, medium, high, critical. |
| --exceptions <path> | no | | JSONC baseline. Either an array or { exceptions: [...] }. |
| --workflow-name <str> | no | | Propagated onto the merged report. Cross-walks to OpenTelemetry gen_ai.workflow.name. |
| --format <fmt> | no | term | One of term, md, json. The GitHub Action defaults to md for $GITHUB_STEP_SUMMARY. |
| --output <path> | no | stdout | Write to file instead of stdout. |
| --fail-on <sev\|none> | no | none | Exit 1 when any surviving finding meets or exceeds this severity. |

GitHub Action inputs mirror the CLI flags one-to-one; see action.yml.
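
For readers unfamiliar with the GitHub side, the sketch below shows the general shape of a workflow-command annotation and of `$GITHUB_OUTPUT` lines. It is not the `emitFindingAnnotation` helper from agent-gov-core: the `Finding` fields and the severity-to-level mapping (high and critical become `::error`, the rest `::warning`) are assumptions. The escaping follows the percent-encoding GitHub's workflow commands expect.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical shape for illustration.
interface Finding {
  ruleId: string;
  severity: Severity;
  message: string;
  file?: string;
  line?: number;
}

// Message data: escape %, CR, and LF.
function escapeData(s: string): string {
  return s.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values additionally escape ":" and ",".
function escapeProperty(s: string): string {
  return escapeData(s).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

function toAnnotation(f: Finding): string {
  // Assumed mapping; the real tool may draw the line elsewhere.
  const level = f.severity === "high" || f.severity === "critical" ? "error" : "warning";
  const props: string[] = [];
  if (f.file !== undefined) props.push(`file=${escapeProperty(f.file)}`);
  if (f.line !== undefined) props.push(`line=${f.line}`);
  const head = props.length > 0 ? `${level} ${props.join(",")}` : level;
  return `::${head}::${escapeData(`${f.ruleId}: ${f.message}`)}`;
}

// Each entry becomes one "name=value" line to append to the $GITHUB_OUTPUT file.
function toOutputLines(outputs: Record<string, string | number>): string[] {
  return Object.keys(outputs).map((name) => `${name}=${outputs[name]}`);
}
```

In a real action, the annotation strings would be printed to stdout and the output lines appended to the file named by the `GITHUB_OUTPUT` environment variable.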

Compatibility

GovVerdict consumes the canonical Report envelope (schemaVersion: "1.0") from agent-gov-core@^1.0.0. The consumer tools all emit that envelope starting at the versions below; older releases used pre-canonical shapes and are not supported:

| Tool | Minimum version |
| --- | --- |
| agent-gov-core | 1.0.0 |
| ScopeTrail | v0.2.0 |
| PolicyMesh | v0.5.0 |
| CapabilityEcho | v0.2.1 |
| TaskBound | v0.7.0 |
| SessionTrail | v0.6.1 |

Part of the agent-gov suite

Local-only OSS tools that review AI-agent PRs and coding sessions for config drift, policy mismatches, and scope creep. Every tool emits the same canonical Finding schema so GovVerdict can merge them.

| Repo | What it catches |
| --- | --- |
| ScopeTrail | Agent config drift between PR base and head. |
| PolicyMesh | Contradictory agent instructions and config drift that make behavior non-reproducible. |
| CapabilityEcho | Capability drift introduced by code, manifests, workflows, and Dockerfiles. |
| TaskBound | Scope creep between the stated task and the actual diff. |
| SessionTrail | Risky runtime behavior in Cursor / Claude Code / Codex session transcripts. |
| AgentPulse | Live local trajectory verdicts for active agent sessions. |
| GovVerdict (this repo) | Merges JSON reports from the tools above into one deduped review. |
| agent-gov-core | Shared parsers, the canonical Finding schema, and mergeFindings. |
| agent-gov-demo | Demo sandbox with a deliberately rogue PR that fires every tool. |

See the suite end-to-end on a real PR: agent-gov-demo#1.

License: MIT.
