The human-in-the-loop for AI coding — automated.
Catch when your AI agent lies, leaves typos, or skips work — then make it prove the work before it says "done."
npx @veltiq/groundtruth setup
Your agent says "Done! I added a rateLimiter to src/server.ts, fixed the timeout, and added tests." You commit and move on. Two weeks later production breaks: the rate limiter was never written. The summary lied, and nothing checked it against the diff.
groundtruth is the reviewer that does, on every turn — deterministically, with zero LLM calls for the check:
[Screenshot: when the summary lies. Every claim here is a phantom (the whole "codebase" was one README edit).]
[Screenshot: when it's honest. The same kind of summary, each claim backed by the real diff.]
Left unsupervised, AI agents confidently report work they never did. A 2026 study of 23,247 agentic pull requests (Gong et al., MSR'26) found that descriptions claiming changes that were never implemented are the single most common message-vs-code inconsistency (45.4%) — and those PRs were accepted 51.7% less often. Tests catch code that's wrong; nothing catches code that was simply never written but reported as done. That's the gap — and the faster agents code, the more slips through.
groundtruth closes it in two stages:
- Verify the claims. It reads the agent's end-of-turn summary, extracts each concrete claim, and grades it against the ground truth — which files changed, which symbols appear in the diff, whether tests or installs actually ran. Built on one rule: the diff doesn't lie.
- Make the agent prove it works (opt-in verify loop). Before finishing, the agent must run / screenshot / test the change against your original request, hunt for its own mistakes, and fix-and-recheck until it holds up.
→ higher-quality output you don't have to babysit. (How it compares to tests, manual review, and AI code reviewers.)
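To make the first stage concrete, here is a minimal sketch of pulling checkable claims out of a summary. Everything in it is hypothetical: `extractClaims`, the `Claim` shape, and the three regular expressions are illustrative assumptions, not groundtruth's actual extractor, and it handles only file and symbol claims.

```typescript
// Hypothetical sketch: these names and patterns are illustrative,
// not taken from the groundtruth source.
type ClaimKind = "file" | "symbol";

interface Claim {
  kind: ClaimKind;
  subject: string; // the path or identifier the summary mentions
}

// A path-like token: at least one slash and a file extension.
const PATH_PATTERN = /[\w.-]+\/[\w./-]+\.\w+/g;
// An identifier following a change verb, e.g. "added a rateLimiter".
const SYMBOL_PATTERN =
  /\b(?:added|removed|renamed)\s+(?:an?\s+|the\s+)?([A-Za-z_$][\w$]*)/g;
// Only treat a word as a symbol if it looks like code (camelCase, _ or $).
const LOOKS_LIKE_CODE = /[a-z][A-Z]|_|\$/;

function extractClaims(summary: string): Claim[] {
  const claims: Claim[] = [];
  for (const path of summary.match(PATH_PATTERN) ?? []) {
    claims.push({ kind: "file", subject: path });
  }
  SYMBOL_PATTERN.lastIndex = 0; // global regexes keep state between calls
  let match: RegExpExecArray | null;
  while ((match = SYMBOL_PATTERN.exec(summary)) !== null) {
    const word = match[1];
    if (word !== undefined && LOOKS_LIKE_CODE.test(word)) {
      claims.push({ kind: "symbol", subject: word });
    }
  }
  return claims;
}

const summary =
  'Done! I added a rateLimiter to src/server.ts, fixed the timeout, and added tests.';
const claims = extractClaims(summary);
```

In this sketch "added tests" is dropped because "tests" does not look like an identifier, and "fixed the timeout" yields nothing because it names no path or symbol.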
Requires Node ≥ 20. One command wires the Stop hook + verify loop + status line, idempotently:
npx @veltiq/groundtruth setup
Restart Claude Code (or run /hooks) and it checks every turn automatically.
Try it in 30 seconds · manual install · plugin
# See it catch a phantom change against a canned transcript — no install, no config:
npx @veltiq/groundtruth verify --transcript examples/phantom-change.jsonl --no-git
# Check the current session without installing anything:
npx @veltiq/groundtruth verify
# Just the claim-check hook (no loop), this project or globally:
npx @veltiq/groundtruth install
npx @veltiq/groundtruth install --global
Prefer plugins?
/plugin marketplace add veltiq/groundtruth
/plugin install groundtruth
The loop can never trap you: a per-session round cap always lets a turn finish, and GROUNDTRUTH_NO_LOOP=1 instantly pauses it.
transcript ─▶ Turn    ─▶ ( Evidence     + Claims ) ─▶ Verdicts ─▶ Report
              summary      diff           prose        per-claim
              + tools      ground truth   parse        check
| Verdict | Meaning |
|---|---|
| ✅ verified | Concrete evidence in the diff backs the claim. |
| ❌ unsupported | Concretely checkable and zero matching evidence — a phantom change. |
| review | Vague or semantic ("fixed the bug"): shown for attention, never a failure. |
A deliberate bias toward silence: false alarms get a tool like this uninstalled, so a claim is only unsupported when it's unambiguously checkable and nothing supports it. Everything fuzzy becomes review. It would rather miss a claim than wrongly accuse a correct one. → docs/how-it-works.md · docs/design.md
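The bias toward silence can be read as a small decision rule. The sketch below is an assumption about how such a rule might look, not groundtruth's implementation; `grade` and `GradeInput` are hypothetical names, and only the three verdict labels come from the table above.

```typescript
// Hypothetical sketch of the grading rule; the names are not groundtruth's API.
type Verdict = "verified" | "unsupported" | "review";

interface GradeInput {
  checkable: boolean;    // can the claim be checked unambiguously and mechanically?
  evidenceCount: number; // matching items found in the diff or tool calls
}

function grade(input: GradeInput): Verdict {
  // Anything fuzzy goes to review rather than risking a false accusation.
  if (!input.checkable) return "review";
  // Only an unambiguous claim with zero evidence is called unsupported.
  return input.evidenceCount > 0 ? "verified" : "unsupported";
}
```

Putting the `checkable` test first means a vague claim can never be reported as unsupported, whatever the evidence count says.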
The claim check grades a turn's words; the loop grades its behavior. With it on (setup enables it, or GROUNDTRUTH_LOOP=1), a turn that changed something is held at the Stop event, and the agent must verify the change in the way that fits the kind of work: open the page in a browser and read a screenshot (web), run the command (CLI), hit the endpoint (API), or run the tests (library). It then checks the result against your original request, fixes any mistakes, and finishes only once it passes. The loop never judges the work itself (so it has no false positives of its own), and a round cap means it can't loop forever. → docs/verify-loop.md
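A rough sketch of the hold-or-release decision at the Stop event follows. The environment variable names and the `enabled` / `maxRounds` keys appear in this README; how they combine, the `TurnState` fields, and the function name `shouldHoldTurn` are assumptions made for illustration. The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical sketch: the function and type names are illustrative, and the
// precedence between config and environment variables is an assumption.
interface LoopConfig {
  enabled: boolean;
  maxRounds: number;
}

interface TurnState {
  changedSomething: boolean;  // did this turn modify anything?
  roundsSoFar: number;        // verify rounds already spent in this session
  agentReportedPass: boolean; // the agent's own statement that verification passed
}

type Env = { [key: string]: string | undefined };

function shouldHoldTurn(state: TurnState, config: LoopConfig, env: Env): boolean {
  // The pause switch wins over everything else.
  if (env["GROUNDTRUTH_NO_LOOP"] === "1") return false;
  const enabled = config.enabled || env["GROUNDTRUTH_LOOP"] === "1";
  if (!enabled || !state.changedSomething) return false;
  // Round cap: once the budget is spent, the turn is always allowed to finish.
  if (state.roundsSoFar >= config.maxRounds) return false;
  // Otherwise hold until the agent reports that its own check passed.
  return !state.agentReportedPass;
}
```

The function only counts rounds and reads the agent's own report; it never inspects the work, which matches the claim that the loop adds no false positives of its own.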
CLI usage & flags
groundtruth verify # check the latest session for this project
groundtruth verify --transcript x.jsonl # a specific transcript
groundtruth verify --markdown # markdown (great as a PR comment)
groundtruth verify --json | --sarif # machine-readable / GitHub code scanning
groundtruth verify --strict # exit non-zero if anything is unsupported
groundtruth stats [--all] # local tally: verified / unsupported / review
groundtruth install --events Stop,SubagentStop,SessionEnd --statusline
By default the hook is non-blocking: it prints a report and gets out of the way. --strict (or GROUNDTRUTH_STRICT=1) makes it block on unsupported claims.
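The difference between the default and strict modes can be sketched as an exit-code rule. This is an illustration under assumptions: `resolveStrict` and `exitCodeFor` are hypothetical helpers, and the README only says strict mode exits non-zero, so the specific value 1 is a guess.

```typescript
// Hypothetical sketch: helper names and the exit value 1 are assumptions.
type Verdict = "verified" | "unsupported" | "review";
type Env = { [key: string]: string | undefined };

// Strict mode is on if either the --strict flag or GROUNDTRUTH_STRICT=1 is set.
function resolveStrict(strictFlag: boolean, env: Env): boolean {
  return strictFlag || env["GROUNDTRUTH_STRICT"] === "1";
}

function exitCodeFor(verdicts: Verdict[], strict: boolean): number {
  const unsupported = verdicts.filter((v) => v === "unsupported").length;
  // Non-blocking by default: report only. Review items never fail a run.
  return strict && unsupported > 0 ? 1 : 0;
}
```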
What it checks
| Claim | Example | Verified when… |
|---|---|---|
| file | "updated src/auth.ts" | that file was touched this turn |
| symbol | "added validateInput" | the identifier appears in the added/removed code |
| test | "added tests" | a test file changed or a test command ran |
| dependency | "installed zod" | a manifest changed or an install command ran |
| command | "ran the build" | a matching command ran via Bash (advisory) |
| action | "fixed the timeout bug" | not machine-checkable → flagged for review |
Full details in docs/claim-types.md.
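As a rough illustration of the "Verified when…" column, the sketch below checks four of the claim kinds against a small evidence record. The `Evidence` shape, `hasEvidence`, and every pattern (what counts as a test file, a manifest, or an install command) are assumptions chosen for the example; the real rules are in docs/claim-types.md.

```typescript
// Hypothetical sketch: the Evidence shape and all patterns are assumptions.
interface Evidence {
  changedFiles: string[]; // files touched this turn
  diffText: string;       // added and removed lines from the diff
  commands: string[];     // shell commands the agent ran
}

type CheckableKind = "file" | "symbol" | "test" | "dependency";

const TEST_FILE = /(^|\/)(__tests__|tests?)\/|\.(test|spec)\.\w+$/;
const TEST_COMMAND = /\b(?:npm|pnpm|yarn)\s+(?:run\s+)?test\b|\b(?:jest|vitest|pytest)\b/;
const MANIFEST = /(^|\/)(package\.json|requirements\.txt|Cargo\.toml|go\.mod)$/;
const INSTALL_COMMAND = /\b(?:npm|pnpm|yarn)\s+(?:install|add|i)\b|\bpip\s+install\b/;

function hasEvidence(kind: CheckableKind, subject: string, ev: Evidence): boolean {
  if (kind === "file") {
    return ev.changedFiles.indexOf(subject) !== -1;
  }
  if (kind === "symbol") {
    // Compare whole identifiers so "rate" does not match "rateLimiter".
    return ev.diffText.split(/[^\w$]+/).indexOf(subject) !== -1;
  }
  if (kind === "test") {
    return (
      ev.changedFiles.some((f) => TEST_FILE.test(f)) ||
      ev.commands.some((c) => TEST_COMMAND.test(c))
    );
  }
  // dependency: a manifest changed or an install command ran
  return (
    ev.changedFiles.some((f) => MANIFEST.test(f)) ||
    ev.commands.some((c) => INSTALL_COMMAND.test(c))
  );
}

// A turn whose only change was a README edit backs none of the claims.
const phantomTurn: Evidence = {
  changedFiles: ["README.md"],
  diffText: "+Now with rate limiting.",
  commands: [],
};

// A turn that actually did the work.
const honestTurn: Evidence = {
  changedFiles: ["src/server.ts", "src/server.test.ts", "package.json"],
  diffText: "+export function rateLimiter(req) {",
  commands: ["npm test"],
};
```

Command and action claims are left out on purpose: the table marks the first as advisory and the second as not machine-checkable.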
Use in CI · commit messages · pre-commit
Grade a PR description against its diff as a sticky comment (works on any PR, zero agent setup):
# .github/workflows/groundtruth.yml
name: groundtruth
on: pull_request
permissions: { contents: read, pull-requests: write }
jobs:
  claim-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with: { fetch-depth: 0 }
      - uses: veltiq/groundtruth@v0.6.1 # add with: { strict: true } to gate merges
Verify a commit message against the staged diff (drop in .git/hooks/commit-msg, or via pre-commit):
repos:
  - repo: https://github.com/veltiq/groundtruth
    rev: v0.6.1
    hooks:
      - id: groundtruth
Other agents · config · library API
verify reads other agents' transcripts too — the claim engine is agent-neutral:
groundtruth verify --agent codex|gemini|cursor|opencode|aider|auto
Optional .groundtruthrc.json (or a "groundtruth" key in package.json):
{
  "strict": false,
  "ignore": ["CHANGELOG.md", "*.generated.ts"],
  "ignoreKinds": ["command"],
  "loop": { "enabled": false, "maxRounds": 6 }
}
ignore is your escape hatch for any false positive. Use as a library:
import { runPipeline, renderMarkdown } from "@veltiq/groundtruth";
const report = runPipeline({ transcriptPath: "session.jsonl", cwd: process.cwd() });
console.log(renderMarkdown(report));
Privacy & honest limitations
- Runs entirely locally. Reads your transcript and git, writes nothing except on install. Zero network calls, zero runtime deps. The local tally (~/.groundtruth/ledger.jsonl) stores counts only, never code or prompts.
- It verifies claimed work exists in the diff, not that it's correct. That's what tests (and the verify loop) are for.
- Extraction favors precision over recall: it misses vague claims rather than risk a false accusation.
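To show what "counts only" could mean in practice, here is a small sketch of turning a turn's verdicts into one JSONL ledger line. The record shape and the name `toLedgerLine` are assumptions; the real ledger format is not documented here and may carry other non-sensitive fields.

```typescript
// Hypothetical sketch: the ledger record shape is an assumption.
type Verdict = "verified" | "unsupported" | "review";

interface LedgerEntry {
  verified: number;
  unsupported: number;
  review: number;
}

// Reduce verdicts to bare counts; no claim text, code, or prompt is kept.
function toLedgerLine(verdicts: Verdict[]): string {
  const entry: LedgerEntry = { verified: 0, unsupported: 0, review: 0 };
  for (const verdict of verdicts) {
    entry[verdict] += 1;
  }
  return JSON.stringify(entry);
}

const ledgerLine = toLedgerLine(["verified", "unsupported", "verified", "review"]);
// ledgerLine is '{"verified":2,"unsupported":1,"review":1}'
```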
Issues and PRs welcome — especially new claim patterns, agent adapters, and false-positive reports (those are gold). See CONTRIBUTING.md.
If groundtruth ever catches your agent in a lie, a ⭐ helps others find it.

