groundtruth

The human-in-the-loop for AI coding — automated.
Catch when your AI agent lies, leaves typos, or skips work — then make it prove the work before it says "done."

npx @veltiq/groundtruth setup

Your agent says "Done! I added a rateLimiter to src/server.ts, fixed the timeout, and added tests." You commit and move on. Two weeks later production breaks — the rate limiter was never written. The summary lied, and nothing checked it against the diff.

groundtruth is the reviewer that does that check on every turn, deterministically and with zero LLM calls for the check itself:

When the summary lies — every claim here is a phantom (the whole "codebase" was one README edit):

[Screenshot: groundtruth flags three claims the diff doesn't support]

When it's honest — the same kind of summary, each claim backed by the real diff:

[Screenshot: groundtruth verifies four honest claims against the diff]

Why

Left unsupervised, AI agents confidently report work they never did. A 2026 study of 23,247 agentic pull requests (Gong et al., MSR'26) found that descriptions claiming changes that were never implemented are the single most common message-vs-code inconsistency (45.4%) — and those PRs were accepted 51.7% less often. Tests catch code that's wrong; nothing catches code that was simply never written but reported as done. That's the gap — and the faster agents code, the more slips through.

groundtruth closes it in two stages:

1. Verify the claims. It reads the agent's end-of-turn summary, extracts each concrete claim, and grades it against the ground truth: which files changed, which symbols appear in the diff, and whether tests or installs actually ran. Built on one rule: the diff doesn't lie.
2. Make the agent prove it works (opt-in verify loop). Before finishing, the agent must run, screenshot, or test the change against your original request, hunt for its own mistakes, and fix and recheck until it holds up.
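
As a rough illustration of the evidence-gathering half of the first stage, here is a minimal sketch that pulls changed file paths and identifiers out of unified-diff text. The `Evidence` shape and the `collectEvidence` function are hypothetical names invented for this example; they are not groundtruth's actual API, and the real extraction is likely more careful.

```typescript
// Hypothetical shape for the evidence gathered from one turn's diff.
interface Evidence {
  files: Set<string>;       // paths named in the diff's file headers
  identifiers: Set<string>; // identifier-like tokens on added/removed lines
}

// Collect changed file paths and identifiers from unified-diff text.
// Simplification: a removed line whose content starts with "-- " would be
// mistaken for a file header; a real parser would track hunk boundaries.
function collectEvidence(diff: string): Evidence {
  const files = new Set<string>();
  const identifiers = new Set<string>();
  for (const line of diff.split("\n")) {
    // File headers look like "--- a/src/server.ts" or "+++ b/src/server.ts".
    const header = /^(?:\+\+\+|---) (?:[ab]\/)?(.+)$/.exec(line);
    if (header) {
      const path = header[1];
      if (path !== undefined && path !== "/dev/null") files.add(path);
      continue;
    }
    // Added and removed lines start with a single "+" or "-".
    if (line.startsWith("+") || line.startsWith("-")) {
      for (const token of line.slice(1).match(/[A-Za-z_$][\w$]*/g) ?? []) {
        identifiers.add(token);
      }
    }
  }
  return { files, identifiers };
}

// A turn whose only change was a README edit.
const sampleDiff = [
  "--- a/README.md",
  "+++ b/README.md",
  "@@ -1 +1 @@",
  "-Old intro",
  "+New intro for the project",
].join("\n");

const evidence = collectEvidence(sampleDiff);
console.log(evidence.files.has("README.md"));         // true
console.log(evidence.files.has("src/server.ts"));     // false
console.log(evidence.identifiers.has("rateLimiter")); // false
```

With evidence like this in hand, a claim such as "added a rateLimiter to src/server.ts" has nothing to match against, which is the phantom-change case described above.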

→ higher-quality output you don't have to babysit. (How it compares to tests, manual review, and AI code reviewers.)

Install

Requires Node ≥ 20. One command wires the Stop hook + verify loop + status line, idempotently:

npx @veltiq/groundtruth setup

Restart Claude Code (or run /hooks) and it checks every turn automatically.

Try it in 30 seconds · manual install · plugin

# See it catch a phantom change against a canned transcript — no install, no config:
npx @veltiq/groundtruth verify --transcript examples/phantom-change.jsonl --no-git

# Check the current session without installing anything:
npx @veltiq/groundtruth verify

# Just the claim-check hook (no loop), this project or globally:
npx @veltiq/groundtruth install
npx @veltiq/groundtruth install --global

Prefer plugins?

/plugin marketplace add veltiq/groundtruth
/plugin install groundtruth

The loop can never trap you: a per-session round cap always lets a turn finish, and GROUNDTRUTH_NO_LOOP=1 instantly pauses it.

How it works

transcript ─▶ Turn ─▶ ( Evidence + Claims ) ─▶ Verdicts ─▶ Report
            summary       diff      prose       per-claim
            + tools    ground truth  parse        check

Verdict	Meaning
✅ verified	Concrete evidence in the diff backs the claim.
❌ unsupported	Concretely checkable and zero matching evidence — a phantom change.
⚠️ review	Vague or semantic ("fixed the bug") — shown for attention, never a failure.

A deliberate bias toward silence: false alarms get a tool like this uninstalled, so a claim is only unsupported when it's unambiguously checkable and nothing supports it. Everything fuzzy becomes review. It would rather miss a claim than wrongly accuse a correct one. → docs/how-it-works.md · docs/design.md
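
The bias toward silence can be expressed as a very small decision rule. The sketch below assumes a hypothetical `Claim` shape in which some earlier step has already decided whether the claim is concretely checkable and counted the matching evidence; both fields are inventions for illustration, not the tool's real data model.

```typescript
type Verdict = "verified" | "unsupported" | "review";

// Hypothetical claim shape, for illustration only.
interface Claim {
  text: string;
  checkable: boolean;    // true only when the claim names something concrete
  evidenceCount: number; // matching pieces of evidence found in the diff
}

// Only an unambiguously checkable claim with zero evidence is "unsupported".
// Anything fuzzy falls through to "review", which is never a failure.
function grade(claim: Claim): Verdict {
  if (!claim.checkable) return "review";
  return claim.evidenceCount > 0 ? "verified" : "unsupported";
}

console.log(grade({ text: "added rateLimiter", checkable: true, evidenceCount: 0 })); // "unsupported"
console.log(grade({ text: "updated README.md", checkable: true, evidenceCount: 1 })); // "verified"
console.log(grade({ text: "fixed the bug", checkable: false, evidenceCount: 0 }));    // "review"
```

Note that the fuzzy case is checked first: a vague claim can never become unsupported, regardless of what the diff contains.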

Verify loop — make the agent prove it (opt-in)

The claim check grades a turn's words; the loop grades its behavior. With it on (setup enables it, or set GROUNDTRUTH_LOOP=1), a turn that changed something is held at the Stop event, and the agent must verify the change in the way that fits the kind of work: open the page in a browser and read a screenshot (web), run the command (CLI), hit the endpoint (API), or run the tests (library). It then checks the result against your original request, fixes any mistakes, and finishes only once it passes. The loop never judges the work itself (so it adds no false positives of its own), and a round cap means it can't loop forever. → docs/verify-loop.md
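
To make the "can't loop forever" guarantee concrete, here is a minimal sketch of how a Stop-time gate with a round cap might decide. `LoopState`, `decideStop`, and the `agentReportedVerification` flag are hypothetical names for this example; how groundtruth actually tracks rounds is described in docs/verify-loop.md, not here.

```typescript
type StopDecision = "allow" | "hold";

// Hypothetical per-session loop state, for illustration only.
interface LoopState {
  round: number;     // verification rounds already spent this session
  maxRounds: number; // the per-session cap
  disabled: boolean; // e.g. the loop was paused via GROUNDTRUTH_NO_LOOP=1
}

// Decide whether a turn may finish. The gate never inspects the work itself;
// it only looks at whether the agent reported doing its verification.
function decideStop(
  state: LoopState,
  turnChangedSomething: boolean,
  agentReportedVerification: boolean,
): StopDecision {
  if (state.disabled) return "allow";
  if (!turnChangedSomething) return "allow";          // nothing to verify
  if (agentReportedVerification) return "allow";      // the agent did its check
  if (state.round >= state.maxRounds) return "allow"; // round cap: always let go
  return "hold";
}

console.log(decideStop({ round: 2, maxRounds: 6, disabled: false }, true, false)); // "hold"
console.log(decideStop({ round: 6, maxRounds: 6, disabled: false }, true, false)); // "allow"
console.log(decideStop({ round: 0, maxRounds: 6, disabled: true }, true, false));  // "allow"
```

Every branch except the last one lets the turn finish, so the only way to be held is an unverified change with rounds still remaining.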

More

CLI usage & flags

groundtruth verify                       # check the latest session for this project
groundtruth verify --transcript x.jsonl  # a specific transcript
groundtruth verify --markdown            # markdown (great as a PR comment)
groundtruth verify --json | --sarif      # machine-readable / GitHub code scanning
groundtruth verify --strict              # exit non-zero if anything is unsupported
groundtruth stats [--all]                # local tally: verified / unsupported / review
groundtruth install --events Stop,SubagentStop,SessionEnd --statusline

By default the hook is non-blocking — it prints a report and gets out of the way. --strict (or GROUNDTRUTH_STRICT=1) makes it block on unsupported claims.

What it checks

Claim	Example	Verified when…
file	"updated `src/auth.ts`"	that file was touched this turn
symbol	"added `validateInput`"	the identifier appears in the added/removed code
test	"added tests"	a test file changed or a test command ran
dependency	"installed `zod`"	a manifest changed or an install command ran
command	"ran the build"	a matching command ran via Bash (advisory)
action	"fixed the timeout bug"	not machine-checkable → flagged for review

Full details in docs/claim-types.md.
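
For a feel of how prose might be sorted into the claim kinds in the table, here is a deliberately naive sketch covering four of them. It assumes claims name their target in backticks, as in the examples above; `extractClaims` and `kindOf` are hypothetical helpers, and the real patterns in docs/claim-types.md are certainly broader (for instance, this sketch would misread a dotted symbol like `foo.bar` as a file).

```typescript
// Claim kinds covered by this sketch (a subset of the table above).
type ClaimKind = "file" | "symbol" | "dependency" | "action";

interface ExtractedClaim {
  kind: ClaimKind;
  target: string; // the backticked token, or the whole sentence for "action"
}

// Guess the kind from the verb and the shape of the backticked token.
function kindOf(sentence: string, target: string): ClaimKind {
  if (/\binstalled\b/i.test(sentence)) return "dependency";
  if (/[\/.]/.test(target)) return "file"; // has a slash or a dot: treat as a path
  return "symbol";
}

function extractClaims(sentence: string): ExtractedClaim[] {
  const claims: ExtractedClaim[] = [];
  const backticked = /`([^`]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = backticked.exec(sentence)) !== null) {
    const target = match[1] ?? "";
    claims.push({ kind: kindOf(sentence, target), target });
  }
  // Nothing concrete was named, so the sentence can only be shown for review.
  if (claims.length === 0) {
    claims.push({ kind: "action", target: sentence.trim() });
  }
  return claims;
}

console.log(extractClaims("updated `src/auth.ts`"));  // one claim, kind "file"
console.log(extractClaims("added `validateInput`"));  // one claim, kind "symbol"
console.log(extractClaims("installed `zod`"));        // one claim, kind "dependency"
console.log(extractClaims("fixed the timeout bug"));  // one claim, kind "action"
```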

Use in CI · commit messages · pre-commit

Grade a PR description against its diff as a sticky comment (works on any PR, zero agent setup):

# .github/workflows/groundtruth.yml
name: groundtruth
on: pull_request
permissions: { contents: read, pull-requests: write }
jobs:
  claim-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with: { fetch-depth: 0 }
      - uses: veltiq/groundtruth@v0.6.1   # add  with: { strict: true }  to gate merges

Verify a commit message against the staged diff by dropping a hook into .git/hooks/commit-msg, or by using pre-commit:

repos:
  - repo: https://github.com/veltiq/groundtruth
    rev: v0.6.1
    hooks:
      - id: groundtruth

→ docs/github-action.md

Other agents · config · library API

verify reads other agents' transcripts too — the claim engine is agent-neutral:

groundtruth verify --agent codex   # or: gemini, cursor, opencode, aider, auto

Optional .groundtruthrc.json (or a "groundtruth" key in package.json):

{
  "strict": false,
  "ignore": ["CHANGELOG.md", "*.generated.ts"],
  "ignoreKinds": ["command"],
  "loop": { "enabled": false, "maxRounds": 6 }
}
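
As an illustration of what `ignore` entries like the ones above could mean in practice, here is a minimal sketch of glob matching against file paths. It assumes `*` matches any run of characters within one path segment and that a pattern may match either the full path or just the file name; both rules, and the names `globToRegExp` and `isIgnored`, are assumptions made for this example rather than documented groundtruth behavior.

```typescript
// Turn a simple glob (only "*" is special) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  // Escape every regex metacharacter except "*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // "*" matches any characters except a path separator.
  return new RegExp("^" + escaped.replace(/\*/g, "[^/]*") + "$");
}

// A path is ignored when any pattern matches the full path or its file name.
function isIgnored(path: string, patterns: string[]): boolean {
  const baseName = path.split("/").pop() ?? path;
  return patterns.some((pattern) => {
    const re = globToRegExp(pattern);
    return re.test(path) || re.test(baseName);
  });
}

const ignorePatterns = ["CHANGELOG.md", "*.generated.ts"];
console.log(isIgnored("CHANGELOG.md", ignorePatterns));         // true
console.log(isIgnored("src/api.generated.ts", ignorePatterns)); // true
console.log(isIgnored("src/server.ts", ignorePatterns));        // false
```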

ignore is your escape hatch for any false positive. Use as a library:

import { runPipeline, renderMarkdown } from "@veltiq/groundtruth";
const report = runPipeline({ transcriptPath: "session.jsonl", cwd: process.cwd() });
console.log(renderMarkdown(report));

Privacy & honest limitations

Runs entirely locally. Reads your transcript and git, writes nothing except on install. Zero network calls, zero runtime deps. The local tally (~/.groundtruth/ledger.jsonl) stores counts only — never code or prompts.
It verifies claimed work exists in the diff, not that it's correct — that's what tests (and the verify loop) are for.
Extraction favors precision over recall: it misses vague claims rather than risk a false accusation.

Contributing

Issues and PRs welcome — especially new claim patterns, agent adapters, and false-positive reports (those are gold). See CONTRIBUTING.md.

If groundtruth ever catches your agent in a lie, a ⭐ helps others find it.

License

MIT © Veltiq
