A small CLI for agent-callable code review.
diffwarden lets coding agents request a review of local changes, a branch diff, or a
single commit, then receive Markdown or structured JSON findings. The CLI owns target
resolution, review prompting, parsing, validation, and rendering; reviewer SDKs and CLIs
stay behind adapters.
Install from the GitHub source release or a local checkout:
git clone https://github.com/aurokin/diffwarden.git
cd diffwarden
git checkout v0.2.7
pnpm install
pnpm build
pnpm link --global
diffwarden --versionFor local development without installing the binary:
pnpm install
pnpm build
pnpm dev -- --target uncommitted --reviewer fakeThe project requires Node >=22.19.0.
diffwarden --target uncommitted --reviewer fake
diffwarden --target base:main --reviewer cursor
diffwarden --target base:main --reviewer claude --model sonnet --effort high
diffwarden --target base:main --reviewer pi --model anthropic/claude-sonnet-4-5
diffwarden --target base:main --reviewer droid-cli --model claude-opus-4-7
diffwarden --target base:main --reviewer-set 2
diffwarden --target base:main --reviewer cursor --reviewer pi:openrouter-high
diffwarden --target commit:abc123 --format json
diffwarden --target base:main --reviewer-set 2 --report
diffwarden --target base:main --reviewer-set 2 --fail-on-findings P2Verify reviewer runtime, auth, model, and effort settings without reviewing a diff:
diffwarden doctor --reviewer cursor --model composer-2.5
diffwarden doctor --reviewer claude --model sonnet --effort high
diffwarden doctor --reviewer pi --model anthropic/claude-sonnet-4-5Supported v1 targets:
uncommittedbase:<branch>commit:<sha>custom:<text>
custom:<text> is for repository-scoped review instructions rather than a precomputed
patch. It still runs reviewer preflight, prompt assembly, parsing, schema validation,
path validation, aggregation, and rendering, but it does not collect a diff, populate
changed_files, embed a patch fence in the prompt, or validate findings against
changed-line overlap.
When no --reviewer or --reviewer-set is provided, config must define
defaultReviewerSet; otherwise the CLI exits with a config-required error. For local
development and credential-free tests, pass --reviewer fake explicitly.
Create a starter user config with:
diffwarden init--format selects how results are written to stdout:
| Format | Stable machine contract? | What stdout receives |
|---|---|---|
markdown (default) |
Human-readable, not a parsing contract | One rendered report after every reviewer finishes |
json |
Yes | One final ReviewArtifact JSON object after every reviewer finishes |
ndjson |
Yes (versioned event stream) | Newline-delimited review events as work progresses |
markdown and json are final-result-only and unchanged: stdout stays empty until
aggregation completes. They remain the right choice when you only need the finished
artifact.
ndjson streams typed review events for incremental consumers (agents, CI). Each line is
one JSON event carrying schema_version: 2:
diffwarden --target base:main --reviewer-set 2 --format ndjson{"schema_version":2,"type":"run_started","cwd":"…","target":{…},"reviewers":[{"id":"pi","engine":"pi"}]}
{"schema_version":2,"type":"preflight_started","reviewer_id":"pi"}
{"schema_version":2,"type":"preflight_finished","reviewer_id":"pi","ok":true,"timing_ms":120}
{"schema_version":2,"type":"reviewer_started","reviewer_id":"pi"}
{"schema_version":2,"type":"reviewer_result","reviewer_id":"pi","provisional":true,"artifact":{…}}
{"schema_version":2,"type":"final_result","artifact":{…}}Event-stream guarantees:
- Once
run_startedis emitted, the stream always ends with exactly one terminal frame:final_result(authoritative aggregatedReviewArtifact) orerror(an expected terminal failure such as all reviewers failing or a strict-mode violation). reviewer_resultevents are provisional (provisional: true): their findings are pre-aggregation and are not yet deduplicated or merged across reviewers. Onlyfinal_result.artifactis authoritative — treat it as the equivalent of--format json.- Under concurrency,
reviewer_result/reviewer_failedarrive in completion order, but thereviewersarray infinal_result.artifactalways follows selection order. --out,--report, and--fail-on-findingsoperate on the final artifact and behave identically across formats. Inndjsonmode a terminalerrorframe is emitted and the process exits non-zero without throwing, so the stream stays a clean sequence of frames.--verboseonly shapesmarkdownand is rejected together with--format ndjson.
Human progress (not a contract): when stdout is markdown or json and stderr is a
TTY, diffwarden prints per-reviewer progress lines to stderr so long multi-reviewer
runs are not silent. This is purely informational, is suppressed when stderr is not a TTY
(pipes, CI), and never appears in ndjson mode. Only stdout carries the stable contracts.
Reports are opt-in. Use --report to persist an analysis-friendly JSON record of a run:
diffwarden --target base:main --reviewer-set 2 --report
diffwarden --target custom:"Review auth paths" --reviewer pi --report --report-scope repo
diffwarden --target uncommitted --reviewer fake --report --report-dir ./tmp/reportsReports include the cwd, target mode, custom instructions for custom:<text> targets,
Diffwarden version, invocation options, config path/hash when a config is loaded, requested
and resolved reviewers, reviewer engine/transport/model metadata, adapter/preflight metadata,
adapter usage data when available, per-reviewer elapsed time and findings, failure summaries,
and precomputed finding counts. Diff-backed reports store a stable SHA-256 hash and byte count
for the reviewed patch; the patch text itself is not persisted in report provenance.
The default global store is under the user state directory; repo-scoped reports go under
.diffwarden/reports/. Reports may contain review text that echoes source or diff content,
so they are never written unless explicitly enabled by CLI or config. --out still writes one
requested ReviewArtifact; --report appends durable history.
Diffwarden includes a reusable skill for agents that want to call the installed CLI from another repository:
skills/diffwarden/
This skill is for agents using Diffwarden, not for agents developing this repo. Consumers should install it with the Skills CLI so their agent-specific skill directories and lockfiles stay consistent:
npx skills add aurokin/diffwarden --global --skill diffwarden --agent codex claude-code --full-depthFor local Diffwarden development, symlink the checkout skill into the local agent skill directories instead. This keeps skill edits live without reinstalling from a release:
pnpm install:skillThe local installer links skills/diffwarden/ into ~/.agents/skills/diffwarden and
~/.claude/skills/diffwarden. If ~/code/custom_skills exists, it also adds
diffwarden to .skills.local.json preserveGlobalSkillNames so that repo's global sync
does not remove the manually linked development skill.
Implemented:
- TypeScript CLI scaffold.
- Git target resolution for uncommitted, base branch, and single-commit reviews.
- Custom instruction targets for repository-scoped reviews.
- Fake reviewer for credential-free development.
- Review parsing, rendering, validation, and aggregation.
- Opt-in review history reports.
- Project/user
diffwarden.config.jsondiscovery. - Reviewer sets and
engine[:profile]reviewer specs. - Cursor, Claude, Pi, and Droid Agent SDK adapters.
- Experimental Codex app-server transport with shared server reuse and ephemeral read-only threads.
- Thin CLI transport adapters for Codex, Gemini, OpenCode, Grok, Antigravity, and CLI variants of Cursor, Claude, Pi, and Droid.
Not implemented:
- Publishing review comments to external services.
- npm publishing. GitHub source releases are available.
Read from top to bottom until you have enough detail:
README.md- quickstart, current status, and common commands.docs/comparisons.md- Codex review comparison and SDK vs CLI transport tradeoffs.docs/features.md- supported reviewer feature matrix.docs/configuration.md- config files, reviewer sets, and environment defaults.docs/adapters.md- SDK and CLI reviewer adapter behavior.docs/macos.md- macOS executable trust and performance triage.QUALITY.md- lint, typecheck, test, coverage, complexity, and e2e commands.SPEC.md- full product and architecture specification.REFERENCES.md- upstream documentation and source-of-truth links.
- Simple CLI first: agents call one command and get a review.
- SDK-agnostic internals: adapter differences stay out of core review logic.
- Codex-style review semantics and output schema.
- Structured review results first; readable Markdown by default.
- Read-only behavior by default.
- Adapter read-only guarantees must be documented explicitly.
- External comment publishing and write-capable tools are permanently out of scope.
- Avoid stale docs: link to upstream SDK docs instead of copying API details here.
