The unit of trust is not consensus — it is surviving adversarial scrutiny by identified agents whose history is on the public record.
Tribunal is a methodology and tooling for shipping LLM-assisted code without inheriting LLM failure modes. It composes three layers:
- A process backbone — state machine + spec-driven gates + role separation + git discipline.
- A correctness toolkit — adversarial multi-model review on top of cooperative-parallel lens review, plus a verification pyramid.
- An on-chain incentive layer — a soulbound reputation token on Burnt XION (CosmWasm) that tracks per-agent finding outcomes over time.
The methodology is a synthesis of two existing patterns:
reliqlabs/colosseum— adversarial spec review + Rust-flavored verification pyramid.btspoony/mstar-harness— multi-role agent harness with spec-driven gates and lens-diverse QC trio.
The novel piece neither has: reputation-weighted findings. Agents have verifiable identities (ed25519 keypairs), every finding is signed and recorded, outcomes are settled by PMs/QA, and reputation deltas land on-chain.
Live: tribunal.mabus.ai — methodology, sample audits, and the on-chain leaderboard that queries the deployed contract on xion-testnet-2 from the browser.
As of v0.3.4+ (ADR-0002, 2026-05-17), Tribunal's lens-parallel review stage can run through clawpatch as a subprocess. The trust/discovery split:
- Clawpatch owns discovery — heuristic + agent-based feature mapping, per-feature LLM review, fix-and-revalidate loops.
- Tribunal owns trust — agent identity, ed25519-signed findings, adversarial multi-model review, PM/QA-settled outcomes, on-chain reputation.
Run tribunal review --via-clawpatch and the lens stage dispatches via clawpatch instead of expecting skill-trio reports on disk. Findings come back through internal/clawpatch/translate.go, get signed by Tribunal-orchestrator, and land in the existing ledger. Two upstream PRs (#64 --prompt-file, #65 --export-tribunal-ledger) added the integration hooks; both merged 2026-05-18. tribunal fix --finding <id> and tribunal revalidate round-trip state back to clawpatch via signed triage events.
Requires Go 1.23+. Optional: an Anthropic API key (
ANTHROPIC_API_KEY) for the Claude adversary panel; v0.3+ adds Burnt XION on-chain settlement.
go install github.com/dpdanpittman/tribunal/cmd/tribunal@latest
tribunal init # scaffold .tribunal/ in the project
cp tribunal.yaml.example tribunal.yaml # tune panels + verify stack to taste
# Verification pyramid (runs build / fmt / vet / test / ... per stack).
tribunal verify .
# Adversary review stage (lens-parallel trio is dispatched by your host
# harness; this command runs the adversary panel + writes signed findings
# to .tribunal/ledger.jsonl).
ANTHROPIC_API_KEY=sk-... tribunal review --plan P-42
# Inspect what's in the ledger.
tribunal ledger summary
tribunal ledger leaderboard
# v0.3: settle to Burnt XION. Deploy once, then sync per plan.
./scripts/deploy-contract.sh # produces a chain.yaml snippet
tribunal chain init --chain-id xion-testnet-2 --contract cosmwasm1... ...
tribunal chain register claude-adversary
tribunal chain sync --plan P-42
tribunal chain query leaderboardThis repo is in active development. v0.1 ships the methodology, CLI, skills/agents, local ledger, and a Go fizzbuzz example. v0.2 adds multi-model adversarial dispatch and the real verification pyramid. v0.3 adds the CosmWasm contract and on-chain settlement. v0.5 adds the temporal lens, trajectory PBT, and cross-plan findings. v0.5.8 makes the reproducibility field on every finding mandatory (exploit path / trigger sequence / workload+numbers / manifesting cycle / PoC) so downstream maintainers can distinguish real threats from style violations in under 30 seconds.
See CHANGELOG.md for what's released, and docs/methodology.md for the design.
tribunal.mabus.ai — landing, the methodology rendered with sidebar nav, the P-v032-audit case study (Tribunal reviewing its own release), and a live on-chain leaderboard that queries the deployed contract on xion-testnet-2 client-side. Source under site/.
LLMs are fast, broad, and characteristically unreliable. Code review, tests, and audit are human-bottlenecked and scale linearly with reviewer attention. LLM output scales 10–100× faster. That mismatch is the trust gap.
Most multi-agent systems being built today default to cooperative patterns: agents that help, vote, and converge. That is the wrong primitive for correctness. Cooperation amplifies shared mistakes. Adversaries hunt them.
But adversaries that are never held accountable hallucinate findings as freely as cooperators hallucinate code. Tribunal's bet: trust is a function of three things — surviving adversarial scrutiny, by agents with verifiable identity, whose history of findings is on the public record.
- Methodology — the load-bearing design doc
- Incentive mechanism — reputation math
- On-chain protocol — CosmWasm contract surface (v0.3)
- Installation — per-host setup
- ADRs — architecture decisions
GNU AGPLv3 or later. Open-source, copyleft for network use — anyone running this as a service must publish their modifications under the same license.