This map links each figure and table to its path, command, and seeds. It is aligned with PAPER_CLAIMS; update it when the paper is written.

| Figure / table | Path (in package-release or repo) | Command | Seeds |
|---|---|---|---|
| Throughput vs violations (Pareto) | `FIGURES/throughput_vs_violations.png` (and `.svg`) | `labtrust package-release --profile paper_v0.1 --out <dir>` (plots from `_study` via `make_plots`) | study: `seed_base` (default 100) |
| Trust cost vs p95 TAT | `FIGURES/trust_cost_vs_p95_tat.png` (and `.svg`) | same | same |
| Throughput box by condition | `FIGURES/throughput_box_by_condition.png` | same | same |
| Metrics overview (dashboard) | `FIGURES/metrics_overview.png` | same | same |
| Table 1 (summary by condition) | `TABLES/summary.csv`, `TABLES/summary.md`, `TABLES/paper_table.md` | same; paper profile runs summarize over `_baselines` + `_study` | same |
| Official baseline table (canonical) | `benchmarks/baselines_official/v0.2/summary.csv` (or `summary_v0.2.csv`, `summary_v0.3.csv`) | Pre-generated; or `labtrust generate-official-baselines --out benchmarks/baselines_official/v0.2/ --episodes 200 --seed 123 --force` | `--seed 123` |

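
As a quick sanity check on the table above, the following minimal sketch lists which of the tabulated figure and table files are absent from a release directory. The relative paths are copied from the table; the `EXPECTED_ARTIFACTS` dict and the `missing_artifacts` helper are illustrative names, not part of `labtrust`.

```python
from pathlib import Path

# Relative paths copied from the provenance table.
# The dict layout and names are illustrative, not a labtrust API.
EXPECTED_ARTIFACTS = {
    "Throughput vs violations (Pareto)": [
        "FIGURES/throughput_vs_violations.png",
        "FIGURES/throughput_vs_violations.svg",
    ],
    "Trust cost vs p95 TAT": [
        "FIGURES/trust_cost_vs_p95_tat.png",
        "FIGURES/trust_cost_vs_p95_tat.svg",
    ],
    "Throughput box by condition": ["FIGURES/throughput_box_by_condition.png"],
    "Metrics overview (dashboard)": ["FIGURES/metrics_overview.png"],
    "Table 1 (summary by condition)": [
        "TABLES/summary.csv",
        "TABLES/summary.md",
        "TABLES/paper_table.md",
    ],
}


def missing_artifacts(release_dir):
    """Return {label: [missing relative paths]} for a package-release output directory."""
    root = Path(release_dir)
    missing = {}
    for label, rel_paths in EXPECTED_ARTIFACTS.items():
        absent = [p for p in rel_paths if not (root / p).is_file()]
        if absent:
            missing[label] = absent
    return missing
```

An empty dict means every file named in the table is present; it says nothing about whether the contents are correct, which is what `verify-release` is for.
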
Paper profile (all-in-one): `labtrust package-release --profile paper_v0.1 --seed-base 100 --out <dir>`. This runs:

1. `generate-official-baselines` into `<dir>/_baselines/`
2. `insider_key_misuse` `strict_signatures` ablation study into `<dir>/_study/`
3. summarize into `<dir>/TABLES/`
4. representative runs and receipts per task
5. `SECURITY/`, `TRANSPARENCY_LOG/`, `SAFETY_CASE/`
6. `make_plots(_study)` → copies figures to `<dir>/FIGURES/`
7. BENCHMARK_CARD, COORDINATION_CARD, MANIFEST

Standalone study + plots: `labtrust run-study --spec <spec.yaml> --out <study_dir>`, then `labtrust make-plots --run <study_dir>`. Example spec: `policy/studies/trust_ablations.v0.1.yaml` (throughput_sla trust/dual ablations) or the inline spec used by the paper profile (`insider_key_misuse`, `strict_signatures` `[False, True]`).
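
The two-step standalone flow can be scripted. The sketch below only assembles the argument lists, using the subcommands and flags quoted in this document; the `study_commands` helper and the `out/study` directory are hypothetical names chosen for illustration.

```python
import shlex


def study_commands(spec_path, study_dir):
    """Build argv lists for run-study followed by make-plots.

    Flags are the ones quoted in this document; the helper itself is illustrative.
    """
    run_study = ["labtrust", "run-study", "--spec", str(spec_path), "--out", str(study_dir)]
    make_plots = ["labtrust", "make-plots", "--run", str(study_dir)]
    return [run_study, make_plots]


if __name__ == "__main__":
    # "out/study" is a placeholder output directory.
    for argv in study_commands("policy/studies/trust_ablations.v0.1.yaml", "out/study"):
        print(shlex.join(argv))
```

To execute rather than print, each list can be passed to `subprocess.run(argv, check=True)` so that a failing study stops the script before plotting.
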
Verification: `labtrust verify-bundle --bundle <dir>/receipts/<task>/EvidenceBundle.v0.1` per task, or `labtrust verify-release --release-dir <dir>` for the full release. For strict fingerprint checks: `labtrust verify-release --release-dir <dir> --strict-fingerprints`.
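
For per-task verification, a small helper can discover the bundles and build one `verify-bundle` command each. This is a sketch that assumes the `<dir>/receipts/<task>/EvidenceBundle.v0.1` layout described above; `verify_bundle_commands` is an illustrative name, not a `labtrust` function.

```python
from pathlib import Path

# Bundle name as given in this document.
BUNDLE_NAME = "EvidenceBundle.v0.1"


def verify_bundle_commands(release_dir):
    """Return one verify-bundle argv list per task found under <release_dir>/receipts/."""
    receipts = Path(release_dir) / "receipts"
    # Assumes one bundle per task directory: receipts/<task>/EvidenceBundle.v0.1
    bundles = sorted(receipts.glob(f"*/{BUNDLE_NAME}"))
    return [["labtrust", "verify-bundle", "--bundle", str(b)] for b in bundles]
```

Each returned list can be run with `subprocess.run(argv, check=True)`; an empty result means no bundles were found at the assumed location.
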
A single tarball (e.g. from GitHub Release or Zenodo) should contain or point to:

- Wheel/sdist: `pip install labtrust-gym[env,plots]`
- Policy: bundled in wheel or `policy/` in repo
- This provenance map: `docs/benchmarks/paper/README.md`
- CONTRACTS: `docs/contracts/frozen_contracts.md` (if present)
- PAPER_CLAIMS: `docs/benchmarks/PAPER_CLAIMS.md`

Verification: run `quick-eval`, then `package-release --profile paper_v0.1`, then `verify-release` on the produced directory.
The paper profile is the reference demonstration for external reviewers: the same commands, seeds, and verification steps produce identical artifacts so anyone can reproduce the result.

- Build the paper-ready artifact: `labtrust package-release --profile paper_v0.1 --seed-base 100 --out <dir>`
- Verify the release: `labtrust verify-release --release-dir <dir> --strict-fingerprints`

Success: verify-release exits 0; all EvidenceBundles and RELEASE_MANIFEST validate. See Quick demos and Trust verification.
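
The build and verify steps, together with the exit-code criterion, can be wrapped in one function. This is a minimal sketch using only the commands quoted above; `build_and_verify` and its `runner` parameter are hypothetical, with `runner` added so the function can be exercised without `labtrust` installed.

```python
import subprocess


def build_and_verify(out_dir, seed_base=100, runner=subprocess.run):
    """Build the paper_v0.1 release, then return True if verify-release exits 0.

    `runner` is an illustrative injection point (defaults to subprocess.run).
    """
    build = [
        "labtrust", "package-release",
        "--profile", "paper_v0.1",
        "--seed-base", str(seed_base),
        "--out", str(out_dir),
    ]
    verify = [
        "labtrust", "verify-release",
        "--release-dir", str(out_dir),
        "--strict-fingerprints",
    ]
    # With subprocess.run, check=True raises CalledProcessError if the build fails.
    runner(build, check=True)
    # Inspect the verify exit code ourselves instead of raising.
    result = runner(verify, check=False)
    return result.returncode == 0
```

Calling `build_and_verify("release_out")` runs both steps against a placeholder directory name and reports success as a boolean.
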