This guide is for organizations that fork LabTrust-Gym to run coordination benchmarks, security and safety suites, and determine the best coordination technique at scale. It covers (1) getting started: clone, customize policy, run everything; and (2) pipeline and extension: out-of-the-box flow, partner overlays, and how to extend.
One-command quickstart: From a clean clone, run labtrust forker-quickstart --out <dir> (or bash scripts/forker_quickstart.sh [<dir>] / scripts/forker_quickstart.ps1 on Windows). This runs validate-policy, coordination security pack, build-lab-coordination-report, and export-risk-register. See Troubleshooting if something fails. For a table of canonical demo commands and minimal end-to-end stories, see Quick demos.
How-to guides: Add a coordination method, add a risk injection, tune the selection policy, interpret security/gate failures.
- Python 3.11+ (3.12 recommended).
- Git (fork on GitHub/GitLab, then clone your fork).
- Windows: Use PowerShell for the scripts below. Avoid repo paths with accented characters; clone to a path like C:\LabTrust-Gym if needed. See Installation.
git clone https://github.com/YOUR_ORG/LabTrust-Gym.git
cd LabTrust-Gym
python -m venv .venv
# Windows: .\.venv\Scripts\Activate.ps1
# Linux/macOS: source .venv/bin/activate
pip install -e ".[dev,env,plots]"
labtrust --version

Policy is read from the repo policy/ when developing from source. Override with LABTRUST_POLICY_DIR if needed.
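The override precedence described above can be pictured with a short sketch. This is an illustration only, not the package's actual resolution code: the function name `resolve_policy_dir` and its parameters are hypothetical, and the only facts taken from this guide are the `LABTRUST_POLICY_DIR` variable and the in-repo `policy/` default.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_policy_dir(repo_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: prefer LABTRUST_POLICY_DIR when it is set and non-empty."""
    env = os.environ if env is None else env
    override = env.get("LABTRUST_POLICY_DIR", "").strip()
    if override:
        return Path(override)
    # Fall back to the in-repo policy/ directory used when developing from source.
    return repo_root / "policy"
```

Passing the environment as a parameter keeps the helper easy to check without mutating `os.environ`.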
All of the following are data-driven under policy/. No engine code change is required for typical lab customization.
| What | Where |
|---|---|
| Partner overlay | policy/partners/<partner_id>/; register in policy/partners/partners_index.v0.1.yaml |
| Zones and layout | policy/zones/zone_layout_policy.v0.1.yaml |
| Catalogue | policy/catalogue/ |
| Coordination methods | policy/coordination/coordination_methods.v0.1.yaml |
| Scale configs | policy/coordination/scale_configs.v0.1.yaml |
| Coordination study spec | policy/coordination/coordination_study_spec.v0.1.yaml |
| Selection policy | policy/coordination/coordination_selection_policy.v0.1.yaml |
| Risk registry | policy/risks/risk_registry.v0.1.yaml |
| RBAC | policy/rbac/rbac_policy.v0.1.yaml |
| Reason codes | policy/reason_codes/reason_code_registry.v0.1.yaml |
| Invariants | policy/invariants/ |
| Golden scenarios | policy/golden/golden_scenarios.v0.1.yaml |
Partner overlay (recommended): Add an entry in policy/partners/partners_index.v0.1.yaml, create policy/partners/<partner_id>/ with overrides (e.g. copy from policy/partners/hsl_like/), then run labtrust validate-policy --partner <partner_id> and use --partner <partner_id> on benchmark and forker commands.
Path B (no fork): Depend on labtrust-gym as a library and ship your own pip-installable package; use Extension development and Lab profile reference.
labtrust validate-policy
# With partner: labtrust validate-policy --partner hsl_like
pytest -q
# Or: make test

Forker quickstart (recommended after customizing policy):
labtrust forker-quickstart --out labtrust_runs/forker_quickstart

Outputs go to labtrust_runs/ by default, or to the directory given with --out. Key commands: validate-policy, quick-eval, bench-smoke, run-benchmark, eval-agent, export-receipts, export-fhir, verify-bundle, verify-release, run-security-suite, safety-case, export-risk-register, run-coordination-security-pack, build-lab-coordination-report, run-coordination-study, summarize-coordination, run-study, make-plots, reproduce, package-release, run-official-pack. See the main Getting started CLI table for the full list.
Two minimal sequences you can run from a clean clone to reproduce a full pipeline in under 15 minutes.
Story 1 (default policy)
Run from repo root:
labtrust validate-policy
labtrust quick-eval --seed 42 --out-dir labtrust_runs/demo
labtrust run-coordination-security-pack --out labtrust_runs/demo/pack --matrix-preset hospital_lab
labtrust export-risk-register --out labtrust_runs/demo/risk_out --runs labtrust_runs/demo/pack
Expected outputs: Exit 0 at each step. You should see: labtrust_runs/demo/quick_eval_*/summary.md; labtrust_runs/demo/pack/pack_summary.csv and pack_gate.md; labtrust_runs/demo/pack/summary/sota_leaderboard.md, sota_leaderboard_full.md, method_class_comparison.md (when the pack is summarized); labtrust_runs/demo/risk_out/RISK_REGISTER_BUNDLE.v0.1.json. Optional: if a run produced receipts, run labtrust verify-bundle --bundle <path> on one EvidenceBundle under receipts/.../EvidenceBundle.v0.1.
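To confirm the expected outputs without opening each directory, a small check like the following may help. It is a sketch that only tests for file presence: the helper name `missing_outputs` is hypothetical, and the patterns are copied from the list above. The `summary/` leaderboard files are left out because they only appear when the pack is summarized.

```python
from pathlib import Path
from typing import List

# Glob patterns taken from the expected outputs above, relative to the demo root.
EXPECTED_PATTERNS = [
    "quick_eval_*/summary.md",
    "pack/pack_summary.csv",
    "pack/pack_gate.md",
    "risk_out/RISK_REGISTER_BUNDLE.v0.1.json",
]


def missing_outputs(demo_root: Path, patterns: List[str] = EXPECTED_PATTERNS) -> List[str]:
    """Return the patterns that match no path under demo_root."""
    return [p for p in patterns if not any(demo_root.glob(p))]
```

Calling `missing_outputs(Path("labtrust_runs/demo"))` after Story 1 should return an empty list if every step wrote its files; any pattern in the result points at the step to re-run.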
Story 2 (HSL-like partner)
Same flow, with --partner hsl_like added to the policy, evaluation, and pack commands: validate-policy --partner hsl_like, quick-eval --seed 42 --out-dir labtrust_runs/demo --partner hsl_like, run-coordination-security-pack --out labtrust_runs/demo/pack --matrix-preset hospital_lab --partner hsl_like, export-risk-register --out labtrust_runs/demo/risk_out --runs labtrust_runs/demo/pack. The only difference is that the partner overlay is used; outputs have the same structure, with partner_id present in the bundle where applicable.
Treat each partner as a concrete lab instance; run the same pipeline with --partner <id>.
| Partner | Commands to run | Tasks/scales | Success looks like |
|---|---|---|---|
| hsl_like | labtrust validate-policy --partner hsl_like; labtrust quick-eval --partner hsl_like --seed 42 --out-dir <dir>; labtrust run-coordination-security-pack --out <pack_dir> --matrix-preset hospital_lab --partner hsl_like; labtrust export-risk-register --out <risk_dir> --runs <pack_dir> | Default from pack (hospital_lab) | (a) verify-release passes when run on the produced release (e.g. after building a release from that dir or running package-release and pointing to it). (b) Gate verdicts visible in pack_gate.md: PASS / FAIL / not_supported as expected per cell. |
A partner lab cloned the repo, added the HSL-like partner overlay (already present in the repo), and ran the forker path to produce benchmarks and a risk register. Outcome: benchmarks ran, the coordination pack produced pack_gate.md with verdicts per cell, and the risk register bundle was generated and validated.
Commands (synthetic journey):
- Clone and install: git clone ..., pip install -e ".[dev,env,plots]", labtrust --version
- labtrust validate-policy --partner hsl_like
- labtrust forker-quickstart --out labtrust_runs/forker_quickstart
- labtrust run-official-pack --out labtrust_runs/official_pack --seed-base 100 --include-coordination-pack
- labtrust export-risk-register --out labtrust_runs/risk_out --runs labtrust_runs/official_pack
Result: one output tree with baselines, SECURITY/, coordination pack outputs, and RISK_REGISTER_BUNDLE.v0.1.json suitable for audit or further verification.
See Troubleshooting and Installation.
Replace <dir>, <dir2>, <dir3> with actual paths.
- Validate policy: labtrust validate-policy (or --partner hsl_like).
- Run coordination security pack: labtrust run-coordination-security-pack --out <dir> --matrix-preset hospital_lab.
- Build lab coordination report: labtrust build-lab-coordination-report --pack-dir <dir> [--out <dir>].
- Use the decision: Open COORDINATION_DECISION.v0.1.json or COORDINATION_DECISION.md for chosen method per scale; LAB_COORDINATION_REPORT.md for the full story.
- Export risk register: labtrust export-risk-register --out <dir2> --runs <dir>.
- Optional (official pack): labtrust run-official-pack --out <dir3> --seed-base 42 then labtrust export-risk-register --out <dir2> --runs <dir3> (or --include-official-pack <dir3>).
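The required steps above can be scripted. The sketch below builds the four non-optional invocations in order and runs them with `subprocess`, stopping at the first failure. The function names are hypothetical, the flags are the ones shown in this guide, and `--partner` is passed only to the first three commands, matching the partner story earlier in this guide.

```python
import subprocess
from typing import List, Optional


def pipeline_commands(pack_dir: str, risk_dir: str, partner: Optional[str] = None) -> List[List[str]]:
    """Build the CLI invocations for the out-of-the-box flow, in order."""
    partner_args = ["--partner", partner] if partner else []
    return [
        ["labtrust", "validate-policy", *partner_args],
        ["labtrust", "run-coordination-security-pack", "--out", pack_dir,
         "--matrix-preset", "hospital_lab", *partner_args],
        ["labtrust", "build-lab-coordination-report", "--pack-dir", pack_dir, *partner_args],
        ["labtrust", "export-risk-register", "--out", risk_dir, "--runs", pack_dir],
    ]


def run_pipeline(pack_dir: str, risk_dir: str, partner: Optional[str] = None) -> None:
    """Run each step; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in pipeline_commands(pack_dir, risk_dir, partner):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it possible to print or review the exact commands before anything runs.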
- Edit policy/partners/partners_index.v0.1.yaml: add partner_id, description, overlay_path: "policy/partners/<partner_id>".
- Create policy/partners/<partner_id>/ with overlay files (see hsl_like/).
- Validate: labtrust validate-policy --partner <partner_id>.
- Use --partner <partner_id> on run-benchmark, validate-policy, quick-eval, reproduce, run-coordination-security-pack, build-lab-coordination-report, run-coordination-study, run-official-pack, export-risk-register.
Partner overlays can also provide risk registry, security attack suite, benchmark pack, and coordination study spec when those files exist under the partner path.
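Before running validate-policy, it can be useful to sanity-check a new index entry by hand. The sketch below models a single entry as a plain dictionary with the three fields named above. The helper names are hypothetical, and the rule that overlay_path must equal policy/partners/<partner_id> is an assumption drawn from the example path, not a confirmed schema constraint. The real file may nest entries differently, so treat labtrust validate-policy --partner <partner_id> as the authoritative check.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("partner_id", "description", "overlay_path")


def partner_entry(partner_id: str, description: str) -> Dict[str, str]:
    """Build an index entry using the overlay path convention shown above."""
    return {
        "partner_id": partner_id,
        "description": description,
        "overlay_path": f"policy/partners/{partner_id}",
    }


def entry_problems(entry: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the entry looks consistent."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    expected = f"policy/partners/{entry.get('partner_id', '')}"
    if entry.get("overlay_path") and entry["overlay_path"] != expected:
        # Assumed convention only: the overlay directory is named after the partner.
        problems.append(f"overlay_path is {entry['overlay_path']!r}, expected {expected!r}")
    return problems
```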
- Methods: policy/coordination/coordination_methods.v0.1.yaml. Forkers can add or tune methods within the schema.
- Scale configs: policy/coordination/scale_configs.v0.1.yaml (e.g. small_smoke, medium_stress_signed_bus, corridor_heavy).
- Study matrix: policy/coordination/coordination_study_spec.v0.1.yaml.
- Best method at scale: Decided by recommend-coordination-method using policy/coordination/coordination_selection_policy.v0.1.yaml (objective, constraints, per-scale rules).
The coordination security pack produces pack_gate.md (PASS/FAIL/not_supported per cell). Gate rules: policy/coordination/coordination_security_pack_gate.v0.1.yaml.
- When the gate fails: Any cell FAIL sets the coordination decision verdict to security_gate_failed. Resolve before deploying.
- Check before release: labtrust check-security-gate --run <dir> (exit 0 if all PASS or not_supported).
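The gate rule stated above (pass only when every cell is PASS or not_supported) reduces to a few lines. This sketch assumes the per-cell verdicts have already been read into a dictionary keyed by a hypothetical (method, injection) pair. It does not parse pack_gate.md and is not the implementation behind check-security-gate.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # hypothetical key: (method_id, injection_id)

NON_FAILING = {"PASS", "not_supported"}


def failing_cells(verdicts: Dict[Cell, str]) -> List[Cell]:
    """Return the cells whose verdict is anything other than PASS or not_supported."""
    return [cell for cell, verdict in verdicts.items() if verdict not in NON_FAILING]


def gate_passes(verdicts: Dict[Cell, str]) -> bool:
    """The gate passes only when no cell fails."""
    return not failing_cells(verdicts)
```

A single FAIL is enough to make `gate_passes` return False, mirroring how any failing cell sets the decision verdict to security_gate_failed.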
- COORDINATION_DECISION.v0.1.json / COORDINATION_DECISION.md: Chosen method per scale (or "no admissible method" / "security_gate_failed").
- LAB_COORDINATION_REPORT.md: Stakeholder report (gate, risk matrix, leaderboard, decision, next steps).
- pack_gate.md: Pass/fail per cell.
- SECURITY/coordination_risk_matrix.csv (and .md): Method x injection outcomes.
policy/coordination/coordination_selection_policy.v0.1.yaml: objective (e.g. maximize_overall_score), constraints (violations, attack success rate, cost ceiling), per_scale_rules. Copy and edit for your risk appetite.
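To make the roles of objective and constraints concrete, here is a minimal sketch of constraint-then-maximize selection. All class and field names are hypothetical and do not reflect the YAML schema. The sketch assumes a maximize_overall_score objective and leaves per-scale rules out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MethodResult:
    method_id: str
    overall_score: float
    violations: int
    attack_success_rate: float
    cost: float


@dataclass(frozen=True)
class Constraints:
    max_violations: int
    max_attack_success_rate: float
    cost_ceiling: float


def select_method(results: List[MethodResult], limits: Constraints) -> Optional[str]:
    """Drop methods that break any constraint, then pick the highest overall score."""
    admissible = [
        r for r in results
        if r.violations <= limits.max_violations
        and r.attack_success_rate <= limits.max_attack_success_rate
        and r.cost <= limits.cost_ceiling
    ]
    if not admissible:
        return None  # corresponds to the "no admissible method" outcome
    return max(admissible, key=lambda r: r.overall_score).method_id
```

Tightening a constraint can change the winner even when scores stay the same, which is why the policy is worth editing to match your risk appetite rather than using it unchanged.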
- Path A (policy + partner only): Same engine and tasks; add partner overlay, scale configs, coordination methods, or injections.
- Path B (policy + custom tasks): Fork and add a new task in src/labtrust_gym/benchmarks/tasks.py (subclass BenchmarkTask, register in _TASK_REGISTRY).
- Full release dir: labtrust verify-release --release-dir <dir> [--strict-fingerprints].
- Single EvidenceBundle: labtrust verify-bundle --bundle <path> (path under receipts/.../EvidenceBundle.v0.1).
- E2E: bash scripts/ci_e2e_artifacts_chain.sh (package-release minimal, export-risk-register, build-release-manifest, verify-release --strict-fingerprints).
export-risk-register builds the bundle from policy and run dirs passed with --runs. Run dirs should contain e.g. pack_summary.csv, SECURITY/, summary/, COORDINATION_DECISION.v0.1.json. See Risk register.