The UI must not depend on raw internal logs or ad-hoc shapes. It consumes ui-export output as the primary input. This document specifies expected folder layouts, required files, relationships, and schema version handling.
Run:

`labtrust ui-export --run <dir> --out <ui_bundle.zip>`

The bundle contains normalized, UI-ready JSON:
| File | Description |
|---|---|
| index.json | Episodes, tasks, baselines; file refs (results path, log path, receipts path) per episode. When the run dir contains coordination pack or lab report output, includes coordination_artifacts: list of { path, label }; those files are also included in the zip under coordination/. |
| events.json | All step outcomes in one array: normalized gate fields (status, blocked_reason_code, violations, emits, token_consumed, t_s, agent_id, action_type, event_id). Optionally chunked by episode in future. |
| receipts_index.json | List of receipt locations: task/label → path and list of receipt filenames (e.g. receipt_specimen_S1.v0.1.json). |
| reason_codes.json | Full reason code registry (code → namespace, severity, description, etc.) so UI does not parse policy YAML. |
Acceptance: UI can depend on ui-export output as primary input, not raw internal logs.
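As a rough illustration of consuming the bundle, the sketch below reads the four JSON files from the zip. It assumes the files sit at the root of the archive, which this document does not state explicitly; `load_ui_bundle` is a hypothetical helper name, not part of labtrust.

```python
import json
import zipfile

# File names come from the table above. Their location at the zip root is an
# assumption made for this sketch.
BUNDLE_FILES = ("index.json", "events.json", "receipts_index.json", "reason_codes.json")


def load_ui_bundle(zip_source):
    """Return {file name: parsed JSON} for the bundle files found in the zip.

    `zip_source` is a path or a binary file-like object. Files that are
    missing from the archive are simply absent from the result.
    """
    bundle = {}
    with zipfile.ZipFile(zip_source) as zf:
        names = set(zf.namelist())
        for name in BUNDLE_FILES:
            if name in names:
                bundle[name] = json.loads(zf.read(name).decode("utf-8"))
    return bundle
```

A caller would then read `bundle["index.json"]` first and follow its references, rather than opening anything in the raw run directory.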
--run <dir> accepts either a labtrust_runs run or a package-release output directory.
Typical path: labtrust_runs/quick_eval_YYYYMMDD_HHMMSS/.
| Path | Description |
|---|---|
| throughput_sla.json, adversarial_disruption.json, multi_site_stat.json | Results files (schema results.v0.2). One file per task; each may contain multiple episodes. |
| logs/throughput_sla.jsonl, logs/adversarial_disruption.jsonl, logs/multi_site_stat.jsonl | Episode log (JSONL): one line per step; same order as steps in run. |
| summary.md | Human-readable summary (optional). |
Relationships:
- For each `X.json` in run root, there may be `logs/X.jsonl` (task id derived from filename: e.g. `throughput_sla.json` → task `throughput_sla`).
- Episodes in `X.json` are ordered; the i-th episode corresponds to the same run that produced the lines in `logs/X.jsonl` (when num_episodes is 1, the whole JSONL is one episode).
- No receipts directory in plain quick-eval; `receipts_index.json` in the ui-export will be empty or omit this run's receipts.
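The filename convention for a quick-eval run can be sketched as follows. This is only an illustration of the pairing rule, not how ui-export itself is implemented; it assumes every `*.json` file in the run root is a results file, and `pair_results_with_logs` is a hypothetical name.

```python
from pathlib import Path


def pair_results_with_logs(run_dir):
    """Map task id -> (results path, log path or None) for a quick-eval run dir."""
    run_dir = Path(run_dir)
    pairs = {}
    # Assumption: all JSON files in the run root are results files.
    for results_path in sorted(run_dir.glob("*.json")):
        task_id = results_path.stem  # throughput_sla.json -> throughput_sla
        log_path = run_dir / "logs" / f"{task_id}.jsonl"
        # The log is optional ("there may be logs/X.jsonl").
        pairs[task_id] = (results_path, log_path if log_path.is_file() else None)
    return pairs
```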
Typical path: <out>/ from labtrust package-release --profile paper_v0.1 --out <out>.
| Path | Description |
|---|---|
| _baselines/ | Official baselines: results/*.json, summary.csv, summary.md, metadata.json. |
| _study/ | Study run: manifest.json, results/, logs/ (per condition), figures/. |
| _repr/<task>/ | Representative run per task: episodes.jsonl, results.json. |
| receipts/<task>/ | Receipts and EvidenceBundle.v0.1 per task (e.g. receipts/throughput_sla/EvidenceBundle.v0.1/, receipts/throughput_sla/receipt_*.v0.1.json). |
| FIGURES/, TABLES/ | Plots and summary tables. |
| metadata.json, RELEASE_NOTES.md | Run metadata. |
Relationships:
- Episodes / tasks: From `_repr/<task>/results.json` and `_repr/<task>/episodes.jsonl`; from `_baselines/results/*.json` (task from filename); from `_study/results/` and `_study/logs/` (condition_ids from manifest).
- Receipts: For each `receipts/<task>/`, list `EvidenceBundle.v0.1/*.json` and any `receipt_*.v0.1.json` in `receipts/<task>/`; link to episode by task (and optionally condition_id for study).
- Baselines: From `_baselines/results/*.json` and `_baselines/metadata.json`; baseline names from metadata or filenames.
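The receipts rule above could be expressed roughly like this. The sketch only mirrors the two glob patterns named in the text; `list_receipts` is a hypothetical helper, and the returned shape is an illustration rather than the actual `receipts_index.json` format.

```python
from pathlib import Path


def list_receipts(release_dir):
    """Map task -> sorted receipt-related file paths, relative to release_dir."""
    release_dir = Path(release_dir)
    receipts_root = release_dir / "receipts"
    found = {}
    if not receipts_root.is_dir():
        return found  # run without receipts
    for task_dir in sorted(p for p in receipts_root.iterdir() if p.is_dir()):
        # Receipt files directly under receipts/<task>/.
        files = list(task_dir.glob("receipt_*.v0.1.json"))
        # JSON files inside the evidence bundle, when one exists.
        bundle_dir = task_dir / "EvidenceBundle.v0.1"
        if bundle_dir.is_dir():
            files += list(bundle_dir.glob("*.json"))
        found[task_dir.name] = sorted(
            f.relative_to(release_dir).as_posix() for f in files
        )
    return found
```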
Inferring relationships:
- Task → results: `results.json` or `<TaskName>.json`; schema version in `schema_version` (e.g. `0.2`).
- Task → log: Same directory as results: `episodes.jsonl` or `logs/<TaskName>.jsonl`.
- Task → receipts: `receipts/<task>/`; receipt files match `receipt_*.v0.1.json` or live inside `EvidenceBundle.v0.1/`.
- Event → episode: Events in `events.json` can carry `episode_key` (e.g. `task` + `episode_index`) so UI can group by episode.
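Grouping events by episode might look like the sketch below. The fallback that builds a key from `task` and `episode_index` is an assumption modelled on the `"<task>_<episode_index>"` example format; events that carry neither are collected under `None`. Both function names are hypothetical.

```python
from collections import defaultdict


def episode_key_of(event):
    """Return a grouping key for one normalized event, or None if it has none."""
    if event.get("episode_key") is not None:
        return event["episode_key"]
    task, idx = event.get("task"), event.get("episode_index")
    if task is not None and idx is not None:
        # Assumed fallback, following the "<task>_<episode_index>" example.
        return f"{task}_{idx}"
    return None


def group_events_by_episode(events):
    """Group events by episode key, preserving the original order per group."""
    groups = defaultdict(list)
    for event in events:
        groups[episode_key_of(event)].append(event)
    return dict(groups)
```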
When --run <dir> is a directory that contains coordination security pack output or a lab report (e.g. from labtrust run-coordination-security-pack plus labtrust build-lab-coordination-report, or labtrust run-official-pack --include-coordination-pack which writes into coordination_pack/), ui-export scans for these artifacts and adds them to the bundle.
| Path (relative to run dir) | Description |
|---|---|
| pack_summary.csv | One row per cell (scale x method x injection). |
| pack_gate.md | PASS/FAIL/not_supported per cell. |
| SECURITY/coordination_risk_matrix.csv, .md | Method x injection x phase outcomes. |
| LAB_COORDINATION_REPORT.md | Single lab report with scope, decision, artifact table. |
| COORDINATION_DECISION.v0.1.json, .md | Chosen method per scale. |
| summary/sota_leaderboard.md | SOTA leaderboard (main): compact table with key metrics and optional run metadata. |
| summary/sota_leaderboard_full.md | SOTA leaderboard (full metrics): all aggregated numeric columns. |
| summary/sota_leaderboard_full.csv | Full leaderboard in CSV form for programmatic use. |
| summary/method_class_comparison.md | Method-class comparison (throughput, violations, blocks, resilience, attack_success_rate, stealth, n_cells). |
| graphs/sota_key_metrics.html | Primary chart: key metrics by method (throughput, resilience, safety, security) in one view. |
| graphs/throughput_by_method.html | Throughput (mean) by method. |
| graphs/violations_by_method.html | Violations (mean) by method. |
| graphs/resilience_by_method.html | Resilience score by method. |
| graphs/method_class_comparison.html | Method-class comparison (throughput and resilience by class). |
When present, index.json includes coordination_artifacts: a list of { "path": "<rel>", "label": "..." } for each found file. Graph HTML files are generated at export time from pack_summary.csv (or summary_coord.csv) and included under coordination/graphs/. Paths may be under coordination_pack/ when the run is an official pack with --include-coordination-pack. The same files are included in the zip under the prefix coordination/ (e.g. coordination/pack_summary.csv, coordination/summary/sota_leaderboard_full.md) so the UI can link to or load them without reading the raw run dir.
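Because the exact mapping from a run-dir path (which may sit under `coordination_pack/`) to its zip member is not spelled out here, a conservative way for a UI to discover these files is to list what is actually under the `coordination/` prefix. A minimal sketch, with `coordination_members` as a hypothetical helper:

```python
import zipfile


def coordination_members(zip_source):
    """Return sorted zip member names under coordination/, excluding directories."""
    with zipfile.ZipFile(zip_source) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.startswith("coordination/") and not name.endswith("/")
        )
```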
- Main leaderboard (`summary/sota_leaderboard.md`, `.csv`): Single table with the most important hospital-lab metrics per method: throughput_mean, throughput_std, violations_mean, blocks_mean, resilience_score_mean, resilience_score_std, p95_tat_mean, on_time_rate_mean, critical_compliance_mean, attack_success_rate_mean, stealth_success_rate_mean, n_cells. When `pack_manifest.json` exists, the Markdown includes a Run metadata line (seed_base, git_sha) at the top.
- Full leaderboard (`summary/sota_leaderboard_full.md`, `.csv`): All aggregated numeric metrics per method; columns depend on the data source (pack_summary vs summary_coord). Use for detailed analysis (security detection/containment, comm, LLM economics). See Hospital lab key metrics.
- Method-class comparison (`summary/method_class_comparison.md`, `.csv`): Same metrics aggregated by coordination class (e.g. kernel_schedulers, centralized, llm), including blocks_mean and attack_success_rate_mean.

The UI may show the main leaderboard by default and link to the full leaderboard and method-class comparison for drill-down.
When the run contains pack_summary.csv (or summary_coord.csv), ui-export generates self-contained HTML charts (Chart.js via CDN) and adds them to the bundle under coordination/graphs/:
- Primary: `graphs/sota_key_metrics.html`: one state-of-the-art chart with four normalized metrics (throughput, resilience, safety, security) per method for at-a-glance comparison.
- Additional: `graphs/throughput_by_method.html`, `graphs/violations_by_method.html`, `graphs/resilience_by_method.html`, `graphs/method_class_comparison.html` for single-metric and method-class views.
| Need | Source |
|---|---|
| List of tasks | From result filenames (e.g. throughput_sla.json) or from _repr/, _baselines/results/, _study/results/. |
| Episodes per task | From results.json / TaskX.json → episodes array; length = number of episodes. |
| Step-level outcomes | From episode log JSONL; each line = one step. ui-export normalizes these into events.json with stable field names. |
| Receipts per task | From receipts/<task>/ and EvidenceBundle.v0.1/ contents; list in receipts_index.json. |
| Reason code labels | From reason_codes.json (exported from policy); key = code, value = { namespace, severity, description, ... }. |
index.json (logical shape):
- `ui_bundle_version`: string (e.g. `"0.1"`). Always present.
- `run_type`: `"quick_eval"` | `"package_release"` | `"full_pipeline"`. Always present.
- `tasks`: list of task ids. Always present (may be empty).
- `episodes`: list of episode objects. Always present (may be empty).
- `baselines`: list of baseline ids. Always present (may be empty).
- `coordination_artifacts` (optional): list of `{ "path": "<rel>", "label": "..." }` when run dir contains pack_summary.csv, LAB_COORDINATION_REPORT.md, or related files; paths are relative to run dir; files are also in the zip under `coordination/`.
- `pipeline_mode`, `llm_backend_id`, `llm_model_id`, `allow_network` (optional): present when run is from official pack or full pipeline.
- `receipts_note` (optional): present for `full_pipeline` when there are no receipts (explains why receipts_index is empty).
- `coord_telemetry` (optional): present when episode logs have coord_decisions.jsonl.
Episode object (each entry in episodes):
- `task`: string. Required.
- `episode_index`: number. Required.
- `episode_key`: string (e.g. `"<task>_<episode_index>"`). Optional but emitted by backend.
- `results_ref`: string (path relative to run dir). Required.
- `log_ref`: string or null (path to episode log JSONL, or null when no log). Must accept null for full_pipeline and quick_eval without logs.
- `receipts_ref`: string or null (path to receipts dir, or null). Must accept null for runs without receipts.
Frontend validation: The UI bundle loader must treat log_ref and receipts_ref as optional or nullable (string | null). Do not require them to be non-empty strings, or validation will fail for bundles from full_pipeline or LLM live official pack runs.
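The nullable-reference rule can be made concrete with a small validator. The frontend is presumably not written in Python, so this is only a sketch of the logic; `validate_episode` is a hypothetical name, and treating a missing `log_ref` or `receipts_ref` key the same as null is an assumption of this sketch.

```python
def validate_episode(ep):
    """Return a list of problems for one episode object (empty list = valid)."""
    problems = []
    if not isinstance(ep.get("task"), str):
        problems.append("task must be a string")
    idx = ep.get("episode_index")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(idx, bool) or not isinstance(idx, (int, float)):
        problems.append("episode_index must be a number")
    if not isinstance(ep.get("results_ref"), str):
        problems.append("results_ref must be a string")
    for key in ("log_ref", "receipts_ref"):
        value = ep.get(key)  # assumption: a missing key is treated like null
        if value is not None and not isinstance(value, str):
            problems.append(f"{key} must be a string or null")
    if "episode_key" in ep and not isinstance(ep["episode_key"], str):
        problems.append("episode_key must be a string when present")
    return problems
```

An episode with `"log_ref": null` and `"receipts_ref": null` passes, which is the case that matters for full_pipeline and LLM live official pack bundles.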
events.json:
- Array of normalized events; each has: `t_s`, `agent_id`, `action_type`, `status`, `blocked_reason_code`, `emits`, `violations`, `token_consumed`, `event_id` (if present), and optional `episode_key` / `task` / `episode_index` for grouping.
receipts_index.json:
- Array of `{ "task", "path", "receipt_files": [...] }`; `path` is relative to run or bundle root; `receipt_files` are filenames (e.g. `receipt_specimen_S1.v0.1.json`).
reason_codes.json:
- `{ "version": "0.1", "codes": { "<code>": { "namespace", "severity", "description", ... } } }`. Same shape as registry; UI uses it for display and validation.
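Resolving a `blocked_reason_code` for display could be sketched as below. `describe_reason` is a hypothetical helper, and showing an unknown code raw instead of failing is a design assumption, not something this contract mandates.

```python
def describe_reason(reason_codes, code):
    """Return a display dict for a reason code, or None when there is no code.

    `reason_codes` is the parsed reason_codes.json object.
    """
    if code is None:
        return None
    entry = reason_codes.get("codes", {}).get(code)
    if entry is None:
        # Assumption: unknown codes are shown as-is rather than rejected.
        return {"code": code, "known": False}
    return {
        "code": code,
        "known": True,
        "namespace": entry.get("namespace"),
        "severity": entry.get("severity"),
        "description": entry.get("description"),
    }
```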
- UI bundle schema: The ui-export output (index, events, receipts_index, reason_codes) is versioned. Current version: 0.1. The bundle MAY include a top-level `ui_bundle_version` (e.g. in index.json) so the UI can reject unknown versions.
- Results: Results files follow `results.v0.2` (or v0.3). UI must accept `schema_version` and ignore extra fields; do not assume fields beyond the contract.
- Receipts: Receipt files follow `receipt.v0.1`; EvidenceBundle follows `evidence_bundle_manifest.v0.1`. UI must not rely on internal log shapes, only on ui-export's receipts_index.json and the receipt schema for displayed fields.
- Extensible only: New schema versions (e.g. results.v0.3) add optional fields only; required fields and semantics of v0.2 remain. UI should be tolerant of missing optional fields.
- Stable field names: Normalized gate outcomes in `events.json` use fixed names (status, blocked_reason_code, violations, emits, token_consumed). New gate fields are added as optional keys; existing keys are not renamed or removed in v0.1.
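A tolerant reader following these rules might look like the sketch below. The set of supported versions and both function names are assumptions for illustration; accepting an index without `ui_bundle_version` follows the "MAY include" wording above.

```python
SUPPORTED_BUNDLE_VERSIONS = {"0.1"}  # assumption: this UI build only knows v0.1
STABLE_EVENT_FIELDS = (
    "status", "blocked_reason_code", "violations", "emits", "token_consumed",
)


def check_bundle_version(index):
    """Return the bundle version, raising ValueError for an unknown one."""
    version = index.get("ui_bundle_version")
    if version is not None and version not in SUPPORTED_BUNDLE_VERSIONS:
        raise ValueError(f"unsupported ui_bundle_version: {version!r}")
    return version


def stable_view(event):
    """Pick the stable gate fields from one event.

    Extra keys are ignored and missing keys become None, so newer optional
    fields do not break an older UI.
    """
    return {name: event.get(name) for name in STABLE_EVENT_FIELDS}
```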
- Run layouts: labtrust_runs (quick_eval_*) and package-release (paper_v0.1) are the two supported run directory shapes.
- Relationships: Task → results file, task → log file, task → receipts dir; episodes from results `episodes` array; steps from JSONL → normalized into events.json.
- Schema rules: UI bundle v0.1; results v0.2/v0.3 extensible only; stable event field names; reason_codes and receipts_index supplied so UI does not parse policy or raw logs.
- Acceptance: UI uses ui-export output as primary input; raw internal logs are not part of the UI contract.