
feat(bench): implement result writer and --resume support #2841

Merged
bug-ops merged 1 commit into main from bench-result-writer-resume on Apr 8, 2026

@bug-ops
Owner

@bug-ops bug-ops commented Apr 8, 2026

Summary

  • Adds results module to zeph-bench with BenchRun, ScenarioResult, Aggregate, and RunStatus types (Serialize/Deserialize)
  • ResultWriter writes results.json (leaderboard-compatible, superset of LongMemEval submission format) and summary.md (Markdown table) via atomic temp-file rename
  • --resume support: ResultWriter::load_existing() returns a partial BenchRun; callers use completed_ids() to skip already-done scenarios and append new results
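The atomic temp-file rename mentioned above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: the function name `write_atomic` and the `.tmp` suffix are assumptions made for the example.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: write `contents` to `path` so that readers never
/// observe a half-written file. The data goes to a sibling temp file first,
/// then a rename moves it over the final path.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Build "<path>.tmp" next to the target so both live on the same filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    {
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(contents)?;
        // Flush file data to disk before the rename makes it visible.
        file.sync_all()?;
    }

    // Replaces any existing file at `path`.
    fs::rename(&tmp_path, path)
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not atomic and may fail outright.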

Closes #2833, #2835

Test plan

  • cargo nextest run -p zeph-bench --lib — 24 tests, all pass
  • cargo clippy --workspace -- -D warnings — clean
  • cargo +nightly fmt --check — clean
  • cargo nextest run --workspace --lib --bins — 7760 tests, all pass
  • Live: single-scenario run produces valid results.json and summary.md (tracked in coverage-status.md — status: Untested)
  • Live: --resume on partial results file skips completed scenarios and produces merged output

Add `results` module to `zeph-bench` with:
- `BenchRun`, `ScenarioResult`, `Aggregate`, `RunStatus` (Serialize/Deserialize)
- `ResultWriter` writing `results.json` and `summary.md` via atomic rename
- `BenchRun::recompute_aggregate()` for in-place aggregate updates
- `ResultWriter::load_existing()` for resume: callers load partial run,
  call `completed_ids()` to skip done scenarios, append and re-write

Partial/interrupted runs are persisted with `status: interrupted`.
Output directory is created automatically (single level).
Schema is a superset of the LongMemEval leaderboard submission format.
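
The resume flow described above (load a partial run, skip completed scenarios, append, recompute) might look roughly like the sketch below. Only the type and method names `BenchRun`, `ScenarioResult`, `Aggregate`, `RunStatus`, `completed_ids()`, and `recompute_aggregate()` come from this PR. Every field, the signatures, the `resume` function, and the mean-score aggregate are assumptions for illustration, and serialization is omitted to keep the example dependency-free.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
pub enum RunStatus {
    Completed,
    Interrupted,
}

// Hypothetical fields; the real schema is not shown in this PR description.
#[derive(Debug, Clone, PartialEq)]
pub struct ScenarioResult {
    pub id: String,
    pub score: f64,
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Aggregate {
    pub total: usize,
    pub mean_score: f64,
}

#[derive(Debug, Clone)]
pub struct BenchRun {
    pub status: RunStatus,
    pub results: Vec<ScenarioResult>,
    pub aggregate: Aggregate,
}

impl BenchRun {
    /// IDs of scenarios that already have a stored result.
    pub fn completed_ids(&self) -> HashSet<&str> {
        self.results.iter().map(|r| r.id.as_str()).collect()
    }

    /// Recompute the aggregate in place from the current results.
    pub fn recompute_aggregate(&mut self) {
        let total = self.results.len();
        let mean_score = if total == 0 {
            0.0
        } else {
            self.results.iter().map(|r| r.score).sum::<f64>() / total as f64
        };
        self.aggregate = Aggregate { total, mean_score };
    }
}

/// Hypothetical driver: run only the scenarios missing from `run`.
pub fn resume<F>(mut run: BenchRun, scenario_ids: &[&str], mut run_scenario: F) -> BenchRun
where
    F: FnMut(&str) -> f64,
{
    // Copy the IDs out so `run` can be mutated inside the loop.
    let done: HashSet<String> = run.completed_ids().into_iter().map(str::to_owned).collect();
    for &id in scenario_ids {
        if done.contains(id) {
            continue; // already completed in the earlier, interrupted run
        }
        let score = run_scenario(id);
        run.results.push(ScenarioResult { id: id.to_string(), score });
    }
    run.recompute_aggregate();
    run.status = RunStatus::Completed;
    run
}
```

In this sketch a caller would load the partial run, pass the full scenario list to `resume`, and then write the merged `BenchRun` back out.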

Closes #2833, #2835
@github-actions github-actions bot added the documentation, rust, dependencies, enhancement, and size/L labels Apr 8, 2026
@bug-ops bug-ops enabled auto-merge (squash) April 8, 2026 13:13
@bug-ops bug-ops merged commit c57cdb9 into main Apr 8, 2026
29 checks passed
@bug-ops bug-ops deleted the bench-result-writer-resume branch April 8, 2026 13:16
@bug-ops bug-ops linked an issue Apr 8, 2026 that may be closed by this pull request
4 tasks

Labels

dependencies: Dependency updates
documentation: Improvements or additions to documentation
enhancement: New feature or request
rust: Rust code changes
size/L: Large PR (201-500 lines)

Development

Successfully merging this pull request may close these issues.

feat(bench): implement --resume for interrupted benchmark runs
feat(bench): implement structured JSON and Markdown result writer
