feat(bench): add zeph-bench crate scaffold, CLI subcommand, and deterministic mode #2840

Merged: bug-ops merged 3 commits into main from zeph-bench-scaffold on Apr 8, 2026
bug-ops (Owner) commented on Apr 8, 2026

Summary

  • New optional crate crates/zeph-bench/, gated on the bench feature (excluded from the full bundle), implementing BenchmarkChannel for headless benchmark execution
  • zeph bench CLI subcommand (list, download, run, show) with correct exit codes
  • Deterministic mode: forces temperature=0, seed=0 via GenerationOverrides before agent construction; --no-deterministic flag to opt out
  • 16 unit tests (channel behavior, dataset registry, deterministic override including skip-branch)

Closes #2828, #2829, #2831. Part of epic #2827.
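
To illustrate the deterministic-mode bullet, here is a minimal sketch of how the override and its skip branch might look. Only the name `GenerationOverrides`, the values `temperature = 0` and `seed = 0`, and the `--no-deterministic` flag come from this PR; the field types, `GenerationConfig`, `deterministic_overrides`, and `apply_overrides` are hypothetical stand-ins, not the crate's actual API.

```rust
/// Hypothetical stand-in for the PR's `GenerationOverrides`.
/// The real struct may have different field types or extra fields.
#[derive(Debug, Clone, PartialEq)]
pub struct GenerationOverrides {
    pub temperature: f32,
    pub seed: u64,
}

/// Hypothetical generation settings that an agent would be built from.
#[derive(Debug, Clone, PartialEq)]
pub struct GenerationConfig {
    pub temperature: f32,
    pub seed: Option<u64>,
}

/// Returns the overrides to apply, or `None` when `--no-deterministic` was passed.
pub fn deterministic_overrides(no_deterministic: bool) -> Option<GenerationOverrides> {
    if no_deterministic {
        // Skip branch: leave the configured sampling settings untouched.
        None
    } else {
        Some(GenerationOverrides { temperature: 0.0, seed: 0 })
    }
}

/// Applies the overrides (if any) to the config; intended to run before
/// the agent is constructed so the agent only ever sees the final values.
pub fn apply_overrides(
    mut config: GenerationConfig,
    overrides: Option<GenerationOverrides>,
) -> GenerationConfig {
    if let Some(ov) = overrides {
        config.temperature = ov.temperature;
        config.seed = Some(ov.seed);
    }
    config
}
```

Modelling the opt-out as `Option<GenerationOverrides>` keeps the skip branch explicit, which makes it straightforward to unit-test both paths.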

Test plan

  • cargo build --features bench compiles clean
  • cargo nextest run -p zeph-bench --lib — 16 passed
  • zeph bench list prints all 5 dataset names
  • zeph bench run --dataset unknown exits 1 with message
  • bench feature absent from full bundle
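
The `list` and unknown-dataset checks above could be backed by something like the following sketch. The five dataset names are taken from this PR's commit message; the `DatasetRegistry` methods, the `run` helper, the exact-match lookup, and the wording of the diagnostic are assumptions made for illustration, and the identifiers the real CLI accepts may be spelled differently.

```rust
/// The five built-in dataset names as spelled in this PR.
pub const BUILT_IN_DATASETS: [&str; 5] =
    ["LongMemEval", "LOCOMO", "FRAMES", "tau-bench", "GAIA"];

/// Hypothetical registry; the real `DatasetRegistry` API is not shown in the PR.
pub struct DatasetRegistry {
    names: Vec<&'static str>,
}

impl DatasetRegistry {
    /// Builds a registry pre-populated with the built-in datasets.
    pub fn with_builtins() -> Self {
        Self { names: BUILT_IN_DATASETS.to_vec() }
    }

    /// All registered names, in registration order.
    pub fn list(&self) -> &[&'static str] {
        &self.names
    }

    /// Exact-match lookup (an assumption; the real lookup may normalize names).
    pub fn get(&self, name: &str) -> Option<&'static str> {
        self.names.iter().copied().find(|n| *n == name)
    }
}

/// Returns the process exit code for a hypothetical `run` invocation:
/// 0 when the dataset is known, 1 with a diagnostic on stderr otherwise.
pub fn run(registry: &DatasetRegistry, dataset: &str) -> u8 {
    match registry.get(dataset) {
        Some(name) => {
            println!("running benchmark on {name}");
            0
        }
        None => {
            eprintln!("unknown dataset: {dataset}");
            1
        }
    }
}
```

A caller would typically convert the returned `u8` with `std::process::ExitCode::from` at the end of `main`.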

github-actions bot added the documentation, rust, dependencies, enhancement, and size/XL labels on Apr 8, 2026
…ministic mode

Closes #2828, #2829, #2831. Part of epic #2827.

- New optional crate `crates/zeph-bench/` gated on `bench` feature flag
  (excluded from `full`). Implements `BenchmarkChannel` satisfying the
  `Channel` trait for headless benchmark execution: scripted prompt queue,
  response capture buffer, token usage recording, auto-confirm, elicit
  returns Declined, send_tool_output is a no-op.
- `DatasetRegistry` with 5 built-in datasets: LongMemEval, LOCOMO, FRAMES,
  tau-bench, GAIA.
- `zeph bench` CLI subcommand (list, download, run, show) gated on
  `cfg(feature = "bench")`. Unknown dataset and missing cache exit 1 with
  diagnostic message.
- Deterministic mode: applies `GenerationOverrides { temperature: 0.0,
  seed: 0 }` before agent construction; disabled with `--no-deterministic`.
- 16 unit tests covering channel behavior, dataset registry, and
  deterministic override (including skip-branch).
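
The channel behavior described in the first bullet can be sketched roughly as follows. This is a simplified, synchronous approximation: the `Channel` trait here is a hypothetical stand-in (the real trait is not shown in this PR and likely has different, possibly async, signatures), and `ElicitResponse`, `record_usage`, and the field names are invented for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical elicitation outcome; the real type may have more variants.
#[derive(Debug, PartialEq)]
pub enum ElicitResponse {
    Accepted(String),
    Declined,
}

/// Simplified stand-in for the `Channel` trait named in the PR.
pub trait Channel {
    fn recv(&mut self) -> Option<String>;
    fn send(&mut self, text: &str);
    fn confirm(&mut self, prompt: &str) -> bool;
    fn elicit(&mut self, request: &str) -> ElicitResponse;
    fn send_tool_output(&mut self, tool: &str, output: &str);
}

/// Headless channel: prompts come from a script, responses go to a buffer.
pub struct BenchmarkChannel {
    prompts: VecDeque<String>,
    pub responses: Vec<String>,
    pub tokens_used: u64,
}

impl BenchmarkChannel {
    /// Creates a channel that will replay `prompts` in order.
    pub fn new<I, S>(prompts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            prompts: prompts.into_iter().map(Into::into).collect(),
            responses: Vec::new(),
            tokens_used: 0,
        }
    }

    /// Accumulates token usage reported for a turn.
    pub fn record_usage(&mut self, tokens: u64) {
        self.tokens_used += tokens;
    }
}

impl Channel for BenchmarkChannel {
    /// Pops the next scripted prompt; `None` once the script is exhausted.
    fn recv(&mut self) -> Option<String> {
        self.prompts.pop_front()
    }

    /// Captures the agent's response instead of displaying it.
    fn send(&mut self, text: &str) {
        self.responses.push(text.to_owned());
    }

    /// Auto-confirm: no human is present to approve anything.
    fn confirm(&mut self, _prompt: &str) -> bool {
        true
    }

    /// Elicitation cannot be answered headlessly, so it is declined.
    fn elicit(&mut self, _request: &str) -> ElicitResponse {
        ElicitResponse::Declined
    }

    /// Tool output is intentionally discarded (no-op).
    fn send_tool_output(&mut self, _tool: &str, _output: &str) {}
}
```

Because every interactive method returns a fixed answer, a benchmark run driven through such a channel never blocks waiting for user input.
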
bug-ops enabled auto-merge (squash) on April 8, 2026 at 12:42
bug-ops force-pushed the zeph-bench-scaffold branch from 7aef871 to 104fb75 on April 8, 2026 at 12:42
github-actions bot added the ci label on Apr 8, 2026
bug-ops merged commit 7522068 into main on Apr 8, 2026
29 checks passed
bug-ops deleted the zeph-bench-scaffold branch on April 8, 2026 at 12:58
Labels

ci, dependencies, documentation, enhancement, rust, size/XL
