feat(bench): add zeph-bench crate scaffold, CLI subcommand, and deterministic mode #2840
Merged
Apr 8, 2026
…ministic mode

Closes #2828, #2829, #2831. Part of epic #2827.

- New optional crate `crates/zeph-bench/` gated on the `bench` feature flag (excluded from `full`). Implements `BenchmarkChannel`, which satisfies the `Channel` trait for headless benchmark execution: scripted prompt queue, response capture buffer, token usage recording, auto-confirm; `elicit` returns `Declined` and `send_tool_output` is a no-op.
- `DatasetRegistry` with 5 built-in datasets: LongMemEval, LOCOMO, FRAMES, tau-bench, GAIA.
- `zeph bench` CLI subcommand (list, download, run, show) gated on `cfg(feature = "bench")`. An unknown dataset or a missing cache exits with code 1 and a diagnostic message.
- Deterministic mode: applies `GenerationOverrides { temperature: 0.0, seed: 0 }` before agent construction; disabled with `--no-deterministic`.
- 16 unit tests covering channel behavior, dataset registry, and deterministic override (including the skip branch).
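For readers who have not opened the diff, the channel behavior listed above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the `Channel` trait here is a simplified, synchronous stand-in, and every method signature, the `Elicitation` enum, and the `record_usage` helper are assumptions made for illustration. Only the name `BenchmarkChannel` and the described behaviors come from the commit message.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for the real `Channel` trait.
/// The actual trait is probably async and has a different surface.
pub trait Channel {
    fn recv(&mut self) -> Option<String>;
    fn send(&mut self, response: &str);
    fn confirm(&mut self, prompt: &str) -> bool;
    fn elicit(&mut self, request: &str) -> Elicitation;
    fn send_tool_output(&mut self, tool: &str, output: &str);
}

/// Hypothetical result type for an elicitation request.
#[derive(Debug, PartialEq, Eq)]
pub enum Elicitation {
    Accepted(String),
    Declined,
}

/// Headless channel: prompts come from a script, responses are captured.
#[derive(Debug, Default)]
pub struct BenchmarkChannel {
    prompts: VecDeque<String>,
    responses: Vec<String>,
    total_tokens: u64,
}

impl BenchmarkChannel {
    /// Build a channel from a scripted list of prompts.
    pub fn new<I, S>(prompts: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            prompts: prompts.into_iter().map(Into::into).collect(),
            responses: Vec::new(),
            total_tokens: 0,
        }
    }

    /// Accumulate token usage reported for a turn (assumed helper).
    pub fn record_usage(&mut self, tokens: u64) {
        self.total_tokens += tokens;
    }

    pub fn responses(&self) -> &[String] {
        &self.responses
    }

    pub fn total_tokens(&self) -> u64 {
        self.total_tokens
    }
}

impl Channel for BenchmarkChannel {
    /// Pop the next scripted prompt; `None` once the script is exhausted.
    fn recv(&mut self) -> Option<String> {
        self.prompts.pop_front()
    }

    /// Capture the agent's response instead of displaying it.
    fn send(&mut self, response: &str) {
        self.responses.push(response.to_owned());
    }

    /// Auto-confirm: no human is present in a benchmark run.
    fn confirm(&mut self, _prompt: &str) -> bool {
        true
    }

    /// Elicitation cannot be answered headlessly, so it is declined.
    fn elicit(&mut self, _request: &str) -> Elicitation {
        Elicitation::Declined
    }

    /// Tool output is intentionally discarded.
    fn send_tool_output(&mut self, _tool: &str, _output: &str) {}
}
```

Under these assumptions, a benchmark harness would construct the channel with the dataset's prompts, drive the agent until `recv` returns `None`, and then score the captured responses.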
7aef871 to 104fb75
Summary
- `crates/zeph-bench/` gated on the `bench` feature (not in `full`), implementing `BenchmarkChannel` for headless benchmark execution
- `zeph bench` CLI subcommand (list, download, run, show) with correct exit codes
- Deterministic mode: `temperature=0, seed=0` via `GenerationOverrides` before agent construction; `--no-deterministic` flag to opt out

Closes #2828, #2829, #2831. Part of epic #2827.
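The deterministic override and its skip branch could look roughly like the sketch below. It is illustrative only: the field types on `GenerationOverrides`, the `AgentConfig` struct, and both function names are assumptions, since the summary gives only the struct name, the values `temperature=0, seed=0`, and the `--no-deterministic` flag.

```rust
/// Overrides named in the PR; the field types here are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GenerationOverrides {
    pub temperature: f32,
    pub seed: u64,
}

/// Hypothetical agent configuration that the overrides are applied to.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct AgentConfig {
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
}

/// Decide which overrides to use from the `--no-deterministic` flag.
/// Returns `None` when the user opted out (the skip branch).
pub fn deterministic_overrides(no_deterministic: bool) -> Option<GenerationOverrides> {
    if no_deterministic {
        None
    } else {
        Some(GenerationOverrides {
            temperature: 0.0,
            seed: 0,
        })
    }
}

/// Apply the overrides before the agent is constructed; leave the
/// configuration untouched when there is nothing to apply.
pub fn apply_overrides(
    mut config: AgentConfig,
    overrides: Option<GenerationOverrides>,
) -> AgentConfig {
    if let Some(o) = overrides {
        config.temperature = Some(o.temperature);
        config.seed = Some(o.seed);
    }
    config
}
```

Modelling the opt-out as `Option` keeps the skip branch explicit, which makes it straightforward to cover in a unit test.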
Test plan
- `cargo build --features bench` compiles clean
- `cargo nextest run -p zeph-bench --lib`: 16 passed
- `zeph bench list` prints all 5 dataset names
- `zeph bench run --dataset unknown` exits 1 with message
- `bench` feature absent from `full` bundle
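The unknown-dataset check in the test plan amounts to a registry lookup mapped to a process exit code. A minimal sketch, assuming an exact-match lookup by name: the `DatasetRegistry` methods, the `run_exit_code` function, and the exact strings that `zeph bench list` prints are hypothetical. Only the five dataset names and the exit code of 1 come from this PR.

```rust
/// The five built-in datasets named in the PR. The exact identifiers
/// used on the command line are an assumption.
pub const BUILTIN_DATASETS: [&str; 5] =
    ["LongMemEval", "LOCOMO", "FRAMES", "tau-bench", "GAIA"];

/// Hypothetical registry shape; the real `DatasetRegistry` likely
/// stores richer metadata than a name.
pub struct DatasetRegistry {
    names: Vec<&'static str>,
}

impl DatasetRegistry {
    /// Registry pre-populated with the built-in datasets.
    pub fn builtin() -> Self {
        Self {
            names: BUILTIN_DATASETS.to_vec(),
        }
    }

    /// Names in registration order, as `list` might print them.
    pub fn names(&self) -> &[&'static str] {
        &self.names
    }

    /// Exact-match lookup (case sensitivity is an assumption).
    pub fn contains(&self, name: &str) -> bool {
        self.names.iter().any(|n| *n == name)
    }
}

/// Map a `run --dataset <name>` request to an exit code:
/// 0 when the dataset is known, 1 with a diagnostic otherwise.
pub fn run_exit_code(registry: &DatasetRegistry, dataset: &str) -> i32 {
    if registry.contains(dataset) {
        0
    } else {
        eprintln!("error: unknown dataset '{dataset}'");
        1
    }
}
```

A real `main` would pass the returned code to `std::process::exit`; the missing-cache case described in the commit message would take the same exit path.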