feat(bench): implement deterministic mode (temperature=0 override) #2831

@bug-ops

Description

Force temperature=0 and a fixed seed for all LLM calls during a benchmark run, regardless of what the user config specifies, to ensure reproducibility.

Part of epic #2827. See spec: .local/specs/zeph-bench/spec.md FR-003, NFR-002, US-003.

Scope

  • DeterministicLayer implementing RuntimeLayer (from zeph-core) that intercepts before_chat and sets temperature=0 and seed=0 on the outgoing request
  • Alternatively, a config-level override applied in the bench runner before constructing the agent. Prefer this option if the RuntimeLayer hook turns out not to have access to the request parameters at that point (to be verified)
  • --no-deterministic flag disables this behavior
  • Unit test: verify that with DeterministicLayer active, the temperature field in the serialized request is 0.0
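
A minimal sketch of the first option and its unit test target, under stated assumptions. The real `RuntimeLayer` trait and request type live in zeph-core and are not shown in this issue, so `ChatRequest`, the `before_chat` signature, and `to_debug_json` below are hypothetical stand-ins, not the actual API. JSON is rendered by hand only to keep the sketch free of external crates.

```rust
/// Hypothetical stand-in for the outgoing chat request.
/// The real zeph-core type may have different fields and names.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatRequest {
    pub model: String,
    pub temperature: Option<f32>,
    pub seed: Option<u64>,
}

/// Hypothetical stand-in for zeph-core's `RuntimeLayer`.
/// Only the `before_chat` hook named in this issue is modeled,
/// and its signature here is an assumption.
pub trait RuntimeLayer {
    fn before_chat(&self, request: &mut ChatRequest);
}

/// Overrides sampling parameters on every outgoing request.
pub struct DeterministicLayer;

impl RuntimeLayer for DeterministicLayer {
    fn before_chat(&self, request: &mut ChatRequest) {
        // Ignore whatever the user config put here.
        request.temperature = Some(0.0);
        request.seed = Some(0);
    }
}

/// Hand-rolled JSON rendering for the sketch (no string escaping);
/// real code would use the project's existing serializer.
pub fn to_debug_json(request: &ChatRequest) -> String {
    let temperature = request
        .temperature
        .map_or("null".to_string(), |t| format!("{t:.1}"));
    let seed = request.seed.map_or("null".to_string(), |s| s.to_string());
    format!(
        "{{\"model\":\"{}\",\"temperature\":{},\"seed\":{}}}",
        request.model, temperature, seed
    )
}
```

A unit test along these lines needs no live LLM: build a request with a non-zero temperature, pass it through `DeterministicLayer::before_chat`, and assert on the serialized output.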

Acceptance Criteria

  • All LLM requests during a bench run have temperature=0 in debug request JSON
  • --no-deterministic flag restores config-specified temperature
  • Two runs on the same model/scenario produce identical responses
  • Unit test passes without a live LLM
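
The `--no-deterministic` criterion, combined with the config-level alternative from Scope, could be sketched roughly as follows. `LlmConfig`, `deterministic_enabled`, and `effective_config` are hypothetical names introduced for illustration; the actual bench runner and config types are not described in this issue, and real argument parsing would go through whatever CLI parser the project already uses.

```rust
/// Hypothetical subset of the user's LLM configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct LlmConfig {
    pub temperature: f32,
    pub seed: Option<u64>,
}

/// Deterministic mode is on by default and turned off
/// only by the `--no-deterministic` flag.
pub fn deterministic_enabled(args: &[String]) -> bool {
    !args.iter().any(|arg| arg == "--no-deterministic")
}

/// Returns the config the bench runner would hand to the agent.
/// In deterministic mode the user's values are replaced;
/// otherwise they are passed through unchanged.
pub fn effective_config(user: &LlmConfig, deterministic: bool) -> LlmConfig {
    if deterministic {
        LlmConfig {
            temperature: 0.0,
            seed: Some(0),
        }
    } else {
        user.clone()
    }
}
```

Keeping the override as a pure function of the user config and a boolean makes both the default behavior and the opt-out easy to cover in tests without a live LLM.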

Metadata

Labels

P2 (High value, medium complexity), enhancement (New feature or request)
