Fix benchmark scoring call signature #1

Draft

Bortlesboat wants to merge 1 commit into elder-plinius:main from Bortlesboat:codex/autotemp-benchmark-evaluate-output-fix

Conversation

@Bortlesboat

Summary

  • pass both the original prompt and extracted best output into evaluate_output() during benchmark runs
  • add an offline regression test that stubs external imports so the benchmark path can be verified without live API calls

Why

benchmark() currently calls evaluate_output() with the wrong arguments, raising a TypeError that is then swallowed by the broad exception handler. That quietly turns otherwise valid benchmark rows into 0.0 overall scores.
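
A minimal sketch of the failure mode, using illustrative names rather than the actual AutoTemp signatures:

```python
# Hypothetical stand-ins for the real functions; only the call-signature
# mismatch and the broad exception handler mirror the bug described above.

def evaluate_output(prompt, output):
    """Pretend scorer: expects both the prompt and the model output."""
    return 1.0 if output else 0.0

def benchmark_row(prompt, best_output):
    try:
        # Buggy call: one argument instead of two raises a TypeError...
        return evaluate_output(best_output)
    except Exception:
        # ...which the broad handler silently converts into a 0.0 score.
        return 0.0

def benchmark_row_fixed(prompt, best_output):
    try:
        # Fixed call: pass both the original prompt and the best output.
        return evaluate_output(prompt, best_output)
    except Exception:
        return 0.0

print(benchmark_row("q", "answer"))        # 0.0 — valid row silently zeroed
print(benchmark_row_fixed("q", "answer"))  # 1.0
```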

Verification

  • python -m unittest tests.test_benchmark -v
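
The stub-based approach from the summary might look roughly like this; the stubbed package name ("openai") and the patched symbol are assumptions about the project layout, not taken from the AutoTemp source:

```python
# Sketch of an offline regression test that registers a fake module in
# sys.modules before the code under test imports it, so no live API
# client is ever constructed.
import sys
import types
import unittest
from unittest import mock

def make_stub(name):
    """Build a fake module so 'import <name>' needs no network access."""
    stub = types.ModuleType(name)
    stub.OpenAI = mock.MagicMock()  # stand-in for a real client class
    return stub

class BenchmarkOfflineTest(unittest.TestCase):
    def test_import_resolves_to_stub(self):
        with mock.patch.dict(sys.modules, {"openai": make_stub("openai")}):
            import openai  # resolves to the stub, not the real package
            self.assertIsInstance(openai.OpenAI, mock.MagicMock)
```

Because mock.patch.dict restores sys.modules on exit, the stub never leaks into other tests.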
