Fix(eval): evaluate in :task_cleanroom images by klieret · Pull Request #42 · facebookresearch/ProgramBench

klieret · 2026-06-18T02:35:36Z

Submissions are now evaluated in the artifact-free cleanroom image by default instead of the full :task build environment, so a submission can't rely on build artifacts leaked into :task. --image-tag stays as an explicit override (pass --image-tag task for the full build env).

This avoids drift between the inference and evaluation images.
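The default-plus-override behaviour described above can be sketched as a small helper. This is a minimal illustration, not ProgramBench's code: `DEFAULT_IMAGE_TAG`, `resolve_image`, and the `example/instance` repository name are hypothetical. It only shows that the tag falls back to `task_cleanroom` unless a caller explicitly passes one such as `task`.

```python
from typing import Optional

# Hypothetical constant mirroring the new default described in this PR.
DEFAULT_IMAGE_TAG = "task_cleanroom"


def resolve_image(repository: str, image_tag: Optional[str] = None) -> str:
    """Return the Docker image reference a submission is evaluated in.

    Falls back to the artifact-free cleanroom tag; an explicit tag
    (e.g. "task" for the full build environment) overrides it.
    """
    tag = image_tag if image_tag is not None else DEFAULT_IMAGE_TAG
    return f"{repository}:{tag}"


print(resolve_image("example/instance"))          # example/instance:task_cleanroom
print(resolve_image("example/instance", "task"))  # example/instance:task
```

Keeping the default in one place makes it harder for the single and batch evaluation paths to disagree about which image is used.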

Internal-reference: fdbd6657

Closes #9


Copilot

Pull request overview

This PR changes ProgramBench evaluation to run submissions in the artifact-free :task_cleanroom Docker image by default, aligning evaluation with inference and preventing reliance on leaked build artifacts. It keeps --image-tag as an explicit override to use the full :task environment when desired.

Changes:

- Switch the default `image_tag` from `task` to `task_cleanroom` in the core evaluator (`eval.py`) and the batch evaluator (`eval_batch.py`).
- Update the default of `programbench eval --image-tag` to `task_cleanroom`.
- Expand the CLI help text to explain why `task_cleanroom` is the default and how to override it back to `task`.
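A CLI option with this default and help text might look like the sketch below. It uses the standard-library `argparse` purely for illustration; the actual `src/programbench/cli/main.py` may use a different CLI library, and the parser layout and help wording here are assumptions rather than the PR's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout: a top-level program with an "eval" subcommand.
    parser = argparse.ArgumentParser(prog="programbench")
    subcommands = parser.add_subparsers(dest="command", required=True)
    eval_parser = subcommands.add_parser("eval", help="Evaluate submissions")
    eval_parser.add_argument(
        "--image-tag",
        default="task_cleanroom",
        help=(
            "Docker image tag to evaluate in. Defaults to task_cleanroom, the "
            "artifact-free image also used for inference, so submissions cannot "
            "rely on build artifacts left in :task. Pass --image-tag task to "
            "evaluate in the full build environment."
        ),
    )
    return parser


# Without the flag, the cleanroom tag is used; passing it restores :task.
print(build_parser().parse_args(["eval"]).image_tag)                         # task_cleanroom
print(build_parser().parse_args(["eval", "--image-tag", "task"]).image_tag)  # task
```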

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/programbench/eval/eval.py` | Updates the evaluator's default Docker image tag to `task_cleanroom`. |
| `src/programbench/eval/eval_batch.py` | Updates batch evaluation defaults to use `task_cleanroom` unless overridden. |
| `src/programbench/cli/main.py` | Updates the CLI default and help text for `--image-tag` to `task_cleanroom`. |


meta-cla bot added the CLA Signed label Jun 18, 2026

klieret changed the title from ~~Change(eval): evaluate in :task_cleanroom images~~ to Fix(eval): evaluate in :task_cleanroom images Jun 18, 2026

klieret requested a review from Copilot June 18, 2026 02:37

Copilot started reviewing on behalf of klieret June 18, 2026 02:37

Copilot AI reviewed Jun 18, 2026


klieret merged commit 68e3da2 into main Jun 18, 2026
6 checks passed

klieret merged 1 commit into main from feat/eval-cleanroom-default