Fix(eval): default to v6 docker images by klieret · Pull Request #46 · facebookresearch/ProgramBench

klieret · 2026-06-18T20:17:11Z

Make the v6 image set the default everywhere eval resolves a tag: the `eval` CLI `--image-tag`, the `Evaluator.evaluate` default, and both `eval_batch` entry points now default to `task_cleanroom_v6` (was `task_cleanroom`). Update docs/README.md to point inference users at the `task_cleanroom_v6` / `task_v6` tags.

Tags are otherwise unchanged: `--image-tag task_v6` still selects the full build environment, and explicit overrides are passed through verbatim.
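A minimal sketch of the resolution rule described above, assuming a single module-level default: when no tag is given, eval falls back to `task_cleanroom_v6`, and any explicit tag (including `task_v6` or the old `task_cleanroom`) is returned unchanged. `DEFAULT_IMAGE_TAG` and `resolve_image_tag` are hypothetical names for illustration, not ProgramBench's actual API.

```python
from typing import Optional

# Hypothetical constant: the PR only states that the default is now
# task_cleanroom_v6 (previously task_cleanroom).
DEFAULT_IMAGE_TAG = "task_cleanroom_v6"


def resolve_image_tag(image_tag: Optional[str] = None) -> str:
    """Return the Docker image tag eval should use (hypothetical helper).

    No tag given -> fall back to the v6 cleanroom default.
    Explicit tag given -> pass it through verbatim, without rewriting.
    """
    if image_tag is None:
        return DEFAULT_IMAGE_TAG
    return image_tag


if __name__ == "__main__":
    print(resolve_image_tag())                  # task_cleanroom_v6
    print(resolve_image_tag("task_v6"))         # task_v6 (full build environment)
    print(resolve_image_tag("task_cleanroom"))  # old tag still passes through
```

Keeping overrides verbatim means switching the default cannot silently change what an explicit `--image-tag` selects.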

Internal-reference-commit: a92ae6227464d6c1dbff015e6003fb70790760db
Internal-reference-commit: 22e92e67b0a399d7b9dc2f612c5f5eedec1c80cd
Internal-reference-commit: 46518a07432af49a4b86777bc2ee8b50f7548bc7

Closes #45
Closes #14
In reference to #44


Copilot

Pull request overview

This PR updates ProgramBench’s evaluation tooling and usage docs to default to the v6 Docker image tags (notably task_cleanroom_v6) wherever an image tag is implicitly selected, aligning defaults with the hardened cleanroom images referenced in issues #45/#14.

Changes:

- Default eval image tag switched from `task_cleanroom` to `task_cleanroom_v6` in `Evaluator`, `eval_batch`, and the `programbench eval` CLI option default.
- CLI help text updated to reference `task_v6` as the explicit “full build environment” override.
- Docs updated to point inference users at the `task_cleanroom_v6` / `task_v6` tags.
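For the CLI side, a sketch of how the `--image-tag` default and help text could look. It uses the standard-library `argparse` as a stand-in; the real `programbench eval` command may be built with a different framework, and `build_parser` / `DEFAULT_IMAGE_TAG` are hypothetical names. Only the default value and the `task_v6` override come from this PR.

```python
import argparse

# Hypothetical stand-in for the `programbench eval` CLI; only the default
# value and the task_v6 override are taken from the PR description.
DEFAULT_IMAGE_TAG = "task_cleanroom_v6"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="programbench eval")
    parser.add_argument(
        "--image-tag",
        default=DEFAULT_IMAGE_TAG,
        help=(
            "Docker image tag to evaluate against "
            f"(default: {DEFAULT_IMAGE_TAG}; use task_v6 for the full build environment)"
        ),
    )
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # argparse maps --image-tag to the attribute name image_tag.
    print(parser.parse_args([]).image_tag)                          # task_cleanroom_v6
    print(parser.parse_args(["--image-tag", "task_v6"]).image_tag)  # task_v6
```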

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/programbench/eval/eval.py` | Updates `Evaluator` default `image_tag` to `task_cleanroom_v6`. |
| `src/programbench/eval/eval_batch.py` | Updates `eval_batch` entrypoint defaults to `task_cleanroom_v6`. |
| `src/programbench/cli/main.py` | Updates `--image-tag` default and adjusts help text toward v6 tags. |
| `docs/README.md` | Updates user-facing docs/links to v6 tags for inference/evaluation guidance. |


klieret requested a review from Copilot June 18, 2026 20:17

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Jun 18, 2026

Copilot started reviewing on behalf of klieret June 18, 2026 20:17

Copilot AI reviewed Jun 18, 2026


Comment thread src/programbench/cli/main.py

Comment thread docs/README.md

klieret merged commit 3f57100 into main Jun 18, 2026
6 checks passed


klieret merged 1 commit into `main` from `default-v6-images`
