Fix(eval): default to v6 docker images #46

Merged
klieret merged 1 commit into main from default-v6-images on Jun 18, 2026

Conversation

@klieret klieret commented Jun 18, 2026

Contributor

Make the v6 image set the default everywhere eval resolves a tag: the `eval` CLI `--image-tag`, the `Evaluator.evaluate` default, and both `eval_batch` entry points now default to `task_cleanroom_v6` (was `task_cleanroom`). Update `docs/README.md` to point inference users at the `task_cleanroom_v6` / `task_v6` tags.

Tags are otherwise unchanged: `--image-tag task_v6` still selects the full build environment, and explicit overrides are passed through verbatim.

Internal-reference-commit: a92ae6227464d6c1dbff015e6003fb70790760db
Internal-reference-commit: 22e92e67b0a399d7b9dc2f612c5f5eedec1c80cd
Internal-reference-commit: 46518a07432af49a4b86777bc2ee8b50f7548bc7

Closes #45
Closes #14
In reference to #44
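
The default-versus-override behavior described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `DEFAULT_IMAGE_TAG` and `resolve_image_tag` are hypothetical names introduced here; only the tag strings `task_cleanroom_v6`, `task_cleanroom`, and `task_v6` come from this PR.

```python
from typing import Optional

# Hypothetical constant. The PR only states that the default tag is now
# task_cleanroom_v6 (previously task_cleanroom).
DEFAULT_IMAGE_TAG = "task_cleanroom_v6"


def resolve_image_tag(image_tag: Optional[str] = None) -> str:
    """Return the tag to use for evaluation.

    With no argument, fall back to the v6 cleanroom default.
    An explicit tag is returned verbatim, with no rewriting to a v6 variant.
    """
    if image_tag is None:
        return DEFAULT_IMAGE_TAG
    return image_tag


if __name__ == "__main__":
    print(resolve_image_tag())                  # implicit default
    print(resolve_image_tag("task_v6"))         # full build environment
    print(resolve_image_tag("task_cleanroom"))  # old tag, passed through as-is
```

Under this sketch, callers who pinned the old `task_cleanroom` tag explicitly would keep getting it; only callers relying on the implicit default would move to v6.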

@klieret klieret requested a review from Copilot June 18, 2026 20:17
@meta-cla meta-cla Bot added the CLA Signed label Jun 18, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR updates ProgramBench’s evaluation tooling and usage docs to default to the v6 Docker image tags (notably task_cleanroom_v6) wherever an image tag is implicitly selected, aligning defaults with the hardened cleanroom images referenced in issues #45/#14.

Changes:

  • Default eval image tag switched from task_cleanroom to task_cleanroom_v6 in Evaluator, eval_batch, and the programbench eval CLI option default.
  • CLI help text updated to reference task_v6 as the explicit “full build environment” override.
  • Docs updated to point inference users at the task_cleanroom_v6 / task_v6 tags.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/programbench/eval/eval.py` | Updates `Evaluator` default `image_tag` to `task_cleanroom_v6`. |
| `src/programbench/eval/eval_batch.py` | Updates `eval_batch` entrypoint defaults to `task_cleanroom_v6`. |
| `src/programbench/cli/main.py` | Updates `--image-tag` default and adjusts help text toward v6 tags. |
| `docs/README.md` | Updates user-facing docs/links to v6 tags for inference/evaluation guidance. |
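
Because the same default is repeated across several files, one way to guard against the entry points drifting apart is a small consistency check. The sketch below is an assumption-laden illustration: `evaluate` and `eval_batch` are stand-in functions with made-up signatures, and `default_of` / `defaults_agree` are hypothetical helpers, not part of ProgramBench.

```python
import inspect

DEFAULT_IMAGE_TAG = "task_cleanroom_v6"  # tag string taken from this PR


# Stand-ins for the real entry points; their signatures are assumptions.
def evaluate(submission, image_tag=DEFAULT_IMAGE_TAG):
    return image_tag


def eval_batch(submissions, image_tag=DEFAULT_IMAGE_TAG):
    return image_tag


def default_of(func, param="image_tag"):
    """Read the declared default of `param` from a function signature."""
    return inspect.signature(func).parameters[param].default


def defaults_agree(funcs, expected=DEFAULT_IMAGE_TAG):
    """True if every function declares `expected` as its image_tag default."""
    return all(default_of(f) == expected for f in funcs)


if __name__ == "__main__":
    print(defaults_agree([evaluate, eval_batch]))
```

A check of this shape would fail if one entry point were left on the old `task_cleanroom` default while the others moved to v6.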

Comment thread src/programbench/cli/main.py
Comment thread docs/README.md
@klieret klieret merged commit 3f57100 into main Jun 18, 2026
6 checks passed
Labels

CLA Signed (this label is managed by the Meta Open Source bot)

Development

Successfully merging this pull request may close these issues.

Agents able to exploit readable executable copy at /tmp
bellard_1776_quickjs.d7ae12a:task_cleanroom contains readable executable

2 participants