Fix CSV header handling in benchmark_generators; pathlib throughout#1684
Fix CSV header handling in benchmark_generators; pathlib throughout#1684finbarrtimbers wants to merge 7 commits into
Conversation
…hroughout Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 24a6b7ae42
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request fixes CSV header handling in benchmark_generators.py by using csvfile.tell() == 0 to ensure headers are written exactly once when appending to files. It also migrates standard open() calls to pathlib.Path.open() and adds regression tests for these fixes. Feedback suggests ensuring the parent directory exists before opening files for better robustness and warns about potential column misalignment if CSV field names change across different execution runs.
…rs Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…us 4.7 <noreply@anthropic.com>
…parent dir exists Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…platforms Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ai#1683) * Bundle small correctness fixes from allenai#1615/allenai#1618/allenai#1619/allenai#1623/allenai#1625/allenai#1646/allenai#1651/allenai#1655 with regression tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address PR review: inline write_header, pricing per 1M tokens, parameterize tests, dedupe if_functions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Trim branch to IFEval-only changes; other fixes moved to allenai#1684/allenai#1685/allenai#1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address review: use word-boundary regex in validate_choice to avoid substring false positives Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Revert deletion of scripts/eval_constraints/if_functions.py (handled in separate PR) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Revert validate_choice regex change; add TODO comment with suggested word-boundary fix Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update if_functions.py
Fix duplicate / missing CSV header writes in
save_completion_lengthsandsave_benchmark_results_to_csv, adds a test, and convertsbenchmark_generators.pyto usepathlib.Combines #1619 and #1623.