Fix _get_batch_logps NaN on fully-masked sequences (DPO) by finbarrtimbers · Pull Request #1685 · allenai/open-instruct

finbarrtimbers · 2026-05-12T15:43:13Z

Clamp the divisor in _get_batch_logps when average_log_prob=True so a sequence with every label masked (-100) returns 0.0 instead of NaN (supersedes #1625).
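A minimal sketch of why the clamp works. It assumes the common DPO-style layout for this function: shift labels by one, build `loss_mask` from `labels != -100`, gather per-token log-probs, then sum or average. The name `get_batch_logps_sketch` is hypothetical and this is not the open-instruct source. For a row with no unmasked labels, the numerator and the token count are both zero, so the unclamped average is 0/0 = NaN. Clamping the count to at least 1 turns that into 0/1.

```python
import torch


def get_batch_logps_sketch(
    logits: torch.Tensor,  # (batch, seq_len, vocab)
    labels: torch.Tensor,  # (batch, seq_len); -100 marks ignored positions
    average_log_prob: bool = False,
) -> torch.Tensor:
    """Hypothetical DPO-style _get_batch_logps, for illustration only."""
    # Shift so that position t predicts token t + 1.
    labels = labels[:, 1:].clone()
    logits = logits[:, :-1, :]
    loss_mask = labels != -100  # bool: True where the token counts
    labels[~loss_mask] = 0  # any valid index; these positions are zeroed below
    per_token_logps = torch.gather(
        logits.log_softmax(-1), dim=2, index=labels.unsqueeze(2)
    ).squeeze(2)
    summed = (per_token_logps * loss_mask).sum(-1)
    if average_log_prob:
        # clamp(min=1) turns 0 / 0 into 0 / 1 for a fully masked row.
        return summed / loss_mask.sum(-1).clamp(min=1)
    return summed


torch.manual_seed(0)
logits = torch.randn(2, 5, 11)
labels = torch.randint(0, 11, (2, 5))
labels[0] = -100  # row 0: every label masked
labels[1, :2] = -100  # row 1: prompt tokens masked, response tokens kept

out = get_batch_logps_sketch(logits, labels, average_log_prob=True)
# Row 0 has no unmasked labels, so its sum is zero and the clamped divisor is 1.
print(out)
```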


gemini-code-assist

Code Review

This pull request fixes a division-by-zero issue in _get_batch_logps within open_instruct/dpo_utils.py by clamping the denominator to 1 when calculating average log probabilities for fully masked sequences. The changes also include regression tests and a changelog update. Feedback suggests using masked_fill to more robustly handle potential -inf values in masked positions, which could otherwise lead to NaN results during multiplication.

gemini-code-assist · 2026-05-12T15:45:40Z


    if average_log_prob:
-        return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
+        return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1)


While clamping the denominator prevents division by zero, the expression (per_token_logps * loss_mask) can still produce NaN if any masked token has a log probability of -inf (since -inf * 0 = NaN in PyTorch). Although rare with standard log_softmax outputs, using masked_fill is a more robust way to ensure masked positions are zeroed out regardless of their value.
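The reviewer's point can be checked in isolation. The tensors below are made up for illustration. A `-inf` log-probability under a `False` mask makes the product-based sum NaN, because `-inf * 0` is NaN under IEEE 754. `masked_fill` overwrites the masked value before the sum, so the `-inf` never reaches it.

```python
import torch

# Per-token log-probs for one sequence; the last position is masked out
# but (hypothetically) holds -inf.
per_token_logps = torch.tensor([[-1.0, -2.0, float("-inf")]])
loss_mask = torch.tensor([[True, True, False]])

# Multiplying by the mask evaluates -inf * 0 at the masked position -> nan.
product_sum = (per_token_logps * loss_mask).sum(-1)

# masked_fill replaces masked positions with 0.0 first, so the sum stays finite.
filled_sum = per_token_logps.masked_fill(~loss_mask, 0.0).sum(-1)

print(product_sum)  # tensor([nan])
print(filled_sum)   # tensor([-3.])
```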

Suggested change

-        return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1)
+        return per_token_logps.masked_fill(~loss_mask, 0.0).sum(-1) / loss_mask.sum(-1).clamp(min=1)
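A sketch combining both fixes in one hypothetical helper, `masked_mean_logps` (not a function in open-instruct). `masked_fill` keeps the numerator finite, and `clamp(min=1)` keeps the divisor non-zero. The suggestion relies on `loss_mask` being a bool tensor, as it is when built from `labels != -100`. `~` is only defined for bool and integer tensors, and `masked_fill` expects a bool mask. For that reason, the sketch calls `.bool()` first so that a float mask would also work.

```python
import torch


def masked_mean_logps(per_token_logps: torch.Tensor, loss_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean log-prob over unmasked tokens, 0.0 for fully masked rows."""
    loss_mask = loss_mask.bool()  # ~ and masked_fill need a bool mask
    # Zero masked positions by replacement, so -inf values cannot produce nan.
    numerator = per_token_logps.masked_fill(~loss_mask, 0.0).sum(-1)
    # At least 1 in the divisor, so a row with no unmasked tokens yields 0 / 1.
    denominator = loss_mask.sum(-1).clamp(min=1)
    return numerator / denominator


per_token_logps = torch.tensor([
    [-1.0, -3.0, float("-inf")],           # last token masked and -inf
    [float("-inf"), -5.0, float("-inf")],  # every token masked
])
loss_mask = torch.tensor([
    [True, True, False],
    [False, False, False],
])

# Row 0 averages to -2.0; row 1 has no unmasked tokens and gives 0.0.
print(masked_mean_logps(per_token_logps, loss_mask))
```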


…ai#1683)

* Bundle small correctness fixes from allenai#1615/allenai#1618/allenai#1619/allenai#1623/allenai#1625/allenai#1646/allenai#1651/allenai#1655 with regression tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Address PR review: inline write_header, pricing per 1M tokens, parameterize tests, dedupe if_functions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Trim branch to IFEval-only changes; other fixes moved to allenai#1684/allenai#1685/allenai#1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Address review: use word-boundary regex in validate_choice to avoid substring false positives Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Revert deletion of scripts/eval_constraints/if_functions.py (handled in separate PR) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Revert validate_choice regex change; add TODO comment with suggested word-boundary fix Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Update if_functions.py

Fix _get_batch_logps NaN when a sequence is fully label-masked Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

3d891b1

finbarrtimbers enabled auto-merge May 12, 2026 15:44

gemini-code-assist Bot reviewed May 12, 2026


finbarrtimbers added a commit that referenced this pull request May 12, 2026

Trim branch to IFEval-only changes; other fixes moved to #1684/#1685/#1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

268d04f

finbarrtimbers mentioned this pull request May 12, 2026

Fix IFEval correctness bugs in if_functions and IFEvalVerifier #1683

Merged

Merge branch 'main' into finbarr/dpo-utils-fix-fully-masked

ed18de2

farhatkevin self-requested a review May 12, 2026 18:29

farhatkevin approved these changes May 12, 2026


finbarrtimbers added this pull request to the merge queue May 12, 2026

Merged via the queue into main with commit 94a17b9 May 12, 2026
6 of 7 checks passed

finbarrtimbers deleted the finbarr/dpo-utils-fix-fully-masked branch May 12, 2026 18:44


finbarrtimbers merged 2 commits into main from finbarr/dpo-utils-fix-fully-masked


2 participants
