Skip to content

Fix _get_batch_logps NaN on fully-masked sequences (DPO)#1685

Merged
finbarrtimbers merged 2 commits into
mainfrom
finbarr/dpo-utils-fix-fully-masked
May 12, 2026
Merged

Fix _get_batch_logps NaN on fully-masked sequences (DPO)#1685
finbarrtimbers merged 2 commits into
mainfrom
finbarr/dpo-utils-fix-fully-masked

Conversation

@finbarrtimbers
Copy link
Copy Markdown
Collaborator

@finbarrtimbers finbarrtimbers commented May 12, 2026

Clamp the divisor in _get_batch_logps when average_log_prob=True so a sequence with every label masked (-100) returns 0.0 instead of NaN (supersedes #1625).

…hored-By: Claude Opus 4.7 <noreply@anthropic.com>
@finbarrtimbers finbarrtimbers enabled auto-merge May 12, 2026 15:44
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a division-by-zero issue in _get_batch_logps within open_instruct/dpo_utils.py by clamping the denominator to 1 when calculating average log probabilities for fully masked sequences. The changes also include regression tests and a changelog update. Feedback suggests using masked_fill to more robustly handle potential -inf values in masked positions, which could otherwise lead to NaN results during multiplication.


if average_log_prob:
return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While clamping the denominator prevents division by zero, the expression (per_token_logps * loss_mask) can still produce NaN if any masked token has a log probability of -inf (since -inf * 0 = NaN in PyTorch). Although rare with standard log_softmax outputs, using masked_fill is a more robust way to ensure masked positions are zeroed out regardless of their value.

Suggested change
return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1)
return per_token_logps.masked_fill(~loss_mask, 0.0).sum(-1) / loss_mask.sum(-1).clamp(min=1)

finbarrtimbers added a commit that referenced this pull request May 12, 2026
…1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pull Bot pushed a commit to nagyist/allenai.open-instruct that referenced this pull request May 12, 2026
…ai#1683)

* Bundle small correctness fixes from allenai#1615/allenai#1618/allenai#1619/allenai#1623/allenai#1625/allenai#1646/allenai#1651/allenai#1655 with regression tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address PR review: inline write_header, pricing per 1M tokens, parameterize tests, dedupe if_functions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Trim branch to IFEval-only changes; other fixes moved to allenai#1684/allenai#1685/allenai#1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address review: use word-boundary regex in validate_choice to avoid substring false positives Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Revert deletion of scripts/eval_constraints/if_functions.py (handled in separate PR) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Revert validate_choice regex change; add TODO comment with suggested word-boundary fix Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update if_functions.py
@farhatkevin farhatkevin self-requested a review May 12, 2026 18:29
@finbarrtimbers finbarrtimbers added this pull request to the merge queue May 12, 2026
Merged via the queue into main with commit 94a17b9 May 12, 2026
6 of 7 checks passed
@finbarrtimbers finbarrtimbers deleted the finbarr/dpo-utils-fix-fully-masked branch May 12, 2026 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants