Fix _get_batch_logps NaN on fully-masked sequences (DPO)#1685
Conversation
…hored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request fixes a division-by-zero issue in _get_batch_logps within open_instruct/dpo_utils.py by clamping the denominator to 1 when calculating average log probabilities for fully masked sequences. The changes also include regression tests and a changelog update. Feedback suggests using masked_fill to more robustly handle potential -inf values in masked positions, which could otherwise lead to NaN results during multiplication.
|
|
||
| if average_log_prob: | ||
| return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1) | ||
| return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1) |
There was a problem hiding this comment.
While clamping the denominator prevents division by zero, the expression (per_token_logps * loss_mask) can still produce NaN if any masked token has a log probability of -inf (since -inf * 0 = NaN in PyTorch). Although rare with standard log_softmax outputs, using masked_fill is a more robust way to ensure masked positions are zeroed out regardless of their value.
| return (per_token_logps * loss_mask).sum(-1) / loss_mask.sum(-1).clamp(min=1) | |
| return per_token_logps.masked_fill(~loss_mask, 0.0).sum(-1) / loss_mask.sum(-1).clamp(min=1) |
…ai#1683) * Bundle small correctness fixes from allenai#1615/allenai#1618/allenai#1619/allenai#1623/allenai#1625/allenai#1646/allenai#1651/allenai#1655 with regression tests Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address PR review: inline write_header, pricing per 1M tokens, parameterize tests, dedupe if_functions Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Trim branch to IFEval-only changes; other fixes moved to allenai#1684/allenai#1685/allenai#1686 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Address review: use word-boundary regex in validate_choice to avoid substring false positives Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Revert deletion of scripts/eval_constraints/if_functions.py (handled in separate PR) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Revert validate_choice regex change; add TODO comment with suggested word-boundary fix Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update if_functions.py
Clamp the divisor in
_get_batch_logpswhenaverage_log_prob=Trueso a sequence with every label masked (-100) returns 0.0 instead of NaN (supersedes #1625).