Add --force_math_sdp flag and test_numerical_stability command #198
Open
Conversation
Adds a `--force_math_sdp` flag to `IndexConfig` that disables the flash and memory-efficient SDPA backends, forcing the math-only kernel. Some models produce inconsistent per-example gradients across different padding lengths when using the optimized attention backends. Also adds `bergson test_numerical_stability`, which automatically tests escalating configurations (default → math SDP → fp32 → both) and reports the minimum flags needed for gradient consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
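A minimal sketch of what forcing the math-only kernel can look like in PyTorch, using the global `torch.backends.cuda` SDP toggles. The helper name `apply_force_math_sdp` is taken from the test plan in this PR; its actual implementation may differ — this is an illustration, not the PR's code.

```python
import torch


def apply_force_math_sdp(enable: bool) -> None:
    """Sketch of the PR's helper: when enabled, turn off the fused
    SDPA kernels so only the math (reference) kernel can be selected."""
    torch.backends.cuda.enable_flash_sdp(not enable)
    torch.backends.cuda.enable_mem_efficient_sdp(not enable)
    # The math kernel stays available either way; it is the fallback
    # that produces bitwise-stable results across padding lengths.
    torch.backends.cuda.enable_math_sdp(True)


apply_force_math_sdp(True)
```

After this call, `torch.nn.functional.scaled_dot_product_attention` can only dispatch to the math kernel, which trades speed for deterministic numerics across batch compositions.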
for more information, see https://pre-commit.ci
- Use device_map instead of .to(device) for model loading
- Assert row is dict for HF dataset iteration type narrowing
- Fix docstring line length

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the `--force_math_sdp` flag and the `bergson test_numerical_stability` command, with performance benchmarks on Pythia-160M, OLMo-2-1B, and OLMo-2-7B showing overhead ranging from +0.8% to +43.2% depending on model and precision.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benchmarking the two defensive interventions:

fp32 generally seems more expensive than math SDPA. We could just default to math SDPA.
Summary

- `--force_math_sdp` flag on `IndexConfig`, available on `build`, `score`, `trackstar`, and other commands. Disables the flash and memory-efficient SDPA backends to ensure gradient consistency across different padding lengths.
- `bergson test_numerical_stability` CLI command that automatically tests escalating configurations (default → `--force_math_sdp` → `--precision fp32` → both) and reports the minimum flags needed for a given model.
- Tests that the setting persists after `collect_gradients` and that gradients are consistent when using math-only SDPA.

Test plan

- `apply_force_math_sdp` (enable/disable)
- `force_math_sdp` persists after `collect_gradients`; gradients consistent across batch composition
- `pre-commit run --all-files` passes
- `bergson test_numerical_stability --model EleutherAI/pythia-14m`

🤖 Generated with Claude Code
Example: