Decrease register usage for Xe2 MoE GEMM #718
Closed
sanchitintel wants to merge 4 commits into main
This change is needed for the MXFP4/MXFP8 MoE GEMM example. Xe2 doesn't natively support these datatypes, so the weights are converted to FP16/BF16 and the scales to FP32, which increases register pressure. This PR decreases register usage to boost MXFP4/MXFP8 MoE GEMM performance.
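To put a rough number on "increases register pressure", the sketch below counts the raw bytes one quantized block occupies before and after conversion. It assumes the OCP Microscaling (MX) layout (32 elements sharing one 8-bit E8M0 scale) and ignores how the kernel actually tiles data across registers; the helper name `block_bytes` is hypothetical.

```python
# Back-of-the-envelope storage per MX block (assumption: OCP MX block size of 32,
# one 8-bit E8M0 scale per block). Register tiling in the real kernel is ignored.
BLOCK = 32


def block_bytes(elem_bits: int, scale_bits: int) -> float:
    """Bytes needed to hold one 32-element block plus its shared scale."""
    return (BLOCK * elem_bits + scale_bits) / 8


mxfp4_packed = block_bytes(4, 8)   # FP4 (E2M1) elements + E8M0 scale
mxfp8_packed = block_bytes(8, 8)   # FP8 (E4M3/E5M2) elements + E8M0 scale
converted = block_bytes(16, 32)    # FP16/BF16 elements + FP32 scale

print(mxfp4_packed, mxfp8_packed, converted)  # 17.0 33.0 68.0
```

Under these assumptions, a converted block takes 4x the bytes of an MXFP4 block and roughly 2x those of an MXFP8 block, which is why keeping fewer converted values live at once matters.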
Antonyvance approved these changes on Feb 18, 2026
tdeng5 approved these changes on Feb 27, 2026
tdeng5 approved these changes on Mar 10, 2026
tdeng5 approved these changes on Mar 16, 2026
Comment from the author: Will combine several changes in one PR.
Description
This change is needed for the PR that adds the MXFP4/MXFP8 MoE GEMM example.
Xe2 doesn't natively support these datatypes, so the weights are converted to FP16/BF16 and the scales to FP32, which increases register pressure. (Technically, the scales could be kept in FP16/BF16 and applied with elementwise multiplication, but that entails a bit more compute.) This PR decreases register pressure to boost the performance of MXFP4/MXFP8 MoE GEMM, which uses features from a couple of other existing PRs. Will update the description shortly.
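As a scalar reference for the conversion described above, here is a minimal Python sketch of dequantizing one MXFP4 block: each 4-bit E2M1 weight is decoded to a float and multiplied by the block's E8M0 scale. It assumes the OCP MX v1.0 encodings and a low-nibble-first packing order; the function names are hypothetical, and this is not the Xe2 kernel's code path, which performs the conversion in registers.

```python
import math

# E2M1 (FP4) magnitudes indexed by the 3 low bits (2 exponent bits, 1 mantissa bit, bias 1).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 2..0 select the magnitude."""
    magnitude = E2M1_MAGNITUDES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def decode_e8m0(code: int) -> float:
    """Decode an E8M0 shared scale: 2**(code - 127); code 0xFF encodes NaN."""
    if code == 0xFF:
        return math.nan
    return math.ldexp(1.0, code - 127)


def dequantize_mxfp4_block(packed: bytes, scale_code: int) -> list[float]:
    """Dequantize 32 packed FP4 values (16 bytes) that share one E8M0 scale.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    if len(packed) != 16:
        raise ValueError("an MXFP4 block of 32 elements occupies 16 bytes")
    scale = decode_e8m0(scale_code)  # one scale per block, kept as a float
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(decode_e2m1(nibble) * scale)
    return out


vals = dequantize_mxfp4_block(bytes([0x21] * 16), 128)  # scale code 128 -> 2.0
print(vals[:4])  # [1.0, 2.0, 1.0, 2.0]
```

Each byte `0x21` holds the codes for 0.5 (low nibble) and 1.0 (high nibble); multiplying by the shared scale of 2.0 yields alternating 1.0 and 2.0 across all 32 elements.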
MoE GEMM computation has 3 components:
Type
Testing
Performance
Measured with MXFP4/MXFP8 scaledMM.
References
Fixes #
Checklist