
Decrease register usage for Xe2 MoE GEMM #718

Closed
sanchitintel wants to merge 4 commits into main from sanchitj/use_fewer_registers_with_moe_gemm


sanchitintel commented Jan 30, 2026

Description

This change is needed for the PR that adds the MXFP4/MXFP8 MoE GEMM example.
Xe2 doesn't natively support these datatypes, so the weights are converted to FP16/BF16 and the scales are converted to FP32, which increases register pressure. (Technically, the scales could stay in FP16/BF16 and be applied with an elementwise multiplication, but that entails a bit more compute.) This PR decreases register pressure to boost MXFP4/MXFP8 MoE GEMM performance. The MXFP4/MXFP8 MoE GEMM uses features from a couple of other existing PRs; will update the description shortly.
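For readers less familiar with the MX formats, here is a minimal host-side sketch of the upconversion described above. It assumes the OCP Microscaling layout for MXFP4 (FP4 E2M1 elements with one shared E8M0 scale per block of 32). The function names and the low-nibble-first packing are assumptions for illustration and are not taken from this PR; the actual kernel converts weights to FP16/BF16 in registers rather than to `float` on the host.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Decode one FP4 E2M1 code (low 4 bits of `code`) to float.
// Bit 3 is the sign; bits 0..2 index the 8 representable magnitudes.
inline float decode_e2m1(std::uint8_t code) {
    static constexpr std::array<float, 8> kMagnitudes = {
        0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float mag = kMagnitudes[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}

// Decode an E8M0 shared scale: 2^(e - 127). The code 0xFF encodes NaN.
inline float decode_e8m0(std::uint8_t e) {
    if (e == 0xFF) return std::nanf("");
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Dequantize one block of 32 MXFP4 elements (16 packed bytes) to float.
// Assumption: two elements per byte, low nibble first.
inline void dequantize_mxfp4_block(const std::uint8_t* packed,
                                   std::uint8_t scale_e8m0,
                                   float* out) {
    const float scale = decode_e8m0(scale_e8m0);
    for (std::size_t i = 0; i < 16; ++i) {
        out[2 * i]     = decode_e2m1(packed[i] & 0xF) * scale;
        out[2 * i + 1] = decode_e2m1(packed[i] >> 4) * scale;
    }
}
```

The point of the sketch is that each 4-bit weight expands to a 16-bit (or wider) value and each 8-bit scale to a 32-bit value before the multiply, so the upconverted operands occupy several times more register space than the packed inputs.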

MoE GEMM computation has 3 components:

  1. Tiling schemes (can also help decrease register pressure)
  2. Vanilla GEMM kernel (for MXFP4/MXFP8, the corresponding scaledMM)
  3. Tile scheduler for the various output blocks/tiles. This PR modifies it, and also assumes that IGC will optimize out some unused data structures, so that they won't take up unnecessary register space during GEMM computation.
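To illustrate the third component, here is a hypothetical sketch of the job a MoE tile scheduler has to do: map a linear work-tile index to an (expert, tile row, tile column) coordinate, given the number of token rows routed to each expert. `TileCoord`, `map_linear_tile`, and all parameters are illustrative names; this is not the scheduler modified in this PR.

```cpp
#include <vector>

// Coordinate of one output tile in a grouped (per-expert) GEMM.
struct TileCoord {
    int expert;   // index of the expert whose GEMM owns this tile
    int tile_m;   // tile row within that expert's output
    int tile_n;   // tile column within that expert's output
    bool valid;   // false if linear_idx is past the last tile
};

// Map a non-negative linear tile index to a tile coordinate.
// rows_per_expert[e] is the M dimension of expert e's GEMM (may be 0);
// n is the shared N dimension; tile_m_size x tile_n_size is the tile shape.
inline TileCoord map_linear_tile(int linear_idx,
                                 const std::vector<int>& rows_per_expert,
                                 int n, int tile_m_size, int tile_n_size) {
    const int tiles_n = (n + tile_n_size - 1) / tile_n_size;
    int remaining = linear_idx;
    for (int e = 0; e < static_cast<int>(rows_per_expert.size()); ++e) {
        const int tiles_m =
            (rows_per_expert[e] + tile_m_size - 1) / tile_m_size;
        const int tiles = tiles_m * tiles_n;
        if (remaining < tiles) {
            // Row-major order over this expert's tiles.
            return {e, remaining / tiles_n, remaining % tiles_n, true};
        }
        remaining -= tiles;  // skip past this expert's tiles
    }
    return {-1, -1, -1, false};
}
```

For example, with rows per expert {5, 0, 300}, N = 256, and 128x128 tiles, there are 8 tiles in total, and linear index 5 maps to expert 2, tile (1, 1). The sketch keeps only a running counter as state; any extra bookkeeping a real scheduler carries stays live in registers across the GEMM main loop unless the compiler can eliminate it.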

Type

  • Bug - [ ] Feature - [x] Performance - [ ] Refactor

Testing

  • Tests pass - [ ] Xe12 - [ ] Xe20

Performance

Measured with MXFP4/MXFP8 scaledMM.

Metric Before After

References

Fixes #

Checklist

  • Copyright - [ ] Co-pilot Review - [ ] Deprecated APIs not used

This change is needed for the MXFP4/MXFP8 MoE GEMM example.

Xe2 doesn't natively support these datatypes, so the weights are converted to FP16/BF16, and scales are converted to FP32, thereby increasing register pressure. This PR decreases register usage to boost MXFP4/MXFP8 MoE GEMM performance.
tdeng5 self-requested a review February 27, 2026 06:11
tdeng5 self-requested a review March 10, 2026 00:39
tdeng5 self-requested a review March 16, 2026 05:43
@sanchitintel
Author

Will combine several changes in one PR
