[Misc][Quantization] Clarify the intent of GGUF FusedMoE weight materialization
#30310
Conversation
Code Review
This pull request is a refactoring that clarifies the weight materialization process for GGUF FusedMoE layers. The changes introduce an assertion to ensure that GGUF weights for FusedMoE are loaded as 3D tensors, which makes an implicit assumption explicit and improves robustness. Additionally, comments have been added to explain the logic behind handling merged weights (w1 and w3), and a minor style improvement was made by using a set for membership testing. These changes improve the code's clarity and maintainability without altering the core functionality. The implementation is correct and I have no further recommendations.
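The set-based membership test the review mentions can be sketched as follows (illustrative only; the function name and surrounding logic are hypothetical, not the actual vLLM code):

```python
def is_merged_gate_or_up(weight_name: str) -> bool:
    # A set literal makes the membership intent explicit and is the
    # idiomatic way to test "is this one of these values?" in Python,
    # compared to chained `== "w1" or == "w3"` comparisons.
    return weight_name in {"w1", "w3"}

print(is_merged_gate_or_up("w1"))  # True
print(is_merged_gate_or_up("w2"))  # False
```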
    # Materialize GGUF UninitializedParameter accounting merged weights
    if is_gguf_weight and isinstance(param, UninitializedParameter):
        # GGUF currently requires full load (3D tensors).
        assert full_load
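A standalone sketch of the guard above (simplified and hypothetical: the real code materializes a torch `UninitializedParameter`, while this stand-in only computes the final shape):

```python
def gguf_moe_final_shape(full_load: bool, num_experts: int,
                         per_expert_shape: tuple) -> tuple:
    # To materialize the parameter we need its full 3D shape,
    # including the number of experts -- hence full_load is required.
    assert full_load, (
        "GGUF FusedMoE weights must be loaded whole (3D tensors); "
        "a partial load cannot determine the final shape"
    )
    return (num_experts, *per_expert_shape)

print(gguf_moe_final_shape(True, 8, (14336, 4096)))  # (8, 14336, 4096)
```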
Perhaps add a message to clarify what's happening when the assertion fails?
Hmm, rather, it would be better to improve the comment (i.e. to state what is truly necessary). I'll consider changing the code around this line.
I changed the comment to note why full_load is required.
Before:
Before:

    # GGUF currently requires full load (3D tensors).

After:

    # To materialize a tensor, we must have full shape including
    # number of experts, making this portion to require `full_load`.

In the process of FusedMoE weight data materialization from GGUF files,
there is a magic number and some intents are not clear enough.
This commit clarifies some of them:
1. GGUF (currently) requires 3D tensor(s) for FusedMoE layer weights
as we have to know full tensor shape to materialize the parameter
(including number of experts).
2. w1 and w3 are merged per expert, i.e. the next dimension after
the expert ID is to be doubled to store both w1 and w3.
... and makes some minor adjustments.
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Force-pushed 0fac684 to 7524b0f
…erialization (vllm-project#30310) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
…erialization (vllm-project#30310) Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com> Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Purpose
This is a refactoring PR with no functional changes (unless the GGUF file is broken).
In the process of `FusedMoE` weight data materialization from GGUF files, there is a magic number and some intents are not clear enough. This commit clarifies some of them:

1. GGUF (currently) requires 3D tensors (hence `full_load`) for `FusedMoE` layer weights.
2. `w1` and `w3` are merged per expert, i.e. the next dimension after the expert ID is to be doubled to store both `w1` and `w3`.

Item 1 is now asserted with a clarifying comment (in the `if is_gguf_weight...` block). For item 2, a comment now explains why the second dimension (`final_shape[1]`) should be doubled for `w1` and `w3`, improving the clarity.

... and makes some minor adjustments.
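The doubling of `final_shape[1]` for the merged `w1`/`w3` weights can be illustrated with a small NumPy sketch (the shapes below are made up for illustration; this is not the vLLM implementation):

```python
import numpy as np

num_experts, intermediate_size, hidden_size = 4, 6, 8

# Per-expert gate (w1) and up (w3) projections, stored separately.
w1 = np.zeros((num_experts, intermediate_size, hidden_size))
w3 = np.ones((num_experts, intermediate_size, hidden_size))

# Merged layout: the dimension right after the expert ID holds both
# w1 and w3, so final_shape[1] == 2 * intermediate_size.
w13 = np.concatenate([w1, w3], axis=1)
print(w13.shape)  # (4, 12, 8)
```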