
Conversation

@a4lg
Contributor

@a4lg a4lg commented Dec 9, 2025

Purpose

This is a refactoring PR with no functional changes (unless the GGUF file is broken).

In the FusedMoE weight materialization path for GGUF files, there is a magic number, and several pieces of intent are not clear from the code.

This commit clarifies some of them:

  1. GGUF (currently) requires 3D tensor(s) (i.e. full_load) for FusedMoE layer weights.
  2. w1 and w3 are merged per expert, i.e. the dimension following the expert ID must be doubled to store both w1 and w3 (see the sketch after this list).
    • The expert ID is the first dimension (as in the code right after the if is_gguf_weight... block).
    • That means the second dimension's size (final_shape[1]) is the one to double for w1 and w3, and spelling it out as final_shape[1] instead of a magic index improves clarity.

... and makes some minor adjustments.
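
As a rough illustration of the shape handling described above, here is a minimal, self-contained sketch; names like loaded_weight are illustrative, and only final_shape[1] and the w1/w3 merging come from the PR text:

import torch

# Illustrative sizes; a real GGUF checkpoint defines these.
num_experts = 8
intermediate_size = 128
hidden_size = 64

# A FusedMoE weight arrives from GGUF as a full 3D tensor (full_load):
# materializing the parameter needs the complete shape, experts included.
loaded_weight = torch.empty(num_experts, intermediate_size, hidden_size)

final_shape = list(loaded_weight.shape)
# w1 and w3 are merged per expert, so the dimension right after the
# expert ID (final_shape[1]) is doubled to hold both of them.
final_shape[1] *= 2
w13 = torch.empty(*final_shape)
assert w13.shape == (num_experts, 2 * intermediate_size, hidden_size)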


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a refactoring that clarifies the weight materialization process for GGUF FusedMoE layers. The changes introduce an assertion to ensure that GGUF weights for FusedMoE are loaded as 3D tensors, which makes an implicit assumption explicit and improves robustness. Additionally, comments have been added to explain the logic behind handling merged weights (w1 and w3), and a minor style improvement was made by using a set for membership testing. These changes improve the code's clarity and maintainability without altering the core functionality. The implementation is correct and I have no further recommendations.
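
For reference, the set-membership style change mentioned above looks roughly like this (shard_id is an illustrative name; only the "w1"/"w3" identifiers come from the PR itself):

shard_id = "w1"  # illustrative value

# Before: membership test against a tuple
is_gate_up = shard_id in ("w1", "w3")

# After: membership test against a set; same behavior, reads as a
# pure membership check with O(1) lookup
is_gate_up = shard_id in {"w1", "w3"}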

@Isotr0py Isotr0py self-assigned this Dec 9, 2025
# Materialize GGUF UninitializedParameter accounting merged weights
if is_gguf_weight and isinstance(param, UninitializedParameter):
    # GGUF currently requires full load (3D tensors).
    assert full_load
Member

Perhaps add a message to clarify what's happening when the assertion is false?
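
One way to do that would be an assertion message, as in this hypothetical sketch (the author instead opted to improve the surrounding comment, as the replies below show):

# Hypothetical: give the assertion a message so a failure explains itself
assert full_load, (
    "GGUF FusedMoE weights must be loaded as full 3D tensors "
    "(full_load) so the parameter can be materialized"
)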

Contributor Author

Hmm, rather, it would be better to improve the comment (i.e. to state what is truly necessary). I'll consider changing this part.

Contributor Author

I changed the comment to note why full_load is required.

Before:

# GGUF currently requires full load (3D tensors).

After:

# To materialize a tensor, we must have full shape including
# number of experts, making this portion to require `full_load`.
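
Putting that reply into context, the block under review would then read as follows (reconstructed from the quoted before/after comments, not copied from the merged diff):

# Materialize GGUF UninitializedParameter accounting merged weights
if is_gguf_weight and isinstance(param, UninitializedParameter):
    # To materialize a tensor, we must have full shape including
    # number of experts, making this portion to require `full_load`.
    assert full_load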

@Isotr0py Isotr0py added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 9, 2025
In the FusedMoE weight materialization path for GGUF files, there is
a magic number, and several pieces of intent are not clear from the
code.

This commit clarifies some of them:

1.  GGUF (currently) requires 3D tensor(s) for FusedMoE layer weights,
    as we have to know the full tensor shape (including the number of
    experts) to materialize the parameter.
2.  w1 and w3 are merged per expert, i.e. the dimension following the
    expert ID is doubled to store both w1 and w3.

... and makes some minor adjustments.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
@a4lg a4lg force-pushed the gguf-support-moe-weight-loader-refactor-20251209 branch from 0fac684 to 7524b0f on December 13, 2025 03:22
@Isotr0py Isotr0py merged commit fdc135d into vllm-project:main Dec 13, 2025
52 checks passed
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Dec 15, 2025
…erialization (vllm-project#30310)

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
joa-stdn pushed a commit to joa-stdn/vllm that referenced this pull request Dec 15, 2025
…erialization (vllm-project#30310)

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Signed-off-by: Joachim Studnia <joachim@mistral.ai>