Commit 7524b0f
Clarify the intent of GGUF FusedMoE weight materialization
When materializing FusedMoE weight data from GGUF files, the code contains a magic number and some intent that is not clear enough. This commit clarifies two points:

1. GGUF (currently) requires 3D tensors for FusedMoE layer weights, because the full tensor shape (including the number of experts) must be known to materialize the parameter.
2. w1 and w3 are merged per expert, i.e. the dimension after the expert ID is doubled to store both w1 and w3.

It also makes some minor adjustments.

Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
Parent: 8f8fda2

1 file changed: +6 −2

vllm/model_executor/layers/fused_moe/layer.py

```diff
@@ -1200,10 +1200,14 @@ def weight_loader(
         if full_load:
             shard_dim += 1
 
-        # Materialize GGUF UninitializedParameter
+        # Materialize GGUF UninitializedParameter accounting merged weights
         if is_gguf_weight and isinstance(param, UninitializedParameter):
+            # To materialize a tensor, we must have full shape including
+            # number of experts, making this portion to require `full_load`.
+            assert full_load
             final_shape = list(loaded_weight.shape)
-            if shard_id in ["w1", "w3"]:
+            # w1 and w3 are merged per expert.
+            if shard_id in {"w1", "w3"}:
                 final_shape[1] *= 2
             final_shape[shard_dim] = final_shape[shard_dim] // self.tp_size
             param.materialize(final_shape, dtype=loaded_weight.dtype)
```
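The shape arithmetic this hunk documents can be sketched in isolation. The function below is a hypothetical, standalone reconstruction (the function name and the example shapes are illustrative, not vllm API): dimension 1 is doubled for the merged w1/w3 weights, then the sharded dimension is split across tensor-parallel ranks.

```python
def gguf_moe_final_shape(loaded_shape, shard_id, shard_dim, tp_size):
    """Shape to materialize for a FusedMoE GGUF weight (illustrative).

    loaded_shape is the full 3D GGUF tensor shape
    (num_experts, dim1, dim2); with full_load, shard_dim has already
    been shifted by one to skip the leading expert dimension.
    """
    final_shape = list(loaded_shape)
    if shard_id in {"w1", "w3"}:
        # w1 and w3 are merged per expert: double the dimension after
        # the expert ID to hold both projections.
        final_shape[1] *= 2
    # Split the sharded dimension across tensor-parallel ranks.
    final_shape[shard_dim] = final_shape[shard_dim] // tp_size
    return final_shape

# Hypothetical shapes: 8 experts, intermediate size 4096, hidden size
# 1024, tp_size=2, with w1/w3 sharded along dimension 1:
print(gguf_moe_final_shape((8, 4096, 1024), "w1", 1, 2))
# -> [8, 4096, 1024]  (doubled, then halved across 2 ranks)
print(gguf_moe_final_shape((8, 1024, 4096), "w2", 2, 2))
# -> [8, 1024, 2048]  (w2 only splits its sharded dimension)
```

Note that for w1/w3 the doubling and the tensor-parallel split act on the same dimension here, so with tp_size=2 they cancel out; w2 is not merged, so only the split applies.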
