Commit 7524b0f
Clarify the intent of GGUF FusedMoE weight materialization
The code that materializes FusedMoE weight data from GGUF files contains
a magic number, and parts of its intent are not clear enough.
This commit clarifies the following:
1. GGUF (currently) requires 3D tensor(s) for FusedMoE layer weights,
because we have to know the full tensor shape (including the number of
experts) to materialize the parameter.
2. w1 and w3 are merged per expert, i.e. the dimension right after
the expert ID is doubled to store both w1 and w3.
... and makes some minor adjustments.
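
The two points above can be illustrated with a minimal sketch. It is not the code changed by this commit: the function name `merged_w13_shape`, the constant `W13_MERGE_FACTOR`, and the example dimensions are all hypothetical, chosen only to show the 3D requirement and the doubling of the dimension after the expert ID.

```python
from typing import Sequence, Tuple

# Hypothetical named constant standing in for the "magic number":
# w1 and w3 are stored together, so the per-expert dimension is doubled.
W13_MERGE_FACTOR = 2


def merged_w13_shape(expert_weight_shape: Sequence[int]) -> Tuple[int, int, int]:
    """Return the shape of a merged w1/w3 parameter (illustrative only).

    The input is assumed to be the full shape of one GGUF expert weight
    tensor, laid out as (num_experts, rows, cols).
    """
    if len(expert_weight_shape) != 3:
        # Without all three dimensions the number of experts is unknown,
        # so the parameter cannot be materialized.
        raise ValueError(
            f"expected a 3D tensor shape, got {len(expert_weight_shape)}D"
        )
    num_experts, rows, cols = expert_weight_shape
    # Double the dimension right after the expert ID to hold w1 and w3.
    return (num_experts, rows * W13_MERGE_FACTOR, cols)


# Example with made-up sizes: 8 experts, 14336 x 4096 per projection.
print(merged_w13_shape((8, 14336, 4096)))  # (8, 28672, 4096)
```

Naming the factor, rather than writing a bare `2`, is one way to make the merge intent visible at the point where the shape is computed.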
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
1 parent 8f8fda2 commit 7524b0f
1 file changed, +6 -2 lines changed (original lines 1200-1209, new lines 1200-1213)