Prevent the sum of the dequantized activation in q8_1 from overflowing by bartowski1182 · Pull Request #21652 · ggml-org/llama.cpp

bartowski1182 · 2026-04-09T00:51:57Z

Overview

While quantizing Mistral 4 Small and testing the results, I found that the PPL of Q4_1 came out as NaN

Narrowing down the cause, it only happened when the later FFN_DOWN layers were quantized to Q4_1, i.e.:

llama-quantize ./Mistral-Small-4-119B-2603-bf16.gguf Mistral-Small-4-119B-2603-Q4_0.gguf Q4_0

Works as expected, but:

llama-quantize --tensor-type ffn_down=q4_1 ./Mistral-Small-4-119B-2603-bf16.gguf Mistral-Small-4-119B-2603-Q4_0.gguf Q4_0

(note the --tensor-type ffn_down=q4_1) gives NaN for PPL

After digging around with Claude and some debug code, I found that 16 Q8_1 blocks have s = Inf because the fp16 value overflows

In Claude's words:

Q8_1's s field stores sum * d in fp16 (max 65504), but when activation values in a 32-element block are large and same-sign, sum * d ≈ 32 * amax can exceed 65504. The max finite |s| is only 410, so the 16 overflowing blocks are massive outliers — their activations must be ~160x larger than typical.
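
To make the arithmetic above concrete, here is a minimal Python sketch of a Q8_1-style block quantizer. It is an illustration only, not the ggml code: the names to_fp16, quantize_block_q8_1 and the clamp flag are hypothetical, and the block layout (a scale d, a pre-computed s = sum * d, and 32 int8 values) is assumed from the description above.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite half-precision value


def to_fp16(x: float) -> float:
    """Round x to half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range;
        # a hardware conversion would produce infinity here instead.
        return math.copysign(math.inf, x)


def quantize_block_q8_1(xs, clamp=False):
    """Quantize 32 floats to (d, s, qs) in the spirit of Q8_1 (assumed layout)."""
    assert len(xs) == 32
    amax = max(abs(v) for v in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0
    # Python's round() is half-to-even; a C implementation using roundf()
    # rounds half away from zero. The difference does not matter here.
    qs = [round(v * inv_d) for v in xs]
    s = sum(qs) * d
    if clamp:
        # Hypothetical stand-in for the clamping idea: saturate before storing.
        s = max(-FP16_MAX, min(FP16_MAX, s))
    return to_fp16(d), to_fp16(s), qs


if __name__ == "__main__":
    block = [2100.0] * 32  # large, same-sign activations
    print(quantize_block_q8_1(block)[1])
    print(quantize_block_q8_1(block, clamp=True)[1])
```

With 32 activations of 2100.0 each, sum * d is about 67200, which is above the fp16 limit, so the unclamped s becomes inf while the clamped one saturates at 65504.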

Additional information

I ran the same model with the updated activation code and got a PPL of 5.5535 +/- 0.1235

For completeness, I also tested ignoring the pre-computed s value and recalculating the results in f32, and got a PPL of 5.5725 +/- 0.12469

Note that in either case, the PPL without this change was NaN, so while this clamping is lossy, it does result in a model that produces literally anything at all instead of failing spectacularly
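
As a rough illustration of why a single infinite s is enough to produce NaN, and of the two workarounds compared above, here is a small Python sketch. It assumes the per-block Q4_1 x Q8_1 contribution has the form d_x * d_y * sumi + m_x * s_y, which is my understanding of the scalar kernel; the function names and example numbers are hypothetical.

```python
import math

FP16_MAX = 65504.0


def block_dot_q4_1_q8_1(d_x, m_x, d_y, s_y, sumi):
    """Per-block contribution: scale term plus min-offset term (assumed form)."""
    return (d_x * d_y) * sumi + m_x * s_y


def row_dot(blocks):
    """Sum the per-block contributions of one row."""
    return sum(block_dot_q4_1_q8_1(*b) for b in blocks)


def recompute_s(d_y, qs):
    """Alternative to clamping: ignore the stored s and rebuild it in full precision."""
    return d_y * sum(qs)


if __name__ == "__main__":
    # Two blocks whose stored s overflowed to +inf, with minima of opposite sign:
    # (+inf) + (-inf) is NaN, and NaN then spreads through the rest of the graph.
    bad = [(0.01, 0.5, 16.5, math.inf, 100), (0.01, -0.5, 16.5, math.inf, 100)]
    print(row_dot(bad))

    # Clamped s stays finite, at the cost of some error in the offset term.
    clamped = [(0.01, 0.5, 16.5, FP16_MAX, 100), (0.01, -0.5, 16.5, FP16_MAX, 100)]
    print(row_dot(clamped))
```

A block minimum of exactly zero gives the same outcome, since 0 * inf is also NaN. Clamping keeps the stored value finite but wrong for the overflowing blocks, whereas recomputing s in f32 is exact but gives up the pre-computed value.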

Note that this only updates the reference, AVX2, AVX1, and CUDA implementations; I'm not familiar enough with the other archs to touch those

Mistral 4 small PPL before these changes

[1]nan,[2]nan,[3]nan,[4]nan,[5]nan,[6]nan,[7]nan,[8]nan,[9]nan,[10]nan,[11]nan,[12]nan,[13]nan,[14]nan,[15]nan,[16]nan,[17]nan,[18]nan,[19]nan,[20]nan,[21]nan,[22]nan,[23]nan,[24]nan,[25]nan,[26]nan,[27]nan,[28]nan,[29]nan,[30]nan,[31]nan,[32]nan,[33]nan,[34]nan,[35]nan,[36]nan,[37]nan,[38]nan,[39]nan,[40]nan,[41]nan,[42]nan,[43]nan,[44]nan,[45]nan,[46]nan,[47]nan,[48]nan,[49]nan,[50]nan,[51]nan,[52]nan,[53]nan,[54]nan,[55]nan,[56]nan,[57]nan,[58]nan,[59]nan,[60]nan,[61]nan,[62]nan,[63]nan,[64]nan,[65]nan,[66]nan,[67]nan,[68]nan,[69]nan,[70]nan,[71]nan,[72]nan,[73]nan,[74]nan,[75]nan,[76]nan,[77]nan,[78]nan,[79]nan,[80]nan,[81]nan,[82]nan,[83]nan,[84]nan,[85]nan,[86]nan,[87]nan,[88]nan,[89]nan,[90]nan,[91]nan,[92]nan,[93]nan,[94]nan,[95]nan,[96]nan,[97]nan,[98]nan,[99]nan,[100]nan,
Unexpected negative standard deviation of log(prob)

Mistral 4 small PPL after these changes

[1]3.4955,[2]5.1043,[3]4.3632,[4]4.0977,[5]4.2305,[6]4.4037,[7]4.5087,[8]4.5073,[9]4.4639,[10]4.5297,[11]4.5263,[12]4.5587,[13]4.7890,[14]4.8800,[15]4.9211,[16]5.1019,[17]4.9447,[18]5.0835,[19]5.3203,[20]5.2572,[21]5.2755,[22]5.2618,[23]5.2392,[24]5.0943,[25]4.9508,[26]4.8844,[27]4.7809,[28]4.7516,[29]4.6887,[30]4.6612,[31]4.7435,[32]4.7837,[33]4.9112,[34]4.9299,[35]4.9606,[36]5.0317,[37]5.1810,[38]5.2880,[39]5.2647,[40]5.3024,[41]5.3430,[42]5.3550,[43]5.3820,[44]5.4165,[45]5.3988,[46]5.3975,[47]5.4033,[48]5.4926,[49]5.5758,[50]5.5696,[51]5.5633,[52]5.5685,[53]5.5910,[54]5.6138,[55]5.6804,[56]5.6624,[57]5.7382,[58]5.7408,[59]5.7636,[60]5.8286,[61]5.8487,[62]5.8500,[63]5.8483,[64]5.8840,[65]5.9183,[66]5.9979,[67]6.0463,[68]6.0607,[69]6.0864,[70]6.0978,[71]6.1055,[72]6.0793,[73]6.1318,[74]6.1260,[75]6.1347,[76]6.1301,[77]6.1563,[78]6.1131,[79]6.1374,[80]6.0724,[81]6.0041,[82]5.9799,[83]5.9689,[84]5.9874,[85]5.9820,[86]5.9677,[87]5.9715,[88]6.0430,[89]6.0806,[90]6.0899,[91]6.0997,[92]6.0917,[93]6.1328,[94]6.1264,[95]6.1512,[96]6.1638,[97]6.1756,[98]6.1676,[99]6.1591,[100]6.1809,
Final estimate: PPL = 6.1809 +/- 0.09843

Also tested on a Q4_1 quant of Qwen 3.5 9B and got identical PPL results both with and without this change

Qwen 3.5 9B before these changes

[1]5.4693,[2]7.8183,[3]7.9967,[4]7.6863,[5]7.6045,[6]7.8830,[7]8.1620,[8]8.6953,[9]9.0948,[10]9.4159,[11]9.2208,[12]9.2591,[13]9.7531,[14]9.2597,[15]9.1784,[16]9.2925,[17]8.7051,[18]8.7208,[19]8.6739,[20]8.6143,[21]8.3104,[22]8.2161,[23]7.9049,[24]7.5473,[25]7.4064,[26]7.2133,[27]7.0963,[28]7.0035,[29]6.9969,[30]6.9612,[31]6.9099,[32]6.9075,[33]6.8637,[34]6.9363,[35]7.0285,[36]7.1741,[37]7.2542,[38]7.2405,[39]7.2368,[40]7.2920,[41]7.3035,[42]7.3447,[43]7.3416,[44]7.3447,[45]7.4416,[46]7.4029,[47]7.5285,[48]7.5930,[49]7.5287,[50]7.5751,[51]7.5716,[52]7.6133,[53]7.6466,[54]7.6818,[55]7.6809,[56]7.6989,[57]7.7229,[58]7.7238,[59]7.7321,[60]7.7508,[61]7.7775,[62]7.8264,[63]7.8687,[64]7.9271,[65]7.9943,[66]8.0362,[67]8.1292,[68]8.1672,[69]8.1757,[70]8.1486,[71]8.2084,[72]8.2052,[73]8.2490,[74]8.2449,[75]8.2189,[76]8.2017,[77]8.2362,[78]8.2535,[79]8.1724,[80]8.1116,[81]8.0884,[82]8.1005,[83]8.1097,[84]8.1072,[85]8.1208,[86]8.1595,[87]8.1614,[88]8.1653,[89]8.1234,[90]8.0978,[91]8.0926,[92]8.0734,[93]8.0991,[94]8.1069,[95]8.1173,[96]8.1096,[97]8.0955,[98]8.0777,[99]8.0775,[100]8.0963,
Final estimate: PPL = 8.0963 +/- 0.12933

Qwen 3.5 9B after these changes

[1]5.4693,[2]7.8183,[3]7.9967,[4]7.6863,[5]7.6045,[6]7.8830,[7]8.1620,[8]8.6953,[9]9.0948,[10]9.4159,[11]9.2208,[12]9.2591,[13]9.7531,[14]9.2597,[15]9.1784,[16]9.2925,[17]8.7051,[18]8.7208,[19]8.6739,[20]8.6143,[21]8.3104,[22]8.2161,[23]7.9049,[24]7.5473,[25]7.4064,[26]7.2133,[27]7.0963,[28]7.0035,[29]6.9969,[30]6.9612,[31]6.9099,[32]6.9075,[33]6.8637,[34]6.9363,[35]7.0285,[36]7.1741,[37]7.2542,[38]7.2405,[39]7.2368,[40]7.2920,[41]7.3035,[42]7.3447,[43]7.3416,[44]7.3447,[45]7.4416,[46]7.4029,[47]7.5285,[48]7.5930,[49]7.5287,[50]7.5751,[51]7.5716,[52]7.6133,[53]7.6466,[54]7.6818,[55]7.6809,[56]7.6989,[57]7.7229,[58]7.7238,[59]7.7321,[60]7.7508,[61]7.7775,[62]7.8264,[63]7.8687,[64]7.9271,[65]7.9943,[66]8.0362,[67]8.1292,[68]8.1672,[69]8.1757,[70]8.1486,[71]8.2084,[72]8.2052,[73]8.2490,[74]8.2449,[75]8.2189,[76]8.2017,[77]8.2362,[78]8.2535,[79]8.1724,[80]8.1116,[81]8.0884,[82]8.1005,[83]8.1097,[84]8.1072,[85]8.1208,[86]8.1595,[87]8.1614,[88]8.1653,[89]8.1234,[90]8.0978,[91]8.0926,[92]8.0734,[93]8.0991,[94]8.1069,[95]8.1173,[96]8.1096,[97]8.0955,[98]8.0777,[99]8.0775,[100]8.0963,
Final estimate: PPL = 8.0963 +/- 0.12933

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, Claude was used extensively for discovering the issue through trial/error and debugging code

ggerganov · 2026-04-09T08:30:18Z

Can you dump the BF16 values of the problematic tensor?

I also noticed some irregularities in this specific model in #20668 (comment)

To me it looks like the model data is not sound, so I don't think patching the code is warranted.

bartowski1182 · 2026-04-09T17:54:27Z

@ggerganov Yeah sure, and it's the same tensor that you noted in that eval bug.

I added the debugging code back so you can see this:

Q8_1 FP16 OVERFLOW: dst=ffn_moe_down-32 src0=blk.32.ffn_down_exps.weight src1=ffn_moe_swiglu-32 total_blocks=131072 s_inf=16 s_nan=0 max_finite_|s|=410
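
For readers who want to reproduce a tally like the one above, here is a minimal Python sketch that counts infinite and NaN s values and reports the largest finite magnitude. It is not the debug code used in this PR; summarize_s and its dictionary keys are hypothetical names chosen to mirror the fields in the log line.

```python
import math


def summarize_s(s_values):
    """Tally overflowed (inf), invalid (NaN) and largest finite |s| over all blocks."""
    finite = [abs(s) for s in s_values if math.isfinite(s)]
    return {
        "total_blocks": len(s_values),
        "s_inf": sum(1 for s in s_values if math.isinf(s)),
        "s_nan": sum(1 for s in s_values if math.isnan(s)),
        "max_finite_abs_s": max(finite, default=0.0),
    }


if __name__ == "__main__":
    # Toy input: mostly small sums plus two overflowed blocks.
    print(summarize_s([1.5, -410.0, 12.0, math.inf, -math.inf]))
```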

For the BF16 weights, I ran a similar command to what you ran in the linked report:

./build/bin/llama-debug -m Mistral-Small-4-bf16.gguf -p "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]" -n 1 --tensor-filter "ffn_moe_weighted-32"

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 128, 2, 1}, ffn_moe_weights_norm-32 (reshaped){1, 128, 2, 1}}) = {4096, 128, 2, 1}
    [
        [
            [     -0.0511,      -0.0220,       0.0349,    ...,      -0.0577,      -0.2088,      -0.0310  ],
            [      0.0186,       0.0214,       0.0072,    ...,       0.0049,      -0.0257,      -0.0022  ],
            [     -0.0089,      -0.0244,      -0.0031,    ...,       0.0025,       0.0092,      -0.0025  ],
            ..., 
            [     -0.0008,       0.0003,       0.0007,    ...,       0.0005,       0.0009,       0.0015  ],
            [      0.0001,       0.0002,      -0.0002,    ...,      -0.0004,       0.0017,      -0.0007  ],
            [     -0.0001,      -0.0002,       0.0000,    ...,       0.0001,       0.0006,      -0.0001  ],
        ],
        [
            [      0.0032,      -0.0039,       0.0005,    ...,      -0.0004,       0.0037,      -0.0035  ],
            [      0.0004,       0.0005,       0.0008,    ...,      -0.0012,      -0.0022,       0.0004  ],
            [      0.0009,      -0.0013,      -0.0016,    ...,      -0.0004,      -0.0001,       0.0038  ],
            ..., 
            [      0.0004,       0.0002,      -0.0001,    ...,       0.0002,      -0.0016,      -0.0001  ],
            [      0.0001,       0.0006,      -0.0000,    ...,       0.0001,       0.0000,      -0.0003  ],
            [      0.0004,      -0.0001,      -0.0000,    ...,       0.0000,       0.0001,       0.0005  ],
        ],
    ]
    sum = -21.966097
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0511,      -0.0220,       0.0349,    ...,      -0.0577,      -0.2088,      -0.0310  ],
            [      0.0032,      -0.0039,       0.0005,    ...,      -0.0004,       0.0037,      -0.0035  ],
        ],
    ]
    sum = -25.767612
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [      0.0186,       0.0214,       0.0072,    ...,       0.0049,      -0.0257,      -0.0022  ],
            [      0.0004,       0.0005,       0.0008,    ...,      -0.0012,      -0.0022,       0.0004  ],
        ],
    ]
    sum = 0.872716
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0089,      -0.0244,      -0.0031,    ...,       0.0025,       0.0092,      -0.0025  ],
            [      0.0009,      -0.0013,      -0.0016,    ...,      -0.0004,      -0.0001,       0.0038  ],
        ],
    ]
    sum = 1.773644
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0008,      -0.0067,       0.0098,    ...,      -0.0082,       0.0038,       0.0027  ],
            [      0.0002,      -0.0025,      -0.0003,    ...,      -0.0008,      -0.0006,      -0.0033  ],
        ],
    ]
    sum = 0.260006

system_info: n_threads = 96 (n_threads_batch = 96) / 192 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 4, 30, 1}, ffn_moe_weights_norm-32 (reshaped){1, 4, 30, 1}}) = {4096, 4, 30, 1}
    [
        [
            [  -2526.8816,    -468.2716,    5885.8682,    ...,    -205.6809,    1444.4355,   -1961.9440  ],
            [     -0.0054,       0.0031,      -0.0012,    ...,       0.0069,       0.0003,      -0.0012  ],
            [     -0.0001,      -0.0018,       0.0027,    ...,       0.0008,       0.0006,      -0.0027  ],
            [     -0.0023,      -0.0003,      -0.0008,    ...,      -0.0013,      -0.0000,      -0.0046  ],
        ],
        [
            [      0.5057,       0.3572,       0.7750,    ...,      -0.1811,       0.2851,       0.1452  ],
            [      0.0019,       0.2234,      -0.1807,    ...,       0.1392,       0.2369,       0.1664  ],
            [     -0.1560,      -0.0971,       0.1389,    ...,      -0.0336,       0.0982,      -0.0383  ],
            [     -0.0777,      -0.2010,       0.1259,    ...,       0.0096,       0.0090,       0.0095  ],
        ],
        [
            [     -0.1498,       0.1264,      -0.3107,    ...,       0.4183,      -0.3672,      -0.2537  ],
            [     -0.2451,       0.1512,       0.0539,    ...,      -0.0710,       0.0478,      -0.0811  ],
            [     -0.0553,       0.0492,       0.0706,    ...,       0.0006,       0.0383,      -0.0575  ],
            [      0.0330,       0.0586,      -0.0472,    ...,       0.0649,      -0.0267,       0.0149  ],
        ],
        ..., 
        [
            [      0.3370,      -0.4041,      -1.5760,    ...,       0.2102,       0.1811,       0.6098  ],
            [     -0.0259,       0.0100,      -0.0058,    ...,      -0.0275,      -0.0011,      -0.0559  ],
            [     -0.0052,       0.0170,      -0.0072,    ...,      -0.0358,       0.0118,      -0.0224  ],
            [     -0.0130,       0.0112,       0.0027,    ...,       0.0276,      -0.0050,       0.0055  ],
        ],
        [
            [     -0.0828,      -0.3576,      -0.2163,    ...,      -0.4464,       0.1573,       0.7532  ],
            [     -0.0241,       0.0534,       0.0650,    ...,      -0.0410,      -0.0222,      -0.0236  ],
            [      0.0251,       0.0047,      -0.0601,    ...,      -0.0091,       0.0226,      -0.0236  ],
            [     -0.0489,       0.0651,       0.1208,    ...,       0.0190,       0.0436,      -0.0181  ],
        ],
        [
            [      0.0869,      -0.0826,      -0.1747,    ...,       0.0032,      -0.4236,       0.2473  ],
            [      0.0479,      -0.1143,       0.0173,    ...,      -0.2089,       0.3555,      -0.0883  ],
            [      0.0113,       0.0037,      -0.0545,    ...,      -0.0280,      -0.0623,       0.0449  ],
            [     -0.0140,       0.0095,       0.0075,    ...,      -0.0045,      -0.0471,       0.0529  ],
        ],
    ]
    sum = -577249.375000
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [  -2526.8816,    -468.2716,    5885.8682,    ...,    -205.6809,    1444.4355,   -1961.9440  ],
            [      0.5057,       0.3572,       0.7750,    ...,      -0.1811,       0.2851,       0.1452  ],
            [     -0.1498,       0.1264,      -0.3107,    ...,       0.4183,      -0.3672,      -0.2537  ],
            ..., 
            [      0.3370,      -0.4041,      -1.5760,    ...,       0.2102,       0.1811,       0.6098  ],
            [     -0.0828,      -0.3576,      -0.2163,    ...,      -0.4464,       0.1573,       0.7532  ],
            [      0.0869,      -0.0826,      -0.1747,    ...,       0.0032,      -0.4236,       0.2473  ],
        ],
    ]
    sum = -577287.187500
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0054,       0.0031,      -0.0012,    ...,       0.0069,       0.0003,      -0.0012  ],
            [      0.0019,       0.2234,      -0.1807,    ...,       0.1392,       0.2369,       0.1664  ],
            [     -0.2451,       0.1512,       0.0539,    ...,      -0.0710,       0.0478,      -0.0811  ],
            ..., 
            [     -0.0259,       0.0100,      -0.0058,    ...,      -0.0275,      -0.0011,      -0.0559  ],
            [     -0.0241,       0.0534,       0.0650,    ...,      -0.0410,      -0.0222,      -0.0236  ],
            [      0.0479,      -0.1143,       0.0173,    ...,      -0.2089,       0.3555,      -0.0883  ],
        ],
    ]
    sum = 36.512115
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0001,      -0.0018,       0.0027,    ...,       0.0008,       0.0006,      -0.0027  ],
            [     -0.1560,      -0.0971,       0.1389,    ...,      -0.0336,       0.0982,      -0.0383  ],
            [     -0.0553,       0.0492,       0.0706,    ...,       0.0006,       0.0383,      -0.0575  ],
            ..., 
            [     -0.0052,       0.0170,      -0.0072,    ...,      -0.0358,       0.0118,      -0.0224  ],
            [      0.0251,       0.0047,      -0.0601,    ...,      -0.0091,       0.0226,      -0.0236  ],
            [      0.0113,       0.0037,      -0.0545,    ...,      -0.0280,      -0.0623,       0.0449  ],
        ],
    ]
    sum = 8.844553
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0023,      -0.0003,      -0.0008,    ...,      -0.0013,      -0.0000,      -0.0046  ],
            [     -0.0777,      -0.2010,       0.1259,    ...,       0.0096,       0.0090,       0.0095  ],
            [      0.0330,       0.0586,      -0.0472,    ...,       0.0649,      -0.0267,       0.0149  ],
            ..., 
            [     -0.0130,       0.0112,       0.0027,    ...,       0.0276,      -0.0050,       0.0055  ],
            [     -0.0489,       0.0651,       0.1208,    ...,       0.0190,       0.0436,      -0.0181  ],
            [     -0.0140,       0.0095,       0.0075,    ...,      -0.0045,      -0.0471,       0.0529  ],
        ],
    ]
    sum = -14.182524
Model add_bos: true
Input prompt: "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]"
Token ids (30):
<s>(1) [(1091) SY(101289) STEM(58343) _PRO(25396) MP(7690) T(1084) ](1093)  You(3213)  are(1584)  a(1261)  helpful(20351)  assistant(27089) [(1091) /(1047) SY(101289) STEM(58343) _PRO(25396) MP(7690) T(1084) ][(5371) IN(3174) ST(3074) ](1093)  Hello(45383) [(1091) /(1047) IN(3174) ST(3074) ](1093)

And with Q4_0 (with ffn_down set to Q4_1) in case it's relevant:

./build/bin/llama-debug -m Mistral-Small-4-Q4_0.gguf -p "[SYSTEM_PROMPT] You are a helpful assistant[/SYSTEM_PROMPT][INST] Hello[/INST]" -n 1 --tensor-filter "ffn_moe_weighted-32"

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 128, 2, 1}, ffn_moe_weights_norm-32 (reshaped){1, 128, 2, 1}}) = {4096, 128, 2, 1}
    [
        [
            [     -0.0377,      -0.0534,       0.0481,    ...,      -0.0858,      -0.2257,       0.0617  ],
            [      0.0159,       0.0224,       0.0115,    ...,       0.0055,      -0.0229,      -0.0037  ],
            [     -0.0093,      -0.0266,      -0.0029,    ...,       0.0008,       0.0078,       0.0002  ],
            ..., 
            [      0.0001,       0.0001,      -0.0001,    ...,      -0.0003,       0.0014,      -0.0006  ],
            [     -0.0007,       0.0004,       0.0007,    ...,       0.0005,       0.0010,       0.0014  ],
            [      0.0001,      -0.0003,       0.0001,    ...,       0.0001,       0.0006,      -0.0001  ],
        ],
        [
            [      0.0030,      -0.0027,       0.0018,    ...,      -0.0004,       0.0024,      -0.0029  ],
            [      0.0002,       0.0004,       0.0005,    ...,      -0.0010,      -0.0020,       0.0001  ],
            [      0.0008,      -0.0008,      -0.0016,    ...,      -0.0013,      -0.0007,       0.0037  ],
            ..., 
            [      0.0005,       0.0003,      -0.0000,    ...,       0.0002,      -0.0017,      -0.0001  ],
            [      0.0001,       0.0008,       0.0000,    ...,       0.0002,      -0.0003,      -0.0005  ],
            [      0.0005,      -0.0002,       0.0000,    ...,      -0.0001,      -0.0000,       0.0003  ],
        ],
    ]
    sum = -21.766863
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0377,      -0.0534,       0.0481,    ...,      -0.0858,      -0.2257,       0.0617  ],
            [      0.0030,      -0.0027,       0.0018,    ...,      -0.0004,       0.0024,      -0.0029  ],
        ],
    ]
    sum = -25.451143
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [      0.0159,       0.0224,       0.0115,    ...,       0.0055,      -0.0229,      -0.0037  ],
            [      0.0002,       0.0004,       0.0005,    ...,      -0.0010,      -0.0020,       0.0001  ],
        ],
    ]
    sum = 0.668126
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0093,      -0.0266,      -0.0029,    ...,       0.0008,       0.0078,       0.0002  ],
            [      0.0008,      -0.0008,      -0.0016,    ...,      -0.0013,      -0.0007,       0.0037  ],
        ],
    ]
    sum = 1.542167
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 128, 2, 1}, }) = {4096, 2, 1, 1}
    [
        [
            [     -0.0005,      -0.0085,       0.0111,    ...,      -0.0100,       0.0063,       0.0026  ],
            [      0.0006,      -0.0000,       0.0004,    ...,      -0.0011,       0.0013,      -0.0015  ],
        ],
    ]
    sum = 0.344204

system_info: n_threads = 96 (n_threads_batch = 96) / 192 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 | 

common_debug_cb_eval:      ffn_moe_weighted-32 = (f32)        MUL(ffn_moe_down-32{4096, 4, 30, 1}, ffn_moe_weights_norm-32 (reshaped){1, 4, 30, 1}}) = {4096, 4, 30, 1}
    [
        [
            [        -nan,         -nan,         -nan,    ...,         -nan,         -nan,         -nan  ],
            [     -0.0046,       0.0024,      -0.0007,    ...,       0.0076,      -0.0001,      -0.0018  ],
            [     -0.0003,      -0.0017,       0.0028,    ...,       0.0011,       0.0001,      -0.0032  ],
            [     -0.0024,      -0.0001,      -0.0010,    ...,      -0.0012,      -0.0003,      -0.0046  ],
        ],
        [
            [      0.4818,       0.3821,       0.7114,    ...,      -0.1320,       0.2678,       0.1817  ],
            [     -0.0309,       0.1859,      -0.1440,    ...,       0.1226,       0.2021,       0.1455  ],
            [     -0.1730,      -0.1076,       0.1479,    ...,      -0.0188,       0.1073,      -0.0484  ],
            [     -0.0807,      -0.1962,       0.1267,    ...,      -0.0168,       0.0028,       0.0135  ],
        ],
        [
            [     -0.1117,       0.1570,      -0.2545,    ...,       0.3830,      -0.3810,      -0.1876  ],
            [     -0.2143,       0.1105,       0.0423,    ...,      -0.0512,       0.0245,      -0.0892  ],
            [     -0.0464,       0.0308,       0.0665,    ...,      -0.0129,       0.0517,      -0.0598  ],
            [      0.0488,       0.0612,      -0.0372,    ...,       0.0743,      -0.0307,       0.0143  ],
        ],
        ..., 
        [
            [     -0.0786,      -0.2572,      -0.2437,    ...,      -0.1087,       0.0904,       0.1486  ],
            [      0.0056,       0.0713,       0.0265,    ...,      -0.0325,      -0.0137,      -0.1162  ],
            [      0.0425,       0.1282,       0.0186,    ...,       0.0156,      -0.0127,       0.0121  ],
            [      0.0365,      -0.0189,      -0.0376,    ...,       0.0461,       0.0720,       0.0437  ],
        ],
        [
            [      0.1010,      -0.5950,      -0.6300,    ...,      -0.5743,       0.3738,       0.5940  ],
            [     -0.0817,       0.0937,       0.0383,    ...,       0.1123,      -0.0569,       0.0118  ],
            [      0.0167,      -0.0310,      -0.0787,    ...,       0.0418,      -0.0279,      -0.0389  ],
            [     -0.0318,      -0.0012,      -0.0139,    ...,      -0.0261,      -0.0425,      -0.0318  ],
        ],
        [
            [      0.0268,      -0.1215,       0.0420,    ...,       0.1526,       0.3784,      -0.2065  ],
            [      0.1985,      -0.0054,      -0.2593,    ...,      -0.0466,      -0.3999,       0.2104  ],
            [      0.0128,      -0.0004,       0.0352,    ...,       0.0763,      -0.0343,      -0.0307  ],
            [      0.0292,       0.0563,      -0.0295,    ...,      -0.0066,      -0.0192,       0.0690  ],
        ],
    ]
    sum = -nan
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [        -nan,         -nan,         -nan,    ...,         -nan,         -nan,         -nan  ],
            [      0.4818,       0.3821,       0.7114,    ...,      -0.1320,       0.2678,       0.1817  ],
            [     -0.1117,       0.1570,      -0.2545,    ...,       0.3830,      -0.3810,      -0.1876  ],
            ..., 
            [     -0.0786,      -0.2572,      -0.2437,    ...,      -0.1087,       0.0904,       0.1486  ],
            [      0.1010,      -0.5950,      -0.6300,    ...,      -0.5743,       0.3738,       0.5940  ],
            [      0.0268,      -0.1215,       0.0420,    ...,       0.1526,       0.3784,      -0.2065  ],
        ],
    ]
    sum = -nan
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0046,       0.0024,      -0.0007,    ...,       0.0076,      -0.0001,      -0.0018  ],
            [     -0.0309,       0.1859,      -0.1440,    ...,       0.1226,       0.2021,       0.1455  ],
            [     -0.2143,       0.1105,       0.0423,    ...,      -0.0512,       0.0245,      -0.0892  ],
            ..., 
            [      0.0056,       0.0713,       0.0265,    ...,      -0.0325,      -0.0137,      -0.1162  ],
            [     -0.0817,       0.0937,       0.0383,    ...,       0.1123,      -0.0569,       0.0118  ],
            [      0.1985,      -0.0054,      -0.2593,    ...,      -0.0466,      -0.3999,       0.2104  ],
        ],
    ]
    sum = 26.722431
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0003,      -0.0017,       0.0028,    ...,       0.0011,       0.0001,      -0.0032  ],
            [     -0.1730,      -0.1076,       0.1479,    ...,      -0.0188,       0.1073,      -0.0484  ],
            [     -0.0464,       0.0308,       0.0665,    ...,      -0.0129,       0.0517,      -0.0598  ],
            ..., 
            [      0.0425,       0.1282,       0.0186,    ...,       0.0156,      -0.0127,       0.0121  ],
            [      0.0167,      -0.0310,      -0.0787,    ...,       0.0418,      -0.0279,      -0.0389  ],
            [      0.0128,      -0.0004,       0.0352,    ...,       0.0763,      -0.0343,      -0.0307  ],
        ],
    ]
    sum = 43.818310
common_debug_cb_eval: ffn_moe_weighted-32 (view) = (f32)       VIEW(ffn_moe_weighted-32{4096, 4, 30, 1}, }) = {4096, 30, 1, 1}
    [
        [
            [     -0.0024,      -0.0001,      -0.0010,    ...,      -0.0012,      -0.0003,      -0.0046  ],
            [     -0.0807,      -0.1962,       0.1267,    ...,      -0.0168,       0.0028,       0.0135  ],
            [      0.0488,       0.0612,      -0.0372,    ...,       0.0743,      -0.0307,       0.0143  ],
            ..., 
            [      0.0365,      -0.0189,      -0.0376,    ...,       0.0461,       0.0720,       0.0437  ],
            [     -0.0318,      -0.0012,      -0.0139,    ...,      -0.0261,      -0.0425,      -0.0318  ],
            [      0.0292,       0.0563,      -0.0295,    ...,      -0.0066,      -0.0192,       0.0690  ],
        ],
    ]
    sum = -40.639267
llama-debug: /home/colin/git_repos/forks/mistral4-q4_0/ggml/src/ggml-cpu/ops.cpp:3756: void ggml_compute_forward_rms_norm_f32(const ggml_compute_params*, ggml_tensor*): Assertion `scale > 0.0f' failed.

(this was without my changes, so it asserted)

If the model data is not sound, I'm not sure where to go from here, though this clamping does make it run and doesn't affect any sound model. But I totally understand not wanting to add arbitrary code that masks bugs in the model itself, so I'm more than happy to defer to your judgement

bartowski1182 added 3 commits April 8, 2026 18:34

Prevent the sum of the dequantized activation in q8_1 from overflowing

835acb7

Merge branch 'ggml-org:master' into mistral4-q4_0

d071411

Add CUDA overflow protection

48f1d71

bartowski1182 marked this pull request as ready for review April 9, 2026 02:35

bartowski1182 requested review from a team and ggerganov as code owners April 9, 2026 02:35

bartowski1182 wants to merge 3 commits into ggml-org:master from bartowski1182:mistral4-q4_0
