Fix Qwen3 garbage output: RMSNorm +1 for all Qwen-family models by unamedkr · Pull Request #23 · quantumaikr/quant.cpp

unamedkr · 2026-04-10T05:55:15Z

Summary

Fixes Qwen3 0.6B (and likely all non-DeltaNet Qwen2/2.5/3 models) producing garbage output in both the native CLI and the WASM demo.

Root cause

Qwen's RMSNorm computes output = norm(x) * (1 + weight), not norm(x) * weight. The +1 weight bake-in was only triggered when:

- delta_n_heads > 0 (DeltaNet/Qwen3.5-hybrid), or
- model_type == 1 (Gemma)

Plain Qwen3 has delta_n_heads=0 and model_type=0 → adjustment skipped → activations explode by layer 2 (values reaching 6000+) → garbage tokens.

Fix

Detect Qwen-family models via strstr(gguf->arch, "qwen") using the existing model->gguf_ctx reference. This covers all Qwen variants: qwen2, qwen2moe, qwen3, qwen3_5.

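```c
/* DeltaNet/Qwen3.5-hybrid models were already covered by delta_n_heads > 0. */
int is_qwen_family = (model->config.delta_n_heads > 0);
if (model->gguf_ctx) {
    /* Also match any GGUF arch string containing "qwen" (qwen2, qwen2moe, qwen3, qwen3_5). */
    const tq_gguf_ctx_t* gctx = (const tq_gguf_ctx_t*)model->gguf_ctx;
    if (strstr(gctx->arch, "qwen") != NULL) is_qwen_family = 1;
}
```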
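For illustration, the detection logic can be pulled out into a standalone helper and checked against the architecture strings the PR lists. `is_qwen_family` and its two parameters are hypothetical stand-ins for `model->config.delta_n_heads` and the GGUF context's `arch` field; the separate Gemma check (`model_type == 1`) is not modeled.

```c
#include <string.h>

/* Hypothetical standalone version of the PR's check.
 * delta_n_heads stands in for model->config.delta_n_heads;
 * gguf_arch stands in for gctx->arch (NULL when there is no GGUF context). */
static int is_qwen_family(int delta_n_heads, const char *gguf_arch) {
    int qwen = (delta_n_heads > 0);                 /* DeltaNet / Qwen3.5 hybrid */
    if (gguf_arch != NULL && strstr(gguf_arch, "qwen") != NULL)
        qwen = 1;                                   /* any arch containing "qwen" */
    return qwen;
}
```

`strstr` is a case-sensitive substring match, so every name containing `qwen` (including `qwen2moe`) is caught and names such as `llama` are not; a mixed-case string like `Qwen3` would also not match.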

Impact

| Model | Before | After |
|---|---|---|
| Qwen3-0.6B | Garbage | Fixed |
| Qwen2/2.5 (all sizes) | Garbage (if anyone tried) | Fixed |
| Qwen3.5 (DeltaNet) | Working (delta_n_heads > 0) | Still working |
| Gemma 3/4 | Working (model_type == 1) | Unchanged |
| Llama/SmolLM/Phi/Mistral | Working (no RMSNorm +1) | Unchanged |

Test plan

- Native build passes
- WASM rebuilt with fix
- Qwen3-0.6B WASM demo produces coherent output
- SmolLM2 / Llama 3.2 still work (regression check)

🤖 Generated with Claude Code

Qwen's RMSNorm computes `output = norm(x) * (1 + weight)`, not `norm(x) * weight`. The +1 weight adjustment was only applied when `delta_n_heads > 0` (DeltaNet/Qwen3.5-hybrid) or `model_type == 1` (Gemma). Plain Qwen3 (and Qwen2/2.5) models have `delta_n_heads=0` and `model_type=0`, so the adjustment was skipped entirely. Without it, RMSNorm produces wrong scales and activations explode by layer 2 (values reaching 6000+), generating garbage tokens.

Fix: detect any Qwen-family model via `strstr(gguf->arch, "qwen")` in addition to the existing DeltaNet check. This covers qwen2, qwen2moe, qwen3 and qwen3_5, which all use the same (1+w) RMSNorm.

Applied to tq_model.c (library) + quant.h (single-header/WASM). WASM binary rebuilt to include the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models. Investigation reveals:
- Qwen2/Qwen3: standard RMSNorm (weight * norm(x)), no +1 needed
- Qwen3.5/Gemma: use (1+weight), but llama.cpp's GGUF converter already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply it for non-GGUF (raw checkpoint) DeltaNet models.

Also switch the WASM demo default from Qwen3-0.6B Q4_K_M (broken due to double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB), which produces coherent output at 25 tok/s.

Verified:
- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but quality limited by double-quantization on 0.6B model

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
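A sketch of the rule this follow-up commit message describes, plus a demonstration of why a second bake double-applies the offset. `needs_runtime_plus_one` and `bake_plus_one` are hypothetical names; the commit does not say how non-GGUF Gemma checkpoints are handled, so `model_type` is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical encoding of the follow-up rule: GGUF norm weights already carry
 * the +1 (baked in by the converter, per the commit message), so only
 * raw-checkpoint (non-GGUF) DeltaNet models get a runtime +1. */
static int needs_runtime_plus_one(int is_gguf, int delta_n_heads) {
    return !is_gguf && delta_n_heads > 0;
}

/* Hypothetical bake: adds 1 to every norm weight. Calling it on weights
 * that were already baked leaves them at w + 2 (double application). */
static void bake_plus_one(float *w, size_t n) {
    for (size_t i = 0; i < n; i++) w[i] += 1.0f;
}
```

Baking twice scales the normalized activations by `2 + w` rather than `1 + w`, which is the double application the commit message describes for Qwen3.5 GGUF files.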


unamedkr merged commit a44df86 into main Apr 10, 2026
3 checks passed

unamedkr deleted the fix/qwen3-rmsnorm-plus-one branch April 10, 2026 05:55

unamedkr mentioned this pull request Apr 10, 2026: Fix Qwen RMSNorm + switch WASM demo to Qwen3.5 0.8B #24 (Merged, 5 tasks)

unamedkr merged 1 commit into main from fix/qwen3-rmsnorm-plus-one
