Fix Qwen3 garbage output: RMSNorm +1 for all Qwen-family models #23
Merged
Qwen's RMSNorm computes `output = norm(x) * (1 + weight)`, not `norm(x) * weight`. The +1 weight adjustment was only applied when `delta_n_heads > 0` (DeltaNet/Qwen3.5-hybrid) or `model_type == 1` (Gemma). Plain Qwen3 (and Qwen2/2.5) models have `delta_n_heads=0` and `model_type=0`, so the adjustment was skipped entirely. Without it, RMSNorm produces wrong scales and activations explode by layer 2 (values reaching 6000+), generating garbage tokens.

Fix: detect any Qwen-family model via `strstr(gguf->arch, "qwen")` in addition to the existing DeltaNet check. This covers qwen2, qwen2moe, qwen3 and qwen3_5, all of which use the same (1+w) RMSNorm.

Applied to tq_model.c (library) + quant.h (single-header/WASM). WASM binary rebuilt to include the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unamedkr added a commit that referenced this pull request on Apr 10, 2026
PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models. Investigation reveals:
- Qwen2/Qwen3: standard RMSNorm (weight * norm(x)), no +1 needed
- Qwen3.5/Gemma: use (1+weight), but llama.cpp's GGUF converter already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply for non-GGUF (raw checkpoint) DeltaNet models.

Also switch WASM demo default from Qwen3-0.6B Q4_K_M (broken due to double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB), which produces coherent output at 25 tok/s.

Verified:
- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but quality limited by double-quantization on 0.6B model

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
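A minimal C sketch of the corrected rule and of why the runtime +1 double-applied for Qwen3.5 GGUF files. It assumes, as the commit message states, that llama.cpp's converter has already stored `1 + w` in the GGUF weight tensor. The names `needs_runtime_plus_one`, `is_gguf`, and `effective_scale` are hypothetical and not taken from tq_model.c.

```c
#include <stdbool.h>

/* Hypothetical form of the corrected condition: never add +1 at run time
 * for GGUF models (the converter already baked it in where the
 * architecture needs it); only raw-checkpoint DeltaNet models need it. */
static bool needs_runtime_plus_one(bool is_gguf, int delta_n_heads) {
    return !is_gguf && delta_n_heads > 0;
}

/* Scale actually applied to norm(x) for one channel, given the value
 * stored in the weight tensor and whether the runtime adds +1 on top. */
static float effective_scale(float stored_w, bool runtime_plus_one) {
    return runtime_plus_one ? 1.0f + stored_w : stored_w;
}
```

For example, an illustrative Qwen3.5 checkpoint weight of 0.1 would be stored in the GGUF file as 1.1. `effective_scale(1.1f, false)` gives the intended 1.1, while the PR #23 behavior, `effective_scale(1.1f, true)`, gives about 2.1. Repeated over every normalization layer, an over-scaling of this kind is consistent with the activation explosion the commit describes.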
unamedkr added a commit that referenced this pull request on Apr 10, 2026
…#24) (same commit message as the commit above)
Summary
Fixes Qwen3 0.6B (and likely all Qwen2/2.5/3 non-DeltaNet models) producing garbage output in both the native CLI and the WASM demo.
Root cause
Qwen's RMSNorm computes `output = norm(x) * (1 + weight)`, not `norm(x) * weight`. The `+1` weight bake-in was only triggered when:
- `delta_n_heads > 0` (DeltaNet/Qwen3.5-hybrid), or
- `model_type == 1` (Gemma)

Plain Qwen3 has `delta_n_heads=0` and `model_type=0` → adjustment skipped → activations explode by layer 2 (values reaching 6000+) → garbage tokens.
Fix
Detect Qwen-family models via `strstr(gguf->arch, "qwen")` using the existing `model->gguf_ctx` reference. This covers all Qwen variants: qwen2, qwen2moe, qwen3, qwen3_5.
Impact
Test plan
🤖 Generated with Claude Code