Skip to content

Fix Qwen3 garbage output: RMSNorm +1 for all Qwen-family models#23

Merged
unamedkr merged 1 commit into
mainfrom
fix/qwen3-rmsnorm-plus-one
Apr 10, 2026
Merged

Fix Qwen3 garbage output: RMSNorm +1 for all Qwen-family models#23
unamedkr merged 1 commit into
mainfrom
fix/qwen3-rmsnorm-plus-one

Conversation

@unamedkr
Copy link
Copy Markdown
Collaborator

Summary

Fixes Qwen3 0.6B (and likely all Qwen2/2.5/3 non-DeltaNet models) producing garbage output in both native CLI and WASM demo.

Root cause

Qwen's RMSNorm computes output = norm(x) * (1 + weight), not norm(x) * weight. The +1 weight bake-in was only triggered when:

  • delta_n_heads > 0 (DeltaNet/Qwen3.5-hybrid), or
  • model_type == 1 (Gemma)

Plain Qwen3 has delta_n_heads=0 and model_type=0adjustment skipped → activations explode by layer 2 (values reaching 6000+) → garbage tokens.

Fix

Detect Qwen-family models via strstr(gguf->arch, "qwen") using the existing model->gguf_ctx reference. This covers all Qwen variants: qwen2, qwen2moe, qwen3, qwen3_5.

int is_qwen_family = (model->config.delta_n_heads > 0);
if (model->gguf_ctx) {
    const tq_gguf_ctx_t* gctx = (const tq_gguf_ctx_t*)model->gguf_ctx;
    if (strstr(gctx->arch, "qwen") != NULL) is_qwen_family = 1;
}

Impact

Model Before After
Qwen3-0.6B Garbage Fixed
Qwen2/2.5 (all sizes) Garbage (if anyone tried) Fixed
Qwen3.5 (DeltaNet) Working (delta_n_heads > 0) Still working
Gemma 3/4 Working (model_type == 1) Unchanged
Llama/SmolLM/Phi/Mistral Working (no RMSNorm +1) Unchanged

Test plan

  • Native build passes
  • WASM rebuilt with fix
  • Qwen3-0.6B WASM demo produces coherent output
  • SmolLM2 / Llama 3.2 still work (regression check)

🤖 Generated with Claude Code

Qwen's RMSNorm computes `output = norm(x) * (1 + weight)`, not
`norm(x) * weight`. The +1 weight adjustment was only applied when
`delta_n_heads > 0` (DeltaNet/Qwen3.5-hybrid) or `model_type == 1`
(Gemma). Plain Qwen3 (and Qwen2/2.5) models have `delta_n_heads=0`
and `model_type=0`, so the adjustment was skipped entirely.

Without it, RMSNorm produces wrong scales and activations explode
by layer 2 (values reaching 6000+), generating garbage tokens.

Fix: detect any Qwen-family model via `strstr(gguf->arch, "qwen")`
in addition to the existing DeltaNet check. This covers qwen2,
qwen2moe, qwen3, qwen3_5 — all use the same (1+w) RMSNorm.

Applied to tq_model.c (library) + quant.h (single-header/WASM).
WASM binary rebuilt to include the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr unamedkr merged commit a44df86 into main Apr 10, 2026
3 checks passed
@unamedkr unamedkr deleted the fix/qwen3-rmsnorm-plus-one branch April 10, 2026 05:55
unamedkr added a commit that referenced this pull request Apr 10, 2026
PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models.
Investigation reveals:
- Qwen2/Qwen3: standard RMSNorm (weight * norm(x)), no +1 needed
- Qwen3.5/Gemma: use (1+weight), but llama.cpp's GGUF converter
  already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly
  applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply for non-GGUF
(raw checkpoint) DeltaNet models.

Also switch WASM demo default from Qwen3-0.6B Q4_K_M (broken due to
double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB)
which produces coherent output at 25 tok/s.

Verified:
- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but
  quality limited by double-quantization on 0.6B model

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
unamedkr added a commit that referenced this pull request Apr 10, 2026
…#24)

PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models.
Investigation reveals:
- Qwen2/Qwen3: standard RMSNorm (weight * norm(x)), no +1 needed
- Qwen3.5/Gemma: use (1+weight), but llama.cpp's GGUF converter
  already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly
  applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply for non-GGUF
(raw checkpoint) DeltaNet models.

Also switch WASM demo default from Qwen3-0.6B Q4_K_M (broken due to
double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB)
which produces coherent output at 25 tok/s.

Verified:
- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but
  quality limited by double-quantization on 0.6B model

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant