Fix Qwen RMSNorm + switch WASM demo to Qwen3.5 0.8B #24

Merged: unamedkr merged 1 commit into main from fix/qwen-rmsnorm-revert-demo-model on Apr 10, 2026
@unamedkr
Collaborator

Summary

Reverts the RMSNorm +1 that PR #23 applied incorrectly, and switches the WASM demo to a model that produces coherent output.

RMSNorm fix

PR #23 added weight += 1.0 for all Qwen GGUF models. This was wrong:

| Model | RMSNorm type | GGUF +1 baked? | Runtime +1 needed? |
|---|---|---|---|
| Qwen2/Qwen3 | standard w * norm(x) | No | No |
| Qwen3.5 | (1+w) * norm(x) | Yes (converter) | No |
| Gemma | (1+w) * norm(x) | Yes (converter) | No |
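
To illustrate why a runtime +1 on top of baked weights is harmful, here is a minimal pure-Python sketch. It is not the project's code: `rms_norm`, the `add_one` flag, the `eps` value, and the sample vectors are all assumptions made for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6, add_one=False):
    """RMSNorm over one vector; add_one=True scales by (1 + w) instead of w."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    scale = [(1.0 + w) if add_one else w for w in weight]
    return [s * v / rms for s, v in zip(scale, x)]


# Illustrative values only (not taken from any real model).
x = [1.0, -2.0, 3.0, -4.0]
raw_w = [0.1, 0.2, -0.1, 0.0]        # checkpoint weights of a (1 + w) model
baked_w = [1.0 + w for w in raw_w]   # what a converter that bakes +1 would store

reference = rms_norm(x, raw_w, add_one=True)      # intended (1 + w) * norm(x)
gguf_correct = rms_norm(x, baked_w)               # baked weights, no runtime +1
gguf_double = rms_norm(x, baked_w, add_one=True)  # baked weights plus runtime +1
```

With these inputs, `gguf_correct` matches `reference`, while `gguf_double` scales every channel by (2 + w) and so inflates the activations.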

Fix: Skip runtime +1 for all GGUF models. Only apply for non-GGUF raw checkpoints with DeltaNet.
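
The rule above can be sketched as a small predicate. The function names and the `is_gguf` / `uses_deltanet` flags are hypothetical and chosen for illustration; they do not reflect the project's actual loader API.

```python
def needs_runtime_plus_one(is_gguf: bool, uses_deltanet: bool) -> bool:
    """Hypothetical predicate: True only for raw (non-GGUF) DeltaNet checkpoints."""
    return (not is_gguf) and uses_deltanet


def prepare_norm_weight(weight, is_gguf, uses_deltanet):
    """Return the effective RMSNorm scale vector, adding 1.0 only when required."""
    if needs_runtime_plus_one(is_gguf, uses_deltanet):
        return [1.0 + w for w in weight]
    # GGUF weights are used as stored, with no runtime offset.
    return list(weight)
```

Under this sketch, any GGUF model keeps its weights untouched, regardless of architecture.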

Demo model switch

| | Before | After |
|---|---|---|
| Default | Qwen3-0.6B Q4_K_M (378 MB) | Qwen3.5-0.8B Q4_K_M (508 MB) |
| Quality | Garbage (double-quant on tiny model) | Coherent (25 tok/s verified) |

Verification (native CLI)

Qwen3.5 0.8B Q8_0:  "a good place for anyone looking to explore..."  ✅
Llama 3.2 1B Q8_0:   "the boustreau de Brest, which is a fortified..." ✅
Qwen3 0.6B Q4_K_M:  "This is A... This website question..." ⚠ (real words, poor quality)

Test plan

  • Native build passes
  • WASM rebuilt
  • Qwen3.5 0.8B coherent output verified (native)
  • Llama 3.2 1B still works
  • WASM demo: Qwen3.5 downloads, loads, generates coherent text

🤖 Generated with Claude Code

PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models.
Investigation reveals:
- Qwen2/Qwen3: standard RMSNorm (weight * norm(x)), no +1 needed
- Qwen3.5/Gemma: use (1+weight), but llama.cpp's GGUF converter
  already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly
  applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply for non-GGUF
(raw checkpoint) DeltaNet models.

Also switch WASM demo default from Qwen3-0.6B Q4_K_M (broken due to
double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB)
which produces coherent output at 25 tok/s.

Verified:
- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but
  quality limited by double-quantization on 0.6B model

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr unamedkr merged commit 9b9ce04 into main Apr 10, 2026
3 checks passed
@unamedkr unamedkr deleted the fix/qwen-rmsnorm-revert-demo-model branch April 10, 2026 06:16