Fix Qwen RMSNorm + switch WASM demo to Qwen3.5 0.8B by unamedkr · Pull Request #24 · quantumaikr/quant.cpp

unamedkr · 2026-04-10T06:16:28Z

Summary

Corrects PR #23's incorrect RMSNorm +1 and switches the WASM demo to a working model.

RMSNorm fix

PR #23 added weight += 1.0 for all Qwen GGUF models. This was wrong:

| Model | RMSNorm type | GGUF +1 baked? | Runtime +1 needed? |
| --- | --- | --- | --- |
| Qwen2/Qwen3 | standard `w * norm(x)` | No | No |
| Qwen3.5 | `(1+w) * norm(x)` | Yes (converter) | No |
| Gemma | `(1+w) * norm(x)` | Yes (converter) | No |
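
To illustrate why a baked weight needs no runtime +1, here is a minimal Python sketch of the two RMSNorm variants from the table. It is not the code in this repository: the function names, the `eps` value, and the sample vectors are assumptions chosen for illustration.

```python
import math

def rms_norm(x, eps=1e-6):
    """Scale x to unit root-mean-square (no learned weight). eps is an assumed value."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def standard_rmsnorm(x, w):
    """Qwen2/Qwen3 style: w * norm(x)."""
    return [wi * ni for wi, ni in zip(w, rms_norm(x))]

def offset_rmsnorm(x, w):
    """Qwen3.5/Gemma style: (1 + w) * norm(x)."""
    return [(1.0 + wi) * ni for wi, ni in zip(w, rms_norm(x))]

# Illustrative values only, not taken from any real checkpoint.
x = [0.5, -1.0, 2.0, 0.25]
w_raw = [0.1, -0.2, 0.05, 0.0]           # weight as stored in a raw checkpoint
w_baked = [1.0 + wi for wi in w_raw]     # weight after a converter bakes in the +1

reference = offset_rmsnorm(x, w_raw)     # intended output: (1 + w) * norm(x)
correct = standard_rmsnorm(x, w_baked)   # baked weight, no runtime +1
double = standard_rmsnorm(x, [1.0 + wi for wi in w_baked])  # baked weight plus runtime +1
```

Here `correct` matches `reference`, while `double` computes `(2 + w) * norm(x)`, so every norm layer's output is off by an extra `norm(x)` term.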

Fix: Skip runtime +1 for all GGUF models. Only apply for non-GGUF raw checkpoints with DeltaNet.
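
The rule above can be sketched as a single predicate. This is a hypothetical Python illustration, not the loader code in this PR: `needs_runtime_plus_one`, `is_gguf`, and `has_deltanet` are names invented here.

```python
def needs_runtime_plus_one(is_gguf: bool, has_deltanet: bool) -> bool:
    """Return True only when the loader should add 1.0 to RMSNorm weights.

    Hypothetical helper: GGUF files either use standard RMSNorm or already
    have the +1 baked in by the converter, so they never get a runtime +1.
    Only raw (non-GGUF) checkpoints of DeltaNet models do.
    """
    return (not is_gguf) and has_deltanet

# GGUF models are always skipped, whatever the architecture.
gguf_cases = [needs_runtime_plus_one(True, d) for d in (False, True)]
# Raw checkpoints get +1 only when the model uses DeltaNet.
raw_cases = [needs_runtime_plus_one(False, d) for d in (False, True)]
```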

Demo model switch

| | Before | After |
| --- | --- | --- |
| Default | Qwen3-0.6B Q4_K_M (378 MB) | Qwen3.5-0.8B Q4_K_M (508 MB) |
| Quality | Garbage (double-quant on tiny model) | Coherent (25 tok/s verified) |

Verification (native CLI)

Qwen3.5 0.8B Q8_0:  "a good place for anyone looking to explore..."  ✅
Llama 3.2 1B Q8_0:   "the boustreau de Brest, which is a fortified..." ✅
Qwen3 0.6B Q4_K_M:  "This is A... This website question..." ⚠ (real words, poor quality)

Test plan

- Native build passes
- WASM rebuilt
- Qwen3.5 0.8B coherent output verified (native)
- Llama 3.2 1B still works
- WASM demo: Qwen3.5 downloads, loads, generates coherent text

🤖 Generated with Claude Code

PR #23 incorrectly added RMSNorm +1 for all Qwen-family GGUF models. Investigation reveals:

- Qwen2/Qwen3: standard RMSNorm (`weight * norm(x)`), no +1 needed
- Qwen3.5/Gemma: use `(1+weight)`, but llama.cpp's GGUF converter already bakes +1 into the weights during conversion
- Runtime +1 was double-applying for Qwen3.5 and incorrectly applying for Qwen2/3, causing activation explosion

Fix: skip runtime +1 for all GGUF models. Only apply for non-GGUF (raw checkpoint) DeltaNet models.

Also switch WASM demo default from Qwen3-0.6B Q4_K_M (broken due to double-quantization on a tiny model) to Qwen3.5-0.8B Q4_K_M (~508 MB) which produces coherent output at 25 tok/s.

Verified:

- Qwen3.5 0.8B Q8_0: coherent English output
- Llama 3.2 1B Q8_0: coherent English output (unchanged)
- Qwen3 0.6B Q4_K_M: real words now (was garbage Unicode), but quality limited by double-quantization on 0.6B model

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

unamedkr merged commit 9b9ce04 into main Apr 10, 2026
3 checks passed

unamedkr deleted the fix/qwen-rmsnorm-revert-demo-model branch April 10, 2026 06:16

unamedkr merged 1 commit into main from fix/qwen-rmsnorm-revert-demo-model
