
WebGPU: Qwen2 models produce garbled output (repeated @ token) #21602

@c0d3rman

Description

Qwen2 models produce garbled output (repeated @ / token ID 31) when using the ggml WebGPU backend in the browser. Other architectures (TinyLlama/Llama) work correctly on the same setup.
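Token ID 31 decoding to `@` fits a GPT-2-style byte-level BPE vocabulary whose first entries are the printable ASCII characters in byte order, starting with `!` (0x21). The sketch below assumes that layout; `base_token_id` is a hypothetical helper, not part of wllama or ggml.

```python
# Sketch, assuming a GPT-2-style byte-level BPE vocabulary whose first entries
# are the printable, non-space ASCII characters in byte order, starting at "!".
FIRST_PRINTABLE = 0x21  # "!"
LAST_PRINTABLE = 0x7E   # "~"


def base_token_id(ch: str) -> int:
    """Return the assumed vocabulary ID of a single printable ASCII character."""
    code = ord(ch)
    if not FIRST_PRINTABLE <= code <= LAST_PRINTABLE:
        raise ValueError(f"{ch!r} is not a printable, non-space ASCII character")
    return code - FIRST_PRINTABLE


print(base_token_id("@"))  # 31  (0x40 - 0x21)
```

Under this assumption, a stream of ID 31 means the sampler keeps selecting the same single-byte token, rather than the detokenizer mangling valid IDs.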

Environment

  • Browser: Chrome 146, Dia (Chromium-based) — same result on both
  • GPU: Apple Metal-3 (M-series Mac)
  • WebGPU adapter: vendor: "apple", arch: "metal-3", features include shader-f16, subgroups
  • Wllama fork: reeselevine/wllama, master branch (PR #201 to ngxson/wllama)
  • JSPI: Available and used

Models tested

| Model | Quant | GGUF source | Output |
|---|---|---|---|
| TinyLlama-1.1B-Chat | Q4_K_M | TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF | ✅ Coherent |
| Qwen2.5-1.5B-Instruct | Q4_K_M | Qwen/Qwen2.5-1.5B-Instruct-GGUF | ❌ Repeats @ |
| Qwen2.5-1.5B-Instruct | Q8_0 | Custom fine-tune | ❌ Repeats @ |
| Qwen2.5-1.5B-Instruct | Q4_K_M | Custom fine-tune | ❌ Repeats @ |

All Qwen2 models produce the same garbled output. The same GGUF files work correctly on CPU (WASM-only wllama) and in local mlx-lm inference.

Suspected cause

Qwen2.5-1.5B (Qwen2 architecture) has dimensions that differ from Llama:

  • num_attention_heads: 12 (not a power of 2)
  • num_key_value_heads: 2 (GQA ratio 6:1)
  • hidden_size: 1536 (not a power of 2)
  • intermediate_size: 8960
  • rope_freq_base: 1000000

TinyLlama has num_attention_heads: 32, num_key_value_heads: 4 (GQA 8:1), and hidden_size: 2048, all powers of 2. This suggests the WebGPU matmul or attention shaders may mishandle non-power-of-2 head counts or hidden dimensions.
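The arithmetic behind this comparison can be checked quickly. This is a minimal sketch that uses only the config values quoted above; `AttnShape` and `is_pow2` are illustrative names, not ggml identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnShape:
    """Attention-related dimensions taken from each model's config."""
    name: str
    n_head: int
    n_head_kv: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Per-head width: hidden_size split evenly across query heads.
        return self.hidden_size // self.n_head

    @property
    def gqa_ratio(self) -> int:
        # Number of query heads sharing each KV head.
        return self.n_head // self.n_head_kv


def is_pow2(n: int) -> bool:
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


MODELS = [
    AttnShape("Qwen2.5-1.5B", n_head=12, n_head_kv=2, hidden_size=1536),
    AttnShape("TinyLlama-1.1B", n_head=32, n_head_kv=4, hidden_size=2048),
]

for m in MODELS:
    print(
        f"{m.name}: head_dim={m.head_dim}, GQA={m.gqa_ratio}:1, "
        f"pow2(n_head)={is_pow2(m.n_head)}, pow2(hidden)={is_pow2(m.hidden_size)}"
    )
```

Both models end up with a power-of-2 head_dim (128 for Qwen2.5-1.5B, 64 for TinyLlama). A bug keyed only on head_dim would therefore not explain the difference by itself; the head count, hidden size, and GQA ratio are the dimensions that actually differ.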

Steps to reproduce

  1. Build wllama with ggml WebGPU (reeselevine/wllama master branch)
  2. Load any Qwen2 GGUF model with preferWebGPU: true
  3. Generate text — output will be repeated @ characters
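For scripted reproduction, the failure mode (one token repeated for the whole generation) can be flagged from the generated token IDs alone. This is a sketch; `is_degenerate` and its 0.9 threshold are hypothetical choices, and collecting the IDs from wllama is left out.

```python
from collections import Counter
from typing import Sequence


def is_degenerate(token_ids: Sequence[int], threshold: float = 0.9) -> bool:
    """Return True when a single token ID makes up at least `threshold`
    of the generated sequence (e.g. a run of ID 31 / "@").

    The default threshold is an arbitrary, hypothetical cutoff.
    """
    if not token_ids:
        return False
    _, count = Counter(token_ids).most_common(1)[0]
    return count / len(token_ids) >= threshold


print(is_degenerate([31] * 20))     # True
print(is_degenerate([1, 2, 3, 4]))  # False
```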

Expected behavior

Coherent text output matching CPU inference.
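One way to check "matching CPU inference" more precisely is to run the same prompt with greedy decoding on both backends and locate the first position where the token IDs disagree. This is a minimal sketch that assumes both token-ID sequences are already available; `first_divergence` is a hypothetical helper. Small floating-point differences between backends can legitimately cause a late divergence, so a mismatch at or near position 0 is the more telling signal.

```python
from typing import Optional, Sequence


def first_divergence(cpu_ids: Sequence[int], gpu_ids: Sequence[int]) -> Optional[int]:
    """Return the first position where two greedy token-ID streams differ.

    If one stream is a strict prefix of the other, the divergence point is the
    end of the shorter stream. Returns None only when the streams are identical.
    """
    for pos, (cpu_tok, gpu_tok) in enumerate(zip(cpu_ids, gpu_ids)):
        if cpu_tok != gpu_tok:
            return pos
    if len(cpu_ids) != len(gpu_ids):
        return min(len(cpu_ids), len(gpu_ids))
    return None


print(first_divergence([1, 2, 3], [1, 2, 31]))  # 2
```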

cc @reeselevine

Edit: my coding agent posted this during a debugging run without asking me 😬 Feel free to ignore if irrelevant
