perf: sort vocab before merge parsing + rebuild WASM with ASYNCIFY by unamedkr · Pull Request #22 · quantumaikr/quant.cpp

unamedkr · 2026-04-10T05:43:24Z

Summary

Two fixes that together complete the WASM demo improvements from PR #20 and #21:

1. Merge parsing performance: ~10 s → ~100 ms

During GGUF BPE merge parsing, str_lookup() fell back to an O(n) linear scan because sorted_indices was only built after the merge loop. For Qwen3 (248K vocab × 50K merges × 3 lookups), this meant ~22 billion string comparisons on every model load.

Fix: Move sorted_indices build (qsort) above the merge parsing loop. Now str_lookup uses binary search during merge parsing. Applied to both tq_tokenizer.c and quant.h.
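The ordering issue can be illustrated with a minimal sketch. It is not the code from tq_tokenizer.c or quant.h: the vocab_t struct and the vocab_build_sorted and vocab_lookup names are hypothetical, and only the idea (build the sorted index with qsort first, so lookups can use binary search instead of the linear fallback) comes from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical vocab table; NOT the real quant.cpp tokenizer struct. */
typedef struct {
    const char **tokens;  /* token strings, indexed by token id */
    int n_tokens;
    int *sorted_indices;  /* token ids ordered by string; NULL until built */
} vocab_t;

/* qsort() has no context argument, so the comparator reads the token
 * strings through a file-scope pointer (fine for a single-threaded sketch). */
static const char **g_sort_tokens;

static int cmp_token_ids(const void *a, const void *b) {
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    return strcmp(g_sort_tokens[ia], g_sort_tokens[ib]);
}

/* Build the sorted index. Returns 0 on success, -1 on allocation failure. */
int vocab_build_sorted(vocab_t *v) {
    assert(v != NULL && v->tokens != NULL);
    size_t n = v->n_tokens > 0 ? (size_t)v->n_tokens : 1;
    v->sorted_indices = malloc(n * sizeof(int));
    if (!v->sorted_indices) return -1;
    for (int i = 0; i < v->n_tokens; i++) v->sorted_indices[i] = i;
    g_sort_tokens = v->tokens;
    qsort(v->sorted_indices, (size_t)v->n_tokens, sizeof(int), cmp_token_ids);
    return 0;
}

/* Return the token id for s, or -1 if it is not in the vocab. */
int vocab_lookup(const vocab_t *v, const char *s) {
    if (!v->sorted_indices) {
        /* Slow path: index not built yet, so scan every token (O(n)). */
        for (int i = 0; i < v->n_tokens; i++)
            if (strcmp(v->tokens[i], s) == 0) return i;
        return -1;
    }
    /* Fast path: binary search over the sorted ids (O(log n)). */
    int lo = 0, hi = v->n_tokens - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int id = v->sorted_indices[mid];
        int c = strcmp(v->tokens[id], s);
        if (c == 0) return id;
        if (c < 0) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}
```

In this sketch, calling vocab_build_sorted() before the merge loop is what moves every subsequent vocab_lookup() onto the fast path. For a 248K-entry vocab, a binary search needs at most about 18 string comparisons per lookup (log2 of 248,000 is roughly 17.9), compared with up to 248K for the scan.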

2. WASM binary rebuild with ASYNCIFY

PR #20 added -sASYNCIFY to build.sh but never recompiled the binaries. The deployed quant.js/quant.wasm were from a pre-ASYNCIFY build, so wasm_generate_async() didn't exist and the JS silently fell back to the synchronous path (blocking the event loop, tokens appearing all at once).

Fix: Recompiled with emcc 5.0.5 + ASYNCIFY. Verified: strings quant.wasm | grep asyncify returns 5 hits, and the same check for emscripten_sleep returns 1 hit. Binary grew from 197K to 244K (ASYNCIFY stack overhead).
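Why the missing ASYNCIFY build caused tokens to appear all at once can be sketched as follows. This is an illustration under assumptions, not the body of wasm_generate_async(): generate_streaming, next_token_stub, and record_last_token are hypothetical names. The only real API used is emscripten_sleep() from <emscripten.h>, which requires linking with -sASYNCIFY; the call is guarded so the sketch also compiles natively.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __EMSCRIPTEN__
#include <emscripten.h>  /* emscripten_sleep(); needs -sASYNCIFY at link time */
#endif

/* Callback invoked once per generated token (hypothetical signature). */
typedef void (*token_cb)(int token_id, void *user);

/* Hypothetical stand-in for one decode step of the model. */
static int next_token_stub(int step) { return 100 + step; }

/* Example callback: remember the most recent token id. */
void record_last_token(int token_id, void *user) {
    *(int *)user = token_id;
}

/* Generate up to max_tokens tokens and return how many were produced.
 * Under Emscripten the loop yields to the browser event loop after every
 * token, so the page can render each token as it arrives. Without the
 * yield, the whole loop runs before the browser gets a chance to repaint. */
int generate_streaming(int max_tokens, token_cb on_token, void *user) {
    assert(on_token != NULL);
    int produced = 0;
    for (int step = 0; step < max_tokens; step++) {
        int tok = next_token_stub(step);
        on_token(tok, user);
        produced++;
#ifdef __EMSCRIPTEN__
        emscripten_sleep(0);  /* unwind to JS, then resume here */
#endif
    }
    return produced;
}
```

In a native build the guard compiles the yield away and the loop simply runs to completion, which mirrors the behaviour of the synchronous fallback path described above.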

Impact on models (from analysis)

Model family	Tokenizer	Merge parsing	Impact
Gemma 3/4	SentencePiece	Uses JSON path, not GGUF path	No change
Qwen 2.5/3	BPE (248K vocab)	Now fast + correct	Fixed
Llama 3.x	tiktoken BPE (128K)	Now fast + correct	Fixed
SmolLM2	BPE (SentencePiece detect)	Merges now parsed but SPM path used	No regression
Phi-3 / Mistral	BPE	Now fast + correct	Fixed

Test plan

Native build passes
WASM rebuild succeeds (quant.js 72K, quant.wasm 256K)
strings quant.wasm | grep asyncify confirms ASYNCIFY present
WASM demo: Qwen3 0.6B loads without long init delay
WASM demo: tokens stream in real-time (not all at once)

🤖 Generated with Claude Code

Two changes:

1. Move sorted_indices build before GGUF BPE merge parsing in both tq_tokenizer.c and quant.h. str_lookup() during merge parsing was falling back to O(n) linear scan because sorted_indices wasn't built yet. For Qwen3 (248K vocab × 50K merges × 3 lookups) this was ~10 s of init time. Now uses binary search: ~100 ms.

2. Rebuild quant.js (72K) and quant.wasm (256K) with -sASYNCIFY. The previous binaries were compiled before the ASYNCIFY flags were added to build.sh, so wasm_generate_async() didn't exist and the JS fallback ran the synchronous path (blocking the browser event loop, all tokens appearing at once). The new binary contains asyncify runtime + emscripten_sleep, enabling real-time per-token streaming in the browser demo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

unamedkr merged commit c717832 into main Apr 10, 2026
3 checks passed

unamedkr deleted the fix/merge-perf-wasm-rebuild branch April 10, 2026 05:43
