perf: sort vocab before merge parsing + rebuild WASM with ASYNCIFY#22
Merged
Conversation
Two changes: 1. Move sorted_indices build before GGUF BPE merge parsing in both tq_tokenizer.c and quant.h. str_lookup() during merge parsing was falling back to O(n) linear scan because sorted_indices wasn't built yet. For Qwen3 (248K vocab × 50K merges × 3 lookups) this was ~10 s of init time. Now uses binary search: ~100 ms. 2. Rebuild quant.js (72K) and quant.wasm (256K) with -sASYNCIFY. The previous binaries were compiled before the ASYNCIFY flags were added to build.sh, so wasm_generate_async() didn't exist and the JS fallback ran the synchronous path (blocking the browser event loop, all tokens appearing at once). The new binary contains asyncify runtime + emscripten_sleep, enabling real-time per-token streaming in the browser demo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two fixes that together complete the WASM demo improvements from PR #20 and #21:
1. Merge parsing performance: ~10 s → ~100 ms
str_lookup()during GGUF BPE merge parsing was using O(n) linear scan becausesorted_indiceswas built after the merge loop. For Qwen3 (248K vocab × 50K merges × 3 lookups), this meant ~22 billion string comparisons on every model load.Fix: Move sorted_indices build (qsort) above the merge parsing loop. Now str_lookup uses binary search during merge parsing. Applied to both
tq_tokenizer.candquant.h.2. WASM binary rebuild with ASYNCIFY
PR #20 added
-sASYNCIFYtobuild.shbut never recompiled the binaries. The deployedquant.js/quant.wasmwere from a pre-ASYNCIFY build, sowasm_generate_async()didn't exist and the JS silently fell back to the synchronous path (blocking the event loop, tokens appearing all at once).Fix: Recompiled with
emcc5.0.5 + ASYNCIFY. Verified:strings quant.wasm | grep asyncifyreturns 5 hits,emscripten_sleepreturns 1 hit. Binary grew from 197K → 244K (ASYNCIFY stack overhead).Impact on models (from analysis)
Test plan
strings quant.wasm | grep asyncifyconfirms ASYNCIFY present🤖 Generated with Claude Code