From 585a2996b74dabc024c0e6e6a70d413bfbdc2115 Mon Sep 17 00:00:00 2001
From: quantumaikr
Date: Fri, 10 Apr 2026 20:36:01 +0900
Subject: [PATCH] ux(wasm): clear prefill expectation message + verify ccall works
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The "hang" users see is actually the prefill phase, in which every
prompt token is processed through 28 layers in WASM. This takes 5-10s
for a 0.8B model and cannot be interrupted, because it runs
synchronously before the first ASYNCIFY yield point in the generation
callback.

Changes:
- The placeholder message now reads "Processing prompt (may take a few
  seconds)..." so users know what to expect
- The stats bar shows "processing prompt..." instead of "prefill..."
- Confirmed that ccall({async:true}) is the correct ASYNCIFY calling
  pattern and that generation streams tokens AFTER prefill completes

Without a step-by-step API, this prefill blocking cannot be avoided in
WASM. Future: expose a single-token-forward API so that prefill can
yield between tokens.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 wasm/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/wasm/index.html b/wasm/index.html
index 766e601..a70c6a5 100644
--- a/wasm/index.html
+++ b/wasm/index.html
@@ -405,11 +405,11 @@ Run an LLM in your browser

addMessage('user', text); const aDiv = addMessage('assistant', ''); - aDiv.innerHTML = ' Thinking...'; + aDiv.innerHTML = ' Processing prompt (may take a few seconds)...'; let output = '', count = 0; const t0 = performance.now(); document.getElementById('statTokens').textContent = ''; - document.getElementById('statSpeed').textContent = 'prefill...'; + document.getElementById('statSpeed').textContent = 'processing prompt...'; Module.onToken = (tok) => { output += tok; count++;
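A minimal sketch of the single-token-forward idea from the commit message, assuming a hypothetical `forwardToken(token, pos)` binding that runs one synchronous forward pass per prompt token. No such function exists in the current WASM build, and `prefillWithYield` and `yieldToEventLoop` are likewise illustrative names. Between chunks, the loop awaits a `setTimeout(0)` promise, which gives the browser a chance to repaint and handle input while prefill is still running.

```javascript
// Hypothetical sketch: chunked prefill that yields to the browser event loop.
// `forwardToken` stands in for the proposed single-token-forward API;
// it is NOT part of the current WASM module.

// Resolve on a later macrotask so the browser can repaint and handle input.
function yieldToEventLoop() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Feed prompt tokens one at a time, yielding after every `chunkSize` tokens
// and once more after the last token. `onProgress(done, total)` is optional.
async function prefillWithYield(tokens, forwardToken, { chunkSize = 8, onProgress } = {}) {
  for (let i = 0; i < tokens.length; i++) {
    forwardToken(tokens[i], i); // one synchronous forward pass (hypothetical)
    const done = i + 1;
    if (done % chunkSize === 0 || done === tokens.length) {
      if (onProgress) onProgress(done, tokens.length);
      await yieldToEventLoop();
    }
  }
}
```

In index.html this would run before generation starts, with `onProgress` updating `statSpeed` (for example to "processing prompt 16/40..."). `chunkSize` trades prefill throughput against UI responsiveness, since every yield costs at least one trip through the event loop.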