From 585a2996b74dabc024c0e6e6a70d413bfbdc2115 Mon Sep 17 00:00:00 2001
From: quantumaikr <hi@quantumai.kr>
Date: Fri, 10 Apr 2026 20:36:01 +0900
Subject: [PATCH] ux(wasm): clear prefill expectation message + verify ccall
 works
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The apparent "hang" users see is actually the prefill phase: every
prompt token is processed through all 28 layers in WASM. For a 0.8B
model this takes 5-10s, and it cannot be interrupted because it runs
synchronously, before the first ASYNCIFY yield point in the generation
callback.

Changes:
- The assistant placeholder now reads "Processing prompt (may take a
  few seconds)..." instead of "Thinking..." to set expectations
- The stats bar shows "processing prompt..." instead of "prefill..."
- Confirmed that ccall({async:true}) is the correct ASYNCIFY pattern
  and that generation streaming works once prefill completes

Without a step-by-step API, this prefill blocking is a fundamental
WASM limitation. Future work: expose a single-token forward API so
prefill can yield to the event loop between tokens.
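
A minimal sketch of how such an API might be driven from JavaScript,
assuming a hypothetical synchronous `forwardToken(token)` export (no
such function exists in the current build). Prefill runs in chunks and
awaits a `setTimeout(0)` macrotask between them, which gives the
browser a chance to repaint and run other callbacks. `prefillWithYield`,
`chunkSize` and `onProgress` are illustrative names, not project code.

```javascript
// Hypothetical sketch: chunked prefill that yields to the event loop.
// `forwardToken` stands in for the proposed single-token forward API;
// it is an assumption, not part of the current WASM exports.

// Resolve on the next macrotask so pending timers (and, in a browser,
// rendering) get a chance to run.
const yieldToEventLoop = () => new Promise((resolve) => setTimeout(resolve, 0));

async function prefillWithYield(tokens, forwardToken, { chunkSize = 8, onProgress } = {}) {
  for (let i = 0; i < tokens.length; i++) {
    forwardToken(tokens[i]); // synchronous: one token through all layers
    const done = i + 1;
    // After every full chunk, and after the last token, report and yield.
    if (done % chunkSize === 0 || done === tokens.length) {
      if (onProgress) onProgress(done, tokens.length);
      await yieldToEventLoop();
    }
  }
}
```

`setTimeout` is used here for simplicity; browsers clamp nested timers
to at least 4 ms, so chunks of several tokens keep that overhead small
relative to the per-token forward cost.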

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 wasm/index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/wasm/index.html b/wasm/index.html
index 766e601..a70c6a5 100644
--- a/wasm/index.html
+++ b/wasm/index.html
@@ -405,11 +405,11 @@ <h2>Run an <span>LLM</span> in your browser</h2>
 
     addMessage('user', text);
     const aDiv = addMessage('assistant', '');
-    aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Thinking...</span>';
+    aDiv.innerHTML = '<span class="thinking"><span class="spinner"></span> Processing prompt (may take a few seconds)...</span>';
     let output = '', count = 0;
     const t0 = performance.now();
     document.getElementById('statTokens').textContent = '';
-    document.getElementById('statSpeed').textContent = 'prefill...';
+    document.getElementById('statSpeed').textContent = 'processing prompt...';
 
     Module.onToken = (tok) => {
         output += tok; count++;