
Demo Upgrade Plan: Real Semantic Search with Transformers.js

Current State

  • 20 synthetic docs about altor-vec itself, 64-dim term-frequency vectors
  • 8.6 KB demo-index.bin in public/
  • 5 pre-computed query vectors — users can only click chips or type text that keyword-matches to a pre-computed vector
  • No real semantic understanding — findClosestQuery() does keyword overlap to pick the closest pre-computed vector

Target State

  • 10,000 real MS MARCO passages with 384-dim embeddings
  • Live query embedding via Transformers.js (Xenova/all-MiniLM-L6-v2) in a Web Worker
  • Users type ANY query and get semantically relevant results in ~150ms (embedding) + <1ms (search)

Key Research Findings

1. GitHub Pages can serve the large files

  • GitHub Pages repo limit: 1GB, individual file soft limit: 100MB
  • Our files: index.bin (17MB) + metadata.json (3.5MB) = 20.5MB total → well within limits
  • GitHub Pages serves with proper Content-Type and supports gzip/brotli compression
  • The deploy workflow (actions/upload-pages-artifact + actions/deploy-pages) handles arbitrary files in dist/

2. Transformers.js approach: npm vs. CDN

Option A: npm install — adds ~5MB to node_modules, but the ONNX model (~23MB) still downloads at runtime from the HuggingFace CDN. Increases bundle size slightly.

Option B: CDN dynamic import — import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers'. Zero bundle impact. The model downloads from the HuggingFace CDN on first use and is cached by the browser.

Decision: Use npm install (@huggingface/transformers) for TypeScript types and build-time safety. The ONNX model still loads from the HuggingFace CDN at runtime (~23MB model_quantized.onnx); this is how Transformers.js is designed to work.

3. ONNX model download UX

The quantized model (model_quantized.onnx) is 23MB. On first visit:

  • Fast connection (50 Mbps): ~4 seconds
  • Moderate (10 Mbps): ~18 seconds
  • Slow (2 Mbps): ~90 seconds

The model is cached by the browser after first download (HuggingFace sets proper cache headers). Subsequent visits: instant.

UX strategy:

  • Show a multi-phase progress indicator: "Loading WASM engine..." → "Downloading embedding model (23MB)..." → "Ready"
  • Keep the suggested query chips functional immediately (with pre-computed vectors) so users can try the demo while the model loads
  • Once model is ready, switch to live embedding for all queries
  • Show "Embedding your query..." briefly when user types custom query

4. Keep suggested query chips — YES

  • They provide instant gratification while the model downloads
  • They guide users who don't know what to search for
  • With real MS MARCO data, we need good example queries. Suggested chips:
    1. "what is a bank transit number" (finance)
    2. "how does photosynthesis work" (science)
    3. "what causes earthquakes" (geology)
    4. "define machine learning" (tech)
    5. "who invented the telephone" (history)

These should have pre-computed 384-dim vectors so they work instantly even before the model loads.

5. Web Worker for embedding — YES

Transformers.js model inference takes ~100-200ms. Running on main thread would cause jank. Use a dedicated Web Worker that:

  1. Loads @huggingface/transformers pipeline
  2. Accepts text queries via postMessage
  3. Returns 384-dim Float32Array embeddings
  4. Reports download progress back to main thread

6. Metadata format

Current metadata.json: [{"id": 0, "text": "..."}, ...] — 10K passages, avg 337 chars each, 3.5MB total.

For display, we want a title + snippet. MS MARCO passages don't have titles, so we'll:

  • Use the first ~60 chars as a "title" (or first sentence)
  • Use the full text as the snippet
  • Generate this at build time in a new script, outputting a compact format

Implementation Plan

Files to Modify

1. src/demo-data.ts — Rewrite

Remove: all 20 synthetic docs and all 64-dim pre-computed vectors.

Replace with:

export interface DemoDoc {
  title: string;   // First sentence or first ~60 chars
  snippet: string; // Full passage text
}

export interface DemoQuery {
  query: string;
  vector: number[]; // 384-dim pre-computed vector
}

// Will be generated by new build script
export const DEMO_QUERIES: DemoQuery[] = [...]; // 5 queries with 384-dim vectors
// DEMO_DOCS removed — loaded dynamically from metadata.json

2. src/App.tsx — Modify LiveSearchDemo component (~lines 685-937)

Changes:

  • Remove DEMO_DOCS import (docs come from metadata.json loaded at runtime)
  • Add state for: modelState ("idle" | "downloading" | "ready" | "error"), modelProgress, passages (loaded from metadata.json)
  • Add Web Worker ref for Transformers.js embedding
  • Change runQuery to:
    1. If pre-computed vector available (chip click) → use directly
    2. Else if model ready → send to worker, await embedding, then search
    3. Else → show "Model still loading..." message
  • Update status bar to show model download progress
  • Update search footer to show "embedding: Xms + search: Xms"
  • Add debounced auto-search on typing (300ms debounce) when model is ready

Keep unchanged:

  • Overall section structure, styling, animations
  • Reveal components
  • Engine loading logic (init + fetch index)
  • Result rendering cards
  • Search input UI

3. src/embed-worker.ts — NEW file (Web Worker)

// Web Worker for Transformers.js embedding
import { pipeline, env } from '@huggingface/transformers';

let embedder: any = null;

self.onmessage = async (e) => {
  if (e.data.type === 'init') {
    env.allowLocalModels = false; // always fetch from the HuggingFace CDN
    embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
      dtype: 'q8', // quantized weights; v3 replaces the old `quantized: true` flag
      progress_callback: (progress: any) => {
        self.postMessage({ type: 'progress', ...progress });
      }
    });
    self.postMessage({ type: 'ready' });
  }

  if (e.data.type === 'embed') {
    if (!embedder) {
      self.postMessage({ type: 'error', id: e.data.id, message: 'Model not loaded yet' });
      return;
    }
    const output = await embedder(e.data.text, { pooling: 'mean', normalize: true });
    self.postMessage({ type: 'embedding', vector: Array.from(output.data), id: e.data.id });
  }
};

4. scripts/generate-demo-data.mjs — Rewrite

New script that:

  1. Reads data/metadata.json (10K passages)
  2. Generates title/snippet pairs for each passage
  3. Uses Transformers.js (Node.js) to compute embeddings for 5 demo queries
  4. Outputs:
    • public/metadata.json — compact [{title, snippet}, ...] (id = array index)
    • src/demo-data.ts — only DEMO_QUERIES with pre-computed 384-dim vectors
  5. Copies data/index.bin → public/demo-index.bin

5. vite.config.ts — Minor update

Add worker config for proper Web Worker bundling:

worker: {
  format: 'es'
}

6. package.json — Add dependency

"@huggingface/transformers": "^3.4.0"

Files to Add to public/

File            Size     Source
demo-index.bin  17 MB    copy of data/index.bin (replaces the existing 8.6 KB file)
metadata.json   ~2-3 MB  generated from data/metadata.json (title + snippet only)

Files to Remove

  • Nothing removed, but public/demo-index.bin is replaced (8.6KB → 17MB)
  • scripts/generate-demo-data.mjs is rewritten (old toy version replaced)

UX Flow with Loading States

Phase 1: Page Load (0-2s)

Status: ● Loading WASM engine...
[Search input disabled]
[Query chips disabled]

Phase 2: WASM Ready, Model Downloading (2-20s)

Status: ● WASM engine running — 10,000 vectors
        ⟳ Downloading embedding model... 45%
[Query chips ENABLED — use pre-computed vectors]
[Search input ENABLED for chips, typing shows "Model loading..." hint]

Phase 3: Everything Ready

Status: ● WASM engine running — 10,000 vectors
        ● Embedding model ready
[Query chips ENABLED]
[Search input ENABLED — live embedding on type/enter]
[Auto-search as you type with 300ms debounce]

Query Execution Flow

  1. User types query → 300ms debounce
  2. Show "Embedding..." spinner in input
  3. Worker returns vector (~150ms)
  4. WASM search (~0.5ms)
  5. Display results with "embed: 142ms + search: 0.48ms" in footer

Fallback Strategy

If Transformers.js fails to load:

  • Query chips still work (pre-computed vectors)
  • Show message: "Live embedding unavailable. Try a suggested query above."
  • Log error for debugging

If model download is too slow:

  • Show progress bar with percentage
  • After 30s without progress, show: "Download taking longer than expected. The model is cached after first load."

If WASM fails:

  • Existing error state: "Failed to load engine. Try refreshing."
  • This is already handled

If metadata.json fails to load:

  • Fall back to showing just doc IDs and distances (no title/snippet)
  • Show message: "Could not load passage text"

Build & Deploy Changes

Local dev workflow:

  1. npm install @huggingface/transformers
  2. Run new scripts/generate-demo-data.mjs to generate public/metadata.json + src/demo-data.ts + copy public/demo-index.bin
  3. npm run dev — works as before

CI/CD (.github/workflows/deploy.yml):

  • No changes needed! The large files go in public/, Vite copies them to dist/, and upload-pages-artifact handles them
  • The ONNX model is NOT in our repo — it's fetched from HuggingFace CDN at runtime

Estimated bundle impact:

  • @huggingface/transformers JS glue: ~200KB (tree-shaken)
  • Worker file: ~5KB
  • Public assets: +17MB (index) + ~2.5MB (metadata) = ~19.5MB
  • ONNX model: 23MB downloaded from CDN (not in our bundle)

Summary of Changes

Component        Current                                    Upgraded
Documents        20 synthetic, 64-dim                       10,000 MS MARCO, 384-dim
Index size       8.6 KB                                     17 MB
Query embedding  keyword match to 5 pre-computed vectors    live Transformers.js in a Web Worker
Model            none                                       all-MiniLM-L6-v2 quantized (23 MB, cached)
Search latency   <0.1 ms (toy data)                         <1 ms (real HNSW)
User can search  5 pre-defined topics                       anything
Privacy          100% client-side                           100% client-side (model from CDN, cached)