feat(ai): on-device hybrid RAG for Ask your PDF #85
Merged
Registry `approxSizeBytes` values had drifted from the real on-disk
weights — the Quality tier was showing "≈ 1.8 GB total" in the gate
when the actual download is ~1.88 GB, and the Compact tier overstated
by ~390 MB on the chat model alone. Verified against HuggingFace file
listings for each pinned dtype:
- LFM2.5-1.2B (q4): 1.2 GB → 810 MB (model_q4.onnx_data is 850 MB)
- LFM2-2.6B (q4f16): 1.5 GB → 1.55 GB
- EmbeddingGemma q8: 309 MB → 320 MB (was missing the 26 MB Gemma
SentencePiece tokenizer)
- Embed peak RAM: 400 MB → 500 MB (int8 dequant overhead)
Aggregate now reads "≈ 1.1 GB total" on Compact and "≈ 1.9 GB total"
on Quality. Prose updated in README, docs/local-ai.md, tool-registry,
ChatModelPicker, AiModelDetailsModal, ai-runtime, useRagModels, and
the AskPdf timing-weight comment.
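
As a rough check on the arithmetic above, the corrected per-model sizes can be summed per tier and formatted the way the gate displays them. This is a minimal sketch: only `approxSizeBytes` is named in this change, so the `ModelEntry` shape, the `tiers` object, and the `formatTotal` helper are hypothetical, and decimal megabytes are assumed.

```typescript
// Hypothetical registry shape; only `approxSizeBytes` comes from this PR.
interface ModelEntry {
  id: string;
  approxSizeBytes: number;
}

// Assumption: sizes are expressed in decimal units (1 MB = 1,000,000 bytes).
const MB = 1_000_000;
const GB = 1_000 * MB;

// The embedder is shared by both tiers (320 MB after the correction).
const embedder: ModelEntry = { id: "EmbeddingGemma q8", approxSizeBytes: 320 * MB };

const tiers = {
  compact: [{ id: "LFM2.5-1.2B (q4)", approxSizeBytes: 810 * MB }, embedder],
  quality: [{ id: "LFM2-2.6B (q4f16)", approxSizeBytes: 1_550 * MB }, embedder],
};

// Sum the download size of every model in a tier.
function totalBytes(models: ModelEntry[]): number {
  return models.reduce((sum, m) => sum + m.approxSizeBytes, 0);
}

// Format the aggregate to one decimal place, matching the gate's wording.
function formatTotal(models: ModelEntry[]): string {
  return `≈ ${(totalBytes(models) / GB).toFixed(1)} GB total`;
}

console.log(formatTotal(tiers.compact)); // "≈ 1.1 GB total"
console.log(formatTotal(tiers.quality)); // "≈ 1.9 GB total"
```

Deriving the label from the registry values, rather than hard-coding it in prose, is one way to keep the displayed total from drifting again.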
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | cloakpdf | cb7da66 | Commit Preview URL, Branch Preview URL | May 17 2026, 12:44 PM |
Summary
Ships Ask your PDF — a fully on-device RAG pipeline that lets users chat with any PDF without uploading bytes anywhere. Everything (chunking, embedding, retrieval, reranking, chat) runs in the browser via Transformers.js + WebGPU.
Registry `approxSizeBytes` values were verified against actual HuggingFace file listings: the Quality bundle now reads "≈ 1.9 GB total" (was "1.8 GB"; actual 1.88 GB) and Compact reads "≈ 1.1 GB" (was implicitly 1.5 GB; actual 1.13 GB). Embed peak RAM was bumped 400 → 500 MB, and the EmbeddingGemma size now includes the 26 MB Gemma SentencePiece tokenizer that was previously missed.

Deep-dive in docs/local-ai.md.
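
For readers unfamiliar with hybrid retrieval, the sketch below shows one common way to merge ranked hits from several retrievers: reciprocal rank fusion (RRF). This description does not state which retrievers or fusion method the pipeline uses, so treat this as a generic illustration rather than the implementation. The `reciprocalRankFusion` function, the chunk ids, and the constant `k = 60` are all assumptions.

```typescript
// Generic reciprocal rank fusion (RRF). Not confirmed as this PR's method.
type Ranking = string[]; // chunk ids, best hit first

// Each retriever contributes 1 / (k + rank) per chunk; contributions are summed.
function reciprocalRankFusion(rankings: Ranking[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical hits from two retrievers (for example, dense and keyword).
const denseHits: Ranking = ["c3", "c1", "c7"];
const keywordHits: Ranking = ["c1", "c9", "c3"];

const fused = reciprocalRankFusion([denseHits, keywordHits]);
console.log(fused.map(([id]) => id)); // [ 'c1', 'c3', 'c9', 'c7' ]
```

Chunks that both retrievers rank highly rise to the top, which is why `c1` and `c3` lead the fused list here.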
Test plan
- `vp check` passes (format + lint + type-check)
- `vp test` passes (unit suite, 126 tests)
- `pnpm test:e2e` runs Ask PDF end-to-end with real model weights (résumé fixture)
- `pnpm exec tsx tests/e2e/retrieval-probe.ts` dumps per-retriever hits + relevance scores per question for tuning