feat(ai): on-device hybrid RAG for Ask your PDF by sumitsahoo · Pull Request #85 · cloakyard/cloakpdf

sumitsahoo · 2026-05-17T12:44:17Z

Summary

Ships Ask your PDF — a fully on-device RAG pipeline that lets users chat with any PDF without uploading bytes anywhere. Everything (chunking, embedding, retrieval, reranking, chat) runs in the browser via Transformers.js + WebGPU.

Two chat tiers (user-selectable, persisted to localStorage): Compact (LFM2.5-1.2B, ~810 MB) and Quality (LFM2-2.6B, ~1.55 GB). Shared embedder (EmbeddingGemma-300M) and cross-encoder reranker (MS MARCO MiniLM-L-6-v2).
Hybrid retrieval: BM25 + dense fused via Reciprocal Rank Fusion, anchor-chunk merge for identity questions, cosine relevance gate to refuse off-topic queries before they reach the LLM.
Deterministic fast-paths for verbatim contact extraction, doc-type identification, and topic-absence refusal — defends against small-model failure modes (mis-extracted digits, hallucinated topics).
Cache + consent UX: model weights persisted in CacheStorage keyed by SHA-256 of the PDF for instant re-opens, two-step consent dialog with per-model progress, free-RAM vs delete-cache affordances.
Latest commit: corrected the registry's approxSizeBytes against actual HuggingFace file listings — the Quality bundle reads "≈ 1.9 GB total" (was "1.8 GB" / actual 1.88 GB) and Compact reads "≈ 1.1 GB" (was implicitly 1.5 GB / actual 1.13 GB). Embed peak RAM bumped 400 → 500 MB and now includes the 26 MB Gemma SentencePiece tokenizer that was previously missed.
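The hybrid retrieval and relevance gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `rrfFuse`, `cosine`, and `passesRelevanceGate` are hypothetical, `k = 60` is a commonly used RRF constant assumed here, and the gate threshold is a free parameter whose real value is not stated in the description.

```typescript
// Hypothetical sketch; the PR's actual implementation is not shown in the description.
type Ranking = string[]; // chunk ids, best match first

// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank starts at 1. k = 60 is an assumed default.
function rrfFuse(rankings: Ranking[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Relevance gate: let the query through only if at least one chunk
// embedding is similar enough to the query embedding.
function passesRelevanceGate(
  query: number[],
  chunks: number[][],
  threshold: number,
): boolean {
  return chunks.some((chunk) => cosine(query, chunk) >= threshold);
}

// Example: fuse a BM25 ranking with a dense ranking.
const fused = rrfFuse([
  ["a", "b", "c"], // BM25 order
  ["b", "c", "a"], // dense order
]);
console.log(fused);
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to combine BM25 and dense retrievers whose scores live on different scales.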

Deep-dive in docs/local-ai.md.
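As an illustration of the deterministic fast-path idea from the summary, the sketch below returns contact details as verbatim substrings of the extracted PDF text, so a small model never has to re-type digits. The helper name `extractContacts` and both regular expressions are assumptions made for this example; the patterns used in the PR are not shown in the description.

```typescript
// Hypothetical sketch; patterns are deliberately simple and illustrative.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// At least 8 digits/separators, starting and ending on a digit, optional leading "+".
const PHONE_PATTERN = /\+?\d[\d\s().-]{6,}\d/g;

interface Contacts {
  emails: string[];
  phones: string[];
}

// Returns matches exactly as they appear in the source text.
function extractContacts(text: string): Contacts {
  return {
    emails: text.match(EMAIL_PATTERN) ?? [],
    phones: text.match(PHONE_PATTERN) ?? [],
  };
}

const sample = "Contact: jane.doe@example.com, +1 555 0100.";
console.log(extractContacts(sample));
```

Because the result is copied straight from the document, a question such as "what is the phone number?" can be answered without passing the digits through the chat model at all.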

Test plan

vp check passes (format + lint + type-check)
vp test passes (unit suite — 126 tests)
pnpm test:e2e runs Ask PDF end-to-end with real model weights (résumé fixture)
pnpm exec tsx tests/e2e/retrieval-probe.ts dumps per-retriever hits + relevance scores per question for tuning
Manual: load Ask PDF on a fresh browser profile — gate shows correct aggregate size; pick each tier and verify per-tile download/RAM labels render from registry
Manual: refresh after first download — warm-load reads from CacheStorage in seconds, no network
Manual: open an encrypted PDF — friendly notice renders instead of crashing

Registry `approxSizeBytes` values had drifted from the real on-disk weights: the Quality tier was showing "≈ 1.8 GB total" in the gate when the actual download is ~1.88 GB, and the Compact tier overstated by ~390 MB on the chat model alone.

Verified against HuggingFace file listings for each pinned dtype:

- LFM2.5-1.2B (q4): 1.2 GB → 810 MB (model_q4.onnx_data is 850 MB)
- LFM2-2.6B (q4f16): 1.5 GB → 1.55 GB
- EmbeddingGemma q8: 309 MB → 320 MB (was missing the 26 MB Gemma SentencePiece tokenizer)
- Embed peak RAM: 400 MB → 500 MB (int8 dequant overhead)

Aggregate now reads "≈ 1.1 GB total" on Compact and "≈ 1.9 GB total" on Quality.

Prose updated in README, docs/local-ai.md, tool-registry, ChatModelPicker, AiModelDetailsModal, ai-runtime, useRagModels, and the AskPdf timing-weight comment.
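The arithmetic behind the aggregate labels can be checked with a small sketch. It assumes decimal units (1 MB = 1,000,000 bytes), one-decimal rounding, and that the aggregate is the chat model plus the embedder; the reranker is left out because its size is not stated in this commit message. The `ModelEntry` shape and the `aggregateLabel` helper are hypothetical; only the field name `approxSizeBytes` comes from the PR.

```typescript
// Hypothetical sketch of deriving the gate label from registry entries.
const MB = 1_000_000; // decimal megabyte, an assumption

interface ModelEntry {
  id: string;
  approxSizeBytes: number;
}

// Sum the registry sizes and format as "≈ X.Y GB total".
function aggregateLabel(models: ModelEntry[]): string {
  const totalBytes = models.reduce((sum, m) => sum + m.approxSizeBytes, 0);
  return `≈ ${(totalBytes / 1e9).toFixed(1)} GB total`;
}

// Corrected figures from the commit message.
const embedder: ModelEntry = { id: "EmbeddingGemma q8", approxSizeBytes: 320 * MB };
const compactChat: ModelEntry = { id: "LFM2.5-1.2B (q4)", approxSizeBytes: 810 * MB };
const qualityChat: ModelEntry = { id: "LFM2-2.6B (q4f16)", approxSizeBytes: 1550 * MB };

console.log(aggregateLabel([compactChat, embedder])); // 810 MB + 320 MB = 1.13 GB
console.log(aggregateLabel([qualityChat, embedder])); // 1.55 GB + 320 MB = 1.87 GB
```

Under the same assumptions, the pre-fix figures reproduce the old labels: 1.2 GB + 309 MB rounds to 1.5 GB for Compact, and 1.5 GB + 309 MB rounds to 1.8 GB for Quality.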

cloudflare-workers-and-pages · 2026-05-17T12:44:22Z

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	cloakpdf	`cb7da66`	Commit Preview URL Branch Preview URL	May 17 2026, 12:44 PM

sumitsahoo merged commit db278fa into dev May 17, 2026
2 checks passed

sumitsahoo mentioned this pull request May 17, 2026

fix(ai): correct on-device model size labels #86

Merged

3 tasks

sumitsahoo merged 1 commit into dev from feature/ai-local-hybrid-rag
