fix(ai): correct on-device model size labels#86

Merged
sumitsahoo merged 2 commits into main from dev
May 17, 2026

@sumitsahoo
Collaborator

Summary

Promotes the size-label fix from dev to main. Single change in flight: registry approxSizeBytes corrected against the actual HuggingFace file listings so the consent gate's "X GB total" matches what users actually download.

  • Compact tier: 1.13 GB total → displays "≈ 1.1 GB" (was implicitly 1.5 GB)
  • Quality tier: 1.88 GB total → displays "≈ 1.9 GB" (was 1.8 GB)
  • LFM2.5-1.2B (q4): 1.2 GB → 810 MB
  • LFM2-2.6B (q4f16): 1.5 GB → 1.55 GB
  • EmbeddingGemma q8: 309 MB → 320 MB (the 26 MB Gemma SentencePiece tokenizer was previously missing from the estimate)
  • Embed peak RAM: 400 MB → 500 MB (int8 dequant overhead)
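
The tier totals above follow from summing the per-model registry sizes. A minimal sketch of that arithmetic, assuming each tier is the chat model plus the shared embedder (which is what the listed figures add up to) and that sizes are stored in binary units as in the registry diff. The `ModelEntry` shape and the `formatTierTotal` helper are hypothetical names for illustration, not the project's actual code:

```typescript
// Hypothetical registry entry; only `approxSizeBytes` is named in this PR.
interface ModelEntry {
  id: string;
  approxSizeBytes: number;
}

const MIB = 1024 * 1024;
const GIB = 1024 * MIB;

// Sizes taken from the summary above (assumed to be binary units).
const chatCompact: ModelEntry = { id: "LFM2.5-1.2B (q4)", approxSizeBytes: Math.round(810 * MIB) };
const chatQuality: ModelEntry = { id: "LFM2-2.6B (q4f16)", approxSizeBytes: Math.round(1.55 * GIB) };
const embedder: ModelEntry = { id: "EmbeddingGemma q8", approxSizeBytes: Math.round(320 * MIB) };

// Hypothetical helper: sum the bundle and render it with one decimal place.
function formatTierTotal(models: ModelEntry[]): string {
  const totalBytes = models.reduce((sum, m) => sum + m.approxSizeBytes, 0);
  return `≈ ${(totalBytes / GIB).toFixed(1)} GB total`;
}

const compactLabel = formatTierTotal([chatCompact, embedder]);
const qualityLabel = formatTierTotal([chatQuality, embedder]);

console.log(compactLabel);
console.log(qualityLabel);
```

Deriving the aggregate from the same entries the picker tiles read is one way to keep the gate label and the per-tile sum from drifting apart.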

Prose updated in README, docs/local-ai.md, tool-registry, ChatModelPicker, AiModelDetailsModal, ai-runtime, useRagModels, and the AskPdf timing-weight comment to match.

Came in via #85.

Test plan

  • vp check passes (format + lint + type-check)
  • vp test passes (unit suite)
  • Manual smoke on production preview: open Ask PDF on a fresh browser profile and confirm the gate's aggregate label matches the picker's per-tile sum on each tier

Registry `approxSizeBytes` values had drifted from the real on-disk
weights — the Quality tier was showing "≈ 1.8 GB total" in the gate
when the actual download is ~1.88 GB, and the Compact tier overstated
by ~390 MB on the chat model alone. Verified against HuggingFace file
listings for each pinned dtype:

  - LFM2.5-1.2B (q4):  1.2 GB → 810 MB (model_q4.onnx_data is 850 MB)
  - LFM2-2.6B (q4f16): 1.5 GB → 1.55 GB
  - EmbeddingGemma q8: 309 MB → 320 MB (was missing the 26 MB Gemma
                                       SentencePiece tokenizer)
  - Embed peak RAM:    400 MB → 500 MB (int8 dequant overhead)

Aggregate now reads "≈ 1.1 GB total" on Compact and "≈ 1.9 GB total"
on Quality. Prose updated in README, docs/local-ai.md, tool-registry,
ChatModelPicker, AiModelDetailsModal, ai-runtime, useRagModels, and
the AskPdf timing-weight comment.
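
The verification step described in this commit message could be approximated by a small drift check. This is only a sketch under stated assumptions: the file listing is passed in by hand rather than fetched, the `ListedFile` shape and `withinTolerance` helper are hypothetical, and the 850 MB figure is assumed to be in decimal units (850 MB decimal is about 810 MiB, which may be why the registry value reads 810):

```typescript
// Hypothetical shape for one entry of a repository file listing.
interface ListedFile {
  name: string;
  sizeBytes: number;
}

const MB = 1_000_000; // decimal megabyte
const MIB = 1024 * 1024; // binary mebibyte
const GIB = 1024 * MIB;

// Relative difference between the registry estimate and the summed listing.
function driftRatio(registryBytes: number, files: ListedFile[]): number {
  const actual = files.reduce((sum, f) => sum + f.sizeBytes, 0);
  return Math.abs(registryBytes - actual) / actual;
}

// Hypothetical helper: true when the estimate is within `tolerance` of the listing.
function withinTolerance(registryBytes: number, files: ListedFile[], tolerance = 0.05): boolean {
  return driftRatio(registryBytes, files) <= tolerance;
}

// Only the data file named in the commit message; the size is assumed decimal.
const chatListing: ListedFile[] = [{ name: "model_q4.onnx_data", sizeBytes: 850 * MB }];

const oldEstimate = Math.round(1.2 * GIB); // value before this change
const newEstimate = Math.round(810 * MIB); // value after this change

const oldOk = withinTolerance(oldEstimate, chatListing);
const newOk = withinTolerance(newEstimate, chatListing);
const dataFileMiB = (850 * MB) / MIB;

console.log({ oldOk, newOk, dataFileMiB: dataFileMiB.toFixed(1) });
```

The 5% tolerance is an arbitrary choice for illustration; a real check would also need to include every file that is actually downloaded, such as the tokenizer.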
feat(ai): on-device hybrid RAG for Ask your PDF
Copilot AI review requested due to automatic review settings May 17, 2026 12:46
@cloudflare-workers-and-pages

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! (View logs) | cloakpdf | db278fa | Commit Preview URL, Branch Preview URL | May 17 2026, 12:45 PM |


Copilot AI left a comment


Pull request overview

Updates the Ask PDF on-device AI model size metadata and related prose so consent/download labels better reflect the actual model bundle sizes.

Changes:

  • Corrects approxSizeBytes and embed peak RAM metadata in the AI model registry.
  • Updates user-facing README/docs and inline comments for Compact/Quality bundle sizes.
  • Refreshes picker/modal/tool copy to match the revised model footprints.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/utils/ai-models.ts | Updates AI registry model size/RAM metadata and explanatory comments. |
| src/utils/ai-runtime.ts | Updates cache eviction size documentation. |
| src/hooks/useRagModels.ts | Updates full-evict size documentation. |
| src/tools/AskPdf.tsx | Updates indexing progress comment for embedder size. |
| src/config/tool-registry.ts | Updates Ask PDF tool registry comments with revised tier/bundle sizes. |
| src/components/ChatModelPicker.tsx | Updates picker documentation for chat tier sizes. |
| src/components/AiModelDetailsModal.tsx | Updates modal deletion/storage documentation. |
| README.md | Updates public Ask PDF feature and Local AI size descriptions. |
| docs/local-ai.md | Updates implementation docs and cache diagram with revised model bundle sizes. |

Comment thread src/utils/ai-runtime.ts
   * Evict the Transformers.js model bytes from the browser's
-  * CacheStorage. Frees ~1.5 GB of disk for the current AI bundle
-  * (chat + embed + rerank) and forces a fresh download on next use.
+  * CacheStorage. Frees roughly 1.2 GB on the Compact tier / 1.9 GB on
Comment thread src/hooks/useRagModels.ts
   * `cloakpdf:ai-model-ready:*` flag so the consent dialog re-
-  * appears on next use. Frees ~1.5 GB of disk for the current AI
-  * bundle; the user pays a full re-download next time they touch
+  * appears on next use. Frees roughly 1.2 GB (Compact) / 1.9 GB

   * (release RAM, keep the downloaded weights cached on disk so the
   * next use warm-loads in seconds) and a destructive "Delete cached
-  * models" (also evict the CacheStorage bytes, ~1.5 GB).
+  * models" (also evict the CacheStorage bytes — roughly 1.2 GB on the
Comment thread src/utils/ai-models.ts
  // digits/emails instead of copying from the retrieved chunk.
  // Sticking to 1.2B keeps the discipline guarantee.
- approxSizeBytes: Math.round(1.2 * 1024 * 1024 * 1024),
+ approxSizeBytes: Math.round(810 * 1024 * 1024),
@sumitsahoo sumitsahoo merged commit 977a534 into main May 17, 2026
9 checks passed