OCR: auto-pick paddle/tesseract by device, with visible UI switch #23
Open
tautme wants to merge 1 commit into
Phones split sharply on OCR perf: iOS Safari before 18 has no WebGPU,
and paddle on single-threaded WASM there is painfully slow; tesseract
is the honest default. With WebGPU available (Android Chrome, recent
desktops, iOS 18+), paddle wins on accuracy. We can choose at boot
without asking the user.
src/ocr/index.js: refactored boot to a three-step cascade — URL ?ocr=
> localStorage > navigator.gpu auto-pick — and exposes getActiveOcr(),
setOcrProviderByName(name, {persist}), getOcrTimings(), and
resetOcrOverride() for the UI. extractTitles() now records measured
per-provider runtime to localStorage under obm.ocr.timings so the
picker can show "2.4s last run" without holding it in memory. Manual
switches persist AND rewrite any ?ocr= in the URL so a shared link
stays in sync with the user's choice. The "default" fallback chain
(paddle→tesseract on primary failure) is kept as the fallback of last
resort: the boot cascade never actually selects it, but it is preserved
as the escape hatch for setOcrProvider(null).
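A minimal sketch of the timing record described above, assuming a `Storage`-like object (`getItem`/`setItem`) is passed in so the logic also runs outside the browser. Only the `obm.ocr.timings` key comes from this PR; `readOcrTimings`, `recordOcrTiming`, `timeOcrRun` and `memoryStorage` are hypothetical names, not the module's actual exports.

```javascript
// Hypothetical sketch: only the storage key is taken from the PR text.
const TIMINGS_KEY = "obm.ocr.timings";

// Read the { paddle?: ms, tesseract?: ms } map, tolerating a missing
// or corrupted entry by falling back to an empty object.
function readOcrTimings(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(TIMINGS_KEY) ?? "{}");
    return parsed !== null && typeof parsed === "object" ? parsed : {};
  } catch {
    return {};
  }
}

// Overwrite one provider's entry with its latest runtime in whole ms.
function recordOcrTiming(storage, provider, ms) {
  const timings = readOcrTimings(storage);
  timings[provider] = Math.round(ms);
  storage.setItem(TIMINGS_KEY, JSON.stringify(timings));
  return timings;
}

// Time one async OCR run; if run() rejects, nothing is recorded.
async function timeOcrRun(storage, provider, run) {
  const start = performance.now();
  const result = await run();
  recordOcrTiming(storage, provider, performance.now() - start);
  return result;
}

// In-memory stand-in for window.localStorage, for Node and tests.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => void data.set(key, String(value)),
  };
}
```

In the browser, `localStorage` itself would be passed as `storage`. Keeping only the last run per provider is what a "2.4s last run" label needs; nothing has to stay in memory between runs.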
src/pages/contribute.js: compact picker at the top of the upload
card — `OCR: **Paddle** [auto] · 2.4s last run · [Switch to
Tesseract]`. Phone-friendly: 40px tap target on the switch button.
Re-renders the picker inline after each photo's OCR completes, so the
timing updates without a full re-render. Boot log: `OCR: auto-picked
tesseract (no WebGPU).`
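The picker line can be treated as a pure function of the active provider and the stored timings. This sketch assumes `getActiveOcr()` returns `{ name, source }` with `source === "auto"` for a boot-time auto-pick (the PR names the fields but not their values) and that timings are in milliseconds; `pickerText` and `formatSeconds` are hypothetical, not code from `contribute.js`.

```javascript
// Hypothetical sketch of the picker text; not the PR's contribute.js code.
const LABELS = { paddle: "Paddle", tesseract: "Tesseract" };

// Milliseconds to a one-decimal seconds label, e.g. 2400 -> "2.4s".
function formatSeconds(ms) {
  return `${(ms / 1000).toFixed(1)}s`;
}

// active: { name, source }; timings: { paddle?: ms, tesseract?: ms }.
function pickerText(active, timings) {
  const other = active.name === "paddle" ? "tesseract" : "paddle";
  // The [auto] badge shows only when the boot cascade made the choice.
  let head = `OCR: ${LABELS[active.name]}`;
  if (active.source === "auto") head += " [auto]";
  const parts = [head];
  // The timing shows only once this provider has completed a run.
  const ms = timings[active.name];
  if (typeof ms === "number") parts.push(`${formatSeconds(ms)} last run`);
  parts.push(`[Switch to ${LABELS[other]}]`);
  return parts.join(" · ");
}

console.log(pickerText({ name: "paddle", source: "auto" }, { paddle: 2400 }));
// prints "OCR: Paddle [auto] · 2.4s last run · [Switch to Tesseract]"
```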
src/styles/contribute.css + src/ocr/README.md: styling for the picker
and docs for the new cascade. The existing ?ocr= URL-flag behavior
is preserved.
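The manual-switch behavior described above (persist the choice, and rewrite `?ocr=` only when the URL already carries it) can be sketched with the standard `URL` and `history.replaceState` APIs. `rewriteOcrParam` and `switchOcr` are hypothetical names; the `obm.ocr` key is from the PR text, and the browser globals are injected so the logic runs outside a browser.

```javascript
// Hypothetical sketch, not the PR's code.
// Returns href with ?ocr= updated, or href unchanged if it has no ?ocr=.
function rewriteOcrParam(href, name) {
  const url = new URL(href);
  if (!url.searchParams.has("ocr")) return href;
  url.searchParams.set("ocr", name);
  return url.toString();
}

// Persist the manual choice and keep a shared link in sync without
// adding a history entry. Browser globals are the defaults.
function switchOcr(
  name,
  {
    storage = globalThis.localStorage,
    location = globalThis.location,
    history = globalThis.history,
  } = {},
) {
  storage.setItem("obm.ocr", name);
  const next = rewriteOcrParam(location.href, name);
  if (next !== location.href) history.replaceState(history.state, "", next);
}
```

`replaceState` rather than `pushState` keeps the Back button from stepping through provider switches.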
https://claude.ai/code/session_01AmQ9QPAF1yskmQs7wSnZYc
tautme commented Jun 11, 2026
tautme (Owner, Author) left a comment:
may need to try on branch first
Why
Phones split sharply on OCR perf: iOS Safari before 18 has no WebGPU and paddle on single-threaded WASM there is painfully slow; tesseract is the honest default. With WebGPU available (Android Chrome, recent desktops, iOS 18+), paddle wins on accuracy. We can choose at boot without asking the user — and let the user override when they disagree.
This is the "hybrid" we agreed on from the device-capability discussion: smart default + visible switch + honest labels.
What the user sees
A compact line at the top of the contribute upload card:
- `auto` badge: appears only when the boot cascade picked the provider for you, and disappears once you switch (you're "Using" it, not "Auto").
- `2.4s last run`: appears only after the first OCR completes; an honest measured time, per provider, persisted across sessions in `localStorage` under `obm.ocr.timings`.
- `Switch to X`: a ~40 px tap target. Clicking it persists the new choice (`localStorage.obm.ocr`) and rewrites any `?ocr=` already in the URL so a shared link stays in sync.
What the boot does
Cascade, highest priority first:
1. `?ocr=<name>` URL param: pinned for the session, persisted.
2. `localStorage` previous choice: from a prior manual switch.
3. `navigator.gpu` auto-pick: present → paddle, absent → tesseract.
Logged at load:
`OCR: auto-picked tesseract (no WebGPU).`
I considered timing a real benchmark inference at boot but skipped it: it costs ~500 ms–1 s for a signal that's only marginally better than the binary WebGPU check on the device classes we actually see. KISS.
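The cascade above can be sketched as one pure function. The inputs are injected (a `search` string, a `Storage`-like object, and a WebGPU flag) so it runs in Node. `resolveOcrChoice`, the `source` values, and clearing the stored choice on `?ocr=default` are assumptions based on the "reset to auto-pick" wording, not the PR's code.

```javascript
// Hypothetical sketch of the boot cascade, highest priority first:
// ?ocr= URL param > stored manual choice > navigator.gpu auto-pick.
const PROVIDERS = new Set(["paddle", "tesseract"]);
const STORAGE_KEY = "obm.ocr";

// search: e.g. location.search; storage: getItem/setItem/removeItem;
// hasWebGPU: e.g. "gpu" in navigator. Returns { name, source }.
function resolveOcrChoice({ search, storage, hasWebGPU }) {
  const fromUrl = new URLSearchParams(search).get("ocr");
  if (fromUrl === "default") {
    // Assumed reading of "reset to auto-pick": forget the stored choice.
    storage.removeItem(STORAGE_KEY);
  } else if (PROVIDERS.has(fromUrl)) {
    // URL wins and is persisted, so later loads without ?ocr= keep it.
    storage.setItem(STORAGE_KEY, fromUrl);
    return { name: fromUrl, source: "url" };
  }
  // Unknown ?ocr= values fall through to the stored choice.
  const stored = storage.getItem(STORAGE_KEY);
  if (PROVIDERS.has(stored)) return { name: stored, source: "localStorage" };
  // Binary capability check: WebGPU present -> paddle, absent -> tesseract.
  return { name: hasWebGPU ? "paddle" : "tesseract", source: "auto" };
}
```

Note that `navigator.gpu` being defined does not guarantee `navigator.gpu.requestAdapter()` resolves to an adapter; the presence check is the cheap signal chosen here over a boot-time benchmark.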
New
`src/ocr/index.js` exports:
- `getActiveOcr()` → `{ name, source }` for the UI indicator
- `getOcrTimings()` → `{ paddle?: ms, tesseract?: ms }`: last measured run per provider
- `setOcrProviderByName(name, { persist, source })`
- `resetOcrOverride()`
Pre-existing
`setOcrProvider(fn)` and `splitOcrIntoTitles()` unchanged.
Backward compatibility
- `?ocr=paddle` / `?ocr=tesseract` / `?ocr=default` URL flags from PR #17 ("Clarify OCR flag scope in README") still work, with `?ocr=default` now meaning "reset to auto-pick."
- The `paddle → tesseract` fallback chain stays in the module as the last resort for `setOcrProvider(null)`; the boot cascade never actually selects it, but the escape hatch is intact.
- `splitOcrIntoTitles` behavior is identical; the 29 tests still pass.
Smartphone considerations baked in
`flex-wrap`); the label and the switch button each get their own row when width is tight.
Verification
- `npm run lint`: clean
- `npm test`: 29 / 29 pass
- `npm run format:check`: clean
- `npm run build`: clean
- `OCR: auto-picked tesseract (no WebGPU).`).
- `localStorage.obm.ocr` reflects it.
- `auto` tag), respecting the stored choice.
Files
- `src/ocr/index.js`: refactor + new exports (~280 lines, ~80 changed)
- `src/pages/contribute.js`: picker render + wiring + post-OCR refresh (~70 lines added)
- `src/styles/contribute.css`: picker styling (~54 lines added)
- `src/ocr/README.md`: documents the cascade
Generated by Claude Code