OCR: auto-pick paddle/tesseract by device, with visible UI switch #23

Open
tautme wants to merge 1 commit into main from claude/ocr-auto-pick

@tautme tautme commented May 31, 2026

Owner

Why

Phones split sharply on OCR perf: iOS Safari before 18 has no WebGPU and paddle on single-threaded WASM there is painfully slow; tesseract is the honest default. With WebGPU available (Android Chrome, recent desktops, iOS 18+), paddle wins on accuracy. We can choose at boot without asking the user — and let the user override when they disagree.

This is the "hybrid" we agreed on from the device-capability discussion: smart default + visible switch + honest labels.

What the user sees

A compact line at the top of the contribute upload card:

OCR: Paddle [auto] · 2.4s last run · [Switch to Tesseract]
  • auto badge appears only when the boot cascade picked it for you. Disappears once you switch (you're "Using" it, not "Auto").
  • 2.4s last run appears only after the first OCR completes — honest measured time, per-provider, persisted across sessions in localStorage under obm.ocr.timings.
  • Switch to X is a ~40 px tap target. Click → persists the new choice (localStorage.obm.ocr) and rewrites any ?ocr= already in the URL so a shared link stays in sync.
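
As a rough illustration of how that line could be assembled from the two getters, here is a minimal sketch. `formatPickerLabel`, the `LABELS` map, and the `"auto"` source tag are hypothetical names chosen for the example; only the `{ name, source }` and per-provider millisecond shapes come from this PR. It also keeps the `OCR:` prefix in every state and does not model the "Using" wording.

```javascript
// Hypothetical helper, not the PR's code: builds the picker text from
// an active-provider record and the per-provider timings map.
const LABELS = { paddle: "Paddle", tesseract: "Tesseract" };

function formatPickerLabel(active, timings = {}) {
  const other = active.name === "paddle" ? "tesseract" : "paddle";

  // "[auto]" is shown only while the boot cascade's pick is in effect
  // (assumed here to be signalled by source === "auto").
  const badge = active.source === "auto" ? " [auto]" : "";
  const parts = [`OCR: ${LABELS[active.name]}${badge}`];

  // The timing segment appears only once a run has been measured.
  const ms = timings[active.name];
  if (typeof ms === "number") {
    parts.push(`${(ms / 1000).toFixed(1)}s last run`);
  }

  parts.push(`[Switch to ${LABELS[other]}]`);
  return parts.join(" · ");
}

console.log(formatPickerLabel({ name: "paddle", source: "auto" }, { paddle: 2400 }));
// prints "OCR: Paddle [auto] · 2.4s last run · [Switch to Tesseract]"
```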

What the boot does

Cascade, highest priority first:

  1. ?ocr=<name> URL param — pinned for the session, persisted.
  2. localStorage previous choice — from a prior manual switch.
  3. navigator.gpu auto-pick — present → paddle, absent → tesseract.

Logged at load: OCR: auto-picked tesseract (no WebGPU).
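
The cascade can be sketched roughly as below. This is an illustration under assumptions, not the code in src/ocr/index.js: `resolveOcrChoice`, the injected `search` / `storage` / `nav` arguments, and the `"url"` / `"storage"` / `"auto"` source tags are made up so the example runs outside a browser. The `?ocr=default` reset is deliberately not modeled.

```javascript
// Sketch only: names and source tags are assumptions for illustration.
const KNOWN_PROVIDERS = new Set(["paddle", "tesseract"]);

function resolveOcrChoice({ search, storage, nav }) {
  // 1. ?ocr=<name> in the URL has the highest priority.
  const fromUrl = new URLSearchParams(search).get("ocr");
  if (KNOWN_PROVIDERS.has(fromUrl)) return { name: fromUrl, source: "url" };

  // 2. A previous manual switch, stored under the key this PR names.
  const stored = storage.getItem("obm.ocr");
  if (KNOWN_PROVIDERS.has(stored)) return { name: stored, source: "storage" };

  // 3. Capability check: WebGPU present -> paddle, absent -> tesseract.
  const hasWebGpu = Boolean(nav && nav.gpu);
  return { name: hasWebGpu ? "paddle" : "tesseract", source: "auto" };
}

// Tiny in-memory stand-in for localStorage so the sketch runs in Node.
function memoryStorage(initial = {}) {
  const data = new Map(Object.entries(initial));
  return { getItem: (key) => (data.has(key) ? data.get(key) : null) };
}

const choice = resolveOcrChoice({ search: "", storage: memoryStorage(), nav: {} });
console.log(`picked ${choice.name} (${choice.source})`);
```

In a browser the call would presumably pass `location.search`, `localStorage`, and `navigator`.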

I considered timing a real benchmark inference at boot but skipped it: ~500 ms–1 s cost for a signal that's only marginally better than the binary WebGPU check at the device classes we actually see. KISS.

New src/ocr/index.js exports

| Export | Purpose |
| --- | --- |
| getActiveOcr() | { name, source } for the UI indicator |
| getOcrTimings() | { paddle?: ms, tesseract?: ms }, the last measured run per provider |
| setOcrProviderByName(name, { persist, source }) | Wire to the UI's switch button |
| resetOcrOverride() | Clear URL + storage + re-pick |
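
For the switch button, the persistence side (store the choice, and rewrite `?ocr=` only if the URL already carries one) might look something like this. `persistOcrChoice` and `makeStore` are hypothetical names; the real `setOcrProviderByName` may be structured quite differently.

```javascript
// Hypothetical stand-in for the persistence part of a manual switch.
// Returns the href that should replace the current address.
function persistOcrChoice(name, storage, href) {
  storage.setItem("obm.ocr", name);

  const url = new URL(href);
  // Rewrite an existing ?ocr= so shared links stay in sync;
  // a URL without the param is left untouched.
  if (url.searchParams.has("ocr")) {
    url.searchParams.set("ocr", name);
  }
  return url.toString();
}

// In-memory stand-in for localStorage so the sketch runs in Node.
function makeStore() {
  const data = new Map();
  return {
    setItem: (key, value) => data.set(key, String(value)),
    getItem: (key) => (data.has(key) ? data.get(key) : null),
  };
}

const store = makeStore();
const nextHref = persistOcrChoice(
  "tesseract",
  store,
  "https://example.test/contribute?ocr=paddle&x=1"
);
console.log(nextHref, store.getItem("obm.ocr"));
```

In the page itself the returned href could be applied with `history.replaceState(null, "", nextHref)` so the switch does not trigger a navigation.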

Pre-existing setOcrProvider(fn) and splitOcrIntoTitles() unchanged.

Backward compatibility

  • ?ocr=paddle / ?ocr=tesseract / ?ocr=default URL flags from PR #17 (Clarify OCR flag scope in README) still work, with ?ocr=default now meaning "reset to auto-pick."
  • The paddle → tesseract fallback chain stays in the module as the last resort for setOcrProvider(null); the boot cascade never actually selects it, but the escape hatch is intact.
  • Existing tests untouched — splitOcrIntoTitles behavior identical; the 29 tests still pass.
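
For readers unfamiliar with the fallback chain mentioned above, the general pattern is "try the primary provider, and on failure run the secondary." A minimal sketch follows, with placeholder provider functions (`flakyPaddle` and `steadyTesseract` are invented for the example and do not call either library):

```javascript
// Sketch of a primary -> fallback provider chain; not the module's code.
function withFallback(primary, fallback) {
  return async (image) => {
    try {
      return await primary(image);
    } catch (err) {
      // Primary failed: hand the same input to the fallback provider.
      return fallback(image);
    }
  };
}

// Placeholder providers standing in for paddle and tesseract.
const flakyPaddle = async () => {
  throw new Error("paddle failed to load");
};
const steadyTesseract = async (image) => `tesseract read ${image}`;

const defaultChain = withFallback(flakyPaddle, steadyTesseract);
defaultChain("photo.jpg").then((text) => console.log(text));
```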

Smartphone considerations baked in

  • Picker is a single horizontal line that wraps on narrow screens (flex-wrap); the label and the switch button each have their own row when width is tight.
  • 40 px min-height on the switch button (Apple HIG floor for secondary actions is 32; we're generous).
  • No hover affordances — the picker reads the same on touch and mouse.
  • No tooltips or popovers — the labels are honest enough not to need them.

Verification

  • npm run lint — clean
  • npm test — 29 / 29 pass
  • npm run format:check — clean
  • npm run build — clean
  • Manually verified in this env that the boot log fires (test output shows OCR: auto-picked tesseract (no WebGPU).).
  • Not verified on a real phone — needs your testing on a smartphone after merge. The behaviors to spot-check on-device:
    • iOS Safari without WebGPU (before 18, per the note above): boot log says "no WebGPU", picker shows "Tesseract auto".
    • Android Chrome: boot log says "WebGPU available", picker shows "Paddle auto".
    • Tap switch → picker updates inline, the next photo uses the new provider, localStorage.obm.ocr reflects it.
    • Reload: picker shows "Using X" (no more auto tag), respecting the stored choice.

Files

  • src/ocr/index.js — refactor + new exports (~280 lines, ~80 changed)
  • src/pages/contribute.js — picker render + wire + post-OCR refresh (~70 lines added)
  • src/styles/contribute.css — picker styling (~54 lines added)
  • src/ocr/README.md — documents the cascade

https://claude.ai/code/session_01AmQ9QPAF1yskmQs7wSnZYc


Generated by Claude Code

Phones split sharply on OCR perf: iOS Safari before 18 has no WebGPU,
and paddle on single-threaded WASM there is painfully slow; tesseract
is the honest default. With WebGPU available (Android Chrome, recent
desktops, iOS 18+), paddle wins on accuracy. We can choose at boot
without asking the user.

src/ocr/index.js: refactored boot to a three-step cascade — URL ?ocr=
> localStorage > navigator.gpu auto-pick — and exposes getActiveOcr(),
setOcrProviderByName(name, {persist}), getOcrTimings(), and
resetOcrOverride() for the UI. extractTitles() now records measured
per-provider runtime to localStorage under obm.ocr.timings so the
picker can show "2.4s last run" without holding it in memory. Manual
switches persist AND rewrite any ?ocr= in the URL so a shared link
stays in sync with the user's choice. The "default" fallback chain
(paddle→tesseract on primary failure) is kept as the last resort: the
boot cascade never actually selects it, but it is preserved as the
escape hatch for setOcrProvider(null).

src/pages/contribute.js: compact picker at the top of the upload
card — `OCR: **Paddle** [auto] · 2.4s last run · [Switch to
Tesseract]`. Phone-friendly: 40px tap target on the switch button.
The picker re-renders in place after each photo's OCR completes, so
the timing updates without a full page re-render. Boot log:
`OCR: auto-picked tesseract (no WebGPU).`

src/styles/contribute.css + src/ocr/README.md: styling for the picker
and docs for the new cascade. The existing ?ocr= URL-flag behavior
is preserved.

https://claude.ai/code/session_01AmQ9QPAF1yskmQs7wSnZYc

@tautme tautme left a comment

Owner Author

may need to try on branch first
