Feature: Browser-Native Screen Rewind
Summary
A privacy-first, local-first screen capture + OCR timeline that runs entirely in the browser — no cloud, no uploads, no server dependency for capture.
Problem
Users want a 'rewind' capability to recall what was on their screen — similar to Windows Recall or Rewind.ai — but without trusting a third party with their screen data.
Proposed Solution
Architecture:
- Browser getDisplayMedia() API captures the screen at a configurable interval (e.g. every 2s)
- A 32×32 grayscale pixel diff heuristic skips unchanged frames (roughly 90% of captured frames are expected to be discarded)
- Changed frames are saved as WebP blobs into IndexedDB (omi_timeline DB) — fully local, never uploaded
- A background Web Worker runs Tesseract.js OCR on each frame
- Extracted text is passed through a local PII redaction layer (strips credit cards, emails, phone numbers, API keys via regex) before storing
- A HeuristicSyncService detects action items (TODO/need to/must) and focus sessions (30s+ on same domain) and can optionally POST them to the backend
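As a rough illustration of the diff heuristic in the architecture above, the comparison could look like the sketch below. It assumes each frame has already been downscaled to a 32×32 thumbnail (for example by drawing it to a small canvas) and handed over as RGBA bytes. The function names and both thresholds are hypothetical and would need tuning against real captures.

```typescript
// Hypothetical sketch: names and thresholds are assumptions, not part of the proposal.
const THUMB_SIZE = 32; // thumbnail edge length taken from the proposal

/** Convert RGBA bytes of a THUMB_SIZE x THUMB_SIZE thumbnail to grayscale luminance. */
function toGrayscale(rgba: Uint8ClampedArray): Uint8Array {
  const gray = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Standard luma weights; the alpha channel is ignored.
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

/**
 * Returns true when enough pixels differ between two grayscale thumbnails.
 * pixelThreshold: minimum per-pixel brightness change that counts as "different".
 * changedFraction: share of differing pixels above which the frame is kept.
 */
function hasFrameChanged(
  prev: Uint8Array | null,
  next: Uint8Array,
  pixelThreshold = 16,
  changedFraction = 0.02,
): boolean {
  // The first frame, or a size mismatch, always counts as a change.
  if (prev === null || prev.length !== next.length) return true;
  let changed = 0;
  for (let i = 0; i < next.length; i++) {
    if (Math.abs(next[i] - prev[i]) > pixelThreshold) changed++;
  }
  return changed / next.length > changedFraction;
}
```

The capture loop would keep the last stored thumbnail and only encode a WebP blob when hasFrameChanged returns true.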
Key Files to Create:
- web/app/src/lib/ScreenCaptureService.ts — capture loop + diff heuristic
- web/app/src/lib/IndexedDbClient.ts — local storage with FIFO eviction at 15,000 frames
- web/app/src/lib/HeuristicSyncService.ts — PII redaction + action item / focus session triggers
- web/app/src/lib/ocr.worker.ts — Tesseract.js web worker
- web/app/src/components/recording/ScreenCaptureControls.tsx — UI toggle on /record page
- web/app/public/tesseract/ — local Tesseract WASM assets
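The FIFO eviction planned for IndexedDbClient.ts could be driven by a small pure helper like the one below. This is only a sketch: the FrameMeta shape and the function name are hypothetical, and a real implementation would more likely walk an IndexedDB cursor over a timestamp index than sort an in-memory array.

```typescript
// Hypothetical sketch: FrameMeta and selectFramesToEvict are illustrative names.
const MAX_FRAMES = 15000; // cap taken from the proposal

interface FrameMeta {
  id: number;         // primary key of the stored frame
  capturedAt: number; // capture time in milliseconds since the epoch
}

/** Returns the ids of the oldest frames that exceed the cap, oldest first. */
function selectFramesToEvict(frames: FrameMeta[], cap: number = MAX_FRAMES): number[] {
  const excess = frames.length - cap;
  if (excess <= 0) return [];
  return [...frames]                               // copy so the caller's array is not reordered
    .sort((a, b) => a.capturedAt - b.capturedAt)   // oldest first
    .slice(0, excess)
    .map((frame) => frame.id);
}
```

Calling this after each write and deleting the returned ids in the same transaction would keep the store at or below the cap.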
Backend endpoints needed:
- POST /v1/focus-sessions — receives focus session events
- POST /v1/action-items — receives detected action items
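The events these endpoints would receive could be produced by detectors along the following lines. This is a sketch under assumptions: the keyword list and the 30-second threshold come from the proposal, while the type names, the function names, and the idea that a domain label is available for each sample (for instance parsed from OCR text of the address bar) are hypothetical.

```typescript
// Hypothetical sketch: types and function names are assumptions, not the proposed API.
const ACTION_ITEM_PATTERN = /\b(?:todo|need to|must)\b/i;

/** Returns the non-empty OCR lines that contain an action-item keyword. */
function detectActionItems(ocrText: string): string[] {
  return ocrText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && ACTION_ITEM_PATTERN.test(line));
}

interface DomainSample {
  domain: string;      // assumed to be derived elsewhere, e.g. from OCR text
  timestampMs: number; // capture time of the frame
}

interface FocusSession {
  domain: string;
  startMs: number;
  endMs: number;
}

/** Finds consecutive runs on the same domain that last at least minDurationMs. */
function detectFocusSessions(samples: DomainSample[], minDurationMs = 30000): FocusSession[] {
  const sessions: FocusSession[] = [];
  let runStart = 0;
  for (let i = 1; i <= samples.length; i++) {
    // A run ends at the end of the input or when the domain changes.
    const runEnded = i === samples.length || samples[i].domain !== samples[runStart].domain;
    if (!runEnded) continue;
    const first = samples[runStart];
    const last = samples[i - 1];
    if (last.timestampMs - first.timestampMs >= minDurationMs) {
      sessions.push({ domain: first.domain, startMs: first.timestampMs, endMs: last.timestampMs });
    }
    runStart = i;
  }
  return sessions;
}
```

Samples are expected in chronological order. The request body for each POST is not specified in this proposal, so the FocusSession shape here should be treated as a placeholder.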
Privacy Properties
- Screen frames never leave the device
- PII is redacted before OCR text is stored
- User must explicitly grant getDisplayMedia permission every session
- IndexedDB auto-evicts oldest frames beyond 15,000 cap (~72h of activity)
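A regex-based redaction pass of the kind described above might look like the following sketch. The patterns are illustrative assumptions only: they cover common formats for emails, 13 to 16 digit card numbers, North American style phone numbers, and keys with a few well-known prefixes, and they will both miss real PII and occasionally redact harmless text.

```typescript
// Hypothetical sketch: rule names and patterns are assumptions and far from exhaustive.
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  // 13 to 16 digits, optionally separated by single spaces or hyphens.
  { label: "CARD", pattern: /\b\d(?:[ -]?\d){12,15}\b/g },
  // Optional country code, then a 3-3-4 digit grouping.
  { label: "PHONE", pattern: /(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g },
  // Tokens such as sk_..., pk-..., api_..., key-... with at least 16 trailing characters.
  { label: "API_KEY", pattern: /\b(?:sk|pk|api|key)[-_][A-Za-z0-9_-]{16,}\b/gi },
];

/** Replaces every match of every rule with a [REDACTED_<LABEL>] placeholder. */
function redactPii(text: string): string {
  // Rules run in order, so card numbers are removed before the phone rule sees them.
  return REDACTION_RULES.reduce(
    (acc, rule) => acc.replace(rule.pattern, `[REDACTED_${rule.label}]`),
    text,
  );
}
```

Running redactPii inside the OCR worker, before the text is written to IndexedDB, would match the ordering stated in the privacy properties.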
Acceptance Criteria
Dependencies
- tesseract.js npm package
- idb npm package
- Backend: two new POST endpoints (/v1/focus-sessions, /v1/action-items)
Notes
- lc3py (LC3 audio codec) is not on PyPI — backend startup on Windows requires guarding this import with try/except
- Deepgram SDK must be pinned to 4.8.1 (v7.x renamed DeepgramClientOptions → DeepgramClientEnvironment)
- Typesense requires TYPESENSE_API_KEY env var even at module import time — needs a value for local dev startup