feat: Browser-Native Screen Rewind with Local OCR and IndexedDB Timeline #7628

@thesohamdatta

Feature: Browser-Native Screen Rewind

Summary

A privacy-first, local-first screen capture + OCR timeline that runs entirely in the browser — no cloud, no uploads, no server dependency for capture.

Problem

Users want a 'rewind' capability to recall what was on their screen — similar to Windows Recall or Rewind.ai — but without trusting a third party with their screen data.

Proposed Solution

Architecture:

  • Browser getDisplayMedia() API captures the screen at a configurable interval (e.g. every 2s)
  • A 32×32 grayscale pixel diff heuristic skips unchanged frames (expected to discard ~90% of captured frames)
  • Changed frames are saved as WebP blobs into IndexedDB (omi_timeline DB) — fully local, never uploaded
  • A background Web Worker runs Tesseract.js OCR on each frame
  • Extracted text is passed through a local PII redaction layer (strips credit cards, emails, phone numbers, API keys via regex) before storing
  • A HeuristicSyncService detects action items (TODO/need to/must) and focus sessions (30s+ on same domain) and can optionally POST them to the backend
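The frame-skipping step in the list above could look roughly like the sketch below. The issue only specifies a 32×32 grayscale diff; the function names, the nearest-neighbour downsampling, and both thresholds (`pixelDelta`, `minChanged`) are assumptions chosen for illustration and would need tuning.

```typescript
// Hypothetical helpers: names and thresholds are not from the issue.
const THUMB_SIZE = 32;

/** Downsample an RGBA frame to a 32×32 grayscale thumbnail (nearest-neighbour sampling). */
function toGrayThumb(rgba: Uint8ClampedArray, width: number, height: number): Uint8Array {
  const thumb = new Uint8Array(THUMB_SIZE * THUMB_SIZE);
  for (let ty = 0; ty < THUMB_SIZE; ty++) {
    const sy = Math.floor((ty * height) / THUMB_SIZE);
    for (let tx = 0; tx < THUMB_SIZE; tx++) {
      const sx = Math.floor((tx * width) / THUMB_SIZE);
      const i = (sy * width + sx) * 4; // 4 bytes per pixel: R, G, B, A
      // Rec. 601 luma weights convert RGB to a single brightness value.
      thumb[ty * THUMB_SIZE + tx] = Math.round(
        0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2],
      );
    }
  }
  return thumb;
}

/** Fraction of thumbnail pixels whose brightness differs by more than pixelDelta. */
function changedFraction(a: Uint8Array, b: Uint8Array, pixelDelta = 16): number {
  let changed = 0;
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > pixelDelta) changed++;
  }
  return changed / a.length;
}

/** Store the first frame always; afterwards only when enough pixels changed. */
function shouldStoreFrame(prev: Uint8Array | null, next: Uint8Array, minChanged = 0.02): boolean {
  return prev === null || changedFraction(prev, next) >= minChanged;
}
```

In the browser, the RGBA input would typically come from drawing the captured video frame onto a canvas and reading `getImageData(...).data`. Comparing thumbnails instead of full frames keeps the per-tick cost at 1,024 pixel comparisons regardless of screen resolution.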

Key Files to Create:

  • web/app/src/lib/ScreenCaptureService.ts — capture loop + diff heuristic
  • web/app/src/lib/IndexedDbClient.ts — local storage with FIFO eviction at 15,000 frames
  • web/app/src/lib/HeuristicSyncService.ts — PII redaction + action item / focus session triggers
  • web/app/src/lib/ocr.worker.ts — Tesseract.js web worker
  • web/app/src/components/recording/ScreenCaptureControls.tsx — UI toggle on /record page
  • web/app/public/tesseract/ — local Tesseract WASM assets
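As a rough illustration of the FIFO eviction planned for IndexedDbClient.ts, the pure function below picks which frames to delete once the cap is exceeded. It assumes frames are keyed by capture timestamp; `selectEvictions` and `MAX_FRAMES` are hypothetical names, and the real client would apply the result inside an IndexedDB transaction.

```typescript
const MAX_FRAMES = 15_000; // cap stated in the issue

/**
 * Given the timestamps (keys) of all stored frames, return the keys to delete
 * so that at most `cap` of the newest frames remain.
 */
function selectEvictions(timestamps: readonly number[], cap: number = MAX_FRAMES): number[] {
  const overflow = timestamps.length - cap;
  if (overflow <= 0) return [];
  // Copy before sorting so the caller's array is not mutated; oldest keys come first.
  return [...timestamps].sort((x, y) => x - y).slice(0, overflow);
}
```

If the object store uses an auto-incrementing key or a timestamp index, a cursor already iterates oldest-first, so the store could delete the first `overflow` records directly without loading every key into memory.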

Backend endpoints needed:

  • POST /v1/focus-sessions — receives focus session events
  • POST /v1/action-items — receives detected action items
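The events sent to these endpoints would come from the two heuristics named in the architecture: action-item keywords (TODO/need to/must) and 30s+ on the same domain. The sketch below is one possible reading of those rules. The `OcrSample` and `FocusSession` shapes are assumptions, as is the idea that each sample already carries a `domain`; the issue does not define the payload schema or how the domain is obtained.

```typescript
// Hypothetical shapes: the issue does not specify the payload schema.
interface OcrSample { timestampMs: number; domain: string; text: string; }
interface FocusSession { domain: string; startMs: number; endMs: number; }

const ACTION_ITEM_PATTERN = /\b(?:TODO|need to|must)\b/i;
const MIN_FOCUS_MS = 30_000; // "30s+ on same domain"

/** Return the OCR lines that contain one of the action-item keywords. */
function findActionItems(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => ACTION_ITEM_PATTERN.test(line));
}

/** Group consecutive samples by domain and keep runs that span at least 30 seconds. */
function findFocusSessions(samples: readonly OcrSample[]): FocusSession[] {
  const sessions: FocusSession[] = [];
  let start = 0;
  for (let i = 1; i <= samples.length; i++) {
    // Close the current run when the domain changes or the input ends.
    if (i === samples.length || samples[i].domain !== samples[start].domain) {
      const first = samples[start];
      const last = samples[i - 1];
      if (last.timestampMs - first.timestampMs >= MIN_FOCUS_MS) {
        sessions.push({ domain: first.domain, startMs: first.timestampMs, endMs: last.timestampMs });
      }
      start = i;
    }
  }
  return sessions;
}
```

Each returned item could then be serialized with `JSON.stringify` and sent via `fetch` with `method: "POST"`. Since the sync is described as optional, that call would presumably sit behind a user setting.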

Privacy Properties

  • Screen frames never leave the device
  • PII is redacted before OCR text is stored
  • User must explicitly grant getDisplayMedia permission every session
  • IndexedDB auto-evicts oldest frames beyond 15,000 cap (~72h of activity)
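A minimal sketch of the redaction step behind the second property, assuming a simple ordered list of regex rules. The patterns, the placeholder labels, and the API-key prefixes are illustrative assumptions, not the proposed rule set; real-world coverage (international phone formats, other key formats) would need more work.

```typescript
// Illustrative rules only. Order matters: API keys run before the digit-based
// rules so that digits inside a key are not partially replaced first.
const REDACTION_RULES: ReadonlyArray<[RegExp, string]> = [
  [/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, "[EMAIL]"],
  // Assumed key prefixes; extend for the providers that matter.
  [/\b(?:sk|pk|ghp|AKIA)[A-Za-z0-9_-]{16,}\b/g, "[API_KEY]"],
  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  [/\b(?:\d[ -]?){12,18}\d\b/g, "[CARD]"],
  // 10-digit numbers with an optional country code (North American style).
  [/(?:\+?\d{1,3}[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b/g, "[PHONE]"],
];

/** Apply every rule in order and return the redacted text. */
function redactPii(text: string): string {
  return REDACTION_RULES.reduce((acc, [pattern, label]) => acc.replace(pattern, label), text);
}
```

Running `redactPii` on the OCR output before the IndexedDB write means unredacted text never reaches storage. Regex-based redaction is best-effort, so it reduces what is stored without guaranteeing that no PII remains.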

Acceptance Criteria

  • Screen capture starts/stops from /record page UI
  • Frames captured and stored in IndexedDB (verifiable in DevTools → Application → IndexedDB)
  • OCR text visible in the UI for the last captured frame
  • PII redaction strips test credit card / email inputs before storing
  • Focus sessions and action items POST to backend endpoints
  • FIFO eviction works when frame count exceeds 15,000

Dependencies

  • tesseract.js npm package
  • idb npm package
  • Backend: two new POST endpoints (/v1/focus-sessions, /v1/action-items)

Notes

  • lc3py (LC3 audio codec) is not on PyPI — backend startup on Windows requires guarding this import with try/except
  • Deepgram SDK must be pinned to 4.8.1 (v7.x renamed DeepgramClientOptions → DeepgramClientEnvironment)
  • Typesense requires TYPESENSE_API_KEY env var even at module import time — needs a value for local dev startup

Labels: p3 (Priority: Backlog, score <14)
