
feat: IDE context awareness for dictation prompts #326

Draft: gabrielste1n wants to merge 8 commits into main from ide-context-awareness


gabrielste1n (Collaborator) commented Feb 26, 2026

What this does

When you press the OpenWhispr hotkey, the app captures what you're working on — the foreground app, window title, current file, project name, open tabs, and a scan of project files — and includes that as context in the system prompt sent to the reasoning model.

Say "refactor this file to use async/await" in your editor and the model now actually knows which file and which project you mean. Say "where is the database config" and the model can reference real file paths from your project, formatted as @project/path/to/file.

Defaults to on, opt-out via Settings → General → Context Awareness.

How it works

Capture — on every hotkey press, the main process spawns a small native helper (spawnSync, 500ms timeout) that returns JSON and exits:

  • macOS: Swift binary (macos-context-capture) uses NSWorkspace + the Accessibility API. For known Electron editors (VS Code, Cursor, Windsurf) it walks the AX tree to extract open tab names (AXTabGroup → AXRadioButton) and sidebar file names (AXOutline → AXRow). 400ms internal deadline so it can't block.
  • Windows: 80-line C binary (windows-context-capture.exe) using GetForegroundWindow + QueryFullProcessImageNameA. Returns window title + process exe name.
  • Linux: no native binary needed. Pure JS auto-detects the compositor and shells out: hyprctl activewindow -j (Hyprland), swaymsg get_tree (Sway), xdotool + xprop (X11), or gdbus with org.gnome.Shell.Eval (GNOME Wayland).
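The capture step above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the helper name captureContext and its parameters are hypothetical, and the only details taken from the description are the use of spawnSync, the 500ms timeout, and the JSON output. Any failure (missing binary, timeout, non-zero exit, malformed output) yields null so dictation can continue without context.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a one-shot capture binary and parse its JSON output.
// Returns null on any failure so the caller can proceed without context.
function captureContext(binaryPath, args = [], timeoutMs = 500) {
  const result = spawnSync(binaryPath, args, {
    encoding: "utf8",
    timeout: timeoutMs, // the child is killed if it runs past this
    windowsHide: true,
  });

  // result.error is set when the spawn fails or the timeout fires;
  // a non-zero status covers cases such as a missing permission.
  if (result.error || result.status !== 0) return null;

  try {
    return JSON.parse(result.stdout);
  } catch {
    return null; // malformed output is treated as "no context"
  }
}
```

A caller would pass the path of the bundled helper, for example something under resources/bin/, and treat a null result as "no context available".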

Parse — window-title parsers per editor family (VS Code em-dash format, JetBrains en-dash format, Xcode reversed, Sublime hyphen format, Vim/Neovim). Extracts file name + project name from the title alone.
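As an illustration of the title-parsing idea, here is a small sketch for the VS Code family only. It assumes titles are segments joined by " \u2014 " (an em dash with spaces), optionally ending in the app name; the function name, the returned shape, and the list of app names are assumptions made for this example. When the last segment is not a known app name, it is taken to be the project, which also covers two-part titles from extension panels.

```javascript
// Assumed list of app-name suffixes; the real list in the PR is not shown.
const KNOWN_APP_NAMES = new Set(["Visual Studio Code", "Cursor", "Windsurf"]);

// Hypothetical parser for VS Code style window titles.
// For extension panels, `file` holds the panel title rather than a file name.
function parseVSCodeTitle(title) {
  const parts = title
    .split(" \u2014 ")
    .map((segment) => segment.trim())
    .filter(Boolean);

  // Drop a trailing app name so the last remaining segment is the project.
  if (parts.length > 0 && KNOWN_APP_NAMES.has(parts[parts.length - 1])) {
    parts.pop();
  }

  if (parts.length === 0) return { file: null, project: null };
  // A single segment is ambiguous; this sketch treats it as the file name.
  if (parts.length === 1) return { file: parts[0], project: null };
  return { file: parts[0], project: parts[parts.length - 1] };
}

// "index.ts \u2014 openwhispr \u2014 Visual Studio Code"
//   -> { file: "index.ts", project: "openwhispr" }
// "Claude Code \u2014 openwhispr"
//   -> { file: "Claude Code", project: "openwhispr" }
```

Other editor families (JetBrains, Xcode, Sublime, Vim/Neovim) would need their own separator and segment order.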

Fallback file list — for VS Code/Cursor/Windsurf, if AX tree returned no sidebar (e.g. accessibility permission not granted, or Windows/Linux with no equivalent API), the JS-side reads the editor's storage.json from the app-support directory, finds the project's absolute path, then does a depth-4 scan for code files (capped at 200 files). Results cached 1-2min.
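The bounded scan in the fallback could look something like the sketch below. Only the depth limit of 4 and the 200-file cap come from the description; the function name, the extension list, the skipped directories, and the choice to count the project root as depth 1 are all assumptions. Reading storage.json and the result cache are left out.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Both sets are assumptions for illustration; the PR's actual lists are not shown.
const CODE_EXTENSIONS = new Set([".js", ".ts", ".tsx", ".py", ".go", ".rs", ".swift", ".c"]);
const SKIP_DIRS = new Set(["node_modules", ".git", "dist", "build"]);

// Hypothetical scanner: collects code files under `root`, treating the root
// directory as depth 1 and stopping at `maxDepth` or `maxFiles`.
function scanProjectFiles(root, maxDepth = 4, maxFiles = 200) {
  const found = [];

  const walk = (dir, depth) => {
    if (depth > maxDepth || found.length >= maxFiles) return;

    let entries;
    try {
      entries = fs.readdirSync(dir, { withFileTypes: true });
    } catch {
      return; // unreadable directory: skip it
    }

    for (const entry of entries) {
      if (found.length >= maxFiles) return;
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!SKIP_DIRS.has(entry.name)) walk(fullPath, depth + 1);
      } else if (entry.isFile() && CODE_EXTENSIONS.has(path.extname(entry.name))) {
        found.push(path.relative(root, fullPath)); // paths relative to the project
      }
    }
  };

  walk(root, 1);
  return found;
}
```

Capping both depth and file count keeps the worst case bounded on large repositories, which matters because the scan runs on the hotkey path.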

Prompt injection: getSystemPrompt appends a "Context (the user is currently working in):" block with app / project / file / open tabs / project file list, plus an instruction to format file references as @project/filename. Threads through OpenAI, Anthropic (IPC), Gemini, local, and enterprise (Azure/Vertex/Bedrock) providers.
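To make the shape of the appended block concrete, here is a hedged sketch. The header line and the @project/filename instruction are quoted from the description; the function name buildContextBlock, the field names on the context object, and the exact line layout are assumptions, and the real getSystemPrompt may format things differently.

```javascript
// Hypothetical builder for the context block appended to the system prompt.
// `ctx` fields (app, project, file, openTabs, projectFiles) are assumed names.
function buildContextBlock(ctx) {
  if (!ctx) return "";

  const lines = ["Context (the user is currently working in):"];
  if (ctx.app) lines.push(`- App: ${ctx.app}`);
  if (ctx.project) lines.push(`- Project: ${ctx.project}`);
  if (ctx.file) lines.push(`- File: ${ctx.file}`);
  if (ctx.openTabs?.length) lines.push(`- Open tabs: ${ctx.openTabs.join(", ")}`);
  if (ctx.projectFiles?.length) {
    lines.push(`- Project files: ${ctx.projectFiles.join(", ")}`);
  }

  // Nothing useful was captured: leave the prompt untouched.
  if (lines.length === 1) return "";

  lines.push("Format file references as @project/filename.");
  return "\n\n" + lines.join("\n");
}

// Usage: const prompt = basePrompt + buildContextBlock(appContext);
```

Returning an empty string when no fields are present means providers that receive no context see exactly the prompt they saw before this change.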

Performance + privacy

  • Opt-out gated in the main process. When the setting is off, windowManager._contextAwarenessEnabled short-circuits and the native binary is never spawned — zero overhead.
  • Setting is persisted in .env as CONTEXT_AWARENESS_ENABLED so it survives restart and main knows before the renderer loads.
  • Context never leaves the user's machine except as part of the prompt sent to whichever reasoning provider they've already configured.
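The opt-out gate described above amounts to a cached boolean checked before any capture work. The sketch below is illustrative only: the class, its constructor arguments, and the assumption that the env value is the string "false" when disabled are hypothetical; the names _contextAwarenessEnabled, captureAppContext, and CONTEXT_AWARENESS_ENABLED come from the PR text.

```javascript
// Illustrative stand-in for the relevant part of windowManager.
class WindowManagerSketch {
  // `env` is a plain object of environment values; `capture` is the function
  // that would spawn the native helper. Both are injected for the example.
  constructor(env, capture) {
    // Default on: only an explicit "false" disables the feature (assumed encoding).
    this._contextAwarenessEnabled = env.CONTEXT_AWARENESS_ENABLED !== "false";
    this._capture = capture;
  }

  captureAppContext() {
    // Short-circuit before any process is spawned when the feature is off.
    if (!this._contextAwarenessEnabled) return null;
    return this._capture();
  }
}
```

Because the flag is read in the main process at construction time, the first hotkey press after launch already respects the setting, without waiting for the renderer.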

Packaging

  • extraResources in electron-builder.json: resources/bin/macos-context-capture (top-level) + windows-context-capture* (Windows filter). No ASAR unpack needed — binaries live alongside every other native listener in resources/bin/.
  • compile:context-capture + compile:wincontext wired into compile:native (runs on prestart, predev, every prebuild:*).
  • macOS script cross-compiles arm64 + x86_64 and verifies the Mach-O CPU type after build.
  • Windows script tries MSVC → MinGW → Clang, gracefully skips if none are available.
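The compiler fallback in the Windows build script can be pictured as a first-match search over an ordered list. This sketch is an assumption about how such a search might be written, not the contents of scripts/build-windows-context-capture.js: the command names (cl, gcc, clang) and the PATH probe are guesses, and MSVC in particular usually needs a developer environment set up before cl is found.

```javascript
const { spawnSync } = require("node:child_process");

// Order follows the PR description; the command names are assumed.
const CANDIDATES = [
  { name: "MSVC", command: "cl" },
  { name: "MinGW", command: "gcc" },
  { name: "Clang", command: "clang" },
];

// Default probe: ask the OS whether the command resolves on PATH.
function isOnPath(command) {
  const probe = process.platform === "win32" ? "where" : "which";
  return spawnSync(probe, [command], { stdio: "ignore" }).status === 0;
}

// Returns the first available candidate, or null so the build can skip gracefully.
function pickCompiler(isAvailable = isOnPath) {
  return CANDIDATES.find((candidate) => isAvailable(candidate.command)) ?? null;
}

// Usage sketch:
// const compiler = pickCompiler();
// if (!compiler) console.warn("No C compiler found, skipping context-capture build");
```

Taking the availability check as a parameter keeps the selection logic testable without any compiler installed.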

Why this matters

OpenWhispr sits on top of every reasoning model — Claude, GPT, Gemini, local — and the single biggest gap vs. dictating directly into Cursor/Claude Code is that the ambient model has no idea what the user is looking at. This closes that gap with ~1400 lines of code and no new dependencies.

Capture the active app name and window title when dictation starts,
then inject that context into the AI system prompt so responses are
more relevant to what the developer is working on.

- Swift one-shot binary (macos-context-capture) using AXUIElement API
- Parses file names from VS Code, Xcode, JetBrains, Sublime, Vim titles
- Graceful degradation: exit code 2 when Accessibility not granted
- Settings toggle (default on, macOS only) in General section
- Context flows: main → preload → useAudioRecording → audioManager → ReasoningService
- Hash-based build caching with cross-arch compilation support

- Add Windows context capture via C binary (Win32 GetForegroundWindow/
  GetWindowTextW/QueryFullProcessImageName)
- Add Linux support with auto-detected strategy: hyprctl (Hyprland),
  swaymsg (Sway), xdotool+xprop (X11/XWayland), gdbus (GNOME)
- Move filename parsing from Swift binary to shared JS (DRY across
  all 3 platforms) with IDE detection for VS Code, Xcode, JetBrains,
  Sublime Text, and Vim/Neovim
- Simplify macOS Swift binary (~80 lines removed)
- Remove macOS-only gate on Context Awareness settings toggle

Extend Swift binary to walk the Accessibility tree for VS Code family
editors (VS Code, Cursor, Windsurf) to extract open tab names and
sidebar file tree items. Add project name parsing from window titles.
Enrich system prompt with project context and @project/filename tagging.

Resolve project name to filesystem path via VS Code/Cursor/Windsurf
storage.json, then scan the directory for code files. Falls back from
AX tree walking when sidebar items are unavailable (VS Code on macOS,
all editors on Windows/Linux). Cached with TTL for zero-latency repeats.

VS Code extension panels (e.g., Claude Code) use a 2-part title format
without the app name suffix: "Panel — project". The parser now checks
if the last segment is a known app name to determine which part holds
the project name.

- Persist contextAwarenessEnabled to .env and cache in windowManager
  so captureContext() is skipped when disabled (avoids ~5-50ms of
  native binary spawn per hotkey press when feature is off).
- Extract sendToggleDictation() and captureAppContext() helpers in
  windowManager, replacing five duplicated call sites across main.js
  and windowManager.js.
- Rename _commandCache to _cache in ContextCaptureManager since it
  holds polymorphic value shapes keyed by prefix.

Integrate 769 commits from main while preserving IDE context-awareness
changes. Key integrations:

- Wire contextAwarenessEnabled into the expanded settingsStore / useSettings
  alongside new startMinimized, gcalAccounts, meetingDetection, and
  panel-position settings.
- Merge into windowManager's new sendToggleDictation/sendStartDictation
  helpers (which gained meetingDetectionEngine integration in main),
  preserving captureAppContext() gating.
- Pass config.appContext through the new processWithEnterprise reasoning
  path so Azure/Vertex/Bedrock providers also receive IDE context.
- Keep CONTEXT_AWARENESS_ENABLED in environment.js alongside the new
  START_MINIMIZED and PANEL_START_POSITION env keys.
- Update electron-builder.json to bundle windows-context-capture*
  alongside the new mic/text/AEC binaries.

Comment thread on scripts/build-windows-context-capture.js: Fixed

gabrielste1n changed the title from "Ide context awareness" to "feat: IDE context awareness for dictation prompts" on Apr 20, 2026

Flagged by CodeQL. Inputs are internal constants so not exploitable,
but the primitive is now correct for any future caller.