A floating, always-on AI desktop companion that talks, listens, sees your screen, drives your mouse, opens your apps, edits your files, and remembers every conversation. Bring whichever AI you already love — 12 providers, one interface — and give it a body.
Download · Privacy · Terms · Report an issue
    ◍ collapsed orb ──summon (Ctrl+Shift+Space)──► ▢ command panel
                                  │
      ┌───────────────────────────┼───────────────────────────┐
      ▼                           ▼                           ▼
    Nexus                       Chat                        Quick AI
    radial HUD                  threaded conversations      one-shot answer
    modes · projects · tools    agent · vision · voice      Ctrl+Shift+J
The biggest release since v1.0 — turns "I downloaded VoidSoul, now what?" into "VoidSoul found my stuff, I'm chatting in 30 seconds."
Magic-moment setup
- Auto-detection on first launch — scans for Claude Desktop, Cursor, ChatGPT Desktop, env-var API keys (Anthropic, OpenAI, Gemini, Groq, xAI, OpenRouter, DeepSeek, Mistral), and running local providers (Ollama, LM Studio). One-click "Import everything" pulls all your existing MCP servers + API keys into VoidSoul with zero typing.
- Settings → MCP also gains import shortcuts — "Import from Claude Desktop · 5" / "Import from Cursor · 3" buttons surface when those tools are detected. Idempotent — re-run after adding a new MCP server to Claude Desktop and only the new one gets imported.
- Re-run setup from Settings → About — manual re-trigger of the discovery panel.
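The idempotent re-import described above can be sketched as a merge keyed on each server's launch command. This is a minimal sketch under stated assumptions, not VoidSoul's actual code: the `McpServerEntry` shape and the `serverKey` / `importServers` helpers are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of one MCP server entry; the field names are assumptions.
interface McpServerEntry {
  name: string;
  command: string;
  args: string[];
}

// Build a stable identity from the launch command so repeated imports compare equal.
function serverKey(server: McpServerEntry): string {
  return JSON.stringify([server.command, ...server.args]);
}

// Merge detected servers into the existing list, skipping anything already present.
function importServers(
  existing: McpServerEntry[],
  detected: McpServerEntry[],
): { merged: McpServerEntry[]; added: McpServerEntry[] } {
  const seen = new Set(existing.map(serverKey));
  const added: McpServerEntry[] = [];
  for (const server of detected) {
    const key = serverKey(server);
    if (!seen.has(key)) {
      seen.add(key);
      added.push(server);
    }
  }
  return { merged: [...existing, ...added], added };
}
```

Running the import twice with the same detected list adds nothing the second time; only entries whose command and arguments are new end up in `added`.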
MCP server marketplace
- Settings → MCP → Browse marketplace — curated list of 10 popular MCP servers (Filesystem, GitHub, Memory, Brave Search, Puppeteer, Fetch, Postgres, SQLite, Slack, Sequential Thinking). One-click install with per-server config dialog (folder path, API key, etc.).
- Fetched live from the project repo + bundled fallback in the installer so it works offline.
- "Browse all community servers ↗" link to the wider awesome-mcp-servers catalog.
Voice — full Piper TTS swap
- Replaced the Web Speech / Microsoft SAPI voices with Piper neural TTS. Real character voices for Void (Arctic) and Soul (Amy), running locally, no API key, no cloud.
- Piper binary bundled in the installer (~22 MB Windows). Voice models live in `<userData>/voices/<persona>/` and the picker shows them with name, language, quality tier, and file size.
- "Powered by Piper TTS" credit card in Settings → Voice links out to Michael Hansen's repo for stars / sponsorship, supporting the open-source maintainer this is built on.
- Streaming TTS still chunks per sentence; one piper subprocess per sentence renders WAV → blob → `<audio>` queue. Real-time factor 0.07×, so sentences are ready inside 200 ms. `--length_scale` is mapped from the existing rate slider; volume is applied per-utterance.
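The per-sentence chunking and the rate-slider mapping above could look roughly like this. It is a sketch under assumptions: `splitSentences` and `rateToLengthScale` are hypothetical helpers, the slider range of 0.5 to 2 is assumed, and the inverse mapping relies on Piper's `--length_scale` making speech slower as the value grows.

```typescript
// Split a streaming text buffer into complete sentences plus an unfinished remainder.
// A sentence counts as complete once its end punctuation is followed by whitespace.
function splitSentences(buffer: string): { sentences: string[]; rest: string } {
  const sentences: string[] = [];
  const boundary = /[.!?]+\s+/g;
  let start = 0;
  while (boundary.exec(buffer) !== null) {
    const end = boundary.lastIndex; // index just past the matched boundary
    const sentence = buffer.slice(start, end).trim();
    if (sentence.length > 0) sentences.push(sentence);
    start = end;
  }
  return { sentences, rest: buffer.slice(start) };
}

// Map a playback-rate slider (assumed range 0.5 to 2, where 1 is normal speed)
// to a length scale: a faster rate means a shorter length scale.
function rateToLengthScale(rate: number): number {
  const clamped = Math.min(2, Math.max(0.5, rate));
  return 1 / clamped;
}
```

Each completed sentence can be handed to its own synthesis call while `rest` stays in the buffer until more tokens arrive.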
Collaborative decision cards
- The AI can now emit a ```askuser fenced JSON block when it faces a meaningful design fork (library / pattern / approach choice). The block renders as an interactive card in the chat with checkboxes, an "Other" free-text fallback, preview tooltips, and inline-editable labels so you can refine the AI's wording before submitting.
- System prompt teaches the AI when to use it (skipped for trivial yes/no, used for "which database?" style forks).
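To make the decision-card flow concrete, here is one way a chat renderer might pull an askuser block out of a reply. The JSON fields (`question`, `options`, `allowOther`) and the `extractAskUser` helper are assumptions for illustration; the README does not document the actual schema.

```typescript
// Hypothetical card schema; the real askuser payload may differ.
interface AskUserCard {
  question: string;
  options: { label: string; preview?: string }[];
  allowOther: boolean;
}

// Three backticks, built at runtime so this example can live inside a fenced block.
const FENCE = "`".repeat(3);

// Return the first complete askuser card in a message, or null if none is valid yet.
function extractAskUser(message: string): AskUserCard | null {
  const open = message.indexOf(FENCE + "askuser");
  if (open === -1) return null;
  const bodyStart = message.indexOf("\n", open);
  if (bodyStart === -1) return null;
  const close = message.indexOf(FENCE, bodyStart);
  if (close === -1) return null; // block is still streaming

  let parsed: unknown;
  try {
    parsed = JSON.parse(message.slice(bodyStart, close));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const raw = parsed as Record<string, unknown>;
  const question = raw["question"];
  const rawOptions = raw["options"];
  if (typeof question !== "string" || !Array.isArray(rawOptions)) return null;

  const options: AskUserCard["options"] = [];
  for (const item of rawOptions as unknown[]) {
    if (typeof item !== "object" || item === null) return null;
    const option = item as Record<string, unknown>;
    const label = option["label"];
    const preview = option["preview"];
    if (typeof label !== "string") return null;
    options.push(typeof preview === "string" ? { label, preview } : { label });
  }
  if (options.length < 2) return null; // a fork needs at least two choices
  return { question, options, allowOther: raw["allowOther"] !== false };
}
```

Returning `null` for an unclosed or malformed block lets the renderer fall back to showing the reply as plain text.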
Plugin marketplace (lightweight)
- Settings → Plugins now has Installed / Browse tabs. Browse fetches the curated plugin registry from the project repo, one-click install pulls the manifest into your plugins folder.
- Three starter packs ship: Dev Bookmarks, AI Research Shortcuts, Creator Toolkit.
Auto-routing expansion
- Router classifier picks up new signals: fenced code block in your prompt → coding-grade model; conversational opener ("hey", "thanks", short prompt) → fast cheap model; long prompt (>2000 chars) → reasoning model.
- Composer pill now shows "✨ Auto · sonnet-4-6" when no manual lock is set, with a tooltip explaining the per-prompt routing. Lock to a specific model with the existing dropdown.
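The routing signals listed above can be read as a small rule-based classifier. The sketch below is illustrative only: the `Route` labels, the precedence order, the opener list, and the 40-character "short prompt" threshold are all assumptions; only the 2000-character threshold comes from the release notes.

```typescript
// Hypothetical route labels; the real router maps these onto concrete models.
type Route = "coding" | "reasoning" | "fast" | "default";

// Three backticks, built at runtime so this example can live inside a fenced block.
const FENCE = "`".repeat(3);

// Conversational openers that suggest a cheap, fast model is enough (assumed list).
const OPENERS = /^(hey|hi|hello|thanks|thank you)\b/i;

function classifyPrompt(prompt: string): Route {
  const text = prompt.trim();
  if (text.includes(FENCE)) return "coding";       // fenced code block in the prompt
  if (text.length > 2000) return "reasoning";      // long prompt
  if (OPENERS.test(text) || text.length < 40) return "fast"; // opener or short prompt
  return "default";
}
```

A manual model lock from the dropdown would bypass this function entirely, so the classifier only runs when the composer pill shows Auto.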
Quality of life
- Per-message mute button on every assistant reply — same icon used as a smart toggle (play / stop while speaking).
- Speech volume slider in Settings → Voice (separate from OS volume).
- In-app reviews (Settings → About → Leave a review) — star rating + comment → emails the studio via Formspree. Curated reviews land on voidsoul.app.
- Ctrl+V image paste in the chat composer (and the Nexus inline composer).
- Inline composer on the Nexus panel — quick follow-ups without switching to the full chat tab.
- Rolling-line voice preview in Nexus — replaces the scrollable text box with a single sentence that ticks forward as TTS reads.
- Tap orb → start/stop voice input — both Nexus layouts. The orb now matches the "tap the orb to talk" hint that was already shown beside it.
- Frameless Settings window — drops the OS title bar / File-Edit-View menu, keeps native min/max/close as a slim overlay.
- Provider settings cleanup — NEW MODELS callout collapses behind a "Show all" toggle, Remove stored key shrunk to a trash icon next to the saved badge, Endpoint override collapsed under an Advanced disclosure for cloud providers.
- Agent step ceiling bumped 30 → 60 — multi-page audits and long automation chains finish in one pass without hitting the "type continue" pause.
Fixes
- Windows installer icon: multi-resolution `.ico` generated by `npm run icons` (BMP DIB for small sizes, PNG for 256). Start Menu / Desktop shortcuts and the EXE icon now show the orb instead of the generic Electron logo.
- Gemini schema fix: `$schema`, `additionalProperties`, and similar JSON Schema meta-fields are recursively stripped from MCP tool definitions before they hit Gemini's strict OpenAPI-subset validator. The 400 "Unknown name '$schema'" errors are gone.
- Marketplace install: already-installed servers now read as "Already installed" instead of the misleading "X connected · N tools" success toast.
- Race fixes — close-during-import races in the discovery panel and MCP import dialog no longer write state to unmounted components.
- Plugin / MCP registries — bundled fallback ships in the installer so both marketplaces work offline or before a registry update has propagated through the CDN.
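The Gemini schema fix in this list amounts to a recursive walk over each tool's JSON Schema. A minimal sketch follows; the `stripMetaFields` name and the exact set of stripped keys are assumptions, since the release note only names `$schema` and `additionalProperties` explicitly.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Meta-fields to drop; the real list may be longer.
const STRIPPED_KEYS = new Set(["$schema", "additionalProperties"]);

// Return a copy of the schema with meta-fields removed at every depth.
// Keys directly inside a "properties" map are user-defined property names,
// so they are kept even if they happen to match a stripped keyword.
function stripMetaFields(node: Json, isPropertyMap = false): Json {
  if (Array.isArray(node)) {
    return node.map((item) => stripMetaFields(item));
  }
  if (node !== null && typeof node === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      if (!isPropertyMap && STRIPPED_KEYS.has(key)) continue;
      out[key] = stripMetaFields(value, !isPropertyMap && key === "properties");
    }
    return out;
  }
  return node;
}
```

The function copies rather than mutates, so the original tool definition stays intact for providers that accept the full schema.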
Polish pass from beta-tester feedback — UX cleanup across the board, no breaking changes.
Voice + TTS
- Streaming TTS — the assistant starts talking on the first complete sentence instead of waiting for the whole reply to finish typing. Drops perceived latency from "wait for paragraph, then voice" to "voice within the first sentence."
- Smoother voice picks: `guessVoice()` now prefers neural OS voices (Windows Aria/Guy Online, macOS Premium/Enhanced) before falling back to the older SAPI Zira/David defaults beta testers complained about.
- Speech volume slider: independent of the OS / app volume mixer, so users can dial Void/Soul down without muting everything else from the app.
- Per-message mute toggle — the speaker icon on every assistant message doubles as a quick mute while TTS is active. One-click silence on long replies.
Nexus panel
- Rolling-line voice preview — the response area is now a clean single-line "teleprompter" that ticks to the next sentence as TTS reads. No more scrollable text-spill.
- Inline composer — compact send box right on the Nexus panel so you can ship a quick follow-up without switching to the full chat tab.
Chat composer
- Ctrl+V image paste — pasting a screenshot (Snipping Tool, browser "copy image", anything on the clipboard) attaches it as an image attachment, exactly like drag-and-drop.
Settings + onboarding
- Frameless Settings window — drops the OS title bar + File/Edit/View menu, keeps native min/max/close as a slim overlay. Cleaner, more app-like.
- Honest voice banner — replaced the misleading "install Aria/Guy from Narrator" copy (those voices are gated behind a private Microsoft API and unreachable to third-party apps) with a straight explanation + a deep link to Speech → Add voices for voices that actually work.
- In-app reviews — Settings → About → "Leave a review" opens a star-rating + comment dialog. Submissions email the studio; the best ones land on the marketing site's Reactions wall.
Installer
- Windows .ico icon — Start Menu shortcut, Desktop shortcut, and the EXE itself now show the VoidSoul orb instead of the generic Electron icon.
Two big systems landed in this release.
Auto-routing across providers. Configure Anthropic + OpenAI + Gemini + Ollama and the assistant picks the right one per task. Attach a screenshot → routed to a vision model. Agent loop with 20+ tool calls → routed to a fast tool-use model (Sonnet, not Opus). Near your monthly budget cap → routed to local. You never touch the picker; your per-thread sticky overrides still win when you want manual control.
Long-running task resilience. Agent runs now survive crash / restart / sleep / panel-close. Every step persists a checkpoint to SQLite; a recovery banner on the next launch offers to resume from where you stopped. The 30-step ceiling is up from 6. Rolling-summary compaction lets sustained runs fit in the model's context window. Close the panel and walk away — the tray shows live ⚙ step 23 progress, the loop keeps running, and when you come back the doc is in your thread.
Other improvements
- Step-cap pause UX — clear "type continue to resume" instead of silent half-done bubble
- Provider `stop_reason` / `finish_reason` surfaced as real errors (was silent truncation)
- 120s invoke timeout: hung providers no longer block the agent loop forever; a friendly error message is shown instead of `AbortError`
- Cost dashboard signal feeds into routing: near-cap users auto-fall-back to free/local
- Graceful pause-on-quit — closing the app while runs are live marks them as paused, not crashed
Today's AI apps split awkwardly:
- Cloud chatbots can't touch your machine.
- Editor copilots are trapped in one IDE.
- Closed ecosystems lock you to one vendor and one model.
VoidSoul is the third path — a single persistent assistant that's always there, can do real work on your computer, speaks every major model, never sends your data anywhere you didn't tell it to, and grows with you through plugins, MCP servers, and per-mode memory.
It's not a better chatbot. It's an operating layer that turns whatever AI you pay for into Jarvis.
12 providers, one interface: OpenAI · Anthropic · Gemini · Ollama · LM Studio · llama.cpp · Groq · xAI · OpenRouter · DeepSeek · Mistral · Custom OpenAI-compatible
Grab the latest installer from the Releases page:
- Windows: `VoidSoul AI Companion Setup *.exe` (NSIS installer)
- macOS: `VoidSoul AI Companion-*.dmg`
- Linux: `VoidSoul AI Companion-*.AppImage`
Updates arrive automatically — VoidSoul checks GitHub Releases on boot and notifies you when a new version is ready. Settings → About → Check for updates triggers a manual check.
First-launch heads-up: Windows SmartScreen may flag the installer because it's not yet code-signed. Click More info → Run anyway to install. macOS users: right-click the `.dmg` → Open the first time. Code-signed builds land once the initial-release budget covers the certificate.
- Pick a workflow mode (Indie Dev · Creator · Streamer · Researcher · Writer · Productivity) and a Nexus layout
- The interactive checklist walks you through: send a message → press Cmd+F → open mic → open Settings
- Either install Ollama (ollama.com/download) and VoidSoul auto-detects it for free local chat, or paste a provider key in Settings → AI Providers
That's it. The orb is alive.
| Layer | What it remembers | When it triggers |
|---|---|---|
| Threads | Every message, every conversation. Named, pinned, full-text searchable across the whole history. | Always on (unless Private mode is toggled per chat). |
| Projects | Pinned mode + instructions that auto-prepend to every thread in the project. | Per project; per-thread overrides win. |
| Story-so-far | Prose recap of older messages in a long conversation, injected into the system prompt. | When estimated tokens exceed ~10k and the chat has ≥8 messages. Cached, regenerated only when stale. |
| Facts | Short, durable bullets ("user is solo dev of Spiritless in UE5"). Mode-taggable, capped at 50. | Auto-extracted after every reply (toggleable). Manual entry from Settings → Memory. |
| RAG | Embeddings of every persisted message + any indexed files (PDFs, DOCX, code, text). | On every send: retrieves top-K relevant snippets, injects into the system prompt. |
The five layers compose without stepping on each other. Short chats use only Threads; medium ones add Story-so-far; long-history users get RAG; Facts thread through all of them; Projects override the baseline.
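Two of the triggers in the table are easy to express in code: the Story-so-far threshold and the RAG top-K lookup. The sketch below assumes a rough four-characters-per-token estimate and plain cosine similarity over in-memory vectors; the helper names and the `Snippet` shape are hypothetical, and the real embedding store is not described here.

```typescript
// Rough token estimate from text length (about 4 characters per token is an assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Story-so-far trigger from the table: more than ~10k estimated tokens and at least 8 messages.
function needsStorySoFar(messages: string[]): boolean {
  const tokens = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return messages.length >= 8 && tokens > 10_000;
}

interface Snippet {
  text: string;
  embedding: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// Return the k snippets most similar to the query embedding, best first.
function topK(query: number[], corpus: Snippet[], k: number): Snippet[] {
  return corpus
    .map((snippet) => ({ snippet, score: cosine(query, snippet.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.snippet);
}
```

On each send, the snippets returned by `topK` would be injected into the system prompt alongside any cached Story-so-far recap.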
Toggle Agent in the chat header. The model can call any of these, plus every MCP tool from every connected server:
| Tool | Permission | What it does |
|---|---|---|
| `open_app` · `open_url` · `open_folder` | `appControl` · `browser` · `filesystem` | Launch apps, open links and folders |
| `run_shell` | `terminal` | Run shell commands, capped output |
| `list_files` · `read_file` · `write_file` · `organize_folder` | `filesystem` | File I/O (writes are reversible via undo) |
| `type_text` · `send_hotkey` | `inputAccess` | Drive keyboard into any focused window |
| `move_mouse` · `click_mouse` | `inputAccess` | Precise cursor control |
| `read_screen` | `screenCapture` | OCR the screen to text via Tesseract |
| `see_screen` | `screenCapture` | Capture the screen and look at it as an image |
| `web_search` | none | DuckDuckGo (keyless) or Tavily (upgrade) |
| `web_fetch` | none | Pull a URL, extract the main content (readability + SSRF guard) |
| `run_python` | `terminal` | Run Python in a sandboxed tmp dir, capped output |
| `generate_image` | none | Pollinations (keyless), DALL·E 3, Stable Diffusion 3, Gemini Imagen 3 |
Plus your MCP servers, your plugin actions, and any tool you wire in. The agent loop is capped at 60 steps per run (raised from the original six-step limit in later releases), runs non-streaming for accuracy, surfaces live progress (Searching the web: "metal vs vulkan"…), preserves partial content on user-stop, and logs every step to the Logs tab, where entries are filterable by level and category and searchable.
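The SSRF guard mentioned for `web_fetch` is not specified further, but its core check can be sketched as a hostname filter that refuses loopback, link-local, and private IPv4 ranges. The helper names below are hypothetical. A production guard would also need to resolve DNS names and re-check the resolved address, and would need to handle IPv6 and alternate IP encodings, which this sketch does not do.

```typescript
// Parse a dotted-quad IPv4 literal into four octets, or return null if it is not one.
function parseIpv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// Decide whether a hostname points at the local machine or a private network.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host === "::1" || host === "[::1]") return true;
  const ip = parseIpv4(host);
  if (ip === null) return false; // not an IPv4 literal; DNS names need a resolve-then-check step
  const [a = 0, b = 0] = ip;
  return (
    a === 0 ||                          // "this network"
    a === 10 ||                         // 10.0.0.0/8
    a === 127 ||                        // loopback
    (a === 169 && b === 254) ||         // link-local, including cloud metadata endpoints
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)            // 192.168.0.0/16
  );
}
```

A fetch tool would call `isBlockedHost` on the parsed URL's hostname before opening any connection and again after each redirect.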
With a vision-capable model (Claude 3+ · GPT-4o · Gemini 1.5+ · LLaVA · MiniCPM-V · Grok-2 vision), see_screen captures your screen and attaches the image to the model's next reasoning step — it looks, not just OCRs.
Combine with mouse control for true GUI work:
"Take a screenshot, find the Build button in the toolbar, click it."
The agent does: see_screen → identifies pixel coordinates → move_mouse(x, y) → click_mouse(). The full hands-free loop.
The model picker flags vision-capable models with an emerald eye icon so you never accidentally attach an image to a text-only model — and the composer warns if you do.
- Wake word: keyless local Whisper engine that matches arbitrary phrases ("Hey Void", "Soul"), or upgrade to Porcupine for lower CPU.
- STT: local Whisper-tiny via `@xenova/transformers` (no API key needed); falls back to OpenAI Whisper / Gemini if you've configured them and want quality.
- TTS: OS speech-synthesis with named personas, Void (male) and Soul (female).
- Barge-in: speak mid-reply — TTS stops, the mic auto-opens. Same UX shape as ChatGPT voice mode.
- DND / quiet hours: auto-suppress voice between configured times.
The default voice loop runs without sending a byte of audio off your machine.
VoidSoul speaks the Model Context Protocol as a client. Add any MCP server — community or custom — from Settings → MCP Servers:
Name: Filesystem
Command: npx
Args: -y @modelcontextprotocol/server-filesystem C:\Path\To\Project
Click Add & start. Its tools instantly become callable by the agent. No code changes, no rebuilds.
Tested out of the box with:
- 🗂️ Filesystem — search, read, write, list (14 tools)
- 🐙 GitHub — issues, PRs, commits, gists, search (26 tools)
- 🎮 Unreal Engine 5 — custom bridge bundled with the install
| Provider | Notes |
|---|---|
| Anthropic Claude | Opus 4.7, Sonnet 4.6/4.5, Haiku 4.5, Claude 3 — vision |
| OpenAI | GPT-4o, GPT-4.1, o1, o3-mini — vision |
| Google Gemini | 2.0 Flash/Pro, 1.5 Pro/Flash — vision |
| Ollama (local) | Auto-detected on port 11434. Llama, Qwen, Phi, Mistral, LLaVA, Moondream |
| LM Studio (local) | Auto-detected on port 1234. Any model you've loaded |
| llama.cpp (local) | Auto-detected llama-server on port 8080. Any GGUF |
| Groq | Blisteringly fast — Llama, Mixtral |
| xAI Grok | Grok-2, Grok-2 Vision |
| OpenRouter | Aggregator — try almost anything |
| DeepSeek | Strong coder models |
| Mistral | Mistral Large, Codestral |
| Custom | Any OpenAI-compatible endpoint |
Switch providers mid-conversation; threads, memory, and tools follow. Pick a different model per thread for "this one's for vision, this one's for cheap code Q&A".
Every API call is metered, priced against the current model table, and surfaced in Settings → Usage & Cost:
- Month-to-date total + per-model breakdown
- Daily-spend bar chart for the current month
- Provider-share stacked bar with legend
- Monthly budget with progress bar + toast alerts at 75/90/100%
- Recent-calls list with tokens in/out per call
Token counts are estimates from text length unless the provider reports actuals; the actual bill takes precedence. Local providers (Ollama / LM Studio / llama.cpp / custom) are zero-priced.
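The metering and the 75/90/100% alerts can be illustrated with two small functions. This is a sketch with assumed names (`ModelPrice`, `estimateCost`, `crossedThresholds`) and made-up prices; the real model price table is not reproduced here.

```typescript
// Price per million tokens in USD; the values used with this type are placeholders.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Cost of one call from its input and output token counts.
function estimateCost(tokensIn: number, tokensOut: number, price: ModelPrice): number {
  return (tokensIn * price.inputPerMTok + tokensOut * price.outputPerMTok) / 1_000_000;
}

// Alert thresholds from the dashboard, as percentages of the monthly budget.
const ALERT_THRESHOLDS = [75, 90, 100];

// Return the thresholds newly crossed when spend moves from prevSpend to newSpend.
function crossedThresholds(prevSpend: number, newSpend: number, budget: number): number[] {
  if (budget <= 0) return [];
  const before = (prevSpend / budget) * 100;
  const after = (newSpend / budget) * 100;
  return ALERT_THRESHOLDS.filter((t) => before < t && after >= t);
}
```

Comparing the spend before and after each call means every threshold fires one toast at most, even if a single expensive call jumps past two of them. Local providers would simply use a price of zero for both fields.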
VoidSoul never acts silently. Each capability maps to a permission you grant explicitly and can revoke at any time:
| Permission | Unlocks | Risk |
|---|---|---|
| `terminal` | Running shell commands · Python | High |
| `filesystem` | Reading / writing / organising files | High |
| `browser` | Opening URLs | Low |
| `appControl` | Launching apps, foregrounding windows | Medium |
| `inputAccess` | Keyboard + mouse control | High |
| `microphone` | Voice input | Medium |
| `screenCapture` | Screenshots, OCR, vision | Medium |
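A permission gate over the built-in tools might look like the sketch below. The tool-to-permission pairs follow the agent tool table in this README (with `open_app`, `open_url`, and `open_folder` matched to their permissions by position); the `canRun` helper and the deny-by-default rule for unknown tools are assumptions.

```typescript
type Permission =
  | "terminal"
  | "filesystem"
  | "browser"
  | "appControl"
  | "inputAccess"
  | "microphone"
  | "screenCapture";

// Required permission per tool; null means the tool needs no permission.
const TOOL_PERMISSIONS = new Map<string, Permission | null>([
  ["open_app", "appControl"],
  ["open_url", "browser"],
  ["open_folder", "filesystem"],
  ["run_shell", "terminal"],
  ["run_python", "terminal"],
  ["list_files", "filesystem"],
  ["read_file", "filesystem"],
  ["write_file", "filesystem"],
  ["organize_folder", "filesystem"],
  ["type_text", "inputAccess"],
  ["send_hotkey", "inputAccess"],
  ["move_mouse", "inputAccess"],
  ["click_mouse", "inputAccess"],
  ["read_screen", "screenCapture"],
  ["see_screen", "screenCapture"],
  ["web_search", null],
  ["web_fetch", null],
  ["generate_image", null],
]);

// Allow a tool call only if the tool is known and its permission has been granted.
function canRun(tool: string, granted: ReadonlySet<Permission>): boolean {
  const required = TOOL_PERMISSIONS.get(tool);
  if (required === undefined) return false; // unknown tools are denied by default
  if (required === null) return true;
  return granted.has(required);
}
```

Revoking a permission is then just removing it from the granted set; the next tool call that needs it fails the check.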
Toggle the shield icon in the chat header. The current conversation:
- Isn't persisted to disk
- Doesn't trigger fact extraction
- Suppresses the screen-awareness line in the system prompt
For sensitive material — IP, personal data, anything you don't want lingering on disk.
Everything lives in a local data folder (Settings → About → Open data folder) backed by SQLite for chat history + plain JSON for config / memory / plugins / MCP servers. API keys are encrypted via Electron safeStorage (Windows DPAPI / macOS Keychain / Linux libsecret) and never leave the main process.
See PRIVACY.md for the full policy — short version: nothing leaves your computer unless you trigger an action that explicitly sends it.
Settings → Backup & Sync exports the whole setup (config · memory · plugins · chat history · usage · scheduled tasks · MCP servers) as one portable JSON — or writes it to a chosen folder. Point that folder at Dropbox / OneDrive / Google Drive and you have de-facto cloud sync, using your own cloud. API keys are deliberately excluded; re-enter them on each machine.
A plugin is a declarative JSON workflow pack — drop a .json file into the plugins folder (Settings → Plugins → Open folder). It contributes permission-gated quick actions built from the existing action types; it cannot execute arbitrary code, so a plugin is never more dangerous than the actions it bundles.
Invalid manifests surface their validation error rather than being silently dropped.
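To show what surfacing a validation error could involve, here is a sketch of a manifest check that collects human-readable problems instead of rejecting silently. Which fields are mandatory is an assumption inferred from the sample manifest in this README; `validateManifest` is a hypothetical name.

```typescript
// Permission names from the permissions table.
const KNOWN_PERMISSIONS = [
  "terminal", "filesystem", "browser", "appControl",
  "inputAccess", "microphone", "screenCapture",
];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return a list of validation errors; an empty list means the manifest looks usable.
function validateManifest(input: unknown): string[] {
  if (!isRecord(input)) return ["manifest must be a JSON object"];
  const errors: string[] = [];
  for (const field of ["id", "name", "version"]) {
    const value = input[field];
    if (typeof value !== "string" || value.length === 0) {
      errors.push(`"${field}" must be a non-empty string`);
    }
  }
  const actions = input["quickActions"];
  if (!Array.isArray(actions)) {
    errors.push(`"quickActions" must be an array`);
    return errors;
  }
  actions.forEach((action: unknown, i: number) => {
    if (!isRecord(action)) {
      errors.push(`quickActions[${i}] must be an object`);
      return;
    }
    if (typeof action["id"] !== "string") {
      errors.push(`quickActions[${i}].id must be a string`);
    }
    const requires = action["requires"];
    if (requires !== undefined &&
        (typeof requires !== "string" || KNOWN_PERMISSIONS.indexOf(requires) === -1)) {
      errors.push(`quickActions[${i}].requires is not a known permission`);
    }
    const inner = action["action"];
    if (!isRecord(inner) || typeof inner["type"] !== "string") {
      errors.push(`quickActions[${i}].action.type must be a string`);
    }
  });
  return errors;
}
```

Collecting every error in one pass lets the Plugins screen show all problems with a manifest at once rather than one per reload.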
| Shortcut | What |
|---|---|
| `Ctrl/Cmd + Shift + Space` | Summon / hide the widget |
| `Ctrl/Cmd + Shift + J` | Open the Quick AI overlay from anywhere |
| `Ctrl/Cmd + F` | Cross-thread search |
| `Ctrl/Cmd + K` | Command palette |
| `Enter` | Send a message |
| `Shift + Enter` | Newline in the composer |
| `Esc` | Close any open dialog |
VoidSoul uses ChatGPT, Claude, Gemini, and the rest under the hood — it doesn't replace them. It gives them a body they don't have:
| | VoidSoul | ChatGPT Desktop | Claude Desktop | Raycast | LM Studio |
|---|---|---|---|---|---|
| Multi-provider in one app | ✅ 12 | ❌ | ❌ | ✅ | ❌ |
| Open / drive YOUR apps & files | ✅ | sandbox | beta | ❌ | ❌ |
| See your real screen | ✅ | ❌ | beta | ❌ | ❌ |
| Drive mouse + keyboard | ✅ | ❌ | beta | ❌ | ❌ |
| MCP support | ✅ GUI | ❌ | ✅ JSON | ❌ | ✅ |
| Local-first defaults | ✅ | ❌ | ❌ | ❌ | ✅ |
| Memory across vendors | ✅ | ❌ | ❌ | ❌ | ❌ |
| Floating orb / global hotkey | ✅ | ❌ | ❌ | Mac only | ❌ |
| Workflow modes | ✅ 6 | ❌ | ❌ | ❌ | ❌ |
| Custom plugins | ✅ | limited | ✅ | ❌ | |
| Voice loop (wake → STT → reply → TTS) | ✅ local | partial | ❌ | ❌ | ❌ |
| Cost tracking dashboard | ✅ | ❌ | ❌ | ❌ | ❌ |
| Per-token cost only | ✅ | $20/mo | $20/mo | $8/mo | free |
Use the official apps for their sandboxed exploration. Use VoidSoul when you want the AI to actually do things in your environment.
- Windows 11 (primary target — automation paths are Windows-tuned)
- macOS 14+ (Apple Silicon + Intel)
- Ubuntu 22.04 (AppImage)
- Models used in agent mode: Claude Sonnet 4.5, GPT-4o, Gemini 2.0 Flash, Qwen 2.5 7B (Ollama)
- MCP servers: `@modelcontextprotocol/server-filesystem`, `@modelcontextprotocol/server-github`, the bundled UE5 bridge
Shipped: local Whisper STT, wake word, voice barge-in, vision flag, agent live progress, PDF inline preview, GGUF / llama.cpp detection, multi-model per thread, settings backup, light theme, i18n (4 locales), interactive onboarding, math/mermaid/syntax-highlight markdown, Cmd+F cross-thread search, cost dashboard with charts, share-to-gist, auto-updater, privacy + terms.
Queued:
- Persistent Python sandbox with file workspace (Jupyter-style across runs)
- Computer-use mode — autonomous goal-driven loop, not one tool at a time
- Deep-research mode — multi-step plan → search → fetch → cite
- Vector-store browser — visual RAG corpus management
- Mobile companion — iOS first, sync threads
- Browser extension — chat overlay on any page
- Code signing + notarisation (Windows EV cert + Apple Developer Program)
- Hosted share URLs at `share.voidsoulstudio.com` (replace gists)
- Team workspace: shared projects, SSO, audit log
- Plugin / theme marketplace
- Localization expansion — Chinese, Korean, French, Portuguese
- Binary distribution & use: governed by Terms of Service.
- Privacy: see Privacy Policy. Short version: nothing leaves your computer unless you trigger an action that explicitly sends it.
- Bug reports & feedback: open an issue.
VoidSoul AI Companion is © Void Soul Studio. Distribution of the compiled application is governed by the Terms of Service linked above. Re-selling, re-distributing, or hosting the application as a service requires written permission.
Built solo. Local-first. The AI you already love, with a body.
Talk to the orb.
Example plugin manifest:

    {
      "id": "spiritless-pack",
      "name": "Spiritless",
      "version": "1.0.0",
      "author": "your-name",
      "description": "Shortcuts for my UE5 game.",
      "quickActions": [
        {
          "id": "open-engine",
          "label": "Open UE5",
          "icon": "Gamepad2",
          "description": "Launch the editor on Spiritless.",
          "requires": "appControl",
          "action": {
            "type": "open-app",
            "params": {
              "app": "C:/Epic/UE_5.4/Engine/Binaries/Win64/UnrealEditor.exe"
            }
          }
        }
      ]
    }