This document describes the high-level architecture of Tabby, covering the major system layers, data flow between components, and the storage strategy.
Tabby is a system-wide AI assistant that runs as an Electron desktop app. It intercepts keyboard/screen activity, routes requests through an API orchestration layer to multiple LLM providers, and maintains a persistent memory layer for personalized responses.
The system is composed of four main layers:
| Layer | Tech | Purpose |
|---|---|---|
| Desktop App | Electron + Next.js + React | UI rendering, OS-level hooks, local storage |
| API Backend | Next.js (standalone) | LLM orchestration, streaming, tool routing |
| Memory Layer | Python + FastAPI + Mem0 | Persistent memory with vector search and knowledge graph |
| Storage | SQLite (local) + Supabase + Neo4j | Conversations, auth, memories, graph data |
Directory: frontend/
The desktop app is an Electron shell embedding a Next.js frontend. It handles all OS-level interactions and renders the user interface.
The main process boots the app, creates windows, registers global shortcuts, and manages background services.
Entry point: `main.ts` - creates the main window and system tray, initializes context capture, registers global shortcuts, and sets up IPC handlers. On first launch, it opens the settings window for onboarding.
App State (`app-state.ts`): a centralized singleton holding references to all windows, services, and settings. Uses `electron-store` to persist user preferences (model selection, user ID, cached memories, onboarding status).
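The singleton pattern described above can be sketched as follows. This is illustrative only, with hypothetical field names; the real `app-state.ts` holds `BrowserWindow` references and `electron-store`-backed values.

```typescript
// Hypothetical sketch of the app-state singleton; real field names differ.
interface PersistedSettings {
  selectedModel: string;
  userId: string | null;
  onboardingComplete: boolean;
}

class AppState {
  private static instance: AppState | null = null;

  // Stand-ins for electron-store-backed preferences.
  settings: PersistedSettings = {
    selectedModel: "gpt-4.1-mini",
    userId: null,
    onboardingComplete: false,
  };

  static get(): AppState {
    if (!AppState.instance) AppState.instance = new AppState();
    return AppState.instance;
  }
}

// Every service sees the same state object.
const state = AppState.get();
state.settings.onboardingComplete = true;
console.log(AppState.get().settings.onboardingComplete); // true
```

The point of the singleton is that main-process services and IPC handlers all mutate one shared state rather than passing window references around.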
Four Electron `BrowserWindow` types, each loading a different Next.js route:

| Window | Route | Purpose |
|---|---|---|
| Main | `/` | Action menu, chat, interview copilot |
| Settings | `/settings` | Model selection, API keys, preferences |
| Brain Panel | `/brain-panel` | Memory dashboard, knowledge graph viewer |
| Suggestion | `/suggestion` | Inline AI suggestion popup |
All windows are frameless, always-on-top, and transparent, designed to overlay other applications without being intrusive.
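The shared window characteristics correspond to `BrowserWindow` constructor options in the main process. The object below is a sketch of what those options would look like; exact values in the real `main.ts` may differ, and `skipTaskbar` is an assumption.

```typescript
// Options as they would be passed to `new BrowserWindow(overlayWindowOptions)`.
const overlayWindowOptions = {
  frame: false,       // frameless: no native title bar or borders
  transparent: true,  // see-through background, so only the UI itself shows
  alwaysOnTop: true,  // floats above other applications
  skipTaskbar: true,  // assumption: overlay windows usually hide from the taskbar
  webPreferences: {
    contextIsolation: true, // required for the contextBridge preload boundary
  },
};

console.log(overlayWindowOptions.alwaysOnTop); // true
```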
Background services running in the main process:
| Service | File | Purpose |
|---|---|---|
| Local Database | `local-db.ts` | SQLite via `better-sqlite3` for conversations and messages |
| File Storage | `local-file-storage.ts` | Local filesystem for screenshots and chat attachments |
| Text Handler | `text-handler.ts` | Clipboard operations, typewriter-mode text injection |
| Context Capture | `context-capture.ts` | Periodic screenshot capture for ambient awareness |
| Ghost Text Overlay | `ghost-overlay.ts` | Transparent overlay window for inline suggestions |
| Keyboard Monitor | `keyboard-monitor.ts` | Debounced keystroke buffer for auto-suggestions |
| Keystroke Listener | `keystroke-listener.ts` | Low-level keyboard hook for character capture |
| Interview Ghost | `interview-ghost.ts` | Code suggestion overlay for coding interviews |
| Caret Tracker | `caret-tracker.ts` | Tracks cursor position for overlay placement |
| Transcribe Service | `transcribe-service.ts` | Audio recording and transcription handling |
| Voice Agent Panel | `voice-agent-panel.ts` | Voice conversation panel management |
| Transcribe Indicator | `transcribe-indicator.ts` | Visual indicator for recording/processing state |
Bridge between the main process and renderer via Electron IPC. Handlers are grouped by domain:
- `db-handlers.ts` - Conversation and message CRUD (backed by local SQLite)
- `settings-handlers.ts` - Model, API key, and preference management
- `text-handlers.ts` - Text injection and clipboard operations
- `capture-handlers.ts` - Screen capture triggers
- `window-handlers.ts` - Window show/hide/resize
- `voice-agent-handlers.ts` - Voice agent panel toggle
- `transcribe-handlers.ts` - Transcription controls
- `onboarding-handlers.ts` - First-launch onboarding flow
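The handler modules above all follow Electron's invoke/handle request-response pattern. The sketch below models that round trip in-process so the shape is clear; the channel name is hypothetical, and the real handlers live in `db-handlers.ts` and friends.

```typescript
// Minimal in-process model of Electron's ipcMain.handle / ipcRenderer.invoke.
type Handler = (...args: unknown[]) => unknown;
const handlers = new Map<string, Handler>();

// Main-process side, analogous to ipcMain.handle(channel, fn).
function handle(channel: string, fn: Handler) {
  handlers.set(channel, fn);
}

// Renderer side, analogous to ipcRenderer.invoke(channel, ...args)
// as exposed on window.electron by the preload script.
async function invoke(channel: string, ...args: unknown[]): Promise<unknown> {
  const fn = handlers.get(channel);
  if (!fn) throw new Error(`No handler registered for ${channel}`);
  return fn(...args);
}

// Hypothetical channel: a db-handlers.ts-style conversation query.
handle("db:get-conversations", () => [{ id: 1, title: "First chat" }]);

invoke("db:get-conversations").then((rows) => {
  console.log(rows); // [{ id: 1, title: "First chat" }]
});
```

In the real app the two sides run in separate processes, and Electron serializes arguments and return values across the boundary.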
System-wide keyboard shortcuts registered via Electron's `globalShortcut` API. See the README for the full shortcut reference.
Uses `contextBridge.exposeInMainWorld` to safely expose IPC methods to the renderer process under `window.electron`. This is the security boundary between the sandboxed renderer and the privileged main process.
The renderer is a standard Next.js application rendered inside Electron.
| Route | Purpose |
|---|---|
| `/` | Main action menu - chat, copilot, quick actions |
| `/(auth)` | Sign-in / sign-up flows |
| `/dashboard` | Usage analytics and stats |
| `/brain-panel` | Memory viewer and knowledge graph |
| `/settings` | Preferences and configuration |
| `/suggestion` | Inline suggestion popup |
| `/ghost-overlay` | Transparent ghost text overlay |
| `/voice-agent-panel` | Voice conversation interface |
| `/transcribe-indicator` | Recording/processing status indicator |
| `/onboarding` | First-launch setup wizard |
| `/preferences` | User preferences |
| Group | Purpose |
|---|---|
| `action-menu/` | Main AI interaction surface - chat, copilot, action triggers |
| `brain-panel/` | Memory dashboard, graph visualization |
| `chat/` | Chat conversation UI and message rendering |
| `ai-elements/` | AI-specific UI components (message parts, code blocks, markdown) |
| `settings/` | Settings forms and controls |
| `supaauth/` | Supabase authentication components |
| `dashboard/` | Analytics charts and stats |
| `navigation/` | Tab bar, nav components |
| `ui/` | Shared design system (53 components) |
Directory: nextjs-backend/
A standalone Next.js application that serves as the AI orchestration layer. The frontend makes HTTP requests to these API routes; the Next.js backend handles prompt construction, LLM API calls (with streaming), and tool execution.
| Route | Method | Purpose |
|---|---|---|
| `/api/chat` | POST | General chat with streaming, memory retrieval, and tool use |
| `/api/suggest` | POST | AI suggestions for selected text (action menu) |
| `/api/suggest-inline` | POST | Inline autocomplete suggestions (ghost text) |
| `/api/completion` | POST | Text completion |
| `/api/interview-copilot` | POST | Multi-tab coding interview analysis (idea, code, walkthrough, tests) |
| `/api/interview-ghost-suggest` | POST | Code suggestion for interview ghost overlay |
| `/api/prep-mode` | POST | Interview preparation mode |
| `/api/voice-agent` | POST | Voice conversation with tool calling |
| `/api/voice-agent/execute-tool` | POST | Execute tools on behalf of the voice agent |
| `/api/voice-command` | POST | Voice command interpretation |
| `/api/voice-generate` | POST | Generate voice responses |
| `/api/speech` | POST | Text-to-speech synthesis |
| `/api/transcribe` | POST | Audio-to-text transcription |
| `/api/search` | POST | Web search via Tavily |
| `/api/generate-title` | POST | Auto-generate conversation titles |
| `/api/download` | GET | File download handler |
| `/api/auth/signup` | POST | User registration |
| `/api/auth/verify-otp` | POST | OTP verification |
| `/api/dashboard/stats` | GET | Usage statistics |
| `/api/dashboard/analytics` | GET | Analytics data |
The backend uses the Vercel AI SDK to interface with multiple LLM providers through a unified API:
- OpenAI - GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
- Google Gemini - Gemini models
- Groq - Fast inference
- Cerebras - Ultra-fast inference
- OpenRouter - Multi-provider routing
- LM Studio - Local model support
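One way to picture the unified-API setup is a routing table from model id to provider. The map below is a hedged sketch: the OpenAI model ids come from the list above, while the Gemini and Groq ids are assumptions, as is the OpenRouter fallback; the backend's actual routing logic may differ.

```typescript
// Hypothetical model-to-provider routing table.
const modelProviders: Record<string, string> = {
  "gpt-4.1": "openai",
  "gpt-4.1-mini": "openai",
  "gpt-4.1-nano": "openai",
  "gemini-2.0-flash": "google",      // assumption: example Gemini model id
  "llama-3.3-70b-versatile": "groq", // assumption: example Groq-hosted model id
};

function providerFor(modelId: string): string {
  // Assumed fallback: OpenRouter can route unknown models to many providers.
  return modelProviders[modelId] ?? "openrouter";
}

console.log(providerFor("gpt-4.1-nano")); // "openai"
console.log(providerFor("some/unknown-model")); // "openrouter"
```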
The backend supports Model Context Protocol (MCP) tools that extend LLM capabilities:
- Memory tools - Add, search, update, and retrieve user memories via the Python backend
- Desktop Automation MCP - System-level operations on Windows (via a separate `windows-mcp` server)
Directory: backend/
A Python FastAPI server providing the AI memory system powered by Mem0.
- Memory Extraction - Mem0 automatically extracts key facts from conversations using an LLM (GPT-4.1-nano).
- Memory Classification - An LLM-based classifier categorizes each memory into one of five types:
  - `LONG_TERM` - Persistent preferences and facts
  - `SHORT_TERM` - Temporary context
  - `EPISODIC` - Specific experiences and events
  - `SEMANTIC` - General knowledge and concepts
  - `PROCEDURAL` - How-to knowledge and workflows
- Vector Storage - Memories are embedded and stored in Supabase (pgvector) for semantic search.
- Knowledge Graph - Optionally, memories are also stored in Neo4j for relationship-based retrieval (Hybrid RAG).
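A record produced by the pipeline above might look like the sketch below. The five type values come from the classifier list; the field names are assumptions, and Mem0's actual payload differs in detail.

```typescript
// Sketch of a classified memory record; field names are hypothetical.
type MemoryType =
  | "LONG_TERM"
  | "SHORT_TERM"
  | "EPISODIC"
  | "SEMANTIC"
  | "PROCEDURAL";

interface MemoryRecord {
  id: string;
  userId: string;
  text: string;         // the extracted fact, e.g. "Prefers dark mode"
  type: MemoryType;     // assigned by the LLM-based classifier
  embedding?: number[]; // stored in Supabase pgvector for semantic search
}

const example: MemoryRecord = {
  id: "mem_001",
  userId: "user_42",
  text: "Prefers TypeScript over JavaScript",
  type: "LONG_TERM",
};

console.log(example.type); // "LONG_TERM"
```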
| Endpoint | Method | Purpose |
|---|---|---|
| `/add` | POST | Extract and store memories from conversation messages |
| `/search` | POST | Semantic search across memories |
| `/get-all` | POST | Retrieve all memories for a user (optionally filtered by type) |
| `/add-image` | POST | Add memory from an image (visual memory) |
| `/memory/{id}` | GET | Get a specific memory |
| `/memory/update` | PUT | Update a memory |
| `/memory/{id}` | DELETE | Delete a memory |
| `/memory/user/{id}` | DELETE | Delete all memories for a user |
| `/memory/{id}/history` | GET | Get the history/changelog of a memory |
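From the Next.js backend's side, a `/search` call might be built like this. This is a hypothetical client helper: the base URL and payload field names (`user_id`, `query`, `limit`) are assumptions based on the endpoint descriptions above, not the memory API's documented schema.

```typescript
// Hypothetical helper that builds a /search request to the memory API.
const MEMORY_API_BASE = "http://localhost:8000"; // assumption: local FastAPI port

function buildSearchRequest(userId: string, query: string, limit = 5) {
  return {
    url: `${MEMORY_API_BASE}/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ user_id: userId, query, limit }),
    },
  };
}

const req = buildSearchRequest("user_42", "favorite editor");
console.log(req.url); // "http://localhost:8000/search"
// In the backend this would be sent with: await fetch(req.url, req.init)
```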
Tabby uses a split storage strategy - conversations stay on-device, while memories and auth live in Supabase.
| Store | Technology | Data |
|---|---|---|
| Conversations DB | SQLite (`better-sqlite3`) | Conversations and messages (chat, copilot, etc.) |
| Attachments | Local filesystem | Screenshots, chat file attachments |
| User Preferences | `electron-store` | Model settings, API keys, UI preferences |
The SQLite database lives at `{userData}/tabby.db` with WAL mode enabled. Conversations and messages are fully local - they never leave the device.
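A plausible schema for this database is sketched below. The tables and columns are assumptions based on the description above, not the actual `local-db.ts` schema; in the real app a string like this would be passed to `better-sqlite3`'s `db.exec(...)` at startup.

```typescript
// Hypothetical schema for the local conversations DB.
const schema = `
  PRAGMA journal_mode = WAL;  -- matches the WAL mode described above

  CREATE TABLE IF NOT EXISTS conversations (
    id         TEXT PRIMARY KEY,
    title      TEXT,
    created_at INTEGER
  );

  CREATE TABLE IF NOT EXISTS messages (
    id              TEXT PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    role            TEXT,   -- 'user' | 'assistant'
    content         TEXT,
    created_at      INTEGER
  );
`;

console.log(schema.includes("journal_mode = WAL")); // true
```

WAL mode lets the renderer read conversation history while the main process is writing new messages, which suits the app's concurrent IPC traffic.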
| Purpose | How |
|---|---|
| Authentication | User sign-up, OTP verification |
| Memory Vector Store | pgvector extension for semantic memory search |
| Storage Buckets | `context-captures` and `project-assets` for media files |
In development, Supabase runs locally via Docker. In production, it can point to Supabase Cloud.
Used for the knowledge graph portion of the memory system. Stores relationships between memories for Hybrid RAG (vector + graph retrieval). Optional - the system works without it, falling back to vector-only search.
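The hybrid retrieval with vector-only fallback can be sketched as a merge step. This is an illustration of the idea, not the memory layer's actual code; the result shape and scoring are assumptions.

```typescript
// Sketch of hybrid RAG: merge vector and graph hits when Neo4j is available,
// otherwise fall back to vector-only results.
interface Hit {
  id: string;
  text: string;
  score: number;
}

function hybridSearch(vectorHits: Hit[], graphHits: Hit[] | null): Hit[] {
  if (!graphHits) return vectorHits; // Neo4j absent: vector-only fallback
  const best = new Map<string, Hit>();
  for (const h of [...vectorHits, ...graphHits]) {
    const prev = best.get(h.id);
    if (!prev || h.score > prev.score) best.set(h.id, h); // dedupe, keep top score
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}

const vec = [{ id: "m1", text: "likes Rust", score: 0.9 }];
console.log(hybridSearch(vec, null).length); // 1 (vector-only fallback)
```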
```
User types in Action Menu
    |
    v
Frontend (Next.js renderer)
    | HTTP POST /api/chat
    v
Next.js Backend
    |-- Fetches relevant memories from Memory API (/search)
    |-- Constructs prompt with context + memories
    |-- Calls LLM provider (streaming)
    |-- Returns streamed response
    v
Frontend renders response
    |
    | IPC: save to local DB
    v
Electron Main Process
    |-- Saves conversation + messages to SQLite
    |-- Stores new memories via Memory API (/add)
```
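The "constructs prompt with context + memories" step can be sketched as a small pure function. The prompt wording here is illustrative, not the backend's actual system prompt.

```typescript
// Hypothetical prompt builder: folds retrieved memories into a system prompt.
function buildSystemPrompt(memories: string[]): string {
  const memoryBlock = memories.length
    ? "Relevant things you know about this user:\n" +
      memories.map((m) => `- ${m}`).join("\n")
    : "No stored memories for this user yet.";
  return `You are Tabby, a helpful desktop assistant.\n\n${memoryBlock}`;
}

const prompt = buildSystemPrompt([
  "Prefers concise answers",
  "Works mostly in TypeScript",
]);
console.log(prompt.includes("- Prefers concise answers")); // true
```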
```
User types in any application
    |
    v
Keystroke Listener (low-level keyboard hook)
    |
    v
Keyboard Monitor (debounced buffer)
    | HTTP POST /api/suggest-inline
    v
Next.js Backend
    |-- Includes cached memories for context
    |-- Calls fast LLM (GPT-4.1-mini)
    |-- Returns suggestion
    v
Ghost Text Overlay (transparent window)
    | Shift+Tab to accept
    v
Text Handler → injects text into active application
```
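The debounced buffer in this flow works roughly like the sketch below: keystrokes accumulate, and the suggestion request fires only after a typing pause. The timing and names are assumptions; the real logic lives in `keyboard-monitor.ts`.

```typescript
// Simplified model of the debounced keystroke buffer.
class KeystrokeBuffer {
  private buffer = "";
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private onPause: (text: string) => void, // e.g. POST /api/suggest-inline
    private delayMs = 400,                   // assumed pause threshold
  ) {}

  push(char: string) {
    this.buffer += char;
    if (this.timer) clearTimeout(this.timer); // each keystroke resets the timer
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  flush() {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    if (this.buffer) this.onPause(this.buffer);
    this.buffer = "";
  }
}

const captured: string[] = [];
const buf = new KeystrokeBuffer((text) => captured.push(text));
for (const c of "hello") buf.push(c);
buf.flush(); // force the pause for this example
console.log(captured); // ["hello"]
```

Debouncing keeps the backend from receiving a request per keystroke: only one suggestion call goes out per burst of typing.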
```
User presses Alt+X
    |
    v
Electron captures screen (screenshot)
    | HTTP POST /api/interview-copilot
    v
Next.js Backend
    |-- Analyzes screenshot (vision model)
    |-- Generates multi-tab response:
    |     - Idea (problem breakdown)
    |     - Code (implementation)
    |     - Walkthrough (explanation)
    |     - Test Cases (edge cases)
    v
Frontend renders in tabbed UI
```
```
User presses Ctrl+Alt+J
    |
    v
Voice Agent Panel opens
    | Audio recording
    v
Transcribe Service → POST /api/transcribe
    |
    v
POST /api/voice-agent (with tools)
    |-- LLM processes with tool calling
    |-- May execute: memory tools, desktop automation
    |-- POST /api/speech (response TTS)
    v
Audio playback + text display
```
| Component | Platform | Trigger |
|---|---|---|
| Desktop App | GitHub Releases | `pnpm run release` from `frontend/` |
| Next.js Backend | Vercel | Auto-deploy on push to `main` |
| Python Backend | Azure Container Apps | Push to `backend/` on `main` |
| Supabase | Local Docker (dev) / Cloud (prod) | `npx supabase start` |
See the README for detailed setup and deployment instructions.
