This document describes the high-level architecture of Tabby, covering the major system layers, data flow between components, and the storage strategy.
Tabby is a system-wide AI assistant that runs as an Electron desktop app. It intercepts keyboard/screen activity, routes requests through an API orchestration layer to multiple LLM providers, and maintains a persistent memory layer for personalized responses.
The system is composed of four main layers:
| Layer | Tech | Purpose |
|---|---|---|
| Desktop App | Electron + Next.js + React | UI rendering, OS-level hooks, local storage |
| API Backend | Next.js (standalone) | LLM orchestration, streaming, tool routing |
| Memory Layer | Python + FastAPI + Mem0 | Persistent memory with vector search and knowledge graph |
| Storage | SQLite (local) + Supabase + Neo4j | Conversations, auth, memories, graph data |
Directory: frontend/
The desktop app is an Electron shell embedding a Next.js frontend. It handles all OS-level interactions and renders the user interface.
The main process boots the app, creates windows, registers global shortcuts, and manages background services.
Entry point: `main.ts` - creates the main window and system tray, initializes context capture, registers global shortcuts, and sets up IPC handlers. On first launch, it opens the settings window for onboarding.
App State (`app-state.ts`): a centralized singleton holding references to all windows, services, and settings. Uses `electron-store` to persist user preferences (model selection, user ID, cached memories, onboarding status).
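The singleton pattern described above can be sketched as follows. This is illustrative only, with hypothetical field names; the real `app-state.ts` holds `BrowserWindow` references and `electron-store`-backed values.

```typescript
// Hypothetical sketch of the app-state singleton; real field names differ.
interface PersistedSettings {
  selectedModel: string;
  userId: string | null;
  onboardingComplete: boolean;
}

class AppState {
  private static instance: AppState | null = null;

  // Stand-ins for electron-store-backed preferences.
  settings: PersistedSettings = {
    selectedModel: "gpt-4.1-mini",
    userId: null,
    onboardingComplete: false,
  };

  static get(): AppState {
    if (!AppState.instance) AppState.instance = new AppState();
    return AppState.instance;
  }
}

// Every service sees the same state object.
const state = AppState.get();
state.settings.onboardingComplete = true;
console.log(AppState.get().settings.onboardingComplete); // true
```

The point of the singleton is that main-process services and IPC handlers all mutate one shared state rather than passing window references around.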
Four Electron `BrowserWindow` types, each loading a different Next.js route:

| Window | Route | Purpose |
|---|---|---|
| Main | `/` | Action menu, chat, interview copilot |
| Settings | `/settings` | Model selection, API keys, preferences |
| Brain Panel | `/brain-panel` | Memory dashboard, knowledge graph viewer |
| Suggestion | `/suggestion` | Inline AI suggestion popup |
All windows are frameless, always-on-top, and transparent, designed to overlay other applications without being intrusive.
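The shared window characteristics correspond to `BrowserWindow` constructor options in the main process. The object below is a sketch of what those options would look like; exact values in the real `main.ts` may differ, and `skipTaskbar` is an assumption.

```typescript
// Options as they would be passed to `new BrowserWindow(overlayWindowOptions)`.
const overlayWindowOptions = {
  frame: false,       // frameless: no native title bar or borders
  transparent: true,  // see-through background, so only the UI itself shows
  alwaysOnTop: true,  // floats above other applications
  skipTaskbar: true,  // assumption: overlay windows usually hide from the taskbar
  webPreferences: {
    contextIsolation: true, // required for the contextBridge preload boundary
  },
};

console.log(overlayWindowOptions.alwaysOnTop); // true
```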
Background services running in the main process:
| Service | File | Purpose |
|---|---|---|
| Local Database | `local-db.ts` | SQLite via `better-sqlite3` for conversations and messages |
| File Storage | `local-file-storage.ts` | Local filesystem for screenshots and chat attachments |
| Text Handler | `text-handler.ts` | Clipboard operations, typewriter-mode text injection |
| Context Capture | `context-capture.ts` | Periodic screenshot capture for ambient awareness |
| Ghost Text Overlay | `ghost-overlay.ts` | Transparent overlay window for inline suggestions |
| Keyboard Monitor | `keyboard-monitor.ts` | Debounced keystroke buffer for auto-suggestions |
| Keystroke Listener | `keystroke-listener.ts` | Low-level keyboard hook for character capture |
| Interview Ghost | `interview-ghost.ts` | Code suggestion overlay for coding interviews |
| Caret Tracker | `caret-tracker.ts` | Tracks cursor position for overlay placement |
| Transcribe Service | `transcribe-service.ts` | Audio recording and transcription handling |
| Voice Agent Panel | `voice-agent-panel.ts` | Voice conversation panel management |
| Transcribe Indicator | `transcribe-indicator.ts` | Visual indicator for recording/processing state |
Bridge between the main process and renderer via Electron IPC. Handlers are grouped by domain:
- `db-handlers.ts` - Conversation and message CRUD (backed by local SQLite)
- `settings-handlers.ts` - Model, API key, and preference management
- `text-handlers.ts` - Text injection and clipboard operations
- `capture-handlers.ts` - Screen capture triggers
- `window-handlers.ts` - Window show/hide/resize
- `voice-agent-handlers.ts` - Voice agent panel toggle
- `transcribe-handlers.ts` - Transcription controls
- `onboarding-handlers.ts` - First-launch onboarding flow
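The handler modules above all follow Electron's invoke/handle request-response pattern. The sketch below models that round trip in-process so the shape is clear; the channel name is hypothetical, and the real handlers live in `db-handlers.ts` and friends.

```typescript
// Minimal in-process model of Electron's ipcMain.handle / ipcRenderer.invoke.
type Handler = (...args: unknown[]) => unknown;
const handlers = new Map<string, Handler>();

// Main-process side, analogous to ipcMain.handle(channel, fn).
function handle(channel: string, fn: Handler) {
  handlers.set(channel, fn);
}

// Renderer side, analogous to ipcRenderer.invoke(channel, ...args)
// as exposed on window.electron by the preload script.
async function invoke(channel: string, ...args: unknown[]): Promise<unknown> {
  const fn = handlers.get(channel);
  if (!fn) throw new Error(`No handler registered for ${channel}`);
  return fn(...args);
}

// Hypothetical channel: a db-handlers.ts-style conversation query.
handle("db:get-conversations", () => [{ id: 1, title: "First chat" }]);

invoke("db:get-conversations").then((rows) => {
  console.log(rows); // [{ id: 1, title: "First chat" }]
});
```

In the real app the two sides run in separate processes, and Electron serializes arguments and return values across the boundary.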
System-wide keyboard shortcuts registered via Electron's `globalShortcut` API. See the README for the full shortcut reference.
Uses `contextBridge.exposeInMainWorld` to safely expose IPC methods to the renderer process under `window.electron`. This is the security boundary between the sandboxed renderer and the privileged main process.
The renderer is a standard Next.js application rendered inside Electron.
| Route | Purpose |
|---|---|
| `/` | Main action menu - chat, copilot, quick actions |
| `/(auth)` | Sign-in / sign-up flows |
| `/dashboard` | Usage analytics and stats |
| `/brain-panel` | Memory viewer and knowledge graph |
| `/settings` | Preferences and configuration |
| `/suggestion` | Inline suggestion popup |
| `/ghost-overlay` | Transparent ghost text overlay |
| `/voice-agent-panel` | Voice conversation interface |
| `/transcribe-indicator` | Recording/processing status indicator |
| `/onboarding` | First-launch setup wizard |
| `/preferences` | User preferences |
| Group | Purpose |
|---|---|
| `action-menu/` | Main AI interaction surface - chat, copilot, action triggers |
| `brain-panel/` | Memory dashboard, graph visualization |
| `chat/` | Chat conversation UI and message rendering |
| `ai-elements/` | AI-specific UI components (message parts, code blocks, markdown) |
| `settings/` | Settings forms and controls |
| `supaauth/` | Supabase authentication components |
| `dashboard/` | Analytics charts and stats |
| `navigation/` | Tab bar, nav components |
| `ui/` | Shared design system (53 components) |
Directory: nextjs-backend/
A standalone Next.js application that serves as the AI orchestration layer. The frontend makes HTTP requests to these API routes; the Next.js backend handles prompt construction, LLM API calls (with streaming), and tool execution.
| Route | Method | Purpose |
|---|---|---|
| `/api/chat` | POST | General chat with streaming, memory retrieval, and tool use |
| `/api/suggest` | POST | AI suggestions for selected text (action menu) |
| `/api/suggest-inline` | POST | Inline autocomplete suggestions (ghost text) |
| `/api/completion` | POST | Text completion |
| `/api/interview-copilot` | POST | Multi-tab coding interview analysis (idea, code, walkthrough, tests) |
| `/api/interview-ghost-suggest` | POST | Code suggestion for interview ghost overlay |
| `/api/prep-mode` | POST | Interview preparation mode |
| `/api/voice-agent` | POST | Voice conversation with tool calling |
| `/api/voice-agent/execute-tool` | POST | Execute tools on behalf of the voice agent |
| `/api/voice-command` | POST | Voice command interpretation |
| `/api/voice-generate` | POST | Generate voice responses |
| `/api/speech` | POST | Text-to-speech synthesis |
| `/api/transcribe` | POST | Audio-to-text transcription |
| `/api/search` | POST | Web search via Tavily |
| `/api/generate-title` | POST | Auto-generate conversation titles |
| `/api/download` | GET | File download handler |
| `/api/auth/signup` | POST | User registration |
| `/api/auth/verify-otp` | POST | OTP verification |
| `/api/dashboard/stats` | GET | Usage statistics |
| `/api/dashboard/analytics` | GET | Analytics data |
The backend uses the Vercel AI SDK to interface with multiple LLM providers through a unified API:
- OpenAI - GPT-4.1, GPT-4.1-mini, GPT-4.1-nano
- Google Gemini - Gemini models
- Groq - Fast inference
- Cerebras - Ultra-fast inference
- OpenRouter - Multi-provider routing
- LM Studio - Local model support
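One way to picture the unified-API setup is a routing table from model id to provider. The map below is a hedged sketch: the OpenAI model ids come from the list above, while the Gemini and Groq ids are assumptions, as is the OpenRouter fallback; the backend's actual routing logic may differ.

```typescript
// Hypothetical model-to-provider routing table.
const modelProviders: Record<string, string> = {
  "gpt-4.1": "openai",
  "gpt-4.1-mini": "openai",
  "gpt-4.1-nano": "openai",
  "gemini-2.0-flash": "google",      // assumption: example Gemini model id
  "llama-3.3-70b-versatile": "groq", // assumption: example Groq-hosted model id
};

function providerFor(modelId: string): string {
  // Assumed fallback: OpenRouter can route unknown models to many providers.
  return modelProviders[modelId] ?? "openrouter";
}

console.log(providerFor("gpt-4.1-nano")); // "openai"
console.log(providerFor("some/unknown-model")); // "openrouter"
```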
The backend supports Model Context Protocol (MCP) tools that extend LLM capabilities:
- Memory tools - Add, search, update, and retrieve user memories via the Python backend
- Desktop Automation MCP - System-level operations on Windows (via a separate `windows-mcp` server)
Directory: backend/
A Python FastAPI server providing the AI memory system powered by Mem0.
- Memory Extraction - Mem0 automatically extracts key facts from conversations using an LLM (GPT-4.1-nano).
- Memory Classification - An LLM-based classifier categorizes each memory into one of five types:
  - `LONG_TERM` - Persistent preferences and facts
  - `SHORT_TERM` - Temporary context
  - `EPISODIC` - Specific experiences and events
  - `SEMANTIC` - General knowledge and concepts
  - `PROCEDURAL` - How-to knowledge and workflows
- Vector Storage - Memories are embedded and stored in Supabase (pgvector) for semantic search.
- Knowledge Graph - Optionally, memories are also stored in Neo4j for relationship-based retrieval (Hybrid RAG).
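A record produced by the pipeline above might look like the sketch below. The five type values come from the classifier list; the field names are assumptions, and Mem0's actual payload differs in detail.

```typescript
// Sketch of a classified memory record; field names are hypothetical.
type MemoryType =
  | "LONG_TERM"
  | "SHORT_TERM"
  | "EPISODIC"
  | "SEMANTIC"
  | "PROCEDURAL";

interface MemoryRecord {
  id: string;
  userId: string;
  text: string;         // the extracted fact, e.g. "Prefers dark mode"
  type: MemoryType;     // assigned by the LLM-based classifier
  embedding?: number[]; // stored in Supabase pgvector for semantic search
}

const example: MemoryRecord = {
  id: "mem_001",
  userId: "user_42",
  text: "Prefers TypeScript over JavaScript",
  type: "LONG_TERM",
};

console.log(example.type); // "LONG_TERM"
```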
| Endpoint | Method | Purpose |
|---|---|---|
| `/add` | POST | Extract and store memories from conversation messages |
| `/search` | POST | Semantic search across memories |
| `/get-all` | POST | Retrieve all memories for a user (optionally filtered by type) |
| `/add-image` | POST | Add memory from an image (visual memory) |
| `/memory/{id}` | GET | Get a specific memory |
| `/memory/update` | PUT | Update a memory |
| `/memory/{id}` | DELETE | Delete a memory |
| `/memory/user/{id}` | DELETE | Delete all memories for a user |
| `/memory/{id}/history` | GET | Get the history/changelog of a memory |
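From the Next.js backend's side, a `/search` call might be built like this. This is a hypothetical client helper: the base URL and payload field names (`user_id`, `query`, `limit`) are assumptions based on the endpoint descriptions above, not the memory API's documented schema.

```typescript
// Hypothetical helper that builds a /search request to the memory API.
const MEMORY_API_BASE = "http://localhost:8000"; // assumption: local FastAPI port

function buildSearchRequest(userId: string, query: string, limit = 5) {
  return {
    url: `${MEMORY_API_BASE}/search`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ user_id: userId, query, limit }),
    },
  };
}

const req = buildSearchRequest("user_42", "favorite editor");
console.log(req.url); // "http://localhost:8000/search"
// In the backend this would be sent with: await fetch(req.url, req.init)
```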
Tabby uses a split storage strategy - conversations stay on-device, while memories and auth live in Supabase.
| Store | Technology | Data |
|---|---|---|
| Conversations DB | SQLite (`better-sqlite3`) | Conversations and messages (chat, copilot, etc.) |
| Attachments | Local filesystem | Screenshots, chat file attachments |
| User Preferences | `electron-store` | Model settings, API keys, UI preferences |
The SQLite database lives at `{userData}/tabby.db` with WAL mode enabled. Conversations and messages are fully local - they never leave the device.
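A plausible schema for this database is sketched below. The tables and columns are assumptions based on the description above, not the actual `local-db.ts` schema; in the real app a string like this would be passed to `better-sqlite3`'s `db.exec(...)` at startup.

```typescript
// Hypothetical schema for the local conversations DB.
const schema = `
  PRAGMA journal_mode = WAL;  -- matches the WAL mode described above

  CREATE TABLE IF NOT EXISTS conversations (
    id         TEXT PRIMARY KEY,
    title      TEXT,
    created_at INTEGER
  );

  CREATE TABLE IF NOT EXISTS messages (
    id              TEXT PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    role            TEXT,   -- 'user' | 'assistant'
    content         TEXT,
    created_at      INTEGER
  );
`;

console.log(schema.includes("journal_mode = WAL")); // true
```

WAL mode lets the renderer read conversation history while the main process is writing new messages, which suits the app's concurrent IPC traffic.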
| Purpose | How |
|---|---|
| Authentication | User sign-up, OTP verification |
| Memory Vector Store | pgvector extension for semantic memory search |
| Storage Buckets | `context-captures` and `project-assets` for media files |
In development, Supabase runs locally via Docker. In production, it can point to Supabase Cloud.
Used for the knowledge graph portion of the memory system. Stores relationships between memories for Hybrid RAG (vector + graph retrieval). Optional - the system works without it, falling back to vector-only search.
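The hybrid retrieval with vector-only fallback can be sketched as a merge step. This is an illustration of the idea, not the memory layer's actual code; the result shape and scoring are assumptions.

```typescript
// Sketch of hybrid RAG: merge vector and graph hits when Neo4j is available,
// otherwise fall back to vector-only results.
interface Hit {
  id: string;
  text: string;
  score: number;
}

function hybridSearch(vectorHits: Hit[], graphHits: Hit[] | null): Hit[] {
  if (!graphHits) return vectorHits; // Neo4j absent: vector-only fallback
  const best = new Map<string, Hit>();
  for (const h of [...vectorHits, ...graphHits]) {
    const prev = best.get(h.id);
    if (!prev || h.score > prev.score) best.set(h.id, h); // dedupe, keep top score
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}

const vec = [{ id: "m1", text: "likes Rust", score: 0.9 }];
console.log(hybridSearch(vec, null).length); // 1 (vector-only fallback)
```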
```
User types in Action Menu
    |
    v
Frontend (Next.js renderer)
    | HTTP POST /api/chat
    v
Next.js Backend
    |-- Fetches relevant memories from Memory API (/search)
    |-- Constructs prompt with context + memories
    |-- Calls LLM provider (streaming)
    |-- Returns streamed response
    v
Frontend renders response
    |
    | IPC: save to local DB
    v
Electron Main Process
    |-- Saves conversation + messages to SQLite
    |-- Stores new memories via Memory API (/add)
```
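The "constructs prompt with context + memories" step can be sketched as a small pure function. The prompt wording here is illustrative, not the backend's actual system prompt.

```typescript
// Hypothetical prompt builder: folds retrieved memories into a system prompt.
function buildSystemPrompt(memories: string[]): string {
  const memoryBlock = memories.length
    ? "Relevant things you know about this user:\n" +
      memories.map((m) => `- ${m}`).join("\n")
    : "No stored memories for this user yet.";
  return `You are Tabby, a helpful desktop assistant.\n\n${memoryBlock}`;
}

const prompt = buildSystemPrompt([
  "Prefers concise answers",
  "Works mostly in TypeScript",
]);
console.log(prompt.includes("- Prefers concise answers")); // true
```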
```
User types in any application
    |
    v
Keystroke Listener (low-level keyboard hook)
    |
    v
Keyboard Monitor (debounced buffer)
    | HTTP POST /api/suggest-inline
    v
Next.js Backend
    |-- Includes cached memories for context
    |-- Calls fast LLM (GPT-4.1-mini)
    |-- Returns suggestion
    v
Ghost Text Overlay (transparent window)
    | Shift+Tab to accept
    v
Text Handler → injects text into active application
```
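The debounced buffer in this flow works roughly like the sketch below: keystrokes accumulate, and the suggestion request fires only after a typing pause. The timing and names are assumptions; the real logic lives in `keyboard-monitor.ts`.

```typescript
// Simplified model of the debounced keystroke buffer.
class KeystrokeBuffer {
  private buffer = "";
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private onPause: (text: string) => void, // e.g. POST /api/suggest-inline
    private delayMs = 400,                   // assumed pause threshold
  ) {}

  push(char: string) {
    this.buffer += char;
    if (this.timer) clearTimeout(this.timer); // each keystroke resets the timer
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  flush() {
    if (this.timer) clearTimeout(this.timer);
    this.timer = null;
    if (this.buffer) this.onPause(this.buffer);
    this.buffer = "";
  }
}

const captured: string[] = [];
const buf = new KeystrokeBuffer((text) => captured.push(text));
for (const c of "hello") buf.push(c);
buf.flush(); // force the pause for this example
console.log(captured); // ["hello"]
```

Debouncing keeps the backend from receiving a request per keystroke: only one suggestion call goes out per burst of typing.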
```
User presses Alt+X
    |
    v
Electron captures screen (screenshot)
    | HTTP POST /api/interview-copilot
    v
Next.js Backend
    |-- Analyzes screenshot (vision model)
    |-- Generates multi-tab response:
    |     - Idea (problem breakdown)
    |     - Code (implementation)
    |     - Walkthrough (explanation)
    |     - Test Cases (edge cases)
    v
Frontend renders in tabbed UI
```
```
User presses Ctrl+Alt+J
    |
    v
Voice Agent Panel opens
    | Audio recording
    v
Transcribe Service → POST /api/transcribe
    |
    v
POST /api/voice-agent (with tools)
    |-- LLM processes with tool calling
    |-- May execute: memory tools, desktop automation
    |-- POST /api/speech (response TTS)
    v
Audio playback + text display
```
| Component | Platform | Trigger |
|---|---|---|
| Desktop App | GitHub Releases | `pnpm run release` from `frontend/` |
| Next.js Backend | Vercel | Auto-deploy on push to `main` |
| Python Backend | Azure Container Apps | Push to `backend/` on `main` |
| Supabase | Local Docker (dev) / Cloud (prod) | `npx supabase start` |
See the README for detailed setup and deployment instructions.
