The pluggable engine interface (docs/ENGINES.md) means anyone can add a SQLite backend without touching the CLI, MCP server, or skills. This document is the full plan.
Postgres is the right choice for the primary user (7K+ pages, production RAG, zero-ops via Supabase). But a lot of people want something simpler:
- No server. One file.
brain.db. Done. - Git-friendly. You can (with care) commit a SQLite database alongside your notes.
- Offline. Works on a plane, in a coffee shop, wherever.
- Zero cost. No Supabase subscription. No hosting. No API keys for search (keyword-only mode works without OpenAI).
- Portable. Copy the file to another machine. That's it.
Tools like Khoj, Obsidian plugins, and various "local-first AI" projects already use SQLite with vector extensions. The patterns exist. This is well-trodden ground.
Compared to PostgresEngine:
| Feature | Postgres | SQLite | Impact |
|---|---|---|---|
| Full-text search quality | tsvector + ts_rank (excellent) | FTS5 + bm25 (good) | Slightly less precise ranking |
| Fuzzy slug matching | pg_trgm (excellent) | LIKE + Levenshtein (ok) | Fuzzier matching, more false positives |
| Vector search | pgvector HNSW (fast, accurate) | sqlite-vss or vec0 (good enough) | Slower at scale, good for <50K chunks |
| Concurrent access | Connection pooling, many readers/writers | Single writer, many readers | Not an issue for single-user CLI |
| JSONB queries | GIN index, rich operators | json_extract, no index | Slower frontmatter queries |
| Graph traversal | Recursive CTE (native) | Recursive CTE (supported since 3.8.3) | Same |
| Hosted option | Supabase, RDS, etc. | Turso (libSQL), Cloudflare D1 | SQLite has cloud options too |
For a single user with <10K pages and no concurrent access needs, these tradeoffs are fine.
SQLite equivalent of the Postgres schema. Key differences called out.
-- Enable WAL mode for better read concurrency
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;
-- ============================================================
-- pages
-- ============================================================
CREATE TABLE pages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
slug TEXT NOT NULL UNIQUE,
type TEXT NOT NULL,
title TEXT NOT NULL,
compiled_truth TEXT NOT NULL DEFAULT '',
timeline TEXT NOT NULL DEFAULT '',
frontmatter TEXT NOT NULL DEFAULT '{}', -- JSON string, not JSONB
content_hash TEXT, -- SHA-256 for import idempotency
created_at TEXT NOT NULL DEFAULT (datetime('now')),
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_pages_type ON pages(type);
-- ============================================================
-- Full-text search via FTS5 (replaces tsvector)
-- ============================================================
CREATE VIRTUAL TABLE pages_fts USING fts5(
title,
compiled_truth,
timeline,
content='pages',
content_rowid='id',
tokenize='porter unicode61'
);
-- Triggers to keep FTS5 in sync
CREATE TRIGGER pages_fts_insert AFTER INSERT ON pages BEGIN
INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;
CREATE TRIGGER pages_fts_update AFTER UPDATE ON pages BEGIN
INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
INSERT INTO pages_fts(rowid, title, compiled_truth, timeline)
VALUES (new.id, new.title, new.compiled_truth, new.timeline);
END;
CREATE TRIGGER pages_fts_delete AFTER DELETE ON pages BEGIN
INSERT INTO pages_fts(pages_fts, rowid, title, compiled_truth, timeline)
VALUES ('delete', old.id, old.title, old.compiled_truth, old.timeline);
END;
-- ============================================================
-- content_chunks
-- ============================================================
CREATE TABLE content_chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
chunk_text TEXT NOT NULL,
chunk_source TEXT NOT NULL DEFAULT 'compiled_truth',
embedding BLOB, -- Float32Array as raw bytes
model TEXT NOT NULL DEFAULT 'text-embedding-3-large',
token_count INTEGER,
embedded_at TEXT,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_chunks_page ON content_chunks(page_id);
-- Vector search index created separately via sqlite-vss or vec0
-- See "Vector search options" section below
-- ============================================================
-- links
-- ============================================================
CREATE TABLE links (
id INTEGER PRIMARY KEY AUTOINCREMENT,
from_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
to_page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
link_type TEXT NOT NULL DEFAULT '',
context TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(from_page_id, to_page_id)
);
CREATE INDEX idx_links_from ON links(from_page_id);
CREATE INDEX idx_links_to ON links(to_page_id);
-- ============================================================
-- tags
-- ============================================================
CREATE TABLE tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
UNIQUE(page_id, tag)
);
CREATE INDEX idx_tags_tag ON tags(tag);
CREATE INDEX idx_tags_page_id ON tags(page_id);
-- ============================================================
-- raw_data
-- ============================================================
CREATE TABLE raw_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
source TEXT NOT NULL,
data TEXT NOT NULL, -- JSON string
fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(page_id, source)
);
CREATE INDEX idx_raw_data_page ON raw_data(page_id);
-- ============================================================
-- timeline_entries
-- ============================================================
CREATE TABLE timeline_entries (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
date TEXT NOT NULL, -- ISO date string
source TEXT NOT NULL DEFAULT '',
summary TEXT NOT NULL,
detail TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_timeline_page ON timeline_entries(page_id);
CREATE INDEX idx_timeline_date ON timeline_entries(date);
-- ============================================================
-- page_versions
-- ============================================================
CREATE TABLE page_versions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
page_id INTEGER NOT NULL REFERENCES pages(id) ON DELETE CASCADE,
compiled_truth TEXT NOT NULL,
frontmatter TEXT NOT NULL DEFAULT '{}',
snapshot_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_versions_page ON page_versions(page_id);
-- ============================================================
-- ingest_log
-- ============================================================
CREATE TABLE ingest_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_type TEXT NOT NULL,
source_ref TEXT NOT NULL,
pages_updated TEXT NOT NULL DEFAULT '[]', -- JSON array
summary TEXT NOT NULL DEFAULT '',
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- ============================================================
-- config
-- ============================================================
CREATE TABLE config (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
INSERT INTO config (key, value) VALUES
('version', '1'),
('engine', 'sqlite'),
('embedding_model', 'text-embedding-3-large'),
('embedding_dimensions', '1536'),
('chunk_strategy', 'semantic');| Feature | Postgres | SQLite |
|---|---|---|
| Types | SERIAL, TIMESTAMPTZ, JSONB, vector(1536) |
INTEGER, TEXT, TEXT (JSON), BLOB |
| Full-text search | tsvector generated column + GIN |
FTS5 virtual table + triggers |
| Vector storage | vector(1536) column type |
BLOB (raw Float32Array bytes) |
| Vector index | HNSW via pgvector | Separate via sqlite-vss or vec0 |
| Fuzzy search | pg_trgm GIN index |
LIKE queries or Levenshtein UDF |
| JSON queries | JSONB + GIN index |
json_extract() function |
| Timestamps | TIMESTAMPTZ (native) |
TEXT with ISO format |
Two main choices for vector search in SQLite:
-- Load extension
.load ./vector0
.load ./vss0
-- Create virtual table linked to content_chunks
CREATE VIRTUAL TABLE chunks_vss USING vss0(
embedding(1536)
);
-- Insert embeddings (linked by rowid to content_chunks)
INSERT INTO chunks_vss(rowid, embedding)
SELECT id, embedding FROM content_chunks WHERE embedding IS NOT NULL;
-- Search
SELECT rowid, distance
FROM chunks_vss
WHERE vss_search(embedding, :query_embedding)
LIMIT 20;Pros: mature, well-documented, used by many projects. Cons: requires loading native extensions (platform-specific binaries).
-- Create virtual table
CREATE VIRTUAL TABLE chunks_vec USING vec0(
chunk_id INTEGER PRIMARY KEY,
embedding float[1536]
);
-- Search
SELECT chunk_id, distance
FROM chunks_vec
WHERE embedding MATCH :query_embedding
ORDER BY distance
LIMIT 20;Pros: simpler API, better integration with SQLite ecosystem. Cons: newer, less battle-tested.
For users who don't want to deal with vector extensions or OpenAI API keys, the brain still works with keyword search only. FTS5 + bm25 is genuinely good for structured wiki content where you know the terms. searchVector returns [], hybrid search degrades gracefully to keyword-only.
This is a valid configuration. Not everyone needs embeddings.
gbrain init --sqlite
# or: gbrain init --sqlite --path ~/brain.db
# 1. Create database file at specified path (default: ~/.gbrain/brain.db)
# 2. Run schema (all CREATE TABLE + FTS5 + triggers)
# 3. Write config to ~/.gbrain/config.json:
# { "engine": "sqlite", "database_path": "~/.gbrain/brain.db" }
# 4. Verify database is ready for import
# 5. "Brain ready. Run: gbrain import <your-repo>"No Supabase account needed. No API keys needed (keyword-only mode). No server. Just a file.
For vector search, the user additionally needs:
- OpenAI API key in
~/.gbrain/config.jsonorOPENAI_API_KEYenv var - sqlite-vss or vec0 extension binary for their platform
Postgres uses pg_trgm GIN index for fast fuzzy matching. SQLite doesn't have this. Options:
- LIKE with wildcards.
WHERE slug LIKE '%dont%scale%'. Simple, works for partial matches, but no ranking. - Levenshtein distance via UDF. Load a user-defined function (or implement in TS) that computes edit distance. Sort by distance. Slower but more accurate.
- Trigram simulation in TS. Compute trigrams in TypeScript, store in a separate table, query by trigram overlap. Fast but requires maintaining the trigram index.
Recommendation: start with LIKE + fallback to Levenshtein UDF. Good enough for single-user, <10K pages.
If you're building this, here's the order:
src/core/sqlite-engine.tsimplementingBrainEngine- Schema migration (the SQL above)
- CRUD operations (getPage, putPage, listPages, deletePage). Straightforward SQL.
- FTS5 keyword search (searchKeyword). Map
websearch_to_tsquerysemantics to FTS5 query syntax. - Tags, links, timeline, raw_data, versions, config, ingest_log. All straightforward.
- Graph traversal. SQLite supports recursive CTEs since 3.8.3. Port the Postgres CTE with max depth.
- Vector search (optional). Pick sqlite-vss or vec0, implement searchVector.
- Tests. Port the Postgres test suite. Most tests should be engine-agnostic.
Steps 1-6 are purely mechanical. Step 7 is the only one that requires a native extension.
{
"better-sqlite3": "^11.0.0"
}Or use Bun's built-in bun:sqlite driver (zero dependency).
For vector search, add one of:
sqlite-vss(native extension, platform-specific)vec0(native extension, platform-specific)
Most test cases should be engine-agnostic. The test runner should parameterize by engine:
const engines = [
{ name: 'postgres', factory: () => new PostgresEngine() },
{ name: 'sqlite', factory: () => new SQLiteEngine() },
];
for (const { name, factory } of engines) {
describe(`BrainEngine (${name})`, () => {
const engine = factory();
test('putPage + getPage round-trip', async () => {
await engine.putPage('test/slug', { title: 'Test', type: 'person', ... });
const page = await engine.getPage('test/slug');
expect(page.title).toBe('Test');
});
// ... all CRUD, search, link, tag, timeline tests
});
}Search tests may need engine-specific assertions (ranking differences between tsvector and FTS5 are expected). But the interface contract (returns SearchResult[], sorted by relevance) should hold across engines.
brain.db # ~750MB for 7K pages with embeddings
# ~150MB without embeddings (keyword-only)
~/.gbrain/config.json # { "engine": "sqlite", "database_path": "..." }
That's it. One file for the brain. One file for config.
Future work: gbrain migrate --from postgres --to sqlite (and vice versa). The engine interface makes this straightforward... export all data via one engine's methods, import via the other's. The data model is the same, only the storage format changes.
This is not built yet. For now, gbrain export to markdown and gbrain import into the other engine achieves the same result (with re-chunking and re-embedding).
If you want to build this:
- Fork the repo
- Create
src/core/sqlite-engine.ts - Use the schema from this document
- Run the existing test suite against your engine
- PR it
The interface is well-defined. The schema is documented. The test suite exists. This should be a few days of focused work with CC, or a weekend project for a human.
We'd love to see it.