Skip to content

brain-db-io/brain-db

brain — a cognitive substrate for AI agents

Brain

A cognitive database for AI agents — vector memory, a typed knowledge graph, and hybrid retrieval in one substrate. Cognitive operations (encode, recall, plan, reason, forget) are the primitives; entities, statements, and relations are the secondary layer that activates when a schema is declared.

Status: Specification complete — 32 sections covering both the substrate (§00–§16) and the knowledge layer (§17–§31). Implementation is phased: substrate phases 0–12 shipped; phases 13–14 (benchmarks + substrate acceptance) close the substrate; phases 15–24 deliver the knowledge layer. The v1.0.0 tag lands at the end of Phase 24, after the combined acceptance suite passes.

// 1. Substrate-only: encode an experience, recall what's relevant.
brain.encode("Had a difficult conversation with Alex about the project").await?;
let memories = brain.recall("conflicts with Alex").top_k(5).await?;
//   → ranked by semantic similarity, edge proximity, temporal recency,
//     and salience — not just vector distance.

// 2. Declare a schema; the knowledge layer comes alive.
brain.schema_upload(include_str!("acme.brain")).await?;
brain.encode("Priya kicked off the billing rewrite project today").await?;
//   → background extractors create: entity Priya, entity Project(billing rewrite),
//     statement Event(Priya, kicked_off, billing rewrite), relation Priya→Project.

// 3. Hybrid query: semantic + lexical + graph retrieval, RRF-fused.
let results = brain.query()
    .text("what's Priya working on?")
    .filter_subject_entity("priya")
    .top_k(10)
    .await?;

Table of contents


What Brain is

Brain is to AI agents what SQL is to applications: a substrate where the application says what it wants cognitively and the substrate handles how.

Today, building memory for an LLM agent typically looks like: glue a vector database for similarity, a graph database for relationships, a full-text store for keyword matching, an LLM extraction pipeline, an embedding service, a key-value store for state, a queue for async events, and write a thousand lines of orchestration to keep them in sync. Half of that orchestration is reinventing transaction semantics across systems that don't agree on what "committed" means.

Brain collapses that stack into one substrate:

  • A memory is the atomic record — text, embedding, salience, time, edges, provenance — stored once and queried many ways.
  • A schema (optional) declares the entity types, predicates, and relation types the user cares about. The substrate runs three tiers of extractors over incoming memories and surfaces typed entities, statements (Fact / Preference / Event), and relations.
  • Recall isn't top-k by cosine — it's similarity blended with recency, salience, edge proximity, and typed filters, with a hybrid retriever that fuses semantic + lexical + graph ranks via RRF.

Why this exists

Three observations drove the design:

  1. Agents need cognitive operations, not storage operations. "Remember this," "what's relevant?", "plan a path," "why do we think X?" map awkwardly onto INSERT / SELECT / JOIN. They map cleanly onto verbs that already carry the semantics: encode, recall, plan, reason, forget.

  2. The data has structure, but it's latent. An agent's stream of observations contains people, projects, decisions, preferences — but a vector database flattens that to similarity scores. A property graph captures the structure but doesn't help you find anything by meaning. You need both, with a query layer that can join them.

  3. Predictable tail latency is non-negotiable. Agents call the memory system on every turn. A p99 spike at 200 ms turns into perceived agent latency of seconds. Brain is built thread-per-core (Glommio + io_uring), with WAL group-commit, lock-free reads, and a single-writer-per-shard discipline — for tail-latency reasons, not throughput.

The two layers

Brain ships in two layers that share one shard, one storage system, and one wire protocol:

                           ┌───────────────────────────────────────────┐
                           │  KNOWLEDGE LAYER   (activates on schema)  │
                           │                                            │
   Layer 3:  STATEMENTS    │  Facts, Preferences, Events                │
                           │  (typed claims, with provenance + conf.)   │
                           │           ▲                                 │
                           │           │ derived from                    │
   Layer 2:  ENTITIES +    │           │                                 │
             RELATIONS     │  Canonical nouns, typed edges               │
                           │           ▲                                 │
                           │           │ references / anchored to        │
                           └───────────│─────────────────────────────────┘
                           ┌───────────│─────────────────────────────────┐
   Layer 1:  MEMORIES      │           │                                 │
             (substrate)   │  Raw episodic / semantic / consolidated     │
                           │  memories, embedded, indexed in HNSW        │
                           │                                              │
                           │           SUBSTRATE  (always active)         │
                           └───────────────────────────────────────────┘
  • Substrate-only mode (no schema declared): Brain runs as a pure vector memory store. The wire surface is the 8 cognitive primitives. Knowledge-layer tables exist on disk but are empty; knowledge-layer workers don't run. This is a first-class deployment posture, not a legacy mode.

  • Schema-declared mode: a SCHEMA_UPLOAD activates the knowledge layer. Background extractors (pattern → classifier → LLM) run over incoming memories. Entities, statements, and relations populate the knowledge-layer tables and indexes. RECALL transparently routes through the hybrid retriever; the typed knowledge SDK becomes useful.

A deployment can move in either direction. Declaring a schema after months of substrate-only use kicks off a backfill. Stopping schema operations returns the substrate to vector-only mode without losing knowledge data.

The cognitive primitives

Eight verbs at the substrate level (spec §09):

Verb What it does
ENCODE Store an experience. Brain embeds the text, picks a slot in the arena, writes the WAL record, updates redb metadata, inserts into HNSW, and (if a schema is declared) queues extractors.
RECALL Find memories relevant to a cue. Ranks by similarity blended with salience, recency, and edge proximity. With a schema declared, routes through the hybrid retriever.
PLAN Construct a path from one cognitive state to another. Pull-based executor with budgets (steps, wall time, branches).
REASON Multi-hop traversal explaining why X is connected to Y. Returns the path, the evidence memories, and the confidence.
FORGET Soft (mark as forgotten; grace period) or hard (zero the slot immediately) tombstoning. Cascades to derived knowledge-layer records when a schema is active.
LINK / UNLINK Manually assert / retract a typed edge between two memories.
SUBSCRIBE Stream events: memory created, statement created, extractor failed, schema updated, etc.
use brain_sdk::{Client, EdgeKind, ForgetMode, PlanState, PlanBudget};

let brain = Client::connect("127.0.0.1:7860", "mind-agent-1").await?;

// 1. Encode a memory; capture an edge to the previous turn.
let memory = brain
    .encode("Alex pushed the deadline to next Friday")
    .edge(EdgeKind::FollowedBy, prev_turn_id)
    .await?;
println!("stored as {}, salience {:.2}", memory.id, memory.salience);

// 2. Later: recall everything related to deadlines and Alex.
let mut results = brain.recall("when did Alex change the deadline?")
    .top_k(5)
    .include_edges(true)
    .stream()
    .await?;
while let Some(r) = results.next().await {
    println!("{:.2}  {}", r.confidence, r.text);
}

// 3. Plan from "current sprint" to "shipping the feature":
let mut plan = brain.plan(
        PlanState::ByText("current sprint state".into()),
        PlanState::ByText("feature shipped".into()),
    )
    .budget(PlanBudget { max_steps: 8, max_wall_time_ms: 1_000, max_branches_explored: 50 })
    .stream()
    .await?;
while let Some(step) = plan.next().await {
    println!("step {}: {}", step.step_index, step.text);
}

// 4. Soft-forget when something becomes private:
brain.forget(memory.id, ForgetMode::Soft).await?;

The knowledge layer

When a schema is declared, the substrate exposes typed cognition. Three layers, ten decisions, one set of guarantees.

The decisions (spec §17–§31):

  1. Property graph, not RDF. Operational knowledge graphs converged on property graphs; RDF reification is too expensive for fact-with-metadata workloads.
  2. Three statement kinds, shared storage. Fact / Preference / Event are distinct in the API (different mutation rules) but one table with a kind discriminator. Cross-kind queries work; per-kind queries are fast.
  3. Three retrievers + three filters + RRF fusion. Semantic (HNSW) + Lexical (tantivy BM25) + Graph (entity-joined) for retrieval; Type + Temporal + Confidence for filtering; Reciprocal Rank Fusion (k=60) for combining ranks.
  4. Three extractor tiers. Pattern (regex, free) → Classifier (small bundled model, cheap) → LLM (expensive, capable). Tiered fallback keeps cost manageable.
  5. Declarative schema DSL. Users declare entity types, relation types, predicates, and extractor bindings. The substrate picks the right extractor tier per declaration. Schema is versioned; data migrates.
  6. Valid time only (v1). valid_from / valid_to on statements and relations. Transaction-time "as of" queries are deferred.
  7. Confidence is first-class. Every derived record has confidence in [0,1] and an evidence list. Contradictions surface, don't auto-resolve.
  8. Schema is optional. No-schema runs as a pure vector substrate (real deployment posture, not legacy).
  9. Single node, single schema namespace. No federation, no multi-tenant schemas — one Brain, one knowledge graph.
  10. No silent failures. Schema-validated outputs only; ambiguous resolutions queue for review; contradictions surface; FORGET cascades audited.

The wire surface (spec §28) adds ~30 opcodes on top of the substrate's primitives:

Range Group Examples
0x20–0x2F Schema SCHEMA_UPLOAD, SCHEMA_GET, EXTRACTOR_LIST, EXTRACTOR_DISABLE
0x30–0x3F Entities ENTITY_CREATE, ENTITY_RESOLVE, ENTITY_MERGE, ENTITY_RENAME
0x40–0x4F Statements STATEMENT_CREATE, STATEMENT_SUPERSEDE, STATEMENT_HISTORY
0x50–0x5F Relations RELATION_CREATE, RELATION_TRAVERSE (1–3 hop), RELATION_LIST_FROM
0x60–0x6F Query QUERY (hybrid), QUERY_EXPLAIN, QUERY_TRACE, RECALL_HYBRID
0x70–0x7F Admin ADMIN_BACKFILL, ADMIN_LIST_PENDING_RESOLUTIONS, ADMIN_LIST_STALE_STATEMENTS

End-to-end (docs/development/usage/practical-guide.md walks through this in full):

ENCODE "Priya kicked off the billing rewrite today."
  │
  ▼  Substrate writes memory + embedding + WAL + HNSW.
  │
  ├─→ Pattern extractor (sync, regex):
  │     "Priya" → entity_resolver: tier 1 exact hit on existing entity_42
  │     "billing rewrite" → tier 2 fuzzy → create new entity_71 (Project)
  │
  ├─→ Classifier extractor (NER, near-foreground): confirms Person + Project tags.
  │
  ├─→ LLM extractor (async worker, cached): produces Event(Priya, kicked_off,
  │     billing rewrite, event_at=today, confidence=0.91, evidence=[memory_id]).
  │
  ▼
Statement and Relation written; tantivy + statement HNSW indexed.
Audit entries written. Subscribers notified.

LATER: QUERY "What's Priya working on right now?"
  │
  ▼  Query router classifies: known entity "Priya" → graph lane; known
  │  predicate-shape "working on" → type filter (Event/Fact).
  │
  ├─→ GraphRetriever: statements where subject=entity_42, status=current
  ├─→ SemanticRetriever (HNSW): statements similar to query text
  └─→ LexicalRetriever (tantivy BM25): statements with matching tokens
  │
  ▼  RRF fusion (k=60), confidence filter ≥ 0.5, temporal filter (last 30 days)
  │
  ▼
Result: top-N statements with provenance trail back to source memories.

End-to-end: what brain.encode("…") actually does

sequenceDiagram
    autonumber
    participant App as Agent app
    participant SDK as brain-sdk
    participant Conn as Connection layer<br/>(Tokio)
    participant Shard as Shard executor<br/>(Glommio)
    participant Embed as Embedding<br/>(BGE-small via candle)
    participant WAL as WAL<br/>(O_DIRECT + RWF_DSYNC)
    participant Arena as Arena<br/>(mmap'd 1600-byte slots)
    participant Meta as Metadata<br/>(redb)
    participant Index as ANN index<br/>(HNSW in RAM)
    participant Ext as Extractor pipeline<br/>(if schema declared)

    App->>SDK: encode("Alex said the deadline...")
    SDK->>SDK: build ENCODE_REQ frame<br/>(rkyv payload + 32B header + CRC)
    SDK->>Conn: TCP write (single frame, EOS)
    Conn->>Conn: validate frame, lookup shard<br/>by agent then shard_id
    Conn->>Shard: enqueue(EncodeRequest)
    Shard->>Embed: embed(text)
    Embed-->>Shard: f32 vector dim 384, L2-normalized
    Shard->>Shard: idempotency, dedup on RequestId
    Shard->>WAL: append(EncodeMemory record)
    WAL->>WAL: pwritev2(RWF_DSYNC)<br/>group commit
    WAL-->>Shard: Lsn (durable)
    Shard->>Arena: write slot(version, vector, metadata)<br/>plus slot CRC
    Shard->>Meta: redb txn,<br/>memory plus edges plus idempotency
    Meta-->>Shard: txn committed
    Shard->>Index: insert(memory_id, vector)
    Shard-->>Conn: ENCODE_RESP(memory_id, salience, was_dedup)
    Conn-->>SDK: TCP write
    SDK-->>App: Memory { id, salience, ... }
    Note over Shard,Ext: After response: pattern extractors run sync;<br/>classifier + LLM run as background workers.
    Shard-)Ext: dispatch (if schema declared)
Loading

The order matters. Step 5 (WAL fsync) is what makes the operation durable; the response is not sent until that fsync returns. Steps 6–9 happen after but before the response — if any of them fail, the WAL record is the source of truth and recovery replays it. This is the "WAL-before-acknowledge" invariant. Knowledge-layer extraction runs after the response is sent, so substrate-write latency is unaffected.

Architecture

System layers

flowchart TB
    SDK1[Agent A]
    SDK2[Agent B]
    SDK3[Agent C]

    SDK1 -->|TCP+TLS| Conn
    SDK2 -->|TCP+TLS| Conn
    SDK3 -->|TCP+TLS| Conn

    subgraph Server[Brain server · Linux]
        Conn[Connection layer<br/>Tokio · accept · TLS · frame validate · dispatch]

        subgraph S0[Shard 0 · Glommio · thread-per-core]
            S0Arena[Arena<br/>mmap]
            S0Wal[WAL]
            S0Meta[redb]
            S0Hnsw[HNSW · memory]
            S0Ehnsw[HNSW · entity]
            S0Shnsw[HNSW · statement]
            S0Tantivy[tantivy · memory + statement]
        end
        subgraph S1[Shard 1 · Glommio]
            S1All[Same per-shard set]
        end
        subgraph SN[Shard N · Glommio]
            SNAll[Same per-shard set]
        end

        Conn -->|message channel| S0
        Conn -->|message channel| S1
        Conn -->|message channel| SN

        Workers[Background workers<br/>decay · consolidation · HNSW maint · GC<br/>+ extractors · sweepers · migrations]
        Workers -.-> S0
        Workers -.-> S1
        Workers -.-> SN
    end
Loading

Two runtimes, one host:

  • Connection layer runs on Tokio. Many lightweight tasks: accept TCP, terminate TLS, decode the 32-byte frame header, validate, dispatch to the right shard via a bounded message channel. Tokio is great here — many tasks, varied shapes, async I/O.
  • Shard layer runs on Glommio. Thread-per-core, io_uring, single-task-per-shard for the writer. Each shard owns its files, its indexes, its caches. No cross-shard locks. Reads use ArcSwap + crossbeam-epoch for lock-free hot paths.

The two layers communicate via channels carrying messages (plain Send structs). Per-shard data never crosses the boundary; the connection task hands off an EncodeRequest, the shard hands back an EncodeResponse.

Sharding is by agent (AgentId → ShardId), so every operation for an agent goes to one shard. That makes the shard's discipline easy to reason about: one writer, no locks needed, no cross-shard coordination on the hot path.

Background workers keep the substrate healthy: salience decay over time, consolidation (multiple similar memories → one summary), HNSW link maintenance, slot reclamation past the tombstone grace window, idempotency-table TTL expiry, and so on. When a schema is declared, additional workers run: pattern + classifier + LLM extractors, the statement-embedding worker, the supersession sweeper, the backfill worker, the schema-migration runner. Workers run as their own Glommio tasks, scheduled around the per-shard writer with priorities (spec §11 + §27).

Inside a shard

flowchart LR
    Op[Op handler<br/>encode · recall · forget · query · ...]

    subgraph storage[Storage layer]
        WAL[WAL<br/>O_DIRECT + pwritev2<br/>+ RWF_DSYNC group commit]
        Arena[Arena<br/>mmap'd 1600 B slots<br/>per-slot CRC32C + version]
    end

    subgraph metadata[Metadata]
        Redb[redb B-tree<br/>memory · edges · context<br/>idempotency · tombstones<br/>+ entities · statements · relations<br/>+ predicates · audits]
        LLMCache[LLM cache redb<br/>extractor responses · TTL]
    end

    subgraph index[Indexes]
        Hnsw[HNSW · memory<br/>M=16 · ef=200/100<br/>cosine · ArcSwap-published]
        Ehnsw[HNSW · entity<br/>M=16 · ef=100/64]
        Shnsw[HNSW · statement<br/>M=32 · ef=200/128]
        Tant[tantivy · memory_text<br/>tantivy · statement_text<br/>BM25]
    end

    Op -->|1. fsync first| WAL
    WAL -->|2. then slot| Arena
    Op -->|3. metadata commit| Redb
    Op -->|4. index update| Hnsw

    Op -.->|ANN| Hnsw
    Op -.->|graph| Redb
    Op -.->|lexical| Tant
    Op -.->|entity res| Ehnsw
    Op -.->|stmt sem| Shnsw
    Op -.->|LLM extract| LLMCache
Loading

Six data structures per shard, each pinned to a spec section:

  • Arena (spec/05/02) — memory-mapped file of 1600-byte slots. Each slot is 1536 bytes of f32 vector (384 × 4) plus 64 bytes of metadata (kind, salience, timestamps, slot version, CRC). The file has a 4 KiB header recording the shard UUID, format version, slot count, embedding-model fingerprint, and a header CRC. Vectors are little-endian on disk.
  • WAL (spec/05/04..08) — append-only log of operations, one segment per ~64 MiB. Writes use O_DIRECT for predictable latency and pwritev2(RWF_DSYNC) for durable fsync; the WAL also batches concurrent writes via group commit so N pending records share one fsync. Recovery replays from the last checkpoint forward, tolerating torn-tail (the last record may be partial; we stop there). Knowledge-layer additions add frame types 0x10..0x50 for entity / statement / relation / schema / audit records (spec §26).
  • redb (spec/07 + spec/26) — embedded B-tree for metadata. Substrate tables: text bodies, edge lists, context names, idempotency dedupe, tombstones. Knowledge-layer tables: entities (+ aliases, trigrams, mentions), statements (+ chain, indexes by subject/predicate/object/event-time), relations (+ direction indexes), predicates, entity types, relation types, extractors, schema versions, audits, merge log. ACID transactions wrap multi-table writes from a single op.
  • HNSW (memory) (spec/06) — Hierarchical Navigable Small World index for ANN search over memory vectors. M=16, ef_construction=200, cosine distance over L2-normalized 384-dim vectors. Held in RAM; persisted incrementally; published to readers via ArcSwap.
  • HNSW (entity + statement) (spec §26) — smaller HNSW for entity embeddings (used by the entity resolver) and a separate HNSW for statement embeddings (used by the semantic retriever to find statements similar to a query).
  • tantivy (spec §26) — two per-shard BM25 indexes: one over memory text, one over statement text representations. Backs the lexical retriever in hybrid queries.

The discipline that makes this work without per-record locking:

  1. Single writer per shard. Only one task mutates shard state. Other Glommio tasks may read via ArcSwap-published snapshots (HNSW) or via the mmap (arena, redb). The writer never blocks on a lock because there's nobody to lock against.
  2. WAL-before-ack. Step 1 in the diagram: the WAL append + fsync happens before any other store is touched. If the process dies at step 2, the WAL record on restart drives the rest.
  3. CRC on every slot, every record. Two checksums: arena slot CRC32C catches in-flight slot corruption; WAL record CRC32C catches log corruption. Both halt the shard on mismatch — never overwrite a stored CRC.

The wire protocol

Brain ships a custom binary protocol over TCP (with optional TLS). The 32-byte frame header is fixed; the payload is rkyv-encoded structured data plus optional raw f32 vector blobs.

 Frame header (32 bytes, big-endian)
+--------+--------+--------+--------+
| magic = "BRN0"                     |
+--------+--------+--------+--------+
| ver(1) | op(1)  | flags  (2)      |
+--------+--------+--------+--------+
| header_crc32c (4)                  |
+--------+--------+--------+--------+
| stream_id (4)                      |
+--------+--------+--------+--------+
| payload_len (3) | reserved(1)      |
+--------+--------+--------+--------+
| payload_crc32c (4)                 |
+--------+--------+--------+--------+
| reserved (8)                       |
+--------+--------+--------+--------+

 Payload layout
+----------------------------------+
| rkyv-encoded body                 |  — request/response/event struct
+----------------------------------+
| padding (0–3 bytes)               |  — align next section to 4 bytes
+----------------------------------+
| raw f32 vectors (N × 1536 bytes)  |  — optional; bytemuck::cast_slice<u8, f32>
+----------------------------------+

The opcode space is laid out by group (spec §03/05 + §28):

0x00–0x0F   reserved
0x10–0x1F   substrate primitives (ENCODE, RECALL, PLAN, REASON, FORGET, LINK, UNLINK, TXN_*, SUBSCRIBE, …)
0x20–0x2F   schema operations
0x30–0x3F   entity operations
0x40–0x4F   statement operations
0x50–0x5F   relation operations
0x60–0x6F   query operations (hybrid retrieval, EXPLAIN, TRACE)
0x70–0x7F   admin operations (backfill, audit, pending resolutions)

Validation is layered (spec/03/11): frame-level (magic, version, CRC, length), payload-level (rkyv structural validation, vector norm checks), and operation-level (per-opcode field constraints).

Errors come back as a typed ERROR frame with a category (Protocol, Authentication, Validation, NotFound, Conflict, ResourceExhausted, Internal, Unavailable) and a stable code drawn from the §10 error table plus the knowledge-layer codes in §28. The category drives the SDK's retry policy.

The seven invariants

Non-negotiable rules. Code that violates them is wrong, regardless of test results.

# Invariant What it prevents
1 WAL-before-acknowledge. No operation returns success until its WAL record is fsynced. Lost writes after a crash.
2 Single writer per shard. No locks needed; the discipline enforces it. Lock contention on the hot path; two-writer races.
3 CRC everywhere. Every WAL record, every arena slot. Reads verify; mismatches halt. Silent corruption from bad disk / memory / cosmic ray.
4 Slot version on MemoryId. Encoded in the ID; stale references → NotFound. Reading the wrong memory after slot reuse.
5 Idempotency by RequestId. 24h TTL. Same params → cached response. Different params → Conflict. Duplicate effects on retry.
6 Tombstone grace before reclamation. Default 7 days. Hard FORGET zeroes immediately. Surprise: data still recoverable when soft-forgotten / data lingers when hard-forgotten.
7 No silent corruption. Fail-stop and alert. Never return wrong data. Trusting outputs that may be wrong; quietly papering over bit rot.

Tested per spec/16/06_durability_criteria.md. The random-kill recovery test exercises 1, 2, 3, 5, and 7 directly; the GC tests cover 4 and 6.

Latency targets

Hard targets from spec/16/02_latency_targets.md. Single-shard, 1M memories, mixed workload, 100 concurrent clients, reference hardware (16-core x86_64 / 64 GB RAM / NVMe SSD):

Operation p50 p95 p99 p99.9
ENCODE 8 ms 15 ms 25 ms 50 ms
RECALL (K=10, no text) 5 ms 12 ms 20 ms 40 ms
RECALL (K=10, with text) 7 ms 18 ms 30 ms 60 ms
PLAN (depth 3) 4 ms 10 ms 18 ms 35 ms
REASON (depth 3) 8 ms 20 ms 35 ms 70 ms
FORGET 3 ms 8 ms 15 ms 30 ms
LINK / UNLINK 2 ms 5 ms 10 ms 20 ms

These are MUST targets for v1.0. Brain optimizes for predictable tails, not minimum averages — a p50 of 5 ms with a p99 of 20 ms is preferred over a p50 of 2 ms with a p99 of 80 ms.

Tech stack

Component Crate Why
Async runtime (shards) glommio Thread-per-core, io_uring, no work-stealing — predictable per-core latency.
Async runtime (connection layer) tokio Many tasks, varied shapes, mature ecosystem.
Wire encoding rkyv + bytemuck Zero-copy structured deserialization (rkyv) + raw vector slice access (bytemuck::cast_slice).
Metadata store redb Pure-Rust ACID B-tree; embeddable; no external services.
ANN index hnsw_rs HNSW; battle-tested parameters (M=16, ef=200/100 for memory; tuned variants for entity / statement).
Lexical index tantivy Pure-Rust BM25 + tokenizer; backs the lexical retriever (phase 22).
Embedding candle family + tokenizers Pure-Rust inference; BGE-small-en-v1.5; substrate-owned.
SIMD math matrixmultiply + wide Cosine distance kernel; portable AVX2 / NEON fallbacks.
Lock-free swap arc-swap Cross-shard read snapshots without locking.
Epoch GC crossbeam-epoch Safe memory reclamation for lock-free reads.
CRC crc32c iSCSI Castagnoli polynomial; hardware-accelerated.
UUIDs uuid (v7) Time-ordered IDs; agents/contexts/requests/transactions, plus entities/statements/relations.
Errors thiserror (libs) + anyhow (bins) Typed errors at boundaries, ergonomic at the top.
Telemetry tracing + opentelemetry Spans, structured fields, OTel export.
HTTP transport hyper 1.x + tokio-tungstenite HTTP/1.1 + HTTP/2 + WebSocket + SSE for the admin / web surface.

Deps are pinned in the workspace Cargo.toml; new ones require commit-message justification.

Implementation status

The specification is complete — 32 sections, 17 substrate (§00–§16) + 15 knowledge layer (§17–§31). Implementation is phased.

Substrate (phases 0–14)

Phase Scope Status
0 Workspace skeleton, CI phase-0-complete
1 Wire protocol & core types phase-1-complete
2 Storage: arena + WAL + recovery phase-2-complete
3 Metadata + redb integration phase-3-complete
4 ANN index (HNSW) phase-4-complete
5 Embedding service phase-5-complete
6 Query planner phase-6-complete
7 Cognitive operations phase-7-complete
8 Background workers phase-8-complete
9 Server binary (Tokio + Glommio wiring) phase-9-complete
10 Rust SDK + admin CLI phase-10-complete
11 brain-http (HTTP/WS/SSE transport) phase-11-complete
12 Observability (metrics / logs / tracing / dashboards / alerts) phase-12-complete
13 Benchmarks & chaos planned
14 Substrate acceptance & v0.9.x-substrate-rc planned

Knowledge layer (phases 15–24)

Activates when a schema is declared. Substrate-only deployments are unaffected.

Phase Scope Status
15 Knowledge storage extensions (tables, WAL frames, indexes, flags) planned
16 Entity layer (resolver tiers 1–3, entity HNSW) planned
17 Statement layer (Fact / Preference / Event, supersession, contradictions) planned
18 Relation layer (cardinality, symmetry, 1–3 hop traversal) planned
19 Schema DSL (parser, validator, versioning, migration plan) planned
20 Pattern + classifier extractors (regex + bundled NER) planned
21 LLM extractor (cache, retry, cost budget, resolver tier 4) planned
22 Tantivy / lexical retrieval planned
23 Hybrid query engine (router, RRF fusion, filter chain, EXPLAIN/TRACE) planned
24 Sweepers, knowledge acceptance & v1.0.0 planned

See ROADMAP.md for the phase index and docs/development/phases/ for per-phase sub-task breakdowns. The dependency DAG for the knowledge-layer phases lives in docs/development/phases/README.md.

For a hands-on walkthrough of every feature in context (substrate primitives, schema declaration, extractors, hybrid query, FORGET cascade), see docs/development/usage/practical-guide.md.

Development environment

Linux x86_64 / aarch64, kernel ≥ 5.15. Brain depends on io_uring, O_DIRECT, pwritev2(RWF_DSYNC), and a few Linux-only madvise / fallocate flags — see spec/01_system_architecture/05_hardware.md §1.1 for why we chose a single-platform backend over portable shims.

What compiles where

Crate Linux macOS / Windows native
brain-core ✓ (pure value types)
brain-protocol ✓ (codec + DSL parser)
brain-cli ✓ (no runtime dep yet)
brain-sdk-rust ✓ (client-side)
brain-http ✓ (hyper-based; client side works)
brain-planner ✓ (pure logic)
brain-storage ✗ — compile_error!
brain-metadata ✗ (redb with O_DIRECT-aware paths)
brain-index ✗ (HNSW + tantivy persistence)
brain-embed ✗ (candle wiring)
brain-ops △ partial — wires runtime crates
brain-workers ✗ — runs on Glommio
brain-extractors ✗ (phase 20)
brain-llm ✗ (phase 21)
brain-server ✗ — Glommio + Tokio runtime

CI (.github/workflows/ci.yml) runs everything on ubuntu-latest and is the authoritative test gate.

Native Linux

rustup toolchain install stable
rustup component add rustfmt clippy
just verify        # fmt + build + clippy + test

macOS / Windows — dev container (recommended)

The repo ships a .devcontainer/ config. With Docker, OrbStack, or Colima running:

just shell        # builds the image on first run, drops you into bash
# inside the container:
just verify

Editor integration: VS Code, Cursor, JetBrains, and GitHub Codespaces auto-detect .devcontainer/devcontainer.json. Use "Reopen in Container."

Container runtime notes

Runtime macOS recommendation
OrbStack Recommended on Apple Silicon. Permissive seccomp; works with io_uring.
Docker Desktop Works, but 4.42+ needs a custom seccomp profile or --security-opt seccomp=unconfined to allow io_uring.
Colima Works similarly to Docker Desktop; uses Lima/QEMU.

Container caveats

  • O_DIRECT against bind-mounted source. macOS-hosted Docker mounts via VirtioFS / 9P; these may not support O_DIRECT. Storage tests should write under /tmp (tmpfs) or a named-volume path inside the container — not /workspaces/brain (the bind mount).
  • First build is slow (downloads ~100 crates). Persistent named volumes (brain-cargo-registry, brain-cargo-git, brain-target-cache) keep the cache across container restarts.

macOS / Windows — cross-compile-only

Validates compilation without running. Pair with the dev container (or CI) for actual tests.

rustup target add x86_64-unknown-linux-gnu
brew install lld   # or: cargo install --locked cargo-zigbuild

cargo check --workspace --target x86_64-unknown-linux-gnu

Common commands

just verify                                                       # full verify: fmt + build + clippy + test
cargo test -p brain-protocol                                      # per-crate (cross-platform crates)
cargo run --bin brain-server -- --config config/dev.toml          # Linux only
cargo run --bin brain-cli -- stats                                # cross-platform
cargo +nightly fuzz run protocol_frame -- -max_total_time=60      # nightly + Linux
just shell                                                        # enter the dev container (Docker required)

A step-by-step setup, CLI tour, SDK tour, and troubleshooting walkthrough lives under docs/development/usage/.

When something doesn't work

  • liburing-sys link error on macOS native: expected. Use the dev container.
  • compile_error! mentioning README.md: the friendly Linux gate; switch to the container.
  • io_uring_setup: Function not implemented: kernel too old or seccomp restricted; check host kernel and container runtime.
  • O_DIRECT returns EINVAL: the filesystem under the test path doesn't support direct I/O; use /tmp or a named volume.

Repository layout

brain/
├── README.md                     # This file
├── ROADMAP.md                    # Phase index (0–24)
├── .devcontainer/                # Linux dev container for non-Linux contributors
│   ├── Dockerfile
│   ├── devcontainer.json
│   └── post-create.sh
├── docs/                         # See docs/README.md for navigation
│   ├── README.md                 # Navigation hub (Diátaxis-shaped)
│   ├── tutorials/                # Learning-oriented (getting started)
│   ├── guides/                   # Task-oriented (install, configure, operate, upgrade, observability)
│   ├── reference/                # Info-oriented (perf targets, error codes)
│   ├── runbooks/                 # Operator procedures (RB-1 … RB-11)
│   └── development/              # Contributor-oriented
│       ├── usage/                # Build + run + debug + test workflow
│       ├── spec-deviations.md    # Recorded conscious deviations from the spec
│       └── phases/               # Per-phase plans (0–24); dev history
├── monitoring/                   # Deployment assets (Grafana dashboards + Alertmanager rules)
├── spec/                         # The 32-section specification (read-only)
│   ├── 00_master_overview/       # Substrate (§00–§16)
│   ├── 01_system_architecture/
│   ├── 02_data_model/
│   ├── 03_wire_protocol/
│   ├── 05_storage_arena_wal/
│   ├── …
│   ├── 16_benchmarks_acceptance/
│   ├── 17_knowledge_model/       # Knowledge layer (§17–§31)
│   ├── 18_entities/
│   ├── 19_statements/
│   ├── 20_relations/
│   ├── 21_schema_dsl/
│   ├── 22_extractors/
│   ├── 23_retrievers/
│   ├── 24_hybrid_query/
│   └── … (through 31_complete_acceptance/)
├── crates/
│   ├── brain-core/               # Shared value types (MemoryId, EntityId, StatementId, …)
│   ├── brain-protocol/           # Wire codec + schema DSL parser
│   ├── brain-storage/            # Arena + WAL + recovery (Linux only)
│   ├── brain-metadata/           # redb wrapper (substrate + knowledge tables)
│   ├── brain-index/              # HNSW (memory + entity + statement) + tantivy
│   ├── brain-embed/              # BGE inference
│   ├── brain-planner/            # Query planner + hybrid query router
│   ├── brain-ops/                # Cognitive operations
│   ├── brain-workers/            # Background workers
│   ├── brain-extractors/         # Pattern + classifier extractors (phase 20)
│   ├── brain-llm/                # LLM client + cache + budget (phase 21)
│   ├── brain-http/               # HTTP/WS/SSE transport
│   ├── brain-server/             # Server binary
│   ├── brain-sdk-rust/           # Rust SDK (substrate + typed knowledge SDK)
│   └── brain-cli/                # Admin CLI
├── benches/                      # criterion benches per crate
├── fuzz/                         # cargo-fuzz harnesses
├── tests/                        # cross-crate integration + chaos + soak
├── scripts/
│   └── acceptance/               # acceptance gate runners
├── config/                       # Example TOML configs
├── justfile                      # Convenience commands (`just verify`, `just shell`, …)
└── .github/workflows/ci.yml      # CI: build + test + clippy + fmt + miri + audit

License

Apache-2.0. Source code, spec, and documentation are all under the same license.

By submitting a pull request, you agree that your contribution is licensed under the Apache-2.0 terms (per Apache-2.0 §5 — "Submission of Contributions"). The Apache-2.0 patent grant applies; see LICENSE for details.

Repository: https://github.com/brain-db-io/brain-db

Packages

 
 
 

Contributors

Languages